A Flurry of Activity in the Audio Codec World Is Increasing Coding Efficiency But Could Confuse Some Broadcasters
Just as broadcasters were beginning to understand MPEG-2 Audio Layer II and Layer III, along comes another flotilla of new “standard” audio codecs. These upstarts promise higher quality at lower bit rates, but can create a lot of confusion and incompatibility in the process.
The latest developments represent a third wave of audio coding, incorporating many new and sophisticated concepts. Yet as the coding becomes more complex, the opportunities for variation multiply as well.
An important change that has accelerated the current development is the relatively recent possibility for running these systems (encoders, decoders and tools) in software on general-purpose computers, as opposed to the original environment that largely required dedicated hardware.
Sorting it out
The latest list of new codecs includes MPEG-4 AAC; AAC-LD; AAC-SBR; and PAC.
Beyond this “standards” list are several proprietary schemes that continue to develop in the online world, but the codecs mentioned above are of more concern to broadcast professionals, as they are used in dedicated broadcast contribution and distribution links and/or in digital radio broadcast systems.
This entire area of development dates back to around 1995, when the MPEG audio developer community began work on a second generation of codecs that would attempt to optimize audio quality at lower bit rates than previously thought possible, but without the constraint of remaining backward compatible to previous systems. Thus this early work was called “MPEG-2 NBC” for Non-Backwards Compatible.
The result was the first version of Advanced Audio Coding, called MPEG-2 AAC, which appeared in 1997. It could provide MP3 audio quality at about half the bit rate.
Meanwhile, Bell Labs was developing its next-gen codec (it had worked with Fraunhofer Gesellschaft and Dolby Labs on MPEG audio codecs). In the midst of this, Lucent Technologies was formed, and it inherited the codec that became known as PAC, for Perceptual Audio Coder.
All of the latest variations of audio codecs can trace their pedigree to one of these two main branches.
Variations on a theme
Earlier codecs were impressive in their ability to present high-quality sound at dramatically reduced bit rates compared to the original PCM signal.
Data rate reductions of 80 percent or more were possible without significant degradation. But there were audible artifacts on occasion, and higher compression ratios were thwarted by an excessive amount of these. Therefore, advanced codec design concentrated on ways to reduce or eliminate these artifacts.
Such techniques included tweaking of filter banks, improvements in stereo signal analysis for joint coding (i.e., reduction of redundant discrete coding of each channel for stereo or multichannel signals where similar audio existed in two or more channels), noise shaping, prediction, coding techniques themselves and bit-stream multiplexing of outputs.
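One widely used joint-coding approach is mid/side (M/S) stereo, in which the codec transmits the sum and difference of the two channels rather than each channel discretely. The sketch below is an illustrative toy, not the implementation of any particular codec named above; it shows why the technique saves bits when the channels are highly correlated.

```python
import numpy as np

def ms_encode(left, right):
    """Convert L/R to mid (sum) and side (difference) channels."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def ms_decode(mid, side):
    """Recover L/R exactly from mid/side."""
    return mid + side, mid - side

# Nearly identical channels: the side signal is tiny, so it can be
# coded with far fewer bits than a second discrete channel would need.
t = np.arange(480) / 48000.0
left = np.sin(2 * np.pi * 440 * t)
right = 0.98 * left                       # highly correlated channels

mid, side = ms_encode(left, right)
print(np.max(np.abs(side)) / np.max(np.abs(mid)))   # ~0.01
```

Because the transform is lossless and exactly invertible, all the bit savings come from the side channel's low energy, which the quantizer can exploit.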
Another fundamental factor in all codec design is the selection of block size. Because these perceptual algorithms rely on analysis of the instantaneous audio spectrum to determine the masking characteristics of the moment, a group of consecutive audio samples must be analyzed. (A single sample does not define a spectrum; the frequency content of sound is determined by the rate of change between samples.)
The longer the series of samples, the more accurate the spectral determination will be. But such long blocks increase the latency of the codec, and worsen its ability to react to transients, which results in temporal smearing of sound – one of the more obnoxious audible artifacts of perceptual codecs.
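The tradeoff can be seen with simple arithmetic. Assuming a 48 kHz sample rate (an assumption; the article does not fix one), a block's frequency resolution and its time span pull in opposite directions:

```python
# Block-length tradeoff: finer frequency resolution vs. longer latency.
fs = 48000                                # assumed sample rate

for n in (256, 2048):                     # AAC's short and long blocks
    resolution_hz = fs / n                # spacing of spectral bins
    duration_ms = 1000.0 * n / fs         # time the block spans
    print(f"{n:5d} samples: {resolution_hz:7.2f} Hz bins, "
          f"{duration_ms:5.2f} ms of audio")
```

The 2048-sample block resolves the spectrum eight times more finely (about 23 Hz vs. 188 Hz bins) but spans eight times as much audio (about 43 ms vs. 5 ms), which is exactly the latency and transient-smearing cost described above.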
So advanced codec design abandons the search for a single perfect compromise on block size and instead uses multiple block sizes, switching among them as the sound warrants (i.e., shorter blocks for transient passages).
AAC not only uses short (256-sample) and long (2048-sample) block sizes but also two different window shapes for long blocks (sine and Kaiser-Bessel Derived), depending on the spectral density of the sound. Such adaptive optimization substantially reduces audible artifacts, allowing higher data compression ratios.
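Those two long-block window shapes can be constructed directly. The sketch below builds a 2048-point sine window and a Kaiser-Bessel Derived (KBD) window with numpy; the Kaiser alpha of 4 is the value AAC specifies for long blocks, though that parameter is an assumption not stated in the article. Both shapes satisfy the overlap-add condition (w[n]² + w[n+N/2]² = 1) that MDCT-based codecs require, which is why they are interchangeable block to block.

```python
import numpy as np

N = 2048                                  # AAC long-block length

# Sine window: w[n] = sin(pi/N * (n + 0.5))
sine_win = np.sin(np.pi / N * (np.arange(N) + 0.5))

# Kaiser-Bessel Derived window: normalized cumulative sum of a Kaiser
# kernel of length N/2 + 1, square-rooted, then mirrored.
alpha = 4.0                               # AAC's long-block value (assumed)
kernel = np.kaiser(N // 2 + 1, np.pi * alpha)
csum = np.cumsum(kernel)
half = np.sqrt(csum[:-1] / csum[-1])
kbd_win = np.concatenate([half, half[::-1]])
```

The KBD window concentrates more energy in its main lobe, which suits dense, tonal spectra; the sine window has better close-in selectivity. Switching between them per block is the "adaptive optimization" described above.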
AAC was first released under the MPEG-2 label in 1997, but subsequent refinements were added and the AAC codec was reestablished with such extensions under the MPEG-4 flag in 1999.
In two-way applications, low delay is an important requirement. As with most codecs, increasing an MPEG-4 AAC encoder’s compression ratio also increases its throughput delay.
For example, at 96 kilobits per second, AAC delay is about 100 ms, while at 24 kbps delay extends to over 300 ms. Added to other latency components in a digital audio transmission path, this can begin to cause problems in two-way communications. Therefore a low-delay (LD) version of AAC was developed and included in the MPEG-4 version of the codec standard.
In contrast to standard MPEG-4 AAC, the AAC-LD variant maintains a constant delay of about 20 ms regardless of compression ratio. The tradeoff is a slight reduction in quality at a given bit rate for AAC-LD compared with AAC.
Another variation on AAC is called AAC-SBR, for Spectral Band Replication. This technology is employed mostly in the decoder, where it improves the high-frequency performance of the system. The SBR decoder section examines the lower-frequency elements of the decoded signal and derives a more accurate representation of the high-frequency elements (both harmonic and noise components), thereby improving the perceived audio quality or effective bandwidth of the system at a given bit rate, when compared to decoding without SBR.
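The regeneration idea can be caricatured in a few lines. Real SBR operates on QMF subbands and uses envelope side information sent by the encoder; the sketch below is only a crude FFT-domain illustration of the core concept — patching a transposed copy of the low band into a high band the codec discarded.

```python
import numpy as np

fs = 48000
t = np.arange(fs // 10) / fs
# "Original" signal with content above and below 6 kHz.
full = sum(np.sin(2 * np.pi * f * t) for f in (500, 1500, 9000, 11000))

X = np.fft.rfft(full)
cut = len(X) // 4                   # pretend the codec kept only < fs/8
X_lo = X.copy()
X_lo[cut:] = 0                      # band-limited "decoded" signal

X_sbr = X_lo.copy()
X_sbr[cut:2 * cut] = 0.5 * X_lo[:cut]   # transpose low band upward

def band_energy(spec, lo, hi):
    return float(np.sum(np.abs(spec[lo:hi]) ** 2))

print(band_energy(X_lo, cut, 2 * cut))        # 0.0 -- highs are gone
print(band_energy(X_sbr, cut, 2 * cut) > 0)   # True -- highs regenerated
```

The regenerated highs are not the original highs, but because the ear is far less sensitive to exact spectral detail at high frequencies, a plausible replica shaped by the transmitted envelope is perceptually convincing at very low bit rates.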
A Swedish/German company called Coding Technologies developed SBR, which has so far been applied to both MPEG-2 Audio Layer III and MPEG-2 AAC coding. The resultant products are known as mp3PRO and CT-aacPlus, respectively.
The latter has been adopted as the codec used by XM Satellite Radio and Digital Radio Mondiale. (XM also adds proprietary pre-processing at the encoding side from Seattle-based Neural Audio, which is claimed to improve spatial imaging and intelligibility.)
Meanwhile, the PAC algorithm has been inherited by iBiquity Digital Corp., where its development and deployment continue. It is generally considered to be in the same efficiency-vs.-quality class as MPEG-2 AAC, and is currently the codec employed in the iBiquity IBOC and Sirius Satellite Radio systems.
Another area of greater complexity in the audio codec environment involves intellectual property rights. The earlier process of standard development and implementation has given way to a hybridized world of mixing truly open standard codecs with proprietary extensions and deploying them in optimized variations across specific product lines. This trend is likely to continue.
Even in the MPEG standard world, changes are afoot in the way intellectual property rights will be handled for MPEG technologies. For example, proposed licenses for MPEG-4 involve a per-user multiplier for the first time.
It is unlikely that these developments will subside, as more energy and funding are focused on digital audio distribution systems in coming years. The capacity for such systems’ decoders to operate in pure software form also allows downloading of upgrades or wholly new codecs to existing devices, further extending the value of continuing development. XM’s recent switch from PAC to CT-aacPlus is a good example of this.
Anyone who thinks that audio codec development is a mature and stable technology clearly is mistaken.