

Digital Audio Codecs Explained

Coding Is a Fundamental Component of Digital Production and Broadcast Systems


Digital audio codecs are the fundamental enabling technology behind all of today’s emerging sound broadcast and distribution formats. As such, understanding their operation, value and respective attributes can be useful information for broadcast professionals, who are currently faced with important and far-reaching choices in the design of their future production and delivery systems.

The pristine quality of digital audio is well-known, but it comes at a price. The amount of data required to produce such fidelity is high. Consider that the same CD that holds about 1-1/4 hours of digital audio could contain an encyclopedia equivalent to several thousand pages of text and rich images.

This is because, to produce a clear, clean and quiet signal, the CD format generates a 16-bit value representing the momentary state of each audio channel 44,100 times every second, with a resulting data rate of about 1.4 million bits (“megabits”) per second, written as 1.4 Mbps.

This means that every minute of CD-quality stereo audio occupies about 10 million bytes (megabytes or MB) of data. (You do the math: 16 bits x 2 stereo channels x 44,100 samples/sec ÷ 8 bits/byte x 60 sec/min = 10,584,000 bytes).
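The arithmetic above is easy to check directly; a minimal sketch:

```python
# CD-quality ("Red Book") audio parameters
BITS_PER_SAMPLE = 16
CHANNELS = 2          # stereo
SAMPLE_RATE = 44_100  # samples per second, per channel

# Data rate in bits per second
bits_per_second = BITS_PER_SAMPLE * CHANNELS * SAMPLE_RATE
print(bits_per_second)   # 1411200 bits/sec, i.e., ~1.4 Mbps

# Bytes per minute of stereo audio
bytes_per_minute = bits_per_second // 8 * 60
print(bytes_per_minute)  # 10584000 bytes, about 10 MB/min
```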

This 44.1 kHz sampling and 16-bit quantization process used by the CD format, sometimes called “Red Book” after the appearance of the standard document, has in recent times been referred to as “linear” or “uncompressed” digital audio.

The latter term might give you a hint that all this data isn’t really required by the listener for a satisfying, high-quality sonic experience. Research has shown that if you take a Red Book datastream and rearrange it in just the right way, you could eliminate 80 percent or more of it and most listeners would never notice the missing data. You can throw away most of the bits in the signal, as long as you preserve the ones that count.

Knowing which bits to keep is the role of the coding algorithm – the brains of a digital audio compression system or the “codec” (short for coder-decoder) you’ve heard so much about lately. Its obvious advantage is a great savings in data storage or transmission bandwidth requirements, without significant aural penalty.

Working their magic

These codecs apply what is called lossy compression, meaning that the data eliminated during encoding is never recovered. Contrast this to lossless compression, the type used by file-packing systems like PKZIP, WinZip or StuffIt, in which all the bits removed in the encoding process are fully reconstructed during the decoding step.

Rather than simply seeking ways to code redundant bit patterns more efficiently (as lossless compression does), lossy compression instead takes advantage of perceptual shortcomings of the end user, and exploits them to adaptively process data in such a way that substantial data reduction can be applied without noticeable effect.

This implies that lossy systems are designed for a particular type of data being processed (e.g., audio), while lossless compression can be applied to any kind of data file. Note that lossless compression systems typically can only reduce file size by a factor of around 2:1 or 3:1 at best, while lossy systems can reduce high-quality audio data bit rates by 10:1 or more. (Audio turns out to be fairly unforgiving in this respect, due to the high acuity of human hearing. For example, video compression ratios can approach 100:1 while maintaining reasonable quality.)
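The modest gains of lossless coding are easy to demonstrate with a general-purpose compressor; the sketch below uses Python’s zlib (the same DEFLATE engine behind ZIP-style tools) to show that redundant data compresses well while noise-like data barely compresses at all:

```python
import os
import zlib

# Highly redundant data compresses very well under lossless coding...
text = b"the quick brown fox jumps over the lazy dog " * 1000
text_ratio = len(text) / len(zlib.compress(text))

# ...but noise-like data (akin to the fine detail in raw audio)
# carries little redundancy and stays close to its original size.
noise = os.urandom(44_000)
noise_ratio = len(noise) / len(zlib.compress(noise))

print(text_ratio, noise_ratio)  # large ratio vs. roughly 1:1
```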

Lossy audio codecs employ the phenomenon of human hearing called masking, by which louder sounds reduce a listener’s ability to hear quieter ones at nearby frequencies and times. The codec is programmed to take advantage of this temporary reduction in audibility by reducing the resolution of the digital audio signal, i.e., assigning fewer bits to each audio sample than the 16 that are always used by Red Book audio, as noted in our equation above.

Doing this will necessarily increase the noise and distortion in the signal, but if the codec places these offending signals into the roving zones of desensitivity created by masking, the degradations will generally remain unnoticed. This so-called noise-shaping technique is the key to the codec’s ability to produce audio quality rivaling the CD at a data rate that would produce very ugly sound using linear coding. So instead of requiring 10 MB/min, hi-fi audio can be produced at less than 1 MB/min – an order of magnitude improvement in coding efficiency.
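The resolution/noise trade-off can be sketched numerically. This toy example (not a real codec – just uniform requantization of a sine wave) shows why the bits a codec discards must be hidden under the masking threshold: each bit removed costs roughly 6 dB of signal-to-noise ratio.

```python
import math

def snr_at_bits(bits, n=44_100):
    """Quantize one second of a full-scale 1 kHz sine to the given
    word length and return the signal-to-noise ratio in dB."""
    step = 2.0 / (2 ** bits)          # quantizer step for a [-1, 1] signal
    sig_power = noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 1000 * i / n)
        q = round(x / step) * step    # uniform quantization
        sig_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(sig_power / noise_power)

for b in (16, 8, 4):
    print(b, snr_at_bits(b))  # SNR falls roughly 6 dB per bit removed
```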

But like any powerful technology, data compression has its limits in application. A particular concern involves multigenerational effects that may occur when an audio signal is subjected to repeated encode/decode cycles of the same or different codecs along its path from source to end user. For this reason it is important to apply these techniques in moderation and with a holistic system view.

Recent changes

Like all things digital, advances continue to occur in the world of codec development.

These highly enabling technologies have attracted some of the best minds in the business, with many corporations and standards organizations working hard to constantly improve performance. Hence the audio quality possible at a given bit rate keeps increasing.

The target has been to match the perceived audio quality of the CD, and today’s latest codecs aim to do this in the range of 64 kbps (i.e., a reduction ratio of over 20:1). While some audio professionals can hear coding artifacts under certain conditions, the real target is the mainstream listener in typical circumstances. The audibility of coding artifacts also varies with the type of audio content. Perhaps counter-intuitively, it is often voice content — not music — that exposes coding artifacts most blatantly.
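The “over 20:1” figure follows directly from the Red Book rate computed earlier:

```python
RED_BOOK_KBPS = 1411.2  # 16 bits x 2 channels x 44,100 samples/sec
CODED_KBPS = 64         # target rate for CD-equivalent perceived quality
reduction = RED_BOOK_KBPS / CODED_KBPS
print(reduction)        # just over 22:1
```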

Today these low-bit rate or LBR codecs are used for both Internet streaming media on computers and for broadcast or downloaded audio on dedicated consumer electronic hardware (such as MP3 players and satellite radios). The main difference between these approaches is that the former involves a bidirectional connection to a computer device, which allows frequent decoder updates to be downloaded; while the latter addresses dedicated hardware via unidirectional or offline connections, implying that the codec’s decoder is typically “locked down” at the factory.

So the codec choice in the case of consumer electronics hardware is more critical and long-lasting, because any subsequent changes could render legacy devices incompatible.

Note that this does not completely rule out future performance improvements in the dedicated hardware case. While the decoder must remain fixed in such consumer products, the broadcaster’s encoder can be continually tweaked to improve its creation of the compressed signal, and as long as its output remains faithful to the standard format, the end user’s experience can be enhanced using the original decoder. Typically this can allow a 20- to 30-percent improvement in quality over the life of the format.

It is also possible for newer consumer devices to improve upon their predecessors’ ability to decode the standard signal, using unilateral, decode-only extensions – just as today’s state-of-the-art, DSP-based FM receivers sound better than earlier models, or Dolby Pro-Logic improves performance over the original Dolby Surround, with no change to the content’s encoding format required.

Among current codecs, one such technique is Spectral Band Replication (SBR), developed by Coding Technologies; it is what differentiates mp3PRO from MP3, and aacPlus from AAC.

Another recent buzz in the industry involves the use of proprietary “pre-processing.” Just like regular broadcast audio processing, this is accomplished via a black box in the broadcaster’s transmission chain, but these processors are intended to prepare the signal in special ways to allow it to survive the codec’s encode and decode processes with higher fidelity. One example is Neural Audio’s Neustar system, currently in use as a pre-processor to the aacPlus codec in the XM Satellite Radio system.

Horses for courses

Not all codecs are created equal, so choices are important. There are both technical and business differences among them.

On the technical side, each codec is optimized for a certain target function. Although most operate over a range of bit rates and environments, there is always a “sweet spot” of operation.

This presents a peculiar problem for the current HD Radio system, because its developers intend to use a single codec for both the AM and FM systems, but the bit rates of the two systems are widely divergent. The AM system uses 36 kbps (this rate has also been proposed for secondary FM audio services), which puts it squarely in the “dial-up” range of the online world, while the 96 kbps used in the FM system is considered at the low end of the broadband environment.

Not all codecs can operate optimally over this wide a range, and this is the problem currently facing Ibiquity’s PAC codec. The solution may require the addition of a second codec to the system, or the choice (or development) of a codec that can adapt well to the full range of data rates required.

Business-wise, the licensing fees that implementers pay to include these codecs in their products also can vary widely.

Some codecs are licensed to implementers by their owners unilaterally – i.e., proprietary systems – while others are handled by licensing authorities employed or established by standards bodies. Standards-based systems are licensed to all implementers on reasonable and non-discriminatory (“RAND”) terms, and the format is generally frozen for a substantial amount of time, while proprietary systems have no such intrinsic guarantees. (Proprietary codec owners may voluntarily elect to operate under these terms, however, and often do.)

On the other hand, it is now common for proprietary systems to offer less-costly licensing terms to implementers. So depending on the application, either approach may have merit.