Show of hands: Who can remember the launch of the CD format?
Yes, there are people working in our industry today who weren’t even born when the CD was introduced. But those with their hands up will remember how the CD format set the table for digital audio in general, with its 16-bit/44.1 kHz-sampled, linear PCM approach becoming the standard format (with 48 and 32 kHz variants).
Let’s call that era “Digital Audio 1.0.”
For a good while thereafter we regarded this as the only way digital audio could ever be done, and we got used to its ~1.4 Mbps data rate and its healthy 10 MB-per-minute recording appetite. (And that was back when 10 MB really meant something, sonny.)
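For the curious, those figures fall straight out of the arithmetic. A quick check (plain Python, no libraries) of where the bitrate and per-minute storage numbers come from:

```python
# Back-of-the-envelope check of the CD-era numbers.
SAMPLE_RATE = 44_100      # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

bits_per_second = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
mb_per_minute = bits_per_second / 8 * 60 / 1_000_000  # decimal megabytes

print(f"{bits_per_second:,} bps")        # 1,411,200 bps, i.e. ~1.4 Mbps
print(f"{mb_per_minute:.1f} MB/minute")  # ~10.6 MB per minute of stereo audio
```

Round the decimal megabytes down to binary ones and you land right at the "10 MB per minute" figure of fond memory.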
Like most well-established paradigms, there simply seemed to be no way to improve on it.
Nevertheless, two separate developments eventually changed our minds.
The first was perceptual coding (back then it was usually called “data compression”), which allowed substantial reduction in those data rate and storage requirement figures, without much audible penalty. This was a major breakthrough, and although it initially took dedicated hardware to do it (software codecs for PCs came later), the cost savings and other functionality that it enabled were well worth it. Call this “Digital Audio 2.0.”
Meanwhile, as storage became cheaper and computers got faster, it became practical and cost-effective to go in the opposite direction for studio production — or anywhere the transmission of digital audio wasn’t required.
Here the 16-bit/48 kHz barrier was broken and higher-resolution recording was developed, eventually settling on 24-bit/96 kHz as one common new method for linear PCM mastering, and eventually for distribution on DVD-Audio products.
Other formats of the sort were also developed, including Sony’s Direct Stream Digital (DSD, the basis of SACD), which traded multibit words at moderate sampling rates for a 1-bit stream at a very high rate, essentially doing away with the fixed sampling-frequency-and-resolution pairing of the original encoding. Because all these approaches stem from the original linear PCM root, let’s step backward and call this era “Digital Audio 1.5.”
Perceptual coding didn’t stand still either, of course. While the studio environment may not have cared, anyone interested in transmitting digital audio still worried about keeping bandwidth down and quality up.
So following the early days of aptX, Dolby AC-1 and MPEG-1 Audio Layer 2 (MP2), incremental improvements like MP3 and AAC came along, and multichannel variations like AC-3 (Dolby Digital) were added.
By this time, these codecs were commonly available in software form, and numerous other formats were developed specifically for the consumer PC and Internet streaming environment. These too went through numerous upgrades over time.
But a notable departure came with the development of parametric coding, whereby instructions rather than actual audio samples are encoded. The primary example is spectral band replication, used in formats like Coding Technologies’ MP3 Pro and AAC+ (the latter standardized as MPEG-4 HE-AAC), which creates instructions for recreating harmonics in the uppermost octave of the audio signal from the audio samples of lower frequencies.
Another example is MPEG Surround, where spatial instructions are extracted from a multichannel mix and sent as a small data signal alongside coded stereo audio samples. This parametric concept’s mini-breakthrough allowed another substantial reduction in bit-rate requirements — let’s call it “Digital Audio 2.5.”
As noted, at each of these points many have been tempted to conclude that things are as good as they can get. But they’ve been wrong. And once the next step is taken, the previous generation soon looks so “last year.”
Witness the DAB format, which standardized early with MP2 as its audio codec but, after a period of “what were they thinking” analysis, has recently added HE-AAC and MPEG Surround in its DAB+ update — call it DAB 2.0.
Crawl, walk, run
So much for the history lesson.
Here we are at Digital Audio 2.5, and now most of us realize that there will be a next step. Like most of those that have come before, these advances are driven not simply by academic curiosity but by practical needs.
To wit: Today, much coded audio is sent over the Internet, which is hardly as hospitable a transport as a nice, stable T1 or ISDN line (with their guaranteed quality of service or QoS), or even a fixed wireless or broadcast channel. Collisions, congestion and other bandwidth variations come with the territory on such “best-effort delivery” networks, which were never intended to support real-time streaming services.
Yet today the Internet and other “non-QoS” services are used for a huge and growing amount of last-mile delivery of audio to consumers, and even for an increasing amount of contribution and distribution paths by broadcasters.
It’s the latter that’s generating development of what may become Digital Audio 3.0: the era of the “smart codec.”
Ultimately, this could be broadly deployed in the consumer delivery space, as well, but at the moment it looks like just what broadcasters need for cost-effective remote backhaul — or other real-time audio contribution/distribution applications — from anywhere that reasonably broadband Internet connectivity (wired or wireless) or 3G mobile IP links are available.
This new coding generation builds upon the efficiencies of Digital Audio 2.5-era codecs, but adds adaptive control that continually optimizes the encoding based on its monitoring of instantaneous conditions on the network.
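What might such adaptive control look like? Here’s a hypothetical sketch — the bitrate ladder, thresholds and class names are invented for illustration, not taken from any shipping product — of a controller that steps the encoder down when packet loss climbs and probes back upward when the network recovers:

```python
# Hypothetical adaptive-bitrate control loop -- a sketch, not a vendor's algorithm.
BITRATES = [24, 32, 48, 64, 96, 128]    # kbps ladder (illustrative values)

class RateController:
    def __init__(self):
        self.index = len(BITRATES) - 1  # start optimistic, at the top rung

    def update(self, loss_fraction):
        """Called once per measurement interval with the observed packet loss."""
        if loss_fraction > 0.05 and self.index > 0:
            self.index -= 1             # congestion: step down a rung
        elif loss_fraction < 0.01 and self.index < len(BITRATES) - 1:
            self.index += 1             # clean network: probe upward again
        return BITRATES[self.index]

rc = RateController()
for loss in [0.00, 0.08, 0.10, 0.00, 0.00]:
    print(rc.update(loss), "kbps")      # 128, 96, 64, 96, 128
```

The point of the sketch is the feedback loop itself: the encoder is no longer configured once and left alone, but continuously steered by what the network is doing right now.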
An early example of this approach emerged in the Comrex BRIC system (found in the company’s Access line of products), and more recently a second entrant has come from Telos Systems, with its Zephyr IP (or Z/IP) device. Perhaps others are yet to come.
One way these systems deal with the high jitter and packet loss that arise from network congestion is through relatively long buffers, but these can add considerable delay (on top of the inherent propagation delay through the network). That can pose a problem for many broadcast applications, such as two-way conversations.
The Comrex BRIC lets the user set the tradeoff between buffer length and delay, while the Telos Z/IP system optimizes for minimum delay automatically and dynamically, using a variable-length buffer that tries to add no more delay than the current network conditions require.
To pass audio through such a dynamic buffer without audible artifacts, a variable time-compression algorithm (“squeeze/stretch”) and some other clever concealment techniques are used.
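To make the variable-buffer idea concrete, here is a minimal sketch — again illustrative, not the Z/IP design — in which the buffer’s target depth is derived from the statistics of recent packet inter-arrival jitter:

```python
# Sketch of a variable-length jitter buffer sizing rule (illustrative only).
import statistics

class AdaptiveJitterBuffer:
    """Sizes its target depth from recently observed inter-arrival jitter."""
    def __init__(self, frame_ms=20.0, safety=3.0):
        self.frame_ms = frame_ms
        self.safety = safety            # how many std-devs of jitter to absorb
        self.arrivals = []

    def on_packet(self, arrival_ms):
        self.arrivals.append(arrival_ms)
        if len(self.arrivals) < 3:
            return self.frame_ms        # minimum: one frame of buffering
        gaps = [b - a for a, b in zip(self.arrivals, self.arrivals[1:])]
        jitter = statistics.pstdev(gaps)
        # Target depth: one frame plus headroom for the observed jitter.
        # A real system would grow/drain toward this target via the
        # "squeeze/stretch" time-compression described above, so the
        # buffer-length change itself stays inaudible.
        return self.frame_ms + self.safety * jitter

buf = AdaptiveJitterBuffer()
for t in [0, 20, 41, 60, 95, 100]:      # irregular arrival times, in ms
    target = buf.on_packet(t)
print(f"target buffer depth: {target:.1f} ms")
```

On a steady network the jitter term shrinks toward zero and the buffer drains down near a single frame of delay; when arrivals turn ragged, the target grows just enough to ride out the congestion — which is the whole trick.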
So, like the advances that preceded them, today’s emerging systems use unique combinations of existing and new technologies to achieve their ground-breaking functionality, thereby bringing us to the next level of quality, efficiency and robustness for digital audio. Soon they will make their predecessors look primitive in retrospect.
Of course, someday these new devices will appear archaic themselves from subsequent breakthroughs, and so on. Such is progress.
But for now, these emerging “Digital Audio 3.0” products set the bar pretty high. Surf’s up — get ready for the next wave.