Having taken some time to carefully read through Skip Pizzi’s article “Moving On Up to Digital Audio 3.0” (July 4), I’d like to reply and challenge some statements made in the piece. It made for interesting reading but I believe there were some fundamental omissions in the article, which meant that it stopped considerably short of reflecting the true history of digital audio.
The uninitiated, or those new to the industry, reading this article could have drawn two possible conclusions: first, that there is only one type of compression algorithm available, i.e. those based on perceptual coding principles; and second, that every new generation of algorithm surpassed the previous one and made the life of the broadcaster better, easier and cheaper.
With regard to the nature of digital audio data compression, there are fundamentally two coding principles. As Skip pointed out, one of these is the perceptual coding technique, which works on the basis of psychoacoustic masking: if one signal is sufficiently higher in amplitude than another and close enough to it in time or frequency, the second signal is deemed irrelevant and removed.
Fig. 1: Principles Adopted by MPEG Codecs
This technique relies largely on a computational model of the human ear to judge what is irrelevant, an inherently subjective decision. It is also highly processor-intensive and adds considerable latency to the encode/decode process.
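To make the masking idea concrete, here is a toy sketch of my own devising, not drawn from any actual MPEG implementation: spectral components that sit close to a much louder neighbor are simply judged irrelevant and discarded. The window and threshold values are arbitrary illustrative choices.

```python
def mask_irrelevant(components, window=3, threshold_db=20.0):
    """Toy psychoacoustic masking decision.

    components: list of (position, level_db) pairs, where position is a
    frequency-bin index. Any component within `window` bins of a neighbor
    that is at least `threshold_db` louder is treated as masked and
    dropped. Real MPEG codecs use far more detailed masking curves.
    """
    kept = []
    for pos, level in components:
        masked = any(
            other_pos != pos
            and abs(other_pos - pos) <= window
            and other_level - level >= threshold_db
            for other_pos, other_level in components
        )
        if not masked:
            kept.append((pos, level))
    return kept

# A quiet tone right beside a loud one is removed as irrelevant:
print(mask_irrelevant([(10, 60.0), (11, 30.0), (40, 50.0)]))
# -> [(10, 60.0), (40, 50.0)]
```

Everything the masker throws away is gone for good, which is the source of both the bit-rate savings and the subjectivity the letter describes.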
However, Skip’s article fails to mention the second family of coding techniques, i.e. those based on Adaptive Differential Pulse Code Modulation principles, which work in the time domain. The apt-X algorithms, which are used extensively throughout the broadcast industry, are key amongst this family.
These algorithms use Predictive Analysis and Backward Adaptation to reconstitute an audio signal. Simply put, the technology predicts each sample based on previous knowledge, subtracts the prediction from the actual signal and sends only the difference. It is simple, retains audio integrity and offers low latency.
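The predict-subtract-transmit loop can be sketched in a few lines. This is a deliberately simplified differential coder of my own devising, with a one-sample predictor and a crude backward-adapted step size; the real apt-X algorithms add subband filtering and far more sophisticated prediction.

```python
def encode(samples):
    """Toy differential coder: predict each sample as the previous
    reconstructed value and transmit only the quantized difference."""
    step, predicted, codes = 16, 0, []
    for s in samples:
        code = round((s - predicted) / step)  # actual minus predicted
        codes.append(code)
        predicted += code * step              # same reconstruction the decoder makes
        # Backward adaptation: the step size is steered by the codes
        # themselves, so the decoder can track it without side information.
        step = step * 2 if abs(code) > 2 else max(1, step // 2)
    return codes

def decode(codes):
    """Mirror of encode(): rebuilds the signal from differences alone."""
    step, predicted, out = 16, 0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = step * 2 if abs(code) > 2 else max(1, step // 2)
    return out
```

Because the decoder runs the identical adaptation rule, no quantizer state needs to be sent alongside the audio, and the sample-by-sample loop involves no frame buffering, which is part of what keeps this family of coders low in latency.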
The apt-X algorithm was launched in 1988 and productized in 1990 in DSP format and then in a PC sound card, so it would be true to say this algorithm was fundamental to Digital Audio 1.0 and 2.0. The suppliers of hard-disk recorders for radio automation and playout systems enjoyed great success with apt-X in the early and mid-90s. Scott Studios, The Management, Computer Concepts, Barrcode, IMD and various other manufacturers installed more than 15,000 systems throughout North America, Europe and Asia.
Fig. 2: Principles Adopted by apt-X Codecs
At the same time, suppliers of real-time transport solutions also realized the merits of apt-X as they migrated from analog to digital. Broadcasters could purchase apt-X-based solutions for ISDN, X.21, V.35, T1, E1, RF and satellite delivery from a myriad of manufacturers. These include Harris Intraplex, Moseley, TFT, IDC, RVR, DB Electronica, Comrex, Glensound, Systembase, KW and recently Mayah, Pulsecom and Prodys (Musicam USA).
apt-X on the scene
The first application in which apt-X was deployed was a satellite network in the early ’90s for reporting on NBA games. This point is important, as apt-X has a high tolerance to bit errors and, in the event of a small dropout or partial breakdown in a circuit, can reestablish the connection quickly.
In the same way that perceptual coders did not stand still, neither did apt-X. Enhanced apt-X was launched in 1999 with even lower latency, increased word depth and greater tolerance of transient content; apt-X Live followed in 2006 with increased bandwidth efficiency (8:1) for wireless applications.
Roll on to the mid-2000s, and the two fundamental issues challenging broadcasters are alternative transport media and increasing the number of audio channels (surround sound).
Most broadcast engineers today are either interested in or currently using IP networks for their audio delivery. Audio-over-IP networking is cost-effective and highly efficient but using perceptual codecs in this application can cause several problems.
First, the latency introduced by the coding technique, combined with the inherent latency of IP networks, makes the solution unworkable for any real-time application. Second, perceptual codecs are largely frame-bound; when they are used across a packet-based network, any lost packet causes an inevitable dropout in audio and re-synchronization delay.
Additional algorithms can be layered in for Forward Error Correction, which will go some way to addressing these problems, but they also will introduce even more latency and bring the total delay to more than one second — certainly not a desirable outcome for real-time talkback applications.
Now this takes me to my final point. As Skip outlines in his article, the new stuff is better than the old stuff. However, the new stuff is still struggling to overcome the weaknesses inherent in perceptual coding techniques.
On the other hand, the same fundamental principles that enabled the early versions of apt-X coding to overcome frailties in satellite links now ensure that the new apt-X-based codecs perform magnificently in packet-based IP networks. As apt-X is not frame-bound, packets can be shaped to ensure dropped or lost packets do not adversely affect the audio. Also, in the event of a particularly stressed network, the data stream can re-connect in less than 3 milliseconds using apt-X’s Auto Sync feature.
Skip’s article was a good piece but I felt that in omitting the role that apt-X codecs are currently playing in the new age of Digital Audio 3.0, he presented an imperfect overview of the current situation. I do hope that this commentary will go some way to correcting this oversight.
The apt-X algorithms were there at the beginning of digital audio; they have been fundamental to storage and transport ever since, including in today's so-called era of Digital Audio 3.0; and they still offer audio quality and performance unparalleled over both synchronous and packetized networks.