Go Behind the Curtain of the Digital Revolution

Understand the principles of digital to get the most out of your audio

This is the conclusion of our two-part article about the basics of digital technology. The first part appeared in the Dec. 16 issue.

Digital computers have been commercially available for almost 70 years, and yet the digital “revolution” is a relatively new phenomenon, beginning in the early 1980s with the introduction of the IBM PC.

Why the delay? Two words: speed and memory. And those two words lead to a third: cost.

Last time I explained the nature of the problem and mentioned the fact that in the process of writing the piece, I loaded 129,712 individual bits of information into my laptop’s memory, 16,214 bytes in all. A fair amount of digital writing to be sure, but a pittance considering the requirements of recording and storing digital audio. Computing power simply was not up to the task until the microchip was invented, and even then, some tricks were required to pull it off.

(Image credit: iStockphoto/ttsz)

COMPLEX ISSUES
The transmission of information requires some complexity. The more complex the information, the more complex the system needed to send and receive it. In the computer world, a single bit can only relay two different conditions: on or off. OK as far as it goes; but even Paul Revere needed two lamps to get the jump on the British, so clearly, we need more bits; a lot more.

The written word is complex, photographs even more so, music greater yet, and video information blows the doors off all of it. As each new level of information transfer is considered, the “write speed” (or in the analog world, the “bandwidth”) of a system has to increase right along with it. Early attempts at transmitting “content rich” data illustrated the problem.

In the 1950s, Bell Labs had a prototype videophone demonstration at Chicago’s Museum of Science and Industry that fascinated me. The phone worked pretty well, with my little sister’s grainy black and white image filling the screen from the other side of the museum, sticking her tongue out at me. The exhibit pointed out, however, that the videophone required 600 times as much bandwidth as a simple audio phone call. That issue, unavoidable at the time, doomed the analog videophone.

Digital compression, which we’ll discuss, resurrected the videophone idea, but still, the speed required to copy and play back large amounts of information, like videophone images or full-fidelity audio, is tremendous, even in compressed digital form. To understand why, we need to revisit the chessboard example from the first article.

Recall that we could identify each square on the board by connecting six binary switches (binary meaning having two states) in a “Power of Twos” configuration, each switch having twice the numerical value of the one preceding it. If we assign a unique bit count to each square, we could store it and recall it with exact precision, time after time. But what if our information was more complex than what six bits could define? What if we wanted to overlay a complex musical tone on our chessboard?
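Before we complicate things with music, that six-switch numbering scheme can be sketched in a few lines of Python. (The split of 3 bits for the file and 3 bits for the rank is my own illustrative choice; any one-to-one assignment of the 64 values to the 64 squares works the same way.)

```python
# Encode a chessboard square as a 6-bit number: 3 bits for the
# file (a-h) and 3 bits for the rank (1-8), 64 values in all.
def encode(square):                          # e.g. "e4"
    file_bits = ord(square[0]) - ord("a")    # 0..7
    rank_bits = int(square[1]) - 1           # 0..7
    return (file_bits << 3) | rank_bits      # six bits total

def decode(value):
    file_bits = (value >> 3) & 0b111
    rank_bits = value & 0b111
    return chr(ord("a") + file_bits) + str(rank_bits + 1)

for sq in ("a1", "e4", "h8"):
    v = encode(sq)
    print(f"{sq} -> {v:06b} -> {decode(v)}")   # round-trips exactly
```

Because the value round-trips exactly, the square can be stored and recalled with perfect precision, time after time, which is the whole digital trick.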

We could, of course, try that, but it is immediately apparent that the 64 squares on the chessboard would capture a woefully inadequate snapshot of the sound wave, made up of not only the fundamental frequency, but several harmonics as well. Using the term from part one of this article, the resolution of a 6-bit system stinks. We need more squares, which means more switches.

Since eight bits is the standard number of bits for a computer byte, we’ll start there.

Eight bits allows for 256 unique switch arrangements (2×2×2×2×2×2×2×2=256). Now we’re talking. By adding just two more bits, we have increased the sample size, as it is called, fourfold. A much more defined chessboard!

A 256-level (8-bit) sample size gets us started, but we still have to consider how often we need to take that sample, since one snapshot of a piece of audio doesn’t help at all if we are trying to capture an entire song. We have to take multiple snapshots, and we have to take a lot of them. That is known as the sample rate.

The sample rate directly correlates to the frequency response of the digital system used to record and playback audio. This is because of the way sound waves are constructed. A low-frequency tone is made up of waves that are long, like rolling waves on a pond. High-frequency waves, though, are spaced very close together. The higher the frequency, the more tightly packed the waves and the more often we have to sample in order to capture instantaneous changes in the waves.
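A quick calculation makes the point. At the CD rate of 44,100 samples per second (the article's own figure), count how many samples land on each cycle of a wave:

```python
# At a fixed sample rate, high frequencies get far fewer samples
# per wave cycle than low ones -- which is why fast sampling matters.
SAMPLE_RATE = 44_100  # samples per second

for freq in (100, 1_000, 10_000, 20_000):
    per_cycle = SAMPLE_RATE / freq
    print(f"{freq:>6} Hz: {per_cycle:.1f} samples per cycle")
```

A 100 Hz rumble gets 441 samples per cycle; a 20 kHz overtone gets barely more than two. That bare minimum of two is exactly where Nyquist comes in.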

An AT&T/Bell Labs engineer named Harry Nyquist derived the equations that now bear his name in the 1930s and ’40s, to predict how fast telegraph pulses could be sent over a radio channel. The Nyquist Sampling Theorem states that the sampling frequency must be at least twice as high as the highest frequency to be recovered.

If we are looking to capture and reproduce 20 kHz CD quality audio, then, we need to sample at a rate a bit more than twice that frequency. For that and some other technical reasons, a 44.1 kHz sample rate does the trick.
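What happens when a tone exceeds half the sample rate can be seen numerically: sampling "folds" it back down to a false, lower frequency, a defect called aliasing. This is a sketch of the folding arithmetic only, not of any particular converter:

```python
# Aliasing sketch: a tone above half the sample rate folds back
# to a false frequency after sampling.
FS = 44_100  # CD sample rate, samples per second

def alias_frequency(f, fs=FS):
    """Frequency (Hz) that a pure tone of f Hz appears as after
    sampling at fs. Tones below fs/2 pass through unchanged;
    tones above fs/2 fold back toward zero."""
    return abs(f - fs * round(f / fs))

for f in (1_000, 20_000, 25_000):
    print(f"{f} Hz tone is heard as {alias_frequency(f)} Hz")
```

A 25 kHz tone, above the 22.05 kHz Nyquist limit, comes back as a spurious 19.1 kHz tone, which is why real converters filter out everything above half the sample rate before sampling.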

Since we’re gobbling up computer speed and memory anyway, let’s just go ahead and grab for the brass ring; we’ll increase the sample size from eight bits to a more generous 16 bits. Two to the 16th power slices our chessboard into 65,536 squares, a lot of resolution in anyone’s book. If we capture our complex waveform on that chessboard, and further, sandwich it side by side with 44,100 chessboards of the same resolution every second, we have created the current standard for broadcast digital audio. And with a simple calculator, we can multiply those two numbers together to determine the impact on our computer’s speed and memory.

The result is a whopping 705,600 bits (or 88,200 bytes) of data to be stored every second, and that is for a single channel; stereo doubles it to 176,400 bytes per second. Now in the world of the terabyte (1 trillion bytes), numbers like that might sound trifling, but they definitely are not. Consider that the average song is maybe 3.5 minutes long. At stereo’s 176 kBps (kilobytes per second), that is roughly 37 megabytes of information per song!
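The arithmetic is easy to check with a few lines of Python (the 3.5-minute song length is the article's figure):

```python
# Storage demands of uncompressed 16-bit / 44.1 kHz audio.
BITS_PER_SAMPLE = 16
SAMPLE_RATE = 44_100          # samples per second
SONG_SECONDS = 3.5 * 60       # an average 3.5-minute song

bits_per_sec = BITS_PER_SAMPLE * SAMPLE_RATE   # per channel
bytes_per_sec = bits_per_sec // 8
song_bytes = bytes_per_sec * SONG_SECONDS      # mono

print(f"{bits_per_sec:,} bits/s, {bytes_per_sec:,} bytes/s per channel")
print(f"one song: {song_bytes / 1e6:.1f} MB mono, "
      f"{2 * song_bytes / 1e6:.1f} MB stereo")
```

That works out to 705,600 bits (88,200 bytes) per second per channel, or roughly 18.5 MB per song in mono and 37 MB in stereo.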

I was an early advocate of digital audio and even designed a hard-drive storage system in 1993. The largest drive I was able to buy for the system was 345 MB and it cost $1,600! Recording music with a sample size of 16 bits and a rate of 44.1 kHz was simply beyond the limits of technology in that long ago time (I would have gotten all of 10 songs on the drive before it filled up). I used 8-bit samples at a rate of 32 kHz and was able to store a few hundred songs.

Fortunately, there are a few tricks that we can play with the digital signal to save space and still record and play back at our desired 16-bit/44.1 kHz standard. Which gets us to compression.

MPEG and ADPCM are two widely used compression algorithms.

MPEG is an acronym for the Moving Picture Experts Group, a standing committee of experts from that industry. Beginning in the late 1980s, this group took on the task of determining the best method for compressing the incredible amount of data involved in digitally recording movies and soundtracks into something more manageable. In the process, they created MPEG-1, then MPEG-1 Layer 3 (which is what we call MP3), and on to MPEG-2 and MPEG-4 (a planned MPEG-3 was folded into MPEG-2); each scheme brings slightly different attributes to the compression algorithm.

ADPCM stands for Adaptive Differential Pulse Code Modulation. That compression scheme was invented at Bell Labs for squeezing voice data onto digital telephone circuits. (As an aside, standard uncompressed telephone PCM uses an 8-bit sample size and a sample rate of 8,000 samples per second, resulting in the familiar 64 kbps channel rate; ADPCM typically compresses that same voice signal to 32 kbps or less.)
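The aside's arithmetic, sketched in Python. (The 32 kbps figure is the common rate from the ITU-T G.726 ADPCM standard, my addition for comparison; G.726 also defines other rates.)

```python
# Telephone-grade PCM: 8-bit samples taken 8,000 times per second.
BITS_PER_SAMPLE = 8
SAMPLES_PER_SEC = 8_000

pcm_bps = BITS_PER_SAMPLE * SAMPLES_PER_SEC   # 64,000 bits/s
adpcm_bps = pcm_bps // 2                      # ADPCM commonly halves this

print(f"PCM voice channel: {pcm_bps // 1000} kbps")
print(f"ADPCM compressed:  {adpcm_bps // 1000} kbps")
```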

Without delving into the specific equations of each method (which is above my pay grade anyway), all of the compression schemes are referred to as “lossy,” which means they lose data (and therefore, fidelity) as they do their work. The lossy compression model was chosen because it was simple and because “lossy” is a relative term. It turns out there is a large amount of audio information in the average musical recording that the average person simply cannot hear very well.

ADPCM and MPEG compression both use psychoacoustic modeling to determine which specific sounds and frequencies can be “downsampled,” or ignored altogether in the A/D conversion, without noticeable degradation during playback. There is, of course, some disagreement on all of this (who, exactly, is the “average” person? How much is too much when limiting certain high-frequency waveforms?), but for the most part, compression works extremely well if used with a tender touch. In any event, like it or not, it is a very necessary fact of life in the digital realm.

So given all of that trouble, is the technology that makes digital broadcasting possible worth the effort? The answer is unequivocally “yes,” and for several reasons.

The first is noise. Noise, like all naturally occurring sound, is analog in nature. Furthermore, it is what is called a “spread spectrum” signal, which means its energy is spread thinly over many different frequencies. The good news is that noise is never very problematic if the main signal is loud enough. The bad news is that it is always there, and it is very hard, in fact impossible, to remove completely.

To make matters worse, since noise cannot be eliminated, when copies are made of analog audio, the noise is copied as well. The main audio level remains fixed from copy to copy, but new noise is added to existing noise with each generation, so after a very few copies (well under 10), the noise is so prevalent that the copy is worthless. Record companies and copyright holders thought this was pretty cool, but the rest of us hated it.

With digital copying, though, once the series of digital bytes is recorded, they can be “cloned” over and over and over again, with each copy being an exact duplicate of the original. The “exactness” of digital copies is due to the fact that digital signals are not amplitude dependent “waves,” but rather are based on the sequence of ones and zeros in each byte. So long as the sequence remains accurate, the decoding will be accurate, even though some analog noise from the original microphone and wires and amplifiers will always be lurking around in the background.

The second reason digital equipment is preferable to analog is cost. Gordon Moore, one of the founders of Intel, informally suggested what has become known as Moore’s Law over 40 years ago. He predicted (and the prediction has proven to be uncannily accurate) that computer speed and memory would roughly double every 18 months. Another way to state the law would be to say that for a given amount of computer power, the cost would halve every 18 months.
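The 18-month doubling can be turned into a quick cost projection. This is a toy model of the article's statement of the law, not a claim about any real price list:

```python
# Moore's Law as stated above: for a fixed amount of computing
# power, cost halves roughly every 18 months.
def cost_after(years, start_cost=1.0, doubling_months=18):
    """Fraction of the starting cost remaining after `years`."""
    return start_cost / (2 ** (years * 12 / doubling_months))

for y in (3, 6, 15):
    print(f"after {y:>2} years: {cost_after(y):.4f} of original cost")
```

After just three years the same capability costs a quarter of what it did; after 15 years, about a thousandth, which is roughly the trajectory digital broadcast gear has ridden.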

Either way, the cost of digital equipment, which can be “bent” to perform almost any function requiring calculations or processing through the simple expedient of software, has become extremely inexpensive compared to analog equipment, which must be built, one component at a time, for hardware specific applications.

So that is the story of what goes on behind the curtain of the digital revolution as it applies to radio broadcasting. Worth it? Absolutely. Complicated? You bet. In fact, I’ve worked up a sweat explaining it. I think I’ll take a break ... maybe fire up the TEAC reel-to-reel and go have a cup of digitally prepared coffee.

Jim Withers is owner of KYRK(FM) in Corpus Christi, Texas, and a longtime RW contributor. He has four decades of broadcast engineering experience at radio and television stations around the country.