The Ear Is a Stern Judge of Quality, and Radio Must Measure Up
In the world of electronic media, the two primary content types are audio and video. A commonly heard argument holds that in the digital environment, the differences between these two media are narrowing, since each can be represented by digital signals and transmitted interchangeably on an appropriate channel.
“Bits is bits,” the saying goes; and the fact is that a digital channel cares little whether a signal it passes is audio, video or text, for that matter. As long as a decoder at the receiving end can make sense of the bit stream, the signal can be successfully passed.
In practical terms, however, a much heftier bit stream is required to represent real-time transmission of video than audio. For example, “broadcast-quality,” standard-definition video (uncompressed) generally is considered to require 270 Mbps, while a CD-quality stereo audio signal requires only about 1.5 Mbps.
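The arithmetic behind these figures is easy to verify. The sketch below assumes the standard CD parameters of 44.1 kHz sampling, 16 bits per sample and two channels, and takes 270 Mbps as the serial digital interface rate for uncompressed SD video:

```python
# Back-of-the-envelope check of the bit rates cited above.

# CD audio: 44,100 samples/s x 16 bits/sample x 2 channels
cd_bps = 44_100 * 16 * 2
print(f"CD audio: {cd_bps / 1e6:.2f} Mbps")  # ~1.41 Mbps, i.e. "about 1.5"

# Uncompressed SD video over a serial digital interface
sd_video_bps = 270e6
print(f"Video needs roughly {sd_video_bps / cd_bps:.0f}x the audio rate")
```

The roughly 190:1 gap between the two rates is the "much heftier bit stream" in quantified form.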
Thus, not much has changed from the analog days, when a video signal’s bandwidth of 3 MHz or more dwarfed the 15 or 20 kHz required by audio. Add to these physical differences the larger crews required to shoot video (lighting, makeup, camera operators, camera control, video switching and tape operators), versus the typical audio crew of a single mix engineer, perhaps joined by one or two stage/microphone techs and, occasionally, a separate tape operator.
Taken together, these factors confirm the general impression that audio plays a substantially secondary role. This is reflected in the pay-scale and advertising-rate differentials between radio and television as well. (Of course, to be fair, television includes both audio and video, but the video component always demands the lion’s share of budgets and attention.)
A perceptible difference
The second-class citizenship of audio is not shared where it counts, however: in the human brain. There the perceptual and cognitive processes applied to sound are at least equivalent to those applied to light; and many experts contend that human aural perception is substantially more sophisticated than its visual counterpart.
While this may be hard for those attuned to the electronic media industry to accept, there are plenty of examples to bear out the premise, as follows.
First, consider frequency response. The human hearing sense extends across 10 “octaves,” i.e., 10 doublings of frequency, while human vision barely ekes out a single doubling of frequency perception. While the absolute range in Hertz between the red and violet ends of the visible light spectrum may extend across a wider numerical zone, the wavelength ratio between extremes is less than 2:1 for vision, whereas human hearing handles a roughly 1,000:1 wavelength range with aplomb.
Consider that this implies a sensory perception managing longitudinal waves (i.e., disturbances in the medium of air that surrounds us) ranging from around 50 feet in wavelength down to a fraction of an inch. The physical behavior of the waves at the low end of the audio spectrum is completely different from that at the high end; for example, reflection, absorption and diffraction effects are markedly dissimilar at 100 Hz vs. 10 kHz. Yet a single sensory organ manages the electromechanical coupling process in an equivalent fashion across this vast range.
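The ranges described in the last two paragraphs can be checked with a few lines of arithmetic. The sketch below assumes the commonly cited hearing limits of 20 Hz to 20 kHz, a visible spectrum of roughly 400 to 700 nm, and a speed of sound of 343 m/s at room temperature:

```python
import math

# Octave span of hearing, taking 20 Hz to 20 kHz as the limits
octaves = math.log2(20_000 / 20)
print(f"Hearing spans {octaves:.1f} octaves")  # ~10.0

# Visible light: roughly 400-700 nm, so less than one octave
light_ratio = 700 / 400
print(f"Vision wavelength ratio: {light_ratio:.2f}:1")  # 1.75:1

# Acoustic wavelengths at the extremes of hearing
c = 343.0  # speed of sound in air, m/s
for f_hz in (20, 20_000):
    wl_m = c / f_hz
    print(f"{f_hz} Hz -> {wl_m:.3f} m ({wl_m * 39.37:.1f} in)")
```

At 343 m/s, the 20 Hz wavelength works out to about 56 feet, consistent with the "around 50 feet" figure above, while 20 kHz comes in at roughly two-thirds of an inch.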
Not only are the frequencies themselves sensed, but an amazing amount of intelligence can be gleaned from subtle variations in these signals (consider speech or music). Such is the power of the cognitive processes associated with aural perception.
Another amazing parameter is the dynamic range of human hearing. Again, unlike vision, which has a fairly limited instantaneous range of perception from dark to light, human hearing can manage a dynamic range of about 120 dB, which corresponds to about 40 doublings of intensity (sound power) from the quietest to the loudest perceptible sounds. To grasp the majesty of this achievement, consider that the displacement of the eardrum at the quiet end of human hearing is approximately equivalent to 1/10 the diameter of a helium atom.
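The "40 doublings" figure follows directly from the definition of the decibel, since dB expresses 10 times the base-10 logarithm of a power ratio; a minimal check:

```python
import math

# 120 dB dynamic range expressed as doublings of sound power
power_ratio = 10 ** (120 / 10)   # dB = 10 * log10(P1/P0), so 120 dB = 1e12
doublings = math.log2(power_ratio)
print(f"{power_ratio:.0e} power ratio = {doublings:.1f} doublings")  # ~39.9
```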
Another amazing factor is the directional resolution of human sound perception. While sight is limited to the so-called binocular field of vision (an essentially oval-shaped region in the direction a person is facing), human hearing can detect a full 360 degrees in three dimensions with remarkable precision in localization of a sound’s direction of origin, almost without regard to the direction in which the listener is facing.
The human ear is in itself a marvel.
Beyond the complexity of the pinna, or outer ear, whose folds provide subtle cues about direction, the middle ear acts as an extremely sensitive and responsive transducer. Its almost unimaginably intricate design includes the smallest bones and muscles in the body, which work to convert the wide range of sound waves they encounter into a manageable set of vibrations passed to the inner ear.
This is where perhaps the most impressive work is done, in converting mechanical vibrations into neural impulses. Such is the work of the cochlea and its manifestly intricate set of membranes, fluids, cilia and receptors that ultimately feed the auditory nerve bundle with the signal that the brain interprets as perceived sound.
Although substantial study has been devoted to this process in recent decades, and a significant body of literature has been produced, there is still much that is not fully understood. What is becoming clear, however, is that the amount of brainpower applied to hearing is likely far greater than that utilized for sight.
This is not to downplay the amazing abilities of human vision. Perhaps most impressive is the adaptive nature of this sense: its ability to adjust over fairly brief periods of time to large variations in light intensity, or to trade off color perception or resolution for peripheral vision or motion sensitivity.
But in terms of sensory complexity, perceptual data processing and overall efficiency in extraction of information from external stimuli, the hearing sense wins the day.
The latest round in this eternal sparring occurs in the field of data compression or “perceptual coding.” Here the numbers show that video signals can often tolerate compression ratios of around 100:1 without significant artifacts, whereas audio can barely manage 10:1 or 20:1, given today’s technology. One argument made to explain this seeming order-of-magnitude difference is that human aural acuity is somehow more advanced than human visual acuity, and that we are therefore less tolerant of audio impairments.
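Applying those compression ratios to the uncompressed rates cited earlier gives a rough sense of where the resulting delivery rates land (a sketch using the same 270 Mbps video and CD-audio figures as before):

```python
cd_bps = 1_411_200      # CD audio: 44.1 kHz x 16 bit x 2 channels
sd_video_bps = 270e6    # uncompressed SD video

# ~100:1 on video lands near typical SD broadcast delivery rates
print(f"Video at 100:1 -> {sd_video_bps / 100 / 1e6:.1f} Mbps")  # 2.7 Mbps

# 10:1 and 20:1 on CD audio land near familiar coded-audio rates
for ratio in (10, 20):
    print(f"Audio at {ratio}:1 -> {cd_bps / ratio / 1e3:.0f} kbps")
```

Even after such aggressive reduction, the video stream still dwarfs the audio one; the real debate is over how much each medium can give up before the audience notices.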
While there are too many variables in the compression argument to draw this conclusion unequivocally, there have been other studies through the years that seem to indicate that test subjects are more easily annoyed by technical problems with audio signals than those in video signals.
As a medium that deals only with the aural mode of communication, radio should consider the relatively high bar that has been set for it. There are also arguments that this bar continues to move higher, as listeners’ tastes “evolve” to expect continually improving fidelity. AM begot FM, and the LP begot the CD. What’s next?
A fork in the road
Here’s where we return to the compression argument.
The audio industry today seems to be of two distinctly different minds. One group is pushing the envelope in its traditional fashion, toward ever-higher fidelity. This is evidenced commercially at present by developments like the Super Audio CD (SACD) and DVD-Audio (DVD-A).
Meanwhile, another contingent is concerned with squeezing the best possible fidelity out of constrained bit rates. This group is the codec development community, which has most recently extended its art with products like MP3pro, CT-AAC, PAC4 and the like, along with improvements in streaming media players’ proprietary codecs, with apparently more to come soon.
Radio is faced with perhaps its most critical technical challenge as it decides which path to take. The higher-fidelity route seems unreachable with current spectrum availability and the technical proposals on the table. Meanwhile, the “more-with-less” route seems possible, but there is substantial concern that the multiple generations of coding it will bring to bear in practice may seriously undermine the goal of continually improving the audio quality delivered to listeners.
Broadcasters would be well-advised to tread carefully here. The ear is a harsh critic.