Human Perception and Cognitive System
Sound Masking
BERGAMO, Italy — Supporters of any digital radio broadcasting standard commonly claim that digital radio enables listeners to enjoy superior sound quality.
To achieve this, however, broadcasters need to configure the signal chain properly. If the chain is poorly designed, digital broadcasting can sound noticeably worse than the corresponding analog broadcast.
In all analog transmission standards, any signal at the receiver input is treated as a “useful signal,” provided it stays within standard-specific amplitude and frequency limits. In digital transmission standards, the receiver is capable of detecting the presence of “unwanted” signal components. Within standard- and modulation-specific limits, it is also capable of recovering the original (i.e. transmitted) signal, decoding it and presenting it as if the receiver were directly and ideally connected to the transmitter.
Digital radio transmissions allow for the elimination of the typical noises that affect the frequency band concerned, from man-made noise to thermal noise and transmission impairments caused by non-ideal propagation. Clean sound (without noise, scratches or hiss) is a valuable part of “sound quality,” but it is only one part of the listening experience.
A Block Diagram of a Common Perceptual Audio Encoder
Digital radio broadcasting offers much more. Long-distance analog broadcasting standards (short-, long-, and medium-wave transmissions) were mainly designed for speech-shaped signals. Their narrow frequency channels usually range from 5 kHz to 10 kHz, limiting the audio bandwidth and preventing these standards from delivering realistic, “life-like” sound and music.
The FM radio standard features better sound delivery capabilities, with stereo sound and an audio bandwidth of 30 to 15,000 Hz. Whatever the frequency band, digital radio can extend the audio bandwidth to the full 20 to 20,000 Hz range, with a vast dynamic range.
The use of advanced psycho-acoustic encoding enables digital radio standards to broadcast optimal sound, even at low bitrates. Unfortunately, unoptimized configuration of the broadcasting chain can severely limit these sound capabilities, driving digital radio to “sound” poorer than its analog counterpart.
The heart of digital radio sound performance is the psycho-acoustic encoder. How does it work? Suppose you receive a text asking “How do u do?” You will likely answer as if you had been asked “How do you do?”
Your perceptual and cognitive systems are able to rebuild the correct message even if only part of it has been received. This happens because the sound of the vowel u “prevails” over the sound of the “yo.” You can therefore send just the u in your text, cutting out the “yo” and saving two of the word’s three characters, or 2/3 of the original bandwidth.
A similar process happens when a (usually) audible sound is masked by another sound. Conversation at a bus stop can be impossible if a loud truck is driving by. A quieter sound is masked when it is made inaudible in the presence of a louder sound. Psycho-acoustic encoders remove (or aggressively compress) those parts of a given digital audio signal that can safely be neglected — that is, without significant losses in the (consciously) perceived quality of the sound.
To detect these parts safely, the encoder takes human ear sensitivity and perceptual models into account. The compression algorithm can then assign a lower priority to sounds the listener will not perceive, considering both masking and the ear’s sensitivity at each specific frequency. Everything works properly when the encoder is fed a “plain” sound signal, that is to say, a sound the human ear could “normally” hear: the voice of a speaker, the sound of a guitar during a live performance, traffic noise at a bus stop, and so on.
The algorithms are designed around the characteristics of the human ear. For this reason, any audio signal that has been altered or processed can mislead the encoder, causing it to misjudge which parts of the audio signal are high priority and which are low priority.
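The masking decision described above can be illustrated with a toy model. The sketch below is not the method of any particular codec: the offset and slope constants are illustrative assumptions, the function names are hypothetical, and only the Hz-to-Bark conversion follows a published approximation (Zwicker and Terhardt). It marks a spectral component as inaudible when a louder neighbor raises the masked threshold above its level.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

# Illustrative constants only, not taken from any codec specification.
MASK_OFFSET_DB = 10.0   # masked threshold sits this far below the masker level
SLOPE_BELOW_DB = 27.0   # threshold drop per Bark toward lower frequencies
SLOPE_ABOVE_DB = 10.0   # threshold drop per Bark toward higher frequencies

def masked_threshold_db(masker, probe_hz):
    """Level below which a probe tone is hidden by the given masker."""
    masker_hz, masker_db = masker
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz >= 0 else SLOPE_BELOW_DB
    return masker_db - MASK_OFFSET_DB - slope * abs(dz)

def audible_components(components):
    """Keep only the (frequency_hz, level_db) pairs not masked by another one."""
    kept = []
    for i, (freq, level) in enumerate(components):
        masked = any(
            level < masked_threshold_db(other, freq)
            for j, other in enumerate(components)
            if j != i
        )
        if not masked:
            kept.append((freq, level))
    return kept

# A loud 1 kHz tone, a quiet tone right next to it, and a quiet tone far away.
spectrum = [(1000, 80.0), (1100, 40.0), (8000, 45.0)]
print(audible_components(spectrum))
```

In this toy model the quiet 1,100 Hz tone is dropped because it sits close to the loud 1 kHz tone, while the 8 kHz tone survives because it is many critical bands away. A real encoder works on many frequency bands per frame and also accounts for the absolute threshold of hearing.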
Let’s consider a typical sound chain of an FM station. The “master” signal coming from the studio playout enters the main sound processor; this creates the specific “sound” of the station. Dynamic range compression is often used in radio broadcasting, either as a part of this sound or to boost the perceived volume of the station while complying with frequency deviation requirements.
Pre-emphasis is a typical requirement of FM sound broadcasting. In FM transmissions, noise has a triangular spectral distribution, so the noise level rises toward the highest frequencies within the baseband.
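As a rough illustration of what dynamic range compression does to the levels reaching the next stage, here is a minimal sketch of a static compressor curve. The threshold, the ratio and the function names are hypothetical values chosen for the example; real broadcast processors are typically multiband and add attack and release timing and make-up gain.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: the part of the level above the threshold
    is divided by the ratio; levels at or below the threshold pass unchanged."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def dynamic_range_db(levels_db):
    """Difference between the loudest and the quietest level."""
    return max(levels_db) - min(levels_db)

# Hypothetical programme levels in dB relative to full scale.
source_levels = [-40.0, -20.0, -10.0, 0.0]
processed_levels = [compress_db(level) for level in source_levels]

print(processed_levels)
print(dynamic_range_db(source_levels), dynamic_range_db(processed_levels))
```

With these assumed settings the loudest passages are pulled much closer to the quiet ones, so the signal handed to any later stage no longer has the level relationships of the original source.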
Pre-emphasis boosts the high frequencies before transmission; a specific circuit within any receiver reduces the same frequencies by a corresponding amount. Reducing the high frequencies in the receiver also reduces the high-frequency noise.
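To give a sense of how large the pre-emphasis boost is, the sketch below evaluates the gain of a first-order pre-emphasis network and of the matching de-emphasis network. The article does not state a time constant; the values used here, 50 microseconds (common in Europe) and 75 microseconds (common in the Americas), are the usual FM broadcast figures and are an assumption for the example, as are the function names.

```python
import math

def preemphasis_gain_db(freq_hz, tau_s=50e-6):
    """Gain of a first-order pre-emphasis network, in dB.
    Magnitude response: sqrt(1 + (2*pi*f*tau)^2)."""
    w_tau = 2.0 * math.pi * freq_hz * tau_s
    return 10.0 * math.log10(1.0 + w_tau ** 2)

def deemphasis_gain_db(freq_hz, tau_s=50e-6):
    """The receiver applies the inverse response, so the gains cancel."""
    return -preemphasis_gain_db(freq_hz, tau_s)

for freq in (100, 1000, 5000, 10000, 15000):
    boost = preemphasis_gain_db(freq)
    net = boost + deemphasis_gain_db(freq)
    print(f"{freq:>6} Hz: boost {boost:5.2f} dB, net after de-emphasis {net:.2f} dB")
```

Under these assumptions the boost at 15 kHz is roughly 13.7 dB with a 50 microsecond time constant and about 17 dB with 75 microseconds, while low frequencies are left almost untouched. A signal tapped after pre-emphasis and never de-emphasized therefore carries a strong high-frequency tilt.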
The Configuration of a Typical FM Sound Chain
A chart of the signal chain that allows for the best sound performance in analog and digital radio simulcasting.
Here the psychoacoustic encoder receives a signal with compressed dynamic range.
Here the psychoacoustic encoder receives a signal with compressed dynamic range and a pre-emphasis applied to higher frequencies.
When the same content is broadcast on both analog and digital radio (simulcasting), the excellent sound capabilities of digital radio (20 to 20,000 Hz frequency response and CD-like dynamic range) can be fully exploited only when the digital chain is fed a full-quality source signal. The master signal from the studio playout has to be sent to both the FM sound processor and the dedicated digital radio sound processor (some equipment combines the two processors in a single chassis).
A proper setup allows the station manager to preserve both the sound of the station and the crystal-clear integrity of the source sound, allowing the digital chain to deliver a CD-like signal free from scratches and noise.
On the other hand, when the signal for the digital chain is taken after the dynamic range compressor, or even after the pre-emphasis, the sound delivered by the digital transmission will most likely be worse than that of the corresponding FM transmission. Dynamic range compression and pre-emphasis dramatically alter the “master” sound signal in a way that the psycho-acoustic encoder is not prepared to manage.
The psycho-acoustic encoder gets “tricked” by a supposedly plain signal, and applies the human perceptual and cognitive model to a signal that no human ear will ever hear under natural conditions. The result is usually an unnatural, “cold” sound, with unrealistic emphasis on high frequencies and lack of “presence.”
Davide Moro reports on the industry for Radio World from Bergamo, Italy.