Delay makes real-time broadcasting a thing of the past in the not-so-distant future
As the transition to digital has unfolded, we have gradually accepted the fact that “delay happens.” Digital signal processing takes time to do its work, and buffers are required at numerous stages of a digital audio signal path. Perceptual coding is a particular culprit in this respect, with higher data compression ratios generally involving proportionally higher processing latencies.
The radio industry adapted to these delays, just as it did when geostationary satellites were first used for signal backhaul and distribution. Satellite delay was uniform and predictable, at about 1/4 second per hop. This was not altogether different from the confidence-monitoring delay of three-head analog tape decks – a staple of the time, now historical artifacts – so the amount and fixed nature of the delay were somewhat familiar. It required some creative methods in comm linking (i.e., various uses of the “mix-minus” process) so that talent never had to hear their own live voice with this kind of delay, which could turn an otherwise intelligent person into a babbling idiot.
Digital delay was a less predictable thing. Switched 56, ISDN and even POTS lines equipped with perceptual codecs began to replace satellite backhaul; in addition to their convenience, these systems reduced the audio delay encountered on sat links, so this was considered a step forward on multiple counts. Yet because every path length was now different, one never knew just how long the delay would be, and it became a guessing game whether talent could handle a live voice return or mix-minus was still required.
Engineers began to notice that there were two variables, one physical and one human. The actual (as opposed to the billed) routing of the phone line plus the codec latency determined the physical delay, generally on the order of tens of milliseconds. But some talent seemed to tolerate more delay than others, so obviously a perceptual variable was also at work.
The latter issue was also noticed when perceptual codecs began to be used on compressed digital STLs. Here the delay remained fixed throughout, but some talent could handle it better than others. A rule of thumb held that any monitoring path with more than 10 ms of delay (in some cases, up to 20 ms was allowed) should not be used for live talent monitoring, meaning that either a mix-minus feed (for backhaul communication) or a local program feed (in place of off-air monitoring) should be engaged instead.
Another temporary step forward came with uncompressed digital STLs, which made many complex air-monitor switching systems seem unnecessary. But this respite was short-lived.
The past is prologue
Today most broadcast air chains are populated by numerous sources of delay, from compressed backhaul links to digital storage systems to digital mixing consoles and routers to STLs to signal processors, all of which take small amounts of time to fill their buffers and run their DSPs. Even with the uncompressed STLs now in common use, such signal paths often tally up total latency that runs well over the 10 ms perceptual threshold. It has therefore become standard practice in many facilities to use local program signals for all live talent monitor feeds, rather than off-air monitoring. This generally is accomplished via switching added to all mic keys. Whenever a mic comes on, local program audio is fed to the monitoring bus, and when the mic goes off, the air monitor returns.
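The mic-key switching described above amounts to simple control logic: if any mic is open, feed local program to the monitor bus; when all mics are closed, return to the air monitor. A minimal sketch in Python – the class and source names here are illustrative, not any real console’s API:

```python
class MonitorSwitcher:
    """Toy model of mic-keyed monitor switching (names are hypothetical)."""

    def __init__(self):
        self.open_mics = set()  # which mic keys are currently on

    def mic_key(self, mic_id, on):
        """Register a mic key on/off event and return the monitor source."""
        if on:
            self.open_mics.add(mic_id)
        else:
            self.open_mics.discard(mic_id)
        return self.selected_source()

    def selected_source(self):
        # Any open mic forces low-latency local program monitoring;
        # with all mics closed, the delayed off-air signal is fine.
        return "local_program" if self.open_mics else "air_monitor"


switcher = MonitorSwitcher()
print(switcher.mic_key("mic1", True))   # local_program
print(switcher.mic_key("mic2", True))   # local_program
print(switcher.mic_key("mic1", False))  # local_program (mic2 still open)
print(switcher.mic_key("mic2", False))  # air_monitor
```

Note that the switch must track *all* open mics, not just the last key pressed – the air monitor should return only when the last mic in the room goes off.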
This is a good thing, because delays are likely to increase dramatically for many stations soon. Two forms of IBOC delay may be added to those air-chain delays, and in many cases profanity delays as well.
The first type of IBOC delay is generated by the HD Radio format’s audio codec processing and its channel codec interleaving. The former turns the audio program into a high-quality, low-bit-rate signal, while the latter adds robustness for mobile reception. The exact amount of this delay varies, depending on the design and mode settings of the exciter, but it will generally run around three or four seconds. Because the HD Radio exciter outputs both the digital and analog signals, the analog signal is delayed in the exciter by whatever amount of processing delay is applied to the digital signal, so the two signals are time-aligned after the digital signal is generated.
Then the second form of IBOC delay is applied. This is the so-called diversity delay, which is fixed at three HD Radio frames (1.486 seconds each), or about 4.5 seconds total. This is an additional delay added to the analog signal by the exciter, which is introduced so that the analog and digital signals are purposely out-of-sync in the broadcast signal. IBOC receivers are programmed to delay the decoded digital signal by three frames to resync the two signals after reception.
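The arithmetic can be sketched as follows – a toy Python model (not iBiquity code) that tracks which instant of program audio each signal carries, using the frame length given above:

```python
# Diversity-delay timing sketch. Each value represents the program
# timestamp (in seconds) carried by a signal at a given moment.

FRAME_SECONDS = 1.486       # one HD Radio frame, per the text
DIVERSITY_FRAMES = 3
DIVERSITY_DELAY = FRAME_SECONDS * DIVERSITY_FRAMES  # ~4.458 s, i.e. "about 4.5"


def transmit(program_time):
    """At the exciter: analog is held back three frames; digital is not."""
    analog = program_time - DIVERSITY_DELAY   # carries older program audio
    digital = program_time
    return analog, digital


def receive(analog, digital):
    """At the receiver: the decoded digital audio is buffered by the
    same three frames, so the two outputs line back up."""
    return analog, digital - DIVERSITY_DELAY


a, d = transmit(100.0)
print(receive(a, d))  # both ~95.542 – the signals are resynced at the speaker
```

The two signals are deliberately misaligned only while in the air; after the receiver’s three-frame buffer, they match again.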
The asynchronous relationship of the signals while they are in the air means that any disturbance to the RF signal affects the two audio programs at different times. In other words, if an obstruction causes a momentary fade at the receiver at time T, the analog audio is impaired immediately, but because the receiver buffers the decoded digital audio, the corresponding digital impairment is not heard until about time T + 4.5s. By then the analog signal likely has returned to an unimpaired condition, and a “blend” to analog can occur without difficulty.
The 4.5-second gap was chosen after statistical analysis revealed it to be an optimal window for riding out most mobile FM reception impairments, thereby providing the highest likelihood that the analog and digital signals would not fail simultaneously. The three-frame buffer is relatively easy and cheap to implement in receivers, and because the signal would necessarily be delayed by codec processing and transmission interleaving anyway, a little more delay might as well buy some added system robustness. (This is based on the rationale that says, “Once your system has four seconds of delay, is another four that big a deal?”)
Note that HD Radio’s codec/interleaver delay is intrinsic to the system, but the diversity delay is optional. Ibiquity strongly recommends it be used, but it can be switched off by the broadcaster. There is a flag in the bit stream to indicate such status, but it is up to the receiver manufacturer to decide whether or how its devices will react if diversity delay is not enabled. Some designs may disable analog blend functions in this case, while in others digital and analog audio signals will be out of sync (and the advantage of time diversity to combat signal fading will be lost). Of course, switching it off also brings the analog signal ~4.5 seconds closer to real time, which is why a number of early-adopting IBOC stations have elected to run with diversity delay switched off for the time being, until a reasonable number of HD Radio receivers are in use.
Finally, consider that this sync issue is not only encountered in the event that a digital signal loss causes the receiver to blend to analog (and back), but also occurs every time a station is tuned in (Ibiquity calls the latter “acquisition blend”). Recall that HD Radio tuners acquire signals using the analog audio, switching to the digital signal only after the receiver’s delay buffer is filled. If diversity delay is not enabled by a broadcaster, upon signal acquisition some receivers may not automatically switch to digital (they may require the user to manually switch over), while in others the listener will hear a few seconds of analog audio followed by an instant replay of approximately the same few seconds in digital audio. (The opposite effect would occur during loss-of-signal blend, in which case the listener would miss a few seconds of audio when switching from digital to analog, or the receiver might simply mute.)
In any case, listeners could assume that this station was having audio problems, and would likely tune away after a few such occurrences, choosing to stay with stations where this did not occur (i.e., those with diversity delay enabled).
The biggest problem for listeners with such delays will likely occur when listening to the radio while watching live sports events at their originating venues or on TV, or when watching live news events covered by multiple broadcasters (such as presidential press conferences). In these cases, a seven- or eight-second delay between the event’s live occurrence and its representation on radio could be quite disturbing. While TV programs are also likely to be delayed from real time or from one another by a satellite hop or so, radio will be shifted significantly further. Rather than being off by a word or two, radio will be a sentence or more behind the news event, and well in the wake of the action for sports. (Remember that this will affect both analog and digital listeners.)
Of course, these previously unencountered latencies may seem insignificant in the face of the current indecency binge, which is causing many stations to consider – or even install – multiple minutes of delay in their signal paths for protective purposes. While this, too, is not new, earlier profanity delays were limited to live call-in shows, and typically added only a few seconds of delay – just enough time for an operator to react to a profane utterance in the studio and dump the caller and the buffered audio before it hit the air.
Today’s approach is routinely to add much longer delays to all live-assembled programs, whether call-ins are involved or not, such that a station’s “appropriateness authority” can listen with, say, five minutes of lead time to decide whether the content should be broadcast. It’s amazing what a little wardrobe malfunction can do to an (audio-only) industry.
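The underlying mechanism is a long first-in, first-out buffer with a dump control. A minimal Python sketch, assuming block-based audio and ignoring the gradual delay rebuild that real units perform:

```python
from collections import deque


class ProfanityDelay:
    """Toy FIFO profanity delay: hold N blocks, with a dump button."""

    def __init__(self, delay_blocks):
        self.buffer = deque()
        self.delay_blocks = delay_blocks  # e.g. five minutes' worth of blocks

    def process(self, block):
        """Push one incoming audio block; return the block going to air.

        Output is silence until the delay buffer has filled.
        """
        self.buffer.append(block)
        if len(self.buffer) > self.delay_blocks:
            return self.buffer.popleft()
        return "silence"

    def dump(self):
        """Operator hits the dump button: all buffered (not-yet-aired)
        audio is discarded, so the offending material never airs."""
        dropped = len(self.buffer)
        self.buffer.clear()
        return dropped


delay = ProfanityDelay(3)          # three-block delay, for illustration
for b in ["a", "b", "c", "d"]:
    print(delay.process(b))        # silence, silence, silence, then "a"
print(delay.dump())                # 3 blocks ("b", "c", "d") discarded
```

One caveat on the sketch: after a dump it simply mutes until the buffer refills, whereas commercial delay units typically jump straight to live audio and rebuild the delay gradually by time-stretching the program.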
Unlike the IBOC delays noted earlier, which station monitoring can deal with in much the same way earlier latencies were handled, a multi-minute program delay with dump option standing by requires more complex operational rethinking.
As if following its own Moore’s law, delay seems to have increased exponentially over time. In just a few short years, broadcast audio delays have stretched from milliseconds to minutes, due to an odd confluence of technical and programmatic reasons. It seems hard to imagine, but the immediacy once so prized by radio broadcasters has itself been devalued by the passage of time.