Simple, Affordable Live Voice Contributions

BERGAMO, Italy — One of the best ways to produce a live event requires a production crew in the field connected to a production hub that gathers field contributions via dedicated radio links, and a satellite connection from the production hub to the main studio.

This method provides production flexibility and audio quality, but it also carries heavy production costs. A less expensive way is for a reporter to connect to the main studio via a mobile telephone and a telephone hybrid. Voice quality may be poor, but connection costs are affordable. A possible compromise is to establish a data connection from the field reporter’s telephone to the main studio, then deliver live voice reporting over IP via suitable codecs.

This is usually a good solution when connecting to the main studio from remote locations, where the reporter is one of the few people using a mobile phone. But when the contribution comes from a crowded location, like a breaking-news event, things can dramatically change. When many people use a data connection from their smartphones to share highlights of the ongoing action, the data connection between the field reporter and the central studio can be affected by bandwidth limitation.

One of the main differences between broadcast and telecommunication is the way each manages the available resources, depending on the number of concurrent users.

Broadcast, in its essence, is an “exclusive” technology. Transmission resources are allocated per service and per broadcaster, and no interference is expected. In addition, neither the network load nor the quality perceived by the end user depends on the number of active users: in any given area, broadcasting content to 10 listeners requires exactly the same effort as broadcasting to 10,000 listeners.

Telecommunication, on the other hand, is a “shared” technology. Transmission resources are shared among the users, meaning that there is a finite upper limit to the number of active users, and that the amount of transmission resources available to each user depends on the overall number of users on the network.
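The effect of sharing can be put into rough numbers. The sketch below assumes, purely for illustration, that a cell divides its capacity equally among active users; real schedulers are more complex, and the capacity and codec figures are hypothetical values, not measurements from any network.

```python
def per_user_rate_kbps(cell_capacity_kbps: float, active_users: int) -> float:
    """Equal-share approximation of the data rate each user gets in one cell."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    return cell_capacity_kbps / active_users


# Hypothetical figures, chosen only to illustrate the trend.
CELL_CAPACITY_KBPS = 20_000  # assumed total capacity of one cell
CODEC_RATE_KBPS = 128        # assumed bit rate of the reporter's IP audio codec

for users in (10, 100, 1_000):
    share = per_user_rate_kbps(CELL_CAPACITY_KBPS, users)
    verdict = "enough" if share >= CODEC_RATE_KBPS else "not enough"
    print(f"{users:>5} users -> {share:8.1f} kbps each ({verdict} for the codec)")
```

Under these assumptions the reporter's share falls below the codec rate somewhere between 100 and 1,000 concurrent users, while a broadcast transmitter would be unaffected by the size of its audience.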

The established and still-rising popularity of social networks means a huge amount of data flows from people’s smartphones through mobile networks, particularly where a breaking-news event is taking place. Smartphone owners use a data connection to share pictures and videos, dynamically reducing the bandwidth that mobile base stations are able to allocate to each connected user. Broadcasters reporting from that specific location through an IP data connection may then experience connection degradation, as well as insufficient bandwidth to ensure appropriate audio quality from the field.

To improve both the performance and the dependability of the data connection that lets the reporter in the field go live with acceptable quality, radio broadcasters can make use of bonded cellular modems. These devices split the data stream and distribute it over several data connections across different networks, minimizing the effect of impairments on any single network.
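As a rough illustration of the bonding idea, the sketch below shares one audio stream across several links in proportion to the capacity each link currently offers. This is an assumed, simplified policy and not the algorithm of any particular product; the operator names and figures are hypothetical.

```python
def split_across_links(
    stream_kbps: float, link_capacity_kbps: dict[str, float]
) -> dict[str, float]:
    """Share a stream across links in proportion to each link's capacity."""
    total_capacity = sum(link_capacity_kbps.values())
    if total_capacity <= 0 or stream_kbps > total_capacity:
        raise ValueError("the links cannot carry the stream together")
    return {
        name: stream_kbps * capacity / total_capacity
        for name, capacity in link_capacity_kbps.items()
    }


# Hypothetical capacities currently available on three mobile networks.
links = {"operator_a": 60.0, "operator_b": 150.0, "operator_c": 90.0}

allocation = split_across_links(128.0, links)
for name, rate in allocation.items():
    print(f"{name}: {rate:.1f} kbps")
```

In this example no single link is asked to carry the full 128 kbps, so congestion on one network affects only part of the stream.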

Mainly designed for video applications, bonded cellular devices can fit radio applications, though they may seem overkill to a budget-minded radio broadcaster.

A possible alternative is HD Voice technology. HD Voice (high-definition voice) is the commercial name for the Adaptive Multi-Rate Wideband (AMR-WB) codec, a wideband speech audio coding standard based on adaptive multirate encoding and using a methodology similar to Algebraic Code Excited Linear Prediction (ACELP).

AMR-WB provides improved speech quality thanks to a wider speech bandwidth of 50–7,000 Hz compared to narrowband speech coders, which in general are optimized for a wireline quality of 300–3,400 Hz. The range of the human voice extends from 80 Hz to 14 kHz. AMR-WB is codified as G.722.2, an ITU-T standard speech codec.

The HD Voice standard is available today on a number of mobile networks worldwide, so broadcasters can simply select an appropriate pair of mobile network devices (send/receive) to benefit from the technology’s extended capabilities, without experiencing the typical negative aspects of a data connection.

The mobile network treats an HD Voice call as a voice call with no data involved. Therefore, once an HD Voice call has been set up, it will not be affected by the usual bandwidth limitations experienced by mobile data connections when the number of concurrent users grows over a certain limit.

The audio processing algorithms working behind the scenes are also redesigned to run at a higher sampling rate than in normal voice calls (the sampling rate for HD Voice is at least 16 kHz, compared to the usual 8 kHz). A direct comparison between a standard call and HD Voice quality can be accessed online at
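The link between sampling rate and audio bandwidth follows from the Nyquist limit: a sampled signal can represent frequencies only up to half its sampling rate. The short sketch below checks the sampling rates and speech bands quoted in this article against that limit; the function and variable names are illustrative only.

```python
def max_audio_bandwidth_hz(sampling_rate_hz: float) -> float:
    """Nyquist limit: highest frequency a given sampling rate can represent."""
    return sampling_rate_hz / 2


# (label, sampling rate in Hz, upper edge of the speech band in Hz)
calls = (
    ("standard call", 8_000, 3_400),
    ("HD Voice", 16_000, 7_000),
)

for label, rate_hz, band_top_hz in calls:
    limit_hz = max_audio_bandwidth_hz(rate_hz)
    fits = band_top_hz <= limit_hz
    print(f"{label}: Nyquist limit {limit_hz:.0f} Hz, "
          f"speech band up to {band_top_hz} Hz, fits: {fits}")
```

An 8 kHz sampling rate cannot carry the 7 kHz upper edge of wideband speech, which is why HD Voice needs at least 16 kHz.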

Measurements consistently show that the intelligibility of speech decreases with decreasing bandwidth. Harvey Fletcher, the first president of the Acoustical Society of America, demonstrated in his 1929 book “Speech and Hearing” that for single syllables, 3.3 kHz bandwidth yields an accuracy of only 75 percent, as opposed to more than 95 percent with a bandwidth of 7 kHz.

Communications technology developer Polycom reported in its 2006 white paper “Effect of Bandwidth on Speech Intelligibility” that “Loss of intelligibility is compounded when sounds are combined in sentences. A sentence composed of 10 words, each with 90 percent reliability, has only a 35 percent probability of being understood clearly. In normal speech, words come at a rate of about 120 words per minute. Consequently, 3.3 kHz speech produces about 40 ambiguities per minute, where 7 kHz speech will produce fewer than four, or close to the accuracy of live open-air speech.”
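The sentence figure in the quote can be reproduced with simple arithmetic if each word is assumed to be understood independently of the others. The sketch below makes that assumption explicit. It also converts the quoted ambiguity counts into the share of words they represent at 120 words per minute, without claiming to reproduce how the white paper derived them.

```python
def sentence_clarity(word_reliability: float, words: int) -> float:
    """Probability that every word in a sentence is understood,
    assuming each word is understood independently."""
    return word_reliability ** words


def ambiguous_share(ambiguities_per_minute: float, words_per_minute: float) -> float:
    """Fraction of spoken words that a given ambiguity count represents."""
    return ambiguities_per_minute / words_per_minute


# Ten words at 90 percent reliability each, as in the quoted example.
print(f"10-word sentence: {sentence_clarity(0.90, 10):.1%}")

# Quoted ambiguity counts, expressed as a share of 120 words per minute.
for label, ambiguities in (("3.3 kHz", 40), ("7 kHz", 4)):
    print(f"{label}: {ambiguous_share(ambiguities, 120):.1%} of words")
```

The first line evaluates 0.9 to the tenth power, about 0.349, which rounds to the 35 percent quoted above.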

We are not fully conscious of the actual confusion because our brain has some ability to compensate. When a sound is not clear, the brain attempts to examine the context of the sound. The first analysis is grammatical, but when multiple possibilities fit grammatically, the listener then tries to decide what would make sense in the present context.

However, when presented with a continual string of such verbal puzzles as the broadcast progresses, the listener is distracted. Pieces of the conversation are lost on these mental detours, trying to deduce what words were used.

As this occurs over and over in a show, fatigue increases, while comprehension and interaction drop. The listener has to divert his or her attention much more often to figure out what words were spoken, instead of staying with the flow of the conversation.

The cost of an HD Voice call depends on the mobile operator, but it is usually the same as a standard voice call. This is because the network bandwidth associated with the call is the same as that of a standard voice call.

Several broadcast-oriented HD Voice devices are available on the market. Some feature XLR mic inputs and a simplified yet effective mixer to manage the various sound contributions at the remote broadcast site before sending the master feed to the studio.

Davide Moro reports on the industry for Radio World from Bergamo, Italy.