PRSS Tackles Audio Program Consistency

The author is senior technologist at NPR Labs.

The Public Radio Satellite System is a highly diverse web of satellite and IP links serving more than 400 public radio stations.

What distinguishes it from other radio distribution networks, which generally feed the content from a handful of sources, is that the PRSS network delivers audio content from close to 100 producers and distributors each month, with a wide range of program types, from magazine news shows to comedy/music/variety programs to opera broadcasts. This immense diversity of content producers has resulted in an equally diverse range of audio program levels that share the delivery “pipe.”

Prompted by public station concerns about inconsistent levels, PRSS led a recent overhaul of audio standards for all submitted program content so that stations — and ultimately, listeners — get audio content that is consistent in level from producer to producer, as well as of the highest quality. This overhaul resulted in the selection of Program Loudness as the metric — a metering technique that has been dubbed a “true audio leveling revolution” but is still relatively new to American radio broadcasters.

THE BEGINNING: ANALYZE THE SOURCE MATERIAL

The PRSS began with an analytical review of recent public radio program material, which confirmed the need for change. In Fig. 1, the average program level of 4,650 individual programs carried by PRSS is plotted. These full-program averages were measured using the International Telecommunication Union standard loudness meter, which measures audio program loudness the way human ears perceive it. This standard is based on years of psychoacoustic research and testing by the ITU to develop the best measure for program audio levels. Loudness is a major departure from ordinary peak meters, which only indicate the maximum instantaneous signal; loudness considers the amplitude, frequency content and duration of the sound signals.

In Fig. 1, the vertical scale is in Loudness Units referred to Digital Full Scale (LUFS), where one LU is also a change of 1 dB. Ideally, we would like the average loudness of all programs to agree within a small range, but it is apparent that these averages are not close to a common level, and many vary more than 10 LU (dB) from the center. This variance in audio level between programs requires stations to manually adjust levels during broadcast or use more automatic gain control to compensate for the level shifts — else the listener must reach for the volume control.

HOW THE LEVELS BECAME UNLEVEL

PRSS has had submission standards for audio level in place for decades, so one would understandably wonder how audio levels could become so uneven. As I discussed in my article in the April 16, 2014 issue of RW Engineering Extra, “A New Way to Monitor Signal Level” [see Reference 1], dependence on peak-reading meters to set audio levels has led to poor loudness management.

Readers are encouraged to review that article, but to summarize: Peak-reading signal meters are designed to indicate signal peak overload, but signal peaks are not a good indicator of optimal audio level. Perhaps ironically, the transition from VU meters to signal peak meters took us a step away from consistent program levels: The VU meter has a rise time of 300 ms, too slow to indicate peaks, but closer to the ear’s perception of loudness than a peak meter.

Knowing the shortcomings of peak and VU meters, working groups at the Radiocommunication Sector of ITU and the European Broadcasting Union developed an algorithm for a better meter, one that measures program loudness in a way similar to human hearing. This algorithm is currently defined by Recommendation ITU-R BS.1770-3 [see Ref. 2]. ITU standard loudness meters are in widespread use in broadcast organizations across Europe and in video production centers in the U.S., a result of concern (and regulations) for consistent loudness in television programs and commercials.

The ITU loudness meter, shown in simplified form in Fig. 2, first performs frequency weighting for each channel, rolling off below 100 Hz and providing a uniform boost to frequencies above 2 kHz of about 3.5 dB, as indicated by the blue curve. This is performed by the “K weighting filter” shown in the diagram. From this filter, the total mean-square amplitudes are calculated, channel-summed and logarithmically converted to a decibel scale.

The output of the logarithm (10 log) converter provides a real-time indication of program loudness in LU, where a change of 1 LU = 1 dB. The numerical measurement is referenced to nominal full scale with the designation LUFS.
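For readers who want to experiment, the signal chain described above can be sketched in a few lines of Python. This is a minimal illustration rather than a compliant meter: it assumes 48 kHz audio, uses the two-stage K-weighting filter coefficients published in BS.1770 for that rate, measures the whole signal with no gating, and treats every channel as a front channel with unity weight. The function names are hypothetical.

```python
import math

# Biquad coefficients (b0, b1, b2, a1, a2) published in ITU-R BS.1770
# for a 48 kHz sampling rate. Other rates need different coefficients.
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)   # high-frequency shelf boost
HIGHPASS = (1.0, -2.0, 1.0,
            -1.99004745483398, 0.99007225036621)  # low-frequency roll-off


def biquad(samples, coeffs):
    """Run one second-order IIR section over a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def loudness_lufs(channels):
    """Ungated loudness: K-weight, mean-square, channel-sum, 10 log."""
    total = 0.0
    for ch in channels:
        weighted = biquad(biquad(ch, SHELF), HIGHPASS)
        total += sum(v * v for v in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(total)


def tone(freq_hz, peak_dbfs, seconds=2.0, rate=48000):
    """Sine test tone with the given peak level in dBFS."""
    amp = 10.0 ** (peak_dbfs / 20.0)
    return [amp * math.sin(2.0 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]


# A stereo 1 kHz tone peaking at -20 dBFS should read close to -20 LUFS.
lineup = tone(1000.0, -20.0)
print(round(loudness_lufs([lineup, lineup]), 1))
```

The constant –0.691 offsets the small gain of the K-weighting filter near 1 kHz, which is why a stereo 1 kHz tone reads nearly the same number on the loudness scale as on a peak scale.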

For long-term measurement of loudness, such as the program averages shown earlier, a relative-threshold gate is added to hold readings when the signal drops below a certain threshold. This prevents silence or background sounds from biasing a long-term integrated loudness value. While this is not necessary for real-time displays, gating is applied to all average loudness measurements under the current standard.
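The effect of gating is easy to see with a small sketch. The version below assumes the two-stage scheme in recent revisions of BS.1770 as I understand it: an absolute gate at –70 LUFS, then a relative gate 10 LU below the average of the blocks that survive the first gate. It takes a list of already-measured block loudness values as input, and the function names are hypothetical.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0  # blocks at or below this are always ignored
RELATIVE_GATE_LU = -10.0    # offset from the absolute-gated average


def energy_mean_lufs(block_loudness):
    """Average loudness values in the power domain, not in dB."""
    mean_energy = sum(10.0 ** (l / 10.0) for l in block_loudness) / len(block_loudness)
    return 10.0 * math.log10(mean_energy)


def integrated_loudness(block_loudness):
    """Gated long-term loudness from a list of block loudness values (LUFS)."""
    audible = [l for l in block_loudness if l > ABSOLUTE_GATE_LUFS]
    if not audible:
        return float("-inf")
    threshold = energy_mean_lufs(audible) + RELATIVE_GATE_LU
    kept = [l for l in audible if l > threshold]
    return energy_mean_lufs(kept)


# Half speech at -20 LUFS, half quiet room tone at -60 LUFS.
blocks = [-20.0] * 10 + [-60.0] * 10
print(round(energy_mean_lufs(blocks), 2))     # ungated average is pulled down
print(round(integrated_loudness(blocks), 2))  # gated average tracks the speech
```

In this made-up example the ungated average falls about 3 LU below the speech level simply because half the blocks are near-silent, while the gated figure stays at the level of the speech itself.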

The ITU loudness meter includes an advanced peak metering feature intended to measure the “True Peak level” of the audio signal. This was included because digital processing can cause inter-sample peaks that exceed the indicated sample level. Where it is important to have a reliable indication of level, this meter can indicate clipping, even when the peak lies between samples, so overshoots created by subsequent digital-to-analog converters, sample rate converters or commonly used codecs can be predicted and avoided. These measurements have the designation dBTP.
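A short sketch shows why sample peaks can understate the real peak. The code below interpolates between samples with a generic Hann-windowed sinc at four times the original rate; this is an assumption for illustration and not the specific filter given in BS.1770. The function names and the filter length are hypothetical.

```python
import math


def sample_peak_db(samples):
    """Largest sample magnitude, in dB relative to full scale."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


def true_peak_db(samples, oversample=4, half_width=16):
    """Estimate the inter-sample peak by windowed-sinc interpolation.

    Only the interior of the signal is evaluated so that every
    interpolated point has a full set of neighboring samples.
    """
    peak = 0.0
    for i in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = i + k / oversample  # fractional sample position
            acc = 0.0
            for m in range(i - half_width + 1, i + half_width + 1):
                d = t - m
                if d == 0.0:
                    w = 1.0
                else:
                    sinc = math.sin(math.pi * d) / (math.pi * d)
                    hann = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                    w = sinc * hann
                acc += samples[m] * w
            peak = max(peak, abs(acc))
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


# A full-scale sine at one quarter of the sampling rate, phased so that
# every sample lands 45 degrees away from the waveform crest.
quarter_rate = [math.sin(2.0 * math.pi * n / 4.0 + math.pi / 4.0) for n in range(64)]
print(round(sample_peak_db(quarter_rate), 2))  # sample peak near -3 dBFS
print(round(true_peak_db(quarter_rate), 2))    # interpolated peak near 0 dBTP
```

The samples never exceed about –3 dBFS, yet the waveform they describe reaches full scale between samples, which is the overshoot a downstream converter or codec would reproduce.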

In the article mentioned above, 49 sample audio streams were collected from public radio stations and analyzed for loudness. Fig. 3 shows the result after matching, or normalizing, those streams to the same loudness target (the height of the blue bar is truncated to show detail). This chart shows that the peaks are distributed over a range. That is, when program loudness is held to a target average, signal peaks should be expected to vary. For this reason, the target loudness is chosen sufficiently below digital full scale to provide ample headroom for signal peaks and changes in dynamics. The European Broadcasting Union formed the PLOUD Committee to determine the optimum parameters to use with the ITU loudness meter, which resulted in a series of documents under the EBU R 128 standard [3].

COLLABORATION AND AGREEMENT BETWEEN THE PARTIES

After consultation with American Public Media, Public Radio International and NPR, as well as member station engineers and producers, PRSS developed the following parameters for network distribution of high-quality audio:

The average loudness value supports program content of wide dynamic range, such as concerts, dramas, etc., with adequate headroom to full scale. A range of ±1 LU is recommended by EBU so the PRSS specification should be achievable with all programming. (When recorded programs are normalized, the target is achievable with no error.) The maximum for program peaks allows for overshoots that may accumulate from subsequent encode-decode cycles, digital filtering, etc. The dBTP figure provides for oversampled (True Peak) metering, whereas the dBFS figure applies to standard sampling rates.

Fig. 4 shows the PRSS program loudness standard on the ITU meter scale and how it relates to other meters. Because the ITU meter scale is referenced to digital full scale, it can serve a dual purpose, showing both loudness in LUFS and signal peaks in dBFS. (The colors are merely for illustration and are not official.) Some meter displays, such as Martin Zuther’s K-Meter [4], show both indications on a common bar graph, with loudness designated by a solid bar and signal peaks indicated by a single floating segment. A lineup tone is illustrated at –20 dBFS. Although there is no standard for this level, 1 kHz is important: The ITU loudness meter is designed so that the frequency-weighted LUFS scale aligns in level with an unweighted peak meter scale at 1 kHz. Only a 1 kHz tone serves this common indication.

With the help of loudness meters, especially ones that can determine the program average, consistency in loudness can be easily achieved. Fig. 5 illustrates the process, called “loudness normalization.” In this chart, the audio stream of an NPR news magazine program is logged for approximately 6 minutes, producing the solid blue line for short-term loudness (3-second integration time) and the solid red line for signal peaks. It has a long-term (average) loudness, indicated by the dashed green line, which reaches approximately –20 LUFS at the end of the sample period. A second audio stream, the opening soundtrack music of the movie “Harry Potter and the Half-Blood Prince,” is shown in Fig. 6. At the end of this three-minute segment, the average loudness reaches about –32 LUFS. Without normalization, a listener switching from the first to the second stream would hear a drop in loudness of approximately 12 dB.

Normalization of these two audio streams to a level of –24 LUFS, then, simply lowers the encoding gain of the stream in Fig. 5 by 4 LU (from –20 LUFS to –24 LUFS), and raises the gain of the stream in Fig. 6 by 8 LU (from –32 LUFS to –24 LUFS). The two streams now have compatible loudness for transmission to listeners.
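The arithmetic is simple enough to express as a short sketch. The loudness figures are the ones quoted above for Figs. 5 and 6; the function names are hypothetical, and a real workflow would also confirm that the signal peaks stay under the permitted maximum after gain is raised.

```python
TARGET_LUFS = -24.0  # normalization target used in the example above


def normalization_gain_db(measured_lufs, target_lufs=TARGET_LUFS):
    """Gain in dB that moves a measured program loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to the factor applied to each audio sample."""
    return 10.0 ** (gain_db / 20.0)


for name, measured in (("news magazine (Fig. 5)", -20.0),
                       ("film soundtrack (Fig. 6)", -32.0)):
    gain = normalization_gain_db(measured)
    print(f"{name}: {gain:+.1f} dB, sample multiplier {db_to_linear(gain):.3f}")
```

Because 1 LU equals 1 dB, the correction is a plain subtraction, and the same static gain is applied to the whole program so its internal dynamics are unchanged.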

ITU loudness meters indicate the average loudness of a program, but this does not tell us how well the loudness variations within the program adhere to a given average. For this, the meter offers another feature: the Loudness Range (LRA) of a program, which provides an objective measure based on the statistical distribution of loudness over the program time-scale. The details of this algorithm are beyond the scope of this article; interested readers should see EBU – Tech 3342 [5].
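For a rough feel of what the LRA number represents, here is a simplified sketch based on my reading of Tech 3342: short-term loudness values are gated (absolute gate at –70 LUFS, relative gate 20 LU below the gated average), and the range is the spread between the 10th and 95th percentiles of what remains. The nearest-rank percentile lookup and the function name are assumptions made for brevity; consult the EBU document for the normative procedure.

```python
import math


def loudness_range(short_term_lufs):
    """Approximate Loudness Range (LU) from short-term loudness values."""
    audible = [l for l in short_term_lufs if l > -70.0]  # absolute gate
    if not audible:
        return 0.0
    mean_energy = sum(10.0 ** (l / 10.0) for l in audible) / len(audible)
    threshold = 10.0 * math.log10(mean_energy) - 20.0    # relative gate
    kept = sorted(l for l in audible if l > threshold)
    # Nearest-rank percentiles (a simplification of the published method).
    low = kept[int(round((len(kept) - 1) * 0.10))]
    high = kept[int(round((len(kept) - 1) * 0.95))]
    return high - low


# Hypothetical program: loudness ramps from -30 to -10 LUFS, plus pauses.
ramp = [-30.0 + 0.2 * i for i in range(101)]
pauses = [-90.0] * 20
print(round(loudness_range(ramp + pauses), 1))
```

Discarding the lowest 10 percent and the top 5 percent keeps brief pauses and isolated loud events from dominating the result, so the figure reflects how far the bulk of the program swings.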

Using the same 4,650 public radio shows measured earlier, Fig. 7 shows their LRA distribution. The vertical scale covers 25 LU, with 0 at the bottom. It is apparent that the programming spans a large range of values, with a mean LRA of about 7.5 LU.

Loudness Range can guide broadcasters in determining how much the loudness of a program varies, and whether the program is suitable for broadcast as-is or whether dynamic compression is desirable before broadcast or streaming. Fig. 7 includes a scale adapted from EBU data, which suggests the optimal loudness range for some common listening environments: in-car, late-night (not to disturb others), kitchen, living room (presumably without noisy kids) and home theater.

Comparing this to the measured values, most programs are at the Loudness Range suited to “car” or “late-night” listening, and this is before they are broadcast or streamed, during which more audio processing is usually added. This is likely to result in dynamically flat, dense sound that may fatigue listeners.

There are also a significant number of programs that fit the “living room” or “home theater” environment. Unlike the first group, these programs may need some dynamic range compression for noisier environments, unless the majority of listeners are in the living room. In any case, the LRA provides an additional measure of programs that can guide program directors on the need — or not — to process programming for broadcast or streaming.

IMMEDIATE STEPS AND FUTURE PLANS

PRSS has adopted the following multi-step program to roll out system-wide audio loudness standards for programming:

• Research and adopt new loudness-based audio standards for PRSS content.
• Gather technical material as available for the public radio system to help production centers understand and implement loudness standards.
• Conduct webinars for producers to promote and teach loudness measurement concepts.
• Help engineers and producers with solutions, by providing lists of available loudness software for digital audio workstations and standalone loudness meters.
• Develop automatic tools to measure the audio parameters of programs full-time, with Content Depot reports accessible to the respective producers.

More pieces remain to be put in place and milestones to be achieved, as the PRSS loudness standard was just announced in late December 2014; there may be changes and refinements in the program as it proceeds. But the goal of improving the public radio experience for the listener remains firm, and the PRSS is hoping 2015 will be the year to achieve consistency in program levels, which will be a welcome improvement for stations and listeners alike.

The author thanks the staffs of PRSS, NPR Labs, NPR Audio Engineering and American Public Media for their contributions and support for this article.

REFERENCES

[1] www.radioworld.com/article/a-new-way-to-monitor-signal-level/269991
[2] www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-3-201208-I!!PDF-E.pdf
[3] https://tech.ebu.ch/loudness
[4] www.mzuther.de/en/software/kmeter
[5] “Loudness Range: A measure to supplement loudness normalisation in accordance with EBU R 128,” https://tech.ebu.ch/docs/tech/tech3342.pdf