The author is senior technologist at NPR Labs.
A new form of audio measurement is poised to invade North American control rooms and studios: loudness-based metering.
It is one of the most fundamental changes to occur in audio metering since the introduction of peak metering nearly two decades ago. It is also a major departure from signal peak metering, which is so common today, as well as the historic VU meter — a change that some believe will resolve long-standing issues with irregular program levels and listener annoyance.
Peak-reading meters are designed to indicate the potential for signal peak overload, but are not such good indicators of optimal audio level. For that, engineers use their ears and gain controls to mix content at the appropriate levels. As discussed below, loudness metering is a more sensible alternative for characterizing the dynamically-compressed and often mismatched program levels in today’s audio.
Fortunately, a great deal of work has been done on loudness measurement by dedicated engineers on working groups at the Radiocommunication Sector of the International Telecommunication Union (ITU) and the European Broadcasting Union. Their research over many years led to the development of an algorithm for a better meter, one that measures program loudness in a way that approximates human hearing. This algorithm is defined by ITU Broadcast Systems Recommendation BS.1770-3. (A PDF can be downloaded from the ITU.)
THE LOUDNESS METER
The design of the loudness meter is illustrated in Fig. 1, showing a simplified diagram of the ITU device. The left and right channel audio are passed through separate “K-weighting filters” having a frequency response indicated by the blue curve. The content below 100 Hz is rolled off by a high-pass filter, while content between 100 Hz and nearly 1 kHz is passed normally. From 1 kHz to 2 kHz, the filter increases gain, and then shelves the gain for frequencies above approximately 2 kHz. The signal from each filter is converted to mean-square amplitude before being summed. Multichannel meters apply the same steps to the center channel and surround channels, with a slight difference in gain compared to the left and right channels. The low-frequency effects channel is not included, as it contributes little to the sense of program loudness.
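The signal path described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are invented for this example, the biquad coefficients are the 48 kHz values tabulated in BS.1770 (other sample rates need recalculated coefficients), and no gating is applied.

```python
import math

# Biquad coefficients for 48 kHz sampling, as tabulated in ITU-R BS.1770.
# Each entry is ((b0, b1, b2), (a0, a1, a2)).
SHELF = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
         (1.0, -1.69065929318241, 0.73248077421585))
HIGHPASS = ((1.0, -2.0, 1.0),
            (1.0, -1.99004745483398, 0.99007225036621))


def biquad(samples, coeffs):
    """Run samples through one second-order IIR section (direct form I)."""
    (b0, b1, b2), (_, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def k_weighted_mean_square(samples):
    """K-weight one channel (shelf, then high-pass) and return its mean square."""
    filtered = biquad(biquad(samples, SHELF), HIGHPASS)
    return sum(v * v for v in filtered) / len(filtered)


def loudness_lkfs(channels, gains=None):
    """Ungated loudness of a list of channels (each a list of samples).

    gains holds the per-channel weights: 1.0 for left, right and center,
    about 1.41 for the surrounds. The LFE channel is simply left out.
    """
    gains = gains or [1.0] * len(channels)
    total = sum(g * k_weighted_mean_square(ch) for g, ch in zip(gains, channels))
    return -0.691 + 10.0 * math.log10(total)


def sine(freq_hz, seconds, fs=48000):
    """Full-scale test tone (helper for this example only)."""
    return [math.sin(2.0 * math.pi * freq_hz * n / fs) for n in range(int(seconds * fs))]


tone_loudness = loudness_lkfs([sine(997.0, 2.0)])
```

A full-scale 997 Hz sine in a single channel should read close to –3.01 LKFS, which is the calibration point the recommendation gives for this test tone.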
The ITU standard design provides an indicator of real-time program loudness in Loudness Units (“LU”), where a change of 1 LU is 1 dB. The numerical measurement is referenced to nominal full scale with the designation LUFS.
For long-term measurements a gate is added to pause the measurement when the signal drops below a level-determined threshold. This prevents silence or background sounds from biasing a long-term integrated loudness value. The ITU algorithm also defines the method of measuring the “reconstructed” signal peaks that accompany the loudness graphs. This algorithm can estimate the height of signal peaks that exceed full scale, in effect displaying a positive dBFS value!
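The gating step can be illustrated with a short sketch. It assumes the loudness of each measurement block has already been computed (BS.1770-3 uses 400 ms blocks that overlap by 75 percent) and applies the two thresholds from the recommendation: an absolute gate at –70 LKFS and a relative gate 10 LU below the loudness of the blocks that passed the absolute gate. The function names are made up for this example.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0   # blocks at or below this are treated as silence
RELATIVE_GATE_LU = -10.0     # offset from the absolute-gated mean loudness


def _mean_loudness(blocks):
    """Average block loudness in the energy domain, returned in LUFS."""
    energy = sum(10.0 ** (b / 10.0) for b in blocks) / len(blocks)
    return 10.0 * math.log10(energy)


def integrated_loudness(block_loudness):
    """Gated long-term loudness from a list of per-block loudness values (LUFS)."""
    audible = [b for b in block_loudness if b > ABSOLUTE_GATE_LUFS]
    if not audible:
        return float("-inf")  # nothing but silence was measured
    relative_gate = _mean_loudness(audible) + RELATIVE_GATE_LU
    kept = [b for b in audible if b > relative_gate]
    return _mean_loudness(kept)


# Speech at -20 LUFS for half the time, quiet room tone at -40 LUFS for the rest.
program = [-20.0] * 10 + [-40.0] * 10
gated = integrated_loudness(program)
ungated = _mean_loudness(program)
```

In this made-up example the ungated average is pulled down by the room tone to roughly –23 LUFS, while the gated value stays at –20 LUFS, the loudness of the speech itself.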
THE PROBLEM WITH PEAKS
Assuming for a moment that the ITU loudness meter represents our sensation of program loudness, the problem with peak-reading meters is illustrated in Fig. 2. In this chart, five minutes of NPR’s “All Things Considered,” a nightly news program, is measured for both signal peak (in red) and loudness (in blue). The data was collected with the Orban Loudness Meter, which permits logging the audio data to a CSV file for analysis. In the early part of the sample, at the left, a vertical arrow indicates one of the highest local moments of loudness, where the difference between the two meters is around 12 dB (the difference between the peaks, in dBFS, and the loudness in LUFS). Toward the center, a high loudness point coincides with one of the highest signal peaks, with a difference of 18 dB. Then, a little more to the right, less than a minute later, the highest moment of loudness occurs, but the signal peak is only 10 dB above the loudness.
The great majority of loudness peaks in Fig. 2 range between –25 and –20 LUFS, suggesting that the loudness of this unprocessed audio is relatively even. However, the signal peaks have ranged from about –12 dBFS to –2 dBFS, a range of at least 10 dB. Had we adjusted parts of this program to maintain a similar peak level, we could expect that program loudness would have varied substantially.
Readers who are interested in the accuracy of the ITU loudness meter can review Appendix 1 of the standard, which describes the years of psychoacoustic testing that produced the loudness algorithm. The test data shows good correlation with listener assessment of loudness.
CONSISTENCY AND NORMALIZATION
The following illustrates how loudness metering can be applied to transmission, such as Internet streaming, to provide more consistent loudness from stream to stream, and reduce the need for dynamic compression. The histogram in Fig. 3 shows measurements of 49 public radio streams carrying NPR’s “Weekend Edition with Scott Simon.” The blue bars indicate the number of stations at various levels of loudness for a one-minute portion of the program. The largest grouping is at –18 LUFS. The gold-colored bars indicate the distribution of maximum peak levels for the same part of the program, ranging from –20 dBFS to 0 dBFS.
A common approach in audio editing is to “peak normalize” a section of audio by adjusting the overall gain of the section so that the highest peak reaches a target value. This was done for each of the 49 measurements in Fig. 4, where all share the same peak level of –2 dBFS. It is apparent that the streams do not share the same loudness, partly because of small differences from minute to minute in the program, but mostly because of differences in the audio processing for each stream. While it is understandable that audio processing may vary from stream to stream, it is apparent that setting audio transmission according to a peak target does not make the loudness compatible with other streams; aligning to a loudness target does make the streams more compatible.
The chart in Fig. 5 shows the alternate condition, in which the loudness of the streams has been aligned, or normalized, to a common target of –23 LUFS. Notice that the stream samples share a common loudness, shown by the blue bar, but the peak levels are distributed from –16 dBFS to 0 dBFS. While this seems contrary to belief, based on years of looking at peak meters, this effect is quite normal: As long as peak levels do not exceed full scale, it’s the loudness that we wish to target, not the signal peaks.
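The contrast between the two approaches can be reproduced with simple arithmetic. The sketch below uses three invented streams (they are not the measurements behind Figs. 3 to 5) and applies a single static gain to each, first to hit a peak target and then to hit a loudness target.

```python
def peak_normalize(peak_dbfs, loudness_lufs, target_peak_dbfs=-2.0):
    """Apply one gain so the highest peak lands on the peak target."""
    gain_db = target_peak_dbfs - peak_dbfs
    return peak_dbfs + gain_db, loudness_lufs + gain_db


def loudness_normalize(peak_dbfs, loudness_lufs, target_lufs=-23.0):
    """Apply one gain so the measured loudness lands on the loudness target."""
    gain_db = target_lufs - loudness_lufs
    return peak_dbfs + gain_db, loudness_lufs + gain_db


# Hypothetical streams as (maximum peak in dBFS, loudness in LUFS):
# heavily processed, moderately processed, and lightly processed.
streams = [(-1.0, -12.0), (-3.0, -18.0), (-6.0, -27.0)]

by_peak = [peak_normalize(p, l) for p, l in streams]
by_loudness = [loudness_normalize(p, l) for p, l in streams]

# Loudness normalization is only safe if no peak is pushed past full scale.
headroom_ok = all(peak <= 0.0 for peak, _ in by_loudness)
```

With these example values, peak normalization leaves the streams 10 LU apart in loudness (–13, –17 and –23 LUFS), while loudness normalization puts all three at –23 LUFS with peaks at –12, –8 and –2 dBFS, all still under full scale.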
Because audio signals that are adjusted to a loudness target are freer to peak as dictated by their content, it’s natural to ask, “What is a reasonable loudness target?” The European Broadcasting Union had the same question, and established a “PLOUD” committee to determine these parameters and develop procedures for use. Based on extensive study of programs from a range of broadcast material, PLOUD adopted a target loudness of –23 LUFS for production and transmission. (The EBU R128 standard and the ATSC A/85 standard for U.S. digital television share similar values and techniques for loudness normalization.) This loudness value permits most programs with greater dynamic range and signal peaks to fit safely under the digital full-scale limit.
Loudness meters look much like peak-reading meters in use today. An example is the K-Meter, a program for Windows and Linux computers, as shown in Fig. 6: ITU loudness is indicated by the solid green bar while the momentary signal peak is shown by a single red segment. Loudness meters often color-code the segments to indicate when the –23 LUFS target is reached or exceeded. A long-term indication of loudness may help a mixing engineer or producer to align a program to the target loudness. The standalone metering system from TC Electronic, shown in Fig. 7, includes a unique “radar” display of loudness history that uses less screen space. This look back at earlier program loudness can help a mixing engineer decide if the current program is “on track” to deliver the target overall loudness. Advanced meter systems may include other information about the audio, including amplitude spectrograms and phase displays.
With the help of loudness meters, especially ones that can display a measurement log over time, consistency in loudness can be easily achieved. Fig. 8 illustrates the process, called “loudness normalization.” In this chart, the stream at the left was logged for a few minutes, producing the solid blue line for short-term loudness and the solid red line for signal peaks. It has a long-term (average) loudness, indicated by the dotted blue line, of approximately –14 LUFS at the end of the sample period.
Measurements should be taken for longer periods when the program has greater dynamic range. A second audio stream, shown at right, is logged for a similar time interval and has a long-term loudness of about –27 LUFS. A listener switching from the first to the second stream would hear a drop in loudness of approximately 13 dB.
Normalization of these two audio streams to a level of –23 LUFS, then, simply lowers the encoding gain of stream number one by 9 dB (from –14 LUFS to –23 LUFS), and raises the gain of stream number two by 4 dB (from –27 LUFS to –23 LUFS). The two streams now have a similar loudness.
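The gain arithmetic for the two streams in Fig. 8 is simple enough to script. In this short sketch the function names are illustrative only, and the linear factor is what an encoder's input gain stage would multiply the samples by.

```python
def normalization_gain_db(measured_lufs, target_lufs=-23.0):
    """Static gain, in dB, that moves a measured long-term loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


stream_one_gain = normalization_gain_db(-14.0)   # stream at the left of Fig. 8
stream_two_gain = normalization_gain_db(-27.0)   # stream at the right of Fig. 8
```

Stream one comes out at –9 dB (a multiplier of about 0.355) and stream two at +4 dB (about 1.585). Because stream two is raised, its peaks should be checked to confirm that they still fit under full scale after the gain change.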
It should be noted that normalization in no way dictates how one should process audio. Some engineers or programmers prize a particular “sound” resulting from particular processing; normalization just encourages agreement between the media distributors, which the data show can please listeners (or at least can diminish their annoyance). The technique is nothing more than observance of a common standard for transmission loudness. There is nothing to prevent a rogue operator from pursuing a “loudness war” on the Internet.
AUDIO DISTRIBUTION BENEFITS
Another consideration in the adoption of loudness-based metering is the appropriate target level. As noted earlier, the EBU adopted a target loudness of –23 LUFS for production and transmission after extensive study of a range of broadcast material, because that value permits most programs with greater dynamic range and signal peaks to fit safely under the digital full-scale limit. This would be the appropriate audio loudness target for production and distribution.
An example of the alignment of the ITU BS.1770 scale in comparison with the IEC PPM, BBC PPM and VU meter scales is shown in Fig. 9. Full scale (0 dBFS) is indicated for all by the vertical red line to the right. The ITU loudness meter reads in LU (one LU = one dB), with –23 LUFS being the target. ITU peak readings share the same scale. PPM meters typically are referenced to 9 dB below full scale, which represents maximum permissible peaks for program audio. The BBC meter uses a simple numbered scale with instructions to the user on where to peak various types of program content. The VU meter, once common in North America, has a longer response time that requires a greater back-off from full scale, typically 18 dB or more below digital clipping. In digital audio systems, Reference Level is commonly a 1 kHz tone at –20 dBFS, which conveniently displays at –20 LUFS on the loudness scale.
To stream audio, some other factors should be considered in choosing a target loudness value. For example, streams are all-digital, from the encoder to the consumer: The HE-AAC stream operated for tests by NPR Labs has a measured dynamic range over the Internet of 96 dBA, which is possible with a 16-bit audio system. This is equivalent to the performance of most audio production centers, so there is no technical need to compress or limit the dynamic range for Internet distribution. However, the sound chips in computers, smart phones and tablets are not quite that good, and certainly the acoustic power range of amplifiers and speakers limits the practical range of reproduction, although their capability is far greater than most commercially-produced music at present.
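The 96 dB figure follows from the word length. As a back-of-the-envelope check, the ratio between full scale and one quantization step of an N-bit linear PCM system can be computed directly. This is the theoretical ratio only, and it ignores dither, noise shaping and the A-weighting used in the measured value above.

```python
import math


def pcm_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step for linear PCM, in dB."""
    return 20.0 * math.log10(2 ** bits)


range_16_bit = pcm_dynamic_range_db(16)
```

For 16 bits the result is about 96.3 dB, or roughly 6 dB per bit.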
Another consideration is keeping up with the loudness of other content: The average loudness of commercially-produced popular music is substantially higher than –23 LUFS, due to processing in mixing and production. A study of material on Apple iTunes by mastering engineer Bob Katz found an average loudness of –16.5 LUFS with a variation of only a few LU. Katz indicated that by normalizing commercial music to that loudness target “the debilitating loudness war has finally been won.” (While achieving this value of loudness requires dynamic processing, Katz reasons that there is no advantage in using excessive dynamic compression, thereby allowing producers to “turn down the volume” on their processing and deliver more dynamics.)
Besides the output range capability of devices, such as smart phones, tablets and car audio systems, the loudness necessary to overcome background noise is a factor. As noted above, however, additional audio processing is simply unneeded with commercial music. Fortunately, the ITU loudness meter includes an objective loudness range (LRA) tool to determine whether the loudness range is already sufficiently limited. As the meter comes into increasing use, we can expect that engineers and producers will rely on it to provide guidance on whether more dynamic processing is needed or the program is “just fine as it is.”
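For readers curious about how a loudness range figure is derived, the sketch below follows the method described in EBU Tech 3342: short-term (3-second) loudness values are gated at –70 LUFS and again at 20 LU below their gated average, and the range is the spread between the 10th and 95th percentiles of what remains. The function name and the simple nearest-index percentile rule are choices made for this example, and commercial meters may interpolate differently.

```python
import math


def loudness_range(short_term_lufs):
    """Estimate loudness range (LU) from a list of short-term loudness values."""
    audible = [s for s in short_term_lufs if s > -70.0]        # absolute gate
    if not audible:
        return 0.0
    mean_energy = sum(10.0 ** (s / 10.0) for s in audible) / len(audible)
    relative_gate = 10.0 * math.log10(mean_energy) - 20.0      # 20 LU below the mean
    kept = sorted(s for s in audible if s > relative_gate)

    def percentile(p):
        # Nearest-index percentile; an assumption made to keep the sketch short.
        return kept[round(p * (len(kept) - 1))]

    return percentile(0.95) - percentile(0.10)


# Hypothetical program whose short-term loudness sweeps from -30 to -10 LUFS.
wide_program = [float(v) for v in range(-30, -9)]
lra_wide = loudness_range(wide_program)
lra_flat = loudness_range([-16.5] * 20)
```

The sweeping example yields a range of 17 LU, while a program that sits at one loudness throughout yields 0 LU, which is the kind of reading that suggests no further dynamic processing is needed.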
To address the issues of loudness matching, optimal target level and loudness range for consumer audio content, especially Internet streams, the Consumer Electronics Association has established a new “Audio Metrics” working group, R03WG15, sponsored by the R03 Audio Systems Committee, to evaluate techniques for improving listener satisfaction related to loudness.
The author is the chair of this group, and invites readers to follow the working group’s progress and comment on their experiences.
If you want to add your comments to the loudness discussion, email John Kean at firstname.lastname@example.org and include “Loudness” in the subject line.