AES Releases First Guidelines on Loudness for Streaming

Recognizing the importance of Internet streaming services, the Audio Engineering Society introduced its first Guideline Recommendations for Internet audio streaming or file playback over the Internet at the 139th International AES Convention in New York this fall. The guidelines were developed this year by a technical committee of audio engineers, content distributors and manufacturers to address inconsistent audio levels between services, head off a growing loudness war that degrades quality and improve the experience for millions of online listeners.

The document, titled Recommendation for Loudness of Audio Streaming and Network File Playback, was introduced in one of the most popular sessions in the Broadcast and Streaming Media program series at the convention. Bob Katz, a widely-known recording engineer, chaired the committee and the panel of authors who discussed the guidelines: Rob Byers of American Public Media, John Kean (formerly) of NPR Labs, Thomas Lund of Genelec Oy, Scott Norcross of Dolby Laboratories and Adrian Wisbey of BBC Digital Media Services. Other authors of the final document were audio consultant James Johnston and Bob Orban of Orban USA.

The recommendation's key points:

  • “Loudness” is the listener’s perception of audio volume, as defined by the ITU-R standard BS.1770-3, in Loudness Units Relative to Digital Full Scale (LUFS), where 1 LU represents a change of 1 dB;
  • Audio is measured over many hours to characterize the overall content loudness;
  • “Target Loudness” is introduced, which represents the intended loudness of a stream;
      ○ It must not exceed −16 LUFS, to avoid excessive peak limiting and allow a higher dynamic range in a program stream;
      ○ It should not be lower than −20 LUFS, to improve the audibility of streams on mobile devices;
  • Short-form programming, such as commercials, is dealt with separately, to ensure that it sounds consistent with program loudness;
      ○ This content is measured using Short-Term loudness integration, having a time constant of 3 seconds;
      ○ The Short-Term loudness may not be more than 5 LU higher than the Target Loudness;
  • True Peaks (TP) are measured by an inter-sample estimate of the digital signal;
      ○ Encoded peaks should not exceed –1.0 dBTP when using lossy codecs such as MP3 or HE-AAC;
      ○ A larger back-off may be needed, depending on the combination of codec and bitrate.
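
The limits in the list above can be expressed as a short checklist. The following Python sketch is only an illustration: the function and constant names are hypothetical, and the three inputs are assumed to come from a BS.1770-style loudness meter that is not shown here. The Short-Term check applies to short-form content such as commercials.

```python
# Limits taken from the summary above (constant names are hypothetical).
MAX_TARGET_LUFS = -16.0          # Target Loudness must not exceed this
MIN_TARGET_LUFS = -20.0          # Target Loudness should not be lower than this
MAX_SHORT_TERM_EXCESS_LU = 5.0   # Short-Term loudness allowance above target
MAX_TRUE_PEAK_DBTP = -1.0        # encoded peak ceiling for lossy codecs


def check_stream(target_lufs, max_short_term_lufs, true_peak_dbtp):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if target_lufs > MAX_TARGET_LUFS:
        problems.append("Target Loudness is above -16 LUFS")
    if target_lufs < MIN_TARGET_LUFS:
        problems.append("Target Loudness is below -20 LUFS")
    if max_short_term_lufs - target_lufs > MAX_SHORT_TERM_EXCESS_LU:
        problems.append("Short-Term loudness is more than 5 LU above target")
    if true_peak_dbtp > MAX_TRUE_PEAK_DBTP:
        problems.append("True Peak is above -1.0 dBTP")
    return problems


print(check_stream(-18.0, -14.0, -1.5))  # a stream inside every limit
print(check_stream(-14.0, -8.0, -0.2))   # too loud on all three counts
```

A real deployment might need a stricter peak ceiling than –1.0 dBTP, since the guidelines note that some codec and bitrate combinations call for a larger back-off.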

    One could argue that there is a beneficial bias toward audio quality in the guidelines. Allowing the Target Loudness to be –20 LUFS allows some producers and distributors to deliver very open content with exciting dynamics, nearly as good as can be delivered on compact discs. The high end of the range is actually quite aggressive. In opting for loudness, one should note that because of the statistical distribution of peaks, the amount of processing required rises exponentially across the –20 LUFS to –16 LUFS range. If one starts from high-quality (dynamic) content, a noticeable loss of dynamic impact results from the processing to achieve the high target. While there is a large difference in processing from low to high targets, the range spans only 4 LU. Psychoacoustic studies considered by the subcommittee indicate that 4 LU is the largest shift that listeners will tolerate before they turn off or change the stream.

    It is understandable for providers of online content and streams to ask, “How do I go about complying with your recommendations?” The answer depends on what you are streaming, and what your goals are for streaming.

    In the first case, if you are streaming popular music, your content is recorded, and the audio (per music industry custom) has already been processed. The Integrated Loudness of a song may already be –13 to as high as –6 LUFS, and peaks may frequently reach as high as full scale clipping. This results in peak-to-loudness ratios ranging from 12 dB to 6 dB. Preparing this content for encoding and streaming may only require reducing the gain to achieve a Target Loudness between –20 and –16 LUFS. In an editor’s linear waveform view, the result may look like Fig. 1, in which a Michel Camilo Latin jazz song was normalized to –18 LUFS.

    In this example, no other processing is needed; the peaks are mostly 3 to 4 dB below digital full scale. If one is distributing this kind of content, meeting a Target Loudness of –18 would require nothing more than normalizing the audio files in advance to a common target. The material might initially be normalized to –23 LUFS, to match the level of other production material, networks, etc. If so, a 5 dB gain would raise the content to –18 LUFS for encoding.

    Fig. 1: Here, Latin jazz pianist Michel Camilo’s 1994 song “One More Once” has been normalized to –18 LUFS.
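
    The normalization described above is a single static gain change. As a minimal sketch, assuming the Integrated Loudness has already been measured by a BS.1770 meter (not shown), the gain in dB is simply the target minus the measurement. The function names are hypothetical.

```python
def normalization_gain_db(measured_lufs, target_lufs):
    """Static gain (dB) that moves measured Integrated Loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale a sequence of float samples by a gain expressed in dB."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# Production material at -23 LUFS raised to -18 LUFS for encoding.
gain = normalization_gain_db(-23.0, -18.0)
print(gain)  # 5.0

# A heavily processed pop track at -8 LUFS needs a gain reduction instead.
print(normalization_gain_db(-8.0, -18.0))  # -10.0
```

    Because 1 LU corresponds to 1 dB, a 5 dB gain raises the loudness by 5 LU; the peaks rise by the same 5 dB, so they should be rechecked against the True Peak limit afterward.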

    A similar solution was recognized by the BBC, in studying the content of its Radio 3 fine arts service. According to Adrian Wisbey, the loudness over a recent 24-hour period averaged –26.3 LUFS, and the maximum peak level was –0.1 dBFS. In that time 21,364 peaks exceeded –7 dBFS, but each had a duration of less than 300 microseconds, or 0.0005 percent of the time. The BBC concluded that Radio 3’s audio can be increased by 6 dB, to achieve about –20 LUFS, with nothing more than a “good peak limiter.”
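
    The Radio 3 figures can be checked with simple arithmetic. The sketch below assumes a limiter ceiling of –1 dB (the True Peak figure from the guidelines) and treats the reported sample peaks as if they were true peaks. Both are simplifying assumptions, and the function name is hypothetical.

```python
def limiter_reduction_needed_db(peak_db, gain_db, ceiling_db=-1.0):
    """Gain reduction (dB) a limiter must apply to a peak after a static gain."""
    return max(0.0, peak_db + gain_db - ceiling_db)


GAIN_DB = 6.0

# Loudness after the 6 dB increase, starting from -26.3 LUFS.
print(round(-26.3 + GAIN_DB, 1))  # -20.3

# The highest reported peak (-0.1 dBFS) needs the most limiting.
print(round(limiter_reduction_needed_db(-0.1, GAIN_DB), 1))  # 6.9

# A peak at exactly -7 dBFS lands on the ceiling and needs no limiting.
print(limiter_reduction_needed_db(-7.0, GAIN_DB))  # 0.0
```

    Under these assumptions, –7 dBFS is the level above which a peak would be limited after a 6 dB increase, which may be why peaks above that level were the ones counted.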

    This probably seems too simple, and it may be, if you distribute other content such as news and talk programs. High-quality studio speech has a high peak-to-loudness ratio, which can vary dramatically. The waveform in Fig. 2 is from 6 minutes of an NPR newsmagazine show, which was normalized to a Target Loudness of –23 LUFS.

    Fig. 2: An NPR newsmagazine show normalized to a Target Loudness of –23 LUFS.

    The peaks reach nearly as high as the popular music normalized to –18 LUFS (within 1 dB), but the average loudness is approximately 6 LU lower. This content needs some dynamic control to allow an increase without suffering peak overload. One approach would use a combination of dynamic range compression and safety (peak) limiting. In Fig. 3, the original news-magazine audio is processed by a simple compressor having a threshold of –18 dB, a compression ratio of 3:1 and output gain of 6 dB.

    Fig. 3: The original newsmagazine audio is shown after processing by a simple compressor, set to a –18 dB threshold, a compression ratio of 3:1 and output gain of 6 dB.
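
    The compressor settings above define a static input/output curve, which can be sketched in a few lines of Python. This is an idealized hard-knee gain computer, not the processor used for the figures: it ignores attack and release times, so real signal peaks can pass through higher than the curve suggests. The function name is hypothetical.

```python
def compressor_output_db(level_db, threshold_db=-18.0, ratio=3.0, makeup_db=6.0):
    """Static output level (dB) of a hard-knee compressor for a given input level."""
    if level_db > threshold_db:
        # Above the threshold, every `ratio` dB of input yields 1 dB of output.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


for level in (-30.0, -18.0, -6.0, 0.0):
    print(level, "->", compressor_output_db(level))
# -30.0 -> -24.0   (below threshold: only the 6 dB output gain applies)
# -18.0 -> -12.0
# -6.0 -> -8.0     (12 dB over threshold becomes 4 dB over)
# 0.0 -> -6.0
```

    Quiet passages are lifted by the full 6 dB, while louder passages are lifted less or pulled down, which is how the loudness can rise without the peaks rising by the same amount.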

    The result is raised to a Target Loudness of –17.6 LUFS for encoding. A few peaks rise within 1 dB of full scale, and with a low (3:1) compression ratio, some parts of a program could go higher and cause audible clipping. Fig. 4 shows the material after it is passed next through a look-ahead fast limiter having a high (>100:1) compression ratio and a threshold of –1 dB.

    Fig. 4: Here is the original newsmagazine audio after processing by a look-ahead fast limiter having a high (>100:1) compression ratio and a threshold of –1 dB.
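
    To show the idea behind a look-ahead limiter, the sketch below processes a list of float samples offline. It is an assumption-laden illustration, not the limiter used for Fig. 4: it works on sample peaks rather than true peaks, it uses a simple minimum-hold followed by a moving average to shape the gain, and the function name and the `lookahead` length are hypothetical.

```python
def lookahead_limit(samples, ceiling_db=-1.0, lookahead=32):
    """Hold sample peaks at or below the ceiling using a look-ahead gain ramp."""
    ceiling = 10.0 ** (ceiling_db / 20.0)
    n = len(samples)
    # Gain each sample would need on its own to stay at or under the ceiling.
    needed = [1.0 if abs(s) <= ceiling else ceiling / abs(s) for s in samples]
    # Look ahead: hold the lowest gain required anywhere in the upcoming window.
    held = [min(needed[i:i + lookahead + 1]) for i in range(n)]
    out = []
    for i, s in enumerate(samples):
        # Average the held gain over the same span so the gain ramps smoothly.
        # Every value in this window already accounts for sample i, so the
        # averaged gain never exceeds what sample i needs.
        window = held[max(0, i - lookahead):i + 1]
        out.append(s * sum(window) / len(window))
    return out


# A quiet signal with one full-scale spike in the middle.
signal = [0.1] * 100 + [1.0] + [0.1] * 100
limited = lookahead_limit(signal)
print(max(abs(s) for s in limited) <= 10.0 ** (-1.0 / 20.0) + 1e-9)  # True
```

    Because the gain starts to fall before the peak arrives, the limiter can use a very high ratio without the abrupt gain change of an instantaneous clipper; audio below the ceiling and away from any peak passes through unchanged.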

    Fig. 5: The signal peaks illustrate what happened to the newsmagazine audio before and after the audio processing.
    The waveform is almost unnoticeably (and inaudibly) changed, and the Integrated Loudness of this example remains –17.6 LUFS. Fig. 5 charts the signal peaks before and after the audio processing.

    If content is what I call “moderated speech” (representing the in-studio voice of newscasters), and mixed with some care using an ITU loudness meter, this simple processing scheme is all that is needed to adjust the programming to a target toward the middle of the Target Loudness range recommended in the guidelines. Note that the processing is not “loudness aware” and that Integrated Loudness over shorter periods may drift above and below a long-term average. This may be due to changes in the spectral balance of the audio, or changes in the peak-to-loudness ratio of the material.


    Fig. 6: “One More Once” before and after processing.
    Large changes in the type of content, as often happens with eclectic public radio programming, may cause wider shifts in Integrated Loudness, especially from program to program. Our popular music sample used earlier can illustrate this effect, as shown in Fig. 6. Using the same processing setup that was applied to our newsmagazine show, we start with the music normalized to –23 LUFS, as shown by the lower red line in the chart. The signal peaks, shown by the thinner red line above, reach approximately –8 dBFS.

    After our demonstration processing, the peaks are safely held to –1 dBFS, but the Integrated Loudness rises higher, ending at –14.7 LUFS. This is approximately 3 dB higher than the –17.6 LUFS that the same chain achieved with the newsmagazine content. If music and speech were mixed in the same program, this processing might result in unexpected imbalances in loudness between the program parts.

    The differences in loudness from audio processing depend on internal factors, as well. For example, attack and release times, compression ratio and gating of audio processing can change the resulting loudness; your mileage may vary! Committee member Bob Orban notes that multiband compressors, such as in his Optimod processor, may provide more natural inter-genre balances than does loudness-normalizing each genre segment to the same BS.1770 loudness. That is possible because multiband processing is effectively leveling audio within discrete frequency bands, which results in a more constant spectral distribution. This, of course, is at the expense of allowing the spectrum to vary as it was produced, but some feel that contributes to a “signature sound” that is preferred.

    My experience with Target Loudness values in the range recommended by the guidelines is that only small amounts of processing are needed to comply, and the generous peak-to-loudness ratio (up to 19 dB) means limiting is needed only infrequently. At NPR Labs, a stream with live newsmagazine content has operated for more than a year with only a gated, windowed AGC using around 4 dB of gain reduction and a safety limiter. That has resulted in levels consistently between –16.5 and –18.5 LUFS, averaged over periods of 15 minutes or more.
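
    The "up to 19 dB" figure follows from the limits in the guidelines: the peak-to-loudness ratio is the True Peak level minus the loudness. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
def peak_to_loudness_ratio_db(true_peak_dbtp, loudness_lufs):
    """Peak-to-loudness ratio (dB): headroom between loudness and peak level."""
    return true_peak_dbtp - loudness_lufs


# Peaks at the -1.0 dBTP ceiling, at each end of the Target Loudness range.
print(peak_to_loudness_ratio_db(-1.0, -20.0))  # 19.0
print(peak_to_loudness_ratio_db(-1.0, -16.0))  # 15.0

# Heavily processed pop music at -6 LUFS with full-scale peaks, for comparison.
print(peak_to_loudness_ratio_db(0.0, -6.0))    # 6.0
```

    Content whose own peak-to-loudness ratio is smaller than the available headroom reaches the target with gain alone; only content with a larger ratio needs compression or limiting.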

    Contributors to “Recommendation for Loudness of Audio Streaming and Network File Playback”

    Editor: Bob Katz

    Writers: Rob Byers, James Johnston, John Kean, Thomas Lund, Robert Orban, Adrian Wisbey

    Additional study group members: David Bialik, Frank Foti, Alex Kosiorek, Fabian Kuech, Skip Pizzi, Ian Shepherd, Jim Starzynski

    The recommendation document can be read online at

    Considering that nearly all online content is based on 16-bit linear PCM digital audio, we have an opportunity to deliver an open, dramatic sound quality. Allowing for Target Loudness up to –16 LUFS, and some will opt for that, should overcome the SNR deficiencies in all players. Poor players are a thing of the past — nearly all smartphones, tablets and computers now have D/A (digital-to-analog) converters that can reproduce sound as good as the distributors can deliver. The new guidelines were intended to support the quality that many are producing in the studio, and the technical quality that the Internet can deliver to current-day mobile devices.

    John Kean is former senior technologist for NPR Labs. He joined NPR in 1980, working on technology projects and FCC regulatory activities. In 1986 he joined Jules Cohen & Associates as a consultant, followed by 15 years with Moffet, Larson & Johnson. He returned to NPR in 2004 to help establish NPR Labs. He is a past president of the IEEE Broadcast Symposium and Washington DC Section of the AES; contributing author to the NAB Engineering Handbook, editions 7, 8 and 9; and a recipient of the APRE Engineering Achievement Award. He is establishing a private consulting engineering practice for broadcast and audio clients.
