The author is senior technologist, NPR Labs – National Public Radio.
What is the best digital audio codec, and what is the optimum bitrate for Internet streaming? Those were the questions posed by NPR Digital Media, which funded an extensive study by NPR Labs to answer them impartially.
While the study was conducted for public radio, the premise and conclusions may be helpful to commercial broadcasters that stream audio as well.
The search for answers was more involved than one might expect, and led to other related investigations, such as the reliability of mobile wireless media, the availability of decoders in consumer equipment and the consistency of loudness from stream to stream. (This last consideration will be the focus of a subsequent article here.)
Internet streaming and related media, such as host websites, mobile applications, digital car products, social tools and API distribution, are the fastest-growing outlets for public radio. At any given moment, an average of more than 37,000 active sessions are listening to streams from public radio stations and NPR, up 14 percent over last year.
Until recently, stations — and even NPR itself — used a variety of codecs for streams and podcasts: MP3, LAME, AAC, Ogg, etc., and even more choices of bitrates, ranging from 24 kilobits per second to 160 kbps. Finding a common codec would provide a more uniform quality for listeners and help ensure that the services are uniformly available to listeners on a range of playback equipment, such as smartphones, tablets, WiFi “radios” and personal computers.
Digital audio codecs make Internet streaming commercially feasible by vastly reducing the bitrate required for the audio without noticeably reducing the audio quality. Minimizing the bitrate has direct benefits for the listeners, because lower stream bitrates:
● Start faster when a stream is selected, much like a radio plays as soon as tuned in to a station;
● Restart audio faster after a dropout occurs; and
● Save potential data charges for the listener.
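The data-charge point in the list above is simple arithmetic, sketched here in Python. The function name is hypothetical, and the figures cover the audio payload only, assuming a constant bitrate and ignoring container and network protocol overhead.

```python
def megabytes_per_hour(bitrate_kbps: float) -> float:
    """Audio payload for one hour of listening, in megabytes (10**6 bytes)."""
    bits_per_hour = bitrate_kbps * 1000 * 3600  # kilobits/s -> bits per hour
    return bits_per_hour / 8 / 1_000_000        # bits -> bytes -> megabytes


# Bitrates taken from the range mentioned in this article.
for kbps in (24, 48, 64, 96, 128, 160):
    print(f"{kbps:>3} kbps -> {megabytes_per_hour(kbps):5.1f} MB per hour")
```

Under these assumptions, an hour of listening at 48 kbps moves about 21.6 MB, compared with about 57.6 MB at 128 kbps.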
Experience tells us that the flow of data in mobile wireless networks is highly variable: Data capacity in a geographic area can drop, or even halt, during peak usage hours, and mobile handoff from cell to cell can interrupt the flow of data. These variations can affect the audio continuity in real-time streaming. We expected that lower bitrates would improve reliability (through fewer dropouts) in mobile wireless networks.
To learn the relationship between data rate and reliability, we built a computer logging system that could record stream reliability from multiple smartphones while traveling in a vehicle. Fig. 1 shows an example of a route from northern Virginia into Washington, with dots marked in green (service) and white (no service) for one stream. A chart indicates several dropouts of varying duration along the route.
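As a rough illustration of how such a log could be reduced to a reliability figure, the Python sketch below takes timestamped service/no-service samples and reports the fraction of time in service and the length of each dropout. It is not the NPR Labs logging system: the sample format, the `summarize` function and the example log are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical sample format: (seconds since start of drive, audio playing?)
Sample = Tuple[float, bool]


def summarize(samples: List[Sample]) -> Tuple[float, List[float]]:
    """Return (fraction of time in service, dropout durations in seconds).

    Each sample's state is assumed to hold until the next sample's timestamp.
    """
    if len(samples) < 2:
        return 0.0, []
    in_service = 0.0
    dropouts: List[float] = []
    current_gap = 0.0
    for (t0, ok), (t1, _) in zip(samples, samples[1:]):
        span = t1 - t0
        if ok:
            in_service += span
            if current_gap > 0:          # service returned: close the dropout
                dropouts.append(current_gap)
                current_gap = 0.0
        else:
            current_gap += span          # still in a dropout: extend it
    if current_gap > 0:                  # log ended during a dropout
        dropouts.append(current_gap)
    total = samples[-1][0] - samples[0][0]
    return (in_service / total if total > 0 else 0.0), dropouts


# Made-up log for illustration only, not data from the study.
log = [(0, True), (10, True), (20, False), (30, False),
       (40, True), (50, False), (60, True)]
fraction, gaps = summarize(log)
print(f"in service {fraction:.0%} of the time; dropouts (s): {gaps}")
```

Running the same summary over logs captured at different stream bitrates would give the kind of reliability-versus-bitrate comparison described here.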
We determined at what bitrate the chosen codec must operate while providing high-performance audio:
● Reliability decreased at rates above approximately 40 kbps, as shown in the simplified graph of Fig. 2;
● At bitrates below approximately 40 kbps, reliability tended not to improve, probably because of RF signal dropouts, cell handoff issues and capacity shortages in high-use areas during peak hours.
The results indicated that a “sweet spot” exists in the 40–50 kbps range, where the combination of stream reliability and compressed audio quality is optimal.
SIX CODEC COMPARISONS
NPR Labs began testing with a comparison of six codecs by “well-informed listeners.” The purpose of this pre-test was to critically evaluate these codecs and identify two that would be presented to consumers in a final round of audio quality testing, where more time could be given to detailed comparisons, such as the impact of bitrate on perceived quality. The first round tested:
● MPEG-2 Layer III (“MP3”) — the legacy codec from the early 1990s
● LAME — a free software encoder compatible with MP3 playback
● AAC-LC (Low Complexity) — successor to MP3, used by Apple iTunes
● High-Efficiency AAC (“HE-AAC” or “AAC+”) — adds spectral band replication for quality similar to AAC at lower bitrates
● G.722.2 (“AMR-WB+”) — an ITU standard vocoder, enhanced for hi-fi speech and music
● Extended HE-AAC (“xHE-AAC”) — combines an enhanced vocoder with HE-AAC (including parametric stereo) that adapts to the program signal
This test used high-quality headphones (Sennheiser HD-600), and listening was done in quiet environments. Four women and seven men, all NPR staff or associated with NPR affiliates, listened to audio samples of all six codecs at a time, in random order, rating each on a scale of zero to 100. The preliminary round of testing is summarized in Fig. 3.
With these listeners, all of the codecs performed well at 96 kbps. For this combined genre, the xHE-AAC was on top, while LAME was at the bottom. (HE-AAC was not included at this rate as it exceeds the intended bitrate. At this high rate, its performance would be equivalent to AAC-LC.) The positions of xHE-AAC and LAME remained at top and bottom, respectively, at successively lower bitrates. HE-AAC tied the top position at 48 and 32 kbps, but dropped slightly below xHE-AAC at 24 and 16 kbps. MP3 declined consistently, getting relatively low ratings below 64 kbps.
While xHE-AAC was a consistent winner, it is not yet widely available in consumer devices — an important consideration for a replacement of the legacy MP3 codec. The HE-AAC codec is almost universally available in portable devices operated by iOS and Android (at least since version 3.1 “Honeycomb,” according to the developer). While older Android devices may not have HE-AAC on board, these devices are now a declining group, as consumers replace their devices with newer products.
To consider what xHE-AAC might do for listeners in the future, the consumer test included both xHE-AAC (also known as the Universal Speech and Audio Codec or “USAC”) and HE-AAC. The test results, summarized in Fig. 4, include MP3 and AAC-LC at 128 kbps, as high references, and MP3 at 16 and 24 kbps as a low reference.
As expected, the chart shows AAC-LC doing the best, at 128 kbps, followed closely by HE-AAC at 96 kbps, which holds up well at 64 and 48 kbps. USAC (xHE-AAC) shows its technological advance over HE-AAC at 32 kbps, and it continues to receive a near-“good” score at 16 kbps!
The results confirm NPR’s recommendation of HE-AAC for all live streaming, using 48 kbps, in the “sweet spot” that offers the optimum combination of reliability and quality.
We also suggested that where bitrate is not an issue, such as in podcasting and file transfer, 64 kbps provides a slight improvement. At both bitrates, we consider the efficiency high enough to rely on HE-AAC without the parametric stereo feature (which adds a “v2” designation).
Critical listening tests at NPR Labs determined that the potential reduction in compression artifacts, which listeners indicated are already quite low at 48 and 64 kbps, does not justify the artifacts of parametric stereo, such as a slight positional blurring and instability.
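For readers who want to try these settings, the Python sketch below builds an ffmpeg command line for HE-AAC without parametric stereo. The choice of ffmpeg and its libfdk_aac encoder is an assumption for illustration (the study does not name an encoder), the file names are hypothetical, and the command only works with an ffmpeg build that includes libfdk_aac.

```python
from typing import List


def he_aac_args(src: str, dst: str, bitrate_kbps: int = 48) -> List[str]:
    """Build an ffmpeg command line for HE-AAC v1 (no parametric stereo)."""
    return [
        "ffmpeg", "-i", src,
        "-c:a", "libfdk_aac",        # FDK AAC encoder; must be compiled into ffmpeg
        "-profile:a", "aac_he",      # HE-AAC v1 (SBR only); "aac_he_v2" adds parametric stereo
        "-b:a", f"{bitrate_kbps}k",  # 48 for live streams, 64 where bitrate is not an issue
        dst,
    ]


# Hypothetical file names; print the command rather than running it.
print(" ".join(he_aac_args("program.wav", "program_48k.m4a")))
```

Passing the returned list to `subprocess.run` would perform the encode. Calling the function with `bitrate_kbps=64` gives the podcast and file-transfer variant.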
The study discovered another issue for Internet streaming: consistency of loudness from stream to stream. This led us into an investigation of the causes and potential solutions, which we will address next time.