The Developing MPEG Spatial Coding Specification Attempts to Add 5.1 Surround to a Digital Stereo Channel
Unlike the digital transition in the television world, the audio industry’s digital transition has focused almost exclusively on stereo distribution. On the air, online and on the store shelf, practically all available digital audio media are either monaural or stereophonic. The lone exceptions have been an occasional surround broadcast on terrestrial or satellite radio, and the relatively limited catalog of releases on the DVD-A and SACD formats.
Yet surround sound is a hot property in the consumer electronics business today. Of course, its great success in the home theater sector has been driven almost entirely by video content, particularly movies (and more recently, video games), where a 5.1 soundtrack has become the norm.
Meanwhile, the other main locale where surround sound reproduction systems have begun to proliferate is the car – which, for the most part, is intrinsically an audio-only environment. In the automotive environment, however, the surround content to date has been largely “virtual” or “derived,” meaning that it comes from a single-ended, pseudo-surround system based on effects created in the receiver by manipulating the difference signal between left and right stereo channels.
In some cases this derived effect is reproduced through properly placed, distributed multichannel speaker systems, while in others the derived effect is “acoustically virtual” as well, i.e., reproduced with as few as two speakers. The latter approach is far more sensitive to the listener’s position, while the former presents a relatively stable image anywhere in the listening area. Depending on the constraints of the environment, either approach can be effective, but the multi-speaker method is generally preferred.
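For readers who like to see the arithmetic, the "derived" approach can be sketched in a few lines of Python. This is only a minimal passive-matrix illustration under assumed 0.5 scaling; the function name derive_surround is hypothetical, and commercial systems add delay, filtering and steering logic that are not shown here.

```python
def derive_surround(left, right):
    """Toy passive matrix (assumed scaling, not any product's algorithm).

    center   = half the sum of left and right
    surround = half the difference of left and right
    """
    center = [0.5 * (l + r) for l, r in zip(left, right)]
    surround = [0.5 * (l - r) for l, r in zip(left, right)]
    return center, surround


# A sound panned dead center is identical in both channels,
# so the difference signal (and thus the derived surround) is silent.
centered = derive_surround([0.2, 0.4, -0.1], [0.2, 0.4, -0.1])

# A sound that is out of phase between the channels lands entirely
# in the derived surround output.
out_of_phase = derive_surround([0.3], [-0.3])
```

The point of the sketch is that nothing new is transmitted: whatever ends up in the surround speakers was already hiding in the stereo pair.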
Consider also that the packaged multichannel audio releases mentioned above generally carry both a 5.1 and a separate stereo version of the same material – a marked difference from the video/TV world, where a single compatible multichannel mix serves 5.1, stereo and mono receivers. This duality for audio content products hearkens back to the early days of stereo LPs, when separate mono and stereo discs were produced, each containing different mixes of the same material.
So within this diverse context comes the challenge of extending multichannel sound to become the norm (or at least more commonplace) in audio-only systems. A key enabler missing from this milieu is a digital audio format that provides plentiful, real multichannel-capable content in a mode satisfying all applications, from the high-end home theater to the clock radio. This format should also be quite spectrum-efficient, allowing it to be applied to media for which only stereo delivery was previously considered.
Thus there is a need for content that can be considered compatible with all such listening formats, and a delivery system that addresses these in a compatible and efficient manner. Such is the genesis of MPEG Spatial Coding, a format that is making its way toward standardization at present. It attempts to carry both surround and stereo content in a spectrally efficient, backward-compatible way. Too much to ask, you say? Read on.
To understand how Spatial Coding works, let’s look back a few years to the development of Spectral Band Replication (SBR), which was a technique designed to improve the spectral efficiency of high-fidelity audio coding. This was the first commercial implementation of a technique now referred to generically as parametric coding, meaning that auxiliary data is added to coded (i.e., compressed) audio data to provide instruction to the decoder on how to enhance the quality of the decoded audio. This implies that the stored or transmitted signal includes both coded audio packets and dynamic instructions on what the decoder should do with the audio during/after decoding.
Of course, within a fixed-bandwidth channel, a small amount of the channel’s bit rate is required for the instructional data, so fewer bits are available for audio coding. Therefore the parametric data has to “earn its keep” by more than making up for the bits it “steals” from the channel.
SBR uses parametric data to extend high-frequency response by adding a small amount of data that describes the high-frequency spectrum characteristics of the encoded audio signal. The SBR decoder applies these instructions during its decoding of the compressed audio data, and thereby extends the audio bandwidth of the decoded signal. The parametric approach allows this to be done at a lower overall data rate than the same codec (without SBR) would have required to pass an audio signal with equivalent bandwidth.
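The general idea can be illustrated with a toy Python sketch. It is not the actual SBR algorithm, which is considerably more elaborate; it simply treats a signal as a short list of spectral magnitudes, and the function names (band_energies, encode, decode) and the two-band envelope are assumptions made for illustration. The encoder keeps the low half of the spectrum and a few energy values describing the high half; the decoder copies the low half upward and scales it to match those energies.

```python
def band_energies(bins, num_bands):
    """Sum of squared magnitudes in each of num_bands equal-width bands."""
    size = len(bins) // num_bands
    return [sum(x * x for x in bins[i * size:(i + 1) * size])
            for i in range(num_bands)]


def encode(spectrum, num_bands=2):
    """Keep the low half as the 'coded audio'; describe the high half
    with only num_bands energy values (the parametric data)."""
    half = len(spectrum) // 2
    return spectrum[:half], band_energies(spectrum[half:], num_bands)


def decode(low, envelope):
    """Rebuild a full-bandwidth spectrum from the low half plus envelope."""
    size = len(low) // len(envelope)
    high = []
    for i, target in enumerate(envelope):
        patch = low[i * size:(i + 1) * size]       # low-band content to reuse
        energy = sum(x * x for x in patch)
        gain = (target / energy) ** 0.5 if energy > 0 else 0.0
        high.extend(gain * x for x in patch)       # scaled copy fills high band
    return low + high


# Hypothetical 8-bin spectrum: 4 low bins are "coded", 4 high bins are
# replaced by 2 envelope numbers.
spectrum = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, 0.02]
low, envelope = encode(spectrum)
rebuilt = decode(low, envelope)
```

The rebuilt high band is not identical to the original, but its energy in each band matches, and in this toy case two numbers stand in for four.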
Theoretically, parametric data could be retrofitted to any existing codec to provide backward-compatible performance extension. This means that ideally, the parametric data is added to the encoded audio signal in such a way that legacy decoders lacking the ability to interpret the dynamic instructions simply ignore them and decode the coded audio alone as they always would. Meanwhile, new decoders utilize the parametric data and improve the quality of that same coded audio.
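A small Python sketch shows how this kind of backward compatibility can work in principle. The tag-and-length framing below is entirely hypothetical (the tags AUDIO and PARAM, the 3-byte header and the function names are invented for illustration and are not MPEG bitstream syntax); what matters is that a legacy decoder can step over a chunk it does not understand.

```python
import struct

AUDIO = 0x01  # hypothetical tag: coded audio payload
PARAM = 0x02  # hypothetical tag: parametric side information


def pack_chunk(tag, payload):
    """One chunk = 1-byte tag + 2-byte big-endian length + payload."""
    return struct.pack(">BH", tag, len(payload)) + payload


def parse_chunks(frame):
    """Yield (tag, payload) pairs, using the length field to find each chunk."""
    offset = 0
    while offset < len(frame):
        tag, length = struct.unpack_from(">BH", frame, offset)
        offset += 3
        yield tag, frame[offset:offset + length]
        offset += length


def legacy_decode(frame):
    """Knows only AUDIO chunks; anything else is skipped untouched."""
    return b"".join(p for tag, p in parse_chunks(frame) if tag == AUDIO)


def enhanced_decode(frame):
    """Returns the same audio plus the parametric data to apply to it."""
    audio, params = b"", b""
    for tag, payload in parse_chunks(frame):
        if tag == AUDIO:
            audio += payload
        elif tag == PARAM:
            params += payload
    return audio, params


frame = pack_chunk(AUDIO, b"core-audio") + pack_chunk(PARAM, b"hints")
legacy_result = legacy_decode(frame)
enhanced_result = enhanced_decode(frame)
```

Both decoders recover the same coded audio from the same frame; only the newer one also picks up the instructions.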
In the case of SBR, commercial implementations have been made by Coding Technologies that retroactively extend the high-frequency response of MP3 and AAC codecs. With the SBR additions, these codecs are called mp3PRO and aacPlus. The latter has subsequently been standardized under MPEG-4 as High-Efficiency AAC (HE-AAC).
Enter Spatial Coding
Now a similar parametric coding technique is being applied to surround sound. In this case, instead of backward-compatible bandwidth extension, the parametric data is used to add multichannel steering information to coded stereo audio, in the MPEG Spatial Coding format currently under development. Again, Coding Technologies is a key developer, this time working with Philips, but MPEG Spatial Coding also includes developments from Fraunhofer IIS and Agere Systems. (For those keeping score, the joint CT/Philips proposal and the joint Fraunhofer/Agere proposal were selected as the most promising candidates received in an ISO/MPEG call for contributions. The MPEG process has since converged the two systems into a format now referred to as MPEG Spatial Coding Reference Model 0 [RM0].)
Under rigorous listening tests, RM0 has proven to be as good as or better than either of the two original proposals, across various codecs and at various bit rates. The bit rate applied to the parametric steering data has also been varied in these tests, and results have shown good performance using as little as 5 kbps of steering data (regardless of the audio codec used and its bit rate).
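To put that 5 kbps figure in perspective, a quick back-of-the-envelope calculation in Python shows how much of a channel the steering data would occupy. The total channel rates used here (48, 64 and 96 kbps) are assumptions chosen for illustration, not figures from the listening tests, and the function name audio_budget_kbps is hypothetical.

```python
def audio_budget_kbps(channel_kbps, side_info_kbps=5.0):
    """Return (kbps left for coded audio, fraction of channel used by side info)."""
    if side_info_kbps >= channel_kbps:
        raise ValueError("side information would consume the whole channel")
    audio_kbps = channel_kbps - side_info_kbps
    return audio_kbps, side_info_kbps / channel_kbps


# Assumed example channel rates; only the 5 kbps default comes from the text.
for rate in (48.0, 64.0, 96.0):
    audio, share = audio_budget_kbps(rate)
    print(f"{rate:.0f} kbps channel: {audio:.0f} kbps for audio, "
          f"{share:.1%} for steering data")
```

At the assumed 64 kbps, for example, the steering data takes a bit under 8 percent of the channel, which is the share it must "earn back" in perceived quality.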
Next time we will conclude this examination with a look under the hood of the current Spatial Coding system’s interesting design, and some boundaries on its operation.