
Audio Processing For DAB and the Internet: How Audio Quality and Intelligibility Can Be Improved In the Data Reduced Environment, Using Dynamics Control

By Frank Foti, Omnia Audio
Cleveland, Ohio (c) 1999

Abstract
Digital Audio Broadcasting (DAB) and the Internet are the latest broadcast media in use today. Data bandwidth capacity dictates the degree of quality that is possible. DAB has the potential to deliver ‘near CD’ quality, while the Internet provides quality ranging from ‘good AM’ to ‘near FM like’. In each situation, some degree of audio data reduction must be applied. Dynamics processing can be employed as a tool to "predict" when audio conditions occur that can degrade signal quality. Utilizing a unique analysis algorithm that is modeled around the masking curve of a coding application, the processing adjusts the audio signal on a frequency dependent basis to reduce artifacts. The improvement is especially noticeable whenever very low bitrate coding is applied.

Audio Processing: "The Tool"
Is audio processing needed for the new transmission media like DAB, DTV, and the Internet? Audio purists will claim that there is no need to create or clone the sound of FM on these new mediums. In a literal sense they are correct! But the reality is that we do, in fact, need audio processing as a tool so that these new mediums can be utilized with sonic efficiency and maximum intelligibility. Depending upon the method or system, one of the above-mentioned items will benefit from a processing tool.

When most people think of broadcast audio processing, they usually imagine the method used in FM and AM radio, where it’s employed to create a ‘dial presence.’ In those instances it’s a protection device for bandwidth control and to guard against overmodulation. In addition, it’s a programming ‘statement’ that creates a sonic signature for the listener. For this presentation however, we will refer to processing in a different context. Here it will be discussed as a method that will improve the operation of the data compression required in these new mediums.

Coding Is Processing Too
When audio coding is discussed, it’s usually thought of in terms of what is actually transpiring: the reduction of audio data, using an algorithm that’s designed to operate within a specified medium or data bandwidth. When viewed a bit more closely, it’s also audio processing! Consider for a moment, in the following nutshell overview, what is occurring in this process:

  • The audio signal is divided up using a filterbank.

  • Analysis of the audio signal is applied to create a masking curve.

  • The masker signal will be adjusted and moved over the spread of filterbank outputs to select the most dominant signal, and remove undesired spectrum.

  • The remaining signal is then coded.

Each of these functions is an element of signal processing. In some instances, there are similarities between what transpires in a dynamics based audio processor and an audio coder. What this discussion will reveal is that dynamics based processing can operate in tandem with data reduction coding and thus create a transmission method that works together as a complete coupled system. The benefit of this coupling is better sounding audio through the coded system.

Quick Codec Review
A very popular technique is the use of the audio codec with transmission systems. These devices make use of "lossy" data reduction algorithms to compress the bitrate down to a size that will fit within the existing bandwidth of the system. While there are a number of specific algorithms to choose from, most systems have employed ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-x, Dolby AC-2, and now AAC.

The basic operation of the "lossy" data reduction system stems from the use of a technique known as perceptual coding. Simply stated, the basic principle relies on a masking signal, or masker, that exists around a threshold curve which happens to follow that of the human auditory system. Any signal which falls below this threshold curve is basically discarded. In the digital domain, any audio data that would fall below the threshold curve is data that is then discarded by the algorithm, and thus data reduction is accomplished.
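As a rough illustration of the discard step, the sketch below compares the signal level in each band against a threshold curve and keeps only the bands that rise above it. The band levels, the threshold values, and the function name `discard_masked_bands` are invented for illustration. A real coder derives its threshold from a psychoacoustic model and quantizes the surviving bands more coarsely or finely, rather than simply keeping or dropping them.

```python
def discard_masked_bands(band_levels_db, threshold_db):
    """Return the band levels with sub-threshold bands replaced by None.

    band_levels_db: signal level in each filterbank band (dB), hypothetical.
    threshold_db:   threshold curve value in each band (dB), hypothetical.
    """
    if len(band_levels_db) != len(threshold_db):
        raise ValueError("one threshold value is needed per band")
    kept = []
    for level, threshold in zip(band_levels_db, threshold_db):
        # A band below the threshold is assumed inaudible, so it is not coded.
        kept.append(level if level >= threshold else None)
    return kept


# Illustrative values only: five bands, levels relative to full scale.
levels = [-12.0, -40.0, -25.0, -70.0, -55.0]
threshold = [-60.0, -35.0, -45.0, -50.0, -50.0]
coded = discard_masked_bands(levels, threshold)
bands_coded = sum(1 for band in coded if band is not None)
```

With these made-up numbers, only two of the five bands survive, which is where the data reduction comes from.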

Detailed operation of the above mentioned algorithms is not needed for this discussion, as the intent will be to focus upon the effects that dynamics processing has upon data reduced audio. Suffice it to say that each system does possess many strengths and possible weaknesses for their application. It is not the intent of this discussion to compare audio coding algorithms.

Codec Transmissions, The Caveats
All audio coding methods have strengths and weaknesses. In almost all cases as bitrate is reduced, audio quality degrades due to less data bandwidth. However, it should be pointed out that some of the most recent demonstrations of AAC at lower bitrates are quite impressive!

Depending upon the transmission medium (DAB, DTV, or Webcasting), the audio quality of the codec will be determined by two issues: the coding algorithm and the amount of data reduction. For DAB and DTV, the bitrates generally are at higher levels, usually 192kbps or greater for stereo. At these rates, the coding process affords wide audio bandwidth (20kHz) and produces few artifacts (near CD quality). But in the case of webcasting, where bitrates of 28kbps might be required, the coding artifacts and bandwidth restrictions are quite severe (AM radio-like). Audio bandwidth is sometimes reduced to 4kHz. At lower bitrates, coding artifacts differ considerably with each of the data reduction algorithms.
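To put the two bitrates quoted above in perspective, the short calculation below compares them with uncompressed audio. It assumes the source is CD-format stereo PCM (44.1kHz sampling, 16-bit words); the paper does not state a source format, so that figure is an assumption, and the function names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed linear PCM audio, in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def reduction_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded stream is than the source."""
    return source_kbps / coded_kbps


# Assumed source: CD-format stereo PCM (44.1 kHz sampling, 16-bit words).
source_kbps = pcm_bitrate_kbps(44100, 16, 2)

for coded_kbps in (192, 28):
    ratio = reduction_ratio(source_kbps, coded_kbps)
    print(f"{coded_kbps} kbps: about {ratio:.2f}:1 data reduction")
```

Under that assumption the coder must discard roughly seven eighths of the data at 192kbps, and about forty-nine fiftieths of it at 28kbps, which is why the low bitrate case is so much harder on the audio.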

Given this wide range of diversity, how does processing fit into all of this, especially given the view that transmission system processing for FM Stereo and codec STL systems do not mix that well [1]? This can be answered with two points:

  • FM Stereo transmission processors employ hard limiters, or clippers, to achieve absolute peak control. It has been shown through testing and research that codecs do not perform well when a transmission processor is operated through the codec. The audio quality suffers from added distortion generated by the coding method used in the STL. Precise peak control is lost as overshoots are generated by the lossy data compression of the codec. This is the result of the preemphasis and clipper functions in the processor. Thus, in most applications of a coded STL and processor, the processor needs to be installed after the STL system to avoid the previously stated problems. In essence, the problems stem from two difficulties: the harmonic content of the clipper becomes displaced by the coding algorithm, and preemphasis (50µs or 75µs) hinders the masking curve of the codec from operating in an efficient manner. Coding algorithms were not designed to operate on emphasized audio, as this increases the audibility of coding artifacts. This is what contributes to lost peak control, and adds further Total Harmonic Distortion (THD) to the sonic quality of the signal.

  • While the above comments do not bode well for processing and coding, it must be pointed out that this is true when the use of FM transmission processing is employed with a codec. As with that type of system, where processing is designed to fulfill the specific needs of the technology and augment its performance, the same thing must be done for coded systems too. Again, using the model of traditional broadcasting, we can apply the same thinking to coded transmission. As FM and AM require different processing methods, the same holds true for coded systems. Here the differences in processing will be determined by audio bandwidth, bitrate, and coding algorithms. Therefore the use for processing in this environment is dictated by the support requirements of the codec system and its attributes, just as it is in conventional analog broadcasting. What follows are some of the issues and features that a processor for coded transmission must recognize.

Digital Transmission Landscape
Before specifying the attributes of a processing system for DAB, Internet, or DTV, the landscape for these mediums needs to be defined and understood, as there are certain aspects that differ from the conventional analog methods. These deal with algorithm, bitrate, sampling rate, audio spectral bandwidth, digital full scale, and metadata. Processing for the digital mediums is highly dependent on how each of these issues is dealt with.

Algorithm
Definitely the most debatable item, and quite important, the choice of coding algorithm will play a significant role in the overall sonic performance of the transmission system. In most of the digital mediums, the choice of algorithm has already been made. The issue then is to understand the usage of the specific algorithm in order to achieve the best sonic performance. For this discussion, it is not necessary to delve into the aspects of each coding method. What’s important is to know and understand those aspects of the algorithm for the medium used.

Bitrate
Almost as important as the choice of algorithm is the operating bitrate of the system. This will be determined by the available data bandwidth. This can range from as low as 24kbps for lower grade narrow range monophonic audio to greater than 384kbps for full range CD-like stereo or multichannel sound. As bitrate is reduced, the coding algorithm must operate more aggressively in reducing the amount of data pass-through. This will affect the available audio bandwidth and sonic quality of the audio. Generally speaking, as bitrate is reduced, both bandwidth and audio quality degrade. The amount of degradation will be different based upon the algorithm employed.

Spectrum
Each algorithm and bitrate will have a direct effect upon the available amount of audio bandwidth that the system can transport. As stated above, when bitrate is reduced, sampling rate and bandwidth follow suit. Therefore, it is crucial that any processing system be capable of managing the audio bandwidth, as this will have a direct effect on sonic performance at lower bitrates. More on this topic is covered later in this paper.

While on this topic, an item should be pointed out: preemphasis is not required in these systems. This alone will provide a major sonic improvement when compared to analog FM systems. The digital mediums all utilize a flat spectrum, and this negates the need for specialized high frequency control methods that operate around the emphasis networks. Later however, we will see how spectrum management through dynamics control will improve an encoder’s efficiency.

Full Scale
All digital transmission systems have one important element in common: a specified maximum word size. In other words, they have a peak ceiling level that cannot be exceeded! Overmodulation is not possible; any attempt to exceed full scale (0dBfs) produces a nasty sounding type of distortion. Any signal processor for digital mediums must perform absolute peak control, yet avoid the unwanted coding artifacts that would occur if a clipper were employed, as described earlier.
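The peak ceiling can be seen in a few lines of arithmetic. The sketch below assumes a signed 16-bit word and floating point samples in which 1.0 represents full scale; the word size and the function names are illustrative choices, not something the digital mediums above prescribe.

```python
import math

FULL_SCALE_16 = 32767  # largest positive value of a signed 16-bit word


def peak_dbfs(samples):
    """Peak level of float samples, where 1.0 represents digital full scale."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20.0 * math.log10(peak)


def to_int16(sample):
    """Convert a float sample to a 16-bit word; values past full scale saturate."""
    scaled = int(round(sample * FULL_SCALE_16))
    return max(-FULL_SCALE_16 - 1, min(FULL_SCALE_16, scaled))


# Illustrative signal with one sample that tries to exceed full scale.
signal = [0.25, -0.5, 1.2, -0.9]
level = peak_dbfs(signal)              # a positive value: above full scale
words = [to_int16(s) for s in signal]  # the 1.2 sample is flattened at 32767
```

The sample at 1.2 cannot be represented, so the word simply stops at its maximum value. That flattening is the distortion the processor has to prevent by controlling peaks before conversion.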

Metadata
Digital delivery systems usually provide some means of carrying ancillary information about the signal content, or attributes of the content. This is a function that metadata can provide. Already employed in the DTV standard, metadata has the capability to carry information about the dynamic content of the signal at the transmission point, and then apply that knowledge to corresponding functions in the receiver. In this manner, audio processing effects can be implemented in the system, and then each individual receiver can use this in whatever manner they choose. By example, the transmission may contain metadata information about how much dynamic level compression to employ, and then the end-user can allow that compression to be applied to the signal, or eliminated, and make use of the wider dynamic range.
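The receiver-side choice described above might look something like the following sketch. The `AudioFrame` structure, its field names, and the `render` function are hypothetical and do not represent the bitstream format of DTV or any proposed DAB system; they only show a transmitted gain value that the receiver is free to apply or ignore.

```python
from dataclasses import dataclass


@dataclass
class AudioFrame:
    """Hypothetical decoded frame: audio plus one metadata gain value."""
    samples: list               # float samples, 1.0 = full scale
    compression_gain_db: float  # gain suggested by the transmission point


def render(frame, apply_compression):
    """Return the frame's audio, with the suggested gain applied only on request."""
    if not apply_compression:
        # The listener keeps the wider dynamic range of the original signal.
        return list(frame.samples)
    gain = 10.0 ** (frame.compression_gain_db / 20.0)
    return [s * gain for s in frame.samples]


# Illustrative loud passage for which the broadcaster suggests 6 dB of reduction.
loud_frame = AudioFrame(samples=[0.8, -0.6], compression_gain_db=-6.0)
compressed = render(loud_frame, apply_compression=True)
full_range = render(loud_frame, apply_compression=False)
```

The audio itself is transmitted untouched in this arrangement; only the control information travels alongside it.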

The Dolby Digital system for DTV [2] provides numerous functions that take advantage of metadata. These include a dialog normalization setting for loudness control, dynamic range compression, and emergency notification purposes. Some of the proposed DAB systems provide space for metadata.

From the above definitions and functions, it can be seen why a conventional FM or AM transmission processor is not applicable for the digital mediums. A simple, off-the-shelf compressor/limiter is not the answer either, as these units only provide a generalized form of dynamics control, and are usually wideband in nature. Just as these units will not suffice in the conventional analog transmission applications, because they lack specified functions for the medium, the same holds for the digital systems, too.

Processing For Digital Delivery
An audio processor for digital transmission must meet three specific requirements, and a possible fourth should be considered:

  • Precise peak control so that 0dBfs is maintained without system distortion.

  • A peak control method that does not exaggerate coding artifacts.
    (This also holds true for any dynamics control and equalization methods.)

  • Audio frequency response that is controlled within the bandwidth limits of the system.

  • In low bitrate systems, processing should enhance intelligibility.

The following sections describe methods and means of accomplishing the above requirements. These are based upon a digital processing system that was designed specifically for digital transmission mediums. What’s presented here is a global view of the implemented functions, as this provides an easier understanding of the concepts involved.

Absolute Peak Control
The simplest form of peak control is the hard limiter, or clipper, as it’s quite often referred to. In the coded digital environment, the use of a clipper can cause three sonic hardships: harmonic distortion from the truncation action of the clipper, exaggerated coding artifacts of the data reduction algorithm, and clipper induced aliasing distortion, aka digital grunge, that results from clipper harmonics that try to exceed the Nyquist frequency of the system [3]. Therefore, a better method must be employed that controls peak levels with precision and does not generate any of the three aforementioned problems.

Look-Ahead Limiting

A Look-Ahead, or Delay-Line, limiter is well suited to this operation. Basically, this limiter creates a gain control signal based upon the absolute peak value of the audio signal, except that while the peak level is being calculated, the audio signal is delayed by an amount of time equal to the time needed to calculate the peak value. Once the control signal is ready to implement level adjustment, the audio is sent ahead to the control element at the exact moment that the control word arrives to make the adjustment [4]. In this manner, absolute peaks can be controlled without the need to truncate the excursion, as a clipper would do. Figure-1 is a block diagram of a look-ahead limiter.


Figure-1

This results in little or no harmonic distortion generated by the limiter function. The caveat to this method is that instead of harmonic distortion, Intermodulation Distortion (IMD) can result, but that amount is dependent on the design of the look-ahead algorithm. In addition, there will be some amount of latency as the audio is delayed by a specified amount; it can be up to a few milliseconds. This could make it a bit problematic when trying to monitor oneself off-the-air, except all of the digital mediums generate some amount of latency. So the use of look-ahead limiting is not a problem here.
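The delay-line principle can be sketched in a few lines. What follows is a generic wideband design (a minimum-hold on the required gain followed by a moving average of the same length), offered as an assumption-laden illustration and not as the limiter in the processing system described here. The function name, the default ceiling, and the look-ahead length are all illustrative.

```python
from collections import deque


def look_ahead_limit(samples, ceiling=1.0, lookahead=32):
    """Limit `samples` so that no output sample exceeds `ceiling` in magnitude.

    The output is delayed by `lookahead - 1` samples relative to the input,
    and has the same length as the input (the tail is not flushed).
    """
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1 sample")
    delay = deque([0.0] * (lookahead - 1))                 # audio delay line
    required = deque([1.0] * lookahead, maxlen=lookahead)  # gain each sample needs
    held = deque([1.0] * lookahead, maxlen=lookahead)      # lowest gain in the window
    out = []
    for x in samples:
        # Gain that would bring this sample exactly to the ceiling (never above 1).
        magnitude = abs(x)
        required.append(1.0 if magnitude <= ceiling else ceiling / magnitude)
        # Hold the lowest required gain so an approaching peak is anticipated.
        held.append(min(required))
        # Average the held gain so the reduction arrives as a ramp, not a step.
        gain = sum(held) / lookahead
        # Release the delayed audio at the moment its gain value is ready.
        delay.append(x)
        out.append(delay.popleft() * gain)
    return out


# Illustrative signal: a quiet passage with a short burst above full scale.
burst = [0.1] * 100 + [2.0, -1.8, 1.5] + [0.1] * 100
limited = look_ahead_limit(burst, ceiling=1.0, lookahead=16)
```

Because every gain value averaged for a given output sample is no larger than the gain that sample itself requires, the peak lands at or below the ceiling without being truncated. At a 48kHz sampling rate (an assumed figure), a 32 sample look-ahead costs well under one millisecond of latency.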

This is not a new concept, as this method has been utilized before in other applications. Generally, it has been employed in a wideband mode which can exaggerate IMD products. That’s one of the main reasons why it was never popular in conventional broadcasting, along with the latency issue.

New research has revealed some fresh methods to implement look-ahead limiting so that IMD can be minimized, or suppressed completely. The sonic result is absolute peak control that yields a very high degree of fidelity when peak control is performed.

Prediction Analysis
In addition to look-ahead limiting, another new processing function that will aid the operation of the ensuing data encoder is a method known as Prediction Analysis. Its operation, much like the aforementioned limiter, will analyze signal information based upon peak level and frequency content as it relates to the coding process. The resulting analyzed information is either added or subtracted from the control signal of the final limiter based upon the prediction model that is designed around coding algorithms. The prediction model takes into consideration certain frequency and dynamics conditions that can agitate the encoder and generate codec artifacts. With Prediction Analysis, the limiter is able to allow the following encoder to operate more efficiently as it reduces coding artifacts.
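The prediction model itself is not detailed in this paper, so the following is only an invented stand-in that shows the general shape of the idea: an analysis of frequency content that biases the control of the final limiter. The high frequency measure, the 3 dB bias range, and all function and parameter names are assumptions made for illustration. This sketch only ever lowers the threshold, whereas the method described above can adjust the control signal in either direction.

```python
def high_frequency_ratio(block):
    """Crude high frequency measure: first-difference energy over total energy.

    Near 0 for slowly varying audio, approaching 4 for sample-to-sample alternation.
    """
    total = sum(s * s for s in block)
    if total == 0.0:
        return 0.0
    diff = sum((b - a) ** 2 for a, b in zip(block, block[1:]))
    return diff / total


def biased_threshold_db(block, base_threshold_db=0.0, max_bias_db=3.0, hf_knee=1.0):
    """Lower the limiter threshold when a block is rich in high frequencies.

    All parameters are hypothetical tuning values, not figures from the paper.
    """
    ratio = high_frequency_ratio(block)
    bias = max_bias_db * min(1.0, ratio / hf_knee)
    return base_threshold_db - bias


# Illustrative blocks: one with no high frequency content, one that is all of it.
steady = [0.5] * 8
bright = [0.5, -0.5] * 4
steady_threshold = biased_threshold_db(steady)
bright_threshold = biased_threshold_db(bright)
```

In this toy version the bright block is limited 3 dB harder than the steady one, so less of the content that is assumed to agitate the encoder reaches it.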

Multiband Dynamics Control
When lower bitrates are used, intelligibility and overall quality are a problem. Some coding algorithms provide audio quality that has been described as sounding like a ‘bad cassette’ recording. Voice is muffled, music sounds thin and lifeless. Inserting a graphic equalizer is not the answer, as it will provide inconsistent adjustment on a source-to-source basis.

A multiband dynamics control section is the answer for these situations. It provides three key functions:

  • It can be set up and adjusted for source-to-source consistency in sound.

  • The action of the frequency bands can be optimized to enhance voice intelligibility.

  • The upper bands can assist the final limiter, and further improve coding efficiency.

The following block diagram, Figure-2, provides an overview of a processing system for digital transmissions.


Figure-2

As with the case of conventional multiband processing, effects EQ can be inserted before the cross-over section. Here is where a gentle boost in the midrange or presence frequencies will assist in enhancing intelligibility in low bitrate systems. In addition, an adjustable cut-off low pass filter can be employed after the final limiter so that audio bandwidth control is provided. Due to the low harmonic content of the look-ahead limiter, the low pass filter will not generate any system overshoots. Using a low pass filter to remove any audio spectrum that will not be encoded further reduces the shrillness associated with low bitrate transmissions whenever too much high frequency content is presented to the encoder. It is desirable to know what the audio bandwidth limits of the coded system are, and then set the processor low pass filter accordingly.
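A heavily simplified sketch of the multiband dynamics section is given below. It uses only two bands, a one-pole crossover, and a basic feed-forward compressor per band; the number of bands, the crossover frequency, the thresholds, and the ratios are all invented for illustration, and a practical processor would use more bands and a more refined crossover and gain control.

```python
import math


def split_two_bands(samples, cutoff_hz, sample_rate_hz):
    """Split audio into low and high bands that sum back to the input."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    low, high, state = [], [], 0.0
    for x in samples:
        state += a * (x - state)  # one-pole low pass
        low.append(state)
        high.append(x - state)    # the remainder is the high band
    return low, high


def compress_band(band, threshold, ratio, release=0.999):
    """Feed-forward compressor with instant attack and exponential release."""
    envelope, out = 0.0, []
    for x in band:
        envelope = max(abs(x), envelope * release)
        if envelope > threshold:
            # Above threshold, the output level rises 1/ratio as fast in dB.
            target = threshold * (envelope / threshold) ** (1.0 / ratio)
            gain = target / envelope
        else:
            gain = 1.0
        out.append(x * gain)
    return out


def two_band_dynamics(samples, sample_rate_hz=48000, cutoff_hz=2000.0):
    """Process each band with its own (hypothetical) settings, then recombine."""
    low, high = split_two_bands(samples, cutoff_hz, sample_rate_hz)
    low = compress_band(low, threshold=0.5, ratio=3.0)
    high = compress_band(high, threshold=0.25, ratio=4.0)
    return [l + h for l, h in zip(low, high)]


# Illustrative input: a sudden loud step, which excites both bands.
processed = two_band_dynamics([0.9] * 200)
```

Because each band has its own threshold and ratio, the high band can be controlled more firmly than the low band, which is the sense in which the upper bands can assist the final limiter.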

It should be easily seen that each of the aforementioned items operates in a much different manner than the conventional FM/AM audio processor. Here is where the system is employed as a ‘tool’ instead of an effects box that’s trying to create the threshold of pain on the dial! Audio processing for digital delivery will improve the overall performance of any system. Each coding algorithm has its own set of artifacts. Processing can be used to minimize or eliminate those artifacts. Additionally, in the new DTV system, it can also be used to write and implement metadata. Should that method cross over into DAB, it can be done there as well. Here’s a case where the use of signal processing is expanded from the conventional model that’s employed today in analog transmission services.

Loudness Wars, And A Final Thought...

Speaking of loudness, will DAB and Netcasters have loudness wars? Chances are some services will be concerned with competitive quality and density when compared to another. The issue will probably never end, as it just migrates onto other services. (Light humor intended!) At least loudness through overmodulation is not possible in the digital mediums! Although audio purists may scoff at the thought of a "Hot-Rockin’ Flame-Throwin’" digital signal, it should ease the mind that processing for DAB does not involve the extreme amounts of hard processing that’s available in FM/AM broadcasting. The end result, even processed, will be superior sound, as the use of preemphasis and clipping will be eliminated.

The new digital transmission mediums create a plethora of opportunity for content providers. Therefore, the demands that these audio signals will put on the chosen coding methods will put them to the test. Hence, the employment of audio processing that suits the medium can only improve the overall end result. In this case, the analogy to the conventional processing system for FM/AM broadcasting holds true.

As DAB, DTV, and Netcasting continue to grow along their respective paths, processing will need to reinvent itself as these new technologies break ground and flourish. The model of ‘yesterday’s ideals’ in processing must be put to rest, as these new mediums offer a larger volume of opportunity, in both the content and technical domains. This demands that innovative research and design be performed today, as we finally bring to life these mediums for the new millennium.

References

[1] Foti, F.: Broadcast Signal Processing and Audio Coding: Are We Trying to Mix Oil with Water?, 100th Convention of the Audio Engineering Society (AES), Copenhagen, 1996, Preprint 4203 (I-8)

[2] Lyman, S.: Distribution Options In The DTV Studio, Dolby Laboratories, Digital Television 98 Conference, Chicago, December 1998

[3] Foti, F.: Omnia.fm: An Engineering Study, White Paper, Cutting Edge, Cleveland, September 1998

[4] Zölzer, U.: Digital Audio Signal Processing, John Wiley & Sons Ltd, Chichester, 1997
