Audio Processing For DAB and the Internet: How Audio Quality and Intelligibility Can Be Improved In the Data Reduced Environment, Using Dynamics Control
By
Frank Foti, Omnia Audio
Cleveland, Ohio (c) 1999
Abstract
Digital Audio Broadcasting (DAB) and the Internet are the latest broadcast media. Their available data bandwidth dictates the degree of quality that is possible. DAB has the potential to deliver ‘near CD’ quality, while the Internet provides quality ranging from ‘good AM’ to ‘near FM’. In each case, some degree of audio data reduction must be applied. Dynamics processing can be employed as a tool to "predict" when audio conditions that degrade signal quality will occur. Using a unique analysis algorithm modeled around the masking curve of a coding application, the processing adjusts the audio signal on a frequency-dependent basis to reduce artifacts. The improvement is especially noticeable when very low bitrate coding is applied.
Audio Processing: "The Tool"
Is audio processing needed for the new transmission media like DAB, DTV, and the Internet? Audio purists will claim that there is no need to create or clone the sound of FM on these new mediums. In a literal sense they are correct! But the reality is that we do, in fact, need audio processing as a tool so that these new mediums can be utilized with sonic efficiency and maximum intelligibility. Depending upon the method or system, one or both of these qualities will benefit from a processing tool.
When most people think of broadcast audio processing, they usually imagine the method used in FM and AM radio, where it’s employed to create a ‘dial presence.’ In those instances it’s a protection device, providing bandwidth control and guarding against overmodulation. In addition, it’s a programming ‘statement’ that creates a sonic signature for the listener. For this presentation, however, we will refer to processing in a different context. Here it will be discussed as a method of improving the operation of the data compression required in these new mediums.
Coding Is Processing Too
When audio coding is discussed, it’s usually thought of in terms of what is actually transpiring: the reduction of audio data, using an algorithm that’s designed to operate within a specified medium or data bandwidth. Viewed a bit more closely, it’s also audio processing! Consider for a moment, in the following nutshell overview, what occurs in this process:
- The audio signal is divided up using a filterbank.
- Analysis of the audio signal is applied to create a masking curve.
- The masker signal is adjusted and moved over the spread of filterbank outputs to select the most dominant signal and remove undesired spectrum.
- The remaining signal is then coded.
Each of these functions is an element of signal processing. In some instances, there are similarities between what transpires in a dynamics-based audio processor and in an audio coder. This discussion will show that dynamics-based processing can operate in tandem with data reduction coding, creating a transmission method that works as a complete coupled system. The benefit of this coupling is better-sounding audio through the coded system.
Quick Codec Review
A very popular technique is the use of audio codecs in transmission systems. These devices use "lossy" data reduction algorithms to compress the bitrate down to a size that will fit within the existing bandwidth of the system. While there are a number of specific algorithms to choose from, the most widely employed have been ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-x, and Dolby AC-2, with AAC now joining them.
The basic operation of a "lossy" data reduction system stems from a technique known as perceptual coding. Simply stated, the principle relies on a masking signal, or masker, that establishes a threshold curve modeled on the human auditory system. Any signal that falls below this threshold curve is masked by the louder signal and is basically discarded. In the digital domain, audio data falling below the threshold curve is discarded by the algorithm, and thus data reduction is accomplished.
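To make the threshold idea concrete, here is a minimal Python sketch. It assumes per-band levels in dB and a deliberately crude masking model. Each band casts a threshold a fixed offset below its own level, and that threshold falls off by a fixed slope per band of distance. The function names, the 10dB offset, the 15dB-per-band slope and the 0dB floor are illustrative assumptions, not the psychoacoustic model of any particular codec. Real coders use the threshold to allocate bits per band; simply dropping a masked band is the extreme case of that allocation.

```python
def masking_threshold(levels_db, offset_db=10.0, slope_db_per_band=15.0, floor_db=0.0):
    """Crude masking curve: every band is a potential masker whose influence
    falls off linearly (in dB) with distance from its own band."""
    thresholds = []
    for i in range(len(levels_db)):
        t = floor_db  # stand-in for the absolute threshold of hearing (assumed)
        for j, masker_db in enumerate(levels_db):
            t = max(t, masker_db - offset_db - slope_db_per_band * abs(i - j))
        thresholds.append(t)
    return thresholds


def keep_audible(levels_db, thresholds):
    """Keep bands at or above the threshold; None marks a discarded (masked) band."""
    return [lvl if lvl >= th else None for lvl, th in zip(levels_db, thresholds)]


levels = [60.0, 30.0, 70.0, 20.0, 10.0, 5.0]   # hypothetical band levels in dB
kept = keep_audible(levels, masking_threshold(levels))
print(kept)  # [60.0, None, 70.0, None, None, None]
```

The weak bands next to the 60dB and 70dB maskers fall under the curve and are discarded. Only the dominant components are left to be coded.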
Detailed operation of the above-mentioned algorithms is not needed for this discussion, as the intent is to focus upon the effects that dynamics processing has upon data-reduced audio. Suffice it to say that each system possesses its own strengths and possible weaknesses for its application. It is not the intent of this discussion to compare audio coding algorithms.
Codec Transmissions, The Caveats
All audio coding methods have strengths and weaknesses. In almost all cases, audio quality degrades as bitrate is reduced, because less data bandwidth is available. However, it should be pointed out that some of the most recent demonstrations of AAC at lower bitrates are quite impressive!
Depending upon the transmission medium (DAB, DTV, or Webcasting), the audio quality of the codec will be determined by two issues: the coding algorithm and the amount of data reduction. For DAB and DTV, bitrates are generally higher, usually 192kbps or greater for stereo. At these rates, the coding process affords wide audio bandwidth (20kHz) and produces few artifacts (near CD quality). But in the case of webcasting, where bitrates of 28kbps might be required, the coding artifacts and bandwidth restrictions are quite severe (AM radio-like). Audio bandwidth is sometimes reduced to 4kHz. At lower bitrates, coding artifacts differ considerably from one data reduction algorithm to another.
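The "amount of data reduction" can be put in numbers by comparing each coded bitrate with the 16-bit, 44.1kHz stereo PCM of a compact disc. The short Python sketch below does that arithmetic. Using CD audio as the reference point is an assumption about what "near CD" is measured against.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BITRATE_BPS = 44_100 * 16 * 2  # 1,411,200 bit/s


def reduction_ratio(coded_kbps):
    """How many times smaller the coded stream is than CD-quality PCM."""
    return CD_BITRATE_BPS / (coded_kbps * 1000)


def bits_per_sample(coded_kbps, channels=2, sample_rate=44_100):
    """Average coded bits available per PCM sample, per channel."""
    return coded_kbps * 1000 / (channels * sample_rate)


for kbps in (192, 28):
    print(f"{kbps} kbps: {reduction_ratio(kbps):.1f}:1, "
          f"{bits_per_sample(kbps):.2f} bits/sample")
```

At 192kbps the coder works at roughly a 7:1 reduction, leaving about 2.2 bits per sample in place of the original 16. At 28kbps the ratio is about 50:1, well under one bit per sample, which is why artifacts and bandwidth limits become severe. Low-rate streams often run mono at reduced sampling rates. The per-sample figure here assumes 44.1kHz stereo purely for comparison.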
Given this diversity, how does processing fit into all of this, especially given the view that transmission system processing for FM Stereo and codec STL systems do not mix that well [1]? This can be answered with two points:
- FM Stereo transmission processors employ hard limiters, or clippers, to achieve absolute peak control. Testing and research have shown that codecs do not perform well when a transmission processor is operated ahead of the codec. The audio quality suffers from added distortion generated by the coding method used in the STL. Precise peak control is lost as overshoots are generated by the lossy data compression of the codec. This is the result of the preemphasis and clipper functions in the processor. Thus, in most applications of a coded STL and processor, the processor needs to be installed after the STL system to avoid the previously stated problems. In essence, the problems stem from two difficulties. First, the harmonic content of the clipper becomes displaced by the coding algorithm. Second, preemphasis (50µs or 75µs) prevents the masking curve of the codec from operating efficiently. Coding algorithms were not designed to operate on emphasized audio, as this increases the audibility of coding artifacts. This is what contributes to lost peak control, and it adds further Total Harmonic Distortion (THD) to the signal.
- While the above comments do not bode well for processing and coding, it must be pointed out that they apply specifically when FM transmission processing is employed with a codec. FM processing is designed to fulfill the specific needs of that technology and augment its performance, and the same must be done for coded systems. Again, using the model of traditional broadcasting, we can apply the same thinking to coded transmission: just as FM and AM require different processing methods, so do coded systems. Here the differences in processing will be determined by audio bandwidth, bitrate, and coding algorithm. Therefore the role of processing in this environment is dictated by the support requirements of the codec system and its attributes, just as it is in conventional analog broadcasting. What follows are some of the issues and features that a processor for coded transmission must recognize.
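The first point above states that preemphasis works against the codec's masking curve. The Python sketch below shows how much high-frequency boost a codec receives from a 50µs or 75µs emphasized feed. It assumes an ideal first-order network, H(jω) = 1 + jωτ, with no upper roll-off pole.

```python
import math


def preemphasis_gain_db(freq_hz, tau_s):
    """Magnitude of an ideal first-order preemphasis network, |1 + j*w*tau|, in dB."""
    w_tau = 2 * math.pi * freq_hz * tau_s
    return 20 * math.log10(math.sqrt(1 + w_tau ** 2))


def corner_frequency_hz(tau_s):
    """Frequency at which the boost reaches +3 dB: f = 1 / (2*pi*tau)."""
    return 1 / (2 * math.pi * tau_s)


for tau in (50e-6, 75e-6):
    print(f"{tau * 1e6:.0f} us: corner {corner_frequency_hz(tau):.0f} Hz, "
          f"+{preemphasis_gain_db(15_000, tau):.1f} dB at 15 kHz")
```

For 75µs the boost at 15kHz is about +17dB, and for 50µs about +14dB. The encoder is therefore presented with far more high-frequency energy than the unemphasized program contains.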
Digital Transmission Landscape
Before specifying the attributes of a processing system for DAB, Internet, or DTV, the landscape for these mediums needs to be defined and understood, as certain aspects differ from the conventional analog methods. These deal with algorithm, bitrate, sampling rate, audio spectral bandwidth, digital full scale, and metadata. Processing for the digital mediums is highly dependent on how each of these issues is dealt with.
Algorithm
Definitely the most debatable item, and quite important, the choice of coding algorithm will play a significant role in the overall sonic performance of the transmission system. In most of the digital mediums, the choice of algorithm has already been made. The issue, then, is to understand the usage of that specific algorithm in order to achieve the best sonic performance. For this discussion, it is not necessary to delve into the aspects of each coding method. What’s important is to know and understand the aspects of the algorithm used in the medium at hand.
Bitrate
Almost as important as the choice of algorithm is the operating bitrate of the system, which is determined by the available data bandwidth. It can range from as low as 24kbps for lower-grade, narrow-range monophonic audio to greater than 384kbps for full-range, CD-like stereo or multichannel sound. As bitrate is reduced, the coding algorithm must operate more aggressively in reducing the amount of data passed through. This affects both the available audio bandwidth and the sonic quality of the audio. Generally speaking, as bitrate is reduced, both bandwidth and audio quality degrade; the amount of degradation depends upon the algorithm employed.
Spectrum
Each algorithm and bitrate will have a direct effect
upon the available amount of audio bandwidth that the system
can transport. As stated above, when bitrate is reduced,
sampling rate and bandwidth follow suit. Therefore, it is
crucial that any processing system be capable of managing
the audio bandwidth, as this will have a direct effect on
sonic performance at lower bitrates. More on this topic is
covered later in this paper.
While on this topic, an item should
be pointed out: preemphasis is not required in these
systems. This alone will provide a major sonic improvement
when compared to analog FM systems. The digital mediums all
utilize a flat spectrum, and this negates the need for
specialized high frequency control methods that operate
around the emphasis networks. Later, however, we will see how
spectrum management through dynamics control will improve an
encoder’s efficiency.
Full Scale
All digital transmission systems have one important element in common: a specified maximum word size. In other words, they have a peak ceiling level that cannot be exceeded! Overmodulation is not possible; instead, any attempt to exceed full scale (0dBfs) produces a nasty-sounding type of distortion. Any signal processor for digital mediums must perform absolute peak control, yet avoid the coding artifacts that would occur if a clipper were employed, as described earlier.
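Because the ceiling is set by word size, it can be computed directly. The Python sketch below assumes signed two's-complement PCM. It shows the largest legal code for a given word length and what happens to a value one step beyond it. Saturating arithmetic flattens the value to the ceiling, which in effect is a clipper. Unprotected fixed-point arithmetic wraps it to the opposite polarity. Either outcome is the distortion described above, which is why peak control must happen before the signal reaches the word-length limit.

```python
def max_code(bits):
    """Largest positive value of a signed two's-complement word (0dBfs)."""
    return 2 ** (bits - 1) - 1


def saturate(value, bits=16):
    """Clamp an integer to the legal range: the behaviour of a hard clipper."""
    return max(-max_code(bits) - 1, min(max_code(bits), value))


def wrap(value, bits=16):
    """Two's-complement overflow: what unprotected fixed-point maths produces."""
    span = 2 ** bits
    return (value + span // 2) % span - span // 2


overshoot = max_code(16) + 1           # one code past full scale
print(max_code(16), saturate(overshoot), wrap(overshoot))  # 32767 32767 -32768
```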
Metadata
Digital delivery systems usually provide some means of carrying ancillary information about the signal content, or attributes of the content. This is the function of metadata. Already employed in the DTV standard, metadata can carry information about the dynamic content of the signal at the transmission point. That knowledge can then be applied by corresponding functions in the receiver. In this manner, audio processing effects can be implemented in the system, and each individual receiver can use them in whatever manner it chooses. For example, the transmission may contain metadata indicating how much dynamic level compression to employ. The end-user can then allow that compression to be applied to the signal, or bypass it and make use of the wider dynamic range.
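The receiver-side choice described above can be sketched as follows. This is a hypothetical illustration, not the Dolby Digital bitstream or any DAB format. It assumes the transmitter sends one gain word in dB per block of audio, computed by its processor. The listener sets a scale between 0 (full dynamic range, metadata ignored) and 1 (compression applied as transmitted). The function and parameter names are invented for the example.

```python
def apply_gain_metadata(blocks, gain_words_db, listener_scale):
    """Hypothetical receiver: apply per-block gain words scaled by listener choice.

    blocks         -- list of audio blocks (lists of float samples)
    gain_words_db  -- one transmitted gain value in dB per block (assumed format)
    listener_scale -- 0.0 = bypass the compression, 1.0 = apply it in full
    """
    if not 0.0 <= listener_scale <= 1.0:
        raise ValueError("listener_scale must be between 0.0 and 1.0")
    out = []
    for block, gain_db in zip(blocks, gain_words_db):
        g = 10 ** (listener_scale * gain_db / 20)   # dB -> linear gain
        out.append([s * g for s in block])
    return out


loud_then_quiet = [[0.9, -0.8], [0.05, -0.04]]
words = [-6.0, 12.0]   # transmitter: cut the loud block, lift the quiet one
full = apply_gain_metadata(loud_then_quiet, words, 1.0)
bypass = apply_gain_metadata(loud_then_quiet, words, 0.0)
```

With `listener_scale` at 0.0 the audio passes through untouched and keeps its full dynamic range. The same transmitted stream serves both kinds of listener.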
The Dolby Digital system for DTV [2] provides numerous functions that take advantage of metadata. These include a dialog normalization setting for loudness control, dynamic range compression, and emergency notification. Some of the proposed DAB systems provide space for metadata.
From the above definitions and functions, it can be seen why a conventional FM or AM transmission processor is not applicable to the digital mediums. A simple, off-the-shelf compressor/limiter is not the answer either, as these units only provide a generalized form of dynamics control and are usually wideband in nature. These units do not suffice in conventional analog transmission applications because they lack functions specific to the medium, and for the same reason they fall short in digital systems, too.
Processing For Digital Delivery
An audio processor for digital transmission requires three specific items, and a possible fourth should be considered:
- Precise peak control so that 0dBfs is maintained without system distortion.
- A peak control method that does not exaggerate coding artifacts. (This also holds true for any dynamics control and equalization methods.)
- Audio frequency response that is controlled within the bandwidth limits of the system.
- In low bitrate systems, processing that enhances intelligibility.
The following sections describe
methods and means of accomplishing the above requirements.
These are based upon a digital processing system that was designed specifically for digital transmission mediums. What’s presented here is an overview of the implemented functions, as this makes the concepts involved easier to understand.
Absolute Peak Control
The simplest form of peak control is the hard limiter, or clipper, as it’s often called. In the coded digital environment, the use of a clipper can cause three sonic hardships:
- harmonic distortion from the truncation action of the clipper;
- exaggerated coding artifacts from the data reduction algorithm;
- clipper-induced aliasing distortion, aka digital grunge, which results when clipper harmonics exceed the Nyquist frequency of the system and fold back into the audio band [3].
Therefore, a better method must be employed, one that controls peak levels with precision and does not generate any of the three aforementioned problems.
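The aliasing mechanism can be demonstrated in a few lines. This Python sketch clips a 9kHz tone sampled at 48kHz; both frequencies are illustrative choices. The clipper's odd harmonics land at 27, 45, 63kHz and beyond, all above the 24kHz Nyquist limit, so they fold back into the audio band. The 5th harmonic (45kHz), for example, reappears at 48 - 45 = 3kHz. That is below the fundamental, where no harmonic of a clean tone could ever appear.

```python
import cmath
import math

FS = 48_000   # sample rate in Hz (illustrative)
F0 = 9_000    # test tone in Hz (illustrative)
N = 96        # DFT length: 500 Hz bins, exactly 18 cycles of the tone


def hard_clip(x, ceiling):
    """A plain clipper: truncate any excursion beyond +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in x]


def bin_magnitude(x, freq_hz):
    """Amplitude of one DFT bin, scaled so a unit-amplitude sine reads about 1.0."""
    k = round(freq_hz * len(x) / FS)
    acc = sum(s * cmath.exp(-2j * math.pi * k * n / len(x)) for n, s in enumerate(x))
    return 2 * abs(acc) / len(x)


tone = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]
clipped = hard_clip(tone, 0.5)

for name, signal in (("clean", tone), ("clipped", clipped)):
    print(f"{name:8s} energy at 3 kHz: {bin_magnitude(signal, 3_000):.4f}")
```

The clean tone shows essentially nothing at 3kHz. The clipped version shows a clear alias component, a few percent of full scale in this case. Oversampling before clipping reduces this folding, but it does not remove the other two hardships listed above.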
Look-Ahead Limiting
A Look-Ahead, or Delay-Line, limiter is well suited to this operation. Basically, this limiter creates a gain control signal based upon the absolute peak value of the audio signal. While the peak level is being calculated, however, the audio signal is physically delayed by an amount of time equal to the time needed to calculate the peak value. Once the control signal is ready to implement a level adjustment, the audio is sent ahead to the control element at the exact moment that the control word arrives to make the adjustment [4]. In this manner, absolute peaks can be controlled without the need to truncate the excursion, as a clipper would do. Figure-1 is a block diagram of a look-ahead limiter.

Figure-1
This results in little or no harmonic distortion generated by the limiter function. The caveat to this method is that, instead of harmonic distortion, Intermodulation Distortion (IMD) can result, although the amount depends on the design of the look-ahead algorithm. In addition, there will be some latency, as the audio is delayed by a specified amount, up to a few milliseconds. This could be a bit problematic when trying to monitor oneself off-the-air, except that all of the digital mediums already generate some amount of latency. So the use of look-ahead limiting is not a problem here.
This is not a new concept, as this
method has been utilized before in other applications.
Generally, it has been employed in a wideband mode which can
exaggerate IMD products. That’s one of the main reasons why
it was never popular in conventional broadcasting, along with
the latency issue.
New research has revealed some fresh methods of implementing look-ahead limiting so that IMD can be minimized, or suppressed completely. The sonic result is absolute peak control with a very high degree of fidelity.
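The delay-line principle can be sketched in Python as follows. This is a minimal illustration, not the implementation described in this paper. The look-ahead length, the release coefficient and the instant-attack gain rule are assumptions chosen for clarity. The gain each incoming sample requires is computed immediately, while the audio itself waits in a delay line. By the time a sample reaches the output, the gain has already fallen far enough to hold it at the ceiling, so no waveform is truncated.

```python
from collections import deque


def lookahead_limit(x, ceiling=1.0, lookahead=64, release=0.999):
    """Minimal look-ahead (delay-line) peak limiter.

    The output is delayed by `lookahead` samples (the latency cost discussed
    above). Append `lookahead` zeros to the input to flush the tail.
    """
    delay = deque([0.0] * lookahead)            # audio waiting to be released
    needed = deque([1.0] * (lookahead + 1))     # gain each in-flight sample requires
    gain = 1.0
    out = []
    for s in x:
        delay.append(s)
        needed.append(min(1.0, ceiling / abs(s)) if s else 1.0)
        needed.popleft()
        delayed = delay.popleft()               # the sample from `lookahead` steps ago
        gain = 1.0 - (1.0 - gain) * release     # release: recover slowly toward unity
        gain = min(gain, min(needed))           # attack: satisfy every sample in flight
        out.append(delayed * gain)
    return out


signal = [0.2] * 20 + [1.0] + [0.2] * 20
limited = lookahead_limit(signal, ceiling=0.5, lookahead=8)
```

The gain is the minimum over every sample still in the delay line. The 1.0 peak therefore leaves at exactly 0.5, and the gain drops before the peak arrives rather than slicing off its top. In this sketch that drop is still a single abrupt step. A practical design would shape the attack across the look-ahead window, which is where the IMD considerations raised above come into play.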
Prediction Analysis
In addition to look-ahead limiting, another new processing function that will aid the operation of the ensuing data encoder is a method known as Prediction Analysis. Much like the aforementioned limiter, it analyzes signal information based upon peak level and frequency content as it relates to the coding process. The resulting information is either added to or subtracted from the control signal of the final limiter, based upon a prediction model designed around coding algorithms. The prediction model takes into consideration certain frequency and dynamics conditions that can agitate the encoder and generate codec artifacts. With Prediction Analysis, the limiter allows the following encoder to operate more efficiently, reducing coding artifacts.
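The paper does not publish its prediction model. The following is therefore only a hypothetical stand-in showing where such an analysis plugs in: an extra term combined with the final limiter's control signal. It assumes per-band levels and per-band "encoder stress" limits in dB, both invented for illustration, and converts any excess into additional gain reduction. This version only ever adds reduction, so it cannot undo the limiter's peak control. A model that also subtracts, as described above, would need its own bound to keep peaks under 0dBfs.

```python
def predicted_offset_db(band_levels_db, stress_limits_db, weight=0.5):
    """Hypothetical prediction term: extra reduction (<= 0 dB) when any band
    exceeds the level at which the encoder is assumed to produce artifacts."""
    excess = max(lvl - lim for lvl, lim in zip(band_levels_db, stress_limits_db))
    return -weight * max(0.0, excess)


def combined_control_db(limiter_gain_db, band_levels_db, stress_limits_db):
    """Final limiter gain (dB, <= 0) plus the hypothetical prediction term."""
    return limiter_gain_db + predicted_offset_db(band_levels_db, stress_limits_db)


# The limiter alone wants -3 dB; the upper band is 4 dB over its assumed limit
# (the lower band 2 dB), so the worst case adds 0.5 * 4 = 2 dB of reduction.
print(combined_control_db(-3.0, [-10.0, -2.0], [-12.0, -6.0]))  # -5.0
```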
Multiband Dynamics Control
When lower bitrates are used, intelligibility and overall quality are a problem. Some coding algorithms provide audio quality that has been described as sounding like a ‘bad cassette’ recording: voice is muffled, and music sounds thin and lifeless. Inserting a graphic equalizer is not the answer, as its fixed correction produces inconsistent results from source to source.
A multiband dynamics control section is the answer for these situations. It provides three key functions:
- It can be set up and adjusted for consistent sound from source to source.
- The action of the frequency bands can be optimized to enhance voice intelligibility.
- The upper bands can assist the final limiter and further improve coding efficiency.
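Here is a bare-bones illustration of the band-splitting idea. It is a sketch under stated assumptions, not the processor described in this paper. The three-band split is built from first-order low-pass filters and arranged so the bands sum back to the input. One static gain per band pulls each band toward a common target level, within a 4:1 (about 12dB) range. The crossover frequencies, target and limits are hypothetical. A real multiband section adapts its gains continuously, with attack and release times.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for s in x:
        state += a * (s - state)
        y.append(state)
    return y


def split_three_bands(x, fs, low_cut=300.0, high_cut=3000.0):
    """Complementary split: low + mid + high reconstructs x (to rounding error)."""
    low = one_pole_lowpass(x, low_cut, fs)
    rest = [s - lo for s, lo in zip(x, low)]        # everything above low_cut
    mid = one_pole_lowpass(rest, high_cut, fs)      # ...then below high_cut
    high = [r - m for r, m in zip(rest, mid)]       # what remains on top
    return low, mid, high


def rms(band):
    return math.sqrt(sum(s * s for s in band) / len(band))


def band_gains(bands, target_rms, max_gain=4.0):
    """One static gain per band, pulling its RMS toward a common target,
    limited to the range 1/max_gain .. max_gain (about +/-12 dB for 4.0)."""
    gains = []
    for band in bands:
        level = rms(band)
        g = target_rms / level if level > 0 else 1.0
        gains.append(max(1.0 / max_gain, min(max_gain, g)))
    return gains


def multiband_process(x, fs, target_rms=0.1):
    """Split, apply per-band gains, and recombine."""
    bands = split_three_bands(x, fs)
    gains = band_gains(bands, target_rms)
    return [sum(g * band[n] for g, band in zip(gains, bands)) for n in range(len(x))]
```

Because each band's gain depends on that band's own level, a dull source and a bright source are both pulled toward the same spectral balance. A fixed equalizer cannot achieve that.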
The following block diagram, Figure-2,
provides an overview of a processing system for digital
transmissions.

Figure-2
As with conventional multiband processing, effects EQ can be inserted before the crossover section. Here a gentle boost in the midrange or presence frequencies will help enhance intelligibility in low bitrate systems. In addition, a low pass filter with an adjustable cut-off can be employed after the final limiter to provide audio bandwidth control. Due to the low harmonic content of the look-ahead limiter, the low pass filter will not generate any system overshoots. Whenever too much high-frequency content is presented to the encoder, low bitrate transmissions sound shrill. Using a low pass filter to remove any audio spectrum that will not be encoded further reduces this shrillness. It is desirable to know the audio bandwidth limits of the coded system, and then set the processor low pass filter accordingly.
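The bandwidth-control filter can be as simple as a second-order low-pass placed after the limiter. This sketch uses the well-known RBJ "Audio EQ Cookbook" low-pass biquad with a Butterworth Q of 0.707. The 4kHz cutoff matches the narrowest webcast bandwidth mentioned earlier; the 22.05kHz sample rate is an assumption. A single biquad rolls off at only 12dB per octave, so a production design would likely use a steeper filter.

```python
import cmath
import math


def lowpass_biquad(cutoff_hz, fs, q=1 / math.sqrt(2)):
    """RBJ Audio EQ Cookbook low-pass, normalised so a[0] == 1. Returns (b, a)."""
    w0 = 2 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 - cos_w0) / 2 / a0, (1 - cos_w0) / a0, (1 - cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def magnitude(b, a, freq_hz, fs):
    """|H(e^jw)| of the biquad at freq_hz."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


def biquad_filter(b, a, x):
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y


FS = 22_050
b, a = lowpass_biquad(4_000, FS)
for f in (100, 4_000, 10_000):
    print(f"{f:>6} Hz: {20 * math.log10(magnitude(b, a, f, FS)):6.1f} dB")
```

As the text recommends, the cutoff should be set to the coded system's actual audio bandwidth. Then no energy is presented to the encoder in spectrum it would discard anyway.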
It should be easy to see that each of the aforementioned items operates in a very different manner from the conventional FM/AM audio processor. Here the system is employed as a ‘tool’ instead of an effects box that’s trying to create the threshold of pain on the dial! Audio processing for digital delivery will improve the overall performance of any system. Each coding algorithm has its own set of artifacts, and processing can be used to minimize or eliminate them. Additionally, in the new DTV system, processing can also be used to write and implement metadata. Should that method cross over into DAB, it can be done there as well. Here’s a case where the use of signal processing is expanded beyond the conventional model employed today in analog transmission services.
Loudness Wars, And A Final Thought...
Speaking of loudness, will DAB broadcasters and Netcasters have loudness wars? Chances are some services will be concerned with competitive quality and density compared to others. The issue will probably never end; it just migrates onto other services. (Light humor intended!) At least loudness through overmodulation is not possible in the digital mediums! Audio purists may scoff at the thought of a "Hot-Rockin’ Flame-Throwin’" digital signal. It should ease their minds, though, that processing for DAB does not involve the extreme amounts of hard processing used in FM/AM broadcasting. The end result, even processed, will be superior sound, as preemphasis and clipping are eliminated.
The new digital transmission mediums create a plethora of opportunity for content providers. The demands that their audio signals place on the chosen coding methods will put those methods to the test. Hence, employing audio processing that suits the medium can only improve the overall end result. In this case, the analogy to the conventional processing system for FM/AM broadcasting holds true.
As DAB, DTV, and Netcasting continue to grow along their respective paths, processing will need to reinvent itself as these new technologies break ground and flourish. The model of ‘yesterday’s ideals’ in processing must be put to rest, as these new mediums offer a larger volume of opportunity in both the content and technical domains. This demands that innovative research and design be performed today, as we finally bring these mediums to life for the new millennium.
References
[1] Foti, F.: Broadcast Signal Processing and Audio Coding: Are We Trying to Mix Oil with Water?, 100th Convention of the Audio Engineering Society (AES), Copenhagen, 1996, Preprint 4203 (I-8)
[2] Lyman, S.: Distribution Options In The DTV Studio, Dolby Laboratories, Digital Television 98 Conference, Chicago, December 1998
[3] Foti, F.: Omnia.fm: An Engineering Study, White Paper, Cutting Edge, Cleveland, September 1998
[4] Zölzer, U.: Digital Audio Signal Processing, John Wiley & Sons Ltd, Chichester, 1997