Broadcast Signal Processing and Audio Coding: Are We Trying to
Mix Oil with Water?
Frank Foti
Cutting Edge Technologies
Cleveland, Ohio, USA
The broadcast industry now has a plethora of coded audio
systems that utilize some form of data reduction and/or
compression technique. While this benefits data storage and
point-to-point transmission capability, it is a challenge for
the dynamic range signal processor used in broadcasting.
With audio coding located in the broadcast transmission path,
as in the case of a digital Studio to Transmitter Link (STL),
there are a number of considerations and compromises a
broadcaster must face. Because the non-linear properties of
both an audio codec and a broadcast audio processor are being
applied, there can be audible on-air penalties to be paid.
Discussion will show where these non-linear applications can
form an unhappy relationship!
Further discussion will reveal the known and possibly unknown
pitfalls of audio coding and dynamic signal processing when
used together. Since the coded audio signal is now commonplace
in the broadcast facility of today, we must examine and
possibly rethink the approach of signal processing as used by
the broadcaster. With Digital Audio Broadcasting (DAB)
becoming a reality, how will the industry cope with dynamic
signal control, and coded audio, now and in the future?
Background
Signal processing for broadcast transmission can be thought of
as a combination of engineering, art, and science. It is
through a mixture of these three ingredients that all
transmission processors are derived. Please understand that
while every system will meet the same form of technical
parameter specifications, each is designed and developed using
different forms of engineering art. The following sections
explore the ingredients of this mixture.
Engineering
The engineering aspect is practically a given. The use of
adequately designed transmission system processors ensures
that all modulation control, signal parameter standards, and
government limitations are adhered to. These address the
maximum modulation level, which corresponds to either maximum
carrier amplitude or frequency deviation, as well as
guidelines for pre-emphasis standards and bandwidth
limitations.
Most transmission processors are multiband devices. They
divide the audio signal into a preset number of frequency
bands, then perform processing functions such as automatic
gain control (AGC), equalization (EQ), dynamic limiting, and
hard limiting, or clipping. Within the system are sections
designed to provide the required pre-emphasis and bandwidth
limiting for the respective transmission medium. Some
processors also offer the stereo generator/coder function
wherever multiplex FM Stereo is used.
The job of the processor is not only to prevent
overmodulation of the transmitter and ensure the proper audio
bandwidth. It is also used as a tool to generate a specific
aural presentation of the programme. This occurs because the
multiband structure operates as a form of automatic
equalization, which yields a consistent spectral balance from
programme source to programme source.
Art
Art covers the abstract and expressionistic perspective.
In most broadcast applications, this might be viewed as the
most important component. Here is where the aural signature is
created, known to those in the broadcast community as
"the sound", or "the sound of the station." It is created less
by engineering means than by the adjustment and manipulation
of the signal processing system, and is almost considered an
artform or "black magic." Although each transmission processor
is capable of achieving the same engineering specifications,
it is in the adjustment and operation of these devices where
the sonic differences occur.
It has become the normal state of operation in the USA, and
now the world over, that a station's sound is judged on two
key factors: quality and perceived loudness. Quality refers to
the purity of the transmitted audio signal, that is, how
closely the transmission mirrors the sound of the originating
programme. Perceived loudness is more of a competitive term.
It is the subjective opinion of how much louder one broadcast
station sounds when compared to another.
There has been, and continues to be, a belief in this industry
that perceived loudness is desirable. Many broadcast
programmers feel that the listener will perceive a louder
station to be more attractive when tuning across the dial.
While this issue has been discussed and debated within the
broadcast industry for many years, it remains a crucial
requirement of any transmission processing system to generate
a high degree of loudness.
There is a compromise between the two. The more perceived
loudness is desired, the more quality will degenerate. This
happens because of the processing tradeoffs involved. As more
processing is employed to reduce the peak to average ratio of
the programme material, perceived loudness increases.
Meanwhile, dynamic range decreases, and harmonic distortion
(THD), intermodulation distortion (IMD), or a combination of
both will increase. Hence the reduction in quality.
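The tradeoff can be illustrated numerically. What follows is
only a minimal sketch, not a model of any actual transmission
processor: it hard-clips a two-tone test signal, rescales it to
the same peak level, and compares the peak to average ratio
(taking RMS as the average measure). The test tones, the clip
point, and all function names are assumptions chosen for
illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def crest_factor_db(samples):
    """Peak-to-average (peak-to-RMS) ratio in dB."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak / rms(samples))

def hard_clip(samples, threshold):
    """Simple clipper: limit every sample to +/- threshold."""
    return [max(-threshold, min(threshold, s)) for s in samples]

def normalize(samples):
    """Scale so the peak sits at 1.0 (full modulation in this toy model)."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]

# Hypothetical test signal: two tones, one second at 48 kHz.
RATE = 48000
signal = normalize([
    math.sin(2 * math.pi * 400 * n / RATE)
    + 0.5 * math.sin(2 * math.pi * 1000 * n / RATE)
    for n in range(RATE)
])

# Assumed clip point: half of the peak level, then back to full scale.
clipped = normalize(hard_clip(signal, 0.5))

print("crest factor before: %.2f dB" % crest_factor_db(signal))
print("crest factor after:  %.2f dB" % crest_factor_db(clipped))
print("RMS gain at equal peak: %.2f dB"
      % (20.0 * math.log10(rms(clipped) / rms(signal))))
```

At the same peak level the clipped signal carries a higher RMS
level, which is the loudness gain. The flattened waveform tops
are the source of the added distortion products.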
While this paper is not an essay on the artform of broadcast
signal processing, it is the humble opinion of this author,
that a pleasing compromise does exist that will produce a high
level of perceived loudness while maintaining an equal degree
of quality!
Science
Another key ingredient is science. In order for any
processing system to accomplish the two previously discussed
aims, there are a number of scientific topics that must be
dealt with. One of these is peak level overshoot. Since
stringent peak level modulation limits exist in most
countries, the processing system is required to hold all
overshoot components of a signal to no more than 2% of the
maximum permissible level. This is easier said than done!
Another important requirement is bandwidth. Depending on the
broadcast medium, AM or FM, the audio bandwidth can be as
narrow as 5kHz for short wave, and as broad as 15kHz for FM
stereo. Again, most countries have bandwidth limitations that
must be adhered to. While these issues were mentioned in the
engineering section above, the scientific conditions behind
them deserve mention for a better understanding of the total
transmission system.
Overshoots and bandwidth, while two separate issues, are
actually a by-product of one another. As discussion will soon
reveal, a non-linear waveform must be able to pass through a
specified bandwidth filter, and the result must be devoid of
overshoots and out of band products. An analogy to this is the
old parable of trying to insert a square peg into a round
hole!
Most processing systems employ some form of hard limiter for
final peak level control. Whether as a simple clipper, or as
some form of integrated limiter with associated distortion
control, it maintains the peak ceiling level. Unfortunately,
this non-linear process produces harmonic products that
fall outside the permissible passband. In order to control or
remove these products, some form of filtering must be
employed. Since in most cases the bandwidth limitations
require severe attenuation of out of band products, a filter
with a steep cutoff slope must be utilized. A filter with an
elliptical response is sufficient for this.
Here is where the paradox begins. All such filters produce
overshoot. Known as the Gibbs effect[1], the overshoot
can be calculated to begin at one-third of the cutoff
frequency. This is the exact point in the filter passband
above which the third harmonic of a non-linear signal is
removed by the filter. This can be demonstrated by passing a
square wave through a low pass filter. For signals of a
sinusoidal nature there is no problem, as each frequency
contains little, if any, harmonic content. Whenever final
limiting is employed by the processor, the non-linear process
produces waveforms that are square waves or some derivative
thereof. Here is where the Gibbs effect comes into play.
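The one-third point can be checked with simple arithmetic. The
sketch below assumes a symmetrically clipped tone, which
contains odd harmonics only, and an idealized brick-wall low
pass filter. A real elliptical filter has a finite transition
band, so this is an approximation. The function name and the
test frequencies are illustrative assumptions.

```python
def surviving_harmonics(fundamental_hz, cutoff_hz, max_order=9):
    """Odd harmonics of a symmetrically clipped tone that fall below
    the cutoff of an ideal (brick-wall) low pass filter.

    A harmonic landing exactly on the cutoff is treated as removed,
    following the text's "equal to or greater than one-third" wording.
    """
    odd_orders = range(1, max_order + 1, 2)
    return [k for k in odd_orders if k * fundamental_hz < cutoff_hz]

CUTOFF_HZ = 15000  # FM stereo audio bandwidth quoted in the text

for tone_hz in (1000, 4000, 5000, 6000):
    print(tone_hz, "Hz ->", surviving_harmonics(tone_hz, CUTOFF_HZ))
# 1000 Hz -> [1, 3, 5, 7, 9]
# 4000 Hz -> [1, 3]
# 5000 Hz -> [1]
# 6000 Hz -> [1]
```

With a 15kHz cutoff, any clipped tone at or above 5kHz leaves
the filter with only its fundamental, so its shape changes
most.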
Adding to this equation is the effect of the amount of
overshoot from the filter employed. An elliptical filter by
its nature possesses a large amount of non-linear group
delay. Overshoots of as much as 70.7% can be realized in a
filter with non-linear group delay. For overshoots to be
minimal, a corresponding group delay equalizer must be used.
Gibbs effect research indicates that any low pass filter
with a uniform group delay will produce an overshoot
component, for a non-linear waveform, that will not exceed
1.09 times the original amplitude, or 9% overshoot. Since the
permissible system overshoot stated earlier is no more than 2%
of the maximum limit, there is still a 7% disparity in the
overshoot component. If the system filters are not group delay
compensated, the Gibbs overshoots can reach 1.707 times the
original amplitude, or 70.7% overshoot!
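The 9% figure can be reproduced with a short calculation. The
sketch below sums the Fourier series of a square wave that
switches between 0 and 1 and keeps only a limited number of
harmonics. That is equivalent to an ideal low pass filter with
uniform group delay. The overshoot is measured relative to the
step height, which is an assumption about how the 1.09 figure
is defined. The sketch does not attempt to reproduce the 70.7%
case for non-linear group delay. Function names and the grid
size are illustrative choices.

```python
import math

def step_partial_sum(x, n_harmonics):
    """Fourier partial sum of a 0-to-1 square wave (step height 1),
    keeping the first n_harmonics odd harmonics."""
    total = 0.5
    for k in range(1, n_harmonics + 1):
        m = 2 * k - 1  # odd harmonic number
        total += (2.0 / math.pi) * math.sin(m * x) / m
    return total

def peak_value(n_harmonics, points=4000):
    """Largest value of the partial sum over half a period (0..pi)."""
    return max(step_partial_sum(math.pi * i / points, n_harmonics)
               for i in range(points + 1))

for n in (5, 20, 50):
    peak = peak_value(n)
    print("%3d harmonics: peak %.4f, overshoot %.1f%% of step"
          % (n, peak, 100.0 * (peak - 1.0)))
```

Adding more harmonics narrows the overshoot but does not
reduce its height, which stays close to 9% of the step.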
Some systems deal with this disparity by adding additional
clippers and subsequent filters to control any residual
overshoot. Practice has indicated that while this process does
prove reliable, there are audible penalties to be paid as the
additional clipping can be audible under normal operating
conditions.
Others utilize proprietary engineering art to scientifically
and mathematically eliminate system overshoot disparity. This
approach actually takes advantage of the Gibbs overshoot
component, using it for both overshoot control and the
reduction of final limiting induced distortion products. It is
again the humble opinion of this author, as a designer, that
the latter method is preferable.
Taking all of the above discussed scientific issues into
account, the end result of any processing system is a precise
signal that adheres to exacting amplitude control, audio
bandwidth with minimal or no overshoots. While lengthy in
nature, this background information will be used as reference
material during the discussion of the insertion of the audio
codec into the transmission path.
Analog Studio Transmitter Link (STL)
The analog Studio Transmitter Link (STL) is a time tested
and proven method for connecting the audio signal of a
broadcast studio to the transmitter. This system can be
discussed in two forms, discrete or composite channel.
The discrete channel makes use of independent STL
frequencies, or links, for transmitting multiple channels of
audio. In a stereo system, the left and right channels would
use independent discrete STL channels. Since stereo broadcast
transmission requires at least 53kHz of bandwidth, and
discrete systems provide only 15kHz audio bandwidth, this
usually requires that the entire processing be located at the
transmitting location. It is possible, however, to locate the
processing for the discrete channels at the studio and employ
the stereo generator/coder at the transmitting location. For
the latter to operate properly, the discrete STL channels must
be able to provide zero overshoot modulation capability as
well as provide zero group delay within each 15kHz passband.
Figure-1 is an example of a discrete STL system.
Figure-1, Discrete Analog STL System
Composite STL systems are capable of passing the 53kHz
stereophonic multiplex signal. An advantage of this system is
that it uses only one STL frequency or link, and the
processing system can be located at the studio location. Of
importance here are the linearity aspects of the composite
modulator and demodulator of the system. As long as group
delay is minimal through the 53kHz passband, precise
modulation of the composite signal is possible through this
type of STL system. The disadvantage here is that over a long
path, any signal degradation will result in added noise in the
multiplexed signal when inserted into the FM
exciter/transmitter. The composite STL is illustrated in
Figure-2.

Figure-2, Composite Analog STL System
It is therefore desirable for any transmission system to
maximize the performance of the STL system, as this will
produce low noise and distortion while optimizing modulation
ability.
Digital STL
Over the last ten years, the digital STL system has emerged.
With it, the possibility exists for a totally digital
broadcasting facility. This would include a digital mixing
console, signal routing, processing, and STL, completed by a
digital modulator for the transmitter. With Digital Audio
Broadcasting (DAB) having begun in calendar year 1995, this
possibility has in fact become a reality!
The existing analog broadcasting systems are all taking
advantage of the digital STL systems, as they too have started
to replace older analog equipment with newer digital
counterparts.
Enter the Codec
For an STL to transport a digitized high quality stereo
audio signal in a linear real time format, it would require a
capacity of about 700 kbps for each audio channel[2]. This
would consume more than the allowable bandwidth of existing RF
microwave STL systems. It is possible through the use of a
dedicated digital communications link such as a T-1 service.
Unfortunately, link availability and cost have made this
option more difficult to use. As in the analog STL domain, the
cost of bandwidth for either analog or digital service still
remains quite high.
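The 700 kbps figure follows from simple arithmetic. The sketch
below assumes 16-bit linear PCM sampled at 44.1kHz. The text
itself gives only the approximate total, so the sample rate and
word length are assumptions, as is the function name.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed linear PCM, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# Assumed format: 44.1 kHz sampling, 16 bits per sample.
mono_kbps = pcm_bitrate_kbps(44100, 16)
stereo_kbps = pcm_bitrate_kbps(44100, 16, channels=2)

# Nominal T-1 line rate; the usable payload is slightly lower.
T1_KBPS = 1544.0

print(mono_kbps)               # 705.6
print(stereo_kbps)             # 1411.2
print(stereo_kbps <= T1_KBPS)  # True
```

Under these assumptions a linear stereo pair needs roughly 1.4
Mbit/s, which a T-1 service can carry.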
A very popular technique is the use of the audio codec
to reduce the data requirement for a digital STL system. These
devices make use of "lossy" data reduction
algorithms to compress the bitrate down to a size that will
fit within the existing bandwidth of an STL system. While there
are a number of specific algorithms to choose from, most STL
manufacturers have made use of proprietary digital formats that
are derivatives of prior development. The most commonly used
have been ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-x,
and Dolby AC-2.
The basic operation of the "lossy" data reduction
system stems from a technique known as perceptual coding.
Simply stated, its basic principle is that a masking signal,
or masker, establishes a threshold curve that follows the
behavior of the human auditory system. Any signal which falls
below this threshold curve is basically discarded. In the
digital domain, any audio data that falls below the threshold
curve is discarded by the algorithm, and thus data reduction
is accomplished.
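The discard step can be pictured with a toy example. The
sketch below is not how any of the codecs named above is
implemented. Real coders rely on detailed psychoacoustic
models and bit allocation rather than a simple keep or drop
decision. The threshold shape, the levels, and the function
names are all made up for illustration.

```python
import math

def toy_threshold_db(frequency_hz):
    """Made-up threshold curve: a 1 kHz masker at 80 dB masks
    everything within 20 dB of its level at 1 kHz, falling off by
    30 dB per octave of distance, above a 10 dB floor."""
    octaves_away = abs(math.log2(frequency_hz / 1000.0))
    return max(10.0, 60.0 - 30.0 * octaves_away)

def discard_masked(components, threshold_db):
    """Keep only components at or above the threshold curve.

    components   -- list of (frequency_hz, level_db) pairs
    threshold_db -- function mapping frequency_hz to a level in dB
    """
    return [(f, level) for f, level in components
            if level >= threshold_db(f)]

# Hypothetical spectrum: the masker plus four quieter components.
spectrum = [(1000, 80), (1100, 45), (2000, 35), (4000, 20), (250, 5)]

print(discard_masked(spectrum, toy_threshold_db))
# [(1000, 80), (2000, 35), (4000, 20)]
```

The 1100Hz component sits close to the masker and below the
curve, so it is dropped. Fewer components to encode means
fewer bits.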
Detailed operation of the above mentioned algorithms is not
needed for this discussion, as the intent is to focus upon the
actual effects that data reduction algorithms have upon signal
processed audio. Suffice it to say that each system possesses
many strengths and possible weaknesses for this application.
It is not the intent of this discussion to put forward any
audio coding technique as a standard, or as preferable to
another.
Application & Usage
The coded STL is capable of being inserted into an
existing analog composite STL path. The insertion of an
encoder at the STL transmitter point will accept discrete left
& right audio. The receiver outputs a data signal to a
comparable decoder to reconstruct the individual channels.
Sampling rate and transmission speed (bitrate) are all
configured for each specific system.
Pre-emphasis
All broadcast applications make use of some form of
pre-emphasis boost. For FM broadcasting, North American
countries utilize a 75us emphasis, whereas 50us is used
elsewhere around the world. Medium-wave, or AM, makes use of
an optional modified 75us emphasis.
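To give a sense of scale, the time constants can be converted
into a corner frequency and a high frequency boost. The sketch
below assumes an ideal first-order pre-emphasis curve with no
upper shelf, which real networks only approximate. The
function names are illustrative.

```python
import math

def corner_frequency_hz(tau_seconds):
    """Corner (+3 dB) frequency of a first-order pre-emphasis curve."""
    return 1.0 / (2.0 * math.pi * tau_seconds)

def boost_db(frequency_hz, tau_seconds):
    """Gain of an ideal first-order pre-emphasis curve at one frequency."""
    w_tau = 2.0 * math.pi * frequency_hz * tau_seconds
    return 10.0 * math.log10(1.0 + w_tau * w_tau)

for tau_us in (75, 50):
    tau = tau_us * 1e-6
    print("%d us: corner %.0f Hz, boost at 15 kHz %.1f dB"
          % (tau_us, corner_frequency_hz(tau), boost_db(15000, tau)))
# 75 us: corner 2122 Hz, boost at 15 kHz 17.1 dB
# 50 us: corner 3183 Hz, boost at 15 kHz 13.7 dB
```

A boost of this size at the top of the audio band is why high
frequency energy must be controlled so that the emphasized
signal still fits within the modulation limits.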
Transmission signal processors employ pre-emphasis within
their system architecture. Since emphasized audio must also
fit within the imposed modulation limits, the processor
employs specialized high frequency control sections that
provide both the emphasized boost and control of the high
frequency energy. In this manner, high levels of modulation
are obtained efficiently, as the processor is designed and set
to limit any tradeoffs due to the pre-emphasis and high
frequency limiting requirements. Basically, these two sections
work in concert with one another to allow pre-emphasis to be
employed, and yet control the emphasized energy content.
One of the critical requirements of any codec is that the
audio signal must be "flat" in spectral origin. The
term "flat" is used here in the context of a signal
where no additional EQ has been added to the original
component. This is due to the operation of the masker signal
used in the coding process. Any significant change in, or
imbalance of, the frequency spectrum can cause the threshold
curve of the coding system to have a profound effect on the
output of the coded audio[3].
Now, think back to the earlier description of the
transmission processor. It is a multiband processing system
that provides constant leveling and equalization of the
signal, along with pre-emphasis. This would seem to be in
stark contrast to what a coding device is looking for!
While no formal, substantiated testing has been performed,
casual listening tests have revealed that passing a highly
processed signal through a codec exposes many sonic
weaknesses. These can take the form of losses in clarity,
detail, and stereo field localization, along with an increase
in audible distortion.
Unless there is careful thought and planning in the design of
the digital STL path, here is where trouble can begin!
Digital Transmission Path Configurations
Previous discussion would lead one to believe that the
only possibility to configure a digital path would be to
install all processing at the final transmission point and
utilize the digital path as a "flat" signal pipe to
the processor, Figure-3.

Figure-3, Digital STL System
There are a number of alternatives that will be discussed. Each
will possess advantages and drawbacks.
Probably the most logical form of setup is the recently
mentioned configuration where the processing is located at the
transmitter facility thereby using the STL as a digital pipe.
The advantage is that the audio signal is provided to the
coded STL in a "flat" domain, and the subsequent
processing system is located as close to the transmitter as
possible. This will allow it to yield the most efficient
modulation performance, as the final output is then coupled to
the transmitter in a manner where nothing can alter the
amplitude and bandwidth limited signal.
The drawback is that all adjustments must be made at the
transmission facility. (Although a number of processing
systems now available offer control via computer and modem.)
Another difficulty arises when using a multiple transmitter
network, which is very popular in Europe. A single studio
facility is used to provide programme to an entire country or
region. While a single encoder is used to feed a satellite
uplink or terrestrial microwave, each network affiliate
location must then install a complete processing system. While
it makes for the best possible transmission system, it is both
costly and difficult to adjust when adjustments are needed.
A first alternative would be to install the discrete
processing at the studio, and provide the stereo
generator/coder at the transmitter, Figure-4.

Figure-4, Digital STL System: Alternative Configuration
The benefit here is that processing adjustments can be made
from the originating location, and they would have an equal
effect on any additional affiliates, if used in a network.
It also reduces equipment cost.
Here the drawback can affect modulation and sonic performance.
Earlier discussion revealed that the input to the encoder must
be "flat". Since the processing system will employ
pre-emphasis, a complementary de-emphasis must be switched
into the output so that the resultant signal is
"quasi-flat". Unfortunately, this quasi-flat signal
will still possess any changes that were implemented by the
multiband operation. These might be in the form of multiband
EQ, or even more importantly, any non-linear functions applied
such as final clipping.
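Why de-emphasis alone cannot undo the processing can be shown
with a toy model. The sketch below uses a crude first-order
digital emphasis pair whose coefficient is an arbitrary
assumption, not derived from 75us or 50us. A plain hard
clipper stands in for the final limiter. All names and values
are illustrative.

```python
import math

A = 0.9  # assumed filter coefficient, not derived from 75 us or 50 us

def pre_emphasis(x):
    """First-order digital high boost: y[n] = x[n] - A * x[n-1]."""
    out, prev = [], 0.0
    for s in x:
        out.append(s - A * prev)
        prev = s
    return out

def de_emphasis(y):
    """Exact inverse of pre_emphasis: z[n] = y[n] + A * z[n-1]."""
    out, prev = [], 0.0
    for s in y:
        prev = s + A * prev
        out.append(prev)
    return out

def hard_clip(x, threshold):
    """Limit every sample to +/- threshold."""
    return [max(-threshold, min(threshold, s)) for s in x]

def max_error(a, b):
    """Largest sample-by-sample difference between two signals."""
    return max(abs(p - q) for p, q in zip(a, b))

# Hypothetical test tone: 5 kHz sampled at 48 kHz.
RATE = 48000
tone = [math.sin(2 * math.pi * 5000 * n / RATE) for n in range(480)]

emphasized = pre_emphasis(tone)
linear_round_trip = de_emphasis(emphasized)
clipped_round_trip = de_emphasis(hard_clip(emphasized, 0.5))

print("error without clipping: %.2e" % max_error(tone, linear_round_trip))
print("error with clipping:    %.2e" % max_error(tone, clipped_round_trip))
```

Without the clipper the round trip is exact to within rounding
error. With it, the de-emphasized output no longer matches the
original, so the "quasi-flat" signal still carries the
non-linear products into the encoder.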
The true test of any coded STL that is set up in this
configuration will be to see how well it can pass the filtered
non-linear signal from the processor. Considering that large
amounts of processing will generate a high level of
bandlimited harmonic content, the task for the codec is to
reproduce this bandlimited signal at the decoder outputs. Any
alteration of this signal will produce audible distortion and
overshoots in the system. This unfortunately degrades the
performance of the processor, as the sonic quality of the
audio suffers and modulation efficiency is lost. This is due
in part to the level of codec generated overshoot that now
must be compensated for. In this form of configuration there
are two possible methods to try to make the best of the
situation.
The first is to employ another, simpler peak limiter at
the transmitting location, or at each network affiliate. This
would control any systemic overshoots that were created. While
it might not relieve the added distortion from the transcoded
non-linear process, it will provide good modulation
efficiency. This limiter must also restore the needed
pre-emphasis to the signal that was de-emphasized prior to
coding. It should be noted that an additional generation of
pre-emphasis contributes yet another form of signal
conditioning that may have negative sonic implications. These
arise because most secondary limiters utilize another
non-linear final limiter, which adds more distortion to an
already limited and possibly distorted signal!
A second alternative, but one that requires a bit more
engineering, is to split the processing system into two
separate sections. Since the audible problems from the coding
process are mostly related to the non-linear operations of the
processor, it would make logical sense to split the non-linear
section off from the processor and locate it at the
transmitter location, Figure-5.

Figure-5, Proposed Alternative Digital Transmission Path
This keeps the AGC and dynamic limiting at the origination
point. In this manner, almost all of the processing
adjustments can be made at the head end, while the affiliate
locations provide the needed final limiting, pre-emphasis
and stereo generation/coding. By minimizing the amount of
non-linear signal needing to be passed through the codec, the
qualitative content of the signal will remain high, while also
providing the optimum coupling of the processing system to the
transmitter. The only drawback here is that it will require
some thought on how to break apart the processing so that the
respective sections are in the right locations.
This would appear to be the best compromise as it allows for
the following to occur. The codec can operate on a
"flat" audio signal. The need for pre-emphasis is
limited to one generation, and most importantly, all
non-linear processes are also performed once and located at
the optimum location which is the transmitting facility. In
addition, equipment cost can be kept to a reasonable level.
In this day and age where competitive signal processing is a
given, adding distortion to the transmission path would seem
to be a fatalistic approach, considering the value that is
placed upon the sonic presentation of a radio station. If that
distortion is due in part to the reasoning that an
installation must be accomplished in a digital format, then
the question must be asked: is the digital path an
improvement?
Therefore it is the opinion of this author, and designer, to
promote either the initial configuration, with the processing
at the transmitter location, or the second alternative, with
the processing split into two sections. In this manner the
benefits of the digital path can be realized, and more
importantly the integrity of the "station sound" can not only
be maintained but might in fact be improved!
DAB, DSP, and the Future!
As mentioned earlier, DAB is upon the world. It has been
through the research, development, and deployment of DAB that
audio coding has become commonplace in this industry. While
still very much in the domain of testing and research, DAB
will no doubt become a factor in the world of radio
broadcasting. Also, the issue of transmission processing has
yet to be addressed in the realm of DAB. The thinking here is
that some form of processing will be adopted. Considering that
pre-emphasis will no longer be required, the possibilities for
uses of signal processing in DAB might be endless! Suffice it
to say that generating a perceived loud "station sound" will
rise again on the DAB horizon.
In the world of processing, the use of DSP has also finally
progressed into the transmission path. To date, DSP
transmission processors have only been able to approximate the
performance of their analog counterparts. So far each DSP
device developed prior to 1996 has fallen short on good
sonic performance in competitive signal processing. Be not
dismayed: with continuing research and development in the DSP
domain, there will no doubt one day exist a digital signal
processor for broadcasting that provides greater performance
than that made possible by the analog and initial digital
systems of the present.
In conclusion, the coded STL system
has already made a sizeable impact on the broadcast industry.
Through proper deployment, these systems can be very
beneficial in delivering high quality audio signals to
transmitting facilities. This can only happen if the user
understands the possible pitfalls of mixing the processing and
codec together. Hence the title of mixing oil with water! As
coding algorithms improve and DSP processors work to prosper,
the pitfalls of the present will simply be a nasty thought of
yesterday!
References
[1] Oppenheim, A. and Willsky, A. Signals and Systems.
Prentice Hall, 1983.
[2] ISO/IEC 11172. International Standard: Coding of moving
pictures and associated audio for digital storage media up
to 1.5 Mbit/s.
[3] Mendenhall, G. Pre-emphasis and Limiting Considerations
for Audio Processors and Digital Studio-to-Transmitter
Link. White Paper, 1995.
[4] Wegel, R.L. and Lane, C.E. The Auditory Masking of One
Pure Tone by Another and Its Probable Relation to the
Dynamics of the Inner Ear. Physical Review, 1924.