Critical Issues and Considerations for an All Digital Transmission Path
Frank Foti
Cutting Edge
Cleveland, Ohio
Abstract
The trend in FM broadcasting is the all-digital transmission
facility. Some believe that digital connectivity is merely as
easy as joining together numerous AES/EBU signals and
magically, the digital path appears. Real life experience
indicates there is more to it than that. To date, there are
known situations where a digital transmission path has caused
added distortion, loss of loudness, overshoots, and/or
excessive delay. A real world problem exists today! This paper
examines the all-digital path and illustrates where benefits
are realized, and where potential hazards can occur. An
in-depth look at sample rate converters used in digital
exciters will reveal the cause of modulation overshoots.
Finally, a proposal is offered to illustrate the need for
connectivity of a digital audio processor that incorporates a
stereo generator with a digital exciter.
Overview
Digital. The technical buzz word of the 90s, used as if
it's a magical potion that makes everything perfect...well,
almost perfect. Truth is, under certain circumstances, or
without a complete understanding, it can actually be degrading!
Let's take a look at the digital broadcast facility from top
to bottom. Sure, numerically, the signal path can now
traverse from the microphone all the way to the FM exciter,
but is that journey sonically as pure as analog? Some argue
yes, and some argue no.
This presentation details where the
strengths and weaknesses are in the digital path. Make no
mistake: There is potential to create an outstanding
performing broadcast facility using digital, but certain
parameters must be observed and some guidelines understood. At
issue are discussions about: sampling rate, AES/EBU, time
delay, STL systems, codecs, and digital exciters. Each of
these items plays an important part in the all-digital
facility. Before getting started, some history is in order.
Back to The Future ...
If you're a broadcast veteran whose career spans back to
the early/mid 1970s, the following reflection will hopefully
be of amusing interest to you. If you're one of the younger
"pups" to radio, please read on for some interesting
revelation.
In the early/mid 70s, remember the
first attempts at "competitive" audio processing in
FM? You know, inserting a final limiter/clipper before a
stereo generator, thinking that the hard limiting action of
the clipper would stop over-modulation. Wow, rocket science
you thought back then, right? Wrong! Remember the crashing and
burning of that concept as the modulation would exhibit wild
overshoots? Peaks that occasionally would reach up to 170%.
Loudness was not gained, but lost! What happened?
As history shows, it was the
non-linear response of the 15kHz low pass pilot protection
filters in the stereo generator that were the culprits. Later,
Robert Orban would research, design, and develop a complete
processing system that eliminated the problem by integrating
the hard limiter, final filter, and stereo generator together.
From that point on, "competitive" audio processing
was possible.
Why tell that story? We all know it.
How does it relate to the topic at hand? Well, based upon
performance thus far with certain digital transmission paths,
some of those older 1970s problems have come back to haunt us.
Even some of those veteran engineers of the 1970s are
beginning to think that the digital path is revisiting
history. To date, there are concerns that some digital systems
exhibit overshoots, which can reduce loudness. Some generate a
significant amount of time delay through the system. Enough in
fact, that air talent can not monitor off-air in their
headphones. Other paths are known to use a codec inside STL
systems. That can have adverse effects on the audio, depending
upon the coding algorithm and bitrate employed. Finally, the
issue of sampling rate, and sample rate converters must be
revisited. Is 32kHz enough for quality FM broadcasting that
truly rivals analog? (I can hear the analog fanatics screaming
already.)
How do we avoid these issues?
Unfortunately, the answer can not be reduced to a single item
as it was with the low pass filters in the stereo generator.
It's a broader concern.
A System-wide View
As stated earlier, the total digital transmission path is
more than a connection of linear AES/EBU signals that exist
between program origination and the transmitter. While that is
desirable and, hopefully, possible soon, the path actually
exists as a number of cascaded digital signal types that must
be made compatible with one another. For example, the digital
signals may vary in sampling rate among various components in
the system. Also, the use of a codec-based STL adds yet
another dimension to the sonic aspects of the audio, as well
as sampling rate.
Placement of various components that
make up the system can have a dramatic effect on performance.
Probably the most important is audio processing. Its location
within a specified system, coupled with the remaining items
that comprise that particular system, will have monumental
performance benefits or drawbacks. Where pre-emphasis is
inserted is yet another important concern.
The only exception to this would be a
digital facility where the entire plant is co-located. In that
instance, it is possible to connect the path via linear AES/EBU
between the studio and signal processor, and then between the
processor and the exciter.
These are only the surface issues. Of
equal importance is propagation time delay: Can the on-air
talent continue to monitor themselves off-air? And sample rate
conversion: Does it adversely affect the output of an audio
processor when coupled to a digital exciter?
Each of these items is critical in a
digital audio path. In some instances, the path may have to
deal with one or all of them. Following are in-depth views of
the important aspects that comprise the digital transmission
system and recommendations that yield beneficial performance
improvements.
Path Configurations
The digital transmission path can be configured in a few
ways. For discussion purposes, it will be assumed that some
form of Studio-to-Transmitter Link (STL) is involved.
First consider if the STL path will be
linear or coded. The goal is to be as linear as possible, but
sometimes circumstances may make this impossible.
If the path is coded, the next
decision involves where to insert the audio processing. The
preference is to place the processing at the transmitter for
technical and sonic reasons, but it is much easier to adjust
if installed in front of the STL. Figure 1 is an example of
the processing located after the codec and at the transmitter
site.

Figure 1
Figure 2 is an example of the
processing inserted before the codec.

Figure 2
If a linear path is possible, then
installation of the processing can be at either the studio or
transmitter location. Good technical and sonic performance
will be possible at either location.
The Components
The transmission path is literally the sum of
its parts. This section describes and details each of these
components so that a better understanding of the complete
system will result. The one commonality to the whole system is
the AES/EBU interface. The author will assume that the reader
has enough understanding of this standardized protocol, such
that further discussion and review is not warranted. AES/EBU
should be viewed as the "glue" that ties each of the
pieces together.
Digital FM Exciters
Digital FM exciters are the latest entry to the digital
path. Capable of incredible modulation performance, the
digital exciter offers two forms of signal input: analog
composite (MPX) for the non-digital transmission site, and
AES/EBU.
The analog MPX input, in order to be
modulated digitally, requires a very high sampling rate. Since
the FM baseband spectrum extends to 99kHz, the digital
exciter must provide a sampling rate of at least 200kHz. That
would provide a Nyquist frequency at 100kHz, which would cover
the baseband spectrum.
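The arithmetic behind the 200kHz figure can be sketched in a
few lines of Python. The function names are illustrative only,
and the 99kHz baseband edge is simply the figure quoted above.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_freq_hz):
    """Theoretical minimum sampling rate for a given highest frequency."""
    return 2 * highest_freq_hz


BASEBAND_EDGE_HZ = 99_000  # upper edge of the FM baseband, as quoted in the text

print(min_sample_rate_hz(BASEBAND_EDGE_HZ))      # 198000
print(nyquist_hz(200_000))                       # 100000.0
print(nyquist_hz(200_000) >= BASEBAND_EDGE_HZ)   # True
```

A 200kHz rate therefore clears the theoretical minimum of
198kHz by a small margin.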
The AES/EBU input accepts the
Left/Right signal in the digital format. Because it still is
in the discrete Left/Right state, the exciter must perform the
stereo generator function. Here is where the story gets
interesting.
Consider for a moment the signal at
the AES/EBU input of the exciter. It might be a different
sampling rate than the exciter is expecting. If so, a sample
rate converter is employed to make the proper transition. This
can pose problems, as the digital filter in the sample rate
converter can generate overshoots to the already tight
peak-controlled audio data that is being converted.
As mentioned, the audio will have
already been peak-controlled and bandlimited by the audio
processor. (The processor can be either analog or digital, it
does not really matter for this discussion.) The processor
also would have already applied the needed pre-emphasis.
Hence, the Left/Right stereo signal only needs the matrixing
and MPX encoding for stereo modulation to occur. That is all.
What is present in most digital
exciters is yet another low pass filter, potential stereo
limiter, and in some cases the addition of, yet again,
pre-emphasis. The latter may occur, if the incoming Left/Right
audio signal was de-emphasized earlier in the path.
In essence, the signal that only
needed to be matrixed and MPX encoded has now had additional
elements of conditioning applied to it. This can degrade the
modulation efficiency and sonic performance. Let's have a look
at why.
Sample Rate Converters (SRC): This
device transforms one system sampling rate to another. This
becomes necessary when interfacing digital equipment that uses
different sampling rates, and it permits compatibility
among different systems.
In an example of changing 48kHz
sampling to 32kHz sampling, the conversion is accomplished by
scaling up, or interpolating, the original sampling rate,
usually by a factor of ten. Then, at the 10x rate of 480kHz,
the signal is filtered with a low pass filter that is set to
the Nyquist frequency of the new, desired sampling rate. This
filter is required to "smooth out" the 10x rate signal. If it
were not used, aliasing products would result. Finally, the
signal is scaled down, or decimated, by the factor needed, in
this case ÷15, to achieve the new rate of 32kHz. Figure 3 is a
block diagram of an SRC. While this sounds quite simple, and
basically it is, there are a few issues to consider. Of main
interest is the interpolation filter.
Figure 3
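The 48kHz to 32kHz example can be written out as a small
calculation. This is a sketch of the bookkeeping only, not of
any particular converter chip; the function name src_plan and
the dictionary keys are hypothetical, and the filter cutoff
follows the description in the text (the Nyquist frequency of
the new rate).

```python
from fractions import Fraction


def src_plan(fs_in_hz, fs_out_hz, up_factor=10):
    """Return the rates and factors for an interpolate/filter/decimate SRC."""
    intermediate_hz = fs_in_hz * up_factor
    if intermediate_hz % fs_out_hz != 0:
        raise ValueError("up_factor does not give an integer decimation factor")
    ratio = Fraction(fs_out_hz, fs_in_hz)  # reduced output/input ratio
    return {
        "intermediate_hz": intermediate_hz,
        "down_factor": intermediate_hz // fs_out_hz,
        "filter_cutoff_hz": fs_out_hz / 2,
        "smallest_ratio": (ratio.numerator, ratio.denominator),
    }


plan = src_plan(48_000, 32_000)
print(plan["intermediate_hz"])   # 480000
print(plan["down_factor"])       # 15
print(plan["filter_cutoff_hz"])  # 16000.0
print(plan["smallest_ratio"])    # (2, 3)
```

The last line shows that the same conversion could in principle
be done with factors as small as 2 up and 3 down; the factor of
ten is the one used in the example above.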
All audio processors, both analog and
digital, apply some form of overshoot control in the output
filtering section. In most designs, this function is performed
by an integrated protection clipper working around the final
low pass filter.
In each case, the overshoot component
can be determined by the "Gibbs Phenomenon" [1], which
states that an overshoot will occur at one-third the cut-off
frequency of any low pass filter whenever a non-linear
waveform is passed through it. In the case of broadcasting,
the non-linear waveform would be that of a clipped signal.
Knowing that the audio bandwidth used
in FM stereo is 15kHz, overshoot components will begin above
5kHz with any non-linear waveform, because the third harmonic
of a clipped waveform above 5kHz falls beyond the 15kHz
cut-off and is removed.
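The effect is easy to reproduce numerically. The sketch below
passes a hard-clipped 1kHz tone, whose samples never exceed
full scale, through a 15kHz low pass filter and compares peak
levels. The filter is an assumption chosen for illustration: a
101-tap truncated-sinc FIR at 48kHz sampling, not the filter of
any real processor, converter, or exciter.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Truncated-sinc low pass FIR (no window), normalised to unity DC gain."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            taps.append(2 * fc)
        else:
            taps.append(math.sin(2 * math.pi * fc * t) / (math.pi * t))
    total = sum(taps)
    return [h / total for h in taps]


def fir_filter(signal, taps):
    """Direct convolution, keeping only fully overlapped output samples."""
    n_taps = len(taps)
    return [
        sum(taps[k] * signal[i + n_taps - 1 - k] for k in range(n_taps))
        for i in range(len(signal) - n_taps + 1)
    ]


FS_HZ = 48_000
# Hard-clipped 1kHz tone: every sample sits exactly at +1 or -1.
clipped = [1.0 if (n % 48) < 24 else -1.0 for n in range(480)]
filtered = fir_filter(clipped, lowpass_taps(101, 15_000, FS_HZ))

peak_in = max(abs(s) for s in clipped)
peak_out = max(abs(s) for s in filtered)
print(f"input peak {peak_in:.3f}, output peak {peak_out:.3f}")
```

The output peak comes out above the input peak even though the
filter has unity gain, which is the overshoot discussed above.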
Should the slope of the previously
described up-sampled interpolation filter appear greater than
the slope of the final filter in the audio processor, then
output overshoots may result in the sample rate conversion
process. Unfortunately, these overshoots are generated after
the audio processor. To remove them would require another
limiting device, which is the reason for the added limiter in
the digital exciter.
Not all sample rate converters will
cause overshoots. But in most cases, the filtering used in the
sample rate converter will be of a large magnitude in the
bandstop rejection area. In all probability, it will be an FIR
filter with at least 96dB rejection in the stop-band.
Of interest is the direction of rate
conversion. Should the host sampling rate be higher than
the incoming rate, chances of overshoot are small. This
happens because the up-sampled filter is set to a broader
spectrum than the spectrum of the incoming signal. Potential
problems may arise when converting a higher sampling rate
to a lower one, as in the example of converting 48kHz to 32kHz
sampling. Then the details of the above description apply.
Input Sampling Rate: It is unknown why
32kHz sampling rate is used in broadcast paths, when it has
been discussed and demonstrated that 48kHz sampling is far
superior in performance. The importance is not so much the
added spectrum available with 48kHz sampling, but that 32kHz
causes aliasing distortion in specific instances. This is
clearly demonstrated by any DSP based audio processor designed
before 1997, and extensively detailed by the author in a
previous NAB presentation [2].
The use of 48kHz sampling as the AES/EBU
input rate in the exciter ensures the best sonic performance.
In addition, any input that needed to be converted up from
32kHz sampling would not create any overshoot component in the
modulator. These factors ultimately benefit the broadcaster.
Integrated Limiter: Because of the
SRC behavior described above, most digital exciters provide a
baseband limiter to eliminate the overshoot problem. Also,
there are certain path configurations that can cause
overshoots that do not relate to the sample rate conversion
process. Those are usually situations that involve use of a
coded STL system where the desire is to insert the main audio
processor before the encoder of the STL. It has been shown
that employing audio processing in front of a codec can have
sonic and modulation performance penalties. The codec issue
will be discussed in an upcoming section.
The integrated limiter used in the
exciter is combined with part of the stereo generator. This is
not a composite clipper, but a time-delay, feed-forward
limiter that controls peaks with a zero attack time. Waveforms
are controlled with little or no harmonic distortion (T.H.D.)
components, but will produce a larger intermodulation (I.M.)
level.
Technically, this style of limiter
will operate sufficiently when controlling overshoot peaks or
as an additional limiter to the audio processing. Sonically,
this type of limiter will produce a "busier" sound.
It will sound more like a limiter that is operating with
"heavy" levels of compression. That is the result of
the added I.M. In the audio processing realm, adding more I.M.
to an already processed signal is undesirable.
Of interest to the author is that a
digital overshoot clipper is not employed. It has been proven
that digital composite clipping is possible, while maintaining
a clean spectrum. Composite clipping produces far fewer I.M.
products than a delay limiter does, and it will yield cleaner
sound for the same amount of limiting/clipping used.
Pre-emphasis: The exciter has the
option of adding the required pre-emphasis. Optimally, the
transmission system would be set up so that the pre-emphasis
is generated once in the audio processor. Then, the
pre-emphasized, processed signal is coupled directly to the
exciter.
Broadcast audio processors employ
pre-emphasis within system architecture. Since emphasized
audio must also fit within the imposed modulation limits, the
processor employs specialized high frequency control sections
that provide both the boost and control of the high frequency
energy. In this manner, efficient high levels of modulation
are easily obtained since the processor is designed and set to
limit any tradeoffs resulting from pre-emphasis and high
frequency limiting requirements. Basically, these two sections
work in concert with one another to allow pre-emphasis to be
employed, and yet control the emphasized energy content.
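To give a sense of how much high frequency boost has to be
controlled, the sketch below computes the gain of a first-order
pre-emphasis network. The 75 microsecond time constant is an
assumption (it is the value used for FM in the United States;
50 microseconds is used in many other regions), and the
function name is illustrative.

```python
import math


def preemphasis_gain_db(freq_hz, time_constant_s=75e-6):
    """Magnitude of a first-order pre-emphasis network, in dB."""
    w_tau = 2 * math.pi * freq_hz * time_constant_s
    return 10 * math.log10(1 + w_tau ** 2)


for f in (1_000, 5_000, 10_000, 15_000):
    print(f"{f:>6} Hz: +{preemphasis_gain_db(f):.1f} dB")
```

Under this assumption the boost at 15kHz is roughly 17 dB,
which is why the boost and the high frequency limiting have to
be designed to work together.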
In situations where a codec STL system
is used and the audio processing is inserted before the
encoder, the codec must pass "flat" (non pre-emphasized)
audio. This requires adding de-emphasis to the output of the
processor in order to send the restored "flat" signal to the
codec, and then re-applying pre-emphasis in the exciter.
Figure 4 illustrates this. A flat signal is required by
the coder because of the use of masking in the encoding
process. Any significant change, or imbalance, of the frequency
spectrum can cause the threshold curve of the coding system
to have a profound effect on the output of the coded
audio [3].

Figure 4
These additional de-emphasis and
pre-emphasis steps will add modulation overshoot to the total
transmission system. To eliminate the added overshoot, another
limiter must be employed.
Unfortunately, tests have shown that
operating a transmission processor with an emphasized output
into a codec will generate audible high frequency distortion.
This occurs because the signal presented to the codec masking
process is not spectrally flat, and the masking model expects
a flat spectrum.
Based upon the previous discussion,
one can see that it is advantageous to install the audio
processing system as close to the exciter as possible, as this
allows employing the emphasis in the processing, and using a
"flat" input on the exciter. Also, internal limiting
in the exciter becomes unnecessary, which allows the audio
processing system to provide all of the required peak control.
STL Systems
There is a choice in digital STL systems: Linear or coded.
The linear systems have mainly been available as
"nailed-up" data links (such as T-1), but recently
"uncompressed" radio links have been introduced
as well. In analyzing the
STL link, two items are of critical interest: Time delay and
coding algorithm.
The first item, time delay, is an
issue with either the linear or coded STL system, as there is
a propagation delay (in milliseconds) for the audio data to
travel from the input to the output of the system. If the
delay is excessive, then the on-air talent can not comfortably
monitor themselves off-air. This occurs because the off-air
audio in their headphones is delayed relative to the arrival
time of their voice directly through bone conduction.
The following table, based on
real-world tests, was created by a radio engineer to
illustrate the effects of time delay for on-air talent:
• 1-3 ms: Undetectable delay.
• 3-10 ms: Shift in voice
character audible to person speaking. (comb filter effect)
• 10-30 ms: A slight echo turning
to obvious slap @ 25-30 ms.
• 30-50 ms: Disturbing echo,
disorienting the announcer.
• >50 ms: Too much delay for
live monitoring.
Thanks to Jeff Goode in Indianapolis
for providing this information! The key break points seem to
be that delays of 5-7 ms are not a problem; from 10-30 ms most
announcers can work live, but anything above 25-30 ms is
annoying. These are real world tests.
While the subject of time delay is
being discussed, it must be pointed out that time delay is a
cumulative item. Each device that exhibits any delay will add
up to produce a total system delay. Should this system delay
exceed some of the figures in the above chart, then off-air
monitoring will not be possible. In addition to the STL
system, audio processors, digital exciters, and even digital
modulation monitors can generate delay.
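Because delay is cumulative, a simple delay budget is a useful
planning tool. The sketch below adds up per-device delays and
maps the total onto the ranges in the table above. The
per-device figures are hypothetical placeholders, not
measurements of any product, and the handling of the shared
boundary values (3, 10, 30 and 50 ms) is an assumption, since
the table lists them in two ranges.

```python
# Upper bound of each range in ms, with the effect reported in the table.
DELAY_BANDS_MS = [
    (3, "undetectable"),
    (10, "voice character shift (comb filter effect)"),
    (30, "slight echo, turning to obvious slap"),
    (50, "disturbing echo"),
]


def classify_delay(total_ms):
    """Map a total path delay onto the ranges from the table."""
    for upper_ms, description in DELAY_BANDS_MS:
        if total_ms <= upper_ms:  # boundary values fall in the lower range
            return description
    return "too much delay for live monitoring"


# Hypothetical per-device delays in milliseconds (illustrative values only).
chain_ms = {
    "audio processor": 4.0,
    "STL": 12.0,
    "digital exciter": 3.0,
    "modulation monitor": 2.0,
}

total_ms = sum(chain_ms.values())
print(total_ms, classify_delay(total_ms))
```

With these placeholder values no single device is a problem on
its own, yet the total already lands in the echo range.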
A second item for
consideration, the coding algorithm, is only applicable if a
non-linear STL is used. These devices make use of 'lossy' data
reduction algorithms to compress the audio so it can fit within
the existing bandwidth of the STL system. While there are a number
of specific algorithms from which to choose, most STL
manufacturers have made use of proprietary digital formats
that are derivatives of prior development. Most common usage
has been ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-x, and
Dolby AC-2.
Detailed operation of the
above-mentioned algorithms is not needed for this discussion,
as each system possesses both strengths and weaknesses for
this application.
What is of importance here is to
understand that each audio coding algorithm will have a
specific sonic effect on the audio. Along with the fact that
the signal will also be dynamically controlled, the use of
audio coding can have a negative effect on the performance of
the audio processor. It has been demonstrated that any use of
a coded STL should be done where the audio processor is
inserted after the coding. There will be less degradation of
the audio, and the processor will perform peak control more
efficiently.
Sampling Rate
As stated earlier, the issue of sampling rate must be looked
at again: 32kHz is simply too low. With the cost of DSP chips
and converter sets constantly falling, there is no
reason why 48kHz can not be used. We have now reached a point
in the professional audio domain where 96kHz is the desired
sampling rate. How will that affect the broadcast community?
Transmission systems thus far have
used 32kHz as a base sampling rate, which in turn sets the
Nyquist frequency at 16kHz. Considering that conventional FM
stereo broadcasting requires 15kHz of audio bandwidth, this
leaves only 1kHz of guard band spectrum before the Nyquist
point. To facilitate this, a filter of very large magnitude
must be employed in order to suppress all energy by at least
96 dB at the Nyquist, or aliasing occurs. This can be
accomplished digitally using a finite impulse response filter
(FIR). The only drawback is that many "taps" are
required within the filter to achieve this level of stopband
rejection. The significance of the "taps" is that a
linear-phase FIR filter delays the signal by half its length,
so every two taps add roughly one sample of delay. For a 15kHz
FIR filter of this magnitude, 101 "taps" are needed. This in
turn results in 50 samples of delay, equating to 1.56
milliseconds of propagation delay through the filter at 32kHz
sampling.
Add up the number of 15kHz filters
employed in a 32kHz sampled transmission path, and those alone
will create enough time delay to disorient on-air announcers.
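The delay figure can be checked directly. The sketch assumes a
linear-phase FIR filter, whose delay is (taps - 1) / 2 samples;
the count of four cascaded filters is a hypothetical example,
not a figure from any specific path.

```python
def fir_delay_ms(num_taps, sample_rate_hz):
    """Delay of a linear-phase FIR filter: (taps - 1) / 2 samples."""
    delay_samples = (num_taps - 1) / 2
    return 1000 * delay_samples / sample_rate_hz


one_filter_ms = fir_delay_ms(101, 32_000)
print(one_filter_ms)       # 1.5625
print(4 * one_filter_ms)   # 6.25 (four such filters in cascade)
print(round(fir_delay_ms(101, 48_000), 2))  # same length filter at 48kHz
```

Four such filters alone already exceed the 3 ms range that the
delay table lists as undetectable, before any STL or codec
delay is added.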
A broad question is: Why the use of
32kHz as a base sampling rate? Tests and research show that a
base of 48kHz would make all of the aforementioned problems
much easier to deal with. The guard band to the Nyquist is
much farther out, which in turn moves out the aliasing point.
This would allow a less demanding final filter, and
because each sample period is shorter at 48kHz, the
propagation delay would be lower as well, which makes this
rate more desirable.
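The difference in guard band between the two rates is simple
arithmetic, sketched below. The 15kHz audio bandwidth is the
figure used throughout this paper; the function name is
illustrative.

```python
AUDIO_BANDWIDTH_HZ = 15_000  # FM stereo audio bandwidth, as stated in the text


def guard_band_hz(sample_rate_hz, audio_bandwidth_hz=AUDIO_BANDWIDTH_HZ):
    """Spectrum between the top of the audio band and the Nyquist frequency."""
    return sample_rate_hz / 2 - audio_bandwidth_hz


for fs in (32_000, 48_000):
    print(fs, guard_band_hz(fs))
```

At 32kHz the filter has 1kHz in which to reach full rejection;
at 48kHz it has 9kHz, nine times as much room.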
Most likely, 32kHz sampling was chosen
in the past, because there would be more machine cycles
available to handle the workload. That would be the only
reason to possibly support a lower sampling rate.
Digital Exciter Proposal
In the discussion section about the digital exciter,
numerous options were explained about the interfacing
possibilities of the audio processor to the exciter. All of
them revolve around the usage of the AES/EBU input protocol.
In that configuration, the audio data arrives in Left/Right
format and requires the exciter to perform the MPX generation.
Question: Why can't the digital audio
processor, which has the MPX encoder built-in, connect its
digitally-generated baseband signal directly to the digital
modulator of the exciter? This would be analogous to the
analog composite input on any exciter.
Figure 5 is a block example of this.

Figure 5
Unfortunately, there is not a standard
for transporting the MPX baseband signal, but an
"official" standard is not needed. Consider the
following: In either the audio processor or the digital
exciter, a fast sampled section is needed to accommodate the
MPX signal. (Remember we're dealing with a spectrum out to
100kHz.)
Knowing that most DSP families use
some form of serial data stream, it is possible to
"publish" the needed input data format so that an
audio processor can provide that format as a digital composite
output. The important factor is that both systems agree on
sampling rate. Again, the use of a very fast rate is
advantageous, say, 384kHz. That's 8x the 48kHz rate.
Naturally, this type of configuration
would require installing the processing at the transmitter
facility, since transporting a digital composite signal of
this speed and size would be cost prohibitive. Current
generation digital processors provide some form of computer
control via modem or network, making processing at the
transmitter somewhat less inconvenient.
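A rough payload calculation shows why moving the composite
signal over a link is impractical. The 16-bit word length is an
assumption made for illustration (no word length is specified
for the proposed interface), framing overhead is ignored, and
the 1.544 Mbit/s figure is the standard T-1 line rate.

```python
def payload_mbit_s(sample_rate_hz, bits_per_sample, channels=1):
    """Raw audio payload in Mbit/s, ignoring any framing overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1e6


T1_MBIT_S = 1.544  # standard T-1 line rate

composite = payload_mbit_s(384_000, 16)              # digital composite, one channel
stereo_32k = payload_mbit_s(32_000, 16, channels=2)  # Left/Right pair at 32kHz

print(composite, stereo_32k)
print(composite > T1_MBIT_S, stereo_32k < T1_MBIT_S)
```

Under these assumptions the composite stream needs about six
times the payload of the Left/Right pair and does not fit
within a single T-1.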
It is curious that none of the digital
exciters available provide, or even propose, an application
like this. It provides the best possible coupling to the exciter,
and the performance benefits are significant. Imagine having
the power of a complete digital processing system and
integrated stereo generator that is directly connected to a
digital modulator. Now we're talking about super efficient
modulation capability. Zero overshoots due to added emphasis,
coding, or sample rate converters. That's real power.
Conclusion
The total digital transmission path is capable of
providing some outstanding performance results. To achieve
this, audio processing must be inserted at the transmitter
site, and a "flat" input should be used on the
digital exciter. If an STL system is employed, a linear system
would be preferable, but a high bitrate coded system is
acceptable as long as the dynamics processing occurs after the
coding.
As long as the systems design engineer
in a broadcast facility is aware of these critical issues,
there is no reason why an all-digital broadcast facility can
not exist today, providing broadcasting of exceptional
quality.
REFERENCES
[1] Baher, H. Analog & Digital
Signal Processing, J. Wiley & Sons, 1990
[2] Foti, F. Digital Broadcast Audio
Processing: Finally, The New Frontier, NAB Convention, 1997
[3] Mendenhall, G. Pre-emphasis and
Limiting Considerations for Audio Processors and Digital
Studio-to-Transmitter Link, White Paper, 1995