
Critical Issues and Considerations for an All Digital Transmission Path

Frank Foti
Cutting Edge
Cleveland, Ohio

Abstract
The trend in FM broadcasting is the all-digital transmission facility. Some believe that digital connectivity is as easy as joining together numerous AES/EBU signals and, magically, the digital path appears. Real-life experience indicates there is more to it than that. To date, there are known situations where a digital transmission path has caused added distortion, loss of loudness, overshoots, and/or excessive delay. A real-world problem exists today! This paper examines the all-digital path and illustrates where benefits are realized, and where potential hazards can occur. An in-depth look at sample rate converters used in digital exciters will reveal the cause of modulation overshoots. Finally, a proposal is offered to illustrate the need for connectivity of a digital audio processor that incorporates a stereo generator with a digital exciter.

Overview
Digital. The technical buzzword of the 90s, used as if it's a magical potion that makes everything perfect... well, almost perfect. Truth is, under certain circumstances, or without a complete understanding, it can actually be degrading! Let's take a look at the digital broadcast facility from top to bottom. Sure, numerically, the signal path can now traverse from the microphone all the way to the FM exciter, but is that journey sonically as pure as analog? Some argue yes, and some argue no.

This presentation details where the strengths and weaknesses are in the digital path. Make no mistake: There is potential to create an outstanding performing broadcast facility using digital, but certain parameters must be observed and some guidelines understood. At issue are sampling rate, AES/EBU, time delay, STL systems, codecs, and digital exciters. Each of these items plays an important part in the all-digital facility. Before getting started, some history is in order.

Back to The Future ...
If you're a broadcast veteran whose career spans back to the early/mid 1970s, the following reflection will hopefully be of amusing interest to you. If you're one of the younger "pups" in radio, please read on for some interesting revelations.

In the early/mid 70s, remember the first attempts at "competitive" audio processing in FM? You know, inserting a final limiter/clipper before a stereo generator, thinking that the hard limiting action of the clipper would stop over-modulation. Wow, rocket science you thought back then, right? Wrong! Remember the crashing and burning of that concept as the modulation would exhibit wild overshoots? Peaks that occasionally would reach up to 170%. Loudness was not gained, but lost! What happened?

As history shows, the culprit was the non-linear response of the 15kHz low pass pilot protection filters in the stereo generator. Later, Robert Orban would research, design, and develop a complete processing system that eliminated the problem by integrating the hard limiter, final filter, and stereo generator together. From that point on, "competitive" audio processing was possible.

Why tell that story? We all know it. How does it relate to the topic at hand? Well, based upon performance thus far with certain digital transmission paths, some of those older 1970s problems have come back to haunt us. Even some of those veteran engineers of the 1970s are beginning to think that the digital path is revisiting history. To date, there are concerns that some digital systems exhibit overshoots, which can reduce loudness. Some generate a significant amount of time delay through the system. Enough, in fact, that air talent cannot monitor off-air in their headphones. Other paths are known to use a codec inside STL systems. That can have adverse effects on the audio, depending upon the coding algorithm and bitrate employed. Finally, the issue of sampling rate and sample rate converters must be revisited. Is 32kHz enough for quality FM broadcasting that truly rivals analog? (I can hear the analog fanatics screaming already.)

How do we avoid these issues? Unfortunately, the answer cannot be reduced to a single item as it was with the low pass filters in the stereo generator. It's a broader concern.

A System-wide View
As stated earlier, the total digital transmission path is more than a connection of linear AES/EBU signals that exist between program origination and the transmitter. While that is desirable and, hopefully, possible soon, the path actually exists as a number of cascaded digital signal types that must be made compatible with one another. For example, the digital signals may vary in sampling rate among various components in the system. Also, the use of a codec-based STL adds yet another dimension to the sonic aspects of the audio, as well as sampling rate.

Placement of the various components that make up the system can have a dramatic effect on performance. Probably the most important is audio processing. Its location within a specified system, coupled with the remaining items that comprise that particular system, will have monumental performance benefits or drawbacks. Where pre-emphasis is inserted is yet another important concern.

The only exception to this would be a digital facility where the entire plant is co-located. In that instance, it is possible to connect the path via linear AES/EBU between the studio and signal processor, and then between the processor and the exciter.

These are only the surface issues. Of equal importance is propagation time delay: Can the on-air talent continue to monitor themselves off-air? Then there is sample rate conversion: Does it adversely affect the output of an audio processor when coupled to a digital exciter?

Each of these items is critical in a digital audio path. In some instances, the path may have to deal with one or all of them. Following are in-depth views of the important aspects that comprise the digital transmission system and recommendations that yield beneficial performance improvements.

Path Configurations
The digital transmission path can be configured in a few ways. For discussion purposes, it will be assumed that some form of Studio-to-Transmitter Link (STL) is involved.

First consider whether the STL path will be linear or coded. The goal is to be as linear as possible, but sometimes circumstances may make this impossible.

If the path is coded, the next decision involves where to insert the audio processing. The preference is to place the processing at the transmitter for technical and sonic reasons, but it is much easier to adjust if installed in front of the STL. Figure 1 is an example of the processing located after the codec and at the transmitter site.

 Figure 1

Figure 2 is an example of the processing inserted before the codec. 

Figure 2

If a linear path is possible, then installation of the processing can be at either the studio or transmitter location. Good technical and sonic performance will be possible at either location.

The Components
The transmission path is literally the sum of its parts. This section describes and details each of these components so that a better understanding of the complete system will result. The one commonality to the whole system is the AES/EBU interface. The author will assume that the reader has enough understanding of this standardized protocol that further discussion and review is not warranted. AES/EBU should be viewed as the "glue" that ties each of the pieces together.

Digital FM Exciters
Digital FM exciters are the latest entry to the digital path. Capable of incredible modulation performance, the digital exciter offers two forms of signal input: analog composite (MPX), for the non-digital transmission site, and AES/EBU.

Modulating the analog MPX input digitally requires a very high sampling rate. Consider for a moment that the FM baseband spectrum extends to 99kHz, so the digital exciter must provide a sampling rate of at least 200kHz. That would provide a Nyquist frequency at 100kHz, which would cover the baseband spectrum.

The AES/EBU input accepts the Left/Right signal in the digital format. Because it still is in the discrete Left/Right state, the exciter must perform the stereo generator function. Here is where the story gets interesting.

Consider for a moment the signal at the AES/EBU input of the exciter. It might be at a different sampling rate than the exciter is expecting. If so, a sample rate converter is employed to make the proper transition. This can pose problems, as the digital filter in the sample rate converter can add overshoots to the already tightly peak-controlled audio data that is being converted.

As mentioned, the audio will have already been peak-controlled and bandlimited by the audio processor. (The processor can be either analog or digital, it does not really matter for this discussion.) The processor also would have already applied the needed pre-emphasis. Hence, the Left/Right stereo signal only needs the matrixing and MPX encoding for stereo modulation to occur. That is all.
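The matrixing and MPX encoding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any exciter's implementation: the function name `mpx_encode`, the 384kHz composite rate (borrowed from the rate suggested later in this paper), and the 90%/10% split between program audio and pilot are all choices made for the example.

```python
import math

PILOT_HZ = 19_000.0   # stereo pilot tone
FS_MPX = 384_000.0    # assumed composite sampling rate (8 x 48kHz)
PILOT_LEVEL = 0.10    # assumed pilot injection, as a fraction of full modulation
AUDIO_LEVEL = 0.90    # remainder left for program audio

def mpx_encode(left, right, fs=FS_MPX):
    """Matrix L/R into sum and difference and build the composite baseband.

    `left` and `right` are equal-length sequences, already peak-controlled
    to +/-1.0 and pre-emphasized by the audio processor, sampled at `fs`.
    """
    composite = []
    for n, (l, r) in enumerate(zip(left, right)):
        theta = 2.0 * math.pi * PILOT_HZ * n / fs
        mono = 0.5 * (l + r)   # L+R main channel
        diff = 0.5 * (l - r)   # L-R, carried double-sideband on 38kHz
        sample = (AUDIO_LEVEL * (mono + diff * math.sin(2.0 * theta))
                  + PILOT_LEVEL * math.sin(theta))
        composite.append(sample)
    return composite
```

Nothing in this sketch filters or limits the audio, so inputs held to ±1.0 can never push the composite past full modulation. That is the sense in which the exciter "only" needs to matrix and encode.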

What is present in most digital exciters is yet another low pass filter, a potential stereo limiter, and in some cases the addition of, yet again, pre-emphasis. The latter may occur if the incoming Left/Right audio signal was de-emphasized earlier in the path.

In essence, the signal that only needed to be matrixed and MPX encoded has now had additional elements of conditioning applied to it. This can degrade the modulation efficiency and sonic performance. Let's have a look at why.

Sample Rate Converters (SRC): This device transforms one system sampling rate to another. This becomes necessary when interfacing digital equipment that uses different sampling rates, thereby permitting compatibility among different systems.

In an example of changing 48kHz sampling to 32kHz sampling, the conversion is accomplished by scaling up, or interpolating, the original sampling rate, usually by a factor of ten; then, at the 10x rate of 480kHz, filtering the signal with a low pass filter that is set to the Nyquist frequency of the new, desired sampling rate. This filter is required to "smooth out" the 10x rate. If it were not used, aliasing products would result. Finally, the signal is scaled down, or decimated, by the factor needed, in this case ÷15, to achieve the new rate of 32kHz. Figure 3 is a block diagram of an SRC. While this sounds quite simple (and basically it is), there are a few issues to consider. Of main interest is the interpolation filter.
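The interpolate, filter, decimate sequence can be sketched in plain Python. The factors 10 and 15 come from the text; the filter itself (a 961-tap Hamming-windowed sinc with its cut-off at 16kHz) and the names `lowpass_taps` and `convert_rate` are assumptions for this illustration, not the design of any particular converter. The test signal is deliberately exaggerated: a 6kHz square wave that has not been band-limited to 15kHz, with sample peaks at exactly ±1.0.

```python
import math

FS_IN = 48_000           # incoming AES/EBU rate
UP = 10                  # interpolation factor from the text
DOWN = 15                # decimation factor from the text
FS_UP = FS_IN * UP       # 480kHz intermediate rate
FS_OUT = FS_UP // DOWN   # 32kHz

def lowpass_taps(num_taps, cutoff_hz, fs_hz, gain):
    """Hamming-windowed sinc low-pass, scaled so the taps sum to `gain`."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cut-off in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    scale = gain / sum(taps)
    return [tap * scale for tap in taps]

def convert_rate(x, up, down, taps):
    """Interpolate by `up` (zero-stuffing), low-pass filter, decimate by `down`."""
    y = []
    for n in range((len(x) * up) // down):
        pos = n * down   # position of this output in the up-sampled stream
        acc = 0.0
        k = pos % up     # only every `up`-th up-sampled value is non-zero
        while k < len(taps) and k <= pos:
            acc += taps[k] * x[(pos - k) // up]
            k += up
        y.append(acc)
    return y

# Assumed filter: cut-off at the Nyquist frequency of the new rate.
TAPS = lowpass_taps(961, FS_OUT / 2, FS_UP, gain=UP)

# Exaggerated input: 6kHz square wave at 48kHz, peaks exactly +/-1.0.
square = [1.0 if (i % 8) < 4 else -1.0 for i in range(480)]
converted = convert_rate(square, UP, DOWN, TAPS)
peak_in = max(abs(s) for s in square)
peak_out = max(abs(s) for s in converted)
print(f"{FS_IN} Hz -> {FS_OUT} Hz: peak in {peak_in:.2f}, peak out {peak_out:.2f}")
```

In this contrived case the converter's filter removes the 18kHz third harmonic of the square wave, and the output peaks rise to roughly 1.3 even though no input sample exceeded 1.0. A signal already band-limited by the processor's own final filter would be affected far less; the sketch only shows the mechanism.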

Figure 3

All audio processors, both analog and digital, apply some form of overshoot control in the output filtering section. In most designs, this function is performed by an integrated protection clipper working around the final low pass filter.

In each case, the overshoot component can be determined by the "Gibbs Phenomenon" [1], which states that an overshoot will occur at one-third the cut-off frequency of any low pass filter whenever a non-linear waveform is passed through it. In the case of broadcasting, the non-linear waveform would be that of a clipped signal.

Knowing that the audio bandwidth used in FM stereo is 15kHz, overshoot components will begin above 5kHz with any non-linear waveform.
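One way to see where the 5kHz figure comes from: a symmetrically clipped waveform contains odd harmonics, and once the fundamental is above 5kHz its third harmonic lies above 15kHz and is removed by the filter. The sketch below is an illustration only. It assumes the extreme case of a fully clipped (square) waveform and an ideal brick-wall 15kHz filter, and the function name `bandlimited_square_peak` is hypothetical.

```python
import math

def bandlimited_square_peak(f0_hz, cutoff_hz=15_000.0, points=4000):
    """Peak of a unit square wave once harmonics above `cutoff_hz` are removed."""
    # A square wave contains only odd harmonics, with amplitude (4/pi)/k.
    harmonics = [k for k in range(1, 1000, 2) if k * f0_hz <= cutoff_hz]
    peak = 0.0
    for i in range(points):
        phase = 2.0 * math.pi * i / points  # one period of the fundamental
        value = sum((4.0 / math.pi) * math.sin(k * phase) / k for k in harmonics)
        peak = max(peak, abs(value))
    return peak

print(f"6 kHz square wave, filtered peak: {bandlimited_square_peak(6_000.0):.3f}")
```

For a 6kHz fundamental only the fundamental itself survives the filter, and its amplitude is 4/π, about 1.27 times the clipped level. Real program material is rarely clipped this hard, so practical overshoots are smaller.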

Should the slope of the previously described up-sampled interpolation filter be steeper than the slope of the final filter in the audio processor, output overshoots may result from the sample rate conversion process. Unfortunately, these overshoots are generated after the audio processor. To remove them would require another limiting device, hence the added limiter in the digital exciter.

Not all sample rate converters will cause overshoots. But in most cases, the filtering used in the sample rate converter will have very high stop-band rejection. In all probability, it will be an FIR filter with at least 96dB rejection in the stop-band.

Of interest is the direction of rate conversion. Should the host sampling rate be higher than the incoming rate, chances of overshoot are small. This is because the up-sampled filter is set to a broader spectrum than the spectrum of the incoming signal. Potential problems may arise when converting a higher sampling rate to a lower one, as in the example of converting 48kHz to 32kHz sampling. Then the details of the above description apply.

Input Sampling Rate: It is unknown why a 32kHz sampling rate is used in broadcast paths, when it has been discussed and demonstrated that 48kHz sampling is far superior in performance. The importance is not so much the added spectrum available with 48kHz sampling, but that 32kHz causes aliasing distortion in specific instances. This is clearly demonstrated by any DSP-based audio processor designed before 1997, and extensively detailed by the author in a previous NAB presentation [2].

The use of 48kHz sampling as the AES/EBU input rate in the exciter ensures the best sonic performance. In addition, any input that needed to be converted up from 32kHz sampling would not create any overshoot component in the modulator. These factors ultimately benefit the broadcaster.

Integrated Limiter: Because of the SRC scenario described above, most digital exciters provide a baseband limiter to eliminate the overshoot problem. Also, there are certain path configurations that can cause overshoots that do not relate to the sample rate conversion process. Those are usually situations that involve use of a coded STL system where the desire is to insert the main audio processor before the encoder of the STL. It has been shown that employing audio processing in front of a codec can have sonic and modulation performance penalties. The codec issue will be discussed in an upcoming section.

The integrated limiter used in the exciter is combined with part of the stereo generator. This is not a composite clipper, but a time-delay, feed-forward limiter that controls peaks with a zero attack time. Waveforms are controlled with little or no harmonic distortion (T.H.D.) components, but the limiter will produce a higher intermodulation (I.M.) level.

Technically, this style of limiter will operate sufficiently when controlling overshoot peaks or as an additional limiter to the audio processing. Sonically, this type of limiter will produce a "busier" sound. It will sound more like a limiter that is operating with "heavy" levels of compression. That is the result of the added I.M. In the audio processing realm, adding more I.M. to an already processed signal is undesirable.
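The general idea of a time-delay, feed-forward limiter can be sketched as follows. This is a generic textbook-style illustration, not the circuit used in any exciter: the function name `lookahead_limit`, the 32-sample look-ahead, and the absence of any release smoothing are all assumptions made to keep the example short.

```python
def lookahead_limit(x, threshold=1.0, lookahead=32):
    """Feed-forward limiter whose gain is set by peaks up to `lookahead` samples ahead.

    In real-time hardware the audio is delayed by `lookahead` samples so the
    gain reduction is already in place when a peak arrives (zero attack time).
    Here the whole signal is available, so the code simply looks forward.
    """
    y = []
    for n in range(len(x)):
        upcoming_peak = max(abs(s) for s in x[n:n + lookahead + 1])
        if upcoming_peak <= threshold:
            gain = 1.0
        else:
            gain = threshold / upcoming_peak
        y.append(x[n] * gain)
    return y

limited = lookahead_limit([0.5, 0.5, 1.3, 0.5, 0.5], threshold=1.0, lookahead=2)
```

Because the peak is controlled by turning the gain down rather than by clipping the waveform, the gain signal multiplies the audio. That multiplication is a plausible source of the intermodulation products described above, and the look-ahead delay adds to the total path delay.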

Of interest to the author is that a digital overshoot clipper is not employed. It has been proven that digital composite clipping is possible while maintaining a clean spectrum. Composite clipping produces far fewer I.M. products than a delay limiter does, and it will yield cleaner sound for the same amount of limiting/clipping used.

Pre-emphasis: The exciter has the option of adding the required pre-emphasis. Optimally, the transmission system would be set up so that the pre-emphasis is generated once in the audio processor. Then, the pre-emphasized, processed signal is coupled directly to the exciter.

Broadcast audio processors employ pre-emphasis within their system architecture. Since emphasized audio must also fit within the imposed modulation limits, the processor employs specialized high frequency control sections that provide both the boost and control of the high frequency energy. In this manner, efficient high levels of modulation are easily obtained since the processor is designed and set to limit any tradeoffs resulting from pre-emphasis and high frequency limiting requirements. Basically, these two sections work in concert with one another to allow pre-emphasis to be employed, and yet control the emphasized energy content.
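To give a feel for how much high frequency boost the processor has to control, the following sketch computes the gain of a first-order pre-emphasis network. The 75 microsecond time constant is an assumption (it is the usual value in the Americas; 50 microseconds is common elsewhere), and the function name `preemphasis_gain_db` is hypothetical.

```python
import math

def preemphasis_gain_db(freq_hz, tau_s=75e-6):
    """Gain in dB of a first-order pre-emphasis network, 1 + j*2*pi*f*tau."""
    w_tau = 2.0 * math.pi * freq_hz * tau_s
    return 10.0 * math.log10(1.0 + w_tau * w_tau)

for f in (400, 2_122, 5_000, 10_000, 15_000):
    print(f"{f:>6} Hz: {preemphasis_gain_db(f):5.1f} dB")
```

With these assumptions the boost at 15kHz is about 17dB, which is why emphasized material needs dedicated high frequency limiting to stay within the modulation limits.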

In situations where a codec STL system is used and the audio processing is inserted before the encoder, the codec must pass "flat" (non-pre-emphasized) audio. This requires adding de-emphasis to the output of the processor in order to send the restored "flat" signal to the codec. Figure 4 illustrates this. A flat signal is required by the coder because of the use of masking in the encoding process. Any significant change, or imbalance of the frequency spectrum, can cause the threshold curve of the coding system to have a profound effect on the output of the coded audio [3].

Figure 4

These additional de-emphasis and pre-emphasis steps will add modulation overshoot to the total transmission system. To eliminate the added overshoot, another limiter must be employed.

Unfortunately, tests have shown that operating a transmission processor with an emphasized output into a codec will generate audible high frequency distortion. This occurs because the signal presented to the codec's masking process is not spectrally flat, and a flat spectrum is what the masking model expects to operate on.

Based upon the previous discussion, one can see that it is advantageous to install the audio processing system as close to the exciter as possible. Doing so allows the emphasis to be applied in the processing and a "flat" input to be used on the exciter. Also, internal limiting in the exciter becomes unnecessary, which allows the audio processing system to provide all of the required peak control.

STL Systems
There is a choice in digital STL systems: linear or coded. The linear systems have mainly been available as "nailed-up" data links (such as T-1), but recently there has been the introduction of "uncompressed" radio links as well. In analyzing the STL link, two items are of critical interest: time delay and coding algorithm.

The first item, time delay, is an issue with either the linear or coded STL system, as there is a propagation delay (in milliseconds) for the audio data to travel from the input to the output of the system. If the delay is excessive, then the on-air talent cannot comfortably monitor themselves off-air. This occurs because the off-air audio in their headphones is delayed relative to the arrival time of their voice directly through bone conduction.

The following table, based on real-world tests, was created by a radio engineer to illustrate the effects of time delay for on-air talent:

• 1-3 ms: Undetectable delay.

• 3-10 ms: Shift in voice character audible to person speaking. (comb filter effect)

• 10-30 ms: A slight echo turning to obvious slap @ 25-30 ms.

• 30-50 ms: Disturbing echo, disorienting the announcer.

• >50 ms: Too much delay for live monitoring.

Thanks to Jeff Goode in Indianapolis for providing this information! The key break points seem to be that delays of 5-7 ms are not a problem; from 10-30 ms most announcers can work live, but anything above 25-30 ms is annoying. These are real-world tests.

While the subject of time delay is being discussed, it must be pointed out that time delay is a cumulative item. Each device that exhibits any delay will add up to produce a total system delay. Should this system delay exceed some of the figures in the above chart, then off-air monitoring will not be possible. In addition to the STL system, audio processors, digital exciters, and even digital modulation monitors can generate delay.
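Because the delays accumulate, it can help to keep a simple delay budget for the path. The sketch below adds up per-device delays and looks the total up against the break points in the table above. The device figures are hypothetical placeholders, not measurements of any product, and the names `DELAY_EFFECTS` and `monitoring_effect` are invented for the example.

```python
# Break points taken from the table above: (upper bound in ms, effect).
DELAY_EFFECTS = [
    (3.0, "undetectable"),
    (10.0, "shift in voice character (comb filter effect)"),
    (30.0, "slight echo, turning to obvious slap"),
    (50.0, "disturbing echo, disorienting the announcer"),
]

def monitoring_effect(total_ms):
    """Return the table entry that applies to a total path delay in ms."""
    for upper_ms, effect in DELAY_EFFECTS:
        if total_ms <= upper_ms:
            return effect
    return "too much delay for live monitoring"

# Hypothetical per-device delays in ms. Real figures must come from the
# manufacturers' specifications or from measurement.
path_ms = {
    "audio processor": 4.0,
    "STL": 12.0,
    "digital exciter": 3.0,
    "modulation monitor": 2.0,
}
total_ms = sum(path_ms.values())
print(f"total {total_ms:.1f} ms: {monitoring_effect(total_ms)}")
```

With these placeholder numbers no single device is a problem on its own, yet the 21 ms total already lands in the "slight echo" range.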

The second item, coding algorithm, is only applicable if a non-linear (coded) STL is used. These devices make use of "lossy" data reduction algorithms to compress the audio so it can fit within the existing bandwidth of the STL system. While there are a number of specific algorithms from which to choose, most STL manufacturers have made use of proprietary digital formats that are derivatives of prior development. Most common usage has been ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-x, and Dolby AC-2.

Detailed operation of the above-mentioned algorithms is not needed for this discussion, as each system possesses both strengths and weaknesses for this application.

What is of importance here is to understand that each audio coding algorithm will have a specific sonic effect on the audio. Along with the fact that the signal will also be dynamically controlled, the use of audio coding can have a negative effect on the performance of the audio processor. It has been demonstrated that any use of a coded STL should be done where the audio processor is inserted after the coding. There will be less degradation of the audio, and the processor will perform peak control more efficiently.

Sampling Rate
As stated earlier, the issue of sampling rate must be looked at again: 32kHz is simply too low. With the cost of DSP chips and converter sets constantly falling, there is no reason why 48kHz cannot be used. We have now reached a point in the professional audio domain where 96kHz is the desired sampling rate. How will that affect the broadcast community?

Transmission systems thus far have used 32kHz as a base sampling rate, which in turn sets the Nyquist frequency at 16kHz. Considering that conventional FM stereo broadcasting requires 15kHz of audio bandwidth, this leaves only 1kHz of guard band spectrum before the Nyquist point. To accommodate this, a very steep filter must be employed in order to suppress all energy by at least 96 dB at the Nyquist frequency, or aliasing occurs. This can be accomplished digitally using a finite impulse response (FIR) filter. The only drawback is that many "taps" are required within the filter to achieve this level of stopband rejection. The significance of the "taps" is that the filter delays the signal by one sample for every two taps. For a 15kHz FIR filter of this magnitude, 101 "taps" are needed. This in turn results in a delay of 50 samples, equating to 1.56 milliseconds of propagation delay through the filter.

Add up the number of 15kHz filters employed in a 32kHz sampled transmission path, and those alone will create enough time delay to disorient on-air announcers.
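The arithmetic behind the 1.56 millisecond figure, and how it scales with the number of cascaded filters, can be checked with a short sketch. It assumes linear-phase FIR filters, whose delay is (taps - 1) / 2 samples; the function name `fir_delay_ms` and the count of four filters are illustrative assumptions.

```python
def fir_delay_ms(num_taps, fs_hz):
    """Delay in ms of a linear-phase FIR filter: (taps - 1) / 2 samples."""
    return (num_taps - 1) / 2 / fs_hz * 1000.0

one_filter = fir_delay_ms(101, 32_000)   # the 101-tap example from the text
cascade = 4 * one_filter                 # assumed: four such filters in the path
print(f"one filter: {one_filter:.2f} ms, four in cascade: {cascade:.2f} ms")
print(f"same tap count at 48kHz: {fir_delay_ms(101, 48_000):.2f} ms")
```

At 48kHz each sample period is shorter, and the wider guard band should allow fewer taps as well, so the real saving would likely be larger than the same-tap-count comparison suggests.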

A broad question is: Why the use of 32kHz as a base sampling rate? Tests and research show that a base of 48kHz would make all of the aforementioned problems much easier to deal with. The guard band to the Nyquist frequency is much wider, which in turn moves out the aliasing point. This would allow a final filter with a less stringent time response, and the propagation delay associated with 48kHz would itself be shorter, which makes this rate more desirable.

Most likely, 32kHz sampling was chosen in the past, because there would be more machine cycles available to handle the workload. That would be the only reason to possibly support a lower sampling rate.

Digital Exciter Proposal
In the discussion section about the digital exciter, numerous options were explained about the interfacing possibilities of the audio processor to the exciter. All of them revolve around the usage of the AES/EBU input protocol. In that configuration, the audio data arrives in Left/Right format and requires the exciter to perform the MPX generation.

Question: Why can't the digital audio processor, which has the MPX encoder built in, connect its digitally generated baseband signal directly to the digital modulator of the exciter? This would be analogous to the analog composite input on any exciter.

Figure 5 is a block example of this.

Figure 5

Unfortunately, there is not a standard for transporting the MPX baseband signal, but an "official" standard is not needed. Consider the following: In either the audio processor or the digital exciter, a fast sampled section is needed to accommodate the MPX signal. (Remember we're dealing with a spectrum out to 100kHz.)

Knowing that most DSP families use some form of serial data stream, it is possible to "publish" the needed input data format so that an audio processor can provide that format as a digital composite output. The important factor is that both systems agree on sampling rate. Again, the use of a very fast rate is advantageous, say, 384kHz. That's 8 times 48kHz.

Naturally, this type of configuration would require installing the processing at the transmitter facility, since transporting a digital composite signal of this speed and size would be cost prohibitive. Current generation digital processors provide some form of computer control via modem or network, making processing at the transmitter somewhat less inconvenient.

It is curious that none of the digital exciters available provide, or even propose, an application like this. It provides the best possible coupling to the exciter, and the performance benefits are significant. Imagine having the power of a complete digital processing system and integrated stereo generator that is directly connected to a digital modulator. Now we're talking about super efficient modulation capability. Zero overshoots due to added emphasis, coding, or sample rate converters. That's real power.

Conclusion
The total digital transmission path is capable of providing some outstanding performance results. To achieve this, audio processing must be inserted at the transmitter site, and a "flat" input should be used on the digital exciter. If an STL system is employed, a linear system would be preferable, but a high bitrate coded system is acceptable as long as the dynamics processing occurs after the coding.

As long as the systems design engineer in a broadcast facility is aware of these critical issues, there is no reason why an all-digital broadcast facility cannot exist today, providing broadcasting of exceptional quality.

REFERENCES

[1] Baher, H. Analog & Digital Signal Processing, J. Wiley & Sons, 1990

[2] Foti, F. Digital Broadcast Audio Processing: Finally, The New Frontier, NAB Convention, 1997

[3] Mendenhall, G. Pre-emphasis and Limiting Considerations for Audio Processors and Digital Studio-to-Transmitter Link, White Paper, 1995
