
Broadcast Signal Processing and Audio Coding: Are We Trying to Mix Oil with Water?

Frank Foti
Cutting Edge Technologies
Cleveland, Ohio, USA

In the broadcast industry there is now a plethora of coded audio systems that use some form of data reduction and/or compression. While this benefits data storage and point-to-point transmission, it poses a challenge for the dynamic range signal processor used in broadcasting.

With audio coding located in the broadcast transmission path, as in the case of a digital Studio to Transmitter Link (STL), a broadcaster faces a number of considerations and compromises. Because the non-linear properties of both an audio codec and a broadcast audio processor are being applied to the same signal, there can be audible on-air penalties to pay. The discussion will show where these non-linear processes can form an unhappy relationship!

Further discussion will reveal the known, and possibly unknown, pitfalls of using audio coding and dynamic signal processing together. Since the coded audio signal is now commonplace in today's broadcast facility, we must examine, and possibly rethink, the broadcaster's approach to signal processing. With Digital Audio Broadcasting (DAB) becoming a reality, how will the industry cope with dynamic signal control and coded audio, now and in the future?

Background
Signal processing for broadcast transmission can almost be thought of as a combination of engineering, art, and science. All transmission processors are derived from a mixture of these three ingredients. Please understand that while any specific system will meet the same kind of technical specifications, each is designed and developed using a different form of engineering art. The following sections explore the ingredients of this mixture.

Engineering
The engineering aspect is practically a given. The use of adequately designed transmission processors ensures that modulation control, signal parameter standards, and government limitations are all adhered to. These address the maximum modulation level, which corresponds to either maximum carrier amplitude (AM) or maximum frequency deviation (FM), as well as the guidelines for pre-emphasis standards and bandwidth limitations.

Most transmission processors are multiband devices. They divide the audio signal into a preset number of frequency bands, then perform processing functions such as automatic gain control (AGC), equalization (EQ), dynamic limiting, and hard limiting, or clipping. Within the system are sections designed to provide the pre-emphasis and bandwidth limiting required by the respective transmission medium. Some processors also include a stereo generator/coder for use wherever multiplex FM stereo is broadcast.

The job of the processor is not only to prevent overmodulation of the transmitter and ensure the proper audio bandwidth; it is also a tool for creating a specific aural presentation of the programme. Because the multiband structure acts as a form of automatic equalization, it yields a consistent spectral balance from one programme source to the next.

Art
Art covers the abstract and expressionistic perspective. In most broadcast applications, this might be viewed as the most important component. Here is where the aural signature is created, known in the broadcast community as "the sound", or "the sound of the station." It is created less by engineering means and more through the adjustment and manipulation of the signal processing system, to the point that it is almost considered an artform, or "black magic." Although each transmission processor can achieve the same engineering specifications, it is in the adjustment and operation of these devices that the sonic differences arise.

It has become the normal state of operation in the USA, and now the world over, to judge a station's sound by two key factors: quality and perceived loudness. Quality refers to the purity of the transmitted audio signal, that is, how closely the transmission mirrors the sound of the originating programme. Perceived loudness is more of a competitive term: the subjective judgement of how much louder one broadcast station sounds when compared to another.

There has been, and continues to be, a belief in this industry that perceived loudness is desirable. Many broadcast programmers hold the psychological belief that, when tuning across the dial, the listener will find a louder station more attractive. While this issue has been discussed and debated within the broadcast industry for many years, generating a high degree of loudness remains a crucial requirement of any transmission processing system.

There is a compromise between the two: the more perceived loudness is pursued, the more quality degrades. This happens because of the processing tradeoffs involved. As more processing is employed to reduce the peak-to-average ratio of the programme material, perceived loudness increases.

Meanwhile, dynamic range decreases, and total harmonic distortion (THD), intermodulation distortion (IMD), or a combination of both, increase. Hence the reduction in quality.
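To make this tradeoff concrete, here is a minimal Python sketch, assuming a single sine tone that is hard-clipped at a chosen level and then rescaled so the clipped peak sits at the modulation ceiling. This is a deliberately simplified stand-in for a processor's final limiter (real processors add multiband gain control and distortion-controlled limiting). The function name `clip_stats` and the choice to sum harmonics only up to the 15th when computing THD are illustrative assumptions.

```python
import math

def clip_stats(clip_level, n=1024, max_harmonic=15):
    """Hard-clip one period of a unit-amplitude sine at +/-clip_level, rescale so
    the clipped peak is 1.0 (the modulation ceiling), and return
    (crest factor in dB, THD in percent up to max_harmonic)."""
    x = [max(-clip_level, min(clip_level, math.sin(2 * math.pi * i / n))) / clip_level
         for i in range(n)]
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    crest_db = 20 * math.log10(peak / rms)

    def harmonic_amplitude(k):
        # Single-bin DFT: with exactly one period in the buffer, bin k is harmonic k.
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        return 2 * math.hypot(re, im) / n

    fundamental = harmonic_amplitude(1)
    residual = math.sqrt(sum(harmonic_amplitude(k) ** 2
                             for k in range(2, max_harmonic + 1)))
    return crest_db, 100 * residual / fundamental

for level in (1.0, 0.8, 0.5):
    crest, thd = clip_stats(level)
    print(f"clip at {level:.1f}: crest factor {crest:.2f} dB, THD {thd:.1f} %")
```

Lowering the clip level lowers the crest factor, which is what raises perceived loudness for a fixed peak ceiling, and at the same time raises the harmonic distortion.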

While this paper is not an essay on the artform of broadcast signal processing, it is the humble opinion of this author that a pleasing compromise does exist, one that will produce a high level of perceived loudness while maintaining an equally high degree of quality!

Science
Another key ingredient is science. For any processing system to accomplish the two previously discussed goals, a number of scientific issues must be dealt with. One of these is peak level overshoot. Since stringent peak modulation limits exist in most countries, the processing system must hold all overshoot components of a signal to within 2% of the maximum permissible level. This is easier said than done!

Another important requirement is bandwidth. Depending on the broadcast medium, AM or FM, the audio bandwidth can be as narrow as 5 kHz for short wave and as broad as 15 kHz for FM stereo. Again, most countries impose bandwidth limitations that must be adhered to. Although these issues were mentioned in the engineering section above, their scientific conditions deserve mention here for a better understanding of the total transmission system.

Overshoots and bandwidth, while two separate issues, are actually by-products of one another. As the discussion will soon reveal, a non-linear waveform must be able to pass through a filter of specified bandwidth, and the result must be free of overshoots and out-of-band products. An analogy is the old adage of trying to fit a square peg into a round hole!

Most processing systems employ some form of hard limiter for final peak level control. Whether it is a simple clipper or an integrated limiter with associated distortion control, it maintains the peak ceiling level. Unfortunately, this non-linear process produces harmonic products that fall outside the permissible passband. To control, or remove, these products, some form of filtering must be employed. Since the bandwidth limitations in most cases require severe attenuation of out-of-band products, a filter with a very steep cutoff must be used. A filter with an elliptical response is well suited to this.

Here is where the paradox begins. All filters produce overshoot. Known as the Gibbs effect[1], the overshoot characteristic can be calculated to begin at one-third of the cutoff frequency. This is the point in the filter passband above which the third harmonic of any non-linear signal is removed: for a fundamental at or above one-third of the cutoff frequency, the third harmonic lies at or beyond the cutoff. This can be demonstrated by passing a square wave through a low-pass filter. Signals of a sinusoidal nature pose no problem, as each contains little, if any, harmonic content. Whenever final limiting is employed by the processor, however, the non-linear process reshapes the waveform toward a square wave or some derivative thereof, and here is where the Gibbs effect comes into play.
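The square-wave case can be sketched numerically. The Python below assumes an ideal brickwall low-pass with perfectly uniform group delay (a mathematical idealisation, not a realisable filter) and a square wave that steps between 0 and 1, so overshoot is measured against the height of the step. It keeps only the odd harmonics that fall below the cutoff and reports the resulting peak. The function name `bandlimited_square_peak` and the 15 kHz cutoff used in the loop are illustrative assumptions.

```python
import math

def bandlimited_square_peak(f0_hz, cutoff_hz, samples=4000):
    """Peak of a square wave stepping between 0 and 1 (fundamental f0_hz) after an
    ideal brickwall low-pass at cutoff_hz with uniform group delay.
    Returns (peak, number_of_odd_harmonics_passed)."""
    # Odd harmonics k*f0 that lie strictly below the cutoff survive the filter.
    passed = [k for k in range(1, int(cutoff_hz // f0_hz) + 1, 2)
              if k * f0_hz < cutoff_hz]
    peak = 0.0
    # Fourier series: 1/2 + (2/pi) * sum(sin(k t)/k). The peak lies in the
    # "high" half-period (0, pi], so only that half is sampled.
    for i in range(1, samples + 1):
        t = math.pi * i / samples
        value = 0.5 + (2 / math.pi) * sum(math.sin(k * t) / k for k in passed)
        peak = max(peak, value)
    return peak, len(passed)

for f0 in (6000, 4000, 1000, 100):
    peak, n = bandlimited_square_peak(f0, 15000)
    print(f"{f0:>5} Hz: {n:>2} odd harmonics pass, overshoot {100 * (peak - 1):.1f} %")
```

With the fundamental above one-third of the cutoff only the fundamental survives and the overshoot is largest (about 13.7% of the step); as more harmonics pass, the peak settles toward the Gibbs limit of roughly 1.09. Note that this roughly 9% figure is measured against the height of the step; for a waveform swinging symmetrically between -1 and +1 the step is 2 units, so the same limit corresponds to a peak of about 1.18 times the clip level.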

Adding to this equation is the amount of overshoot contributed by the filter itself. An elliptical filter by its nature possesses a large amount of non-linear group delay, and overshoots as high as 70.7% can be realized in any filter with non-linear group delay. For overshoot to be minimal, a corresponding group delay equalizer must be used.

The Gibbs effect research indicates that any low-pass filter with uniform group delay will produce an overshoot component, on a non-linear waveform, that will not exceed 1.09 times the original amplitude, or 9% overshoot. Since the permissible system overshoot was stated earlier to be less than 2% of the maximum limit, there is still a 7% disparity in the overshoot component. If the system filters are not group delay compensated, the Gibbs overshoot can reach 1.707 times the original amplitude, or 70.7% overshoot!
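Why group delay matters can be shown with a sketch in which only phase changes. The Python below assumes a hypothetical "filter" with an ideal, flat magnitude response that passes the fundamental and third harmonic of a ±1 square wave at their original levels; in the second case it delays the third harmonic by half of its period (a 180° phase shift), an extreme chosen purely for illustration rather than a model of any real elliptical filter. The function name `peak_of_partials` is also an illustrative choice.

```python
import math

def peak_of_partials(partials, samples=3600):
    """partials: list of (harmonic_number, amplitude, phase_radians).
    Returns the largest absolute value of their sum over one period."""
    peak = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        value = sum(a * math.sin(k * t + phi) for k, a, phi in partials)
        peak = max(peak, abs(value))
    return peak

# Fundamental and third harmonic of a +/-1 square wave: identical magnitudes,
# differing only in the phase of the third harmonic.
in_phase = [(1, 4 / math.pi, 0.0), (3, 4 / (3 * math.pi), 0.0)]
shifted = [(1, 4 / math.pi, 0.0), (3, 4 / (3 * math.pi), math.pi)]
print(f"uniform delay: {peak_of_partials(in_phase):.3f}, "
      f"delayed third harmonic: {peak_of_partials(shifted):.3f}")
```

The magnitude spectrum is the same in both cases, yet the peak rises from about 1.20 to 16/(3π), about 1.70, which is why a group delay equalizer is needed before any amount of clipping headroom can be trusted.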

Some systems deal with this disparity by adding further clippers and subsequent filters to control any residual overshoot. Practice has shown that while this approach is reliable, there are audible penalties to be paid, as the additional clipping can be heard under normal operating conditions.

Others use proprietary engineering art to eliminate the overshoot disparity scientifically and mathematically. This approach actually takes advantage of the Gibbs overshoot component, using it both for overshoot control and for reducing the distortion products induced by final limiting. It is again the humble opinion of this author, speaking also as a designer, that the latter method is preferable.

Taking all of the above scientific issues into account, the end result of any processing system is a precise signal with exacting amplitude control and limited audio bandwidth, with minimal or no overshoot. While lengthy, this background information will serve as reference material when discussing the insertion of an audio codec into the transmission path.

Analog Studio Transmitter Link (STL)
The analog Studio Transmitter Link (STL) is a time-tested and proven method of connecting the audio signal of a broadcast studio to the transmitter. It comes in two forms, discrete or composite channel.

The discrete channel system uses two independent STL frequencies, or links, to carry the audio channels. In a stereo system, the left and right channels each use an independent discrete STL channel. Since the stereo multiplex signal occupies a bandwidth of at least 53 kHz, while discrete systems provide only 15 kHz of audio bandwidth, this usually requires that all of the processing be located at the transmitter site. It is also possible to locate the processing for the discrete channels at the studio and employ the stereo generator/coder at the transmitter site. For the latter to operate properly, the discrete STL channels must pass the signal with zero overshoot and with flat (constant) group delay within each 15 kHz passband. Figure-1 is an example of a discrete STL system.



Figure-1, Discrete Analog STL System

Composite STL systems can pass the 53 kHz stereophonic multiplex signal. An advantage of this system is that it uses only one STL frequency, or link, and the processing system can be located at the studio. Of importance here are the linearity characteristics of the system's composite modulator and demodulator. As long as group delay variation is minimal across the 53 kHz passband, precise modulation of the composite signal is possible through this type of STL. The disadvantage is that over a long path, any signal degradation results in added noise in the multiplex signal fed to the FM exciter/transmitter. The composite STL is illustrated in Figure-2.


Figure-2, Composite Analog STL System
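The composite STL must carry the whole multiplex baseband. As a rough sketch of how that baseband is built, the Python below forms one composite sample from left and right audio using the standard 19 kHz pilot and the 38 kHz double-sideband suppressed-carrier L-R subcarrier. The 10% pilot injection, the function name `mpx_sample`, and the omission of the 15 kHz low-pass, pre-emphasis, and any composite limiting are simplifying assumptions; this is not a stereo generator design.

```python
import math

PILOT_HZ = 19_000  # stereo pilot; the L-R subcarrier sits at twice this, 38 kHz

def mpx_sample(left, right, t, pilot_injection=0.1):
    """One sample of an FM stereo composite signal at time t (seconds).
    left/right are audio values in [-1, 1]. The pilot injection is an assumed
    typical value; the audio share is scaled so the total stays within +/-1."""
    audio_scale = 1.0 - pilot_injection
    mono = (left + right) / 2        # L+R main channel, audio up to 15 kHz
    stereo = (left - right) / 2      # L-R, DSB-SC around 38 kHz (23 to 53 kHz)
    subcarrier = math.sin(2 * math.pi * 2 * PILOT_HZ * t)
    pilot = math.sin(2 * math.pi * PILOT_HZ * t)
    return audio_scale * (mono + stereo * subcarrier) + pilot_injection * pilot

# Highest composite frequency: 38 kHz subcarrier + 15 kHz audio = 53 kHz.
print(2 * PILOT_HZ + 15_000)
```

Because |(L+R)/2| + |(L-R)/2| equals the larger of |L| and |R|, the audio part never exceeds the audio share, so the composite peak stays at or below 1.0 for any full-scale left and right inputs.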

It is therefore desirable for any transmission system to maximize the performance of the STL, as this produces low noise and distortion while optimizing modulation capability.

Digital STL
Over the last ten years, the digital STL system has emerged. With it comes the possibility of a totally digital broadcasting facility, encompassing a digital mixing console, signal routing, processing, and STL, and ending with a digital modulator for the transmitter. With Digital Audio Broadcasting (DAB) having begun in calendar year 1995, this possibility has in fact become a reality!

Existing analog broadcasting systems are also taking advantage of digital STL systems, as they too have started to replace older analog equipment with newer digital counterparts.

Enter the Codec
For an STL to transport a digitized high quality stereo audio signal in a linear, real-time format would require a capacity of about 700 kbps for each audio channel[2]. This exceeds the allowable bandwidth of existing RF microwave STL systems. It is possible over a dedicated digital communications link such as a T-1 service, but link availability and cost have made this option difficult to use. As in the analog STL domain, the cost of bandwidth for either analog or digital service remains quite high.
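The "about 700 kbps" figure follows from simple arithmetic: linear PCM bitrate is sample rate times bits per sample times channels. The Python below assumes 16-bit samples at 44.1 kHz (48 kHz gives 768 kbps instead) and ignores framing and error-correction overhead; the 256 kbit/s coded stereo rate used for the reduction ratio is a hypothetical example, not a figure from any particular STL.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Linear PCM bitrate in kbit/s (no framing or error-correction overhead)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

per_channel = pcm_kbps(44_100, 16)          # 705.6 kbit/s per channel
stereo = pcm_kbps(44_100, 16, channels=2)   # 1411.2 kbit/s for a stereo pair
coded_stereo = 256                          # hypothetical coded STL rate, kbit/s
print(per_channel, stereo, stereo / coded_stereo)
```

Even at that hypothetical coded rate the codec must discard more than four out of every five bits, which is why the perceptual tricks described next are unavoidable.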

A very popular technique is to use an audio codec to reduce the data requirement of a digital STL system. These devices use "lossy" data reduction algorithms to compress the bitrate down to a size that will fit within the existing bandwidth of an STL system. While there are a number of specific algorithms to choose from, most STL manufacturers have used proprietary digital formats derived from prior development. The most commonly used are ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-X, and Dolby AC-2.

The basic operation of a "lossy" data reduction system stems from a technique known as perceptual coding. Simply stated, a masking signal, or masker, establishes a threshold curve around itself that follows the behavior of the human auditory system. Any signal component that falls below this threshold curve is inaudible and is essentially discarded: in the digital domain, the audio data that falls below the threshold curve is dropped by the algorithm, and thus data reduction is accomplished.
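The masking principle can be illustrated with a toy model, which is far simpler than any real coder's psychoacoustic model. The Python below uses two well-known approximations, Terhardt's formula for the absolute threshold of hearing and the Zwicker/Terhardt Bark (critical-band) scale, and assumes that each tone casts a masking skirt that starts a hypothetical 10 dB below its own level and falls off at a hypothetical 15 dB per Bark on either side. A tone survives only if it rises above every threshold; in this toy, discarded tones are simply dropped.

```python
import math

def bark(f_hz):
    # Zwicker/Terhardt approximation of the critical-band (Bark) scale.
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

def threshold_in_quiet_db(f_hz):
    # Terhardt's approximation of the absolute threshold of hearing, dB SPL (f_hz > 0).
    k = f_hz / 1000
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

def audible_components(components, offset_db=10.0, slope_db_per_bark=15.0):
    """components: list of (freq_hz, level_db_spl) tones. A tone is kept only if it
    exceeds both the threshold in quiet and the masking skirt of every other tone.
    offset_db and slope_db_per_bark are hypothetical toy values."""
    kept = []
    for f, level in components:
        threshold = threshold_in_quiet_db(f)
        for mf, mlevel in components:
            if mf == f:
                continue  # a tone does not mask itself
            skirt = mlevel - offset_db - slope_db_per_bark * abs(bark(f) - bark(mf))
            threshold = max(threshold, skirt)
        if level > threshold:
            kept.append((f, level))
    return kept

tones = [(1000, 80), (1100, 50), (4000, 40), (15000, 5)]
print(audible_components(tones))
```

With these toy parameters the 1.1 kHz tone is hidden by its loud 1 kHz neighbour and the quiet 15 kHz tone falls below the threshold in quiet, so only the 1 kHz and 4 kHz tones are kept. Raising the level of the high frequencies, as pre-emphasis does, moves both the skirts and the tones, which changes what the coder decides to throw away.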

Detailed operation of the above mentioned algorithms is not needed for this discussion, as the intent is to focus on the actual effects that data reduction algorithms have on processed audio. Suffice it to say that each system possesses strengths and possible weaknesses for this application. This discussion does not intend to recommend any audio coding technique as a standard, or as preferable to another.

Application & Usage
The coded STL can be inserted into an existing analog composite STL path. An encoder inserted at the STL transmitter accepts discrete left and right audio. The STL receiver outputs a data signal to a matching decoder, which reconstructs the individual channels. Sampling rate and transmission speed (bitrate) are configured for each specific system.

Pre-emphasis
All broadcast applications use some form of pre-emphasis boost. For FM broadcasting, North American countries use 75 µs pre-emphasis, whereas 50 µs is used in most of the rest of the world. Medium-wave (AM) uses an optional modified 75 µs emphasis.
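The size of the boost is easy to estimate. The Python below assumes an idealised first-order pre-emphasis network with time constant τ, whose gain is |H(f)| = sqrt(1 + (2πfτ)²), and ignores the upper corner that some practical emphasis networks include. The function name `preemphasis_gain_db` is an illustrative choice.

```python
import math

def preemphasis_gain_db(f_hz, tau_s):
    """Boost of an ideal first-order pre-emphasis network with time constant tau_s:
    |H(f)| = sqrt(1 + (2*pi*f*tau)^2), returned in dB."""
    return 10 * math.log10(1 + (2 * math.pi * f_hz * tau_s) ** 2)

for tau in (75e-6, 50e-6):
    corner_hz = 1 / (2 * math.pi * tau)
    gains = [round(preemphasis_gain_db(f, tau), 1) for f in (1_000, 5_000, 10_000, 15_000)]
    print(f"{tau * 1e6:.0f} us (corner {corner_hz:.0f} Hz):", gains)
```

At 15 kHz the 75 µs curve adds about 17 dB and the 50 µs curve about 13.7 dB, which is the high-frequency energy the processor must then control.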

Transmission signal processors incorporate pre-emphasis in their architecture. Since the emphasized audio must also fit within the imposed modulation limits, the processor employs specialized high-frequency control sections that both apply the emphasis boost and control the resulting high-frequency energy. In this way, efficient high levels of modulation are easily obtained, as the processor is designed and set to limit the tradeoffs imposed by the pre-emphasis and high-frequency limiting requirements. Basically, these two sections work in concert to allow pre-emphasis to be employed while controlling the emphasized energy content.

One of the critical requirements of any codec is that its input audio be spectrally "flat". The term "flat" is used here to mean a signal to which no additional EQ has been added beyond the original. This requirement stems from the operation of the masker signal in the coding process: any significant change, or imbalance, in the frequency spectrum can shift the threshold curve of the coding system and profoundly affect the coded audio output[3].

Now, think back to the earlier description of the transmission processor. It is a multiband processing system that provides constant leveling and equalization of the signal, along with pre-emphasis. This stands in stark contrast to what a coding device is looking for! While no formal, substantiated testing has been performed, casual listening tests have revealed that passing a highly processed signal through a codec exposes many sonic weaknesses. These can take the form of losses in clarity, detail, and stereo image localization, along with an increase in audible distortion.

Unless there is careful thought and planning in the design of the digital STL path, here is where trouble can begin!

Digital Transmission Path Configurations
Previous discussion would lead one to believe that the only way to configure a digital path is to install all processing at the final transmission point and use the digital path as a "flat" signal pipe to the processor, Figure-3.


Figure-3, Digital STL System

There are a number of alternatives to be discussed, each with its own advantages and drawbacks.

Probably the most logical setup is the configuration just mentioned, where the processing is located at the transmitter facility and the STL serves as a digital pipe. The advantage is that the audio signal is delivered to the coded STL in a "flat" form, and the processing system is located as close to the transmitter as possible. This yields the most efficient modulation performance, as the final output is coupled to the transmitter in a manner where nothing can alter the amplitude-controlled, bandwidth-limited signal.

The drawback is that all adjustments must be made at the transmission facility (although a number of processing systems now available offer control via computer and modem). Another difficulty arises in a multiple transmitter network, a very popular arrangement in Europe, where a single studio facility provides programme to an entire country or region. While a single encoder feeds a satellite uplink or terrestrial microwave, each network affiliate location must then install a complete processing system. Although this makes for the best possible transmission system, it is both costly and difficult to adjust when needed.

A first alternative would be to install the discrete processing at the studio, and provide the stereo generator/coder at the transmitter, Figure-4.


Figure-4, Digital STL System: Alternative Configuration

The benefit here is that processing adjustments can be made from the originating location, with an equal effect on any additional affiliates if used in a network. It also reduces equipment cost.

Here the drawback can affect modulation and sonic performance. Earlier discussion revealed that the input to the encoder must be "flat". Since the processing system employs pre-emphasis, a complementary de-emphasis must be switched into its output so that the resultant signal is "quasi-flat". Unfortunately, this quasi-flat signal still possesses any changes implemented by the multiband operation. These might take the form of multiband EQ or, even more importantly, any non-linear functions applied, such as final clipping.
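Why "quasi-flat" is not the same as flat can be shown with a minimal sketch. The Python below assumes a simplified first-order digital emphasis pair, y[n] = x[n] - a·x[n-1] and its exact inverse x[n] = y[n] + a·x[n-1], with a hypothetical coefficient a = 0.7575 (roughly exp(-1/(48 kHz × 75 µs))); it is not an exact realisation of the broadcast 75 µs curve. Undoing the emphasis restores the original exactly, but only if nothing non-linear happens in between.

```python
import math

A = 0.7575  # hypothetical coefficient, roughly exp(-1 / (48e3 * 75e-6))

def pre_emphasis(x, a=A):
    """First-order digital emphasis: y[n] = x[n] - a*x[n-1] (high-frequency tilt)."""
    return [v - a * (x[i - 1] if i else 0.0) for i, v in enumerate(x)]

def de_emphasis(y, a=A):
    """Exact inverse of pre_emphasis: x[n] = y[n] + a*x[n-1]."""
    out = []
    for v in y:
        out.append(v + a * (out[-1] if out else 0.0))
    return out

def clip(x, ceiling):
    return [max(-ceiling, min(ceiling, v)) for v in x]

fs = 48_000
tone = [0.5 * math.sin(2 * math.pi * 10_000 * n / fs) for n in range(480)]
emphasised = pre_emphasis(tone)
flat = de_emphasis(emphasised)                     # linear round trip
quasi_flat = de_emphasis(clip(emphasised, 0.4))    # clipper between the two
flat_error = max(abs(o - r) for o, r in zip(tone, flat))
quasi_flat_error = max(abs(o - r) for o, r in zip(tone, quasi_flat))
print(flat_error, quasi_flat_error)
```

The linear round trip is exact to floating-point precision, while the clipped version carries the clipper's distortion straight through the de-emphasis into the signal the codec must encode.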

The true test of any coded STL set up in this configuration is how well it can pass the filtered non-linear signal from the processor. Since heavy processing generates a high level of bandlimited harmonic content, the codec's task is to reproduce this bandlimited signal faithfully at the decoder outputs. Any alteration of this signal will introduce audible distortion and overshoot into the system. This degrades the performance of the processor: the sonic quality of the audio suffers, and modulation efficiency is lost, due in part to the codec-generated overshoot that must now be compensated for. In this form of configuration there are two possible methods of making the best of the situation.
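A small numeric sketch shows how an alteration by the codec turns into overshoot. The Python below assumes a hypothetical processed waveform made of a fundamental plus a third harmonic in square-wave proportion, scaled so its peak sits exactly at the 1.0 ceiling, and a hypothetical codec that discards the weaker third harmonic while passing the fundamental untouched. Real codecs alter signals in far more varied ways; this is only the simplest case that makes the effect visible.

```python
import math

def peak(partials, samples=3600):
    """Largest |sum of a*sin(k*t)| over one period; partials = {harmonic: amplitude}."""
    return max(abs(sum(a * math.sin(k * 2 * math.pi * i / samples)
                       for k, a in partials.items()))
               for i in range(samples))

# Band-limited "clipped" waveform: fundamental plus third harmonic in square-wave
# proportion, scaled so that its peak sits exactly at the 1.0 modulation ceiling.
processed = {1: 1.0, 3: 1 / 3}
scale = 1 / peak(processed)
processed = {k: a * scale for k, a in processed.items()}

# Hypothetical codec behaviour: the third harmonic is judged inaudible and dropped.
decoded = {1: processed[1]}
print(f"before codec: {peak(processed):.3f}, after codec: {peak(decoded):.3f}")
```

Removing energy raises the peak here because the third harmonic was holding the crest of the fundamental down; the decoded signal overshoots the ceiling by about 6%, which a downstream limiter must then absorb.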

The first method is to employ yet another, simpler peak limiter at the transmitter site, or at each network affiliate, to control any systemic overshoot that was created. While this might not relieve the added distortion from the transcoded non-linear process, it will provide good modulation efficiency. This limiter must also restore the pre-emphasis that was removed by de-emphasis prior to coding. It should be noted that each additional generation of pre-emphasis contributes yet another form of signal conditioning, which may have negative sonic implications. These exist because most secondary limiters use another non-linear final limiter, which adds more distortion to an already limited, and possibly distorted, signal!

A second alternative, one that requires a bit more engineering, is to split the processing system into two separate sections. Since the audible problems from the coding process are mostly related to the non-linear operations of the processor, it makes logical sense to split the non-linear section off from the processor and locate it at the transmitter, Figure-5.


Figure-5, Proposed Alternative Digital Transmission Path

This keeps the AGC and dynamic limiting at the origination point. In this manner, almost all of the processing adjustments can be made at the head end, while the affiliate locations provide the needed final limiting, pre-emphasis, and stereo generation/coding. By minimizing the amount of non-linear signal that must pass through the codec, the quality of the signal remains high, while the processing system is still optimally coupled to the transmitter. The only drawback is that some thought is required on how to break apart the processing so that each section is in the right location.
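The split can be expressed as a simple rule over an ordered processing chain. In the Python sketch below, the stage names, the `Stage` class, and the `after_codec` flag are hypothetical illustrations of the arrangement in Figure-5, not a description of any product: slowly acting gain control stays at the studio, while pre-emphasis, waveform-non-linear final limiting, and stereo generation go after the codec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    # True for stages that must sit after the codec: pre-emphasis,
    # waveform-non-linear final limiting/clipping, and stereo generation.
    after_codec: bool

def split_chain(chain):
    """Return (studio_stages, transmitter_stages). Stage order is preserved, so
    the split is a single cut at the first stage that must follow the codec."""
    for i, stage in enumerate(chain):
        if stage.after_codec:
            return chain[:i], chain[i:]
    return chain, []

chain = [
    Stage("AGC", False),
    Stage("multiband dynamic limiting", False),
    Stage("pre-emphasis", True),
    Stage("final limiter / clipper", True),
    Stage("stereo generator", True),
]
studio, transmitter = split_chain(chain)
print([s.name for s in studio], "-> codec ->", [s.name for s in transmitter])
```

Making the split a single cut keeps the signal order intact: any stage that follows a non-linear stage in the original processor also moves to the transmitter, so nothing is reordered when the processor is broken apart.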

This would appear to be the best compromise, as it allows the following: the codec operates on a "flat" audio signal; pre-emphasis is limited to one generation; and, most importantly, all non-linear processes are performed once, at the optimum location, which is the transmitting facility. In addition, equipment cost can be kept to a reasonable level.

In this day and age, where competitive signal processing is a given, adding distortion to the transmission path would seem a fatalistic approach, considering the value placed upon the sonic presentation of a radio station. If this added distortion is the price of converting an installation to a digital format, then the question must be asked: is the digital path an improvement?

Therefore it is the opinion of this author, and designer, that either the initial configuration, with the processing at the transmitter, or the split-processing approach of the second alternative should be used. In this manner the benefits of the digital path can be realized and, more importantly, the integrity of the "station sound" can not only be maintained but might in fact be improved!

DAB, DSP, and the Future!
As mentioned earlier, DAB is upon the world. It is through the research, development, and deployment of DAB that audio coding has become commonplace in this industry. While DAB is still very much in the domain of testing and research, it will no doubt become a factor in the world of radio broadcasting.

Also, the issue of transmission processing has yet to be addressed in the realm of DAB. The thinking here is that some form of processing will be adapted to it. Since pre-emphasis will no longer be required, the possible uses of signal processing in DAB might be endless! Suffice it to say that the quest for a loud "station sound" will rise again on the DAB horizon.

In the world of processing, DSP has also finally progressed into the transmission path. To date, DSP transmission processors have only been able to approximate the performance of their analog counterparts; every DSP device developed prior to 1996 has fallen short on sonic performance in competitive signal processing. Be not dismayed: with continuing research and development in the DSP domain, there will no doubt one day exist a digital signal processor for broadcasting that provides greater performance than the analog and early digital systems of the present.

In conclusion, the coded STL system has already made a sizeable impact on the broadcast industry. Properly deployed, these systems can be very beneficial in delivering high quality audio signals to transmitting facilities. This can only happen if the user understands the possible pitfalls of combining processing and coding, hence the title of mixing oil with water! As coding algorithms improve and DSP processors continue to mature, the pitfalls of the present will simply be a nasty memory of yesterday!

References

[1] Oppenheim, A. and Willsky, A. Signals and Systems. Prentice Hall, 1983.

[2] ISO/IEC 11172. International Standard: Coding of moving pictures and associated audio for digital storage media up to 1.5 Mbit/s.

[3] Mendenhall, G. Pre-emphasis and Limiting Considerations for Audio Processors and Digital Studio-to-Transmitter Link. White paper, 1995.

[4] Wegel, R.L. and Lane, C.E. The Auditory Masking of One Pure Tone by Another and Its Probable Relation to the Dynamics of the Inner Ear. Physical Review, 1924.
