
Digital Broadcast Audio Processing: Finally, the New Frontier

Frank Foti
Omnia Audio
Cleveland, Ohio


Abstract
DSP (Digital Signal Processing) based broadcast audio processors have been, at best, a digital clone of analog counterparts. DSP is a very powerful technology. Why then, has it been so difficult to create successful-sounding digital signal processors for broadcast? This paper addresses this issue and discusses advancing the signal processing artform. Induced aliasing distortion, time delay, and sampling rate were never a concern in the analog signal processor. With DSP however, these issues can create added audible distortion, propagation delay, and/or overshoots. Algorithms that intelligently conform to the natural dynamics of an audio waveform will also be examined. Finally, with the advent of DAB and Netcasting, we'll explore processing for these new and important mediums.

To Boldly Go ...
In borrowing part of a phrase from Capt. James T. Kirk of the Starship "Enterprise," processing in the digital domain meant going where it had never been before. The introduction of DSP for transmission processors for broadcast promised to open a new frontier. Although the first wave of digital processors became available almost seven years ago, this new territory remains largely unexploited. What happened?

Initial attempts at digital processing were mere 'clones' of established analog designs. To those versed in the technical 'black-magic' of audio processing, developing a digital processor requires more than just porting over an analog design into DSP. (More on this later.) If key issues aren't dealt with, the result will add audible distortion. The following discussion is based on extensive experience developing a DSP-based processor.

In the quest for the fully digital broadcast facility, concerns involving transmission processors must be addressed. Codecs implementing an AES/EBU interface and used in digital STL systems can audibly degrade the transmission path. Depending on the coding algorithm and bit rate employed, codecs may be a source of distortion. While placement of the codec as well as the signal processor in the broadcast transmission path are critical, such discussion is beyond the scope of this paper. Suffice it to say, coded STL must be taken into account.

Also of concern are sample rate converters used in the AES/EBU interface. We will discuss the possible overshoot problems that may arise when transposing a sampling rate. Early research discovered that a processed signal that was properly peak controlled by the signal processor had the potential to cause modulation overshoots. Propagation delay can also become a problem with digital processing. This is related to the amount of time required to complete all of the processing tasks. During this period, there is an input to output delay. If this delay is too long, live monitoring off the air becomes quite difficult. For announcers especially, such delays can be very distracting. Further discussion will detail what is involved in resolving these issues.

Given the problems, is processing in the digital domain of any benefit? Yes. There are solutions along with a multitude of benefits from numeric-based processing.

Research and Design Criteria
To start, key performance criteria were established:
• the digital system must operate in a transparent manner in relation to the audio signal,
• the digital processor must sound at least as good as, or better than, its analog counterpart,
• no additional coloration to the audio from A/D, D/A, sample rate converters, jitter, and/or the AES/EBU interface must result.

To determine the strength of the issues presented here, subjective listening and technical tests were needed. Analog and digital processing systems had to be compared, along with aural evaluations of the converter systems. Most important was comparing analog processors with digital ones. Testing was meant to identify existing weaknesses, resolve such weaknesses, and help develop new algorithms.

Subjective Listening Tests
This phase of testing was conducted using the following listening conditions. The device under test was provided a clean program source from either a CD or DAT tape. Output was monitored either discrete left/right, or multiplex via a stereo monitor, amplified and heard through well-known reference monitor speakers. For direct A/B processor comparison, a switching device was used that could select either multiplex A/B, or discrete A/B processors, much the same as would be done by a radio station auditioning processing equipment. Outputs of the units tested were peak monitored using both a modulation monitor and oscilloscope. For fair and honest evaluation, the peak levels of each system were always set at exactly the same level.

In direct comparisons of digital and analog processors, both operating with equal amounts of processing, listening tests revealed that digital systems seem to generate additional annoying distortion characteristics. In addition to the known artifacts generated in analog processors, there seems to be an added distortion signal. Where most processing artifacts can usually be described in some form of Total Harmonic Distortion (THD) or Intermodulation Distortion (IMD), this artifact appears as something completely different. In adjusting audio processing it is generally understood that an increase in IMD is derived from an increase in dynamic control characteristics. Items like increased release time of compressors/limiters, ratio, or amount of gain control will contribute to this. An increase in THD will usually result from an increase in hard limiting, or final clipping.

'That Digital Sound'
Further listening reveals that this artifact seems to produce a 'harsh' or 'metallic' effect on the 'presence' and 'brilliance' ranges of the audio spectrum. Almost as if some additional synthetic component is being added into the signal. With moderate amounts of processing, cymbals sound like 'nails on a blackboard', 'S' sounding material begins to sound like paper tearing, and high frequencies lose definition and detail! All of which has been characterized by engineers as 'that digital sound.'

For example, in the song 'Big Love' by Fleetwood Mac, at precisely eleven seconds into the song, there is a moderate crash of the cymbals. The digital processor repeatedly distorted these cymbals as compared to the analog system. Even a reduction in the amount of processing by the digital system only reduced this distortion effect, never eliminated it. To try and illustrate, this effect sounded like some form of 'smashing glass' instead of the crisp, detailed crash of the cymbals.

By contrast, the analog system did not have this problem. Only with the processing increased, would the cymbals distort, but it would happen in the known 'spitting' sound of THD generated from the increase in hard limiting. Even this form of distortion was 'easier on the ears' when directly compared to the digital distortion. The reader may want to use this song as a test source. In the listening tests done while researching this topic, there were multitudes of programs that would cause this effect. What is it? What causes it? Is it a question of primitive digital design, or is it a technological weakness? Considering the numerous attempts at digital processing, these questions had to be answered before any development could proceed!

This problem was perceived during tests performed on numerous digital processors in comparisons with numerous analog processors. In each case, 'that digital sound' was observed.

Technical Tests and Research Data
As a researcher, engineer, and designer working with signal processing, I have a dual view of technical testing and research data. Where the data gathered from research and conducted tests can be used to prove or disprove a theory, it offers a subjective limit as to what may or may not 'sound' acceptable under dynamic processing conditions. Therefore testing and research may be used as a tool to aid in assessing 'possible' benefits or drawbacks to problems, issues, and answers. In the end, tests were developed that would break down ideas and algorithms into the smallest common denominator so that dynamic performance could be monitored or judged.

The first phase of testing was an effort to discover the cause of the problem identified as 'that digital sound'. Since most processors are dynamic in nature, there are not any specific, static tests that will provide common results. Most systems can be evaluated with weighted noise, timed pulse bursts, or IMD signals. Frequency response can be measured, but usually with levels that are below the processing threshold to avoid generating leveling and phase errors due to the gain control action.

The Culprit
For this presentation, a digital processor operating with a 32kHz sampling rate was evaluated. A system like that used in conventional FM broadcasting, and consisting of AGC/dynamic limiting, emphasis, and the final limiting low pass filtering function was employed. In performing an audio sweep, an interesting item was noticed upon spectral analysis of the frequency range. A large level of aliasing components was observed above 4kHz! These aliasing signals would fill the entire spectrum starting at 15kHz, and work their way down the complete spectrum. This was beginning to get interesting! Figure-1 is an example of a 5kHz tone that has been clipped in the digital domain, and now produces aliasing products.



Figure-1

As can be seen in the diagram, there are aliasing products sitting near the fundamental of 5kHz. As frequency increases, more products develop, in both quantity and size. At 12kHz the aliasing products are almost as significant in value as the fundamental. Now try to consider what this picture will look like if a music program were used in place of a single audio tone.
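As a rough illustration of where such products land, the sketch below folds the odd harmonics of a 5kHz tone back into the 0 to 16kHz range of a 32kHz system. It assumes symmetric clipping (which produces only odd harmonics) and ignores harmonic levels. The function name alias_frequency is illustrative only and is not taken from any processor.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz                      # fold into one sampling-rate span
    return fs_hz - f if f > fs_hz / 2 else f  # mirror around the Nyquist frequency

FS = 32000    # sampling rate discussed in the text
TONE = 5000   # clipped test tone

for n in range(1, 12, 2):                 # odd harmonics 1, 3, ..., 11
    harmonic = n * TONE
    print(n, harmonic, alias_frequency(harmonic, FS))
```

Under these assumptions the 5th, 7th, 9th, and 11th harmonics (25, 35, 45, and 55kHz) fold back to 7, 3, 13, and 9kHz, none of which is harmonically related to the 5kHz fundamental.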

After making some adjustments to the processor with the frequency sweep applied, it appeared that reduction of the final limiting would reduce the aliasing substantially. Reducing the processing on the upper frequencies totally eliminated the problem. What was this indicating? Considering that aliasing distortion is caused by signal energy that is trying to exceed the Nyquist frequency, it would appear to indicate that the final limiting seemed to be the culprit. Since the final limiting function does generate harmonic content, it is likely that energy exceeding the Nyquist frequency would result. Further spectral evaluation proved this to be true.

In a subjective test to confirm or reject the research data, music was applied to the processor. With the final limiting operating in the 'normal' range, the 'metallic' sound was evident. Upon reducing the amount of final limiting, the 'metallic' sound disappeared! This seemed like the 'proof in the pudding' for the cause of 'that digital sound'. By contrast, in the analog processor, it is the final limiting function 'where the rubber meets the road' with regard to clarity and loudness. If for some reason the digital system could not utilize a moderate amount of final limiting without losing clarity, then no loudness benefit would result!

Further testing revealed that indeed the final limiting function generated the majority of the aliasing distortion. However, a few other processing functions were found to be possible contributors as well. Certain cross-over designs and/or filter banks, along with system headroom are all areas that must be designed properly or processing induced aliasing will occur.

Also observed were certain processing time constants. If a timing signal was operating at a rate that exceeded the Nyquist frequency, aliasing would be generated. Given the system discussed here, any timing signal with a period shorter than 62.5 µs (1/16kHz) would create an aliasing component. With the problem found, what are the options for a remedy?

Sampling Rate: Can of Worms?
When discussing anything related to aliasing, the sampling rate must be part of the equation. It is well known that increasing the sampling rate will raise the point at which aliasing will occur. The question now becomes, how far must the sampling rate be increased to eliminate processing induced aliasing? Testing and research indicated that with a 32kHz sampled system, a multiple of at least 4 times the sampling rate, 128kHz, would be sufficient for broadcast transmission purposes.
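One way to get a feel for the 4x figure is to ask which harmonic of a clipped tone is the first to fold back into the 0 to 15kHz audio band. The sketch below does this for a 5kHz tone at 32kHz and at 128kHz. It assumes the worst case of an ideal square wave, whose nth harmonic is 1/n of the fundamental, and it assumes that everything between 15kHz and the Nyquist frequency is removed by the final low pass filter. The helper names are hypothetical.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f

def first_inband_alias(tone_hz, fs_hz, band_hz=15000):
    """First odd harmonic above Nyquist that folds back inside the audio band."""
    for n in range(1, 1000, 2):
        harmonic = n * tone_hz
        if harmonic > fs_hz / 2 and alias_frequency(harmonic, fs_hz) <= band_hz:
            return n, alias_frequency(harmonic, fs_hz)
    return None

for fs in (32000, 128000):
    n, folded = first_inband_alias(5000, fs)
    level_db = 20 * math.log10(1 / n)     # ideal square wave: harmonic n is 1/n
    print(fs, n, folded, round(level_db, 1))
```

With these assumptions the first in-band product moves from the 5th harmonic (folded to 7kHz, about 14dB below the fundamental) to the 23rd harmonic (folded to 13kHz, about 27dB below). Real program material and real limiters will differ, so the numbers indicate a trend rather than a specification.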

Creating a 128kHz sampling rate can be done in one of two ways: using a high speed A/D converter operating at the 128kHz rate, or up-sampling the 32kHz rate by a factor of four to create the new higher rate. The latter is preferable as it allows use of the industry standard A/D conversions that support popular 32kHz, 44.1kHz, and 48kHz rates. A converter at 128kHz is available but is generally more expensive and requires additional ancillary input filtering, further adding to the cost. In addition, there is no need to operate the entire system at the higher sampling rate, since that would reduce the number of machine cycles available per sample in the DSP.

This points up another problem: When a faster sampling rate is required to remove aliasing distortion, how much DSP power is compromised to accomplish the goal? The obvious answer is 4x the power, but with all of the final filtering and overshoot control needed, is this the most efficient method to rectify the situation? Of importance here is the final low pass filter. Since it must provide ample bandstop rejection in the 16kHz Nyquist region, a filter of high magnitude is required. This will take multiple machine cycles in itself at 32kHz sampling; at 128kHz, it will be 4x the requirement. Might there be an alternative to this process that will save machine cycles, yet accomplish the same result of eliminating aliasing?

Why 32kHz Sampling Rate?
Another issue for discussion is the base sampling rate itself. Digital processors thus far have all used 32kHz as a base sampling rate, which in turn sets the Nyquist at 16kHz. Considering that conventional FM Stereo broadcasting requires 15kHz of audio bandwidth, this leaves only 1kHz of guard band spectrum before the Nyquist point. To facilitate this, a filter of very large magnitude must be employed in order to suppress all energy by at least 96dB at the Nyquist, or aliasing occurs. This can be done in DSP using a finite impulse response filter (FIR). The only drawback is that it will require many 'taps' within the filter to achieve this level of stopband rejection. The significance of the 'taps' is that for every two taps in the filter, it requires one sample to perform its duty. For a 15kHz FIR filter of this magnitude, it will need 101 taps. This in turn results in 50 required samples which equates to 1.56 milliseconds of propagation delay through the filter.
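The delay arithmetic above can be checked with a few lines of code. The sketch assumes a symmetric (linear phase) FIR filter, whose delay is (taps - 1) / 2 samples, and a cost of one multiply-accumulate per tap per output sample. Both function names are illustrative.

```python
def fir_group_delay(num_taps, fs_hz):
    """Delay of a linear phase FIR filter, in samples and in milliseconds."""
    delay_samples = (num_taps - 1) / 2
    return delay_samples, delay_samples / fs_hz * 1000.0

def fir_mac_rate(num_taps, fs_hz):
    """Multiply-accumulate operations per second, one per tap per sample."""
    return num_taps * fs_hz

samples, millis = fir_group_delay(101, 32000)
print(samples, millis)                 # 50 samples, roughly 1.56 ms
print(fir_mac_rate(101, 32000))        # workload at the base rate
print(fir_mac_rate(101, 128000))       # same filter length at 4x the rate
```

Running the same 101 taps at 128kHz costs four times the multiply-accumulate rate, which matches the 4x figure quoted earlier.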

It must be noted that even when up-sampling, where a 'new' Nyquist frequency would now preside at a multiple of the original, the problem still remains. This is due to the down stream requirements of the AES/EBU interface. With an up-sampled signal operating internally within a host system, higher speed D/A converters can be used in the conversion process to analog without overshoots and distortion. On the other hand, the AES/EBU is a standard protocol that will only support a system sampling rate up to 48kHz. Therefore any filtering that must be done in the up-sampled domain must still adhere to the original Nyquist frequency or aliasing will result. In the case of this discussion, that frequency remains 16kHz. Still at a faster sampling rate, say 128kHz (4x the original) the number of FIR filter taps remains the same as described above.

A broad question posed to the global processing forum is: why use 32kHz as a base sampling rate? I think, based on tests and research, that a base of 48kHz would make all of the aforementioned problems much easier to deal with. The guard band to the Nyquist is much farther out, which in turn moves out the aliasing point. This would also allow a final filter with less time restriction. Coupled with the fact that the propagation delay associated with 48kHz is shorter in itself, this makes the rate more desirable.

My best guess as to why 32kHz sampling was chosen in the past, is that at that rate, there would be more machine cycles available to handle the workload. That would be the only reason to possibly support a lower sampling rate.

Alternative Anti-Aliasing Limiter Method
Since it is now apparent that aliasing, sampling rate, and machine cycles are all of importance in digital processing, what alternative is there that might allow the best performance, and yield the most efficiency? The answer is within the method used to accomplish the final limiting function. Through a proprietary process researched and developed by the author, a mathematical analysis provided a means to accomplish the final limiting function, without aliasing, and at the base 48kHz sampling rate!

Considering that the math involved exceeds the scope of this paper, and that an entire presentation could be based on explicit digital processing design alone, in depth discussion of this analysis is best suited for another forum. Suffice it to say that this alternative method removes all of the previously discussed problems of digital processing. Should the reader desire more detail of this analysis, please contact me for more information.

Confirmation of performance of this alternative method was achieved with spectral analysis and subjective auditioning of music with this new function employed. In aural monitoring it was decided that the 'metallic' digital aliasing distortion component mentioned earlier disappeared! Now it is possible to define what 'that digital sound' is, and more importantly how to eliminate it!

Sample Rate Converters
Another innovative device used in the digital realm is the sample rate converter. This function will transform one system sampling rate to another. This becomes necessary when interfacing digital devices that use different sampling rates, thereby adding compatibility among different systems.

This function is accomplished by scaling up, or interpolating, the original sampling rate, usually by a factor of ten. Then, at the 10x rate, the signal is filtered with a low pass filter that is set to the Nyquist of the new desired sampling rate. Finally, the signal is scaled down, or decimated, by the factor needed to achieve the new rate. While this sounds quite simple, and basically it is, there are a few issues to consider.
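The three steps can be sketched in a few dozen lines. This is a minimal illustration, not the design of any commercial converter: it uses small integer factors (up by 2, down by 3, for 48kHz to 32kHz) rather than a factor of ten, a Hamming-windowed sinc filter chosen only for simplicity, and direct convolution with no polyphase optimization. All names and parameter values are assumptions.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low pass (odd num_taps), unity gain at DC."""
    fc = cutoff_hz / fs_hz
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def resample(samples, fs_in, up, down, num_taps=121):
    """Interpolate by `up`, low pass filter, then decimate by `down`."""
    fs_high = fs_in * up
    fs_out = fs_high / down
    # Step 1: zero-stuff to the high rate (gain of `up` restores the level).
    stuffed = []
    for s in samples:
        stuffed.append(s * up)
        stuffed.extend([0.0] * (up - 1))
    # Step 2: low pass at the lower of the two Nyquist frequencies.
    taps = windowed_sinc_lowpass(num_taps, min(fs_in, fs_out) / 2, fs_high)
    delay = (num_taps - 1) // 2           # compensate the filter delay
    filtered = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + delay - k
            if 0 <= j < len(stuffed):
                acc += h * stuffed[j]
        filtered.append(acc)
    # Step 3: keep every `down`-th sample.
    return filtered[::down]

FS_IN = 48000
tone = [math.sin(2 * math.pi * 1000 * n / FS_IN) for n in range(300)]
out = resample(tone, FS_IN, up=2, down=3)   # 48kHz in, 32kHz out
print(len(tone), len(out))
```

The low pass filter in step 2 is the one whose slope matters in the overshoot discussion that follows.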

All transmission processors, both analog and digital, apply some form of overshoot control to the output filtering section. Our concern is not the method used, rather that control is achieved. In most designs, this function is a form of integrated protection clipper working around the final low pass filter to obtain control. In each case the overshoot component can be calculated as a product of what is known as the 'Gibbs Phenomenon' [1], which states that an overshoot will occur at one-third the cut-off frequency of any low pass filter whenever a non-linear waveform is passed through it. In the case of broadcasting, the non-linear waveform would be that of a clipped waveform. Knowing that the audio bandwidth used in FM Stereo is 15kHz, overshoot components will begin with any non-linear waveform above 5kHz. In this example, this would affect any signal above 5kHz that was clipped. Should the slope of the previously described up-sampled interpolation filter appear greater than the slope of the final filter in the audio processor, then output overshoots may result! Unfortunately, these overshoots are generated after the processing unit. To remove them would require another device.
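To put a number on the overshoot, the sketch below band-limits an ideal unit-amplitude square wave (standing in for a hard-clipped tone) to 15kHz and reports the resulting peak. It assumes a brick-wall filter that passes every harmonic at or below the cutoff unchanged and removes the rest, which no real filter does. The function name is hypothetical.

```python
import math

def bandlimited_square_peak(f0_hz, cutoff_hz, points=20000):
    """Peak of a unit square wave at f0_hz after an ideal low pass at cutoff_hz."""
    max_harmonic = int(cutoff_hz // f0_hz)
    harmonics = range(1, max_harmonic + 1, 2)     # square wave: odd harmonics only
    peak = 0.0
    for i in range(points):
        phase = 2 * math.pi * i / points
        value = (4 / math.pi) * sum(math.sin(n * phase) / n for n in harmonics)
        peak = max(peak, abs(value))
    return peak

peak = bandlimited_square_peak(6000, 15000)
print(round(peak, 3))
```

For a 6kHz square wave only the fundamental survives the 15kHz filter, and its peak is 4/π, about 1.27 times the clip level. That is roughly 27% of overshoot on a signal that was peak controlled before filtering.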

This does not necessarily indicate that all sample rate converters will cause overshoots. But in most cases the filtering used in the sample rate converter will be of a large magnitude in the bandstop rejection area. In all probability it will be an FIR filter with at least 96 dB rejection in the stop band. Also of interest will be the direction of rate conversion. Should the host sampling rate be lower in value than the transformed rate, chances of overshoot are small. This happens due to the up-sampled filter being set to a broader spectrum than the spectrum of the host signal. Potential problems may arise when transforming a higher sampling rate to a lower rate. Then the details of the above description apply.

Processing and Coded STL Systems
A technique that is very popular is the use of the audio codec to reduce the data requirement for a digital STL system. These devices make use of 'lossy' data reduction algorithms to compress the bitrate down to a size that will fit within the existing bandwidth of the STL system. While there are a number of specific algorithms to choose from, most STL manufacturers have made use of proprietary digital formats that are derivatives of prior development. The most commonly used are ISO/MPEG Layer-II, ISO/MPEG Layer-III, apt-x, and Dolby AC-2.

Detailed operation of the above mentioned algorithms is not needed for this discussion, as my focus is on the actual effects that data reduction algorithms have upon signal-processed audio. Each system possesses many strengths and possible weaknesses for this application. It is not my intention to advocate any audio coding technique to be considered as a standard or preference when compared to one another.

Dealing with Pre-emphasis
All broadcast applications make use of some form of pre-emphasis boost. For FM broadcasting, North American countries utilize a 75µs emphasis, whereas 50µs is used elsewhere in the world. Medium-wave, or AM, makes use of an optional modified 75µs emphasis.
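To show how much high frequency boost these time constants imply, the sketch below evaluates an ideal first-order pre-emphasis network. The single-zero response 1 + j2πfτ is the textbook model and is an assumption here; practical networks, and the modified AM curve, deviate from it. Function names are illustrative.

```python
import math

def corner_frequency_hz(tau_s):
    """Frequency at which an ideal first-order pre-emphasis reaches +3dB."""
    return 1.0 / (2 * math.pi * tau_s)

def preemphasis_boost_db(f_hz, tau_s):
    """Boost of the ideal network |1 + j*2*pi*f*tau| at frequency f_hz, in dB."""
    return 10 * math.log10(1 + (2 * math.pi * f_hz * tau_s) ** 2)

for tau in (75e-6, 50e-6):
    print(round(corner_frequency_hz(tau)), round(preemphasis_boost_db(15000, tau), 2))
```

Under this model the 75µs curve turns over near 2.1kHz and reaches about 17dB of boost at 15kHz, while the 50µs curve turns over near 3.2kHz and reaches about 13.7dB.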

Transmission signal processors employ pre-emphasis within their system architecture. Since emphasized audio must also fit within the imposed modulation limits, the processor employs specialized high frequency control sections that provide both the emphasized boost and control of the high frequency energy. In this manner, efficient high levels of modulation are easily obtained since the processor is designed and set to limit any tradeoffs resulting from pre-emphasis and high frequency limiting requirements. Basically, these two sections work in concert with one another to allow pre-emphasis to be employed, and yet control the emphasized energy content.

One of the critical requirements of any codec is that the audio signal must be "flat" in spectral origin. The term "flat" is used here in the context of a signal where no additional EQ has been added to the original component. This is due to the operation of the masker signal used in the coding process. Any significant change, or imbalance of the frequency spectrum can cause the threshold curve of the coding system to possibly have a profound effect on the output of the coded audio [2].

At issue here are what the coded STL encoder/decoder requires and what the processing system will provide. A paradox exists because the processor is designed to output a pre-emphasized signal, and the codec is designed to accept a "flat" signal. To accomplish this, the output of the processor must be de-emphasized so that the output signal is returned to a "quasi" flat form.

The weakness that this function creates is that the output of the coded STL must restore the emphasis to the signal, thus adding another generation of emphasis which might add some distortion, but in all probability will add modulation overshoot to the total transmission system. To eliminate the added overshoot, another limiter must be employed. Unfortunately, tests have shown that operating a transmission processor with an emphasized output into a codec will generate audible high frequency distortion. This occurs because the signal presented to the codec masking process is not spectrally flat, which is what the masker signal wishes to operate on.

A discussion of codecs in the transmission system exceeds the scope of this paper. For further reading, see a paper by the author presented at another industry forum [3].

What about That New Frontier?
So far, the focus has been on finding the weaknesses of and remedies for prior digital implementations, the most important of which was the discussion of the non-aliasing limiter. Although it is an accomplishment, I believe that resolving this problem only brought the technology up to date. Now, how might this technology finally move forward?

Phase Linear Dynamically Flat Cross-Over
A topic of vast interest in any multiband signal processor is the cross-over network. The goal is to achieve maximal flat response, with gain control employed, and maintain as linear as possible phase response over the entire spectrum. Easier said than done!

In the analog derivative this was virtually impossible. With gain control active, phase errors between audio bands would develop due to the difference in propagation delay of each cross-over filter. As gain levels would shift near the cross-over frequency, additional gain or even loss would occur at the final summation point. This might result in as much as 6dB of gain or loss at this juncture. Some designs would offset phase at the cross-over region in an effort to minimize this problem. However, the compromise disrupted the linearity of the phase response over the whole spectrum.

The digital cross-over makes use of time aligning to each of the audio bands. In this manner, true phase linearity can be maintained, while maintaining dynamic flatness of the program signal. This in turn eliminates any added gain or loss at the final summation. This is analogous to using time aligned loudspeakers.
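The time aligning idea can be illustrated with a two-band split built from one linear phase FIR low pass filter. This is a generic delay-complementary construction offered as an assumption, not the cross-over design described in this paper: the high band is formed by subtracting the low band from a copy of the input delayed by the filter's own delay, so the two bands are time aligned and sum back to the delayed input when their gains are equal.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low pass (odd num_taps), unity gain at DC."""
    fc = cutoff_hz / fs_hz
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    total = sum(taps)
    return [t / total for t in taps]

def split_two_bands(samples, taps):
    """Return (low, high, delayed input); low + high equals the delayed input."""
    delay = (len(taps) - 1) // 2
    low = []
    for i in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * samples[i - k]
        low.append(acc)
    # Time align the dry signal with the filtered band before subtracting.
    delayed = [0.0] * delay + list(samples[:len(samples) - delay])
    high = [d - l for d, l in zip(delayed, low)]
    return low, high, delayed

FS = 48000
signal = [math.sin(2 * math.pi * 400 * n / FS) + 0.5 * math.sin(2 * math.pi * 6000 * n / FS)
          for n in range(400)]
low, high, delayed = split_two_bands(signal, lowpass_taps(63, 2000, FS))
error = max(abs((l + h) - d) for l, h, d in zip(low, high, delayed))
print(error)   # summation error, limited only by floating-point rounding
```

With unequal band gains the sum is no longer a pure delay, but both bands still share the same delay, so no phase offset is introduced between them at the cross-over frequency.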

Program Dependent RMS Calculation
Within the design of many AGC sections for compression, there is some function to calculate an average level. This in turn is used within the compressor's control function to alter the gain structure. It has been found that the use of RMS detection seems to produce a natural sound to the AGC operation. In essence, the RMS function calculates the averaged root mean square value of a signal as it occurs over a period of time. Within a block diagram, this average over time is achieved by the use of a simple time constant that is nested within the square root of the squares function. Figure-2 is a block diagram of the theoretical concept behind the RMS calculation.


Figure-2

This style of RMS detector has been implemented in numerous forms recently. All of which have found their place in processing applications. The drawback is that the averaging time is normalized for a broad range of control. This is sufficient for RMS calculation of static signals, but when processing any audio signal, there might be the need to alter the averaging time, as if to create a rolling average.

It is important in the design of the RMS detector, that the lowest frequency passed through the detector does not generate any AC ripple to the control signal. This is possible if the averaging filter is set too fast [4]. If this occurs, distortion and gain control errors will result. Therefore many RMS detectors must operate with a compromised averaging time to not generate distortion and control errors.
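The ripple trade-off can be demonstrated with a simple detector: square the input, average it with a one-pole filter, and take the square root. The sketch below feeds a 50Hz tone to this detector with a fast and a slow averaging time and measures the ripple left on the control signal. The one-pole averager, the 5ms and 200ms values, and the function names are assumptions chosen for illustration.

```python
import math

def rms_detector(samples, fs_hz, averaging_time_s):
    """Square, average with a one-pole filter, then take the square root."""
    coeff = math.exp(-1.0 / (averaging_time_s * fs_hz))
    mean_square = 0.0
    control = []
    for x in samples:
        mean_square = coeff * mean_square + (1.0 - coeff) * x * x
        control.append(math.sqrt(mean_square))
    return control

def ripple(control, start_index):
    """Peak-to-peak variation of the control signal after it has settled."""
    tail = control[start_index:]
    return max(tail) - min(tail)

FS = 32000
tone = [math.sin(2 * math.pi * 50 * n / FS) for n in range(2 * FS)]  # 2 s of 50Hz
settled = int(1.5 * FS)

fast = rms_detector(tone, FS, 0.005)
slow = rms_detector(tone, FS, 0.200)
ripple_fast = ripple(fast, settled)
ripple_slow = ripple(slow, settled)
print(round(ripple_fast, 4), round(ripple_slow, 4))
```

The slow setting settles close to the expected 0.707 for a unit sine with little ripple, while the fast setting leaves a control signal that visibly follows the waveform at twice the tone frequency, which is the source of the distortion described above.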

Using program material as content, there will be instances when certain audio frequencies would benefit from a rolling average, as compared to a compromised static value in the detector. This is easily done in DSP as the processor can calculate different averaging times, as well as insert them to create the rolling average. Also, rules can be implemented to allow this process to occur only during desired situations. Through the use of on board memory, it is easy to store a 'history' of what has transpired over time regarding a signal's content. Then gain changes can be calculated based upon historic statistics, and figured into the processing algorithm.
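A minimal sketch of such a program-dependent detector follows. The rule used here is purely hypothetical and is not the author's algorithm: the detector keeps a short history of its own mean-square output and switches to a shorter averaging time whenever the instantaneous power jumps well above that history. The parameter names and values (slow_s, fast_s, jump_ratio, history_len) are illustrative assumptions.

```python
import math
from collections import deque

def adaptive_rms(samples, fs_hz, slow_s=0.2, fast_s=0.01,
                 jump_ratio=4.0, history_len=64):
    """RMS detector whose averaging time depends on recent signal history."""
    slow = math.exp(-1.0 / (slow_s * fs_hz))
    fast = math.exp(-1.0 / (fast_s * fs_hz))
    history = deque(maxlen=history_len)   # recent mean-square values
    mean_square = 0.0
    control = []
    for x in samples:
        power = x * x
        recent = sum(history) / len(history) if history else 0.0
        # Hypothetical rule: a sudden jump in power selects the fast average.
        coeff = fast if power > jump_ratio * max(recent, 1e-12) else slow
        mean_square = coeff * mean_square + (1.0 - coeff) * power
        history.append(mean_square)
        control.append(math.sqrt(mean_square))
    return control

FS = 32000
# 1kHz tone that steps from amplitude 0.1 to 1.0 after half a second.
signal = [(0.1 if n < FS // 2 else 1.0) * math.sin(2 * math.pi * 1000 * n / FS)
          for n in range(FS)]
adaptive = adaptive_rms(signal, FS)
fixed = adaptive_rms(signal, FS, fast_s=0.2)   # fast == slow: a static detector
check = FS // 2 + int(0.05 * FS)               # 50 ms after the level step
print(round(adaptive[check], 3), round(fixed[check], 3))
```

In this example the adaptive detector has tracked most of the level step 50 ms after it occurs, while the static 200 ms detector is still catching up. Any practical rule set would need tuning by ear.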

Due to this paper's limitations, please accept that volumes can be generated about the subjective nature of controlling an audio signal. The point is that with the digital processing function, the realm of possibilities for control now becomes endless!

Intelligent Interactive Processors
This discussion of digital processing has centered on specific individualized operations. With this in mind, there are a few systematic operations that can be explored.

Part of the workload of a dynamic processor, compressor, limiter, or clipper, is to calculate a value for an audio signal and then use that as a control operation to alter the gain. The transmission system processor is usually made up of a combination of all of the aforementioned processes. Through that, it is possible to know the precise amount of RMS and peak level present, along with the historical values mentioned earlier. This information can be used on an interactive basis among the AGC, peak limiter, and clipper sections to systematically provide information to one another.

An example of this would be the operation of the final clipper. Should significant clipping occur, information can be routed back to the previous AGC and limiter sections altering the gain in an effort to reduce the amount of clipping beyond a specific amount. Through this additional processing, induced distortion can be controlled. This can even be performed on a frequency dependent basis.
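The feedback idea can be sketched as a block-based loop: the clipper reports what fraction of each block it had to clip, and the preceding gain stage backs off when that fraction exceeds a target. Everything here is an assumption made for illustration, including the block size, the 10% target, the 0.5dB and 0.1dB step sizes, and the function name. A real processor would use smoother control and could apply it per frequency band.

```python
import math

def process_with_feedback(samples, block_size, ceiling=1.0,
                          max_clip_fraction=0.1, reduce_db=0.5, recover_db=0.1):
    """Clip each block and steer the preceding gain from the clipper's activity."""
    gain = 1.0
    output = []
    for start in range(0, len(samples), block_size):
        block = [gain * x for x in samples[start:start + block_size]]
        clipped = sum(1 for x in block if abs(x) > ceiling)
        output.extend(max(-ceiling, min(ceiling, x)) for x in block)
        # Feedback: too much clipping lowers the gain, otherwise recover slowly.
        if clipped / len(block) > max_clip_fraction:
            gain *= 10 ** (-reduce_db / 20)
        else:
            gain = min(1.0, gain * 10 ** (recover_db / 20))
    return output, gain

FS = 48000
hot_tone = [2.0 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]
output, final_gain = process_with_feedback(hot_tone, block_size=480)
print(round(final_gain, 3))
```

For this 6dB-hot tone the gain settles near 0.5, the point at which the clipper only touches the very tips of the waveform.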

DAB and the Web
Past designs for transmission system processors have focused on FM, AM, and television. It is time to enlarge the focus and assess the requirements of Digital Audio Broadcast (DAB) and netcasting, transmitting audio via the World Wide Web. These new mediums have very specific needs, and in each case, signal processing can be used to augment their performance.

For DAB, processing can do more than create a radio station's signature sound and control overmodulation. It can be used to help minimize the effects of the data reduction algorithms that are required by digital transmission systems. Considering there is no need for an emphasized signal in the transmission path, this will further reduce the rigor set upon the processing system. That alone, will improve audio clarity and reduce distortion.

As for the Web, processing will play an even larger role. Knowing that the bit rate requirements for audio on the web are quite small, signal processors can be used to pre-condition the signal so that aural enhancement, and intelligibility, at the lowest bit rates, will improve. Here is a truly untapped area of potential in the world of communications.

Net Results
The goal was to review the progress of audio processing in the digital domain. In my opinion, quality digital processing was always possible, but early implementations were very primitive in nature, design, and most importantly in aural presentation. As we've seen, DSP dynamic processing has been given a 'bad name' because of its disappointing performance thus far. But, I believe through the implementations presented here, this technology may yet clear its name. Only time will tell!

REFERENCES

[1] Baher, H. Analog & Digital Signal Processing, J. Wiley & Sons, 1990

[2] Mendenhall, G. Pre-emphasis and Limiting Considerations for Audio Processors and Digital Studio-to-Transmitter Link, White Paper, 1995

[3] Foti, F. Broadcast Signal Processing and Audio Coding: Are We Trying to Mix Oil with Water?, AES Pre-print #4203, 1996

[4] Kitchin, C. and Counts, L. RMS to DC Conversion Application Guide, Analog Devices, 1986
