
Digital Peak Modulation Control: An Alias Free Limiting/Filtering Method Utilizing 48kHz Sampling And No Overshoots


Frank Foti
Cutting Edge
Cleveland, Ohio

Abstract
Peak modulation control is easily accomplished in the analog domain. The digital counterpart requires more consideration and sophistication, as sampling rate, dynamics-generated aliasing distortion, and the transmission medium all play an important part. There are questions and concerns about the utilization of 32kHz sampling as the sole means of interconnection in FM transmission. Real-world experience shows that whenever peak limiters sampled at 48kHz are integrated into a 32kHz sampled transmission path, problems such as overshoots arise. This presentation offers mathematical reasons why this occurs. It then presents a digital peak limiter that eliminates these overshoots, enabling a 48kHz sampled limiter to exist in a 32kHz sampled environment.

Not exactly…
The radio broadcast transmission path, in the all-digital domain, has been marketed as a plug and play system: Just purchase the processor, STL, and exciter of your choice, connect the gear together via AES/EBU, and away you go. A wonderfully clean, loud, and precise peak controlled signal, all rolled up into one! Well, as one rental car commercial tells us…"Not Exactly!" While some of us might be fooled by the mystic wonders of what a digital system brings to the table, the rest of us have found, or are finding out, that the digital system has some issues of its own. It’s how we choose to deal with these issues that determines whether the digital transmission path provides added benefit, or is merely a weak clone of its analog predecessor.

Within a digital processor and transmission system, numerous factors can affect the absolute peak control. Some of these are related to processor design, and others are a product of overall system performance. Sample slipping, sample rate converters, low pass filters, and emphasis can all contribute to generating overshoots in a transmission system. Later in this presentation, you will see how the implementation of these functions and their placement within the system will have an important effect on peak control.

It is imperative that precise peak control be attained, or loudness will be lost due to overshoots. Most countries institute critical modulation limits and restrictions, so any overshoots that occur must be compensated for by reducing the overall modulation by an equal magnitude. Therefore, it is mandatory that any overshoot be minimized, usually to 2% or less. This ensures maximum modulation density, which yields increased perceived loudness. The following sections discuss each of the aforementioned items and how they create overshoots.
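As a rough illustration of that trade-off, the sketch below converts an overshoot percentage into the level reduction needed to bring the overshooting peaks back to the modulation limit. The function name `level_reduction_db` is hypothetical, and the sketch assumes the modulation is backed off by exactly the overshoot ratio.

```python
import math

def level_reduction_db(overshoot_percent):
    """dB the program level must drop so peaks overshooting by this percentage land at the limit."""
    return 20 * math.log10(1 + overshoot_percent / 100)

for pct in (2, 20):
    print(f"{pct}% overshoot -> {level_reduction_db(pct):.2f} dB reduction")
```

Under this assumption, a 2% overshoot costs well under a quarter of a dB, while overshoots in the tens of percent cost more than a full dB of level.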

Sampling rate and aliasing distortion
Recently, there has been significant discussion and debate within the broadcast industry about sampling rate for transmission purposes. At issue is the choice of either 32kHz or 48kHz sampling. Both will work, yet each brings different benefits and caveats to the table.

The choice of 32kHz sampling was employed in older digital transmission systems. At the time, both DSP hardware and service bandwidth for signal transportation were at a premium, and the FM Stereo broadcast system only uses 15kHz of audio bandwidth. So, theoretically, the 16kHz Nyquist frequency of the system will fit in an efficient manner and maximize spectrum space in the digital system. While this looks very nice on paper, real-world performance has indicated otherwise.

Research has shown that 32kHz based dynamics processors generate a high level of aliasing distortion due to weaknesses in their final peak control methods. Even when over-sampled, these systems still generate aliasing products that are audible in program material. A recent AES paper [1] detailed research and testing on this matter, and strongly recommended that dynamics processing employ as high a sampling rate as possible to reduce dynamics-generated aliasing artifacts. In addition, the steepness of the slope of the 15kHz low pass filter will affect the sonic performance. This will be covered in a later section.

Through virtual up-sampling methods, innovative final limiter algorithm design, and sampling at 48kHz, the dynamics-generated aliasing problem is reduced to insignificance, thus rendering it inaudible. Yet a concern of a 48kHz sampled system is this: How well will it interface to a lower sampled transmission system? Converting the sampling rate is not the issue; that’s easily done using a device known as a Sample Rate Converter. The issue for discussion is what happens to the peak control integrity when the conversion process is applied.

Sample Rate Converters (SRC)
This device transforms one system sampling rate to another, which is necessary when interfacing digital equipment that uses different sampling rates.

The conversion is accomplished by first scaling up, or interpolating, the incoming signal by a factor that makes the intermediate rate an integer multiple of the desired output rate. The signal is then low pass filtered at the Nyquist frequency of the desired sampling rate. This filter is required to smooth out the added samples, which would otherwise create aliasing products. Finally, the signal is scaled down, or decimated, by the factor needed to achieve the desired rate. When converting from 48kHz to 32kHz sampling rates, for example, a 10x multiplication rate will up-sample the incoming signal to 480kHz, which can then be divided by 15 to achieve 32kHz. Figure-1 shows a block diagram of a SRC.
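The rate arithmetic above can be sketched in a few lines. This illustrates only the choice of interpolation and decimation factors, not the filtering itself, and `resample_factors` is a hypothetical helper name.

```python
from math import gcd

def resample_factors(rate_in, rate_out):
    """Smallest integer (interpolate, decimate) pair that converts rate_in to rate_out."""
    g = gcd(rate_in, rate_out)
    return rate_out // g, rate_in // g

up, down = resample_factors(48_000, 32_000)
print(f"48kHz -> 32kHz: interpolate by {up}, decimate by {down}")

# The example in the text uses a larger pair with the same ratio: x10, then /15.
intermediate = 48_000 * 10
print(f"x10 gives {intermediate} Hz, /15 gives {intermediate // 15} Hz")
```

Any pair with the same ratio reaches the same output rate; a real converter chooses the pair to suit its filter design.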

While this sounds quite simple—and basically it is—there are a few issues to consider. Of primary interest is the interpolation filter, typically an FIR filter. It must provide a stop-band rejection of 96dB at 16kHz in order to suppress aliasing distortion. A steep slope in the transition area will be required.



Sample Rate Converter
Figure-1

Since nearly all audio processors apply some form of overshoot control in conjunction with the output filtering section, the overshoot component can be determined by the Gibbs Phenomenon [2]. Should the slope of the up-sampled interpolation filter be steeper than the slope of the final filter in the audio processor, output overshoots may result from the sample rate conversion process. But since these overshoots are generated after the audio processor, removing them requires another limiting device, and thus an added limiter further downstream in the system.
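The effect can be reproduced with a small numerical experiment. The sketch below is illustrative only: it passes one period of a hard-clipped 1kHz waveform, sampled at 48kHz, through a steep low pass filter and compares the peak level before and after. The filter (a 101-tap truncated sinc with a 15kHz cutoff) and the helper names are assumptions chosen for simplicity, not the design of any actual sample rate converter.

```python
import math

def truncated_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Steep low-pass FIR: a sinc truncated to num_taps, scaled to unity gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            taps.append(2 * fc)
        else:
            taps.append(math.sin(2 * math.pi * fc * x) / (math.pi * x))
    total = sum(taps)
    return [t / total for t in taps]

def filter_periodic(one_period, taps):
    """Filter a periodic signal by circular convolution over one period."""
    n = len(one_period)
    return [
        sum(t * one_period[(i - k) % n] for k, t in enumerate(taps))
        for i in range(n)
    ]

FS = 48_000
# One period of a 1kHz waveform that has been hard clipped to +/-1.0.
clipped = [1.0 if i < 24 else -1.0 for i in range(48)]

taps = truncated_sinc_lowpass(101, 15_000, FS)
filtered = filter_periodic(clipped, taps)

peak_in = max(abs(s) for s in clipped)
peak_out = max(abs(s) for s in filtered)
overshoot_percent = 100 * (peak_out / peak_in - 1)
print(f"peak before filter: {peak_in:.3f}, after: {peak_out:.3f}")
print(f"overshoot: {overshoot_percent:.1f}%")
```

The clipped input never exceeds 1.0, yet the filtered output does, because the filter removes the upper harmonics that kept the waveform flat-topped.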

Of interest is the direction of rate conversion. When converting from a lower rate to a higher rate, the chances of overshoot are small, because the frequency of the up-sampled filter is set to a higher frequency than the Nyquist frequency of the incoming signal. Overshoots are a significant problem only when transforming a higher rate to a lower rate, as described above.

Our testing has shown that the use of sample rate converters in the digital audio path between an audio processor and exciter will cause overshoots whenever down-converting from 48kHz to 32kHz sampling. In our test lab, we have also confirmed that any 32kHz sampled system, for transmission processed audio, must have tight low pass filtering at the Nyquist frequency of 16kHz. Any non-linear products that exceed 16kHz will cause overshoots in succeeding SRCs or additional low pass filter stages. This constrains any processing system, regardless of sampling rate, to a tight 16kHz bandwidth, a restriction that renders the other benefits of a higher sampling rate useless. Why should this penalty be paid? It has been proven time and again that 48kHz sampling is a superior rate for digital audio, even when a lower audio bandwidth, such as 15kHz for FM Stereo broadcasting, is used.

It is possible to implement a 15kHz low pass filter with a tight 16kHz stop-band in a 48kHz sampled system. There is no problem with doing that. There is, however, a subjective, sonic reason for choosing a filter with a broader slope: it sounds better. Consider the landscape in the analog processor/transmission system. The need for the 15kHz low pass filter is to protect the 19kHz pilot frequency. Thus, analog low pass filters were all designed to create their stop-band somewhere around 18kHz. Creating an analog low pass filter with a stop-band at 16kHz is theoretically possible, but quite difficult in the real world, as component tolerances and group delay issues are a problem. That example shows that low pass filtering with a stop-band beyond 16kHz is not detrimental to the FM Stereo system used in broadcasting, as it has been done for many years with no problems. A stop-band at 16kHz is only important in a 32kHz sampled transmission system.

Another point to consider: All existing analog processing systems, when digitized and connected into a 32kHz sampled transmission system, will overshoot! This is due to the same reasons as stated above, where energy beyond 16kHz will ring in the low pass filters of the transmission system or SRC, if a conversion is being applied. Considering that analog processing equipment will not become extinct overnight, this problem of 32kHz sampled transmission system overshoots exists even when a hybrid of analog processing is coupled to a digital transmission path.

Therefore, those who argue about the strict use of 32kHz sampling for processing and transmission purposes have given no regard to older technologies that must continue to exist in today’s environment. In essence, the 32kHz sampled system is not backward compatible, whereas a 48kHz sampled system is, as far as overshoot control is concerned.

The following persistence display is from a digital oscilloscope that measured the output of the MPX test point in a Harris Digit® FM Exciter. The test used an Optimod® 8100 connected to the exciter through a Symetrix® 20-bit A/D converter. This is the exact same configuration that a radio station choosing to use an analog FM processor and digital exciter would have. This test configuration was set up as follows:

·         Program audio connected to 8100 processor.

·         Left/Right Output "Test Jacks" of 8100 connected to A/D converter inputs.

·         A/D converter AES/EBU output connected to digital input of exciter.

·         Test Point J-1 of Digit mpx board connected to scope.

Following is the result of that test:



Figure-2

As Figure-2 shows, overshoots occur! This proves the point that subsequent filters in SRCs and the exciter are the culprits, as the bandwidth control in the 8100 does not have the required stop-band suppression at 16kHz! The problem illustrated above exists today at every radio station that chooses to use an analog processor in a 32kHz sampled transmission path.

Remember that a system using 48kHz sampling does not require tight filtering at 16kHz; it must provide tight filtering at 19kHz to protect the pilot frequency and the remainder of the composite spectrum. Thus, there will be some non-linear products beyond 16kHz which could overshoot when down-converted to 32kHz sampling. Note that 32kHz sampling is not a standard for the FM transmission system, nor was it intended to be. A recent AES Journal [3] recommendation for sampling rate instructs that 32kHz may be used for broadcasting, but it does not suggest it as a standard. Digital FM exciters should be able to accept a 48kHz sampled signal and modulate it without generating any overshoot. The broadcaster should not be penalized for desiring to use 48kHz sampled systems in the transmission path of their radio station!

Since this sampling rate issue is primarily based in the broadcast environment, here’s another perspective to add into the mix: Digital Audio Broadcasting (DAB) specifies a 20kHz audio bandwidth. Since many existing broadcast facilities will, no doubt, employ this technology when it becomes available, they will need to provide transmission systems capable of 20kHz bandwidth. Each of the proponents who argue in favor of 32kHz sampling will be out in the cold with regard to DAB. So here is another reason to embrace 48kHz sampling throughout the broadcast facility and transmission path.

Digital transmission systems and their effect on sound quality and peak control

FM Exciters
These are the latest entry to the digital audio transmission path. Capable of exceptional modulation performance, they offer two forms of signal input: analog composite (MPX) for the non-digital transmission site, and AES/EBU.

The composite input connects to the modulator by way of a high speed A/D converter, and requires a faster sampling rate than normally used for the discrete channels. Since the modulation spectrum for FM can range up to 99kHz, the exciter must use a sampling rate of at least 200kHz, for a Nyquist at 100kHz, which covers the baseband spectrum.

The AES/EBU input accepts the signal in the discrete left/right format. Thus, the exciter must perform the stereo generator function. Here is where the story gets interesting.

Consider the AES/EBU input signal to the exciter. It might be at a different sampling rate than that of the exciter. If so, a sample rate converter is employed to make the proper transition. This can pose problems, as the digital filter within the sample rate converter can generate overshoots, adversely affecting the tightly peak-controlled audio data being converted.

The audio, having already been emphasized, peak-controlled and band-limited by the audio processor, needs only matrixing and MPX encoding for stereo modulation to occur. But what’s present in most digital exciters is a sample rate converter, another low pass filter, and in some cases, the addition of, yet again, pre-emphasis. There is a final limiter included in some digital exciters to help remedy some of these overshoot problems, albeit with adverse sonic consequences.

In essence, the signal that only needed to be matrixed and MPX encoded now has additional conditioning applied to it which can degrade sonic performance and modulation efficiency. To learn why, let’s review low pass filters and emphasis networks.

Low Pass Filters: The Sonic Effects
Not all low pass filters sound the same, even when they are designed to the same cutoff frequency and are of the same type. Differences in their transition range will affect how they sound.

Let’s take a look at a few of the restrictions in using tight low pass filtering to provide stop-band rejection at 16kHz in a 32kHz sampled system. The stop-band must provide 96dB of rejection at 16kHz in order to be effective, or aliasing distortion will result. To achieve a filter of this magnitude along with phase linear group delay, a FIR filter is used. With the design specification of 96dB stop-band rejection and 0.1dB passband variance, an equiripple style of filter is suited for the job. Unfortunately, this filter will require 119 taps to create the tight slope that provides 96dB of stop-band rejection. A filter of this length will create 1.8ms of throughput delay.

By contrast, a 15kHz low pass filter designed to provide 96dB stop-band rejection at 19kHz to protect the pilot, and operating at 48kHz sampling, requires only 47 taps. This generates a throughput delay of only 0.47ms, almost 4 times less than the above-mentioned filter. When we consider that time delay in digital transmission systems is a cumulative function, every millisecond counts, as it can add to the comb-filter effect that disk jockeys perceive when monitoring themselves off-the-air.
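The two delay figures follow from the usual property of a linear-phase FIR filter, whose throughput delay is half its length in sample periods. A minimal sketch of that arithmetic, with `fir_delay_ms` as a hypothetical helper name:

```python
def fir_delay_ms(num_taps, fs_hz):
    """Throughput delay of a linear-phase FIR filter: (taps - 1) / 2 sample periods, in ms."""
    return (num_taps - 1) / 2 / fs_hz * 1000

tight = fir_delay_ms(119, 32_000)  # 16kHz stop-band at 32kHz sampling
broad = fir_delay_ms(47, 48_000)   # 19kHz stop-band at 48kHz sampling
print(f"{tight:.2f} ms vs {broad:.2f} ms, ratio {tight / broad:.1f}x")
```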

Another aspect to consider is the sonic differences of filters with different slopes. Psychoacoustic tests have proven that the transition slope of a filter will affect the timbre of the audio. As the filter slope is made tighter, "ringing," which degrades the clarity of the audio, is increased. Therefore, a low pass filter which utilizes a gentler slope is sonically superior. Figures 3-4 depict the differences in the slopes of 2 different 15kHz low pass filters.

 

Example of Tight Slope
Figure-3


 

Example of Broader Slope
Figure-4

With a 48kHz sampling rate, the design of transmission equipment is simplified and performance is superior. Down-conversions don’t affect tightly-controlled audio, and filtering requirements are eased.

Pre-emphasis/De-emphasis Considerations
Most exciters let you add pre-emphasis. Optimally, however, the addition of pre-emphasis is best left to the audio processor, as it employs specialized high frequency control sections that provide both the boost and control of the high frequency energy. In this manner, high levels of modulation are easily obtained, since the processor is designed to balance the tradeoffs between pre-emphasis and high frequency limiting.

In situations where a codec-based STL system and audio processor are inserted before the stereo generator, the codec must pass "flat" (non pre-emphasized) audio. This requires adding de-emphasis to the output of the processor; pre-emphasis is then re-applied in the exciter’s stereo generator. Figure-5 illustrates this:

 

Codec-based STL System
Figure-5

A flat signal is required by the codec because of its reliance on masking principles. Any significant change or imbalance of the frequency spectrum can cause the codec to expose artifacts that would normally be masked.

Whenever multiple stages of frequency contouring are applied, the phase response of all stages must match, or overshoots will result. To eliminate the added overshoot, another limiter must be employed as a "band aid" in the exciter. Even though emphasis networks are derived from a first order filter process, it is possible to create networks that may not match up in phase with each other.

The following are the formulas for determining emphasis response that correlate to first order analog RC networks for pre-emphasis and de-emphasis. Specific frequency gain, along with phase response, can be calculated. Any emphasis networks implemented in DSP should follow these calculations:

To Calculate Pre-emphasis/De-emphasis:

 
$$\text{Ratio} = \sqrt{1 + (2\pi f_r T)^2}$$
(equation - 1)

$$\text{dB} = 20 \log_{10}(\text{Ratio})$$
(equation - 2)

Where: Ratio = Emphasis gain at a given frequency
f_r = Audio frequency in Hz
T = Time constant in seconds (50µs or 75µs)

Equation-1 is used to calculate the gain at a specific frequency along the emphasis curve. Equation-2 converts the gain ratio to dB. Taking the reciprocal of the ratio in equation-1 provides the calculation for de-emphasis. These values represent the exact response that is obtained in an emphasis network that is implemented with a single pole RC filter in the analog domain. The phase relationship for both pre-emphasis and de-emphasis is represented in equation-3; the shift is a lead for pre-emphasis and an equal lag for de-emphasis.

 
$$\text{Ø} = \arctan(2\pi f_r T)$$
(equation - 3)

Where: Ø = Degrees of phase shift at a given frequency
f_r = Audio frequency in Hz
T = Time constant in seconds (50µs or 75µs)

Unless every processor, STL, and exciter manufacturer follows these same equations when designing emphasis networks, the resultant phase mismatches will cause overshoots.
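As a point of reference for checking a DSP emphasis network, the sketch below evaluates gain and phase at 15kHz for both time constants. It assumes the standard single-pole RC response described above, that is, a gain ratio of sqrt(1 + (2π·f·T)²) and a phase shift of arctan(2π·f·T), with T in seconds. The function names are hypothetical.

```python
import math

def emphasis_ratio(freq_hz, time_constant_s):
    """Pre-emphasis voltage gain of a single-pole RC network at freq_hz."""
    return math.sqrt(1 + (2 * math.pi * freq_hz * time_constant_s) ** 2)

def emphasis_db(freq_hz, time_constant_s):
    """Pre-emphasis gain in dB; negate the result for de-emphasis."""
    return 20 * math.log10(emphasis_ratio(freq_hz, time_constant_s))

def emphasis_phase_deg(freq_hz, time_constant_s):
    """Phase shift in degrees: a lead for pre-emphasis, an equal lag for de-emphasis."""
    return math.degrees(math.atan(2 * math.pi * freq_hz * time_constant_s))

for tau in (50e-6, 75e-6):
    print(f"T = {tau * 1e6:.0f}us at 15kHz: "
          f"{emphasis_db(15_000, tau):.2f} dB, "
          f"{emphasis_phase_deg(15_000, tau):.2f} deg")
```

Two networks that agree on these values at every frequency will cancel exactly when one is used for pre-emphasis and the other for de-emphasis.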

Based upon the previous discussion, you can see why it’s best to install the audio processing system as close to the exciter as possible and to use the processor’s pre-emphasis. By doing so, internal limiting in the exciter becomes unnecessary and allows the processing system to provide all of the required peak control.

Sample Slipping
Within a digital system, the resolution of the audio data is determined by the number of samples per cycle at a given frequency. Lower frequencies receive more samples per cycle than higher frequencies. According to Nyquist theory, there will be at least two samples per cycle at the highest frequency. This does not leave much resolution when trying to determine the exact peak level at the upper portion of the spectrum, since the two sample points can fall anywhere within the 360 degrees of the cycle. If this happens within the hard limiter algorithm of an audio processor, overshoots will result!

When hard limiting is performed, the precise level of the upper frequencies in the spectrum can be missed, as some of their peaks will occur between sample points. Should these peaks exceed the threshold of the clipper, what the final output level will be after the clipping function is performed becomes uncertain. This is technically known as unquantized intra-sampled peaks, or sample slipping. Figure-6 shows a worst-case example of this:


 

Worst-case example of unquantized intra-sample peak
Figure-6

Notice how the missed peak reaches its crest exactly between the two sample points. At each sample point, the value that is registered as data is significantly less than the peak value. If this missed peak is at a level that would cross the clipper threshold, nothing would happen, as the clipper is not aware of it. The problem is most severe when the signal in question approaches the Nyquist frequency. We can calculate the acquired level, and hence the error between the acquired level and the peak level, by using the following equations:

 
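$$\text{Ø} = 360^\circ \times \frac{f_a}{f_s}$$
(equation - 4)

Where: Ø = Degrees between Upper Audio Frequency & Sampling Rate
f_a = Upper Audio Frequency
f_s = Sampling Rate

$$A = \cos\left(\frac{\text{Ø}}{2}\right)$$
(equation - 5)

Where: A = Acquired level, as a fraction of the true peak level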

Let’s have a look at some examples. With 32kHz sampling and a test frequency of 15kHz (the upper bandwidth limit in FM broadcasting), the acquired level equals 0.098, or 10%.

  • Ø = 360° × (15kHz / 32kHz) = 168.75°

  • A = cos(168.75° / 2) = 0.098

In other words, there is less than 10% level acquisition, or 90% detection error in a 15kHz peak, sampled half-way between two samples with 32kHz sampling. On the other hand, with 48kHz sampling, there is 55% level acquisition, or 45% error. A 128kHz system generates 7% error. In a virtual 192kHz sampling method, there is 97% level acquisition, which generates only 3% error. Table-1 summarizes the effect of sampling rate on the efficacy of peak acquisition:


 
Summary of Peak Acquisition Error as a Function of Sampling Rate
Table-1
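The figures quoted above can be recomputed from the two relations used in the worked example, angle = 360° × (audio frequency / sampling rate) and acquired level = cos(angle / 2). The sketch below does this for a 15kHz peak at the four sampling rates mentioned; `acquired_level` is a hypothetical helper name.

```python
import math

def acquired_level(audio_hz, sample_rate_hz):
    """Fraction of a sine's true peak captured when the peak falls midway between two samples."""
    angle_deg = 360 * audio_hz / sample_rate_hz   # degrees between adjacent samples
    return math.cos(math.radians(angle_deg / 2))  # level at half that angle from the peak

print("fs (kHz)  acquired  error")
for fs in (32_000, 48_000, 128_000, 192_000):
    level = acquired_level(15_000, fs)
    print(f"{fs // 1000:>8}  {level:>8.1%}  {1 - level:>5.1%}")
```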

This is why using a higher sampled system will reduce this problem to insignificance. Thus, when an audio processing system is being evaluated for any tendency for peaks to "slip between the samples," you only need to determine what sampling rate is used. Furthermore, a peak limiter will control peaks regardless of the sampling rate, and nothing will "slip between the samples," as long as a clever limiter algorithm is utilized.

Looking at peak control performance
Before having a look at peak control performance through a digital path, let’s first verify the normal peak control operation at the output of a processor under test. A digital processor that employs 48kHz base sampling rate will be used. This processor employs a 192 kHz virtual up-sampled hard limiter. Intra-sample peak problems are virtually non-existent, being limited to about a worst-case 3% error. Following is a test that describes a look at the discrete Left/Right outputs of the system as viewed by a digital storage oscilloscope, to verify peak control.


Left/Right Channel Overshoot Test Methodology
Using program material, the audio processor was set to process aggressively. The song "The Real Thing" by Lisa Stansfield was used, because it contains substantial low frequencies and clean high frequencies, thus providing a good challenge for the control of overshoots. The analog output was connected to a Tektronix TDS-744A digital storage oscilloscope. The ’scope was set to the infinite persistence mode, which will "hold" the monitored waveform on the screen. Each waveform was stored for at least one minute. The Tek ’scope can store its display as a bitmap file; these files were used for this document.

Over time, the persistence will "fill in" the block with traces of audio waveforms, and the "flat" lines along the top and bottom of the filled in section represent clipper performance. Any little "dots" that exceed the reference level of 1.35 volts are overshoots. Figure-7 shows the performance of the system.



Persistence Display of Processor’s Left Channel Analog Output
One Minute Time Period
Figure-7

Notice that there are a few little "blips" above the 1.35 volt reference level. These are of insignificant level and of very short duration: approximately 200µs. In real life, they wouldn't be detected by any modulation monitor!

A solution to AES/EBU transmission overshoots: prediction analysis clipping
In each of the above discussions, it is shown how and why overshoots can develop using the AES/EBU connection between processor and exciter. It does not matter if they are co-located or separated by an STL system. As discussed earlier, this is especially true whenever a down-conversion is required between 48kHz and 32kHz sampling, where overshoot components can reach 20%. Using the final limiter in the exciter as a remedy has its own disadvantage—degraded audio quality. What is needed is a final limiter that can analyze and predict what will happen to the signal downstream, and correct for that—a Prediction Analysis Clipper.

Early performance of the 48kHz processor connected to a digital exciter via AES/EBU exhibited the overshoot phenomenon described above, compromising ultimate loudness by up to 2dB. The following oscilloscope image, Figure-8, was taken from a test point within the digital exciter after the discrete left/right input has been stereo encoded, and it shows the overshoot components.

 

Persistence Display Showing Overshoots
Figure-8

In this display, there are "spikes" representing overshoots 15 to 20 percent beyond the reference peak level of ±650 mV. Compare this figure with Figure-7, which showed the tightly-controlled signal at the output of the processor. Clearly, there is a loss of peak control as the signal makes its way to the output of the MPX generator in the exciter. This can be attributed to all the problems detailed in the above discussions. What can be done?

Prediction Analysis
In trying to devise a solution to what seems to be an unsolvable problem, let’s consider what is known about the problem:

·         Overshoots occur whenever down-conversion of 48kHz to 32kHz sampling is performed.

·         The tight transition slope of the 16kHz filter in the sample rate converter is a significant contributor to the problem.

·         The problem occurs only with signal components above 5kHz.

·         It is not desirable to reduce the slope of the low pass filter in the audio processor, as it degrades sound quality.

·         Adding more clippers and filters only increases distortion.

Might it be possible to pre-compensate for predicted occurrences of overshoots by the use of supplementary control signals applied to the upper audio spectrum: some type of dynamic, self-adjusting coefficient that could anticipate an overshoot situation, and then correct for it in advance? The answer, amazingly, can be found within the system’s main clipper algorithm, the same one employed to eliminate aliasing distortion [4], aka "digital grunge"!

Since it’s known what mechanisms contribute to overshoots, the severity of the overshoots can be calculated. Then, this information can be combined with the effects of a network that simulates the sharp slope of the 16kHz filter in a sample rate converter. This analysis provides the actual overshoot components that could occur later in the system. By dynamically applying both results to the non-aliasing clipper algorithm, the predicted overshoots can be eliminated!

Note that when analyzing the effects of the 16kHz low pass filter used in the SRC, it is not desirable to actually bandlimit the audio for the tighter requirements of the SRC filter. The broader low pass filter in the processor’s design is maintained, which provides two benefits: it does not add further time delay to the system, and it preserves sound quality.

The use of the Prediction Analysis Clipper method reduces overshoots in the sample rate converted signal path from a worst case of 20% to considerably less. Testing was done using very aggressive processing settings; under normal processing operation, overshoots were controlled to within 3% or less. As Figure-9 shows, overshoots in the AES/EBU sample rate converted path are insignificant.

 

Persistence Display Showing Performance of
Prediction Analysis Clipper with Sample Rate Converter
Figure-9

The Prediction Analysis Clipper eliminates overshoot problems associated with the use of lower sampling rates in the transmission path. Now, the processor can be utilized with 32kHz digital uncompressed STL systems and 32kHz exciters, and tight peak control will be achieved. Systems can "mix and match" sampling rates with little or no problem incurred regarding overshoot.

It is still recommended that, when using a coded STL link, the processor be located at the transmitter site, as it is proven that codecs will undo the tight peak control of any processing system. For further discussion on this topic, please refer to the technical paper "Broadcast Signal Processing and Audio Coding: Are We Trying to Mix Oil with Water?" This can be found elsewhere on our website.

While this new clipping method solves the overshoot problems associated with sample rate conversion, it may not be able to compensate for additional variables that may exist in a broadcast chain. Furthermore, it does not remedy the sonic degradation associated with the added amount of up/down conversions and increased time delay associated with an AES/EBU connection.

Conclusions
The sampling rates of the audio processor and transmission system have a direct effect on both system peak control performance and subjective sound quality. It has been discussed and shown through research and on-air evaluation that the use of higher sampling rates improves the overall performance in each of these areas. Yet we live in a world where older technologies that employ 32kHz sample rates are still in use. What has been shown here is an example where a higher sampled system for processing can co-exist in a lower sampled environment without modulation overshoots. Unfortunately, the same process cannot be applied to remaining analog processing systems that must make use of a 32kHz sampled digitized transmission system. There, the lack of backward compatibility is impossible to overcome.

With DAB already on-air in some countries, and hopefully here soon in the USA, it makes all the more sense to realize that we will soon live in a world where 48kHz sampling is at least the minimum.

REFERENCES

[1] Mapes-Riordan, D.: A Worst-Case Analysis for Analog-Quality (Alias-Free) Digital Dynamics Processing, 105th Convention of the Audio Engineering Society (AES), San Francisco 1998, Preprint 4766

[2] Steiglitz, K.: A Digital Signal Processing Primer, Addison-Wesley Publishing, 1995

[3] AES5-1998: AES Recommended Practice For Professional Digital Audio – Preferred Sampling Frequencies For Applications Employing Pulse-Code Modulation, Journal of the Audio Engineering Society, Volume 46, Number 10, October 1998

[4] Foti, F.: Digital Audio Broadcast Processing: Finally The New Frontier!, National Association of Broadcasters Convention, Las Vegas, April 1997
