DTV Audio Processing: Exploring The New Frontier!

Frank Foti, Cutting Edge
Cleveland, Ohio
November, 1998

Abstract
Digital Television has the potential for exceptional audio quality and effects. Considering that the system can provide 5.1 Dolby Digital® multichannel sound, as well as existing stereophonic transmission, there are signal processing issues that need to be considered for this exciting new medium. The Dolby Digital method proposes an internal 'meta' data technique for signal processing. This is designed to provide commercial loudness control, and to allow the end user to set up the listening environment as he or she sees fit. While the theory behind this appears good on paper, it leaves some concerns in the real world. The whole concept of 'meta' data must be understood as a total system, or it will not work.

This presentation will address these issues, along with providing information on how audio processing can be used as a 'tool' to sonically improve the audio coding process that is used in the Dolby Digital (AC-3) 384 kbps transmission system. Discussion will center on a new and innovative method of processing for DTV using the 'meta data', in lieu of the 'tried and true' compression/limiting means that can be inserted in the audio transmission path. This new approach to meta data processing provides exceptional loudness control, yet keeps all of the program-dependent dynamics intact.

Further discussion will show that signal processing can improve the intelligibility of the 5.1 multichannel effects used in many motion pictures. Early studies are showing that many motion pictures that are produced for use in a theater will create intelligibility problems when listened to in the average home viewing environment. Signal processing can again be used as a 'tool' to eliminate these problems.

With the implementation of DTV transmission in its early stages, these items and issues need to be dealt with in the immediate future! A better understanding of how audio processing can be employed as a value-added 'tool', through which high-quality and consistent sound presentation is delivered, will be of benefit to the entire television industry.

Television Audio: A Second Class Citizen . . . Not Anymore!

Digital Television, DTV, creates a wide range of new opportunities for the audio portion of the signal. High Fidelity 5.1 multichannel, Dolby-Surround®, stereo, or monophonic sound are all possible. Associated services have provision for multi-lingual, visual/hearing impairment, and emergency audio capabilities built into the Dolby Digital (AC-3) transmission system.

This is new ground for television transmission, and a break from the conventional model that analog transmission affords with stereo, SAP, and the PRO channel. All too often, television stations have relegated the audio segment to "second class citizen" status. It has become the afterthought to the visual component. That changes now in the DTV world!

This has also created the need to reconsider signal processing applications for audio in the DTV system. While the Dolby Digital system provides a meta data method of 'embedded' signal control, there are still further enhancements that can be realized through deployment of ancillary audio processing, when applied as a 'tool' to the overall system. What follows is a quick look back at the conventional analog television audio processing model, along with an overview of the Dolby Digital meta data. Finally, some thought is given to methodology that would enhance the meta data through the use of audio processing.

The Conventional Analog Model

Processing as applied in analog television is basically implemented in the same manner as in radio. A transmission processor that provides dynamic compression and peak limiting is inserted before the aural transmitter. The sonic presentation is the subjective result of the audio processor at the television station. In most cases it is set up to yield an aural texture that sounds consistent over a wide range of television sets employing various sized speakers and/or enclosures. The difficulty here is that television listening environments differ widely, and this makes it nearly impossible to create a "one sound fits all" processing method. In addition, television programming provides an extremely wide range of audio situations to which the processing must adjust.

These run from wide-ranging dialog in movies and sitcoms to highly produced commercials, music videos, and sporting events. The signal processor's job is to try to create a consistent presentation of these elements. That, along with the above-mentioned differences in the viewing/listening environment, increases the challenges for any processing system. Most stations set up processing to be as 'general' as possible, and just accept those situations where some commercials are still too loud, and certain dialog is too soft or unintelligible.

Dolby Digital DTV Audio Transmission System

Digital audio transmission is achieved using the Dolby Digital (AC-3) multichannel method. It's the same system as employed with DVD technology. It uses a 'lossy' audio compression algorithm, AC-3, and operates at a bitrate of 384kbps. This is intended for affiliate transmission purposes only. Audio distribution from program providers, production, and network facilities, will employ a 'mezzanine' level of coding. This will incorporate a coding method that operates at a higher bitrate so that multiple codec 'passes' can occur without audio degradation. As of this writing, there is still discussion and debate over a few proposed systems. In the end, the industry will probably standardize on one method of mezzanine delivery. But for the purposes of this discussion, we will center on the 'last mile,' the point of transmission to the viewer.
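To put the 384 kbps figure in perspective, the arithmetic below gives a rough estimate of how much data reduction the transmission codec performs. The source format is an assumption made only for illustration (48 kHz sampling, 16-bit PCM, six channels all counted as full bandwidth); the article does not state these values.

```python
# Rough compression-ratio estimate for 5.1 audio carried at 384 kbps.
# Assumptions (not from the article): 48 kHz sampling, 16-bit PCM source,
# and all six channels counted as full bandwidth. The LFE channel is
# band-limited in practice, so this slightly overstates the PCM rate.

SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 6                 # L, C, R, Ls, Rs, LFE
CODED_BITRATE_BPS = 384_000  # figure quoted in the text


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps, coded_bps):
    """How many times smaller the coded stream is than the PCM source."""
    return pcm_bps / coded_bps


pcm_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = compression_ratio(pcm_bps, CODED_BITRATE_BPS)

print(f"PCM source: {pcm_bps / 1e6:.3f} Mbps")
print(f"Coded:      {CODED_BITRATE_BPS / 1e3:.0f} kbps")
print(f"Ratio:      {ratio:.1f} : 1")
```

Under these assumptions the codec discards roughly eleven-twelfths of the source data, which is why the condition of the signal presented to the encoder matters.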

Audio Meta Data

Within the Dolby Digital system is an embedded meta data bitstream. This is included with the audio data and is designed to give the viewer's decoder information about the incoming audio signal, that is, about the 'content' of the signal. To paraphrase: it tells the decoder what mode of operation is in use (multichannel, surround, and so on), what Associated Services are provided, and what the dynamics of the audio levels are. It is this last section, about the audio levels, that we wish to look at more closely.

Embedded within the meta data are three functions known as Dialnorm, Dynrng, and Compr. These pertain to the dynamic conditions of the audio signal and are inserted by the Dolby Digital encoder. Following is a short glossary and explanation of these functions, and how they relate to this discussion:

DIALNORM: This stems from the term 'Dialog Normalization,' which is used by the motion picture industry as it tries to decide what level to establish as the 'average' operating volume for movies. For DTV purposes, it's a normalization level that is subjectively set for the dialog channel. It is intended to keep the overall operating level within a consistent range so that it does not produce audio peaks that exceed digital full scale, 0 dBFS. It is adjustable over a 31 dB range, in 1 dB increments. Dialnorm can be set once, or it can be updated as often as every 32 ms. It can be utilized as a loudness controller for commercial content, which is a continued annoyance to viewers in the analog television system. Dialnorm is to be subjectively set by personnel who can evaluate and ascertain the proper dialog normalization levels based upon an established reference. This can be done by the program originator, the producer, or at the transmission point.
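As an illustration of how a dialnorm value might translate into a level change at the decoder, here is a minimal sketch. It assumes the convention commonly described for AC-3, which the article itself does not spell out: a dialnorm value of N (1 to 31) states that dialog sits N dB below full scale, and the decoder attenuates by (31 - N) dB so that dialog from every source lands at the same level. The function names are hypothetical.

```python
# Sketch of decoder-side dialog normalization.
# Assumption (not stated in the article): dialnorm = N means "dialog is at
# -N dBFS", and the decoder attenuates by (31 - N) dB so that all programs
# end up with dialog at -31 dBFS. Function names are illustrative only.


def dialnorm_attenuation_db(dialnorm):
    """Attenuation in dB (0..30) implied by a dialnorm value of 1..31."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be in the range 1..31")
    return 31 - dialnorm


def db_to_linear(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def apply_dialnorm(samples, dialnorm):
    """Scale a list of samples by the attenuation implied by dialnorm."""
    gain = db_to_linear(-dialnorm_attenuation_db(dialnorm))
    return [s * gain for s in samples]


# A movie authored with dialog at -27 dBFS and a commercial at -15 dBFS:
for name, dn in (("movie", 27), ("commercial", 15)):
    print(name, "is attenuated by", dialnorm_attenuation_db(dn), "dB")
```

With honest values, the louder commercial receives the larger attenuation and both programs play back with dialog at the same level. The scheme only works if the authored value reflects the real dialog level, which is the concern raised later in this paper.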

NOTE: At the present time, there isn't any convenient method to link program origination, or networks to the affiliate. Therefore, the only place that meta data can be inserted is at the transmission point. Ultimately, it should be possible to generate meta data at the origination point, and then allow it to follow the entire signal stream from beginning to end. Then the meta data could be edited, modified, or allowed to pass 'As Is' through the system.

DYNRNG: A dynamic range compression method that is implemented in the decoder. The meta data carries the dynrng information bits that can control how much compression is utilized by the decoder. The values for dynrng are 'authored' at program origination, or in the transmission encoder. (The above note indicates that origination occurs in the encoder... for now.) Dynrng can be updated every 5.3 ms within the meta data stream. Operating over a range of +/-24 dB, dynrng can be set to allow full dynamic range, or 'scaled' amounts of compression at the decoder outputs. This provides the end user the capability of setting their own listening environment to suit their needs.
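The update intervals quoted for dialnorm (32 ms) and dynrng (5.3 ms) follow from the AC-3 frame structure, and the 'scaled' compression can be pictured as a simple multiplier on the authored gain. The sketch below assumes 48 kHz sampling with 256-sample audio blocks and six blocks per frame; the scale factor and function names are hypothetical and stand in for whatever control a given decoder exposes.

```python
# Timing and scaling sketch for dynrng.
# Assumptions: 48 kHz sampling, 256 samples per audio block, 6 blocks per
# frame. The 0..1 "scale" control and the function names are illustrative.

SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_BLOCK = 256
BLOCKS_PER_FRAME = 6


def block_period_ms():
    """Interval between dynrng updates (one per audio block)."""
    return 1000 * SAMPLES_PER_BLOCK / SAMPLE_RATE_HZ


def frame_period_ms():
    """Interval between dialnorm updates (one per frame)."""
    return 1000 * SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME / SAMPLE_RATE_HZ


def scaled_dynrng_gain_db(dynrng_db, scale):
    """Gain actually applied when the listener asks for partial compression.

    scale = 0.0 ignores dynrng (full dynamic range);
    scale = 1.0 applies the authored gain in full.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be between 0 and 1")
    if not -24.0 <= dynrng_db <= 24.0:
        raise ValueError("dynrng gain is limited to +/-24 dB in this sketch")
    return dynrng_db * scale


print(f"dynrng update interval:   {block_period_ms():.1f} ms")
print(f"dialnorm update interval: {frame_period_ms():.1f} ms")
print("authored -12 dB at half scale:", scaled_dynrng_gain_db(-12.0, 0.5), "dB")
```

Note that the scale factor changes only how much of the authored gain is applied; the gain itself is still a single wideband value per block.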

COMPR: Known as 'heavy compression.' This mode operates over twice the range of dynrng. This function is designed for use whenever the decoded signal is required to be re-transmitted after decoding.

Meta Data: More To Consider . . .

The idea behind meta data is very good, but there are some aspects that need further exploration, and some ideas to ponder that could further enhance the performance of the meta data system. From a signal processing point of view, there are certain conditions where meta data may not be able to provide maximum performance. Following are 'other' considerations about meta data:

DIALNORM

  • It is a wideband process, and coupled to all channels. This can cause unnatural 'ducking' of some material, especially audio with strong bass energy. Consider an action movie in which a loud effect is generated in the front-left channel. This loud passage could cause dialnorm, if it's being 'authored' dynamically, to 'gain-modulate' the audio in all the other channels. Depending upon the algorithm of the gain control, and the time constants thereof, it might also 'gain-modulate' other frequencies within the audio spectrum. Wideband gain control is generally avoided in broadcast applications for this very reason.

  • The question "Will subjectivity prevail?" must be asked. It is already believed within the industry that there will be a temptation to author dialnorm to a louder value. Then we are back to square one regarding loud commercial content, or program material that was incorrectly set too loud or too soft. Dialnorm can be set to whatever values the 'author' wishes it to be! Unless intercepted and reset, which is possible, the audio level through the system might still be inconsistent from source to source.

  • It can be interrupted or authored at numerous points within the system. While we cannot yet pass meta data from beginning to end of the entire path, it will be possible in the foreseeable future, once the mezzanine level is established. This leaves open the possibility that the 'normalized' value being transmitted to the viewer could again become inconsistent. There will be a need to monitor and adjust dialnorm with regularity.

  • Local program insertion must be factored in. The value for dialnorm will need to be generated for local and network programming. This will need to be consistent with the values that are embedded within network content.

DYNRNG

  • A wideband process. As with the case of dialnorm, the same issues described above are relevant here. This leaves out the possibility of providing 'dynamic equalization' to program material. Many times there is a need to reset the spectral EQ in an effort to create a consistent sonic texture. Wideband processing also affects music signals differently from dialog. There appears to be no differentiation method for this.

  • No method to enhance older source material. Consider, for example, older movie soundtracks that are noisy and of limited audio spectrum: there is no method of suppressing source noise, or of enhancing weak, dull-sounding dialog and/or music. Dynamic range control can be used to enhance these aspects, but with a wideband application it could be more cumbersome.

  • How will multichannel dynamic effects be controlled? Considering that all channels are coupled together, a loud passage, or effect in one channel, may 'gain-modulate' the audio in other channels. There appears to be no dynamic linking, or unlinking of the channels whenever effects of a wide dynamic range occur. This could be disorienting whenever strong sound effects are generated which could lead to loss of intelligibility of the dialog channel.

  • Raising the above issues does not mean that the meta data method is flawed. What it needs is something to augment its performance, and in turn improve the performance of the entire system. Multiband dynamics processing, when used as a 'tool', will provide the exact enhancements needed to enable the meta data to realize all of its benefits.
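The 'ducking' concern raised in the lists above can be shown with a small numerical sketch. It compares a wideband gain computer, which derives one gain from the total level, with a multiband one, which derives a gain per band. This is a static, two-band illustration only: the threshold, ratio, band names, and levels are invented for the example, and a real processor would add time constants and per-band settings.

```python
import math

# Static comparison of wideband and multiband gain control.
# All numbers (threshold, ratio, band levels) are invented for illustration.


def gain_change_db(level_db, threshold_db, ratio):
    """Static compressor curve: gain change in dB (zero or negative)."""
    over = level_db - threshold_db
    if over <= 0:
        return 0.0
    return -over * (1 - 1 / ratio)


def total_level_db(band_levels_db):
    """Combined level of all bands, summed as power."""
    power = sum(10 ** (level / 10) for level in band_levels_db.values())
    return 10 * math.log10(power)


def wideband(band_levels_db, threshold_db, ratio):
    """One gain, derived from the total level, applied to every band."""
    gain = gain_change_db(total_level_db(band_levels_db), threshold_db, ratio)
    return {band: level + gain for band, level in band_levels_db.items()}


def multiband(band_levels_db, threshold_db, ratio):
    """Each band gets a gain derived from its own level only."""
    return {
        band: level + gain_change_db(level, threshold_db, ratio)
        for band, level in band_levels_db.items()
    }


# A loud bass effect arrives while dialog continues at a normal level.
levels = {"bass": -6.0, "dialog": -26.0}
wideband_out = wideband(levels, threshold_db=-20.0, ratio=4.0)
multiband_out = multiband(levels, threshold_db=-20.0, ratio=4.0)

for band in levels:
    print(f"{band:7s} in {levels[band]:6.1f}  "
          f"wideband {wideband_out[band]:6.1f}  "
          f"multiband {multiband_out[band]:6.1f}")
```

In this example the bass effect is controlled by about the same amount either way, but the wideband version also pulls the dialog down by roughly 10 dB, while the multiband version leaves it untouched.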

Multiband Processing: The Missing 'Tool'

Audio processing can be viewed in one of two ways: As an enhancement 'tool' to improve the sonic presentation of an audio signal, or it can be abused by those who wish to get every last dB of level out of a system. FM radio is an example of the latter.

As stated earlier, DTV offers a system that can transport exceptional audio quality to the viewer. Meta data is a good idea, but as described above, it needs help. Multiband processing provides a wide range of features that supports the meta data model. When viewed in the following context, it yields numerous benefits:

  • Enhancement of the aural presentation is achieved through dynamic spectral balancing. Dialog in older material is easier to understand. Tonal balance is maintained from source to source. An old movie soundtrack will sound more robust when paired next to the "News At 11" promo.

  • Precise loudness control. A wealth of research and design in loudness control has been accomplished since the CBS Loudness patents were filed in 1981. The analog version of the CBS Loudness controller falls a bit short in controlling heavily produced loud commercials. New developments in loudness detection through filterbank modeling create precise control. Here, loudness control information can be forwarded on to the transmission encoder to assist with the authoring of dialnorm.

  • Source noise, or hiss, can be 'filtered' or suppressed as the system provides dynamic noise reduction.

  • It has been discovered that some motion pictures that were mixed for theater presentation do not replicate well in the home listening environment. This happens because the physical dimensions of a theater differ from those of a living room. When the theater mix is presented in the home listening area, effects can drown out the dialog, due to the differing time delays in speaker placement that exist between a full-sized theater and a home living room. Multiband processing will improve intelligibility in these instances, as the time delay differences can be modified in the processor.

  • Channel Coupling: Through intelligent means, channels can be coupled, or uncoupled, in an effort to preserve the natural tonal qualities and dynamics of the source signal when wide dynamic range effects occur. This is very important in the multichannel 5.1 environment. As stated earlier, it is not desirable to have a loud crash sound effect in the front-left channel affecting the audio level in the rest of the channels. That would be unnatural sounding. A multiband 5.1 processing system will avoid this situation. Knowledge of the dynamics in the multiple channels will allow the processor to link, or unlink, its control signals as it sees fit.

  • Processing improves the efficiency of the audio encoder. It should be noted that the transmission encoder for the Dolby Digital system uses an aggressive compression level, 384 kbps for the entire 5.1 signal. Therefore, any additional tool that will reduce the chance of coding artifacts is desirable. Subjective listening tests have revealed that the use of dynamics control in front of a codec will improve its sonic performance. In this manner, the audio processor can be designed to recognize the signal conditions that the audio codec will not like. Then it can adjust for these conditions in advance, and improve the overall result.

  • The processor can be used to author the meta data. Since all of the information that meta data requires will be generated, or known, by the processor, it would make sense for the processor and the Dolby Digital encoder to communicate with each other. This further supports the meta data model, and it ensures that the ancillary processor is not trying to 'fight' the functions of the meta data. In essence, both systems begin to work together to create an exceptional output signal.
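The channel coupling idea in the list above can be sketched as a single link control that blends each channel's own gain with a common gain. This is only one plausible way to picture linking and unlinking, not a description of any actual product: the link parameter, threshold, ratio, and channel levels are all hypothetical.

```python
# Sketch of linked versus unlinked gain control across channels.
# The "link" parameter and all numeric values are illustrative assumptions.


def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: gain change in dB (zero or negative)."""
    over = level_db - threshold_db
    if over <= 0:
        return 0.0
    return -over * (1 - 1 / ratio)


def channel_gains_db(levels_db, link):
    """Per-channel gains for a given amount of linking.

    link = 0.0: each channel is controlled only by its own level.
    link = 1.0: every channel follows the most heavily reduced channel.
    Values in between blend the two behaviors.
    """
    if not 0.0 <= link <= 1.0:
        raise ValueError("link must be between 0 and 1")
    own = {ch: compressor_gain_db(level) for ch, level in levels_db.items()}
    common = min(own.values())  # largest gain reduction among the channels
    return {ch: (1 - link) * gain + link * common for ch, gain in own.items()}


# A loud crash in the front-left channel while dialog plays in the center.
levels = {"L": -4.0, "C": -24.0, "R": -30.0, "Ls": -35.0, "Rs": -35.0}

linked = channel_gains_db(levels, link=1.0)
unlinked = channel_gains_db(levels, link=0.0)

for ch in levels:
    print(f"{ch:3s} linked {linked[ch]:6.1f} dB   unlinked {unlinked[ch]:6.1f} dB")
```

Fully linked, the crash in the left channel drags the center dialog down by the same 12 dB; unlinked, the dialog is left alone. A processor that knows the dynamics of every channel could move the link control between these extremes as the program demands.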

Processing Applications . . .

One of the benefits of the meta data method is that it can be authored at numerous points along the path. Upon full implementation of the DTV system, it will be possible to know a considerable amount of information about the incoming audio signal. This information can then be passed along, modified, or reset by the affiliate locations.

Ancillary processing should be thought of in the same light. Program content providers can utilize this tool in final mixdown or mastering. The meta data generated there would then follow the program from its origination. This information can be used to inform processing devices along the way as to what dynamics control is still needed, or not needed at all! Here is another benefit of having the dynamics information ahead of time.

For those facilities that require a meta data editor, the processor can work as that function too. In the early stages of DTV this will be a major concern. There is the need for someone to generate, monitor, and adjust the meta data for local live programs. The nightly news programs are an excellent example of this. The processor will perform this as an ongoing function that works automatically. There will be no need for an operator to manually monitor and adjust the meta data.
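As a rough picture of what automatic authoring could look like, the sketch below measures the level of a dialog segment and suggests a dialnorm value from it. It is deliberately simplified: plain RMS level stands in for a proper perceptual loudness measurement, the mapping assumes that a dialnorm of N means dialog at -N dBFS, and the function names are hypothetical.

```python
import math

# Sketch of automatic dialnorm authoring from a measured dialog level.
# Assumptions: samples are floats in the range -1.0..1.0, plain RMS is used
# in place of a real loudness model, and dialnorm = N means "dialog at
# -N dBFS" with valid values 1..31. Function names are illustrative only.


def rms_dbfs(samples):
    """RMS level of a block of samples, in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def suggest_dialnorm(samples):
    """Suggest a dialnorm value (1..31) for a dialog segment."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 31  # silence: fall back to the lowest level that can be signaled
    return max(1, min(31, round(-level)))


# Stand-in for a dialog segment: one second of a 1 kHz tone at 0.1 amplitude.
segment = [0.1 * math.sin(2 * math.pi * 1000 * n / 48_000) for n in range(48_000)]

print("measured level:", round(rms_dbfs(segment), 1), "dBFS")
print("suggested dialnorm:", suggest_dialnorm(segment))
```

In practice the measurement would run continuously on live program audio, which is what removes the need for an operator to watch and adjust the value by hand.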

Multiband processing supports the various forms of DTV, from the 2-channel Lt/Rt Surround to the full 5.1 multichannel presentation. As the discussion provided here illustrates, multiband processing will enhance the performance of the meta data system. The Lt/Rt method is available now; multichannel methods are still in development, as they require a new approach to multiple channel control when employing multiple audio bands.

A whole new and wide range of possibilities opens up for DTV processing. As research and experience continue in multichannel sound, the need for dynamics control as a tool for this new medium grows with them. Coupled with meta data, the sonic aspects of DTV should never be thought of as a second class citizen again!
