DTV
Audio Processing: Exploring The New Frontier!
Frank Foti, Cutting Edge
Cleveland, Ohio
November, 1998
Abstract
Digital Television has the potential for exceptional audio
quality and effects. Considering that the system can provide
5.1 Dolby Digital® multichannel sound, as well as existing
stereophonic transmission, there are signal processing issues
that need to be considered for this exciting new medium. The
Dolby Digital method proposes an internal 'meta' data
technique for signal processing. This is designed to provide
commercial loudness control, and to allow end users to set up
their listening domain as they see fit. While the theory
behind this appears good on paper, it leaves some concerns in
the real world. The whole concept of 'meta' data must be
understood as a total system, or it will not work.
This presentation will address these
issues, along with providing information on how audio
processing can be used as a 'tool' to sonically improve the
audio coding process that is used in the Dolby Digital (AC-3)
384kbps transmission system. Discussion will center on a new
and innovative method of processing for DTV using the 'meta
data', in lieu of the 'tried and true' compression/limiting
means that can be inserted in the audio transmission path.
This new approach uses meta data processing to provide
exceptional loudness control while keeping all of the
program-dependent dynamics intact.
Further discussion will show that
signal processing can improve the intelligibility of the 5.1
multichannel effects used in many motion pictures. Early
studies are showing that many motion pictures that are
produced for use in a theater will create intelligibility
problems when listened to in the average home viewing
environment. Signal processing can again be used as a 'tool'
to eliminate these problems.
With the implementation of DTV
transmission in its early stages, these items and issues need
to be dealt with in the immediate future! A better
understanding of how audio processing can be employed as a
value-added 'tool' for delivering a consistent, high-quality
sound presentation will benefit the entire television
industry.
Television Audio: A Second Class Citizen . . . Not Anymore!
Digital Television, DTV, creates a wide range of new
opportunities for the audio portion of the signal. High
Fidelity 5.1 multichannel, Dolby Surround®, stereo, or
monophonic sound are all possible. Associated services have
provision for multi-lingual, visual/hearing impairment, and
emergency audio capabilities built into the Dolby Digital
(AC-3) transmission system.
This is new ground for television
transmission, and a break from the conventional analog model,
which offers stereo, SAP, and the PRO channel. All too often,
television stations have relegated the audio segment to
"second class citizen" status. It has become
the afterthought to the visual component. That changes now in
the DTV world!
This has also created the need to
reconsider signal processing applications for audio in the DTV
system. While the Dolby Digital system provides a meta data
method of 'embedded' signal control, further
enhancements can be realized by deploying
ancillary audio processing as a 'tool' within the
overall system. What follows is a quick look back at the
conventional analog television audio processing model, along
with an overview of the Dolby Digital meta data. Finally, some
thought is given to methodology that would enhance the meta
data through the use of audio processing.
The Conventional Analog Model
Processing as it's applied in analog television is basically
implemented in the same manner as radio. A transmission
processor that provides dynamic compression and peak limiting
is inserted before the aural transmitter. The sonic
presentation is the subjective result of the audio processor
at the television station. In most cases it's set up to try to
yield an aural texture that sounds consistent over a wide range
of television sets employing various sized speakers and/or
enclosures. The difficulty here is that television listening
environments differ widely, and this makes it nearly
impossible to create a "one sound fits all"
processing method. In addition, television programming
presents an extremely wide range of audio situations to which
the processing must adjust.
These situations range from wide-ranging dialog in movies and
sitcoms to highly produced commercials, music videos, and
sporting events. The signal processor's job is to try to
create a consistent presentation of these elements. That,
along with the above-mentioned differences in the
viewing/listening environment, increases the challenges for any
processing system. Most stations set up processing to
be as 'general' as possible, and simply accept those situations
where some commercials are still too loud, and certain dialog
is too soft or unintelligible.
Dolby Digital DTV Audio Transmission System
Digital audio transmission is achieved using the Dolby Digital
(AC-3) multichannel method. It's the same system as employed
with DVD technology. It uses a 'lossy' audio compression
algorithm, AC-3, and operates at a bitrate of 384kbps. This is
intended for affiliate transmission purposes only. Audio
distribution from program providers, production, and network
facilities, will employ a 'mezzanine' level of coding. This
will incorporate a coding method that operates at a higher
bitrate so that multiple codec 'passes' can occur without
audio degradation. As of this writing, there is still
discussion and debate over a few proposed systems. In the end,
the industry will probably standardize on one method of
mezzanine delivery. But for the purposes of this discussion,
we will center on the 'last mile,' the point of transmission
to the viewer.
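To put the 384kbps figure in perspective, the arithmetic can be
sketched as follows. This is only a rough illustration: the 48 kHz
sample rate and 16-bit word length are assumptions (the text gives
only the coded bitrate), and the LFE channel is counted as a full
channel for simplicity.

```python
# Rough comparison of uncompressed 5.1 PCM with the 384 kbps coded rate.
# ASSUMPTIONS (not stated in the text): 48 kHz sampling, 16-bit samples,
# and the LFE channel counted as a full-bandwidth channel.
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 6                 # 5 main channels + LFE
CODED_BITRATE_BPS = 384_000  # figure quoted in the text


def pcm_bitrate_bps():
    """Bitrate of the uncompressed multichannel signal."""
    return CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def compression_ratio():
    """Number of PCM bits represented by each coded bit."""
    return pcm_bitrate_bps() / CODED_BITRATE_BPS


def coded_bits_per_sample():
    """Average coded bits available per channel sample."""
    return CODED_BITRATE_BPS / (CHANNELS * SAMPLE_RATE_HZ)


if __name__ == "__main__":
    print(f"PCM bitrate:     {pcm_bitrate_bps() / 1e6:.3f} Mbps")
    print(f"Compression:     {compression_ratio():.1f}:1")
    print(f"Bits per sample: {coded_bits_per_sample():.2f}")
```

Under these assumptions the transmission coder works at roughly
12:1, with a little over one coded bit per channel sample.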
Audio Meta Data
Within the Dolby Digital system is an
embedded meta data bitstream. This is included with the audio
data and is designed to give the viewer's decoder
information about the incoming audio signal, that is, about
the 'content' of the signal.
To paraphrase: it tells the decoder what mode of
operation it's in (multichannel, surround, etc.), what
Associated Services are provided, and information about the
dynamics of the audio levels. It is this last item, the
audio levels, that we wish to look at more closely.
Embedded within the meta data are
three functions known as Dialnorm, Dynrng, and Compr. These
pertain to the dynamic conditions of the audio signal, and
are inserted by the Dolby Digital encoder. Following is a
short glossary explaining these
functions and how they relate to this discussion:
DIALNORM: This stems from the term
'Dialog Normalization,' which is used by the motion picture
industry in deciding what level to establish as
the 'average' operating volume for movies. For DTV purposes,
it's a normalization level that is subjectively set for the
dialog channel. It is intended to keep the overall operating
level within a consistent range so that it does not produce
audio peaks that exceed digital full scale, 0 dBFS. It is
adjustable over a 31 dB range, in 1 dB increments. Dialnorm
can be set once, or it can be updated as often as every
32 ms. It can be utilized as a loudness controller for
commercial content, a continuing annoyance to
viewers of the analog television system. Dialnorm is to be
subjectively set by personnel who can evaluate and
ascertain the proper dialog normalization levels based upon
an established reference. This can be done by the program
originator, producer, or at the transmission point.
NOTE: At the present time, there
isn't any convenient method to link program origination
or networks to the affiliate. Therefore, the only place
that meta data can be inserted is at the transmission
point. Ultimately, it should be possible to generate meta
data at the origination point, and then allow it to follow
the entire signal stream from beginning to end. Then the
meta data could be edited, modified, or allowed to pass
'As Is' through the system.
DYNRNG: A dynamic range compression
method that is implemented in the decoder. The meta data
carries the dynrng information bits that can control how
much compression is utilized by the decoder. The values for
dynrng are 'authored' at program origination, or in the
transmission encoder. (The above note indicates that
origination occurs in the encoder... for now.) Dynrng can be
updated every 5.3 ms within the meta data stream. Operating
over a range of +/-24 dB, dynrng can be set to allow full
dynamic range, or 'scaled' amounts of compression at the
decoder outputs. This gives end users the capability
of setting up their own listening environment to suit their
needs.
COMPR: Known as 'heavy compression.'
This mode operates over twice the range of dynrng. This
function is designed for use whenever the decoded signal
must be re-transmitted after decoding.
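The figures in this glossary can be tied together with a small
sketch of what a decoder might do with them. Several things here are
assumptions rather than statements from the text: a 48 kHz sample
rate, a 1536-sample frame made of 256-sample blocks (which would
account for the 32 ms and 5.3 ms update intervals), a decoder that
normalizes dialog to -31 dBFS, and a listener 'scale' setting between
0 and 1. The function names are hypothetical.

```python
# Sketch of decoder-side use of dialnorm and dynrng.
# ASSUMPTIONS (not from the text): 48 kHz sampling, 1536-sample frames,
# 256-sample blocks, and a -31 dBFS dialog target in the decoder.
SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_FRAME = 1536
SAMPLES_PER_BLOCK = 256


def update_interval_ms(samples, rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of a run of samples."""
    return 1000.0 * samples / rate_hz


def dialnorm_attenuation_db(dialnorm_dbfs, target_dbfs=-31):
    """Attenuation needed to bring dialog at dialnorm_dbfs to the target.

    dialnorm_dbfs is the authored dialog level, -1 to -31 dBFS
    (the 31 dB range in 1 dB steps described in the text).
    """
    if not -31 <= dialnorm_dbfs <= -1:
        raise ValueError("dialnorm must be between -31 and -1 dBFS")
    return dialnorm_dbfs - target_dbfs


def scaled_dynrng_db(dynrng_db, listener_scale):
    """Apply only a fraction of the authored dynrng gain.

    listener_scale = 0.0 gives full dynamic range (no compression),
    listener_scale = 1.0 applies the authored gain in full.
    """
    if not -24.0 <= dynrng_db <= 24.0:
        raise ValueError("dynrng is limited to +/-24 dB")
    if not 0.0 <= listener_scale <= 1.0:
        raise ValueError("listener_scale must be between 0 and 1")
    return dynrng_db * listener_scale


if __name__ == "__main__":
    print(update_interval_ms(SAMPLES_PER_FRAME))   # dialnorm interval
    print(update_interval_ms(SAMPLES_PER_BLOCK))   # dynrng interval
    print(dialnorm_attenuation_db(-27))            # program authored at -27
    print(scaled_dynrng_db(-12.0, 0.5))            # half of a 12 dB cut
```

In this sketch a program authored with dialog at -27 dBFS is turned
down 4 dB, while one authored at -31 dBFS passes unchanged, so both
reach the listener at the same dialog level.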
Meta Data: More To Consider . . .
The idea behind meta data is very good, but there are some
aspects that need further exploration, and some ideas to
ponder that could further enhance the performance of the meta
data system. From a signal processing point of view, there are
certain conditions where meta data may not be able to provide
maximum performance. Following are 'other' considerations
about meta data:
DIALNORM
-
It is a wideband process, and it is
coupled to all channels. This can cause unnatural
'ducking' of some material, especially audio that
possesses strong bass energy. Consider an
action movie in which a loud effect is generated
in the front-left channel. This loud passage could cause
dialnorm, if it's being 'authored'
dynamically, to 'gain-modulate' the audio in all the
other channels. Depending upon the algorithm of the gain
control, and its time constants, it might also
'gain-modulate' other frequencies within the audio
spectrum. Wideband gain control is generally avoided in
broadcast applications for this very reason.
-
The question "Will
subjectivity prevail?" must be asked. It is already
believed, within the industry, that there will be a
temptation to author dialnorm to a louder value.
Then we are back to square one regarding loud commercial
content, or program material that was incorrectly set
too loud or too soft. Dialnorm can be set to whatever
values the 'author' wishes it to be! Unless intercepted
and reset, which is possible, the audio level through
the system might still be inconsistent from source to
source.
-
It can be interrupted or
authored at numerous points within the system. While we
cannot yet pass meta data from beginning to end of the
entire path, it will be possible in the foreseeable
future, once the mezzanine level is established. This
leaves possibilities that the 'normalized' value being
transmitted to the viewer could again become
inconsistent. There will be a need to monitor and adjust
dialnorm with regularity.
-
Local program insertion must be
factored in. The value for dialnorm will need to be
generated for local and network programming. This will
need to be consistent with the values that are embedded
within network content.
DYNRNG
-
A wideband process. As with the
case of dialnorm, the same issues described above are
relevant here. This leaves out the possibility of
providing 'dynamic equalization' to program material.
Many times there is a need to reset the spectral EQ in
an effort to create a consistent sonic texture. Wideband
processing also affects music signals differently from
dialog. There appears to be no differentiation method
for this.
-
No method to enhance older
source material. For example, with older movie soundtracks
that are noisy and of limited audio spectrum, there is no
method of suppressing source noise, or enhancing weak,
dull-sounding dialog and/or music. Dynamic range control
can be used to enhance these aspects, but with a
wideband application, it could be more cumbersome.
-
How will multichannel dynamic
effects be controlled? Considering that all channels are
coupled together, a loud passage, or effect in one
channel, may 'gain-modulate' the audio in other
channels. There appears to be no dynamic linking, or
unlinking of the channels whenever effects of a wide
dynamic range occur. This could be disorienting whenever
strong sound effects are generated, and could lead to
loss of intelligibility of the dialog channel.
-
Raising the above issues
does not mean that the meta data method is
flawed. What it needs is something to augment its
performance, and in turn improve the performance of the
entire system. Multiband dynamics processing, when used
as a 'tool,' will provide the exact enhancements needed
to enable the meta data to realize all of its benefits.
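The 'gain-modulation' concern raised for both dialnorm and dynrng can
be illustrated with a deliberately simple static gain computer. This
is not the Dolby Digital algorithm; the threshold, ratio, channel
levels, and function names are hypothetical values chosen only to
show how a gain control coupled across channels behaves differently
from one that treats each channel on its own. Time constants and
frequency bands are left out entirely.

```python
# Illustration of coupled versus independent gain control.
# ASSUMPTIONS: a static compressor curve with a hypothetical
# -20 dBFS threshold and 4:1 ratio; no attack/release behaviour.

def gain_reduction_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Gain change (0 or negative, in dB) for a given input level."""
    return min(0.0, (threshold_db - level_db) * (1.0 - 1.0 / ratio))


def coupled_gains(levels_db):
    """One gain for all channels, driven by the loudest channel."""
    shared = gain_reduction_db(max(levels_db.values()))
    return {channel: shared for channel in levels_db}


def independent_gains(levels_db):
    """Each channel's gain depends only on that channel's level."""
    return {ch: gain_reduction_db(lv) for ch, lv in levels_db.items()}


if __name__ == "__main__":
    # A loud effect in front-left while dialog sits in the center.
    levels = {"L": -4.0, "C": -24.0, "R": -30.0}
    print("coupled:    ", coupled_gains(levels))
    print("independent:", independent_gains(levels))
```

With these example levels, the coupled control pulls the center
(dialog) channel down by the same 12 dB as the loud front-left
effect, while the independent control leaves the dialog untouched.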
Multiband Processing: The Missing 'Tool'
Audio processing can be viewed in one of two ways: As an
enhancement 'tool' to improve the sonic presentation of an
audio signal, or it can be abused by those who wish to get
every last dB of level out of a system. FM radio is an example
of the latter.
As stated earlier, DTV offers a system
that can transport exceptional audio quality to the viewer.
Meta data is a good idea, but as described above, it needs
help. Multiband processing provides a wide range of features
that support the meta data model. When viewed in the
following context, it yields numerous benefits:
-
Enhancement of the aural
presentation is achieved through dynamic spectral
balancing. Dialog in older material is easier to
understand. Tonal balance is maintained from source to
source. An old movie soundtrack will sound more robust
when paired next to the "News At 11" promo.
-
Precise loudness control. A wealth
of research and design in loudness control has been
accomplished since the CBS Loudness patents were filed in
1981. The analog version of the CBS Loudness controller
falls a bit short in controlling heavily produced loud
commercials. New developments in loudness detection
through filterbank modeling provide more precise control. Here,
loudness information can be forwarded on to the transmission
encoder to assist with the authoring of dialnorm.
-
Source noise, or hiss, can be
'filtered' or suppressed as the system provides dynamic
noise reduction.
-
It has been discovered that some
motion pictures that were mixed for theater presentation
do not translate well to the home listening environment.
This is due to the difference in physical size
between a theater and a living room. When
the theater mix is presented in the home listening area,
effects can drown out the dialog
because the time delays that result from speaker placement
differ between a full-sized theater and a home living room.
Multiband processing will improve intelligibility in these
instances, as the time delay differences can be modified in
the processor.
-
Channel Coupling: Through
intelligent means, channels can be coupled, or uncoupled,
in an effort to preserve the natural tonal qualities and
dynamics of the source signal when wide dynamic range
effects occur. This is very important in the multichannel
5.1 environment. As stated earlier, it is not desirable to
have a loud crash sound effect in the front-left channel
affecting the audio level in the rest of the channels.
That would sound unnatural. A multiband 5.1
processing system will avoid this situation. Knowledge of
the dynamics in the individual channels will allow the
processor to link or unlink its control signals as it sees fit.
-
Processing improves the efficiency
of the audio encoder. It should be noted that the
transmission encoder for the Dolby Digital system uses an
aggressive compression level, 384kbps for the entire 5.1
signal. Therefore, any additional tool that will reduce
the chance of coding artifacts is desirable. Subjective
listening tests have revealed that the use of dynamics
control in front of a codec will improve its sonic
performance. In this manner, the audio processor can be
designed to recognize signal conditions the audio codec
will not like. It can then adjust for these conditions in
advance, and improve the overall result.
-
The processor can be used to
author the meta data. Since all of the information that
meta data requires will be generated, or known, by the
processor, it would make sense that the processor and
Dolby Digital encoder communicate between themselves. This
further supports the meta data model, and it ensures that
the ancillary processor is not trying to 'fight' the
functions of the meta data. In essence both systems begin
to work together to create an exceptional output signal.
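The idea of a processor authoring dialnorm for the encoder can be
sketched in a few lines. This is a minimal illustration only: plain
RMS level stands in for a real loudness measure (the filterbank
loudness detection mentioned above is not modeled), samples are
assumed to be floating point with full scale at 1.0, and the function
names are hypothetical rather than part of any Dolby Digital encoder
interface.

```python
import math

# Sketch of deriving a dialnorm value from a measured level.
# ASSUMPTIONS: RMS is used as a stand-in for loudness, samples are
# floats with full scale at +/-1.0, and dialnorm is expressed as a
# whole number of dB between -31 and -1 dBFS.

def rms_dbfs(samples):
    """RMS level of a block of samples in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def author_dialnorm(level_dbfs):
    """Clamp to the 31 dB range, then round to the nearest 1 dB step."""
    clamped = max(-31.0, min(-1.0, level_dbfs))
    return int(round(clamped))


if __name__ == "__main__":
    # Hypothetical dialog block: a square wave at 5% of full scale.
    dialog = [0.05, -0.05] * 480
    level = rms_dbfs(dialog)
    print(f"measured level: {level:.2f} dBFS")
    print(f"dialnorm value: {author_dialnorm(level)} dBFS")
```

A real system would measure dialog over a longer window and with a
perceptual weighting, but the hand-off is the same: the processor
measures, and the encoder receives a value already within the legal
dialnorm range.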
Processing Applications . . .
One of the benefits of the meta data method is that it can be
authored at numerous points along the path. Upon full
implementation of the DTV system, it will be possible to know
a considerable amount of information about the incoming audio
signal. This information can then be passed along, modified,
or reset by the affiliate locations.
Ancillary
processing should be thought of in the same light. Program
content providers can utilize this tool in final mixdown or
mastering. The meta data generated there would then
follow the program from origination onward. This information
can be used to inform processing devices along the way as to
what dynamics control is still needed, or not needed at all!
Here is another benefit of having the dynamics information
ahead of time.
For those facilities that require a
meta data editor, the processor can serve that function too.
In the early stages of DTV this will be a major concern. There
is the need for someone to generate, monitor, and adjust the
meta data for local live programs. The nightly news programs
are an excellent example of this. The processor will perform
this as an ongoing function that works automatically. There
will be no need for an operator to manually monitor and adjust
the meta data.
Multiband processing supports the
various forms of DTV, from the 2-channel Lt/Rt Surround to the
full 5.1 multichannel presentation. As the discussion provided
here illustrates, multiband processing will enhance the
performance of the meta data system. The Lt/Rt method is
available now; multichannel methods are still in development,
as they require a new approach to multiple channel control when
employing multiple audio bands.
A whole new and wide range of
possibilities opens up for DTV processing. As research and
experience continue in multichannel sound, the need for
dynamics control as a tool for this new medium grows with it.
Coupled with meta data, the sonic aspects of DTV should never
be thought of as a second class citizen again!