Reproduced from
Gramophone, November 1999
© Haymarket Magazines Ltd
PACKING IT ALL IN

The ultra high resolution requirements of DVD-Audio and Super Audio CD have necessitated the development of lossless digital audio compression systems, as Keith Howard explains

Audio
history, like history in general, does not always proceed
in logical fashion. So it is that, with DVD-Audio and
Super Audio CD almost upon us, we are belatedly being
introduced to lossless digital audio compression systems
following some years of prior exposure to their lossy
relatives, as incorporated in the perceptual coding
processes that underpin MiniDisc, Digital Radio and Dolby
Digital. There
would have been less potential for confusion had matters
developed logically, in reverse order. In the minds of dyed-in-the-wool audio enthusiasts 'compression' is already a dirty word because
of its association with the lossy form: a term which
smacks of compromise in a context where an uncompromising
approach to sound quality is a central tenet. Little
wonder in the circumstances that lossless compression (which, unlike the lossy alternative, involves no compromise of signal quality) has been alternatively termed 'lossless packing', in a
deliberate attempt to distance it from those technologies
which discard notionally inaudible signal components,
such as ATRAC, MPEG and AC-3. Compression
is a potentially confusing term in any case because it
has two distinct meanings in the audio context. In its
old usage compression refers to a reduction in dynamic
range, deliberate or otherwise. In digital audio,
however, it is also used as a shortened form of
data compression and refers to methods of
trimming back on the large amount of data required to
represent the signal. If this reduction is achieved without any modification to the signal content (in other words, if the decompressed signal is a bit-exact reconstruction of the input) then the compression
is lossless; if output and input are not identical then
the compression is lossy. The
latter form of compression has been more widely used to
date because of the limitations in data capacity imposed
by various means of music delivery. To compress a
two-channel, full-spectrum, wide dynamic range audio
signal on to MiniDisc, for example, and still retain
CD-competitive playing time requires data compression of
such an order that it cannot consistently be achieved
without data loss. Likewise fitting 5.1 channels of high
quality sound on to a film print or broadcasting two
channels of Digital Radio from terrestrial transmitter
sites. In all these cases the compression process
generally involves loss of signal data, necessitated by
limitations on data capacity. (The word 'generally' is appropriate here because lossy
compression schemes usually incorporate lossless encoding
techniques, which potentially means that simple signals
will be encoded without data loss. In the PASC lossy
compression system of Digital Compact Cassette, for
example, half the 4:1 data compression was achieved by
lossless encoding processes.) Lossless
compression is now making an appearance within DVD-A and
SACD because, with their vastly increased data storage
capability, these high-density media significantly reduce
the compression requirement. Although data compression is
still employed to provide the desired combination of
sound quality, channel provision and playing time, the
amount of data saving required has fallen sufficiently
for lossless compression to suffice, guaranteeing the
uncompromising hi-fi requirement that input and output be
identical.

Reasons

To
understand why compression is still required for DVD-A
and SACD it's only necessary to perform some simple arithmetic. Let's take DVD-A as the example. On a
single-sided disc the maximum data capacity is 4.7
gigabytes (4.7GB). The maximum supported sampling rate is
192kHz (i.e. 192,000 samples per second) and the maximum
supported resolution 24-bit. Using these figures we can
calculate the maximum playing time for a two-channel
audio signal stored at the highest available quality
(ignoring, for the sake of convenience, the additional
data required for error correction and other subcode
purposes). Each channel of 24-bit/192kHz audio generates
(24 x 192000 =) 4,608,000 bits per second, equivalent to
562.5 kilobytes (KB) per second. For two channels the total data
rate is therefore 1.1MB per second. At that rate the
4.7GB capacity of the disc is used up in 4,380 seconds or
73 minutes. For a two-channel signal that might suffice,
but any multi-channel provision would clearly demand an
unacceptable reduction in maximum playing time and/or
sacrifice in the signal's resolution and/or sampling rate. It's to offset this compromise between signal quality and playing time, while maintaining signal integrity, that DVD-A and SACD both incorporate lossless compression.
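For readers who care to check the sums, the same arithmetic can be expressed as a short Python sketch. The binary kilobyte and gigabyte multiples are an assumption chosen here to reproduce the figures quoted above:

    # Playing-time arithmetic for two channels of 24-bit/192kHz audio on a 4.7GB disc.
    # Binary multiples (1KB = 1024 bytes, 1GB = 1024**3 bytes) are assumed because
    # they reproduce the 562.5KB and 4,380-second figures given in the text.

    BITS_PER_SAMPLE = 24
    SAMPLE_RATE_HZ = 192_000
    CHANNELS = 2
    DISC_CAPACITY_BYTES = 4.7 * 1024**3

    bits_per_channel_per_second = BITS_PER_SAMPLE * SAMPLE_RATE_HZ      # 4,608,000
    bytes_per_second = bits_per_channel_per_second * CHANNELS / 8       # 1,152,000
    playing_time_s = DISC_CAPACITY_BYTES / bytes_per_second

    print(f"per-channel rate: {bits_per_channel_per_second / 8 / 1024:.1f}KB/s")   # 562.5
    print(f"two-channel rate: {bytes_per_second / 1024**2:.2f}MB/s")               # ~1.10
    print(f"playing time: {playing_time_s:.0f}s, about {playing_time_s / 60:.0f} minutes")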
Unsurprisingly, given that SACD uses 1-bit DSD coding while DVD-A uses linear PCM, the two utilize different compression schemes. SACD's goes by the name of DST (a potential
confusion here with both DSD and DTS) and was developed
by Philips. For DVD-A a competition was organised by
Working Group 4 of the DVD Forum to assess the best
compression technology, and interested parties were invited to
submit their offerings for independent testing. Four did
so, the eventual winner being Meridian Lossless Packing,
a technology developed in the UK principally by the late
Michael Gerzon, Peter Craven of Algol Applications and
Bob Stuart of Meridian. Although the other three
competitors remain officially unidentified I understand
them to have been Digital Theater Systems (DTS), JVC and
Matsushita - information which was not, I should stress,
given me by Meridian. Although
they specify the compression figure in different ways,
DST and MLP appear to achieve broadly similar orders of
data saving. In the case of DST the typical compression
ratio is quoted as 2.3-2.6 to 1, ie the signal data is reduced to 38-43 per cent of its original size. Figures
for MLP are quoted in terms of bit reduction per sample
per channel and vary according to the sampling rate of
the input signal. At 48kHz the average reduction is 5-11
bits, rising to 9-13 bits at 96kHz and 9-14 bits at
DVD-A's maximum permitted sampling rate of 192kHz. A 12-bit saving per sample on a 24-bit input signal corresponds to a compression ratio of 2 to 1.
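The two ways of quoting the saving are easily related; the little Python helper below is purely illustrative and forms no part of either MLP or DST:

    # Convert an MLP-style "bits saved per 24-bit sample" figure into a
    # DST-style compression ratio and a percentage of the original size.

    def compression_ratio(original_bits: int, bits_saved: float) -> float:
        return original_bits / (original_bits - bits_saved)

    for saved in (5, 9, 12, 14):
        ratio = compression_ratio(24, saved)
        print(f"{saved:2d}-bit saving: {ratio:.2f}:1, "
              f"output is {100 / ratio:.0f} per cent of the original size")
    # A 12-bit saving comes out at exactly 2:1, as stated above.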
How is this order of data reduction achieved without any
compromise to signal content? Before exploring this using
MLP as the example, first a terminological aside about
the use of the word 'entropy' in this context. If you know something of thermodynamics you'll understand
entropy to be a measure of disorder, the entropy of a
gas, for example, being higher than that of a solid
because of its lack of an organised structure. In
communication theory the term is used similarly, as a
measure of the disordered nature of a signal. Disorder
and the transmission of information might intuitively
seem incompatible (disorder suggests noise), but for
information to be conveyed disorder is essential. The
steady, unvarying carrier wave of a radio transmitter,
for example, conveys no information other than that the
transmitter is active: only if the carrier wave is
interrupted (e.g. Morse code) or modulated (e.g. AM or FM
radio) can it convey information. In what follows, then,
'entropy' and 'information' can be regarded as synonyms.

Essentials

All
lossless compression systems incorporate three key
functional elements: a framing process (which divides up
the incoming signal into appropriately sized chunks for
processing), a predictor and an entropy encoder (Figure 1). The predictor is
alternatively called a decorrelator but for the general
reader the former term gives a more ready insight into
its function. It and the entropy encoder operate in
series to reduce the signal's data requirement at
two distinct levels. First the predictor reduces the
amount of data required to describe the signal waveform
itself, then the entropy encoder reduces the data
required to represent the output of the predictor.
Essentially the entropy encoder is analogous to the
'zipping' software used to compress files on a computer. Although the algorithms used in the audio and
general data contexts may differ, the process is
essentially the same. What distinguishes dedicated audio
compression from zipping is the former's signal predictor element, without which the amount of compression that can be achieved is much lower, as anyone who has tried zipping computer sound files will
know. Whereas dedicated real-time audio compression
systems can achieve compression ratios in excess of 2:1,
general purpose compression algorithms typically perform
only half as well, despite the inherent advantage of
processing off-line. As
its name suggests, the role of the predictor is to
estimate what the signal will do next. To do this it
analyses the signal using a suite of digital filters; in the case of MLP both FIR (finite impulse response) and IIR (infinite impulse response) filters are available, of up to eighth order. Having made its
estimate, the predictor then generates an error signal
which represents the difference between its prediction
and the actual signal waveform. These two pieces of information (prediction and error) almost, but not quite, constitute the predictor's output; not quite, because if they did there would be no data saving. Instead the predictor outputs the error signal plus the 'rules' it used to generate the prediction, which the decoder can later employ to rebuild the predicted signal. In this way a significant data saving can be achieved.
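To make the idea concrete, here is a deliberately crude Python sketch of prediction at work. It uses a first-order predictor (each sample assumed equal to the previous one) rather than MLP's actual filters, and the 'waveform' is an invented sine tone, but the principle is the same: the error signal spans a much smaller range than the raw samples and so needs fewer bits.

    import math

    def first_order_residual(samples):
        """Predict each sample as equal to the previous one and return the errors.
        (A crude stand-in for MLP's up-to-eighth-order FIR/IIR predictors.)"""
        prediction = 0
        residual = []
        for s in samples:
            residual.append(s - prediction)
            prediction = s
        return residual

    def bits_needed(values):
        """Bits required to represent the largest magnitude in the list, plus sign."""
        peak = max(abs(v) for v in values)
        return math.ceil(math.log2(peak + 1)) + 1

    # A slowly varying 'waveform': large sample values, small sample-to-sample changes.
    signal = [round(20000 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(480)]
    residual = first_order_residual(signal)

    print("raw samples need", bits_needed(signal), "bits per sample")    # 16
    print("residuals need  ", bits_needed(residual), "bits per sample")  # 12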
The output of the predictor then enters the second stage of
the compression process, the entropy encoder. What this
does is look for patterns in the predictor output which
can be exploited to reduce the data requirement still
further. Various methods of doing this are provided
within MLP, a proprietary algorithm first examining the
data to decide which of them (Huffman coding, run-length coding, etc) will provide the most effective data reduction in each instance. Entropy
encoding is a subject in itself but a simple example
suffices to illustrate the basic concept. Imagine you
have to code the English alphabet digitally. As there are
26 available letters (ignoring upper/lower case
distinctions) you would in the normal way require a 5-bit
digital word to identify each uniquely. For example, you
could arrange for 'a' to be represented as 00001, 'b' as 00010, 'c' as 00011, etc. But in any sufficiently large average English text we know 'e' will be the most frequently occurring letter, which means 00101 will appear more times than any other data sequence. If we code this most common sequence as, say, 0, the next most common letter ('t') as 10, and so on, then longer code words need only be used for the most infrequently occurring letters. In this way we can
potentially save a lot of data without losing any
information. This is an example of Huffman coding, which
is ideally suited to any input, like language, which is
highly variable from sample to sample but conforms to a
statistical pattern overall.
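For the curious, a working Huffman coder needs only a screenful of Python. Everything below, from the function down to the sample sentence, is invented for illustration and has nothing to do with the coding tables MLP actually uses:

    import heapq
    from collections import Counter

    def huffman_codes(frequencies):
        """Build a Huffman code: frequent symbols get short bit strings, rare ones long."""
        # Each heap entry is (weight, tie-breaker, {symbol: code-so-far}).
        heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(frequencies.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            w1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees...
            w2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (w1 + w2, tie, merged))   # ...merged under 0/1 prefixes
            tie += 1
        return heap[0][2]

    text = "lossless packing involves no compromise of signal quality at all"
    codes = huffman_codes(Counter(text))
    encoded = "".join(codes[ch] for ch in text)
    print("fixed 5-bit coding:", 5 * len(text), "bits")
    print("Huffman coding:    ", len(encoded), "bits")
    print("codes for the three commonest symbols:",
          {ch: codes[ch] for ch, _ in Counter(text).most_common(3)})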
Another possible pattern type is temporal: for example, the
multiple repetitions of pixel colour that commonly occur
in a raster (bitmap) image, much of which may comprise
sky or sea or other large areas of consistent colour. In
this case run-length coding is likely to be the most
efficient method of compressing the data. Instead of
sending multiple repeats of a particular code sequence
you simply send it once, appending an instruction to the
decoder as to how many times to repeat it. Still other
methods of entropy coding are particularly well suited to
other situations, depending on the nature of the patterns
within the data.
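Run-length coding is simpler still. The sketch below is again purely illustrative (the 'scan line' of pixel values is made up), but it shows why the scheme is exactly reversible:

    def run_length_encode(values):
        """Collapse runs of repeated values into [value, repeat-count] pairs."""
        encoded = []
        for v in values:
            if encoded and encoded[-1][0] == v:
                encoded[-1][1] += 1           # extend the current run
            else:
                encoded.append([v, 1])        # start a new run
        return encoded

    def run_length_decode(encoded):
        """Exact inverse: repeat each value the stated number of times."""
        return [v for v, count in encoded for _ in range(count)]

    # A scan line that is mostly sky, with a small cloud in the middle.
    scanline = ["sky"] * 200 + ["cloud"] * 30 + ["sky"] * 170
    packed = run_length_encode(scanline)
    assert run_length_decode(packed) == scanline      # lossless round trip
    print(packed)                                     # [['sky', 200], ['cloud', 30], ['sky', 170]]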
MLP extras

While
framing, prediction and entropy encoding are common
features of any lossless compression system, individual
realisations will differ both in the details of these
processes and in the provision of other processing
elements which may be added to enhance performance. If we
look at a block diagram of the MLP encoder (Figure 2) we see the
expected predictor (decorrelator) and entropy encoder
stages, but there are other processing elements too.
Preceding the predictor stage are channel remap, shift
and lossless matrix stages, while after the entropy
encoder there is an optional output buffer stage (not
illustrated). The first three assist the data compression or expansion processes while the fourth tackles another
important issue, that of data rate.

Channel remapping, the first of the additional elements, has the capability to subdivide incoming channels into two or more data substreams. This allows the compressed signal to be recovered using a simpler decoder architecture, thereby saving on cost. A shift process is then applied to each data channel to recover any unused bit depth capacity, which occurs either when the input data is of less than 24-bit precision or when the channel is not fully modulated, as is the case for much of the time with typical audio content.

Lastly, before passing to the predictor stage, the data channels are processed by a lossless matrix which exploits any correlation between the signal content of different channels to cut the data requirement still further. In a conventional stereo recording, for example, correlation is typically high between the two channels as a result of central images being represented by signals of similar amplitude and phase in either channel. Similar correlations usually exist in multi-channel recordings also.

An additional path from the lossless matrix, labelled LSB bypass in the diagram, is provided to route the least significant bits of the signal around both the predictor and entropy encoder stages. The signal at these low levels typically comprises noise (often deliberately added dither noise), a high-entropy signal component that can advantageously bypass the data compression process.
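A simple sum-and-difference (mid/side) transform gives the flavour of what the lossless matrix described above does, though MLP's actual matrixing is more general and is not reproduced here; the Python sketch below illustrates the principle only:

    def to_mid_side(left, right):
        """Rotate left/right samples into sum-and-difference form. When the two
        channels are highly correlated, the 'side' values stay small."""
        mid = [(l + r) >> 1 for l, r in zip(left, right)]
        side = [l - r for l, r in zip(left, right)]
        return mid, side

    def from_mid_side(mid, side):
        """Exact inverse of the transform above: the round trip is bit-for-bit lossless."""
        left = [m + ((s + (s & 1)) >> 1) for m, s in zip(mid, side)]
        right = [l - s for l, s in zip(left, side)]
        return left, right

    # A 'stereo' signal with a strong central image: the channels differ only slightly.
    left = [1000, 2000, 3000, 2500, 1500, -500, -2000]
    right = [998, 2004, 2995, 2503, 1499, -502, -1997]

    mid, side = to_mid_side(left, right)
    print("side channel:", side)                       # small values: cheap to encode
    assert from_mid_side(mid, side) == (left, right)   # nothing has been lost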
It's a feature of lossless compression that the output data
rate is variable. Whereas in a lossy compression system
more or less information can be discarded in order to
keep the output data rate constant, in a lossless process
the amount of data in the output necessarily reflects the
entropy of the input signal. When the amount of
information in the signal (its entropy) is low, so is the
output data rate, but when the signal entropy is high
the output data rate must increase to reflect this. In
the case of a transmission channel or storage medium with
no limit on data rate capability, this characteristic of
lossless compression is of academic interest only. It
becomes very important, though, if the channel or medium
has a data rate limit sufficient to accommodate the average
requirement (as of course it must) but which is less than
the maximum that the lossless coder might generate
on certain high-entropy signals. This is the case with
DVD-A which has a maximum data rate of 9.6Mbps (megabits
per second) but is specified to carry up to six channels
of 24-bit/96kHz data, which potentially demand a peak
data rate of 13.824Mbps. This is where the provision of buffering becomes important. If
the data rate from the entropy encoder exceeds the
maximum allowable, the excess data is temporarily
diverted to a FIFO (first in, first out) buffer memory
and only read out again once the data rate has fallen
sufficiently.
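The buffering principle is easy to sketch. In the Python below the per-second encoder output rates are invented for illustration, and the 13.824Mbps figure is simply the arithmetic from the previous paragraph; a real encoder manages its buffer far more carefully:

    # Peak data rate for six channels of 24-bit/96kHz audio, as quoted above.
    peak_rate_mbps = 6 * 24 * 96_000 / 1_000_000
    print("uncompressed peak rate:", peak_rate_mbps, "Mbps")   # 13.824

    def simulate_buffer(encoder_rates_mbps, channel_rate_mbps):
        """Track how many megabits pile up in a FIFO buffer when the encoder's
        instantaneous output exceeds what the disc channel can carry."""
        occupancy = 0.0
        history = []
        for rate in encoder_rates_mbps:                       # one reading per second
            occupancy = max(occupancy + rate - channel_rate_mbps, 0.0)
            history.append(round(occupancy, 2))
        return history

    # Invented per-second compressed data rates, with a burst of high-entropy material.
    rates = [6.0, 7.5, 11.0, 12.0, 12.0, 10.5, 8.0, 6.5, 6.0, 5.5]
    print(simulate_buffer(rates, channel_rate_mbps=9.6))
    # The buffer fills while the rate exceeds 9.6Mbps and drains once it falls back.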
Figure 3 shows an example of buffering at work in MLP, the signal
in question being a 30-second excerpt from a six-channel
24-bit/96kHz recording which features closely-miked
cymbals in all six channels. Because of the virtually
random nature of this signal its entropy is unusually
high and the underlying compressed data rate reaches
12.03Mbps. As soon as the output of the entropy encoder
exceeds 9.2Mbps, however - just below the maximum 9.6Mbps
data rate supported by DVD-A - data begins to accumulate
in the buffer, awaiting sufficient fall in the entropy of
the input signal. When this occurs the buffer is
progressively emptied again. In the example the required
buffer memory is around 85kB and the graph scale goes up
to 256kB, but Meridian declines to identify just how
large a buffer MLP incorporates for DVD-A. In the extreme
case of the data rate requirement exceeding the buffer
provision, MLP offers the recording engineer various
options for reducing data within the source signal, by
trimming back the sampling rate or reducing the bit depth
on a channel by channel basis. This provision also allows
a producer to increase playing time if required.

A block diagram of the MLP decoder (Figure 4) reveals, as you would expect, a mirror image of the encoder structure. What isn't apparent from the diagram is the decoder's relative simplicity, a key practical requirement since decoder complexity determines the cost of implementation in the end product. Meridian says that the computing power required to decode a two-channel data stream at 192kHz sampling rate is 27MIPs (millions of instructions per second), while six channels at 96kHz requires 40MIPs. These figures are well within the capability of inexpensive modern DSP chips.

Dolby Laboratories is handling the licensing of MLP and will provide technical support in the same manner as for its own products. To date ten semiconductor manufacturers have expressed an interest in developing and selling MLP decoders, two of whom, Motorola and Cirrus Logic (Crystal Semiconductor), have publicly announced that they will do so. With DVD-Audio set for launch next year, it isn't long before the first chips will be needed.