CD - Istvan Mihalcz's Audio Pages

The CD Format

Understanding the CD format and other formats at a deeper level makes it much easier to pursue high-fidelity audio playback and recording.

To understand the CD format, we must first understand PCM.

PCM means "Pulse Code Modulation", which is a rather stupid and misleading name for what it really does, as there are neither pulses nor modulation present in the process of PCM.

What does PCM do?

Well, it does two things:

1) An analog signal (which is continuous by nature) is captured by only looking at it from time to time.

If we look at a continuous signal at equally spaced time instants, what we see each time is only an amplitude: as we look NOW, the amplitude is high. At the next NOW instant (equally spaced), the amplitude is low.

This process is called discrete-time sampling. We look at a continuous wave, but only at discrete (and equally spaced) time points.

The wave is continuous, but our sampling is time-discrete!

2) At every discrete time point at which we look at the wave signal, we write down the amplitude that is present at this exact time point.

PCM is time discrete sampling of a continuous wave and writing down (storing) the amplitude values in a binary coded form.

PCM has exactly Two Parameters

how fast we sample (the sample rate)
how fine the amplitude resolution is (the bit depth)

As for the CD-format the parameters are:

we sample 44,100 times per second (44.1 kHz)
we record the momentary amplitude with 16-bit accuracy, which gives 65,536 levels
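
To make the two parameters concrete, here is a minimal sketch of PCM with the CD values. The helper name `pcm_encode` and the choice of a full-scale sine as input are assumptions made for illustration only; a real recorder works on an arbitrary input signal.

```python
import math

SAMPLE_RATE = 44_100   # CD sample rate: samples per second
BITS = 16              # CD amplitude resolution

def pcm_encode(freq_hz, n_samples, sample_rate=SAMPLE_RATE, bits=BITS):
    """Look at a full-scale sine at equally spaced instants and store
    each amplitude as a signed integer code (hypothetical helper)."""
    max_code = 2 ** (bits - 1) - 1           # 32767 for 16 bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                   # discrete time point in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # between -1 and 1
        codes.append(round(amplitude * max_code))          # write it down
    return codes

codes = pcm_encode(1_000, 45)   # roughly one period of a 1 kHz sine
print(2 ** BITS)                # number of available levels
print(codes[:5])
```

Each entry of `codes` is one written-down amplitude, and nothing else about the wave is stored.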

The following diagram shows you a 1kHz sine wave, sampled with CD parameters (16bits, 44.1kHz):

A 1 kHz sine wave signal measured at the DAC's output (reality = theory!):

Wow, this looks like a sine wave. It seems that this PCM works pretty well.

What happens if we open the throttle a bit and sample a higher-frequency sine, let's say 6 kHz:

With 6 kHz, we have six times fewer sample points per wave period than with 1 kHz, so the 6 kHz wave looks a little coarse, as one wave is sampled at only 7 or 8 discrete time points (44.1 / 6 = 7.35 on average). But hey, if we smooth out the steps with some kind of filter, it will come very close to a real sine.
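
The number of sample points per wave period is simply the sample rate divided by the signal frequency. A small sketch (the function name `samples_per_period` is made up for this example):

```python
def samples_per_period(freq_hz, sample_rate=44_100):
    """Average number of sample points that fall into one wave period."""
    return sample_rate / freq_hz

for f in (1_000, 6_000, 14_000):
    print(f, "Hz:", round(samples_per_period(f), 2), "points per period")
```

The result is usually not a whole number, which means the sample points land at slightly different positions in each successive period.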

So let's stay optimistic and further increase the frequency of our sampled sine wave to 14 kHz:

This looks strange. Does this look like a sine wave? There seems to be some real strong amplitude modulation.

As we see, there are fewer than 4 sample points per wave period. How can we guarantee that, when we sample, we always catch the highest and lowest point of the wave? Well, actually we cannot, as our sampling frequency of 44.1 kHz is not synchronized to our sine wave and we have so few sampling points per wave.

However, we still see that there is a 14kHz wave, but its volume (amplitude) has been modulated by a lower frequency of about 2kHz. Let's call this a beat-frequency.

To make things even stranger, the above recording looks as if a second tone has been added.

In fact, we always get that beat frequency with any sampled wave whose frequency is not exactly a whole division of the sample rate. So we expect no beats at 22.05 kHz, 14.7 kHz, 11.025 kHz, 8.82 kHz, 7.35 kHz, 6.3 kHz, and so on.
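
The beat-free frequencies listed here are just the sample rate divided by 2, 3, 4 and so on. A quick sketch to reproduce the list:

```python
SAMPLE_RATE = 44_100

# Whole divisions of the sample rate: fs/2, fs/3, ..., fs/7
divisions = [SAMPLE_RATE / n for n in range(2, 8)]
print(divisions)
```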

Let's give that a try and take a look at 14.7 kHz:

Yes. There's no beat to see. All exactly the same amplitude.

But what happened to the waveform? This looks more like a sawtooth than like a sine-wave.

Well, with exactly 3 sample points per wave-period, it cannot be any other way.
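
We can check this with a few lines of code. The sketch below assumes a sine that starts at phase zero (a different starting phase gives three different values, but the pattern still repeats every three samples):

```python
import math

SAMPLE_RATE = 44_100

def sample_sine(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Amplitudes of a unit sine at the first n_samples sample instants."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

samples = sample_sine(14_700, 9)   # three full periods at fs/3
print([round(s, 3) for s in samples])
```

The same three amplitudes (roughly 0, 0.866 and -0.866) come back in every period, so there is nothing that could beat.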

Above and below any whole division of the sample rate we get beats. The higher the frequency (or the fewer sample points available per wave period), the higher the amplitude of the beat. From approx. 18 kHz on, we get a beat (amplitude modulation) of 100%.

So this is 18kHz:

We can hardly tell what this one wants to be. However, as our recorded frequency approaches the next whole division of the sample rate, the beat frequency slows down. Look at 20 kHz:

This has a 4kHz beat with 100% amplitude modulation.

The following is 21kHz:

The beat frequency goes down to approx. 2 kHz.

When we increase the frequency of our sampled sine to 22 kHz, we get a really slow beat:

Let's step back a little:

Wow, a 22 kHz sine wave sampled with the CD format at 44.1 kHz results in a sine wave with a 100 Hz beat.

As a matter of fact, this does not only happen when we sample very high-frequencies, but with every frequency that approaches a whole division of the sample-rate. For lower frequencies we just have a lower amplitude-modulation.
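
The beat frequencies quoted so far (about 2 kHz at 14 kHz, 4 kHz at 20 kHz, about 2 kHz at 21 kHz, 100 Hz at 22 kHz) all fit a simple rule of thumb: take the nearest whole division fs/N and compute |fs - N*f|. This rule is an assumption read off from those numbers, not a formula stated in the text, so treat the sketch below as an estimate only.

```python
def beat_frequency(freq_hz, sample_rate=44_100):
    """Rule-of-thumb beat frequency (assumed formula): |fs - N*f|,
    where fs/N is the whole division of the sample rate nearest to f."""
    n = max(2, round(sample_rate / freq_hz))
    return abs(sample_rate - n * freq_hz)

for f in (14_000, 14_700, 20_000, 21_000, 22_000):
    print(f, "Hz ->", beat_frequency(f), "Hz beat")
```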

What did Nyquist say?

Maximum data rate in a noiseless channel: C = 2W * log2(L) bits/sec

* where 2W is 2 times the highest frequency contained in the noiseless channel, and

* where L = number of discrete levels (e.g., binary = two levels, 0 and 1)
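
As a worked example, we can plug the CD parameters into this formula. The sketch assumes a single channel, W = 22.05 kHz (half the CD sample rate) and L = 65,536 levels (16 bits); the function name is made up for illustration.

```python
import math

def nyquist_data_rate(highest_freq_hz, levels):
    """C = 2W * log2(L), in bits per second, for a noiseless channel."""
    return 2 * highest_freq_hz * math.log2(levels)

# One CD channel: W = 22.05 kHz, L = 2**16 levels
print(nyquist_data_rate(22_050, 65_536))   # 44,100 samples/s times 16 bits
```

This gives 705,600 bits per second, which is exactly 44,100 samples per second times 16 bits per sample.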

As Nyquist seems to have been more interested in data transmission than in high fidelity, we should not be surprised that his statement just defines the maximum data rate of a communications channel.

If we consider the presence of a frequency in a communication channel to be a piece of information, we can agree that we need a sampling rate of twice that frequency in order for it to show up. And as we see in the above diagrams, although those signals can look awful, the frequency 'as an information' shows up.

Later, Claude Shannon said:

If a function s(x) has a Fourier transform F[s(x)] = S(f) = 0 for |f| > W, then it is completely determined by giving the value of the function at a series of points spaced 1/(2W) apart. The values s_n = s(n/(2W)) are called the samples of s(x).

This goes much further than Nyquist's words in that it states that a signal consisting of sine waves with a maximum frequency of W is completely described by recording its values at a rate of twice W.

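The really cool thing is that Shannon also gave an interpolation formula to get back to the original signal:

s(x) = sum over all integers n (from -infinity to +infinity) of s_n * sin(pi*(2Wx - n)) / (pi*(2Wx - n))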

Unfortunately this formula includes an infinite sum...

This means: unless we take infinitely many samples into account, we cannot get exactly back to the original signal!

Strictly speaking, we never get back exactly to the original signal in finite time.
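
To get a feeling for this, here is a minimal sketch that cuts the infinite sum off after a finite number of terms. Each sample is weighted by sinc(2Wx - n), where sinc(u) = sin(pi*u) / (pi*u). The names `reconstruct` and `error`, and measuring time in sample periods, are choices made for this example only.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t, freq_hz, n_terms, sample_rate=44_100):
    """Approximate a sampled unit sine at time t (in sample periods)
    using only the samples within n_terms of t."""
    center = math.floor(t)
    total = 0.0
    for n in range(center - n_terms, center + n_terms + 1):
        sample = math.sin(2 * math.pi * freq_hz * n / sample_rate)
        total += sample * sinc(t - n)
    return total

def error(t, freq_hz, n_terms, sample_rate=44_100):
    """Distance between the truncated sum and the true sine value."""
    exact = math.sin(2 * math.pi * freq_hz * t / sample_rate)
    return abs(reconstruct(t, freq_hz, n_terms, sample_rate) - exact)

# Halfway between two sample points, for a low and a high frequency
for n_terms in (5, 50, 500):
    print(n_terms, error(100.5, 1_000, n_terms), error(100.5, 21_000, n_terms))
```

At the sample instants themselves the sum returns the stored sample. In between, the error typically shrinks only slowly as more terms are added, and a tone close to half the sample rate tends to need far more terms than a low one.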

A consequence of the sampling theorem is that a signal cannot be band-limited AND time-limited. If a signal is 100% band-limited, it cannot be time-limited anymore. If it is time-limited, it cannot be band-limited.

The first prerequisite of Shannon's sampling theorem is that the Fourier transform of the input signal is zero for all signal frequencies above half of the sampling frequency. That means that the input signal must be 100% band-limited, which at the same time means it cannot be time-limited anymore, and therefore infinitely many samples would be needed in order to exactly reconstruct it.

If this 100% band limit is not applied prior to sampling, we will get what is called alias distortion. Alias distortion means that more than one input signal is able to generate the same samples, so the samples no longer describe one unique signal, but a variety of signals (including the aliases).
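
A short sketch makes the alias problem visible. It samples two cosines at 44.1 kHz, one at 14 kHz and one at 44.1 - 14 = 30.1 kHz (the frequencies are chosen just for illustration), and compares the resulting samples.

```python
import math

SAMPLE_RATE = 44_100

def sample_cosine(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Amplitudes of a unit cosine at the first n_samples sample instants."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

original = sample_cosine(14_000, 20)
alias = sample_cosine(SAMPLE_RATE - 14_000, 20)   # 30.1 kHz, above fs/2

max_diff = max(abs(a - b) for a, b in zip(original, alias))
print(max_diff)   # only floating-point rounding noise remains
```

Both tones produce the same samples, so once they are sampled nobody can tell which one was at the input.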

If we do not 100% band-limit the input signal prior to sampling, we cannot get exactly back to it!

Of course it is not possible to realize a 100% band-limit during recording. Therefore we will always get at least some small amount of alias distortion.

Please note that all of the above green diagrams were constructed from mathematically pure sine waves, which are 100% band-limited and infinite. So we have met the band-limit criterion. But why do some of them look so bad, and how can we get back to the original sine?

The green diagrams show just the samples taken. You will get the same picture if you play those samples with an ideal R-2R D/A converter that has an infinitely fast slew rate, an infinitely short settling time and infinite precision (conditions we can never achieve in reality).

Now theory says that if we run that staircase signal through an ideal low-pass filter (with an infinitely sharp transition at half the sampling frequency), we will get back exactly one input signal, which is the original pure sine in this case.

An ideal low-pass filter has an impulse response (and therefore also a step response) that is infinitely long.

There are a damn lot of infinities involved in this game of sampling.

Oversampling

Oversampling is an attempt to make use of Shannon's interpolation formula, in order to get the beat frequencies (alias) out of the sampled signal (that we interpret as correctly recorded samples).

As Shannon's formula requires infinite calculation, an oversampling filter has to work with something that takes less time than infinity.

As a result, the outcome is less than perfect. The sharper the filter works and the longer it rings, the more the beats are reduced... It is actually the ringing that bridges the beats.

With the DF1704 digital filter which has a stop-band attenuation of -115dB, I was able to recognize beat-products starting to appear at about 14kHz, with a simple analog oscilloscope.

The following scope-shot is 21kHz and shows a real-world beat:

If we compare that to the 21 kHz sample diagram above, we realize that the oversampling filter was able to reduce the beat amplitude. This is possible if one beat spans only a few cycles of the sampled wave. If the beat frequency becomes slower, the oversampling filter is less able to filter it away.

Just as in the following measurement of 22kHz:

This looks very similar to the 22 kHz diagram above. Note that the beat frequency can become infinitely small as the sampled sine wave approaches a whole division of the sample rate. This implies that the interpolation filter must process infinitely many samples at the same time, and it must ring eternally.

Adding to the confusion is the fact that D/A converter designs that employ little or no oversampling at all (and so play all the beats undiminished) can often sound very good (more musical). This is believed to be related to the absence of filter ringing in a non-oversampling design.

In a way, we can say that our inability to meet the infinite requirements of exact sampling and filtering makes PCM a complex tremolo machine with a variety of Speed and Level knobs.

A tremolo machine is an effects device that modulates the volume (loudness) of an input signal.

With the Speed knobs, we determine the speed of the tremolo effect and with the Level knobs we can adjust how much of that effect will be operated on the signal.

The closer the sampled frequency comes to a whole division of the sample-rate, the slower the beat frequency.

The Level knob of PCM seems to work in ranges that are determined as follows:

How many sample points are available for sampling a single wave period?

If we have between 2 and 3 sample points (14.7kHz .. 22.05kHz), the tremolo level is high (up to 100%).

If we have between 3 and 4 sample points available for a single wave period (11.025kHz .. 14.7kHz), we have up to 50% tremolo level.

For 4 to 5 sample points per wave-period (8.82kHz .. 11.025kHz) tremolo Level goes further down to about 25%.

It seems, that each time we add one sample point to a wave-period, the tremolo level is cut in half.

That means that for frequencies as low as one tenth of the sample rate, the tremolo level is below 1%.
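
The halving pattern described above can be written down as a small rule-of-thumb function. This is only an extrapolation of the three ranges given here (100%, 50%, 25%), not a measured or derived law, and the function name is made up for this sketch. The value it returns is an upper bound ("up to"), since at an exact whole division of the sample rate there is no beat at all.

```python
import math

def tremolo_level_percent(freq_hz, sample_rate=44_100):
    """Rule-of-thumb upper bound for the tremolo level (assumed pattern):
    100% for 2..3 sample points per period, halved for each extra point."""
    points = sample_rate / freq_hz
    if points < 2:
        raise ValueError("frequency is above half the sample rate")
    return 100.0 / 2 ** (math.floor(points) - 2)

for f in (20_000, 12_000, 10_000, 4_410):
    print(f, "Hz: up to", tremolo_level_percent(f), "%")
```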

What Can We Do About It?

Let's take a real close look at a simple 14kHz sine wave recorded with the CD format:

How can we improve that one?

What about going from 16 bits to 24 bits resolution?

See below the same recording done with 24 bits:

That looks pretty much the same as the first one. How can it be any other way? With 16 bits we already have an amplitude resolution of 65,536 steps, which is something like ... Wow ...
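
We can put a number on "pretty much the same". The sketch below quantizes the same 14 kHz samples once with 16 bits and once with 24 bits and reports the largest difference, with full scale taken as 1.0. The simple rounding quantizer without dither is an assumption for illustration.

```python
import math

SAMPLE_RATE = 44_100

def quantize(value, bits):
    """Round a value in -1..1 to the nearest code of a signed PCM word
    and scale it back (plain rounding, no dither)."""
    max_code = 2 ** (bits - 1) - 1
    return round(value * max_code) / max_code

samples = [math.sin(2 * math.pi * 14_000 * n / SAMPLE_RATE) for n in range(64)]
diff = max(abs(quantize(s, 16) - quantize(s, 24)) for s in samples)
print(diff)
```

The two versions differ by less than 0.002% of full scale, far too little to see in a diagram, and the sample points still sit at exactly the same time instants.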

What about increasing the sample-rate? Okay, let’s try it out with twice the speed. See below 14kHz sampled with 88.2kHz:

That resembles a sine wave much better. Let's add some filters to smooth out the edges, and we have made real progress.

Now, as storage space becomes cheaper and cheaper each day, why not double the sample rate again?

See below 14 kHz, sampled with 176.4 kHz (just 4 times the CD sampling speed):

I like that one, that's music to my eyes!

My opinion goes like this:

We can have higher fidelity with 16-bit amplitude resolution and 10 or more sample points per wave period, as we can recover the original signal with a very small amount of beat products (less than 1%).
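
With this "10 sample points per wave period" rule, it is easy to see how far each sample rate gets us. A small sketch (the function name is made up for illustration):

```python
def highest_clean_frequency(sample_rate, points_per_period=10):
    """Highest frequency that still gets the wanted number of sample
    points per wave period at the given sample rate."""
    return sample_rate / points_per_period

for rate in (44_100, 88_200, 176_400):
    print(rate, "Hz ->", highest_clean_frequency(rate), "Hz")
```

At the CD rate the rule is met only up to 4.41 kHz, while 176.4 kHz covers tones up to 17.64 kHz.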

As we cannot meet all those infinity requirements of exact sampling according to Shannon in the real world, and because sharp filters do not necessarily sound best to our ears, we can relax all those strict conditions by just sampling faster.

Considering all the things mentioned above, I hope you can understand why some high-end fans prefer listening to LPs instead of CDs. Other digital recording formats exist, like DVD-Audio, which commonly uses a 96 kHz sampling rate, or the Super Audio CD (SACD), which uses a 1-bit stream at a sampling rate of 2.8224 MHz, but both formats have all but died out, and their production has largely stopped.