5. Sampling issues

5.10 Jitter

In any digital audio system, time information is ignored. It is not registered by A/D converters, and it is not passed through the distribution protocol: the data in AES/EBU bit streams or CobraNet bundles contains only level information, not time information. Instead, the sample period is assumed to be the reciprocal of the system’s sample frequency, which is generated by the system’s master word clock. Furthermore, it is assumed that all samples are sent and received sequentially, and that no samples are missing. Even the DSP algorithm programmer simply assumes that the software will be installed on a system that runs at a certain sampling frequency; for example, filter coefficients are calculated to work correctly at 48 kHz. If the system’s master word clock runs at 47 kHz, the system will probably be perfectly stable, but all filter parameters will be slightly off.
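A digital filter only "sees" frequencies as fractions of the sample rate, so the shift described above can be sketched in a few lines. This is a minimal illustration, assuming the filter’s response depends only on f/fs (true for coefficients computed for a fixed rate); the function name `actual_frequency` is hypothetical.

```python
def actual_frequency(design_hz: float, design_fs: float, actual_fs: float) -> float:
    """Frequency at which a filter feature designed for design_hz at design_fs
    really appears when the system is clocked at actual_fs.

    Assumption: the coefficients encode only the normalised frequency f / fs.
    """
    normalised = design_hz / design_fs  # what the coefficients actually "know"
    return normalised * actual_fs


if __name__ == "__main__":
    # Filters designed for 48 kHz, running on a 47 kHz master word clock.
    for f0 in (100.0, 1_000.0, 10_000.0):
        shifted = actual_frequency(f0, 48_000.0, 47_000.0)
        print(f"designed {f0:8.1f} Hz -> actual {shifted:8.2f} Hz")
```

Every corner or centre frequency moves by the same ratio (47/48, about 2 %), which is why the system stays stable while all filter parameters end up slightly off.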

All devices in a digital audio system have an internal word clock, most often synchronised to a common external word clock through a PLL circuit. Both the internal and the external word clock signals are pulse trains at a frequency equal to the system’s sample rate, providing the rising edges that trigger all processes in the system.

An ideal word clock produces a rising edge at constant intervals. In reality, however, noise in the associated electronic circuits (e.g. oscillators, buffer amplifiers, PLLs, power supplies), electromagnetic interference and filtering in cabling distort the word clock’s waveform. This causes rising edges to arrive too early or too late, triggering the processes in the digital audio system at the wrong time. The signal that represents the deviation from the ideal time is called jitter. In digital audio products it is normally a noise-shaped signal with an amplitude of several nanoseconds.
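The definition of jitter as the deviation of each rising edge from its ideal time can be written out directly. The sketch below is a hypothetical simulation: it assumes a 48 kHz word clock and white Gaussian timing noise for simplicity, whereas real word clock jitter is noise-shaped as described above.

```python
import random

FS = 48_000.0        # nominal word clock rate (Hz), an assumed value
PERIOD = 1.0 / FS    # ideal interval between rising edges (s)


def jittered_edges(count, rms_jitter_s, seed=1):
    """Hypothetical word clock: ideal edge times plus white Gaussian timing noise."""
    rng = random.Random(seed)
    return [n * PERIOD + rng.gauss(0.0, rms_jitter_s) for n in range(count)]


def jitter_signal(edges):
    """Jitter: deviation of each rising edge from its ideal time n * PERIOD."""
    return [t - n * PERIOD for n, t in enumerate(edges)]


if __name__ == "__main__":
    j = jitter_signal(jittered_edges(48_000, rms_jitter_s=2e-9))
    print(f"peak jitter over one second: {max(abs(x) for x in j) * 1e9:.2f} ns")
```

Because the peak of a noise signal grows with observation time, a peak jitter figure is only meaningful together with the measurement conditions.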

In digital audio systems, all devices synchronise to a common master clock through their PLL circuits. The PLL follows only the slow changes in phase (the low frequencies in the external word clock’s jitter spectrum) and ignores the fast changes, where the jitter spectrum of the PLL’s own VCO remains. The jitter spectrum of the PLL’s output (the device clock) is therefore a mix of the low frequencies of the external word clock’s jitter spectrum and the high frequencies of the VCO’s jitter spectrum. The frequency above which the external jitter starts to be attenuated is called the PLL’s corner frequency. In digital audio devices for live applications, this frequency is normally between 10 Hz and 1 kHz, giving a relatively short synchronisation time (the time it takes for the PLL to synchronise to a new word clock). In studio equipment the corner frequency can be much lower, offering higher immunity to poor external word clock quality at the cost of a longer synchronisation time.
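The way a PLL splits the output jitter spectrum between the external clock and its own VCO can be illustrated with a simplified first-order loop model. This is an assumption for illustration only: real PLLs are typically second order or higher, and the 100 Hz corner frequency and jitter amplitudes in the example are arbitrary values within the ranges mentioned above. The function name `pll_output_jitter` is hypothetical.

```python
import math


def pll_output_jitter(f_hz, ext_jitter, vco_jitter, corner_hz):
    """Output jitter amplitude at f_hz for a simplified first-order PLL model.

    Assumption: the loop acts as a first-order low-pass on the external word
    clock's jitter and as the complementary high-pass on the VCO's own jitter.
    """
    x = f_hz / corner_hz
    lowpass = 1.0 / math.sqrt(1.0 + x * x)  # |H|: slow phase changes are followed
    highpass = x / math.sqrt(1.0 + x * x)   # |1 - H|: fast VCO jitter is kept
    # Treat the two jitter sources as uncorrelated: add powers, not amplitudes.
    return math.hypot(ext_jitter * lowpass, vco_jitter * highpass)


if __name__ == "__main__":
    # 10 ns external jitter, 1 ns VCO jitter, 100 Hz corner frequency.
    for f in (1.0, 100.0, 10_000.0):
        ns = pll_output_jitter(f, ext_jitter=10.0, vco_jitter=1.0, corner_hz=100.0)
        print(f"{f:>8.0f} Hz: {ns:.2f} ns")
```

Well below the corner the output follows the 10 ns external jitter; well above it, only the 1 ns VCO jitter remains. Lowering `corner_hz` moves that crossover down, which is the higher immunity (and slower locking) of studio equipment.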

Jitter in a device’s word clock is not a problem if the device only performs digital processes: a MAC operation in a DSP core performed slightly earlier or later than scheduled gives exactly the same result. In digital audio systems, jitter causes problems only in A/D and D/A conversion. Figure 519 shows that a sample taken too early or too late (the jitter timing error) results in the wrong value (the jitter level error). As time information is ignored by the DSP processes and distribution of a digital audio system, only the level errors of an A/D converter are passed on to the system’s processes. At D/A converters, however, samples output too early or too late distort the audio signal in both level and timing. The listener therefore hears the jitter level errors of both the A/D and the D/A converters, but only the timing error of the D/A converter.
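The relation shown in figure 519 can be checked numerically: the level error is the difference between the signal value at the actual sampling instant and at the ideal one. A minimal sketch, assuming a full-scale sine and a 5 ns timing error; `jitter_level_error` is a hypothetical helper name.

```python
import math


def jitter_level_error(freq_hz, t_s, timing_error_s, amplitude=1.0):
    """Level error when a sine is sampled at t + timing_error instead of at t."""
    ideal = amplitude * math.sin(2 * math.pi * freq_hz * t_s)
    actual = amplitude * math.sin(2 * math.pi * freq_hz * (t_s + timing_error_s))
    return actual - ideal


if __name__ == "__main__":
    # Worst instant: the zero crossing (t = 0), where the sine is steepest.
    for f in (1_000.0, 20_000.0):
        late = jitter_level_error(f, 0.0, 5e-9)
        early = jitter_level_error(f, 0.0, -5e-9)
        print(f"{f:>7.0f} Hz: 5 ns late {late:+.3e}, 5 ns early {early:+.3e}")
```

The same 5 ns timing error produces a twenty times larger level error at 20 kHz than at 1 kHz, because the signal changes twenty times faster.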

All major mixing consoles on the market today specify a peak jitter of 5 nanoseconds at the output of the internal PLL. This is more than a factor of 1000 below the human hearing threshold of 6 microseconds (6 µs / 5 ns = 1200), so we propose to assume that such small timing errors cannot be detected by the human auditory system.

The jitter level errors generated by this timing error, however, do fall within the audible range of the human auditory system. For small jitter timing errors, with a sine wave audio signal A(t) and a noise-shaped jitter signal J(t) with bandwidth B, the jitter level error E(t) is generated as presented in figure 520 (*5S).
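For small timing errors, the relation between A(t), J(t) and E(t) follows from a first-order Taylor expansion. This is a standard sketch, not necessarily the exact form used in figure 520; J_p denotes the peak jitter and A_p the peak signal level (A_p = 1 at 0 dBfs):

$$
E(t) = A\big(t + J(t)\big) - A(t) \approx J(t)\,\frac{dA(t)}{dt}
$$

$$
A(t) = A_p \sin(\omega_s t) \;\Rightarrow\; |E(t)| \le J_p\,A_p\,\omega_s
$$

With J_p = 5 ns and a full-scale 20 kHz sine, $J_p\,\omega_s = 5\cdot10^{-9}\cdot 2\pi\cdot 20\cdot10^{3} \approx 6.3\cdot10^{-4}$, i.e. about 64 dB below the signal level.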

The result for a single frequency component in the audio signal is presented in figure 520c. For sinusoidal jitter $J(t) = J_p \sin(\omega_j t)$ acting on an audio component $A(t) = \sin(\omega_s t)$, the term $J(t)\,\frac{dA(t)}{dt}$ expands to $E(t) = \frac{J_p\,\omega_s}{2}\left[\sin((\omega_s+\omega_j)t) - \sin((\omega_s-\omega_j)t)\right]$: two sidebands at $\omega_s \pm \omega_j$. Summing over all frequencies in the jitter spectrum shows that the jitter spectrum is mirrored to the left and the right of the audio frequency. This ‘jitter noise picture’ can be observed with any FFT analyser connected to a digital audio device that processes a high-frequency sine wave. Repeating this calculation for every frequency component in a real-life audio signal gives the total jitter level error. The overall peak value of the jitter level error E is proportional to the derivative of the audio signal: the higher the frequency, the faster the signal changes over time, and the higher the jitter level error. The worst case is a 20 kHz sine wave, which with 5 ns peak jitter generates a jitter level error 64 dB below the signal level. As most of the energy in real-life audio signals is at low frequencies, the majority of the generated jitter level errors will be far below -64 dB.
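The sidebands can be reproduced with a short simulation: sample a full-scale 10 kHz sine at instants displaced by 5 ns peak sinusoidal jitter at 1 kHz, subtract the ideal samples, and measure single DFT bins. The sample rate, frequencies and helper names are assumptions, chosen so that every component falls exactly on a DFT bin; the first-order prediction for each sideband is J_p·ω_s/2.

```python
import cmath
import math

FS = 48_000          # simulated sample rate (Hz), an assumed value
N = 4_800            # 0.1 s of samples: 10 Hz bins, every tone lands on a bin
F_AUDIO = 10_000.0   # full-scale audio sine (Hz)
F_JIT = 1_000.0      # sinusoidal jitter frequency (Hz)
J_PEAK = 5e-9        # peak jitter J_p (s)


def jitter_error_signal():
    """E[n]: samples taken at jittered instants minus the ideal samples."""
    err = []
    for n in range(N):
        t = n / FS
        tau = J_PEAK * math.sin(2 * math.pi * F_JIT * t)  # timing error J(t)
        jittered = math.sin(2 * math.pi * F_AUDIO * (t + tau))
        ideal = math.sin(2 * math.pi * F_AUDIO * t)
        err.append(jittered - ideal)
    return err


def bin_amplitude(signal, freq_hz):
    """Peak amplitude of the component at freq_hz, from a single DFT bin."""
    k = round(freq_hz * N / FS)
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(signal))
    return 2 * abs(acc) / N


if __name__ == "__main__":
    e = jitter_error_signal()
    predicted = J_PEAK * 2 * math.pi * F_AUDIO / 2  # first-order sideband level
    print(f"predicted sideband: {predicted:.3e}")
    for f in (9_000.0, 11_000.0, 4_500.0):
        print(f"{f:>7.0f} Hz: {bin_amplitude(e, f):.3e}")
```

With these values both sidebands come out at about 1.57e-4 of full scale (roughly -76 dB each), matching the first-order prediction, while an unrelated bin such as 4.5 kHz stays at numerical noise. Replacing the single jitter tone by a jitter spectrum produces the mirrored noise skirts described above.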

Low frequency jitter correlation
As PLL circuits follow the frequencies in the master word clock’s jitter spectrum below the PLL’s corner frequency, the low-frequency jitter in all devices of a digital audio system is the same, or correlated. Theoretically, for a signal that is sampled by an A/D converter and then directly reproduced by a D/A converter, all jitter level errors generated by the A/D converter would be cancelled by the jitter level errors generated by the D/A converter, leaving no jitter level error noise. In real life there is always latency between inputs and outputs, which makes the jitter signals less correlated at high frequencies. As the latency between input and output increases, the frequency below which the jitter is correlated decreases. At a system latency of 2 milliseconds, the correlation ends at about 40 Hz. This means that in live systems low-frequency jitter (which has the highest energy) is automatically suppressed, while high-frequency jitter in A/D and D/A converters simply adds up. In music production systems, where audio signals are stored on a hard disk with a latency of at least a few seconds, and of course in playback systems, where the latency between the production and the purchase of a CD or DVD can grow to several years, the low-frequency jitter signals are no longer correlated and all jitter level errors add up.
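The effect of latency on correlation can be sketched with an idealised model: both converters see the same jitter J(t), and a sample taken at t is reproduced at t + latency, so its net timing displacement is J(t + latency) - J(t). For one sinusoidal jitter component of frequency f this gives a residual of 2·|sin(π·f·latency)| relative to a single converter. This model is our assumption and does not necessarily use the same criterion as the 40 Hz figure above; under it, the residual at 40 Hz and 2 ms is about half the single-converter jitter. The function name is hypothetical.

```python
import math


def residual_jitter_factor(jitter_freq_hz, latency_s):
    """Net A/D + D/A jitter for one sinusoidal jitter component, relative to
    the jitter of a single converter.

    Simplified model (an assumption): both converters see the same J(t), and
    the net displacement J(t + latency) - J(t) of a sinusoid has amplitude
    2 * |sin(pi * f * latency)| times the single-converter amplitude.
    """
    return 2.0 * abs(math.sin(math.pi * jitter_freq_hz * latency_s))


if __name__ == "__main__":
    latency = 2e-3  # 2 ms system latency
    for f in (10.0, 40.0, 250.0):
        print(f"{f:>6.0f} Hz: residual x{residual_jitter_factor(f, latency):.3f}")
```

Values below 1 mean the A/D and D/A errors partly cancel. At higher frequencies the factor swings between 0 and 2, which for broadband jitter averages to the √2 of uncorrelated power addition: the errors simply add up.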

For packet-switching network protocols that use the Precision Time Protocol (PTP) (*5T), such as Dante and AVB, synchronisation is partly handled by the receiver’s FPGA logic, which adjusts a local oscillator to run in sync with up to 10 synchronisation packets per second. This means that the equivalent corner frequency of a PTP receiver is below 10 Hz, so jitter is correlated only at very low frequencies. In such systems, the influence of an external word clock distributed through low-latency networks, as in figure 513c, is not significant.

Audibility of jitter

Assuming a 0 dBfs sine wave audio signal with a frequency of 10 kHz as a worst-case scenario, a jitter signal with a peak level of 5 ns in both the A/D and the D/A converter will generate a combined A/D and D/A jitter noise peak level of:

$$E_{A/D+D/A} = 20\log_{10}\left(2 \cdot 5\cdot10^{-9} \cdot 2\pi \cdot 10\cdot10^{3}\right) \approx -64\ \text{dBfs}$$
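The calculation above can be reproduced in a few lines. A minimal sketch: `jitter_noise_peak_dbfs` is a hypothetical name, it uses the small-jitter approximation in which the peak level error of a full-scale sine is J·2πf, and `converters=2` assumes the A/D and D/A contributions add in the worst case.

```python
import math


def jitter_noise_peak_dbfs(audio_hz, jitter_peak_s, converters=1):
    """Peak jitter level error of a 0 dBfs sine, in dB relative to full scale.

    Small-jitter approximation: |E| <= J * 2 * pi * f for a full-scale sine.
    converters=2 assumes the A/D and D/A errors add up in the worst case.
    """
    return 20 * math.log10(converters * jitter_peak_s * 2 * math.pi * audio_hz)


if __name__ == "__main__":
    print(f"10 kHz, A/D + D/A: {jitter_noise_peak_dbfs(10_000.0, 5e-9, 2):.2f} dBfs")
    print(f"20 kHz, one converter: {jitter_noise_peak_dbfs(20_000.0, 5e-9):.2f} dBfs")
```

Both cases give about -64 dBfs: doubling the frequency has the same effect as doubling the jitter, which is why the 20 kHz single-converter worst case and the 10 kHz combined case coincide.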

Presented to listeners without the audio signal, this jitter noise would be clearly audible. In real life, however, jitter noise only occurs while the audio signal is present, and masking then applies: jitter noise close to the frequency components of the audio signal is inaudible, so the spectrum of a typical audio signal masks a significant portion of the jitter noise.

Note that the predicted level is the jitter noise peak level generated by 0 dBfs audio signals. In real life, the average RMS level of the jitter noise is many dB lower because of the audio program’s crest factor and the safety margin below full scale used by the sound engineer. Music with a crest factor of 10 dB played through a digital audio system with a safety margin of 10 dB will then generate jitter noise below -84 dBfs (-64 dB - 10 dB - 10 dB).
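The same level budget, step by step. A trivial sketch under the stated assumption that crest factor and safety margin lower the average signal level, and the jitter level error with it, dB for dB; the function name is hypothetical.

```python
def average_jitter_noise_dbfs(peak_dbfs, crest_factor_db, safety_margin_db):
    """Jitter noise level after accounting for crest factor and headroom.

    Assumption: the jitter level error scales linearly with the signal level,
    so every dB the average signal sits below 0 dBfs lowers the jitter noise
    by the same amount.
    """
    return peak_dbfs - crest_factor_db - safety_margin_db


if __name__ == "__main__":
    print(average_jitter_noise_dbfs(-64.0, 10.0, 10.0))
```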

The audibility of jitter is a popular topic on internet forums. A common test is a listening session with a stand-alone digital mixing console, toggling between its internal clock and an external clock. Such comparisons are only valid with a stand-alone device: if any other digital device is connected to the mixer, clock phase might play a more significant role in the results than jitter.

In uncontrolled tests, many subjective and non-auditory sensations have a significant influence on the result. More details on quality assessment methods are presented in chapter 9.

In multiple clinical tests, the perception threshold of jitter has been reported to lie between 10 nanoseconds (*5U) for sinusoidal jitter and 250 nanoseconds (*5V) for noise-shaped jitter, while actual noise-shaped jitter levels in popular digital mixing consoles are below 10 nanoseconds.
