5. Sampling issues

5.9 Temporal resolution

Chapter 5.3 describes a digital audio system that uses a sampling rate of 48 kHz to accurately represent frequencies up to 20 kHz. For continuous signals, 20 kHz is the upper limit of the human hearing system. But most audio signals are discontinuous, with constantly changing level and frequency spectrum, and the human auditory system is capable of detecting changes down to 6 microseconds.

To also accurately reproduce changes in a signal’s frequency spectrum with a temporal resolution down to 6 microseconds, a digital audio system must operate at a sampling rate of at least the reciprocal of 6 microseconds, approximately 166.7 kHz. Figure 515 presents the sampling of an audio signal that starts at t = 0 and reaches a detectable level at t = 6 microseconds. To capture the onset of the waveform, the sample time (the interval between two samples) must be no longer than 6 microseconds.
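The reciprocal relationship between sample time and sampling rate can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and do not come from any audio library.

```python
def sample_period_us(rate_hz: float) -> float:
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / rate_hz


def min_rate_hz(resolution_us: float) -> float:
    """Lowest sampling rate whose sample period does not exceed resolution_us."""
    return 1e6 / resolution_us


# Rate needed for the 6 microsecond threshold quoted in the text.
print(f"6 us resolution needs at least {min_rate_hz(6.0) / 1000:.1f} kHz")

# Sample periods of the three common professional rates.
for rate in (48_000, 96_000, 192_000):
    print(f"{rate / 1000:g} kHz -> {sample_period_us(rate):.1f} us per sample")
```

The sample periods come out at about 20.8, 10.4 and 5.2 microseconds, so of the three rates only 192 kHz has a sample time below 6 microseconds.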

In the professional live audio field, a 48 kHz sampling rate is adopted as standard, with some devices supporting multiples of this rate: 96 kHz and 192 kHz. (Some devices also support 44.1 kHz and 88.2 kHz for compatibility with the music recording field, e.g. the Compact Disc.) However, apart from the temporal resolution of the digital part of an audio system, the temporal characteristics of the electro-acoustic components of the system also have to be considered. In general, only very high quality speaker systems specially designed for use in a music studio are capable of reproducing temporal resolutions down to 6 microseconds, assuming that the listener is situated on-axis of the loudspeakers (the sweet spot). For average high quality studio speaker systems, a temporal resolution of 10 microseconds might be the best that can be achieved. Live sound reinforcement speaker systems in general cannot support such high temporal resolutions, for several reasons.

Firstly, high power loudspeakers use large cones, membranes and coils in the transducers, whose inertia increases with the power rating. A high inertia causes ‘time smear’: it takes some time for the transducer to follow the changes imposed on it by the power amplifier’s output voltage. Some loudspeaker manufacturers publish ‘waterfall’ diagrams of their high frequency drivers, showing the driver’s response to an impulse, which often spans several milliseconds. The inertia of a driver prevents it from reacting accurately to fast changes.

Secondly, live systems often use multiple loudspeakers to create a wide coverage area, which contradicts the concept of creating a sweet spot. The electro-acoustic designer of such a system will do whatever is possible to minimize its interference patterns, but the result will always have interference at all listening positions that is more significant than the temporal resolution of the digital part of the system.
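To get a feel for the scale involved, the sketch below converts between a difference in path length from two loudspeakers and the resulting difference in arrival time. The speed of sound of 343 m/s (air at roughly 20 °C) is an assumption that does not appear in the text, and the function names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 degrees Celsius


def arrival_time_difference_us(path_difference_m: float) -> float:
    """Arrival time difference in microseconds for a given path length difference."""
    return path_difference_m / SPEED_OF_SOUND_M_S * 1e6


def path_difference_mm(time_difference_us: float) -> float:
    """Path length difference in millimetres that produces a given time difference."""
    return time_difference_us * 1e-6 * SPEED_OF_SOUND_M_S * 1e3


# Path difference that corresponds to the 6 microsecond threshold.
print(f"6 us corresponds to {path_difference_mm(6.0):.2f} mm of path difference")

# A listener one metre closer to one loudspeaker than to the other.
print(f"1 m of path difference is {arrival_time_difference_us(1.0):.0f} us")
```

Under this assumption, 6 microseconds corresponds to about 2 mm of path difference, while listening positions in a wide coverage area typically differ by far more than that between loudspeakers.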

Aside from audio quality parameters, the choice of a sampling rate can also affect the bandwidth, and with it the cost, of a networked audio system. Table 504 below presents the main decision parameters.

As a rule of thumb, 48 kHz is a reasonable choice for most high quality live audio systems. For studio environments and for live systems using very high quality loudspeaker systems with the audience in a carefully designed sweet spot, 96 kHz might be an appropriate choice. Regarding speaker performance, 192 kHz might make sense for demanding studio environments with very high quality speaker systems, with a single person listening exclusively in the system’s sweet spot.

Table 504: Main decision parameters for the selection of a digital audio system’s sample rate

Audio quality issues

| Parameter | Option | Implication |
|---|---|---|
| desired temporal resolution | 48 kHz | 20 μs - high quality |
| | 96 kHz | 10 μs - very high quality |
| | 192 kHz | 5 μs - beyond human threshold |
| typical latency | 48 kHz | 4 ms default signal chain |
| | 96 kHz | 2 ms default signal chain |
| | 192 kHz | 1 ms default signal chain |
| application type | sweet spot | supports high temporal resolutions |
| | wide coverage | difficult to achieve high temporal resolutions |
| loudspeaker power | low power | might support 96 kHz and 192 kHz at sweet spot |
| | high power | supports 48 kHz |

Cost & logistics issues

| Parameter | Sample rate | Implication |
|---|---|---|
| DSP power | 48 kHz | default DSP power rating |
| | 96 kHz | requires double DSP power |
| | 192 kHz | requires quadruple DSP power |
| cable bandwidth (channels) | 48 kHz | default channel count (e.g. Dante supports 512 channels) |
| | 96 kHz | reduced to 50% (e.g. Dante supports 256 channels) |
| | 192 kHz | reduced to 25% (e.g. Dante supports 128 channels) |
| storage | 48 kHz | default storage (e.g. 24 GB for 1 hour 48ch, 24-bit) |
| | 96 kHz | requires double storage (e.g. 48 GB for 1 hour 48ch, 24-bit) |
| | 192 kHz | requires quadruple storage (e.g. 96 GB for 1 hour 48ch, 24-bit) |
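The cost figures in Table 504 scale linearly with the sampling rate, which the following sketch reproduces. It assumes uncompressed PCM storage with no file overhead, and a signal chain latency that is a fixed number of samples (192 samples, chosen because that equals 4 ms at 48 kHz). Both are simplifications made for this illustration, and the function names are hypothetical.

```python
def storage_gb(rate_hz: int, channels: int = 48, bits: int = 24, hours: float = 1.0) -> float:
    """Raw PCM storage in decimal gigabytes (1 GB = 1e9 bytes), without file overhead."""
    bytes_total = rate_hz * (bits // 8) * channels * hours * 3600
    return bytes_total / 1e9


def channels_at(rate_hz: int, channels_at_48k: int = 512) -> int:
    """Channel count on a link of fixed bandwidth, scaled from the 48 kHz figure."""
    return channels_at_48k * 48_000 // rate_hz


def latency_ms(rate_hz: int, chain_samples: int = 192) -> float:
    """Latency of a chain that delays the signal by a fixed number of samples (assumption)."""
    return chain_samples / rate_hz * 1000


for rate in (48_000, 96_000, 192_000):
    print(
        f"{rate / 1000:g} kHz: {storage_gb(rate):.1f} GB per hour, "
        f"{channels_at(rate)} channels, {latency_ms(rate):g} ms"
    )
```

The computed storage for 48 kHz is about 24.9 GB, roughly in line with the rounded 24 GB in the table, and it doubles with each doubling of the sampling rate. The channel counts and latencies match the table's 512, 256, 128 and 4, 2, 1 ms.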
