
9. Quality assessment methods

9.2 Quality assessment through listening tests

When it comes to assessing the audio quality and sound quality of a networked audio system through listening (assessing the quality of the evoked hearing sensation), the human auditory system makes things extremely difficult. A hearing sensation is affected by the quality of the sound source, the acoustic environment, the listening position and angle, the individual’s hearing abilities, preferences and expectations, short-term aural activity image memory, long-term aural scene memory, and all other sensory inputs of the human body, such as vision, taste, smell and touch. For this reason, quality judgements based on hearing sensations have to be analysed for all factors that play a role in the hearing experience before any quality statement about the audio system can be extracted. Figure 902 presents a selection of the most important factors that play a role in the result of a listening session; selected factors are discussed in the remainder of this chapter.

Aural scene memory
As quality is defined as ‘conformance to requirements’, a quality assessment always needs both a measurement (in the case of a listening session, the hearing sensation) and a requirement. The problem is that for hearing sensations, requirements do not really exist. Every individual has different preferences and expectations for a hearing sensation, so only an average hearing sensation can be assumed as a requirement. At present, this is only available indirectly through the sales figures of theatre tickets, CDs and music downloads. These data are not very reliable, as they are heavily biased by other factors not even included in figure 902, such as social peer pressure, culture and commercial factors.

The only compatible and relevant reference for quality assessment is formed by previous hearing sensations stored in the brain’s memory as aural scenes. However, these are only relevant if they were experienced under exactly the same conditions as those of the listening session. A deviation in any of the factors in figure 902 makes the memory invalid as a reference. A fundamental conclusion, based on the simplified auditory processing model presented in figure 412 in chapter 4.3, is that the quality of a sound system can never be assessed in a single listening session, because the aural scene memory is virtually always invalid. Listening tests always have to be performed in two or more sessions to create a reference and allow differential analysis to arrive at a quality assessment.

Aural activity image memory
The auditory processing model distinguishes long-term aural scene memory and short-term aural activity image memory. Because multiple listening sessions can only be conducted one by one, the reference used for differential analysis always relies on memory. In long listening sessions, only the overall aural scene can be used for comparison, as it is stored in long-term memory; this allows only aggregated quality assessments. If detailed quality assessments are required, the brain’s short-term aural activity image memory of 20 seconds has to be used. This means that comparisons of hearing sensations have to take place within 20 seconds, using identical sound source signals. Listening sessions performed by switching between two situations while listening to a continuous piece of music are not valid, because the 20 seconds before the switch and the 20 seconds after it never contain the same signal. The conclusion is that audio fragments used in listening sessions for detailed quality assessment have to be identical pieces of sound, shorter than 20 seconds.
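The 20-second rule translates directly into how test material is prepared. The Python sketch below assumes the recording is available as a plain list of PCM samples at a known sample rate; the function name `cut_excerpt` and the use of a tuple are illustrative choices, not part of any particular tool. It refuses any excerpt that is not shorter than 20 seconds and returns one immutable excerpt, so that the identical signal can be fed to every condition under comparison.

```python
# Limit taken from the text: short-term aural activity image memory of 20 s.
MAX_EXCERPT_SECONDS = 20.0


def cut_excerpt(samples, sample_rate, start_s, length_s):
    """Cut one excerpt (hypothetical helper) to be reused unchanged for every condition.

    samples     -- list of PCM samples (one channel, assumed)
    sample_rate -- samples per second
    start_s     -- excerpt start in seconds
    length_s    -- excerpt length in seconds, must be below 20 s
    """
    if length_s >= MAX_EXCERPT_SECONDS:
        raise ValueError(
            f"excerpt of {length_s} s is not shorter than {MAX_EXCERPT_SECONDS} s"
        )
    start = round(start_s * sample_rate)
    stop = start + round(length_s * sample_rate)
    if start < 0 or stop > len(samples):
        raise ValueError("excerpt lies outside the recording")
    # A tuple cannot be modified afterwards, so every condition gets the same signal.
    return tuple(samples[start:stop])
```

At 96 kHz, for example, a 15-second excerpt contains 1,440,000 samples per channel; the same tuple is then played through each signal chain being compared.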

Acoustic environment, listening position and listening angle
Listening sessions always take place in an acoustic environment. The environment can be a live space such as a concert hall, or a carefully designed, acoustically optimised control room in a music studio. Only listening sessions performed in ‘dry rooms’ in auditory research laboratories, or in a desolate open space without any wind, such as on a structure high above the ground in a desert, can eliminate the influence of the acoustic environment. Apart from the acoustic environment, the listening position and angle significantly affect the hearing sensation: a displacement of only a few millimetres or degrees between listening sessions can already change the result drastically. Two conclusions can be drawn. First, listening sessions should be conducted strictly in the sweet spot of a speaker system, facing the same direction for every session. When listening sessions are used to assess the quality of loudspeakers, a mechanical rotating system should be used to ensure that the listening position and angle are always the same. Second, the result is only relevant for the listening position and angle in the acoustic environment in which the listening session was performed. In any other acoustic environment, the quality assessments obtained from the listening sessions are no longer valid. For live systems, the quality results are never valid for the audience as a whole, because the audience in most cases consists of more than one person, and they cannot all occupy the same listening position at the same time.

The anticipation of a result can cause test subjects to experience that result, even if there is none. Because of this placebo effect, placebo-controlled clinical trials in medicine use two groups of test subjects: one receiving the drug under test and the other receiving a placebo. The placebo effect also plays a role in listening tests: when a change is anticipated, a change might be detected even if there is none. This is not a shortcoming of the listener; anticipation is simply one of the many factors that affect a hearing sensation. In fact, anticipation can significantly amplify the pleasantness of a hearing sensation, a concept used in many music compositions and performances. To assess the quality of a sound system, however, anticipation should be avoided so that it cannot affect the quality assessment. The simplest way to achieve this is to perform blind tests, in which the test subject knows that the audio fragment can be either different or the same.
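A blind same/different test needs a trial order that the listener cannot predict. The following Python sketch shows one possible way to generate it, assuming two conditions labelled A and B and a 50% chance of a 'same' trial; the names `blind_schedule` and `score` are hypothetical. The schedule stays with the test operator, and the listener only answers 'same' or 'different' for each pair.

```python
import random


def blind_schedule(n_trials, seed=None):
    """Build a hidden list of (first, second) condition pairs.

    Each trial starts with A or B at random and is followed by either the same
    condition or the other one, each with an assumed probability of 50%, so the
    listener cannot anticipate whether a change will occur.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_trials):
        first = rng.choice("AB")
        same = rng.random() < 0.5
        second = first if same else ("B" if first == "A" else "A")
        schedule.append((first, second))
    return schedule


def score(schedule, answers):
    """Count correct answers; `answers` holds True where the listener said 'same'."""
    return sum((a == b) == said_same for (a, b), said_same in zip(schedule, answers))
```

Passing a fixed `seed` makes the hidden order reproducible for later analysis without revealing it to the listener during the session.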

If listening tests are conducted sighted instead of blind, the test subjects can be influenced by non-auditory signals, along with previous experiences associated with those signals. For example, seeing the mechanical construction of test objects can create expectations about the listening experience, such as the expectation that large loudspeakers reproduce low frequencies very well. Knowing the product brand and remembering the brand’s reputation also strongly affects the test results, rendering the outcome invalid (*9C).

Vision & other sensory organs
In many cases, the brain’s processing of audio signals is affected, and sometimes overruled, by other sensory inputs; a famous example is the McGurk Ba-Ga test described in chapter 4.3. If the non-audio sensory inputs differ between listening sessions, the outcome can be completely different. Even things often thought of as completely irrelevant for audio, such as the colour of cabinets and cables, or even the colour of the visual signals used to indicate sources in a listening test, can affect the quality assessment. For a series of listening tests to be relevant, the non-audio environment should be kept as constant as possible, e.g. constant colours and temperature. For the same reason, the consumption of food and drinks (affecting smell and taste) shortly before and during listening sessions should be avoided.

Sound source
All hearing sensations are affected by the quality of the sound source, as described in chapter 1. To assess the quality of a system, a reference is required that compares results using the same sound source, thus ruling out the quality of the sound source. This is easy with pre-recorded material: high-resolution audio recordings (e.g. 24 bit, 96 kHz) can be used. It is impossible to assess a system’s quality in multiple listening sessions using live musicians, as musicians will never play a piece of music exactly the same way twice. This causes the references in the aural memory to differ because of the sound source quality rather than the audio system quality. When using pre-recorded source material, it is important to know whether the test subject (listener) is familiar with it, because familiarity would allow the aural scene memory (with scenes most probably generated under different conditions) to affect the new hearing sensations.

To allow differential analysis in a listening session comparing a single parameter or process in a system (for example, comparing one signal chain with an equaliser applied and one without), all other processes in the signal chains have to be exactly the same. When two physically different analogue devices are used, this is never the case: in a mixing console, the gain error alone can cause a level difference of up to 4 dB. This significantly affects the hearing sensation, as louder signals are most commonly perceived as sounding better. The effect can be partially eliminated by calibrating all signal chains in the listening test to produce the same output level. As the human auditory system can detect level differences down to 0.5 dB, listening test systems need to be calibrated to within 0.5 dB or better.
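The level calibration can be sketched as follows, assuming both signal chains are fed the same test signal and their outputs are captured as sample lists. Matching the RMS level is a simplification chosen here for illustration; it does not model frequency-weighted loudness, and the helper names are hypothetical. The sketch reports the level difference in dB, checks it against the 0.5 dB tolerance from the text, and derives the linear gain needed to bring the candidate chain to the reference level.

```python
import math

TOLERANCE_DB = 0.5  # calibration target stated in the text


def rms(samples):
    """Root-mean-square value of a non-empty list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_difference_db(reference, candidate):
    """Level of `candidate` relative to `reference` in dB (positive = candidate louder)."""
    return 20 * math.log10(rms(candidate) / rms(reference))


def matching_gain(reference, candidate):
    """Linear gain that brings the RMS level of `candidate` to that of `reference`."""
    return rms(reference) / rms(candidate)


def within_tolerance(reference, candidate, tol_db=TOLERANCE_DB):
    """True if the two chains are level-matched to within `tol_db`."""
    return abs(level_difference_db(reference, candidate)) <= tol_db
```

A 4 dB gain error corresponds to an amplitude ratio of 10^(4/20), roughly 1.58, which is far outside the 0.5 dB tolerance (an amplitude ratio of roughly 1.06); multiplying the candidate output by `matching_gain` removes the difference for the measured test signal.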

Assessment of preferences vs. assessment of detection thresholds
Listening tests can be used to assess the preferences of test subjects when the differences between audio systems are large. With small differences, however, it becomes increasingly difficult to assess preferences; in that case, an assessment of detection thresholds can be performed first. ABX testing is an accepted method for this purpose: the test subject listens blind to two situations, A and B, and is then confronted with an unknown situation X, which can be either A or B, and has to identify which one it is. Repeating the ABX trial many times allows a statistically founded statement on whether the difference could be detected or not, i.e. whether the number of correct identifications exceeds what guessing alone would produce.
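A common way to turn repeated ABX trials into a statistical statement is a one-sided binomial test against pure guessing, where each trial has a 50% chance of being answered correctly. The Python sketch below assumes that test and a 5% significance level; neither choice is prescribed by the text, and the function names are hypothetical.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value: the probability of at least `correct` right
    answers out of `trials` ABX trials if the listener is only guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct(trials, alpha=0.05):
    """Smallest number of correct answers with p-value <= alpha (None if unreachable)."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) <= alpha:
            return k
    return None
```

With 16 trials, for example, `abx_p_value(12, 16)` is 2517/65536, about 0.038, so `min_correct(16)` returns 12: at least 12 correct answers are needed before pure guessing can be rejected at the 5% level.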

Training strongly affects the result of listening tests. Trained listeners have learned to extract detailed information from the aural activity image and keep it in long-term memory; they not only remember more details than untrained listeners, but are also better able to report the results, as they know the psychoacoustic vocabulary.

Listening to audio sources as a form of short-term training in AAAB test sequences introduces a preference bias: the test subjects become accustomed to the A source and might perceive the B source as less preferred. This makes AAAB tests unsuitable for preference tests. For detection tests, however, AAAB tests can be suitable if the differences between the objects are extremely small.

