Technical Report: Sound Quality Evaluation (Part 2)

5. The underlying concept of loudness calculation

The following three phenomena influence how humans perceive the loudness of sound:

  1. Frequency characteristics of the ear

  2. Spectral masking (a phenomenon that occurs on the frequency axis)

  3. Temporal masking (a phenomenon that occurs on the time axis)

When calculating loudness, these three phenomena must be taken into consideration.

5-1 Equal Loudness Curves (Pure Tones)

One of the important factors in calculating loudness is the frequency response of hearing. The equal loudness curves for pure tones are shown below. Hearing is most sensitive from 2 kHz to 4 kHz (shown in orange), and sensitivity decreases at lower frequencies.

Furthermore, this characteristic varies with the sound pressure level. At high sound pressure levels (upper curves), the response is relatively flat, while at lower sound pressure levels (lower curves), the sensitivity at low frequencies drops further. The frequency response of human hearing is therefore very complex, and loudness calculations take this level dependence into account. Incidentally, A-weighting applies a filter whose characteristics resemble the frequency response of hearing. However, because that filter corresponds to a single equal-loudness curve (the 40 phon curve in the figure), A-weighted levels may differ from the perceived loudness of a sound at other sound pressure levels.
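The fixed shape of the A-weighting filter can be illustrated with a short calculation. The following is a minimal sketch that assumes the analog A-weighting formula commonly quoted from IEC 61672-1; the function name a_weighting_db is hypothetical and does not appear in the original report.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """A-weighting gain in dB at f_hz (assumed IEC 61672-1 analog formula)."""
    f2 = f_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the gain to 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

if __name__ == "__main__":
    for f in (31.5, 100.0, 1000.0, 4000.0):
        print(f"{f:7.1f} Hz: {a_weighting_db(f):+6.1f} dB")
```

The same correction is applied whatever the level of the sound, which is why a single weighting curve cannot follow the level-dependent equal loudness curves described above.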

5-2 Spectral Masking

The effect of spectral masking is also an important factor in loudness. Masking is the phenomenon where, when one sound is heard and another sound is played, the second sound is drowned out (masked) by the first sound and becomes inaudible.

Let's say you are currently listening to 1 kHz narrowband noise. In this case, the shaded area in the graph above is masked by the 1 kHz narrowband noise. Even if a new sound is added to this area, the loudness of the sound will not increase. In fact, depending on the sound, it may be completely drowned out and inaudible.

The shape of the spectral masking curve varies with frequency and also with sound pressure level, so spectral masking is a nonlinear and complex phenomenon. The masking curve resembles the response of a band-pass filter, and the range corresponding to the passband of this filter is called the critical band. Position on the critical-band scale is expressed in Bark.
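To give a feel for the Bark scale, the sketch below converts a frequency in Hz to a critical-band rate. It assumes the widely used Zwicker and Terhardt approximation, which is not given in the original report; the function name hz_to_bark is hypothetical.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate in Bark (assumed Zwicker-Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))

if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0, 4000.0, 10000.0):
        print(f"{f:8.1f} Hz -> {hz_to_bark(f):5.2f} Bark")
```

With this approximation, the 1 kHz narrowband noise used in the example above sits at roughly 8.5 Bark.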

The reason spectral masking occurs lies in the structure of the ear, shown in the following diagram. Sound entering through the ear canal vibrates the eardrum, which in turn drives the ossicles, and the vibration then reaches the cochlea. The cochlea is the organ that performs frequency decomposition of sound. It is a tube coiled into a form that resembles a snail's shell.

The following diagram (shaded area) shows the cross-section of this spiral tube after it has been stretched out.

The inside of the cochlea is divided into upper and lower parts by a membrane called the basilar membrane. Vibrations that reach the cochlea travel from the entrance (left side) toward the far end (right side), causing this membrane to vibrate and exciting the nerve cells on it. A specific part of the basilar membrane vibrates particularly strongly, and which part that is depends on the frequency of the incoming sound: for high-frequency sounds the largest amplitude occurs near the entrance, while for low-frequency sounds it occurs further inside. Different sound frequencies therefore excite different nerves, which is why we can perceive high and low pitches.

The basilar membrane does not vibrate at a single point but over a region. For example, when you hear a 1 kHz sound, nerves corresponding to the surrounding frequencies are also excited. If you then hear a sound with a frequency slightly higher than 1 kHz, the nerve cells that are already excited cannot be excited much further, so the loudness of the sound does not seem to change much. This is the phenomenon of spectral masking.

5-3 Spectral Masking and Loudness

To explain the relationship between spectral masking and loudness, we use a simplified model of the masking curve.

The top diagram shows the case where two sounds, A and B, are far apart on the frequency axis. The rectangles, shown lightly shaded, represent the areas masked by each sound. Loudness is proportional to the total shaded area, so when B is added, the sound feels about twice as loud as when only A is present. The bottom diagram shows the case where the frequencies of the two sounds are close together, so the areas masked by the two sounds overlap. As a result, when B is added, the total area does not change much compared to when only A is present, and the loudness increases only slightly. In both diagrams, however, adding B doubles the sound energy, so the sound pressure level is the same in the top and bottom cases (about 3 dB higher than A alone) even though the perceived loudness differs.
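The point that the sound pressure level does not depend on how far apart A and B are can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the two sounds are uncorrelated, so their energies simply add; the function name combined_spl_db is hypothetical.

```python
import math

def combined_spl_db(levels_db):
    """Energy sum of uncorrelated sounds, each given as a level in dB."""
    total_energy = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_energy)

if __name__ == "__main__":
    a_only = combined_spl_db([60.0])
    a_and_b = combined_spl_db([60.0, 60.0])
    # Frequency never enters the calculation, so the result is the same
    # whether A and B are far apart or close together.
    print(f"A alone: {a_only:.2f} dB, A + B: {a_and_b:.2f} dB")
```

Because frequency does not enter the energy sum, a level meter reports the same increase in both diagrams, while a loudness model that accounts for masking does not.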

5-4 Chart for Loudness Calculation

To calculate loudness while taking these phenomena into account, ISO 532B uses a chart.

To simulate spectral masking, a model like Figure A above is used. The masking curve slopes more gently toward higher frequencies than toward lower frequencies, and so affects neighboring bands more strongly on the high-frequency side. For this reason only the upper slope is considered. To determine the loudness, a 1/3 octave analysis is first performed, and the results are plotted on a chart like the one shown. The area below the plotted curve is then calculated, and the corresponding loudness is read off. ISO 532B provides 10 different charts depending on the sound pressure level and sound field conditions, and the appropriate chart is used to determine the loudness.

5-5 Critical Bandwidth

The chart in section 5-4 takes into account the frequency characteristics of hearing and spectral masking, as discussed previously. Another factor to consider is the frequency resolution of hearing, which roughly corresponds to a 1/3 octave bandwidth (when discussing loudness; the resolution for pitch is much finer). However, in the frequency range below 500 Hz, the bandwidth of hearing is larger than 1/3 octave, so the resolution becomes coarser. Therefore, in the ISO 532B method, the low-frequency 1/3 octave bands are added together to broaden the bandwidth to match the resolution of hearing. This is why the vertical lines in the low frequencies are spaced closer together in the ISO chart, and why 4 or 3 bands are grouped together.
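The relationship between the critical bandwidth and the 1/3 octave bandwidth can be sketched numerically. The example below assumes Zwicker's well-known approximation for the critical bandwidth, which is not part of the original report, together with the standard base-2 definition of a 1/3 octave band; both function names are hypothetical.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz (assumed Zwicker-Terhardt formula)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def third_octave_bandwidth_hz(center_hz: float) -> float:
    """Width of a 1/3 octave band: upper edge minus lower edge."""
    return center_hz * (2.0 ** (1.0 / 6.0) - 2.0 ** (-1.0 / 6.0))

if __name__ == "__main__":
    for fc in (63.0, 125.0, 250.0, 500.0, 1000.0, 4000.0):
        cb = critical_bandwidth_hz(fc)
        to = third_octave_bandwidth_hz(fc)
        print(f"{fc:7.1f} Hz: critical {cb:7.1f} Hz, 1/3 octave {to:7.1f} Hz")
```

Under these assumptions the critical bandwidth stays near 100 Hz at low frequencies while the 1/3 octave bands keep shrinking, which is consistent with grouping several low-frequency bands together.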

5-6 Temporal Masking

Another important element of loudness is temporal masking (also known as time-based masking or chronological masking). This is masking that occurs along the time axis.

For example, if one sound stops and another sound is played briefly immediately afterward, the second sound may be drowned out by the first and become inaudible. This is because the vibration of the membrane in the ear caused by the first sound does not stop immediately but attenuates gradually, and the excitation of the nerves in contact with the membrane also decreases gradually. If the next sound is played before this excitation has sufficiently attenuated, that sound will not be heard.
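The gradual decay described here can be illustrated with a very simple model. The sketch below is only an assumption for illustration: it lets the excitation fall off exponentially after a sound stops and treats a later sound as hidden when it lies below the decaying excitation. The function name, the 2 ms step, and the 50 ms decay constant are all hypothetical and are not taken from any standard or product.

```python
import math

def apply_post_masking(excitation, step_ms=2.0, decay_ms=50.0):
    """Hold each value and let it decay exponentially until a larger one arrives.

    excitation: sequence of instantaneous excitation values (linear units).
    step_ms:    time between samples (hypothetical value).
    decay_ms:   decay time constant (hypothetical value).
    """
    keep = math.exp(-step_ms / decay_ms)  # fraction remaining after one step
    state = 0.0
    out = []
    for x in excitation:
        state = max(x, state * keep)
        out.append(state)
    return out

if __name__ == "__main__":
    # A strong sound stops, then a weaker sound follows two steps later.
    raw = [1.0, 0.0, 0.0, 0.5, 0.0]
    for r, m in zip(raw, apply_post_masking(raw)):
        print(f"input {r:.2f} -> excitation {m:.3f}")
```

In this toy example the weaker sound arrives while the decayed excitation from the first sound is still larger, so it does not raise the output, which mirrors the masking effect described above.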

Incidentally, the loudness calculation method standardized by ISO does not take into account the effect of temporal masking. Therefore, the ISO method of loudness can only evaluate steady-state sounds.

5-7 Ono Sokki's Sound Quality Evaluation System

Ono Sokki's sound quality evaluation system calculates loudness using a method based on ISO 532B. However, since the ISO method cannot evaluate sounds that fluctuate over time, the system also incorporates the effect of temporal masking (post-masking) into its calculations. The calculation results are provided every 2 ms.

5-8 Loudness Criteria

The reference sound for loudness is a pure tone with a sound pressure level of 40 dB and a frequency of 1 kHz. Its loudness is defined as 1 sone (a loudness level of 40 phon).
A sound that is perceived as equally loud as this reference has a loudness of 1 sone, and a sound that is perceived as twice as loud has a loudness of 2 sone.
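The link between sone and phon can be sketched as follows. This assumes the commonly used rule that loudness doubles for every 10 phon increase, which is usually applied only at 40 phon and above and is not stated in the original report; the function names are hypothetical.

```python
import math

def phon_to_sone(phon: float) -> float:
    """Loudness in sone from loudness level in phon (assumed valid for >= 40 phon)."""
    return 2.0 ** ((phon - 40.0) / 10.0)

def sone_to_phon(sone: float) -> float:
    """Inverse of phon_to_sone (assumed valid for >= 1 sone)."""
    return 40.0 + 10.0 * math.log2(sone)

if __name__ == "__main__":
    for level in (40.0, 50.0, 60.0, 80.0):
        print(f"{level:.0f} phon -> {phon_to_sone(level):.0f} sone")
```

Under this rule the 40 phon reference tone maps to 1 sone, and a sound at 50 phon maps to 2 sone, that is, twice as loud.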