gms | German Medical Science

GMS Zeitschrift für Audiologie — Audiological Acoustics

Deutsche Gesellschaft für Audiologie (DGA)

ISSN 2628-9083

Audiovisual realization of the subjective listening effort measurement method ACALES

Research Article

  • corresponding author Saskia Ibelings - Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany
  • Michael Schulte - Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Cluster of excellence “Hearing4All”, Oldenburg, Germany
  • Melanie Krüger - Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Cluster of excellence “Hearing4All”, Oldenburg, Germany
  • Inga Holube - Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany; Cluster of excellence “Hearing4All”, Oldenburg, Germany

GMS Z Audiol (Audiol Acoust) 2020;2:Doc04

doi: 10.3205/zaud000008, urn:nbn:de:0183-zaud0000083

This is the English version of the article.
The German version can be found at: http://www.egms.de/de/journals/zaud/2020-2/zaud000008.shtml

Published: April 24, 2020

© 2020 Ibelings et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Abstract

Speech intelligibility can be positively influenced by the combination of auditory and visual speech information. Not only do people with hearing difficulties benefit from the additional information provided by the mouth image, but also those with normal hearing. However, the results regarding listening effort have so far been ambiguous. Although one study showed a decrease in listening effort, another showed an increase in listening effort when the mouth image was also presented. The aim of the current study was to measure subjective listening effort in an audiovisual situation and compare it with a purely acoustic presentation. The adaptive scaling method ACALES (Adaptive CAtegorical Listening Effort Scaling, see Krueger et al., J Acoust Soc Am. 2017 06;141(6):4680–93) was used. Since ACALES had previously only been used for purely acoustic stimuli, it was first necessary to extend the method to play videos. In the acoustic as well as in the audiovisual condition, sentences of the Oldenburg sentence test (OLSA) were presented in the presence of different background noises. Additionally, in the audiovisual condition the corresponding mouth image of the speaker was shown on a screen. The measurements were performed with 15 young participants with normal hearing and ten older participants with hearing impairment. Besides measuring listening effort in acoustic and audiovisual conditions for different maskers, the intra- and inter-individual standard deviations, as well as the test-retest reliability, were determined. A dependence on the masker was found for both conditions. In addition to a significant difference between audiovisual and acoustic conditions, there was also a significant difference between the participant groups. The measurement method is suitable for recording inter-individual differences in the evaluation of listening effort and has good reliability.

Keywords: listening effort, ACALES, audiovisual, hearing impairment, speech in noise


Introduction

Even when speech is audible and intelligible to the listener, the situation can still be perceived as tiring and strenuous. This is evident for both normal-hearing and hearing-impaired people, especially in situations where background noise is high or reverberation is strong [1]. As the signal-to-noise ratio (SNR) decreases, speech intelligibility decreases. At the same time, the concentration and effort required for comprehension increase steadily [1]. This additional effort can be described as listening effort [2]. In addition to background noise (masking) and the resulting SNR, listening effort can also be negatively influenced by factors such as hearing loss [1], [3], [4] and age [5], [6], [7]. The use of hearing aids can have a positive effect on listening effort [1], [3], [8], [9]. Another important influencing factor is the use of the mouth image. The additional information obtained through the mouth image can be used to complete what was not understood and correct what was misunderstood [10], [11]. Studies have shown that speech intelligibility can be improved by using the mouth image [9], [12], [13]. However, the influence on listening effort is not clear. In contrast to the finding of a decrease in listening effort [14], the use of the mouth image has also been reported to lead to an increase in listening effort [5] or to no change [9].

The results of the studies mentioned above were produced using objective measurement methods (such as dual-task paradigms). In contrast, in this study an adaptive measurement procedure was used to evaluate the subjective listening effort (Adaptive CAtegorical Listening Effort Scaling, ACALES, [15]). In ACALES, sentences from the Oldenburg sentence test (OLSA) [16] are presented in background noise. The task of the participants is to evaluate the perceived listening effort on a 13-level category scale from “no effort” to “extreme effort” and a 14th additional category “only noise”. Each category is scored from 1 ESCU (“no effort”, unit: effort scaling categorical unit) to 13 ESCU (“extreme effort”) or 14 ESCU (“only noise”). During the measurement, the SNR is adaptively changed depending on the response of the participants. The result is a function in which each listening effort category is assigned an SNR value [15].

In order to investigate the influence of the mouth image on subjective listening effort, it was necessary to extend the ACALES measurement method, which had previously only been used for purely acoustic stimuli, to incorporate the reproduction of audiovisual stimuli. Based on the findings of previous studies, it was assumed that the mouth image had an influence on listening effort [5], [14] and that there was a difference in listening effort scores between normal-hearing and hearing-impaired persons [1], [3], [4]. Furthermore, it was expected that temporally stationary maskers increase the listening effort more than temporally fluctuating ones [1], [15]. With fluctuating maskers, it is possible to listen in the temporal gaps, which improves speech intelligibility [17] and therefore probably reduces the listening effort. Furthermore, since maskers from a source having the same sex as the OLSA speaker have a similar frequency spectrum, it was assumed that these maskers lead to a greater listening effort than maskers with the spectrum of a speaker of the opposite sex [18]. To evaluate the extended measurement procedure, the intra- and inter-individual standard deviations and test-retest reliability were determined. The study thus had the following purposes:

  • Determination of the subjective listening effort
    • for purely acoustic and audiovisual stimuli
    • for different maskers
    • for normal-hearing and hearing-impaired people
  • Determination of the intra- and inter-individual standard deviation
  • Determination of the test-retest reliability

The investigation was limited to the analysis of listening effort. Although the relationship between listening effort and speech intelligibility would have been of interest under these conditions, speech intelligibility was not measured simultaneously because of the time and effort required. Since it is known that visual speech features can significantly improve speech intelligibility, which in turn influences listening effort, this circumstance is taken into account through appropriate references at several points in this paper.


Material and method

Participants

A total of 25 participants took part in the measurements. The first group included 15 normal-hearing (NH) participants aged 19 to 27 years (22.7±2.3 years). Seven of these were male and eight female. As used by Krueger et al. [15], normal hearing was defined by a PTA4 (averaged air conduction thresholds at 500 Hz, 1000 Hz, 2000 Hz and 4000 Hz) of better than 20 dB HL. The averaged PTA4 was 2.0±2.8 dB HL in the right ear and 2.3±3.2 dB HL in the left ear. Two participants already had experience with the ACALES measurement procedure. One subject did not know the OLSA stimuli.
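The PTA4 criterion described above can be written out as a short calculation. The following is only a minimal sketch in Python: the function names and the example audiogram are hypothetical and are not taken from the study, and the criterion is applied here to the thresholds of a single ear.

```python
# Minimal sketch of the PTA4 criterion (assumed helper names, illustrative data).
PTA4_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def pta4(thresholds_db_hl):
    """Average the air-conduction thresholds (dB HL) at 500, 1000, 2000 and 4000 Hz."""
    values = [thresholds_db_hl[f] for f in PTA4_FREQUENCIES_HZ]
    return sum(values) / len(values)


def is_normal_hearing(thresholds_db_hl, limit_db_hl=20.0):
    """Normal hearing is taken here as a PTA4 better (lower) than 20 dB HL."""
    return pta4(thresholds_db_hl) < limit_db_hl


# Hypothetical audiogram of one ear (frequency in Hz -> threshold in dB HL).
example_ear = {250: 5, 500: 0, 1000: 5, 2000: 5, 4000: 10, 8000: 15}
print(pta4(example_ear))               # 5.0
print(is_normal_hearing(example_ear))  # True
```

Thresholds outside the four PTA4 frequencies are ignored by this sketch.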

Ten hearing-impaired persons (HI) aged 63 to 76 years (70.1±4.2 years) represented the second group of participants. Six of the participants were male, four female. Their average PTA4 was 43.1±5.3 dB HL in the right ear and 40.3±5.3 dB HL in the left ear. Three of the participants had been fitted with hearing aids, but all measurements were performed without these. Most of the participants were familiar with the OLSA, but none of them had any experience in measuring listening effort.

Stimuli

Sentences from OLSA with a female speaker [19] were used as stimuli in both the acoustic and audiovisual conditions. The sentences of the OLSA use the same order of word categories (name-verb-number-adjective-object), e.g. “Nina paints ten wet armchairs” (in German: “Nina malt zehn nasse Sessel”). Each word category consists of ten different words [16]. In the audiovisual condition, the corresponding mouth movements of the speaker were simultaneously shown on a screen. The speaker, who had also recorded the female OLSA, was filmed afterwards [20]. She was visible from the shoulders up, in front of a green screen as background. In both conditions, different maskers were additionally presented.

Maskers

The measurements were conducted using three different maskers. Besides the female Olnoise (Olnoise with the spectrum of a female speaker [16], [19]), the International Female Fluctuation Masker (IFFM; [21]) and OLSA sentences from a male speaker [16], serving as an interfering speaker (interferer), were used. The Olnoise is a stationary noise generated by multiple random overlays of OLSA sentences [18]. Thus the averaged long-term spectrum of the Olnoise is identical to that of the stimuli. The IFFM is a variant of the International Speech Test Signal (ISTS, [22]) in which the pauses were shortened to 250 ms [21]. For the ISTS and the IFFM, recordings of six female speakers of different native languages were used (American English, Arabic, Mandarin, German, French, and Spanish). Due to the segmentation and mixing of the signals, the masker is mostly unintelligible. The averaged long-term spectrum is equivalent to that of a female speaker [22]. For the interfering speaker, 15 sentences of OLSA from a male speaker [16] were concatenated. The average time interval between the sentences was about 500 ms. Consequently, there was a time gap without spoken words between the sentences.

Equipment

The measurements took place in a soundproof room. The D/A converted signals (sound card ADI-8 PRO by RME; Haimhausen, Munich, Germany) were presented via a loudspeaker (Mackie HR 824; Bothell, Washington, USA) that was placed about 1.3 m in front of the participants. In addition, a touch screen was in front of the participants, on which the scale for the rating of the listening effort was shown. This screen was also used in the audiovisual condition to present the videos. Both the acoustic and the audiovisual stimuli were presented via the VLC media player (VideoLan, version 3.0.3; Verden, Germany). The synchronization of audio and video signal was performed by eye. Listening effort measurements were performed using ACALES [15], which was implemented in MATLAB (version 2007b; Natick, Massachusetts, USA).

Listening effort rating

ACALES [15] was used to measure the subjective listening effort. For each SNR, three random sentences of the OLSA were presented. The level of the target sentences was changed adaptively according to the participants’ responses, whereas the level of the masker was constant at 65 dB SPL. In order to avoid levels that were too loud or too soft, the SNR range was limited to –35 to 25 dB. For the normal-hearing participants, the starting SNR was set at 0 dB, but at 10 dB for the hearing-impaired participants. The subjective listening effort was evaluated on a 14-point scale from “no effort” (1 ESCU) to “only noise” (14 ESCU). In the first phase of ACALES, the SNR values were determined for the limits “no effort” and “extreme effort”. In the second phase, seven different SNR values for the named categories were presented in random order within the previously determined SNR range. After a recalculation of the limits, SNR values for the six unnamed intermediate categories were randomly presented twice in the third phase. In total, there were at least 21 SNR presentations per measurement. Figure 1 [Fig. 1] shows the scales employed.
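Because the masker level is fixed and only the target level is varied, each SNR corresponds directly to a presentation level of the sentences. The sketch below illustrates this relationship and the limitation of the SNR range in Python. It is an illustration under stated assumptions, not the ACALES implementation: the function names are hypothetical, and the SNR limits are treated as values in dB relative to the masker level.

```python
# Illustrative sketch only; names are assumptions, not part of ACALES.
MASKER_LEVEL_DB_SPL = 65.0   # constant masker level
SNR_MIN_DB = -35.0           # lower limit of the SNR range
SNR_MAX_DB = 25.0            # upper limit of the SNR range


def clamp_snr(snr_db):
    """Restrict a requested SNR to the permitted range."""
    return max(SNR_MIN_DB, min(SNR_MAX_DB, snr_db))


def target_level_db_spl(snr_db):
    """Level of the target sentences for a given (clamped) SNR."""
    return MASKER_LEVEL_DB_SPL + clamp_snr(snr_db)


print(target_level_db_spl(0.0))    # 65.0, starting SNR for normal-hearing participants
print(target_level_db_spl(10.0))   # 75.0, starting SNR for hearing-impaired participants
print(target_level_db_spl(-50.0))  # 30.0, request clamped to -35 dB SNR
```

With these limits, the target sentences would be presented at levels between 30 and 90 dB SPL.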

In the audiovisual condition, it could not be excluded that the participants were able to follow the mouth image of the speaker even though the speech signal was not acoustically perceptible. The category “only noise” was therefore renamed “nothing perceptible”. This category means that the sentences or certain segments of the sentences were accessible neither acoustically nor visually via the mouth image. The meanings of the categories “only noise” and “nothing perceptible” were explained in detail to the participants before the measurements.

Measurement procedure

The study was approved by the ethics committee in accordance with the ethics application Drs. 47/2017. The measurement procedure is shown schematically in Figure 2 [Fig. 2]. The air-conduction hearing threshold was measured after receiving information and written consent of the participants. Subsequently, the subjective listening effort was measured. The measurements of the hearing-impaired participants were performed without hearing aids. In one session, the listening effort in both conditions (acoustic and audiovisual) was determined for all maskers. In addition to the order of the conditions, the order of the maskers within a condition was randomized. In the first measurement condition, each of the maskers was trained before the actual measurement in order to become familiar with the stimuli, maskers and the measurement procedure. In the second condition, the maskers, measurement procedures and OLSA sentences were already known and only one training session was performed at the start to familiarize participants with the new condition (audiovisual/acoustic). The training was similar to the first measurement phase of ACALES. The stimuli were presented in random order both during training and during the measurement. For ten of the 15 normal-hearing participants, the listening effort in both conditions was again evaluated in a second session and using all maskers. Only one training session per condition was performed during this session, as the measurement procedures, stimuli and masker were assumed to be known. On the first day, four training sessions and six measurements were performed per subject. On the second day, the number of training sessions was reduced to two.

Analysis and statistics

For all participants, listening-effort curves were fitted with the BX fitting method integrated in ACALES [23]. Krueger et al. [1], [15] have already shown that this fitting method generally leads to a valid adaptation of the curves. These curves are linear from 1 ESCU to 7 ESCU and from 7 ESCU to 13 ESCU. The point of intersection of the lines was smoothed between 5 ESCU and 9 ESCU [15]. The listening-effort curves were averaged over all participants per condition (i.e. presentation mode), masker and listening-effort category.
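The basic shape of such a listening-effort curve can be illustrated with two straight segments that meet at 7 ESCU. The following Python sketch is a simplification and not the BX fitting method itself: it describes the curve by the SNR values at 1, 7 and 13 ESCU (an assumed parameterization), it omits the smoothing between 5 ESCU and 9 ESCU, and all numbers in the example are hypothetical.

```python
# Simplified two-segment curve; NOT the BX fit (no smoothing between 5 and 9 ESCU).
def snr_for_category(escu, snr_1, snr_7, snr_13):
    """Return the SNR (dB) assigned to a listening-effort category (1..13 ESCU).

    snr_1, snr_7 and snr_13 are the SNR values at 1, 7 and 13 ESCU
    (assumed parameterization for illustration).
    """
    if not 1 <= escu <= 13:
        raise ValueError("category must lie between 1 and 13 ESCU")
    if escu <= 7:
        # Lower segment: linear between 1 ESCU and 7 ESCU.
        return snr_1 + (snr_7 - snr_1) * (escu - 1) / 6
    # Upper segment: linear between 7 ESCU and 13 ESCU.
    return snr_7 + (snr_13 - snr_7) * (escu - 7) / 6


# Hypothetical curve: 12 dB at "no effort", 0 dB at 7 ESCU, -18 dB at "extreme effort".
print(snr_for_category(4, 12.0, 0.0, -18.0))   # 6.0
print(snr_for_category(10, 12.0, 0.0, -18.0))  # -9.0
```

Because the smoothing is left out, values between 5 ESCU and 9 ESCU from this sketch would differ slightly from those of a smoothed curve.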

The results were analyzed with SPSS 25.0.0. According to the Kolmogorov-Smirnov test, all data were normally distributed. One normal-hearing participant was excluded from the evaluation as an outlier because the data deviated by more than three times the standard deviation. One of the hearing-impaired participants also had to be excluded because the listening effort was never rated lower than 4 ESCU, independent of the condition and the maskers. Consequently, a valid fitting of the listening effort curve was not possible, since the listening effort of this subject was not represented by the values calculated with the BX-fit.

The statistical tests used were analysis of variance (ANOVA) for repeated measurements with a significance level of α=0.05. As within-subject factors, the named categories (1 ESCU, 3 ESCU, 5 ESCU, 7 ESCU, 9 ESCU, 11 ESCU and 13 ESCU) and the maskers (Olnoise, IFFM, interferer) were always chosen. Depending on the research question, there was an additional within-subject factor condition (acoustic, audiovisual) or time (1st session, 2nd session), or a between-subject factor participant group (normal hearing, hearing-impaired). Based on the linear regressions of the listening effort functions, post-hoc t-tests were performed for the categories 1 ESCU, 7 ESCU and 13 ESCU. Nine t-tests per condition were performed based on the three selected categories and the three maskers. For the same reason, nine t-tests were also performed for each condition, to compare the conditions and the results of the first and second sessions. Therefore, the significance level after a Bonferroni correction was in all cases α=0.05/9≈0.0056.
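The corrected significance level follows from dividing α by the number of tests (three categories times three maskers). As a small worked example in Python, with hypothetical function names and hypothetical p-values:

```python
# Bonferroni correction for nine post-hoc t-tests (3 categories x 3 maskers).
def bonferroni_alpha(alpha, n_tests):
    """Significance level per test after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=9):
    """Compare an uncorrected p-value with the corrected significance level."""
    return p_value < bonferroni_alpha(alpha, n_tests)


print(round(bonferroni_alpha(0.05, 9), 4))  # 0.0056
print(is_significant(0.004))                # True  (hypothetical p-value)
print(is_significant(0.03))                 # False (hypothetical p-value)
```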


Results

Listening effort in the acoustic condition

In the normal-hearing participants, a dependence of the rated listening effort on the masker was seen in the acoustic (a) condition (see Figure 3 [Fig. 3], above). A shift of the curves to lower SNR values means less listening effort. At low SNR values, the Olnoise was perceived as most effortful, while the interferer was perceived as least effortful. A repeated-measures ANOVA confirmed the difference in the ratings of the maskers [Greenhouse-Geisser ε=0.682, F(1.364, 17.735)=21.481, p<0.001]. Furthermore, a significant difference between the categories [F(6, 78)=380.456, p<0.001] was observed. An interaction between masker and categories was also found [F(12, 156)=77.175, p<0.001]. Using the Bonferroni correction, t-tests for paired samples between Olnoise and IFFM showed a significant difference for the category “extreme effort” (13 ESCU). Olnoise and interferer also showed a significant difference in the category “moderate effort” (7 ESCU). At 7 ESCU, IFFM and interferer also differed significantly (all p<0.001). In the category “no effort” (1 ESCU), there was a significant difference between Olnoise and IFFM (p=0.018).

In contrast, the listening-effort curves of the maskers showed little difference in the acoustic condition for the hearing-impaired participants (see Figure 3 [Fig. 3], below). A repeated-measures ANOVA showed a significant difference between the categories [F(6, 48)=143.908, p<0.001], but no significant difference was found in the masker ratings [F(2, 16)=3.292, p=0.063]. A significant interaction between masker and category was also observed [F(12, 96)=2.710, p=0.003]. This indicates that the differences in the rating of the maskers depended on the listening-effort category. Differences can be seen at low SNRs, i.e. high listening-effort categories. The lower the SNR range, the larger the SNR differences between the maskers needed to achieve the same listening-effort rating.

Listening effort in the audiovisual condition

In the audiovisual (av) condition, a different rating of the maskers could be seen among normal-hearing participants (see Figure 3 [Fig. 3], above). Overall, the lowest SNR values were achieved by the interferer, so that it was perceived as the least effortful, while the Olnoise was perceived as the most effortful, due to having the highest SNR values. The differences between the maskers increased with increasing listening effort. Between Olnoise and interferer, the difference in SNR at 13 ESCU was about 14 dB. For the IFFM, SNR values that were on average 2.5 dB higher than those of the interfering speaker were required.

A repeated-measures ANOVA confirmed a significantly different rating of the maskers [F(2, 26)=15.774, p<0.001], as well as a significant difference in the categories [F(6, 78)=334.995, p<0.001]. The interaction between masker and categories was also significant [F(12, 156)=25.674, p<0.001]. The t-tests for paired samples showed significant differences between Olnoise and IFFM at 7 ESCU (p=0.004) and 13 ESCU (p<0.001), and between Olnoise and interferer, also in the categories 7 ESCU and 13 ESCU (p<0.001 each), considering the Bonferroni correction. For IFFM and interfering speaker, however, no significant differences were found (p>0.05).

Among the hearing-impaired participants, the interferer also tended to be rated as least effortful (see Figure 3 [Fig. 3], below). In the comparison between IFFM and interferer, the difference up to 7 ESCU was approx. 2 dB, and up to 13 ESCU the difference increased to about 4 dB. No clear difference was visible between Olnoise and IFFM. The ANOVA showed that the ratings of the maskers [F(2, 16)=6.393, p=0.009] and the categories [F(6, 48)=123.614, p<0.001] differed significantly. Furthermore, a significant interaction between category and masker was found [F(12, 96)=8.640, p<0.001]. Post-hoc t-tests with a Bonferroni-corrected significance level showed that the SNR values of the interferer and Olnoise at 13 ESCU (p=0.002) and the interferer and IFFM at 7 ESCU (p=0.004) differed significantly.

Comparison of the conditions

For both normal-hearing and hearing-impaired participants, a shift in listening effort curves of the audiovisual condition relative to the acoustic condition towards lower SNR values was seen (see Figure 3 [Fig. 3]). The audiovisual condition was thus perceived as less effortful than the acoustic condition. For normal-hearing participants, the curves of the conditions for IFFM and interferer are almost parallel. To achieve the same listening effort rating for the acoustic condition, these maskers required SNR values that were, on average, approx. 4 dB higher. For the Olnoise, the difference was about 1 dB up to 5 ESCU and increased to 4 dB at 13 ESCU. The repeated-measures ANOVA confirmed a significant difference in the rating of the conditions [F(1, 13)=14.656, p=0.002]. In addition, significant interactions were found between category and masker [F(12, 156)=64.328, p<0.001] and condition and category [F(6, 78)=3.630, p<0.003]. The post-hoc t-tests for paired samples (see Table 1 [Tab. 1], considering Bonferroni correction) showed significant differences between the conditions for the IFFM at 7 ESCU (p=0.004) and for the interferer at 13 ESCU (p<0.001). In contrast, no significant influence of the mouth image was seen for the Olnoise.

For the hearing-impaired participants, independently of the masker, the difference between the conditions increased with increasing listening effort. While the difference between the conditions at 13 ESCU was only 2 dB for the Olnoise, it reached 4 dB for the IFFM and 6 dB for the interferer. An ANOVA for repeated measurements found no significant influence of the conditions on the evaluation of the listening effort [F(1, 8)=4.760, p=0.061]. However, significant interactions were found, not only between condition and category and masker and category (p<0.001 each), but also between condition, masker and category (p=0.012). Due to the significant interactions with the condition factor, t-tests were performed for paired samples. For all maskers, a significant effect of the additional visual presentation of the mouth image was found at 13 ESCU (each p<0.01 considering Bonferroni correction).

Comparison of the participant groups

Figure 4 [Fig. 4] shows the comparison of the groups of participants. Independent of condition and masker, the hearing-impaired participants tended to give the same listening effort ratings as the normal-hearing participants at higher SNR values. While the curves for the Olnoise were shifted almost constantly by about 5 dB in both conditions, the difference for the other maskers and conditions increased with increasing listening effort up to about 17 dB each at 13 ESCU. The repeated-measures ANOVA with the additional between-subject factor participant group (normal-hearing, hearing-impaired) confirmed a significant difference between the participant groups. Furthermore, there were significant interactions between masker and participant group, category and participant group, masker, category and participant group, as well as condition, masker, category and participant group (all p<0.05). The t-tests for independent samples with a Bonferroni-corrected significance level showed that the ratings of normal-hearing and hearing-impaired participants differed significantly at 7 ESCU and 13 ESCU, independent of condition and masker. A significant difference was also seen for the Olnoise in the acoustic condition at 1 ESCU (in each case p<0.05 considering the Bonferroni correction).

Intra- and inter-individual standard deviation

The intra- and inter-individual standard deviations were determined for the measurement results of ten normal-hearing participants. For this purpose, the SNR values of the named categories were used for each measurement condition and each masker. The intra-individual standard deviation was calculated from the SNR values of the two measurement sessions for each participant. The results for one listening effort category, one masker and one presentation condition were determined by averaging the intra-individual standard deviations of all participants. For the inter-individual standard deviation, the standard deviation of the SNR values of all participants per measurement session was determined for each condition, each masker and each named category and then averaged over the two measurement sessions. Table 2 [Tab. 2] contains the calculated values. For the intra-individual standard deviation, values were between 1.8 and 2.6 dB and the inter-individual standard deviation ranged from 4.3 to 8.7 dB. For both standard deviations, the values in the audiovisual condition tend to be higher than in the acoustic condition.
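The two averaging schemes described above can be sketched as follows for one condition, one masker and one named category. This Python example rests on assumptions: the function names and SNR values are hypothetical, and the sample standard deviation (n−1 in the denominator) is used because the text does not state which estimator was applied.

```python
from statistics import mean, stdev

# snr[p] holds the SNR values (dB) of participant p in sessions 1 and 2
# for ONE condition, masker and listening-effort category (hypothetical data).
snr = [(-4.0, -2.0), (0.0, 2.0), (4.0, 6.0)]


def intra_individual_sd(snr_by_participant):
    """SD over the sessions of each participant, averaged over participants."""
    return mean([stdev(list(sessions)) for sessions in snr_by_participant])


def inter_individual_sd(snr_by_participant):
    """SD over participants within each session, averaged over sessions."""
    n_sessions = len(snr_by_participant[0])
    per_session = [
        stdev([sessions[s] for sessions in snr_by_participant])
        for s in range(n_sessions)
    ]
    return mean(per_session)


print(round(intra_individual_sd(snr), 3))  # 1.414
print(inter_individual_sd(snr))            # 4.0
```

In this made-up data set, each participant differs by 2 dB between sessions, whereas the participants differ from each other by 4 dB, so the inter-individual value is larger than the intra-individual one.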

Test-retest reliability

Figure 5 [Fig. 5] shows the listening effort curves of the two measurement sessions per condition and masker. On average, almost identical results were obtained in the second session. To analyze the test-retest reliability, a repeated-measures ANOVA with the additional factor time (1st session, 2nd session) was performed. There was no significant difference between the results of the two measurement sessions [F(1, 9)=0.205, p=0.662]. Furthermore, no significant interactions with the factor time could be confirmed (all p>0.05).

In addition, the intra-class correlation coefficient (ICC) was examined for each condition and each masker of the named categories (see Table 3 [Tab. 3]). At 0.856, the averaged ICC of the acoustic condition was slightly lower than that of the audiovisual condition (0.915). The lowest values were found in both the acoustic (0.699) and the audiovisual condition (0.717) for the Olnoise, both at 1 ESCU. The highest ICC was achieved for the audiovisual condition in Olnoise at 13 ESCU (0.983).


Discussion

Comparison of the maskers

The results show that the rating of listening effort in both acoustic and audiovisual conditions is dependent on the masker. However, significant differences were only found with increasing listening effort. The Olnoise was perceived to be the most effortful, whereas the interferer was perceived to be the least effortful. One explanation for the different perceptions lies in the temporal and spectral structure of the maskers. The long-term spectrum of the Olnoise is identical to that of the stimuli, so that the degree of masking is maximal [16]. Maskers with the same sex characteristics as the target signal also have a higher masking effect [18]. The IFFM also corresponds to female speech in the long-term spectrum, but fluctuates [21], so it is possible to listen into the temporal gaps of the masker [17]. The interfering speaker, however, is male, so that the masking effect is already reduced. In addition, the pauses of this masker are longer than the gaps in the IFFM. As a result, stimuli and masker are more easily distinguishable for the interfering speaker, so that the subjective listening effort is lowest. Overall, all the factors mentioned above can have an influence on speech intelligibility and, as a result, an influence on the listening effort.

One exception is the group of hearing-impaired participants in the acoustic condition, where there was no significant difference in the rating of the listening effort between the maskers. One possible explanation is that temporal and spectral resolution is limited or even lost due to the hearing loss [17].

Krueger et al. [1], [15] also reported that stationary maskers (Olnoise) are perceived as more effortful than fluctuating ones (IFFM and ISTS). Even with comparable speech intelligibility, differences in the listening-effort rating of the maskers were detectable [1]. In another study, a higher listening effort was found when using an interfering speaker than with fluctuating or stationary noise [24]. In that study by Koelewijn et al. [24] instead of a subjective method, pupillometry was used to determine the objective listening effort. In addition, the interfering speaker was spectrally modified, so that this masker had the same long-term spectrum as the target speaker. Besides the energetic masking, the informational masking is also present in the interfering speaker, which increases the masking effect and makes the interfering speaker more effortful. Fluctuating and stationary noise was therefore less effortful.

Comparison of the acoustic and audiovisual conditions

As expected, the listening effort rating of the acoustic and audiovisual conditions differed significantly in both groups of participants. The use of the mouth image reduced the subjective listening effort. However, the advantage of the mouth image only became apparent at negative SNR values, i.e. at higher listening effort. At higher SNR values, speech intelligibility is high [1], so that it can be assumed that the mouth image provides little additional information to improve speech intelligibility and to reduce listening effort. At lower SNR levels, the mouth image provides additional visual information to supplement and correct misunderstandings [11]. Additionally, visual speech can help to differentiate between target and interfering signals [12]. Both factors probably led to an increase in speech intelligibility. According to the results obtained here, the latter factor seems to have a greater effect on the interfering speaker than on the IFFM.

Llorach et al. [20], who used the same stimuli as in this study, found an improvement of the threshold for speech intelligibility of 80% by about 4.5 dB SNR when the mouth image was presented. Overall, it can be assumed that the improved speech intelligibility reduced the listening effort [1]. Furthermore, the significantly steeper curve for the Olnoise compared to both the IFFM and the interferer indicates a relationship between speech intelligibility and listening effort. This is because the Olnoise also has the steepest function when testing speech intelligibility [1]. For both normal-hearing and hearing-impaired participants, studies that have investigated the influence of noise and reverberation on speech intelligibility and subjective listening effort have also shown a decrease in listening effort with increasing speech intelligibility [25], [26].

Listening effort can be assessed not only by subjective ratings but also by objective methods. Sommers and Phelps [14] investigated memory performance in young and older normal-hearing participants. They used word lists of varying lengths in quiet, where the last three words had to be repeated. The speech level was not reported, but it can be speculated that this was 60 dB SPL, as in the speech intelligibility measurement conducted by Sommers and Phelps [14]. An increase in correctly repeated words in the audiovisual condition indicated a decrease in listening effort. In contrast to the young participants, no reduction in listening effort could be detected in the older participants, who had significantly worse hearing thresholds. Picou et al. [9] used a dual-task paradigm with hearing-impaired participants, both with and without hearing aids. The primary task was to understand monosyllables; the secondary task was to respond to a visual stimulus, with the response time as the measure. The SNR of the acoustic condition was chosen so that 60% of the words were correctly understood. In this study, an increase in response time was interpreted as an indication of increased listening effort. However, independent of the fitting status, no decrease in listening effort was found for audiovisual presentations. Gosselin and Gagné [5] also used a dual-task paradigm to measure listening effort, but only normal-hearing participants were tested. The primary task was word recognition and the secondary task was the recognition of tactile patterns. The SNR was selected to achieve an average speech comprehension of 80% within the respective condition. For the evaluation, the costs of performing two tasks simultaneously, compared to performing only one of the two tasks, were calculated separately for the acoustic and the audiovisual condition. Due to the increased costs for the audiovisual condition, this condition was interpreted as more effortful. In both groups of participants, the use of the mouth image resulted in an increase in listening effort, with a significantly higher increase in the elderly. From these studies, it can be concluded that especially older participants, who sometimes have a hearing loss, do not always benefit from an audiovisual presentation. This is in line with the current study. In the sections “Comparison of conditions” and “Comparison of the participant groups” it was shown that hearing-impaired participants, who were older than the normal-hearing ones, benefited less from the visual information. For the hearing-impaired participants, higher SNR values would be necessary to benefit from the visual presentation to the same degree as the normal-hearing participants.
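
The dual-task costs mentioned above can be illustrated with a short sketch. It assumes the commonly used proportional definition, in which the drop in performance from the single-task to the dual-task situation is expressed as a percentage of the single-task performance. Whether this matches the exact computation in [5] is an assumption, and the function name and the numbers are hypothetical.

```python
def proportional_dual_task_cost(single_score, dual_score):
    """Proportional dual-task cost in percent (assumed definition).

    Both scores are performance measures where higher is better,
    e.g. percent correct in the secondary task.
    """
    if single_score == 0:
        raise ValueError("single-task score must be non-zero")
    return 100.0 * (single_score - dual_score) / single_score


# Made-up illustrative scores, not data from any of the cited studies.
cost_acoustic = proportional_dual_task_cost(80.0, 70.0)
cost_audiovisual = proportional_dual_task_cost(80.0, 60.0)

# A higher cost in one condition would be read as more effort in that condition.
more_effortful = "audiovisual" if cost_audiovisual > cost_acoustic else "acoustic"
```

For measures where lower values are better, such as response times, the sign of the difference would have to be reversed.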

Intra- and inter-individual standard deviation

The inter-individual standard deviation in the acoustic condition was about three times larger than the intra-individual standard deviation for the same condition. In the audiovisual condition, the two deviations differed by a factor of about 4. The inter-individual standard deviation in the acoustic condition was in agreement with the values of Krueger et al. [15]. Although the participants received the same instructions, the SNR values for the listening-effort categories differed noticeably between the participants. A likely reason is that both the task and the scale with the words “no effort” and “extreme effort” were interpreted differently by the participants. Moreover, the question “How much effort does it require for you to follow the speaker?” was asked to evaluate the listening effort. Because listening effort was measured subjectively, it cannot be excluded that perceptions such as loudness or speech intelligibility were indirectly included in the evaluation. Another influencing factor could be that not every participant knew the OLSA. Due to the training effect of the OLSA [27], it can be assumed that participants who are familiar with the OLSA evaluate differently than participants who are not familiar with it. It is possible that in this case the training carried out was insufficient. Furthermore, the standard deviations of the acoustic condition tended to be lower than those of the audiovisual condition. One participant reported that the audiovisual condition was not very realistic, because normally, the level of the speaker decreases as the distance to the listener increases. Although the level of the speaker was changed, the distance to the listener was always the same. It cannot be ruled out that other participants also perceived this as irritating.

The intra-individual standard deviation was between 0.8 dB and 2.8 dB in the acoustic condition and between 1.7 dB and 3.4 dB in the audiovisual condition. It was thus somewhat lower than in Krueger et al. [15], who reported values between 1.0 dB and 3.8 dB in the acoustic condition.
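
As an illustration of how the two kinds of deviation relate, the following sketch computes them from repeated SNR values per participant. It assumes that the intra-individual standard deviation is pooled over the participants' within-participant variances and that the inter-individual standard deviation is taken over the participants' mean values; the exact computation used in this study is not restated here. The function name and the data are hypothetical.

```python
import math
import statistics


def intra_inter_sd(snr_by_participant):
    """Return (intra, inter) standard deviations in dB.

    snr_by_participant maps a participant ID to the SNR values (dB)
    from repeated sessions for one effort category and one masker.
    """
    values = list(snr_by_participant.values())
    # Intra-individual: square root of the mean within-participant variance.
    intra = math.sqrt(statistics.fmean(statistics.variance(v) for v in values))
    # Inter-individual: standard deviation of the participants' mean SNRs.
    inter = statistics.stdev(statistics.fmean(v) for v in values)
    return intra, inter


# Made-up illustrative values (test and retest), not data from the study.
example = {"P01": [-4.0, -2.0], "P02": [1.0, 3.0], "P03": [-9.0, -7.0]}
intra_sd, inter_sd = intra_inter_sd(example)
ratio = inter_sd / intra_sd  # how much larger the spread between participants is
```

A ratio well above 1, as reported above for both conditions, means that differences between participants dominate over the variability within a participant.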

Test-retest reliability

Independent of the condition and the maskers, no significant difference was found between the results of the two measurement sessions. In contrast, Krueger et al. [15] found a significant difference in the evaluation of IFFM between the first and second sessions, but not between the second and third sessions. The average ICC of this study was 0.89 and is thus comparable with that of Krueger et al. [15], who reported a value of 0.9. Values between 0.75 and 0.90 indicate good test-retest reliability [28].
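
For readers who want to reproduce such a reliability estimate, the following is a minimal sketch of an intraclass correlation coefficient computed from a participants-by-sessions table. The ICC form is an assumption: the sketch uses ICC(2,1) (two-way random effects, absolute agreement, single measurement), which may differ from the form used in this study. The interpretation bands follow the guideline in [28]; the function names and the example values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of rows (one per participant, one value per session)."""
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Reliability bands according to the guideline cited as [28]."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Made-up SNR values in dB (test, retest), not data from the study.
sessions = [[-5.0, -4.0], [0.5, 1.5], [-9.0, -9.5], [3.0, 2.0]]
label = interpret_icc(icc_2_1(sessions))
```

With these bands, the average ICC of 0.89 reported above falls into the "good" range.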

Comparison of the participant groups

For the hearing-impaired participants to achieve the same listening effort score, significantly higher SNR values were required for categories 7 ESCU and 13 ESCU than for normal-hearing participants. A comparison with the results of the acoustic condition of Krueger et al. [1] also shows a difference of about 3 dB for Olnoise and IFFM at 1 ESCU. At 13 ESCU, the results of Krueger et al. [1] differ. There, the difference between the participant groups for the IFFM was 5 dB, whereas in the current study a difference of almost 17 dB was found. While here only three of the nine participants were fitted with hearing aids, the participants of the study by Krueger et al. [1] were, without exception, experienced hearing-aid users. It can be assumed that the usual fitting status had an influence on the measurements without hearing aids. Since Krueger et al. [1] found a correlation between PTA4 and the listening-effort categories, it can be assumed that hearing loss also had an influence on the listening-effort rating in the current study. Other studies have also shown a dependence of listening effort on hearing loss [3], [4], [29]. Auer and Bernstein [30] reported that hearing-impaired people whose hearing loss began at an early age achieved significantly better speech intelligibility results with purely visual presentation than normal-hearing subjects. Given the relationship between speech intelligibility and listening effort, it is reasonable to assume that these persons also experience less listening effort. Since in both the studies mentioned above and in this paper hearing-impaired persons showed less benefit from visual presentation in terms of listening effort, it can also be assumed that factors such as age and cognitive processes have an influence on listening effort.

Measuring method ACALES

The ACALES measurement procedure was easy to understand and simple to carry out. Participants accepted the procedure well, so that the measurements could be performed without any problems. However, the measurement of a listening effort curve in one condition took about ten to twelve minutes, so that a session with six measurements led to fatigue in some participants. Nevertheless, ACALES can be used for the evaluation of subjective listening effort because of its good test-retest reliability and its ability to capture individual differences in both the acoustic and audiovisual conditions. Comparative measurements, e.g., across different conditions, fittings, or days, are thus feasible with this method.


Notes

Competing interests

The authors declare that they have no competing interests.

Acknowledgement

The study was carried out as part of the VIBHear project, which is funded by the European Regional Development Fund (EFRE) and the State of Lower Saxony. Special thanks go to the working group of Volker Hohmann for providing the video and audio material. English language services were provided by http://stels-ol.de/.

Notification

The contents of this contribution were presented at the 22nd DGA Annual Conference from 6–9 March 2019 in Heidelberg.


References

1.
Krueger M, Schulte M, Zokoll MA, Wagener KC, Meis M, Brand T, et al. Relation Between Listening Effort and Speech Intelligibility in Noise. Am J Audiol. 2017;26(3S):378. DOI: 10.1044/2017_AJA-16-0136
2.
McGarrigle R, Munro KJ, Dawes P, Stewart AJ, Moore DR, Barry JG, et al. Listening effort and fatigue: What exactly are we measuring? A British Society of Audiology Cognition in Hearing Special Interest Group 'white paper.' Int J Audiol. 2014;53(7):433-445. DOI: 10.3109/14992027.2014.890296
3.
Luts H, Eneman K, Wouters J, Schulte M, Vormann M, Buechler M, et al. Multicenter evaluation of signal enhancement algorithms for hearing aids. J Acoust Soc Am. 2010;127(3):1491-1505. DOI: 10.1121/1.3299168
4.
Ohlenforst B, Zekveld AA, Lunner T, Wendt D, Naylor G, Wang Y, et al. Impact of stimulus-related factors and hearing impairment on listening effort as indicated by pupil dilation. Hear Res. 2017;351:68-79. DOI: 10.1016/j.heares.2017.05.012
5.
Anderson Gosselin P, Gagné JP. Older adults expend more listening effort than young adults recognizing audiovisual speech in noise. Int J Audiol. 2011;50(11):786-792. DOI: 10.3109/14992027.2011.599870
6.
Anderson Gosselin P, Gagné JP. Older Adults Expend More Listening Effort Than Young Adults Recognizing Speech in Noise. J Speech Lang Hear Res. 2011;54(3):944. DOI: 10.1044/1092-4388(2010/10-0069)
7.
Degeest S, Keppler H, Corthals P. The Effect of Age on Listening Effort. J Speech Lang Hear Res. 2015;58(5):1592-1600. DOI: 10.1044/2015_JSLHR-H-14-0288
8.
Sarampalis A, Kalluri S, Edwards B, Hafter E. Objective Measures of Listening Effort: Effects of Background Noise and Noise Reduction. J Speech Lang Hear Res. 2009;52(5):1230. DOI: 10.1044/1092-4388(2009/08-0111)
9.
Picou EM, Ricketts TA, Hornsby BWY. How Hearing Aids, Background Noise, and Visual Cues Influence Objective Listening Effort. Ear Hear. 2013;34(5):e52-e64. DOI: 10.1097/AUD.0b013e31827f0431
10.
Volz L. Der Unterricht im Absehen der Sprache. Arch Für Ohren- Nasen- Kehlkopfheilkd. 1954;165(2-6):362-372. DOI: 10.1007/BF02134821
11.
Leonhardt A. Einführung in die Hörgeschädigtenpädagogik: mit 88 Übungsaufgaben und zahlreichen Tabellen. 3., überarbeitete und erweiterte Auflage. München Basel: Reinhardt; 2010.
12.
Bernstein JGW, Grant KW. Auditory and auditory-visual intelligibility of speech in fluctuating maskers for normal-hearing and hearing-impaired listeners. J Acoust Soc Am. 2009;125(5):3358-3372. DOI: 10.1121/1.3110132
13.
Nirme J, Haake M, Lyberg Åhlander V, Brännström J, Sahlén B. A virtual speaker in noisy classroom conditions: supporting or disrupting children's listening comprehension? Logoped Phoniatr Vocol. 2018;44(2):79-86. DOI: 10.1080/14015439.2018.1455894
14.
Sommers MS, Phelps D. Listening Effort in Younger and Older Adults: A Comparison of Auditory-Only and Auditory-Visual Presentations. Ear Hear. 2016;37:62S-68S. DOI: 10.1097/AUD.0000000000000322
15.
Krueger M, Schulte M, Brand T, Holube I. Development of an adaptive scaling method for subjective listening effort. J Acoust Soc Am. 2017;141(6):4680-4693. DOI: 10.1121/1.4986938
16.
Wagener K, Kühnel V, Kollmeier B. Entwicklung und Evaluation eines Satztests für die deutsche Sprache I: Design des Oldenburger Satztests. Z Audiol. 1999;38(1):4-15.
17.
Festen JM, Plomp R. Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing. J Acoust Soc Am. 1990;88(4):1725-1736. DOI: 10.1121/1.400247
18.
Brungart DS. Informational and energetic masking effects in the perception of two simultaneous talkers. J Acoust Soc Am. 2001;109(3):1101-1109. DOI: 10.1121/1.1345696
19.
Wagener KC, Hochmuth S, Ahrlich M, Kollmeier B. Der weibliche Oldenburger Satztest. In: Deutsche Gesellschaft für Audiologie, ed. Abstracts der 17. Jahrestagung der Deutschen Gesellschaft für Audiologie; 2014. Available from: https://www.dga-ev.com/fileadmin/daten/downloads/bisherige_Jahrestagung/dga2014_programm_final.pdf
20.
Llorach G, Kirschner F, Grimm G, Zokoll MA, Wagener KC, Hohmann V. Development and Evaluation of Video Recordings for the OLSA Matrix Sentence Test. ArXiv E-Prints. 2019; arXiv:1912.04700
21.
Holube I. Speech intelligibility in fluctuating maskers. Proceedings of the International Symposium on Auditory and Audiological Research. 2011;(3):57-64.
22.
Holube I, Fredelake S, Vlaming M, Kollmeier B. Development and analysis of an International Speech Test Signal (ISTS). Int J Audiol. 2010;49(12):891-903. DOI: 10.3109/14992027.2010.506889
23.
Oetting D, Brand T, Ewert SD. Optimized loudness-function estimation for categorical loudness scaling data. Hear Res. 2014;316:16-27. DOI: 10.1016/j.heares.2014.07.003
24.
Koelewijn T, Zekveld AA, Festen JM, Kramer SE. Pupil Dilation Uncovers Extra Listening Effort in the Presence of a Single-Talker Masker. Ear Hear. 2012;33(2):291-300. DOI: 10.1097/AUD.0b013e3182310019
25.
Rennies J, Schepker H, Holube I, Kollmeier B. Listening effort and speech intelligibility in listening situations affected by noise and reverberation. J Acoust Soc Am. 2014;136(5):2642-2653. DOI: 10.1121/1.4897398
26.
Schepker H, Haeder K, Rennies J, Holube I. Perceived listening effort and speech intelligibility in reverberation and noise for hearing-impaired listeners. Int J Audiol. 2016 Dec;55(12):738-747. DOI: 10.1080/14992027.2016.1219774
27.
Schlueter A, Lemke U, Kollmeier B, Holube I. Normal and Time-Compressed Speech: How Does Learning Affect Speech Recognition Thresholds in Noise? Trends Hear. 2016;20:1-13. DOI: 10.1177/2331216516669889
28.
Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15(2):155-163. DOI: 10.1016/j.jcm.2016.02.012
29.
Mackersie CL, MacPhee IX, Heldt EW. Effects of Hearing Loss on Heart Rate Variability and Skin Conductance Measured During Sentence Recognition in Noise. Ear Hear. 2015;36(1):145-154. DOI: 10.1097/AUD.0000000000000091
30.
Auer ET, Bernstein LE. Enhanced Visual Speech Perception in Individuals With Early-Onset Hearing Impairment. J Speech Lang Hear Res. 2007 Oct;50(5):1157-1165. DOI: 10.1044/1092-4388(2007/080)