
24. Jahrestagung der Deutschen Gesellschaft für Audiologie

Deutsche Gesellschaft für Audiologie e. V.

14.09. - 17.09.2022, Erfurt

Speech Intelligibility Prediction for Hearing-Impaired Listeners with the bBSIM-STI model

Meeting Abstract

  • presenting/speaker Saskia Röttges - Carl von Ossietzky Universität Oldenburg, Medizinische Physik, Oldenburg, DE
  • Jana Roßbach - Carl von Ossietzky Universität, Oldenburg, DE
  • Christopher Hauth - Carl von Ossietzky Universität, Oldenburg, DE
  • Thomas Biberger - Carl von Ossietzky Universität, Oldenburg, DE
  • Bernd T. Meyer - Carl von Ossietzky Universität, Oldenburg, DE
  • Jan Rennies - Fraunhofer-Institut für Digitale Medientechnologie IDMT, Oldenburg, DE
  • Rainer Huber - Fraunhofer IDMT, Oldenburg, DE
  • Thomas Brand - Universität Oldenburg, Oldenburg, DE

Deutsche Gesellschaft für Audiologie e.V. 24. Jahrestagung der Deutschen Gesellschaft für Audiologie. Erfurt, 14.-17.09.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. Doc190

doi: 10.3205/22dga190, urn:nbn:de:0183-22dga1904

Published: September 12, 2022

© 2022 Röttges et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.



Introduction: Normal-hearing (NH) listeners benefit from spatially separated target and masker sources compared to situations where the target is spatially co-located with the masker. This benefit is generally reduced in reverberant situations compared to anechoic situations, because reverberation impairs binaural cues; reverberation also reduces the temporal modulation of the target signal and the opportunity to listen into the dips of fluctuating masker signals. Hearing-impaired (HI) listeners are known to benefit less from spatial separation of target and masker than NH listeners, which might be explained by an impaired representation of binaural cues and reduced audibility of the target signal at the better ear. As part of the first Clarity Prediction Challenge (CPC1) [1], this study predicted the speech intelligibility of HI subjects.

Method: The data basis provided by the CPC1 was used; it contains audio signals, characteristics of the HI listeners, and speech intelligibility scores from listening tests (correct response rates for single sentences). The listeners' task was to repeat the words they understood in the presented test signal, which varied in both the acoustic scene and the speech enhancement algorithm. The acoustic scenes were generated by convolving the audio signals with binaural room impulse responses (BRIRs). The target speech was masked by a continuous-noise interferer. Each acoustic scene consisted of a unique target utterance mixed with a unique interferer segment. These signals, together with the audiograms of the HI listeners, were processed by hearing aid algorithms and subsequently used in the listening tests.
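The scene generation described above — convolving each source with a BRIR and summing the binaural results — can be sketched as follows. This is an illustrative reconstruction, not the CPC1 pipeline itself; the function names and the simple zero-padded mixing are assumptions for the sketch.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(source, brir_left, brir_right):
    """Place a mono source in the simulated room by convolving it
    with the left- and right-ear room impulse responses."""
    left = fftconvolve(source, brir_left)
    right = fftconvolve(source, brir_right)
    return np.stack([left, right])

def mix_scene(target, interferer, brir_target, brir_interferer):
    """Mix a spatialized target and interferer into one binaural scene.
    brir_* are (left_ir, right_ir) tuples; signals are zero-padded to
    the length of the longer convolved result before summing."""
    t = spatialize(target, *brir_target)
    i = spatialize(interferer, *brir_interferer)
    n = max(t.shape[1], i.shape[1])
    scene = np.zeros((2, n))
    scene[:, :t.shape[1]] += t
    scene[:, :i.shape[1]] += i
    return scene
```

In the actual challenge data, separate BRIRs per source position encode the spatial separation (or co-location) of target and interferer.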

A hybrid model was used for the predictions, with a blind equalization-cancellation model (bBSIM) [3] as binaural front-end. The model presented here (bBSIM-STI) is very similar to the basic non-blind CPC1 model. Based on the individual pure-tone audiograms, two internal threshold-simulating noises are added to the left and right input signals to simulate the hearing loss. The back-end employed in this study is a specific version of the Speech Transmission Index (STI) based on a correlation method, which receives bBSIM's output signals of the clean target speech and the degraded speech as input.
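The idea of a threshold-simulating noise can be illustrated as follows: white noise is spectrally shaped so that its level follows the listener's pure-tone thresholds, and the result is added to the ear's input signal. This is only a sketch of the spectral-shaping idea; the actual calibration in bBSIM (mapping dB HL to internal noise level per auditory filter) is not specified in the abstract, and the linear-frequency interpolation here is a simplification.

```python
import numpy as np

def threshold_noise(audiogram_freqs, audiogram_dbhl, n_samples, fs):
    """Illustrative sketch: shape white noise so its spectral magnitude
    follows the listener's pure-tone thresholds (dB HL), simulating
    hearing loss as an internal masking noise added to one ear.
    A fixed seed keeps the sketch deterministic."""
    white = np.random.default_rng(0).standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples, 1 / fs)
    # Interpolate the audiogram onto the FFT grid (log-frequency
    # interpolation would be more faithful; linear keeps it short).
    gain_db = np.interp(freqs, audiogram_freqs, audiogram_dbhl)
    return np.fft.irfft(spectrum * 10 ** (gain_db / 20), n_samples)
```

A larger hearing loss at a given frequency thus raises the internal noise there, reducing the audibility of signal components near threshold.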

Results: The bBSIM-STI model is very similar to the baseline model MBSTOI [2] and produces a slightly lower RMSE and a slightly higher correlation than the baseline. The improved prediction accuracy is probably due to small differences in the back-ends: in our back-end, an SNR is derived from the correlation values and then limited to the range from -15 to 15 dB, which reduces the frequency of outliers in the predictions. Apart from this limitation and somewhat longer time frames for the short-term analysis, our back-end is virtually identical to that of the baseline model.
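The correlation-to-SNR mapping with clipping can be sketched as below. The rho**2 / (1 - rho**2) form is one common mapping in the STI/STOI literature and is assumed here; only the -15 to 15 dB clipping range is taken from the abstract.

```python
import numpy as np

def correlation_to_snr(rho, snr_min=-15.0, snr_max=15.0):
    """Map a short-time correlation coefficient between clean and
    degraded representations to an apparent SNR in dB, then clip it.
    Clipping bounds the contribution of any single time-frequency
    unit, which suppresses outliers in the final prediction."""
    rho = np.clip(np.asarray(rho, dtype=float), 1e-6, 1 - 1e-6)
    snr = 10.0 * np.log10(rho**2 / (1.0 - rho**2))
    return np.clip(snr, snr_min, snr_max)
```

With this mapping, a correlation of 1/sqrt(2) corresponds to 0 dB, and near-perfect or near-zero correlations saturate at the clipping bounds instead of producing extreme values.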


References

1. Graetzer S, Barker J, Cox TJ, Akeroyd M, Culling JF, Naylor G, Porter E, Muñoz RV. Clarity-2021 Challenges: Machine Learning Challenges for Advancing Hearing Aid Processing. Proc Interspeech. 2021:686-690. DOI: 10.21437/Interspeech.2021-1574
2. Andersen AH, de Haan JM, Tan ZH, Jensen J. Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions. Speech Communication. 2018;102:1-13. DOI: 10.1016/j.specom.2018.06.001
3. Hauth CF, Berning SC, Kollmeier B, Brand T. Modeling Binaural Unmasking of Speech Using a Blind Binaural Processing Stage. Trends Hear. 2020 Jan-Dec;24:2331216520975630. DOI: 10.1177/2331216520975630