gms | German Medical Science

22. Jahrestagung der Deutschen Gesellschaft für Audiologie

Deutsche Gesellschaft für Audiologie e. V.

06.03. - 09.03.2019, Heidelberg

Optimisation of the gaze-based attention model

Meeting Abstract

  • presenting/speaker Frederike Kirschner - Carl von Ossietzky Universität Oldenburg, Medizinische Physik and Cluster of Excellence "Hearing4all", Oldenburg, Germany
  • Giso Grimm - Carl von Ossietzky Universität Oldenburg, Medizinische Physik and Cluster of Excellence "Hearing4all", Oldenburg, Germany
  • Volker Hohmann - Carl von Ossietzky Universität Oldenburg, Medizinische Physik and Cluster of Excellence "Hearing4all", Oldenburg, Germany

Deutsche Gesellschaft für Audiologie e.V. 22. Jahrestagung der Deutschen Gesellschaft für Audiologie. Heidelberg, 06.-09.03.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. Doc147

doi: 10.3205/19dga147, urn:nbn:de:0183-19dga1475

Published: November 28, 2019

© 2019 Kirschner et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Background: Acoustic communication in difficult conditions such as cocktail-party situations remains a problem. State-of-the-art hearing aid algorithms use techniques such as beamforming to enhance speech. Nevertheless, they currently cannot estimate the attended sound source, i.e., the source that is currently in the focus of attention of the user, and thus may not give optimal performance in dynamic daily-life communication conditions. A model that identifies the attended source(s) may help to overcome this problem and to optimise the steering and control of speech enhancement algorithms. In a recent study [1], a gaze-based attention model was proposed that uses gaze direction and head orientation to estimate the attended sound source. This study investigates the potential of such a method for detecting attended and unattended sources in multi-source audiovisual environments.

Methods: Gaze direction and head orientation were measured in six normal-hearing subjects in virtual audiovisual scenes using electrooculography (EOG) and a head-tracking system. The attention model combines gaze information with sound source locations using naïve Bayesian classification and temporal analysis to estimate the subject's focus of attention. In this study, optimal a-priori knowledge of the source locations was used to assess the performance of the gaze-based estimation and to optimise its parameters. In an instrumental test, the model parameters that yielded the best signal-to-noise ratio (SNR) were determined.
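The core idea of combining gaze with known source locations via a naïve Bayesian update and temporal smoothing can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the source azimuths, the Gaussian gaze-likelihood with its width `sigma_deg`, and the exponential smoothing constant `alpha` are all illustrative assumptions.

```python
import math

# Hypothetical source positions in degrees azimuth (not from the study).
SOURCE_AZIMUTHS = [-60.0, 0.0, 60.0]

def wrapped_diff(a, b):
    """Smallest signed angular difference a - b, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def frame_posterior(gaze_deg, azimuths, sigma_deg=15.0):
    """Per-frame posterior over sources: a Gaussian gaze likelihood
    around each source azimuth combined with a uniform prior
    (a single-feature naive-Bayes-style update), then normalised."""
    like = [math.exp(-0.5 * (wrapped_diff(gaze_deg, az) / sigma_deg) ** 2)
            for az in azimuths]
    total = sum(like) or 1.0
    return [l / total for l in like]

def attended_source(gaze_track, azimuths=SOURCE_AZIMUTHS,
                    sigma_deg=15.0, alpha=0.8):
    """Temporal analysis: exponentially smooth the per-frame posteriors
    over the gaze track and return the index of the most probable
    (i.e., attended) source."""
    smoothed = [1.0 / len(azimuths)] * len(azimuths)
    for gaze in gaze_track:
        post = frame_posterior(gaze, azimuths, sigma_deg)
        smoothed = [alpha * s + (1 - alpha) * p
                    for s, p in zip(smoothed, post)]
    return max(range(len(smoothed)), key=smoothed.__getitem__)
```

For example, a gaze track dwelling near +60° azimuth would select the source at +60° as the attended one; in a hearing-aid context, that estimate could then steer a beamformer toward the selected source.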

Results: Results indicate that SNR improvements of up to 7 dB can be achieved using optimal model parameters.

Conclusions: The benefit of steering speech enhancement by gaze information found in this instrumental test is promising. The approach therefore needs to be investigated further by testing subject performance in audiovisual scenes with relevant communication tasks.


References

1. Grimm G, Kayser H, Hendrikse M, Hohmann V. A gaze-based attention model for spatially-aware hearing aids. In: Informationstechnische Gesellschaft im VDE (ITG), editor. ITG-Fb. 282: Speech Communication. 13. ITG-Fachtagung Sprachkommunikation, 10-12 October 2018, Oldenburg. Berlin: VDE ITG; 2018. p. 231-5.