gms | German Medical Science

GMS Zeitschrift für Audiologie — Audiological Acoustics

Deutsche Gesellschaft für Audiologie (DGA)

ISSN 2628-9083

Armenian numerals test for speech recognition threshold measurement in quiet: evaluation and generation of reference data

Armenischer Zahlentest zur Messung der Sprachverstehensschwelle in Ruhe: Evaluation und Generierung von Referenzdaten

Research Article

  • Sona Sargsyan - Department of Otorhinolaryngology, Yerevan State Medical University after M. Heratsi, Yerevan, Armenia; Department of Otolaryngology, Head & Neck Surgery, University Medicine Halle, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
  • corresponding author Torsten Rahne - Department of Otolaryngology, Head & Neck Surgery, University Medicine Halle, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany

GMS Z Audiol (Audiol Acoust) 2022;4:Doc08

doi: 10.3205/zaud000026, urn:nbn:de:0183-zaud0000268

Published: December 13, 2022

© 2022 Sargsyan et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Abstract

Introduction: The aim of the study was to evaluate a recently developed Armenian speech audiometric test. It consists of twenty test lists, each containing 20 phonemically balanced, familiar, and homogeneous Armenian multisyllabic numbers. Reference thresholds for speech recognition in quiet (SRTs) for native Armenian speakers were determined.

Materials and methods: Digitally recorded Armenian speech material was evaluated by 25 native Armenian speakers with normal hearing. Individual speech discrimination functions were measured for all 20 lists. A logistic function was fitted to the individual speech discrimination functions and the averaged results. The sound pressure level at the inflection point, i.e., the level at 50% speech intelligibility, was defined as SRT.

Results: The mean SRT across all test lists and subjects was 19.3 dB SPL. The measured individual SRTs varied between subjects in a range of 7.3 dB. Very steep slopes of the individual and averaged speech intelligibility functions were observed, ranging from 16 to 29 %/dB. SRTs and slopes did not differ significantly between test lists.

Conclusion: The homogeneity of the test lists and thus of the speech test was demonstrated. The measured SRT can be used as reference data for further application in routine clinical measurements and thus improve the validity of clinical procedures for native Armenian speakers.

Keywords: speech audiometry, numerals, speech recognition threshold, Armenian language, test evaluation

Zusammenfassung

Einleitung: Ziel dieser Untersuchung war es, einen kürzlich entwickelten armenischen sprachaudiometrischen Test zu evaluieren. Dieser besteht aus zwanzig Testlisten mit je 20 phonemisch ausgewogenen mehrsilbigen armenischen Zahlen. Referenzwerte für die Sprachverstehensschwelle (SVS) wurden für armenische Muttersprachler bestimmt.

Methoden: Das digital aufgezeichnete armenische Sprachmaterial wurde 25 normalhörenden armenischen Muttersprachlern präsentiert. Für alle 20 Listen wurden die individuellen Sprachdiskriminationsfunktionen gemessen. An diese und die über die Testlisten und Probanden gemittelten Ergebnisse wurden logistische Funktionen angepasst. Als SVS wurde der Schalldruckpegel am Wendepunkt definiert, also der Schalldruckpegel bei einer Sprachverständlichkeit von 50%.

Ergebnisse: Die SVS über alle Testlisten und Probanden hinweg betrug 19,3 dB SPL. Die individuelle SVS variierte zwischen den Probanden in einem Bereich von 7,3 dB. Im Wendepunkt wurden sehr steile Anstiege der individuellen und gemittelten Sprachdiskriminationsfunktionen im Bereich von 16 bis 29 %/dB beobachtet. Die SVS und Steigungen bei der SVS unterschieden sich zwischen den Testlisten nicht signifikant.

Schlussfolgerung: Die Homogenität der Testlisten und damit des Sprachverständlichkeitstest konnten gezeigt werden. Die gemessenen SVS können als Referenzdaten für die weitere Anwendung in klinischen Routinemessungen verwendet werden und somit die Validität der audiometrischen Testverfahren in armenischer Sprache verbessern.

Schlüsselwörter: Sprachaudiometrie, Zahlentest, Sprachverstehensschwelle (SVS), armenische Sprache, Evaluation


Introduction

Everyday communication, and thus participation in social life, depends on good speech recognition. In hearing-impaired people, hearing aids or implants aim at improving speech perception. Speech audiometric tests therefore play an important role in the assessment of hearing abilities and communication function and are an international standard method. Together with pure-tone audiometry, they allow the degree and type of hearing loss to be diagnosed [1], [2] and facilitate audiological rehabilitation management [3]. Since the early work of Harvey Fletcher [4], Raymond Carhart [5], Arthur Boothroyd [6] and others, speech audiometry has gained acceptance and is routinely used in a variety of applications [7].

One measure is the speech recognition threshold (SRT), which is the minimum sound pressure level [8] at which a person can recognize 50% of the words spoken [9], either in a quiet environment or in noise. The use of appropriate speech material is crucial to ensure proper metric characteristics of speech audiometry, i.e., satisfactory reliability, validity and diagnostic sensitivity [10], [11]. Different kinds of speech materials have been developed in English and used in clinics over a long period, such as the PB-50 word lists [12], the AB word lists [13] and the W-22 lists [12]. In German-speaking countries, the Freiburg speech intelligibility test introduced by Hahlbrock in 1953 [14], [15] is still routinely applied and is considered a reliable standard for many applications [16], [17]. The test material consists of 10 groups of 10 two-digit multisyllabic numbers and 20 groups of 20 monosyllabic nouns. Normal-hearing listeners reach 50% recognition of numbers at a level of 18.5 dB SPL on average; the slope at this inflection point of the speech intelligibility function amounts to about 8 %/dB [18].

In this research we focus on the Armenian language. Recently, balanced, familiar and homogeneous speech material was developed for the first time as a speech recognition assessment tool with sufficient validity, reliability and sensitivity [19]. This study therefore aimed to evaluate this Armenian speech material in order to facilitate wider use of speech audiometry in Armenia.

The developed speech material consists of twenty test lists of twenty Armenian numerals from 10 to 100 with 2–4 syllables and an equal distribution of the numerals. Since not all phonemes of the Armenian language are represented in the numerals, and thus in the 74 selected test items, the phoneme distribution of the test lists deviates from that of the Armenian language. Nevertheless, a comparison of the phoneme distributions shows that the test lists represent the language corpus quite well [19]. The phoneme distribution of each single test list correlated significantly and positively with that of the general sample, i.e., the selected numbers, and the level of all test items was balanced to achieve comparable SRTs [19]. The test items and the calibration signal were stored as mono signals on an audio compact disc for use with standard clinical audiometers.

In this paper we focus on evaluating the test material to ensure homogeneity and to generate reference data for normal-hearing, Armenian-speaking listeners. To measure the homogeneity of intelligibility across the test lists, an independent set of normal-hearing native speakers was assessed. The speech intelligibility functions derived from that population are defined as reference data.


Methods

Twenty-five normal-hearing volunteers (twenty-one female and four male), mostly undergraduate and postgraduate students of Yerevan State Medical University with Armenian as their native language, were invited to participate in the study. The subjects’ age ranged from 21 to 35 years (mean = 23.6 years). All participants signed an informed consent form. The study was approved by the institutional ethics committee and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Medical histories were unremarkable for otologic or hearing disorders. Before testing, otoscopy and tympanometry were performed. A standard clinical procedure based on the ASHA guidelines [20] was used to determine bilateral pure-tone air conduction thresholds for each participant at octave frequencies from 250 to 8,000 Hz with a level resolution of 5 dB. For all included participants, pure-tone thresholds were 10 dB HL or better at 0.5, 1, 2 and 4 kHz and 15 dB HL or better at 0.25 and 8 kHz.
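The audiometric inclusion criteria stated above can be written as a simple per-ear check. The following is a minimal sketch, not the screening software used in the study; the function names and the example audiogram are hypothetical, and the four-frequency average is the pure-tone average over 0.5, 1, 2 and 4 kHz (4PTA) that is reported in the Results.

```python
# Inclusion limits in dB HL per frequency (Hz), taken from the criteria in the text.
LIMITS_DB_HL = {250: 15, 500: 10, 1000: 10, 2000: 10, 4000: 10, 8000: 15}
PTA_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def meets_inclusion_criteria(thresholds_db_hl):
    """Return True if the threshold at every frequency is at or better than its limit."""
    return all(thresholds_db_hl[f] <= limit for f, limit in LIMITS_DB_HL.items())


def four_frequency_pta(thresholds_db_hl):
    """Mean threshold over 0.5, 1, 2 and 4 kHz (4PTA) in dB HL."""
    total = sum(thresholds_db_hl[f] for f in PTA_FREQUENCIES_HZ)
    return total / len(PTA_FREQUENCIES_HZ)


# Hypothetical audiogram of one ear (illustrative values, not study data).
example_ear = {250: 10, 500: 5, 1000: 5, 2000: 0, 4000: 10, 8000: 15}
print(meets_inclusion_criteria(example_ear))  # True
print(four_frequency_pta(example_ear))        # 5.0
```

Because thresholds were measured bilaterally, such a check would be applied to each ear separately.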

All twenty lists of Armenian numbers developed by Sargsyan and Rahne [19] and stored in a repository [21] were used as acoustic stimuli. Briefly, to create a homogeneous multisyllabic speech corpus, Armenian numerals from 10 to 100 with 2–4 syllables were selected as the general sample. Twenty preliminary test lists were created and, after adjusting for equal phonetic distribution between the lists, 20 phonemically homogeneous test lists (each consisting of 20 numerals, i.e., 20 test items) were manually defined. As the reference recording, the items of the final test lists were recorded three times by a trained female native Armenian speaker with clear pronunciation, a speed of 90–100 syllables/minute, and neutral emotion and effort. The recordings were made in the “Multimedia Kentron TV” broadcast studios (Yerevan, Armenia) and preprocessed in the Audiology Research Department of the University Medicine Halle (Saale), Germany [19].

The test signals were generated by a compact disc player, attenuated by an MA31 clinical audiometer (Präcitronic, Dresden, Germany) and presented through TDH39 audiometric headphones (Telephonics, Huntington, NY, USA). The audiometer was calibrated according to normative requirements by an authorized service (notified body) using an artificial ear according to IEC60318-1. A CCITT (Comité Consultatif International Téléphonique et Télégraphique) noise signal [19], [22] was used, and free field equalization was applied. An artificial ear according to IEC60318-1 and a sound level meter with a flat frequency response between 10 Hz and 20 kHz (“Z”-mode) were used to measure the actual sound pressure level. The level difference between the ears was below 0.25 dB.

DIN EN ISO standards [23] for speech audiometry were followed to prepare and instruct the participants, who were unfamiliar with the presented stimuli or the measurement of an SRT. The subjectively better or preferred ear (18 right ears, 7 left ears) was used for the monaural presentation of the test items. Participants were familiarized with the stimuli by listening to several items at a comfortable level. The test items within each test list were presented at 5-second intervals in a fixed order. Participants had to repeat what they had heard and were instructed and encouraged to guess when in doubt. Testing began with the presentation of a subset of 8 test items at 40 dB SPL; the level was then reduced in steps of 5 dB until subsets with around 50% or fewer correct repetitions were obtained. This level was used as the starting level for the participant’s SRT measurements, in which all 20 test lists were presented in randomized order (1st run).

After the measurements at the starting sound pressure level were completed, the level was first increased by 5 dB (2nd run) and finally decreased by 5 dB (3rd run) relative to the starting level. At each of these levels, the testing procedure was repeated with all test lists.
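The level selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact clinical protocol: the stopping rule "around 50%" is approximated as "50% or less", the second and third runs are taken to be 5 dB above and below the starting level, and `percent_correct_at` and `simulated_listener` are hypothetical stand-ins for scoring a subset of items with a real participant.

```python
def find_starting_level(percent_correct_at, start_db=40, step_db=5, target=50.0, floor_db=0):
    """Lower the level in fixed steps until the subset score is at or below the target.

    percent_correct_at: hypothetical callable that returns the percentage of
    correctly repeated items (0-100) for a short subset at the given level.
    """
    level = start_db
    while level > floor_db and percent_correct_at(level) > target:
        level -= step_db
    return level


def run_levels(starting_level_db, step_db=5):
    """Levels of the three runs: starting level, then 5 dB above, then 5 dB below."""
    return [starting_level_db, starting_level_db + step_db, starting_level_db - step_db]


# Hypothetical listener whose subset score falls to 50% at 20 dB SPL.
def simulated_listener(level_db):
    if level_db >= 25:
        return 100.0
    if level_db == 20:
        return 50.0
    return 0.0


start = find_starting_level(simulated_listener)
print(start, run_levels(start))  # 20 [20, 25, 15]
```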

The responses were registered by one investigator (author SS). Several rests were allowed during the session. Percentages of correct responses were determined for every test list, sound pressure level and subject. A logistic function was fitted to the individual and averaged data. The sound pressure level at the inflection point, i.e., the level at a speech intelligibility of 50%, was defined as SRT. All computations and the statistical analysis were done with MATLAB software (Version 2019a, Mathworks, Natick, USA).
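To illustrate the fitting step, the sketch below fits a two-parameter logistic function to proportion-correct scores and reads the SRT from the inflection point. It is not the MATLAB routine used in the study: the brute-force grid search, the grid limits and the parameter names (`srt_db`, `rate_per_db`) are assumptions made to keep the example dependency-free, and the scores are synthetic.

```python
import math


def logistic(level_db, srt_db, rate_per_db):
    """Proportion correct at level_db; equals 0.5 at the inflection point srt_db."""
    return 1.0 / (1.0 + math.exp(-rate_per_db * (level_db - srt_db)))


def fit_logistic(levels_db, proportions):
    """Brute-force least-squares fit over a coarse grid (illustrative only).

    Grid limits and step sizes are assumptions chosen for this sketch.
    """
    best_sse, best_srt, best_rate = float("inf"), None, None
    for srt_tenths in range(0, 401):            # SRT candidates: 0.0 to 40.0 dB SPL
        srt = srt_tenths / 10.0
        for rate_hundredths in range(1, 201):   # rate candidates: 0.01 to 2.00 per dB
            rate = rate_hundredths / 100.0
            sse = sum((logistic(level, srt, rate) - p) ** 2
                      for level, p in zip(levels_db, proportions))
            if sse < best_sse:
                best_sse, best_srt, best_rate = sse, srt, rate
    return best_srt, best_rate


# Synthetic, noise-free scores generated from known parameters (not study data).
levels = [10.0, 15.0, 20.0, 25.0, 30.0]
scores = [logistic(level, 19.3, 0.88) for level in levels]
srt, rate = fit_logistic(levels, scores)
print(srt, rate)
```

In practice a nonlinear least-squares or maximum-likelihood routine would replace the grid search; the grid merely makes the objective explicit.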


Results

The pure-tone thresholds of the included subjects were within the inclusion criteria and ranged from 0 to 10 dB HL. The mean threshold over the frequencies of 0.5, 1, 2, and 4 kHz (4PTA) was 5.05 dB HL (SD: 1.8 dB). Normal tympanometric results were observed for all included subjects.

All subjects could complete the testing procedure, and the individual discrimination function could be fitted to the data. The goodness of fit (R²) ranged from 0.90 to 1.00 with a mean of 0.99 (SD: 0.01). Figure 1A [Fig. 1] shows the individual SRTs averaged across all test lists. The SRTs ranged from 15.7 to 23.0 dB SPL with a mean SRT of 19.3 dB SPL (SD: 1.8 dB). A Pearson correlation indicated a significant positive association between 4PTA and SRT (r(25) = 0.44, p = 0.03).
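For readers who wish to reproduce this kind of analysis, the following sketch computes a Pearson correlation coefficient and the associated t statistic in plain Python. The function names and the data are hypothetical (the individual 4PTA and SRT values are not listed in this article); only r = 0.44 and n = 25 are taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero; it has n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical 4PTA (dB HL) and SRT (dB SPL) pairs, for illustration only.
pta = [2.5, 3.75, 5.0, 6.25, 7.5]
srt = [17.0, 19.5, 18.0, 20.5, 21.0]
print(round(pearson_r(pta, srt), 3))

# Using the reported r = 0.44 with n = 25 participants.
print(round(t_statistic(0.44, 25), 2))  # 2.35
```

The p value then follows from the t distribution with n − 2 degrees of freedom, which a statistics package would supply.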

Figure 1B [Fig. 1] shows the discrimination function for all test lists averaged across all participants. A sigmoid regression was calculated to fit the proportion of correctly recognized words based on the speech level and to calculate the SRT. Figure 2 [Fig. 2] shows the SRTs for all test lists as the average across the participants. They ranged from 18.9 to 19.6 dB SPL. The mean SRT over all test lists was 19.3 dB SPL (SD: 0.2 dB). The SRTs were not normally distributed. A Friedman test was carried out to compare the SRTs for the 20 test lists. After Bonferroni correction, no significant differences between the test lists were found. For the SRTs, the 5th percentile was 18.9 dB SPL and the 95th percentile was 19.6 dB SPL. The SRT difference between the test lists with the maximum and minimum SRT was 0.78 dB. The maximum difference from the mean SRT was 0.43 dB; the root mean square of the differences was 0.42 dB.
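The Friedman test compares the test lists by ranking them within each subject. As a minimal sketch of the test statistic (without tie correction, without the Bonferroni-corrected post hoc comparisons, and not the MATLAB implementation used here), with hypothetical function names and toy data:

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for rows = subjects and columns = test lists."""
    n = len(table)       # number of subjects
    k = len(table[0])    # number of test lists
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Toy SRT table (3 subjects x 3 lists, illustrative values only).
toy = [[18.9, 19.2, 19.6],
       [19.4, 19.0, 19.3],
       [19.1, 19.5, 18.8]]
print(friedman_statistic(toy))
```

The statistic is referred to a chi-square distribution with k − 1 degrees of freedom; values near zero, as in this toy table where every list has the same rank sum, indicate no systematic difference between lists.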

After averaging over all subjects and test lists, a normative discrimination function was fitted to the global mean and is also displayed in Figure 1B [Fig. 1]. A sigmoid regression was calculated to predict the grouped proportion of correctly recognized words based on the speech level. A significant regression equation was found with an R² of 0.998. Based on this regression, the global SRT was defined by the inflection point at 19.3 dB SPL. Table 1 [Tab. 1] shows the reference speech discrimination function as required by DIN EN ISO 8253-3 [23].

Figure 3 [Fig. 3] shows the distribution of slopes at the inflection point across the 20 test lists averaged over all participants. The slopes ranged from 16 to 29 %/dB. The mean slope over all test lists and participants was 22.6 %/dB (SD: 3.4 %/dB). The slopes were not normally distributed. A Friedman test was carried out to compare the slopes for the 20 test lists. After Bonferroni correction, no significant differences between the test lists were found.
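The slope at the inflection point follows directly from the parameters of a logistic function. Assuming the common two-parameter form p(L) = 1 / (1 + exp(−k (L − SRT))), the derivative at L = SRT is k/4; the exact parameterization used for the fits in this study is not stated, so the sketch below is illustrative and its names are hypothetical. It checks the analytic value against a finite difference.

```python
import math


def logistic(level_db, srt_db, rate_per_db):
    """Proportion correct at level_db for a logistic with rate k = rate_per_db."""
    return 1.0 / (1.0 + math.exp(-rate_per_db * (level_db - srt_db)))


def slope_at_srt_percent_per_db(rate_per_db):
    """Analytic slope at the inflection point: k/4 per dB, expressed in %/dB."""
    return 100.0 * rate_per_db / 4.0


def numeric_slope_percent_per_db(srt_db, rate_per_db, h=1e-4):
    """Central finite difference around the SRT as an independent check."""
    upper = logistic(srt_db + h, srt_db, rate_per_db)
    lower = logistic(srt_db - h, srt_db, rate_per_db)
    return 100.0 * (upper - lower) / (2.0 * h)


rate = 0.904  # hypothetical rate chosen so that the slope equals 22.6 %/dB
print(slope_at_srt_percent_per_db(rate))
print(numeric_slope_percent_per_db(19.3, rate))
```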


Discussion

We measured the SRTs of 25 subjects who were diagnosed as otologically normal. Our results show that all included normal-hearing subjects could perform the test procedure. The measured speech discrimination could be fitted very well by a logistic function. Thus, the results can be plotted as an interpolated normative function. The respective reference values are summarized in Table 1 [Tab. 1].

The measured individual SRTs varied between the subjects within a range of 7.3 dB. The 4PTA of the included participants ranged from 1.25 to 7.5 dB HL, a span of 6.25 dB, and correlated significantly with the measured SRT. The SRT should therefore also vary within a range of about 6 dB in the included cohort, which explains a major portion of the SRT variance.

The SRT across all test lists and subjects was 19.3 dB SPL. This level is comparable to that for German multisyllables (18.4 dB SPL) but lower than that for the German monosyllables (29.3 dB SPL) of the Freiburg speech test [18]. It is within the range of the SRT for English spondaic words (10–28 dB SPL) [24]. For other languages, e.g., Japanese trisyllabic words [25] or Taiwanese Mandarin [10], no direct comparison of the SRTs is possible, since those thresholds are reported only relative to the hearing threshold for speech (hearing level for speech).

The slope at the inflection point (SRT) obtained in our study ranged from 16 to 29 %/dB, which is more comparable to that of German sentence tests, e.g., the Göttinger Satztest (11 %/dB) [23] or the Oldenburger Satztest (17.1 %/dB) [26]. Slopes of the speech intelligibility functions of comparable tests are lower, e.g., for German monosyllables (8 %/dB) [18], spondaic words in English (7.2–10.0 %/dB) [24], [27], [28], [29], Spanish words (9.7–11.1 %/dB) [30], or Japanese trisyllabic words (8.7–10.3 %/dB) [25]. Lower slopes at the inflection point were also reported for standard Mandarin or Pǔtōnghuà (11.3–12.1 %/dB) [31], Taiwanese Mandarin (11.3–11.7 %/dB) [32], Polish bisyllabic words (9.8–10.1 %/dB) [33], and Korean bisyllabic words (10.4–11.9 %/dB) [34].

The reason for the very steep slope measured in the sample studied remains unclear. However, since steep slopes are necessary for precise measurement of the discrimination function and thus of the speech recognition threshold, the Armenian material can serve as a precise diagnostic tool for measuring the SRT.
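One way to see why a steep slope aids precision is to compute the level range over which intelligibility rises from 20% to 80%. The sketch below assumes a logistic discrimination function parameterized by its slope at the inflection point; the 20% and 80% limits and the function names are illustrative choices, and only the slope values (22.6 %/dB for the present test, 8 %/dB as cited for German monosyllables) come from the text.

```python
import math


def level_for_proportion(p, srt_db, slope_percent_per_db):
    """Level at which a logistic with the given inflection slope reaches proportion p."""
    rate = 4.0 * slope_percent_per_db / 100.0   # logistic rate k = 4 * slope
    return srt_db + math.log(p / (1.0 - p)) / rate


def transition_width_db(slope_percent_per_db, low=0.2, high=0.8):
    """Width in dB of the range between the low and high proportions correct."""
    upper = level_for_proportion(high, 0.0, slope_percent_per_db)
    lower = level_for_proportion(low, 0.0, slope_percent_per_db)
    return upper - lower


print(round(transition_width_db(22.6), 2))  # 3.07
print(round(transition_width_db(8.0), 2))   # 8.66
```

Under these assumptions the transition region of the Armenian numerals spans roughly 3 dB, compared with almost 9 dB for a slope of 8 %/dB, so a given scoring error translates into a smaller error in level.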

The Armenian numbers test includes 20 test lists with 20 test items each. Our results show that, across all participants, the SRT difference between the test lists is very small (SD: 0.2 dB). Since the SRT difference between the best and the worst test list is 0.78 dB and the maximum difference from the mean over all test lists is 0.43 dB, the test lists can be regarded as perceptually balanced. The variance is below the limits of audiometric precision. Thus, a randomly selected test list can be used for the SRT measurement. Test-retest reliability was not measured in this study. However, since the results show no significant SRT difference between the test lists, the different test lists can be considered perceptually identical. Because the SRT differences between the test lists amount to a maximum of 0.78 dB, the intra-session test-retest reliability can consequently be considered very high. The very narrow SRT distribution likewise suggests that the test is very reliable.
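The balance measures used in this argument (range between lists, largest deviation from the mean, and standard deviation) are straightforward to compute from the per-list SRTs. A minimal sketch with hypothetical per-list values, since the individual list SRTs are shown only in Figure 2:

```python
import math


def list_balance_summary(list_srts_db):
    """Summary statistics describing how closely the test lists agree."""
    n = len(list_srts_db)
    mean = sum(list_srts_db) / n
    deviations = [value - mean for value in list_srts_db]
    return {
        "mean": mean,
        "range": max(list_srts_db) - min(list_srts_db),
        "max_abs_deviation": max(abs(d) for d in deviations),
        "sd": math.sqrt(sum(d * d for d in deviations) / (n - 1)),  # sample SD
    }


# Hypothetical mean SRTs (dB SPL) of four test lists, for illustration only.
summary = list_balance_summary([19.0, 19.5, 19.3, 19.4])
for name, value in summary.items():
    print(name, round(value, 2))
```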

Deriving normative data for the Armenian numbers test requires a normal-hearing cohort. We verified this prerequisite by measuring the pure-tone thresholds and included only normal-hearing subjects, which resulted in a very narrow variance of the SRT. We therefore propose these data as the reference for routine clinical measurement.


Conclusions

This study evaluated a previously developed speech material, the Armenian numbers test, for measuring the speech recognition threshold in quiet. The results have significant practical implications for the health system in Armenia because they address issues related to the validity of clinical procedures provided to native Armenian-speaking individuals.

This study applied the newly developed Armenian numbers test in quiet to a normal-hearing cohort of native Armenian speakers for the first time. The results can be used as reference data in routine clinical measurements.


Notes

Competing interests

The authors declare that they have no competing interests.

Funding

This study was supported by a research grant from KAAD (Catholic Academic Exchange Service), awarded to the first author.

The funders had no involvement in the research.

Acknowledgments

We would like to thank all volunteers for their time and willingness to cooperate throughout all aspects of the study. We are grateful for the intellectual and material support provided by Prof. Armen Muradyan and Prof. Stefan Plontke.

ORCID

The ORCID of Sona Sargsyan is: 0000-0003-0158-8961

The ORCID of Torsten Rahne is: 0000-0003-1859-5623


References

1.
Katz J, Chasin M, English KM, Hood LJ, Tillery KL, editors. Handbook of clinical audiology. Seventh edition. Philadelphia: Wolters Kluwer Health; 2015.
2.
British Society of Audiology. Practice Guidance: An overview of current management of auditory processing disorder (APD). Berkshire; 2011.
3.
Gelfand SA. Essentials of audiology. Fourth edition. New York: Thieme; 2016. DOI: 10.1055/b-006-161125
4.
Fletcher H. Speech and hearing. New York, USA: Van Nostrand; 1929.
5.
Carhart R. Basic principles of speech audiometry. Acta Otolaryngol. 1951;40(1-2):62-71. DOI: 10.3109/00016485109138908
6.
Boothroyd A. Statistical theory of the speech discrimination score. J Acoust Soc Am. 1968 Feb;43(2):362-7. DOI: 10.1121/1.1910787
7.
Saunders E, editor. Tele-Audiology and the Optimization of Hearing Healthcare Delivery. Hershey, PA: IGI Global; 2019. DOI: 10.4018/978-1-5225-8191-8
8.
Acoustical Society of America (ASA). ANSI/ASA S3.6-2010: Specification for Audiometers. Melville, NY: ASA; 2010.
9.
American Speech-Language-Hearing Association. Guidelines for determining the threshold level for speech. ASHA. 1988;30:85-9.
10.
Nissen SL, Harris RW, Slade KB. Development of speech reception threshold materials for speakers of Taiwan Mandarin. Int J Audiol. 2007 Aug;46(8):449-58. DOI: 10.1080/14992020701361296
11.
Tsai KS, Tseng LH, Wu CJ, Young ST. Development of a mandarin monosyllable recognition test. Ear Hear. 2009 Feb;30(1):90-9. DOI: 10.1097/AUD.0b013e31818f28a6
12.
Wilson RH, McArdle R, Roberts H. A comparison of recognition performances in speech-spectrum noise by listeners with normal hearing on PB-50, CID W-22, NU-6, W-1 spondaic words, and monosyllabic digits spoken by the same speaker. J Am Acad Audiol. 2008 Jun;19(6):496-506. DOI: 10.3766/jaaa.19.6.5
13.
Myles AJ. The clinical use of Arthur Boothroyd (AB) word lists in Australia: exploring evidence-based practice. Int J Audiol. 2017 Nov;56(11):870-5. DOI: 10.1080/14992027.2017.1327123
14.
Hahlbrock KH. Über Sprachaudiometrie und neue Wörterteste [Speech audiometry and new word-tests]. Arch Ohren Nasen Kehlkopfheilkd. 1953;162(5):394-431.
15.
Hahlbrock KH. Sprachaudiometrie. Stuttgart: Thieme; 1970.
16.
Hoth S. Der Freiburger Sprachtest: Eine Säule der Sprachaudiometrie im deutschsprachigen Raum [The Freiburg speech intelligibility test: A pillar of speech audiometry in German-speaking countries]. HNO. 2016 Aug;64(8):540-8. DOI: 10.1007/s00106-016-0150-x
17.
Baljić I, Hoppe U. Der Freiburger Einsilbertest auf dem Prüfstand [The Freiburg monosyllabic test put to the test]. HNO. 2016 Aug;64(8):538-9. DOI: 10.1007/s00106-016-0208-9
18.
Kollmeier B, Lenarz T, Winkler A, Zokoll MA, Sukowski H, Brand T, Wagener KC. Hörgeräteindikation und -überprüfung nach modernen Verfahren der Sprachaudiometrie im Deutschen [Indication for and verification of hearing aid benefit using modern methods of speech audiometry in German]. HNO. 2011 Oct;59(10):1012-21. DOI: 10.1007/s00106-011-2345-5
19.
Sargsyan S, Rahne T. Development Of Speech Material For An Armenian Speech Recognition Threshold Test. Russ Open Med J. 2021;10(3). DOI: 10.15275/rusomj.2021.0321
20.
Guidelines for manual pure-tone threshold audiometry. ASHA. 1978 Apr;20(4):297-301.
21.
Rahne T. Armenian numerals test for speech recognition threshold measurement in quiet. 2022. figshare. Collection. DOI: 10.6084/m9.figshare.c.6192664.v1
22.
Technical Committee ISO/TC 43 Acoustics. ISO 8253-1:2010 Acoustics — Audiometric test methods — Part 1: Pure-tone air and bone conduction audiometry. 2010. p. 29.
23.
Technical Committee ISO/TC 43 Acoustics. ISO 8253-3:2012 Acoustics — Audiometric test methods — Part 3: Speech audiometry. 2012. p. 31.
24.
Young LL Jr, Dudley B, Gunter MB. Thresholds and psychometric functions of the individual spondaic words. J Speech Hear Res. 1982 Dec;25(4):586-93. DOI: 10.1044/jshr.2504.586
25.
Mangum TC. Performance intensity functions for digitally recorded Japanese speech audiometry materials. Theses and Dissertations. 2005. 616. Available from: https://scholarsarchive.byu.edu/etd/616
26.
Wagener KC, Kühnel V, Kollmeier B. Entwicklung und Evaluation eines Satztests für die deutsche Sprache III: Evaluation des Oldenburger Satztests. Z Audiol. 1999;38:86-95.
27.
Hudgins CV, Hawkins JE. The development of recorded auditory tests for measuring hearing loss for speech. Laryngoscope. 1947 Jan;57(1):57-89.
28.
Wilson RH, Strouse A. Psychometrically equivalent spondaic words spoken by a female speaker. J Speech Lang Hear Res. 1999 Dec;42(6):1336-46. DOI: 10.1044/jslhr.4206.1336
29.
Hirsh IJ, Davis H, Silverman SR, Reynolds EG, Eldert E, Benson RW. Development of materials for speech audiometry. J Speech Hear Disord. 1952 Sep;17(3):321-37. DOI: 10.1044/jshd.1703.321
30.
Christensen LK. Performance intensity functions for digitally recorded Spanish speech audiometry [Master Thesis]. Provo, UT: Brigham Young University; 1995.
31.
Nissen SL, Harris RW, Jennings LJ, Eggett DL, Buck H. Psychometrically equivalent trisyllabic words for speech reception threshold testing in Mandarin. Int J Audiol. 2005 Jul;44(7):391-9. DOI: 10.1080/14992020500147672
32.
Slade KB. Speech Reception Threshold Materials for Taiwan Mandarin. Theses and Dissertations. 2006. 522. Available from: https://scholarsarchive.byu.edu/etd/522
33.
Harris RW, Nielson WS, McPherson DL, Skarzynski H, Eggett DL. Psychometrically equivalent Polish bisyllabic words spoken by male and female talkers. Audiofonologia. 2004;25:1-15.
34.
Harris RW, Kim E, Eggett DL. Psychometrically Equivalent Korean Bisyllabic Words Spoken by Male and Female Talkers. Korean J Commun Sci Disord. 2003;8(1):244-70.
35.
Brand T, Kollmeier B. Efficient adaptive procedures for threshold and concurrent slope estimates for psychophysics and speech intelligibility tests. J Acoust Soc Am. 2002 Jun;111(6):2801-10. DOI: 10.1121/1.1479152