gms | German Medical Science

GMS Zeitschrift für Audiologie — Audiological Acoustics

Deutsche Gesellschaft für Audiologie (DGA)

ISSN 2628-9083

Performance optimized signal processing in objective audiometry – Digital tools for the efficient measurement and use of AEP and OAE

Review Article


  • corresponding author Sebastian Hoth - GMS Zeitschrift für Audiologie — Audiological Acoustics, Schriftleitung, Heidelberg, Deutschland

GMS Z Audiol (Audiol Acoust) 2023;5:Doc04

doi: 10.3205/zaud000030, urn:nbn:de:0183-zaud0000300

Published: March 28, 2023

© 2023 Hoth.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at


The objective methods of audiometry are based on the registration of noisy signals of small amplitude which are contaminated with interference from diverse and variable sources. The target signal is reconstructed by selecting and averaging many signal epochs. Beyond these basic digital tools of signal processing, whose practical value is well proven, other procedures which have never been established in commercial devices are potentially suitable to further improve the signal quality and the reliability of its detection. They are the subject of this paper, as well as several new approaches, some of which have already proven useful in clinical practice while others are still awaiting their practical application. By implementing these procedures, the precision and reliability of the diagnostic conclusions derived from otoacoustic emissions (OAE) and auditory evoked potentials (AEP) can be enhanced substantially.

This review aims to exploit the potential of signal processing to the highest possible extent. Its main topics are a digital filter optimized for the detection of AEPs, the extension of averaging to the polarity of the signal, the extraction of parameters qualified as measures for quality and significance, and the time-differential analysis of correlation. Furthermore, the digital superposition of several independent recordings, the use of the amplitude distribution density and an algorithm developed to reduce the impact of residual noise on the response threshold are described for the first time.

Keywords: objective audiometry, auditory evoked potentials, otoacoustic emissions, response detection, residual noise, signal statistics


Dedicated to my mentor Dr. Michael Berg who introduced me to audiology.

Objective audiometry is based on the measurement of signals on which unavoidable interference of various origins is superimposed. Since the beginnings of electrical response audiometry (ERA), signals of biological origin have been detected by recording them many times and averaging the registered signal epochs during data acquisition. Signal averaging reduces the influence of interfering signals of biological and non-biological origin, because the amplitude of the physiological response increases linearly with the number of summations, whereas the amplitude of the noise increases only in proportion to the square root of the number of summations: 100 summations lead to a 100-fold increase of the response but only to a ten-fold increase of the interfering noise. However, the resulting simple rule for the gain in noise suppression (e.g. an increase of the signal-to-noise ratio by 30 dB for 1,000 iterations) is valid only for stochastic interference whose relevant properties do not change in the course of the measurement. In real life, this condition of stationarity is at best approximately fulfilled, since there is practically no measurement without slow changes in conditions and/or sporadically occurring disturbances. The number of factors influencing these interferences is large and highly variable both inter- and intra-individually.
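
The square-root rule can be checked with a few lines of arithmetic. The following Python sketch assumes stationary, uncorrelated noise, which is exactly the idealization named above; the function name is illustrative and not taken from any device software.

```python
import math

def averaging_gain_db(n_sweeps: int) -> float:
    """SNR gain in dB from averaging n_sweeps epochs (stationary, uncorrelated noise assumed)."""
    if n_sweeps < 1:
        raise ValueError("n_sweeps must be at least 1")
    # The response sums linearly (factor n), the noise only as sqrt(n),
    # so the amplitude ratio improves by n / sqrt(n) = sqrt(n).
    return 20.0 * math.log10(math.sqrt(n_sweeps))

for n in (100, 1000, 4000):
    print(n, "sweeps:", round(averaging_gain_db(n), 1), "dB")
```

Because the gain grows only with the square root of the number of sweeps, every further 6 dB costs a fourfold measurement time, which is why the additional tools discussed in this review are worth the effort.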

Needless to say, there is great interest in maximizing the suppression of interference. In the case of otoacoustic emissions (OAE), whose use is limited in the vast majority of practical applications to the dichotomous statement “signal detection successful or not successful”, the efficient reduction of background noise alone constitutes the value of the method, since the noise can go as far as completely masking responses that are actually present. Likewise, the determination of response thresholds by means of the auditory evoked potentials (AEP) is practically impossible or at least inadmissibly inaccurate without a differentiated consideration of the disturbing influences.

These initial remarks are intended to make clear that, firstly, the quality of recording determined by the residual noise must be promoted with all available tools, secondly, that this must be done on a case-by-case basis, and thirdly, that the success achieved must be described with suitable measures and given to the user in the documentation. The assessment “signal detection not successful” can have many reasons – a pathological event is only one of these reasons.

It is the aim of this review to help the signal processing methods suitable and available for the reconstruction of physiological responses to become more popular and to contribute to the intensification of their practical application. This endeavor seems justified to the author, because in medical technology it is often observed that established diagnostic procedures are somewhat sluggish with respect to technical extensions or innovations. However, the quite understandable inclination to stick to the tried and tested should not go so far that, for example, the limitations of computer performance in the early days of objective audiometry still determine the procedures applied today. The author is convinced that the procedures described below contain dormant reserves, the consistent use of which could further improve the performance of objective audiometry.

Digital filters

The measurement probe (i.e. the ear canal microphone in the case of OAE and the electrodes for AEP) always captures many signal frequencies, including those that do not contribute to the detection of the physiological signal. For AEP, Figure 1 [Fig. 1] shows that the EEG signal picked up by the electrodes in the frequency range from 1 to 5,000 Hz contains all components of short, medium, and long latency, but exceeds these target signals by up to 60 dB. For this reason, the digitization of the analog signal is always preceded by electronic hardware filters whose task is to limit the influence of signal components at irrelevant frequencies. A variable choice of the passband or cutoff frequencies is advantageous: for example, in FAEPs (early AEPs) the low-frequency components are rather undesirable if the goal is to determine the latencies as accurately as possible, but in near-threshold measurements or in the case of high-frequency hearing loss they are useful or even indispensable (as demonstrated in Figure 2 [Fig. 2] and Figure 3 [Fig. 3]).

However, signal filters can be realized not only as analog electronic circuits, but also as digitally programmed software components. These have the advantage that they can be flexibly designed and applied not only online but also offline, i.e. after data acquisition has been completed. Unlike a hardware filter, the original data remain untouched in a posteriori filtering and are available for optional further processing.

The elimination of (unwanted) high-frequency signal components is most easily achieved with three-point smoothing: the k-th value M_k of the curve is replaced by a weighted sum of the original value and its two neighbors, 0.23 · M_{k−1} + 0.54 · M_k + 0.23 · M_{k+1}. This corresponds to a single-pole low-pass filter with a slope of 6 dB per octave. Higher-order digital filters allow steeper slopes but require more complex calculations. The effect of a four-pole bandpass filter (high-pass 300 Hz, low-pass 1,800 Hz) specially optimized for ABRs is shown in Figure 2 [Fig. 2]. This filter is dimensioned so that the phase shifts of high-pass and low-pass compensate each other at the frequency centroid of the FAEP spectrum (at about 700 Hz) [11]. The programming code comprises only a few lines, largely independent of the programming language.
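
To illustrate how little code the three-point smoothing requires, a minimal Python sketch follows. It uses the weights 0.23 / 0.54 / 0.23 given above; leaving the first and last sample unchanged and the optional repeated application (`passes`) are assumptions made here for illustration, and the sketch does not reproduce the four-pole bandpass filter of [11].

```python
def smooth_three_point(samples, passes=1):
    """Apply the 0.23 / 0.54 / 0.23 three-point smoothing to a sequence of samples."""
    data = [float(x) for x in samples]
    for _ in range(passes):
        out = data[:]  # assumption: the two edge samples are left unchanged
        for k in range(1, len(data) - 1):
            out[k] = 0.23 * data[k - 1] + 0.54 * data[k] + 0.23 * data[k + 1]
        data = out
    return data

# A sample-to-sample alternation (highest representable frequency) is strongly attenuated,
# whereas a constant level passes unchanged because the weights sum to 1.
print(smooth_three_point([1, -1, 1, -1, 1]))
print(smooth_three_point([2, 2, 2, 2, 2]))
```

Since the function returns a new list, the original data remain untouched, which corresponds to the a posteriori filtering described above.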

The effect of the same filter realized with different parameters on the SAEP (late AEP) is shown in Figure 4 [Fig. 4]. The cortical responses regularly show a vertex-negative maximum N1 at 100 ms and a minimum P2 at 200 ms, with no relevant dependence on stimulus level. This time structure corresponds fairly closely to a frequency of 5 Hz with little energy at other frequencies. Therefore, narrow-band filtering with a passband of 3 Hz to 7 Hz is possible and beneficial for isolating the physiological signal from the background noise.

Any filter causes amplitude loss in both the wanted and the unwanted signal components; it is the task of filter dimensioning to balance the losses in favor of the useful signal. In Figure 3 [Fig. 3] it is shown that the high-pass filter improves the detectability of the FAEP responses J1 and J3 in the range of medium stimulus levels. On the other hand, for the representation of J5 near threshold, the broadband version is better, since the long-wave components of the curves are lost in the high-pass filtering.

Beyond the filter described here, several realizations of phase-error-free digital filters are available in almost all of today’s measuring systems. It is up to the user to select an option for the reversible post-processing of the measured curves whose properties and dimensioning are appropriate for the respective problem.

When measuring OAE, (digital) filtering reduces the effect of the low frequencies dominating in the acoustical background and thus potentially improves the signal-to-noise ratio. High-pass filtering is implemented in special measurement paradigms (e.g. QuickScreen mode in ILO92, Otodynamics Limited, UK), which lead to better results in case of unfavorable acoustic conditions. In addition to attenuating the low frequency components, the time range of the signal recording is shortened (e.g., 12 instead of 20 ms) because the amplitude of the low frequency responses occurring in the late latency range (from 12 to 20 ms) and generated in the apical regions of the cochlea [4] is reduced by the filter. The combination of the omission of low-frequency and long-latency components improves the signal detection for the high frequency physiological responses of baso-cochlear origin and shortens the examination time.

Averaging of signal polarity

The conventional procedure for reconstructing the physiological response out of the background is based on (linear) averaging of the amplitude of stimulus-synchronously recorded signal segments. An alternative procedure, which has both limitations and merits, consists of averaging the polarity of the signal in place of its amplitude and interpreting the result using binomial statistics [27], [10]. The averaging of polarity is only at first glance a “1-bit averaging”; in fact, each sample contributes to the summation result in one of three possible categories (“negative”, “zero”, and “positive”). The procedure was implemented in many laboratory instruments in the early days of ERA. Only in OAE screening devices has it remained a widely used and essential part of signal evaluation in practice to this day.

The (indirect) contribution of polarity averaging to signal detection is based on its sensitivity to systematic deviations of a signal from the random process. For a randomly determined time signal (free of DC offset), each individual sample has a positive or negative sign with equal probability. The addition of the sign or polarity over many sweeps will therefore statistically result in the expected value 0 at every point of the resulting curve. A deviation from zero which is significant according to the binomial statistics – which for a sufficiently large number of trials (averaging iterations) can be approximated by the Gaussian distribution – indicates that the underlying signal cannot be described by a random process.

The result “zero” in the polarity average means that the total signal composed of signal and noise was positive as often as negative in the totality of the sweeps (or exactly zero in all epochs). Due to DC voltages or signal components of low frequencies, which may still be present despite the high-pass filtering, the reference line can be different from zero. The (vertical) deviation from this line indicates the predominance of one of the two polarities. The special value of polarity averaging is that a perturbation of large amplitude affects amplitude averaging with the weight of its numerical magnitude, whereas in polarity averaging it contributes only a single binary unit. This makes polarity averaging very robust against perturbations compared to conventional amplitude averaging.

If the polarity average after n averages exceeds the limit

[Formula 1]

Equation 1

– with c_p = 1.645 for p = 0.05, c_p = 2.326 for p = 0.01 and c_p = 3.090 for p = 0.001 – a signal significantly deviating from the random process is present at the time of the exceedance with probability (1–p) [22]. If the original signal is strongly noisy, a given significance level is reached only after more averaging steps or at a later time. In contrast to the curve which shows the amplitude average of the EEG sections, the polarity average does not yield information about the size of the response, but about its significance.

If the polarity average is calculated only for one sample within the whole time-dependent signal, it cannot be excluded that its “deviation from chance” is just its property to be a zero point. Therefore, the method unfolds its full potential only if the entire acquired time range is included in the algorithm (see Figure 5 [Fig. 5]). The reference curve is constructed during data acquisition; its visual observation allows the examiner to judge the progress of signal detection and to make a qualified decision about aborting, continuing or terminating the measurement – resembling the method based on the single point variance [7], [9].
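
The principle of polarity averaging can be sketched in Python as follows. The critical values c_p are those quoted above. The limit itself is an assumption here: it is taken as c_p · √n, the Gaussian approximation for a sum of n random signs, and the scaling used in the article's Equation 1 may differ. All function names are illustrative.

```python
import math

C_P = {0.05: 1.645, 0.01: 2.326, 0.001: 3.090}  # critical values quoted in the text

def sign(x):
    """Polarity of a sample: -1, 0 or +1."""
    return (x > 0) - (x < 0)

def polarity_sum(sweeps):
    """Sum the polarity of every sample across all sweeps (equal-length lists)."""
    n_samples = len(sweeps[0])
    return [sum(sign(s[k]) for s in sweeps) for k in range(n_samples)]

def significant_samples(sweeps, p=0.05):
    """Indices whose polarity sum exceeds c_p * sqrt(n) in magnitude.

    Assumption: limit = c_p * sqrt(n). Testing the magnitude covers both
    polarities, so the per-sample error rate is twice the one-sided p.
    """
    limit = C_P[p] * math.sqrt(len(sweeps))
    return [k for k, s in enumerate(polarity_sum(sweeps)) if abs(s) > limit]

# An artifact of huge amplitude contributes only one unit, like any other sweep.
print(polarity_sum([[1e6], [-1.0], [-2.0]]))
```

The last line illustrates the robustness described above: the disturbed sweep with amplitude 10^6 cannot outweigh the two undisturbed sweeps of opposite polarity.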

Reproducibility and signal-to-noise ratio

In many statistical procedures, the rule applies that two parts contain more information than the ensemble composed of these parts. With respect to the averaging of stimulus-correlated signals of biologic origin, this means that the generation of one single curve emerging from all individual responses does not allow an assessment of the reproducibility of the result. However, it requires neither instrumental nor programming effort to consistently organize the data acquisition into two partial averages A(t) and B(t) and to offer the option to calculate and display the overall mean (A+B)/2. The construction of A(t) and B(t) must be done during data acquisition by alternately summing the signal segments registered after the stimuli into one of two buffers. This “quasi-simultaneous” construction of partial averages is very helpful in deciding whether the response is to be judged as “real”, i.e. significantly protruding from the background (Figure 6 [Fig. 6]).

According to [24], the reproducibility r (the correlation coefficient calculated from the curves A(t) and B(t)) is nearly equivalent to the signal-to-noise ratio q; the two are linked by the almost exactly valid equation

[Formula 2]

Equation 2


[Formula 3]

Equation 3

where (s+n) stands for the total signal composed of signal and residual noise and n for the residual noise determined from the (half) difference of the partial averages. The user is free to consider either the reproducibility or the signal-to-noise ratio. In both cases, a suitable time window must be selected for the calculation, which can optionally be shifted along the time axis in order to display the correlation differentially in time (see chapter below and [19], [16]).
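
The alternating construction of the partial averages and the two derived quantities can be sketched in Python as follows. The correlation coefficient is the standard Pearson formula. The ratio of total signal to residual noise is computed here from the rms of (A+B)/2 and (A−B)/2; this is an assumption for illustration and not necessarily the exact form of the article's equations. All names are hypothetical.

```python
import math

def partial_averages(sweeps):
    """Alternately accumulate sweeps into two buffers and return the means A and B."""
    n_samples = len(sweeps[0])
    sums = [[0.0] * n_samples, [0.0] * n_samples]
    counts = [0, 0]
    for i, sweep in enumerate(sweeps):
        buf = i % 2  # even sweeps go to A, odd sweeps to B
        counts[buf] += 1
        for k, value in enumerate(sweep):
            sums[buf][k] += value
    a = [v / counts[0] for v in sums[0]]
    b = [v / counts[1] for v in sums[1]]
    return a, b

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def reproducibility(a, b):
    """Pearson correlation coefficient r of the partial averages."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def total_to_noise_ratio(a, b):
    """rms of (A+B)/2 divided by rms of (A-B)/2 (assumed definition, see lead-in)."""
    total = [(x + y) / 2 for x, y in zip(a, b)]
    noise = [(x - y) / 2 for x, y in zip(a, b)]
    return rms(total) / rms(noise)

a, b = partial_averages([[0, 2, 0, -2], [0, 1, 0, -1], [0, 2, 0, -2], [0, 1, 0, -1]])
print(reproducibility(a, b), total_to_noise_ratio(a, b))
```

Both functions operate on whole traces; restricting them to a time window only requires passing slices of A and B.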

The availability of two partial average time functions A(t) and B(t) further opens the useful option to calculate the cross power spectrum from real part Re and imaginary part Im of the spectra A(f) and B(f):

[Formula 4]

Equation 4

which corresponds to the coherent spectral components common to the partial averages A and B, and the incoherent component

[Formula 5]

Equation 5

which reproduces the spectrum of the residual noise N (FFT = Fast Fourier Transform). The separation of the coherent from the incoherent components is particularly useful for TEOAE and is also common practice there [26].
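
A possible realization of this spectral separation is sketched below in Python. Two assumptions are made and should be checked against Equations 4 and 5: the cross power is taken as Re(A)·Re(B) + Im(A)·Im(B), i.e. the real part of A(f) multiplied by the complex conjugate of B(f), and the noise power as the squared magnitude of the transform of (A−B)/2. A naive discrete Fourier transform stands in for the FFT to keep the example self-contained.

```python
import cmath
import math

def dft(x):
    """Naive DFT for bins 0 .. n/2 (O(n^2)); adequate for short illustrative traces."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n // 2 + 1)]

def coherent_and_noise_spectra(a, b):
    """Return (cross power, noise power) per frequency bin for partial averages a and b.

    Assumed definitions, see lead-in: cross = Re(A)Re(B) + Im(A)Im(B),
    noise = |DFT((a - b) / 2)|^2.
    """
    fa, fb = dft(a), dft(b)
    fn = dft([(x - y) / 2 for x, y in zip(a, b)])
    cross = [p.real * q.real + p.imag * q.imag for p, q in zip(fa, fb)]
    noise = [abs(v) ** 2 for v in fn]
    return cross, noise

# Identical partial averages: all power is coherent, the noise spectrum vanishes.
trace = [math.cos(2 * math.pi * t / 8) for t in range(8)]
cross, noise = coherent_and_noise_spectra(trace, trace)
print([round(c, 6) for c in cross], [round(v, 6) for v in noise])
```

With this definition, components that appear with opposite sign in A and B yield a negative cross power, which marks them as incoherent.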

It seems appropriate to note that ipsi- and contralateral derivations of the AEP are not independent of each other to the same extent as two partial mean curves registered quasi-simultaneously under the same conditions. Their comparison is therefore only of limited use as a control of reproducibility.

Quality measures

The detection of the physiological response hidden in the noise depends decisively on the extent to which it is possible to keep the residual noise as small as possible. When the stimulus intensity approaches the response threshold, the response disappears in the noise – the value zero is assumed only at even lower stimulus levels. Therefore, the detection threshold is always higher than the hearing threshold; the greater the residual interference, the greater the distance between the two thresholds. In the limiting case of very large residual noise, no suprathreshold responses will be detectable.

These considerations make clear that the residual noise σ is a suitable quantity for assessing the quality of AEP or OAE recordings. It is defined as the rms voltage of half the difference of the partial averages A and B:

[Formula 6]

Equation 6

Here, the temporal integration extends over a suitable time window (or the summation over the time series of the assigned samples), e.g. from t1=2 ms to t2=12 ms for the FAEP. The rms voltage is equivalent to the standard deviation of the residual noise.
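
In discrete form, the definition amounts to a few lines. The Python sketch below computes the rms of (A−B)/2 over a time window; the conversion from milliseconds to sample indices by rounding and the inclusion of both window edges are choices made here for illustration.

```python
import math

def residual_noise(a, b, sample_rate_hz, t1_ms, t2_ms):
    """rms of the half difference (A - B) / 2 within the window [t1_ms, t2_ms]."""
    i1 = int(round(t1_ms * sample_rate_hz / 1000.0))
    i2 = int(round(t2_ms * sample_rate_hz / 1000.0))
    half_diff = [(x - y) / 2 for x, y in zip(a[i1:i2 + 1], b[i1:i2 + 1])]
    if not half_diff:
        raise ValueError("time window contains no samples")
    return math.sqrt(sum(d * d for d in half_diff) / len(half_diff))

# Two partial averages that differ by a constant 0.2 give a residual noise of 0.1.
a = [0.1] * 200
b = [-0.1] * 200
print(residual_noise(a, b, sample_rate_hz=10000, t1_ms=2, t2_ms=12))
```

Evaluated repeatedly during data acquisition, the same function can serve as a running quality indicator.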

The standards DIN EN 60645-6 and 60645-7 [5], [6] require the display of an (estimated) quality measure. This requirement is met by many but not all commercial devices. Especially for FAEPs, the graphical combination of the residual noise with the amplitude growth function is useful to facilitate the identification of significant responses. A user-friendly, practical and clear implementation of this concept is shown in Figure 7 [Fig. 7].

The residual noise is a suitable measure for the formation of weighted averages (labeled “GMW” in the table of Figure 7 [Fig. 7]) from parameters that have been determined several times from different recordings. For example, the differences of latencies, especially the cochleo-mesencephalic latency difference t5–t1 (“central conduction time”), are independent of the stimulus level to a good approximation, and therefore a mean value can be calculated from the individual values measured at different stimulus levels, where the individual values are usefully weighted by a factor proportional to the reciprocal of the respective residual noise. This ensures that the parameters derived from recordings with good quality have a greater impact on the final result. The weighting and the differentiation between reliable and less reliable values can be omitted if care is taken to ensure that the residual noise is constant across all recordings.
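
The weighting can be written down directly. In the following Python sketch each value receives a weight proportional to the reciprocal of the residual noise of its recording, as described above; the function name and the numerical values are hypothetical.

```python
def noise_weighted_mean(values, residual_noises):
    """Mean of values, each weighted by 1 / residual noise of its recording."""
    if len(values) != len(residual_noises) or not values:
        raise ValueError("need one residual noise value per measured value")
    weights = [1.0 / s for s in residual_noises]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical latency differences t5-t1 (ms) from two recordings;
# the quieter recording (0.1) counts three times as much as the noisier one (0.3).
print(noise_weighted_mean([4.0, 4.4], [0.1, 0.3]))
```

If all recordings have the same residual noise, the result reduces to the ordinary arithmetic mean, in line with the remark that the weighting can then be omitted.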

In combination with the amplitude of the response, the residual noise is the relevant measure for the significance of the signal detection. Therefore, it is useful to calculate this quantity continuously during data acquisition, to display it to the examiner and to use it optionally as a stop criterion for the completion of the measurement. For transient evoked OAEs (TEOAE), it has been demonstrated that expert identification of the response is successful in 87% of cases when the level of residual noise is below –1.5 dB SPL, but in only 71% of cases when this value is above +1.5 dB SPL [16], [15]. For the “numerical” signal detection based on reproducibility, the corresponding detection rates are 94% and 84%, respectively, each based on expert judgment. For the otoacoustic distortion products (DPOAE), qualitatively quite similar but quantitatively less critical rules are valid since the residual noise is much lower for metrological reasons [18].

Digital superposition

In electrical response audiometry (ERA), a large part of the effort to increase quality and reliability is aimed at objectifying signal detection and determining the response threshold. In addition, however, ERA is also used in clinic and practice for the purpose of obtaining evidence of maturation delays or of space-occupying lesions (such as a vestibular schwannoma or a vascular loop in the internal auditory canal or cerebellopontine angle). Obtaining answers to these questions is also based on the dichotomous and primarily qualitative utilization of the final recording with respect to the identification of the peak J5, but additionally for the detection of J1 and the most accurate quantitative measurement of the latencies of both components.

With rare exceptions, the amplitude of J1 is smaller than that of J5. In addition, the amplitude of J1 is affected to a much greater extent than that of J5 by high-frequency hearing loss, which is present in many of the suspected cases. Therefore, this component, indispensable for the determination of the cochleo-mesencephalic latency difference t5–t1 (“central conduction time”), is not clearly discernible in many cases. The scheme of later offline processing of the data described below serves to highlight more clearly the peak J1 without additional measurements.

Typically, two or more click-evoked FAEP recordings obtained at different stimulus levels, each with a component J5 identified as a significant response, are available as starting material for the application of the method. Among these, the investigator makes a selection for further processing using a special software module, if available. The software corrects for the level dependence of the latencies by shifting each of the selected traces along the time axis according to the respective value of t5. Since the latency difference t5–t1 does not depend on the stimulus level to a good approximation [22], this measure also synchronizes J1. An average curve is calculated from the shifted curves and displayed (“aligned overlay”). The intended and achieved compensation of the level-dependent latency t5 in a systematic way follows the same principle as the elimination of the frequency dependence in the formation of the “stacked derived-band response” [8]. From the resulting “latency-corrected digital superposition”, the desired latency difference t5–t1 can be determined (Figure 8 [Fig. 8]).

In many cases, the outcome of the latency-compensated superposition consists in a clear reconstruction of all peaks from J1 to J5, which allows an unambiguous determination of the diagnostically important latency difference t5–t1. If this is not successful, the examiner formulates the statement “central conduction time cannot be determined” on a much better basis than without this digital evaluation aid.
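
The core of the latency-corrected superposition, shifting by t5 and averaging, can be sketched in Python as follows. Aligning all traces to the earliest t5 and truncating the result to the common overlap are assumptions made for this illustration; the software module used by the author may handle the reference latency and the trace edges differently.

```python
def aligned_overlay(traces, t5_ms, sample_rate_hz):
    """Shift each trace so that its J5 latency matches the earliest t5, then average."""
    ref = min(t5_ms)  # assumption: the shortest latency serves as reference
    shifts = [int(round((t - ref) * sample_rate_hz / 1000.0)) for t in t5_ms]
    length = min(len(tr) - s for tr, s in zip(traces, shifts))
    if length <= 0:
        raise ValueError("traces too short for the required shifts")
    return [sum(tr[s + k] for tr, s in zip(traces, shifts)) / len(traces)
            for k in range(length)]

# Two hypothetical traces sampled at 1 kHz with J5 at 5 ms and 7 ms.
high_level = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
low_level = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(aligned_overlay([high_level, low_level], [5.0, 7.0], 1000))
```

After the shift both peaks coincide at 5 ms, so that the average preserves the full peak amplitude instead of smearing it over two latencies.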

According to general rules, processing several curves improves the signal-to-noise ratio, e.g. by a factor of two with four curves of comparable amplitude. Alternatively, the number of averages could be increased by the same factor, but this results in a correspondingly higher burden on the patient and prolongation of the examination time. The practical benefit of digital superposition cannot be described quantitatively in empirical figures. However, in the author's clinical work it has proven to be a tool used almost daily and highly appreciated for its usefulness.

RMS amplitude as an alternative to amplitude difference

There are good reasons for taking a closer look at a parameter of the FAEPs that is generally rather neglected, namely the amplitude A5 of component J5. In general, it is reasonable to assume that the magnitude of a physiological response is closely related to the strength of the stimulus and to the functionality and vitality of the stimulated biological structure and at best even diagnostically useful. In this context, in view of the inevitable presence of noise, the question of an unambiguous and robust definition of the target quantity deserves attention.

Conventionally, A5 is obtained from the difference of two amplitude values, mostly maximum and minimum (Figure 9 [Fig. 9]). This quantity corresponds to the vertical extent of the peak. The use of this relatively simple linear amplitude difference A5lin is not the result of an optimization in terms of accuracy or robustness, but rather due to the fact that its calculation requires little effort. In technical signal processing, however, it is common practice to consider the rms amplitude rather than the difference of two individual values for the quantitative description of time-dependent signals, especially in the case of irregular signal characteristics. One of the reasons for this is that the effective amplitude is more likely to reflect the relevant dimension, namely the physical power contained in the signal. This is the case with the linear amplitude difference if and only if all changes of the component J5 caused by e.g. stimulus parameters or pathological processes are purely linear scale transformations (without deformations or distortions such as flattening).

The rms amplitude A5eff is defined as the root of the continuously or discretely calculated mean square deviation of the measured voltage U(t) from its mean value Ū:

[Formula 7]

Equation 7

The quantity U(t) is the mean of the EEG voltage calculated from the partial averages A(t) and B(t), and Ū is the mean amplitude of this curve in the time interval of integration or summation, which extends from t1 = t5⁺ − Δt/2 to t2 = t5⁻ + Δt/2; position and duration of this time interval are derived from the latencies t5⁺ and t5⁻ of maximum and minimum (see Figure 9 [Fig. 9]).

The calculation of the effective amplitude A5eff does not involve any additional effort for the examiner. The only intervention is the identification of the maximum and minimum of the component J5, which is customary anyway. From the latencies t5⁺ and t5⁻ thus determined, the time range of integration or summation is obtained as twice the horizontal distance Δt between peak and trough. This choice of the time window ensures that the entire relevant signal section is covered.
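
A discrete version of this calculation is sketched below in Python. The window limits follow the definition given above, with Δt taken as the distance between the latencies of maximum and minimum; the sketch assumes that the minimum follows the maximum, and the rounding to sample indices is a choice made here for illustration.

```python
import math

def rms_amplitude(u, sample_rate_hz, t_peak_ms, t_trough_ms):
    """rms deviation of U(t) from its mean in the window around component J5."""
    dt = t_trough_ms - t_peak_ms  # assumption: the trough follows the peak
    if dt <= 0:
        raise ValueError("trough latency must exceed peak latency")
    t1 = t_peak_ms - dt / 2
    t2 = t_trough_ms + dt / 2
    i1 = max(0, int(round(t1 * sample_rate_hz / 1000.0)))
    i2 = min(len(u) - 1, int(round(t2 * sample_rate_hz / 1000.0)))
    window = u[i1:i2 + 1]
    mean = sum(window) / len(window)
    return math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))

# Hypothetical trace: one cosine cycle with peak at 5 ms and trough at 6 ms (10 kHz sampling).
u = [math.cos(2 * math.pi * (k - 50) / 20) for k in range(100)]
print(rms_amplitude(u, 10000, 5.0, 6.0))
```

For this idealized waveform the result is close to 0.7, i.e. roughly the peak amplitude divided by √2, whereas the linear amplitude difference would be 2.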

The rms amplitude is not the area enclosed by the potential curve and the zero line, but the root mean square (RMS) of the time-dependent amplitude. It is therefore a statistically defined quantity which, except for the time window, is defined in exactly the same way as the residual noise of the recording defined above (and can therefore be directly compared with it). In contrast to the linear amplitude difference, this parameter is derived from many more than two samples; it is therefore less susceptible to variations in the background and is thus a more accurate measure of signal strength.

The rms amplitude can be used as a signal descriptor even if no response is present. For this purpose, in recordings obtained with stimulus levels below the response threshold, the time window for the calculation of A5eff is determined manually or automatically by extrapolating the exponential latency characteristic [12]. At low stimulus levels, the rms amplitude is distributed randomly around the background level; from the response threshold upward, its value increases systematically with increasing stimulus level L (Figure 10 [Fig. 10]). A kink in the amplitude function A5eff(L) marks the response threshold. It is almost trivial to state that a reliable determination of the threshold requires measured values below as well as above the threshold [13]. In addition to the kink in the graph identified visually by the investigator, a numerical criterion identifies responses that significantly exceed the noise level: A5eff must be at least one standard deviation larger than the residual noise σ, which is itself identical to this standard deviation [14].
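
Read literally, the numerical criterion requires A5eff ≥ σ + σ = 2σ. The following Python sketch applies this reading to a series of recordings; both the factor of two and the rule of reporting the lowest level with a significant response are interpretations made here, and the numbers are invented for illustration.

```python
def is_significant(a5_eff, sigma):
    """Criterion as read from the text: A5eff exceeds sigma by at least one sigma."""
    return a5_eff >= 2.0 * sigma

def lowest_significant_level(levels_db, a5_eff, sigma):
    """Lowest stimulus level whose rms amplitude meets the criterion, or None."""
    hits = [lvl for lvl, a, s in zip(levels_db, a5_eff, sigma) if is_significant(a, s)]
    return min(hits) if hits else None

# Hypothetical series: amplitudes and residual noise in microvolts.
levels = [10, 20, 30, 40]
amplitudes = [0.03, 0.04, 0.12, 0.25]
noise = [0.03, 0.03, 0.03, 0.03]
print(lowest_significant_level(levels, amplitudes, noise))
```

In this invented series the values at 10 and 20 dB scatter around the background level, and the criterion is first met at 30 dB.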

The concept of effective amplitude opens up the possibility of adding a useful option to electrical response audiometry. Through its application, the classification as “clear response” is not solely “in the eye of the beholder”, but is based on a precisely defined numerical criterion. This leads to a considerable increase in the reliability of the objective response threshold.

Time differential correlation analysis (gliding reproducibility)

Most of the physiological responses registered in objective audiometry are only present in a limited fraction of the time window. Nevertheless, the correlation coefficient or signal-to-noise ratio (often referred to as “repro”) is usually calculated from the data in the entire time window. This can lead to a situation where there is little or no correlation between the result of this calculation and the response. In the worst case, a response can be overlooked, but in any case, considering integral reproducibility alone at least does not exhaust the potential of the parameter. However, with the help of the local, temporally differential evaluation of individual sections of the time window, even responses that are limited in time and variable in their position (latency) do not escape detection.

Figure 11 [Fig. 11] shows the result of a TEOAE measurement which, according to the visual evaluation, contains a clear physiological response. However, since the duration of this response is limited to the first part of the time window, the integrally calculated reproducibility is just 48.2%. According to the usual rules, this leads to the classification as “OAE-negative”, although the limit of 60% is consistently exceeded in the interval 3 to 9 ms – with a peak value of 94.9%. The gliding time-differential calculation of the correlation coefficient (gliding reproducibility) enables the unambiguous OAE detection [16].
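
The gliding calculation differs from the integral one only in that the correlation coefficient is evaluated in a window that is moved along the time axis. A minimal Python sketch, with window length and step size as free parameters chosen by the user (the article does not prescribe them here):

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either segment is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)

def gliding_reproducibility(a, b, window, step=1):
    """Correlation of the partial averages a and b in a window moved along the trace."""
    if window < 2 or window > len(a):
        raise ValueError("window must span between 2 and len(a) samples")
    return [pearson(a[i:i + window], b[i:i + window])
            for i in range(0, len(a) - window + 1, step)]

# Synthetic traces: identical in the first half, in antiphase in the second half.
a = [0, 1, 0, -1, 0, 1, 0, -1, 1, -1, 1, -1, 1, -1, 1, -1]
b = [0, 1, 0, -1, 0, 1, 0, -1, -1, 1, -1, 1, -1, 1, -1, 1]
print(pearson(a, b), gliding_reproducibility(a, b, window=8, step=8))
```

In this synthetic case the integral correlation is negative although the first half of the window is perfectly reproducible, which mirrors the situation of a response confined to the early part of the recording.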

For the evaluation by the user, the quantitative graphical representation of the reproducibility in an additional coordinate system is not optimal. More intuitive and more directly graspable for the eye of the observer than the two-dimensional graph is the coding shown in Figure 12 [Fig. 12], in which the local correlation coefficient is presented in graded shades of gray [19]. The degree of blackening varies from white (correlation coefficient r≤0%) to black (r=100%). Since the reproducibility as a correlation coefficient is a statistical quantity calculated from many numerical values, the shorter the underlying time window, the more it is subject to the influence of chance. This explains why large contrasts can occur in the degree of shading of neighboring fields.

The example shown in Figure 12 [Fig. 12] makes clear that detectable and normal OAE are not the same and that the distinction between the two is definitely possible and also clinically meaningful. The global reproducibility is the basis of the purely dichotomous signal detection, thus it does not say more than “signal detection was successful”. On the other hand, its temporally differential observation allows differentiated statements about deviations from the normal appearance of the response and thus about possible pathological causes. Also in the frequency domain, the differential consideration of the spectrum and the coding of the local signal-to-noise ratio in a gray scale extends the information yield. Visual detection of empty time regions or frequency bands, aided by graphical processing, facilitates the derivation of clues to the frequencies affected by possible hearing loss.

Among the physiological signals used in objective audiometry, TEOAEs occupy a special position in that the response normally extends over the entire duration of the registered time window. In contrast, for transient AEPs (e.g., FAEPs and SAEPs), the response occupies only a limited portion of the time window and its location within this window is variable because the latency depends on stimulus parameters, physiological conditions, and pathological changes. It is in this situation that the display of temporally differential correlation, similar to the polarity average, contributes to the identification of the response and to the assessment of its significance (Figure 13 [Fig. 13]).

The correlation considered in this and the previous sections emerges from the amplitude of the curves involved and their time course. In addition, the correlation of the slope of the curves can be determined from the first time derivative of the curves. The interest in this arises from the fact that the latencies of the individual response components, which are so important for diagnosis, are characterized as local maxima and minima by a horizontal tangent. Thus, the slope must be equal to zero in both partial average curves. This equality is accompanied by an intensive blackening of the gray scale display (Figure 13 [Fig. 13]). If the maxima are offset in time from each other, the display is more gray than black and the latency determination is less reliable.
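As an illustration of the slope correlation, the following sketch differentiates both partial averages numerically and correlates the derivatives. The central-difference scheme and the function names are assumptions chosen for simplicity; each curve needs at least two samples.

```python
import math

def slope(curve, dt=1.0):
    """First time derivative by central differences (one-sided at both ends)."""
    n = len(curve)
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append((curve[hi] - curve[lo]) / ((hi - lo) * dt))
    return derivative

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def slope_correlation(buf_a, buf_b, dt=1.0):
    """Correlation coefficient of the slopes of two partial averages."""
    return pearson(slope(buf_a, dt), slope(buf_b, dt))
```

Because only the derivatives enter, a constant offset between the two partial averages has no influence on the result.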

Amplitude histograms

Detecting the signal against the background of residual noise is the key to the use of OAEs and AEPs. Any feature that helps to distinguish between signal and noise is a useful tool for this task. These features include, first, the elementary parameters of frequency, phase, and amplitude, by which all time-dependent processes are fully described. However, each of these quantities not only has an instantaneous value, which enters, for example, into the polarity average and the correlation coefficient, but also follows a statistical distribution, which is characteristic and different for deterministic and stochastic processes. These frequency distribution densities or histograms of the mentioned parameters are specific to the process underlying the generation of the signal. In general, signal and noise will have different distribution densities. When signal and noise occur as a mixture, their specific distribution functions are superimposed to form an integral histogram.

The statistical distribution density contains the answer to the question, with which frequency the individual values of the considered parameter occur. Among the parameters whose histograms could be useful, only amplitude is considered here, although frequency and phase spectra also play a role in statistically based signal detection [3], [2].

The amplitude distribution of a stochastic process is known to have the shape of a Gaussian curve: the value zero occurs most frequently, and with increasing amplitude the probability of occurrence becomes smaller and smaller. The amplitude distribution of a harmonic oscillation is less well known: because the sine wave is flat at its extreme values and steep in between, the distribution has the shape of a trough with two poles (infinities or singularities) at the extreme values. In graphical terms, this is due to the fact that the probability density along the vertical (amplitude) axis is inversely proportional to the magnitude of the slope if the samples of electric voltage are equidistantly distributed in the horizontal dimension (time axis).
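The trough shape follows directly from this argument. For a sine wave x(t) = A sin(ωt) with period T = 2π/ω that is sampled uniformly in time, each amplitude value is passed twice per period, so that the density is twice the inverse of the magnitude of the slope, normalized to the period (the symbols A, ω and T are introduced here only for this illustration):

$$p(x) = \frac{2}{T}\,\frac{1}{\left|\mathrm{d}x/\mathrm{d}t\right|} = \frac{2}{T}\,\frac{1}{A\,\omega\,\left|\cos \omega t\right|} = \frac{1}{\pi\sqrt{A^{2}-x^{2}}}, \qquad |x| < A$$

The density has its minimum 1/(πA) at x = 0 and diverges at x = ±A, which are the two poles of the trough.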

This observation makes clear that the amplitude distribution density of the additive superposition of a stochastic process and a sinusoidal signal differs in a characteristic way from the bell-shaped curve of the random distribution: it shows peaks at large absolute values of the amplitude (Figure 14 [Fig. 14]). The height of these peaks at the margins of the distribution depends on the signal-to-noise ratio; in the case of isolated residual noise, the peaks disappear altogether.
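The effect can be reproduced with synthetic data. In the following sketch, all numerical values (sine amplitude 1, noise standard deviation 0.2, 5 Hz at an assumed sampling rate of 1000 Hz, nine bins) are arbitrary choices for illustration and do not stem from measured data.

```python
import math
import random

def amplitude_histogram(samples, n_bins, limit):
    """Count samples in n_bins equal bins spanning [-limit, +limit)."""
    counts = [0] * n_bins
    width = 2 * limit / n_bins
    for x in samples:
        if -limit <= x < limit:
            index = min(int((x + limit) / width), n_bins - 1)  # guard against rounding
            counts[index] += 1
    return counts

rng = random.Random(1)  # fixed seed so that the example is repeatable
n = 20000
noise = [rng.gauss(0.0, 0.2) for _ in range(n)]
mixture = [math.sin(2 * math.pi * 5 * i / 1000) + rng.gauss(0.0, 0.2) for i in range(n)]

hist_noise = amplitude_histogram(noise, 9, 1.8)      # bell shape, maximum in the centre bin
hist_mixture = amplitude_histogram(mixture, 9, 1.8)  # two maxima near the sine amplitude
```

For the pure noise, the centre bin is the most populated one. For the mixture, the bins near the positive and negative sine amplitude contain more samples than the centre bin.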

In clinical practice, the observation of the amplitude histograms has proved particularly useful for the N1–P2 complex of the SAEPs, since here an evoked signal is present that resembles a pure harmonic oscillation (corresponding to the time interval of 100 ms between maximum N1 and minimum P2, the frequency of this oscillation is approx. 5 Hz). The comparison of the frequency distribution of the sum of signal and noise with that of the isolated residual noise makes the presence of a response convincingly clear (Figure 15 [Fig. 15]). Beyond a single prototype realized so far in the author's laboratory, optimization of this tool is certainly still possible, for example with respect to the length of the analyzed time window. With increasing length, the influence of the noise increases and the difference in the amplitude histograms loses clarity.

At present, the practical use of the amplitude histograms is limited to the observation and evaluation of their graphical representation; a quantitative description of the benefit for diagnostic certainty is not possible. Of course, the deviation of the positive response from the normal distribution that is typical for pure noise can be described with the help of statistical tests; however, the visualization better meets the needs of the users, who are supported in this way by a machine and without the additional effort of judging numerical parameters.
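One simple numerical descriptor of the deviation from the normal distribution is the excess kurtosis. It is used here only as an example of a possible statistic and is not a method proposed in this review: it is close to 0 for Gaussian noise and equals -1.5 for a pure sine wave.

```python
def excess_kurtosis(samples):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0
```

A clearly negative value in the analyzed time window would thus point to a trough-shaped contribution in the amplitude histogram. A formal test would additionally have to account for the sampling variability of the statistic.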


For some of the commercially available devices for measuring OAEs or AEPs that are in practical use, the final presentation of the examination results could still be improved in terms of expediency and ergonomics. It is not expedient and even less ergonomic to follow the precept of completeness, which may be quite appropriate in many other areas of life. Even if it is undoubtedly correct to document all parameters concerning the stimulation, the signal acquisition, the measurement conditions and the results, the completeness of the information on the (hard copy) report intended for the patient's file or for passing on to colleagues is rather disadvantageous than purposeful. A rationally determined and purpose-oriented documentation should be limited to the information with diagnostic or therapeutic consequences. The amplification factor or the filter limits are certainly not among them; since their setting values are changed only for rare special questions, they need not be indicated on the daily report.

Just like the overloading with irrelevant parameters, the omission of substantial data is one of the frequently encountered deficiencies. The depth of anaesthesia, which may be fundamentally important for interpreting the measurement result, and a quantity describing the residual noise, which is indispensable according to the DIN EN 60645-6 and 60645-7 standards [5], [6], cannot be found at all in many measurement systems, or only after a lengthy search and only on the screen. Avoiding these deficits was the goal of the development of a software environment at the Audiology Laboratory of the Heidelberg University ENT Clinic, with the help of which the results of OAE and AEP measurements can be displayed completely and compactly at the same time.

For the FAEPs, the compilation of graphical displays and numerical data, which has been optimized over decades, is presented here in more detail (Figure 16 [Fig. 16]). The advantage for the viewer is the presentation of primary and secondary examination results at a glance – without the need to turn pages or to jump between different screen windows.

With special emphasis we would like to advocate the display of two partial averages for each take. In the usually shown total average, a peak can practically always be found at the points where the evaluator looks for it. Only by considering the partial averages is a decision about the reliability of this maximum possible. If necessary, the two parts can easily be fused to form the total average, but the benefit is merely an aesthetic enhancement, at the cost of a loss of information.
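How two partial averages and the total average relate to each other can be shown with a short sketch. The alternating assignment of sweeps to the two buffers is an assumption made here; the text does not prescribe how the sweeps are distributed.

```python
def partial_averages(sweeps):
    """Average even- and odd-numbered sweeps separately (buffers A and B).

    sweeps: list of equally long sample lists, at least two sweeps.
    The total average is the mean of A and B, which equals the mean over
    all sweeps when both buffers contain the same number of sweeps.
    """
    def mean_curve(group):
        return [sum(column) / len(group) for column in zip(*group)]

    buf_a = mean_curve(sweeps[0::2])
    buf_b = mean_curve(sweeps[1::2])
    total = [(a + b) / 2 for a, b in zip(buf_a, buf_b)]
    return buf_a, buf_b, total
```

The total average can always be computed from the two buffers, whereas the buffers cannot be recovered from the total average. This is the loss of information mentioned above.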

In principle, all components of the documentation can already be displayed during data acquisition. After completion of the measurement series, evaluations for finding the response threshold can follow, as described in more detail in the next section.

Threshold determination

Many of the procedures and tools described in this review aim to reliably identify the response and thus to contribute to the determination of the response threshold. The audiological interest in the response threshold – i.e. the lowest stimulus level at which a response is detectable – is based on the close relationship between this quantity and the hearing threshold as the target quantity actually relevant for hearing diagnosis. Hearing threshold and response threshold are not identical, but they can be very close to each other. The hearing threshold is generally below the response threshold, because the auditory system is definitely activated if a response has been detected.

The distance between auditory and response thresholds is variable and depends on many factors (Figure 17 [Fig. 17]). The starting point for a closer look is the undoubtedly correct observation that the disappearance of the response is never observed, but only its sinking into the noise. To be recognized with sufficient certainty as a significant signal, the (effective) amplitude of the response must exceed the (effective) amplitude of the residual noise by a specified minimum amount. In signal processing, it is common practice to require a margin of 6 dB. This amount is distinguished by the fact that it corresponds to a factor of two of the amplitudes of signal and noise – or somewhat more precisely: the effective amplitude N of the noise increased by the signal amplitude S is at least twice as large as N alone, or (S+N)≥2N.
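Written out with the symbols S and N used above, the 6 dB criterion reads as follows. The decibel value follows from the usual amplitude definition of the level:

$$20\,\log_{10}\frac{S+N}{N} \;\geq\; 20\,\log_{10} 2 \approx 6.02\ \mathrm{dB} \quad\Longleftrightarrow\quad S \geq N$$

In this formulation, the response is therefore regarded as detected as soon as its effective amplitude is at least as large as that of the residual noise.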

Responses whose amplitude falls below the mentioned limit escape detection. Above the response threshold L1, the amplitude of the signal increases with the stimulus level; below L1, it continues to decrease until it reaches zero. The zero crossing L0 can be determined approximately by (linear) extrapolation of the suprathreshold measured amplitude characteristic or growth function (AGF). Its (horizontal) distance ΔL1 from the response threshold depends on the strength of the residual noise, on the absolute amplitude of the response, on the slope of its growth function, and on the nature of any pathological event that may be present (e.g., recruitment or loss of neuronal synchronization). Also, the stimulus level associated with L0 is generally still above the (physiological) auditory threshold LHS. The distance between the thresholds LHS and L0 depends, among other things, on the type and origin of the signal to be detected (e.g., it is very large in the case of the stapedius reflex).
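The linear extrapolation of the amplitude growth function can be sketched as an ordinary least-squares fit. The function name and the example values are hypothetical; the sketch assumes that the suprathreshold amplitudes grow approximately linearly with the stimulus level.

```python
def zero_crossing(levels_db, amplitudes):
    """Fit a straight line to (level, amplitude) pairs by least squares.

    Returns (L0, g): the level at which the fitted line crosses zero
    and the slope of the line in amplitude units per dB.
    """
    n = len(levels_db)
    mean_l = sum(levels_db) / n
    mean_a = sum(amplitudes) / n
    sxx = sum((l - mean_l) ** 2 for l in levels_db)
    sxy = sum((l - mean_l) * (a - mean_a) for l, a in zip(levels_db, amplitudes))
    g = sxy / sxx
    intercept = mean_a - g * mean_l
    return -intercept / g, g

# Hypothetical amplitudes in nV measured at four stimulus levels in dB
l0, g = zero_crossing([40, 50, 60, 70], [200, 300, 400, 500])
```

For these example values, the fitted line has a slope of 10 nV/dB and crosses zero at 20 dB.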

The impact of noise amplitude, slope of the growth function, and number of averages on the response threshold can be described by an exact mathematical expression using simple reasoning and based on realistic assumptions:

$$L_1 = L_0 + \frac{A_N(m=1)}{g\,\sqrt{m}}$$

Equation 8


  • L1 designates the response threshold,
  • L0 is the zero crossing of the amplitude growth function,
  • AN(m=1) designates the (effective) amplitude of the unaveraged noise,
  • g is the slope of the (linearly approximated) amplitude growth function (for FAEPs typically 10 nV/dB) and
  • m is the number of averages.

If only the effect of the number m of averages is considered in the given equation, it can be seen that the response threshold L1 approaches the zero crossing L0 more and more closely as m increases. However, the two quantities agree with each other only asymptotically (i.e., for m → ∞).

The other parameters appearing in the equation have the following effect (Figure 18 [Fig. 18]):

  • The stronger the (unaveraged) noise AN (m=1), the larger the discrepancy between L1 and L0.
  • The steeper the amplitude characteristic curve (large slope g), the smaller this deviation.
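The dependencies listed above can be tried out numerically. The sketch assumes that Equation 8 has the form L1 = L0 + AN(m=1) / (g·√m), which is consistent with the 6 dB criterion and with the √2 rule for doubling the number of averages; the noise amplitude used in the example is a hypothetical value.

```python
import math

def response_threshold(l0_db, noise_unaveraged, slope_per_db, m):
    """Observed response threshold L1 under the assumed form of Equation 8.

    noise_unaveraged and slope_per_db must use the same amplitude unit
    (for example nV and nV/dB).
    """
    return l0_db + noise_unaveraged / (slope_per_db * math.sqrt(m))

# Hypothetical example: L0 = 20 dB, unaveraged noise 10000 nV, g = 10 nV/dB
l1_2500 = response_threshold(20.0, 10000.0, 10.0, 2500)    # 40.0 dB
l1_10000 = response_threshold(20.0, 10000.0, 10.0, 10000)  # 30.0 dB
```

Quadrupling the number of averages halves the distance between L1 and L0 in this example, from 20 dB to 10 dB.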

Under unfavorable but realistic conditions, the supposed response threshold L1 may deviate from the real response threshold L0 by up to 30 dB. In practice, the reliability of the threshold estimate can be assessed by graphically displaying the obtained data set several times after completion of the measurement series (consisting of several recordings performed at different stimulus levels), for example for the averaging numbers m, m/2, m/4, and m/8 (this assumes that either all individual sweeps or at least the intermediate averages were stored during data acquisition). The investigator then determines the visual response threshold from each of the four series, compares the values with each other, and derives an estimate for the “true threshold”.
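If all individual sweeps have been stored, the averages for m, m/2, m/4, and m/8 can be computed afterwards. The following sketch simply uses the first sweeps of the series for the reduced averages, which is one of several conceivable choices.

```python
def sub_averages(sweeps, divisors=(1, 2, 4, 8)):
    """Average the first m // d stored sweeps for each divisor d.

    sweeps: list of equally long sample lists (m sweeps in total).
    Returns a dictionary {number_of_sweeps: averaged_curve}.
    """
    m = len(sweeps)
    n_samples = len(sweeps[0])
    result = {}
    for d in divisors:
        count = m // d
        if count == 0:
            continue  # too few sweeps for this divisor
        result[count] = [
            sum(sweep[i] for sweep in sweeps[:count]) / count
            for i in range(n_samples)
        ]
    return result
```

Repeating this for every stimulus level of the series yields the four data sets from which the visual response thresholds are read.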

According to theory, the difference L1–L0 between the supposed and real threshold decreases by a factor of √2 if the number of averages is doubled:

$$L_1(2m) - L_0 = \frac{L_1(m) - L_0}{\sqrt{2}}$$

Equation 9.

This equation can be solved for L0 and yields the following expression for the “true threshold”:

$$L_0 = \frac{\sqrt{2}\,L_1(2m) - L_1(m)}{\sqrt{2} - 1}$$

Equation 10.

For example, if the observed response threshold L1 drops from 40 to 30 dB when the averaging number is doubled, then the true response threshold L0 is approximately 6 dB.
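The numerical example can be checked with a few lines of code. The sketch assumes only the rule stated above, namely that the distance L1–L0 shrinks by a factor of √2 when the number of averages is doubled; the function name is chosen for illustration.

```python
import math

def true_threshold(l1_m, l1_2m):
    """Solve (L1(m) - L0) = sqrt(2) * (L1(2m) - L0) for L0."""
    root2 = math.sqrt(2.0)
    return (root2 * l1_2m - l1_m) / (root2 - 1.0)

print(round(true_threshold(40.0, 30.0), 1))  # 5.9
```

If doubling the number of averages does not change the observed threshold, the function returns the observed value itself, because no further contribution of the residual noise is then indicated.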

Unfortunately, no statement can be made about the practical usefulness of the described approach, since no measuring device with a corresponding implementation exists yet. It may be assumed that the described rules are only approximately valid in practice for at least two reasons: First, the considerations are based on the assumption of a steady-state EEG noise, which only a few patients are willing and able to produce, and second, the stimulus levels within a measurement series are usually varied in a grid with steps of 10 dB, which limits the accuracy of the results. In general, the relevance of the threshold correction decreases as the slope of the amplitude growth function increases. For very efficient stimuli like the CE chirp, it is probably dispensable.

Correcting the observed response threshold for the influence of residual noise does not relieve the investigator of the necessity to derive the hearing threshold from the response threshold. As shown in Figure 17 [Fig. 17], the distance between these two quantities is composed of two contributions ΔL1 and ΔL2. Since the contribution ΔL1 is now under control, the only unknown remaining is the difference ΔL2. This term accounts for the fact that even under ideal conditions (free of interference), not every stimulation that leads to a perception of the stimulus is accompanied by a measurable response of the respective sensory, synaptic, or neuronal target structure. For example, with electrocochleography (ECochG), the compound action potential (CAP) is detectable virtually directly at the threshold of hearing, whereas the cochlear microphonics (CM) are not visible until stimulus levels which exceed the hearing threshold by more than 50 dB [22]. For the consideration of effects of this kind, the only practicable procedure is the application of an empirically justified method-specific correction. However, once the insufficient noise suppression has been compensated for, these corrections will be smaller and the accuracy will increase.

Conclusion and outlook

The devices offered today for practical use for the measurement of OAEs and AEPs undoubtedly have a high technical level, but they still do not exhaust the possibilities of signal processing. From the technical limitations in the pioneering days of electrical response audiometry, which date back about half a century, relics have been preserved which have lost their raison d’être today and have been partly adopted for the measurement of OAEs. A prominent example of unused technical resources is the discarding of signal sections with undesirably large amplitude (“artifacts”) immediately after their registration. Today, the availability of storage space and computational capacity makes it possible to store all signal segments (sweeps) until the completion of data acquisition in order to sort them later and to be able to discard those sweeps in which the damage caused by the unfavorable signal-to-noise ratio outweighs the benefit of an additionally recorded physiological response [28]. It is generally known, even in early life, that only at the end of the school year, when all school grades are available, is there a reliable basis for deciding which of the individual assessments should be excluded when calculating the average grade. The feasibility of sorted averaging and the resulting benefit have already been demonstrated [29].
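The principle of sorting the stored sweeps before averaging can be illustrated as follows. This is a simplified sketch under stated assumptions, not the algorithm of [28]: each sweep is ranked by its mean power, the residual noise of the average of the n quietest sweeps is estimated as the square root of their summed power divided by n, and the n that minimizes this estimate is used.

```python
import math

def sorted_average(sweeps):
    """Average only the quietest sweeps, chosen to minimize the estimated noise.

    sweeps: list of equally long sample lists.
    Returns (averaged_curve, number_of_sweeps_used).
    """
    def power(sweep):
        return sum(x * x for x in sweep) / len(sweep)

    ranked = sorted(sweeps, key=power)  # quietest sweeps first
    best_n, best_noise, cumulative_power = 0, math.inf, 0.0
    for n, sweep in enumerate(ranked, start=1):
        cumulative_power += power(sweep)
        noise = math.sqrt(cumulative_power) / n  # estimated residual noise of the mean
        if noise < best_noise:
            best_n, best_noise = n, noise
    kept = ranked[:best_n]
    average = [sum(column) / best_n for column in zip(*kept)]
    return average, best_n
```

A sweep is thus included only as long as its contribution lowers the estimated residual noise of the average. A strongly disturbed sweep at the end of the ranking raises the estimate and is left out.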

The starting point for the present review was the conviction that the use of all signal processing options leads to a reduction of the certainly very high number of negative AEP or OAE results that cannot be attributed to pathology. Many of the tools presented in this overview are well-established basic instruments, the use of which should be – but is not – self-evident in the detection of small and noisy signals. Other approaches are new and so far only tentatively tested, so their benefit cannot be quantified. It seems appropriate to note that the algorithms of performance-optimized signal processing run automatically in the background and do not require any additional knowledge or intervention from the examiner.

Only the derivation of the auditory threshold from the response threshold requires a minimal intervention by the user. However, the effort associated with the correction of the residual noise does not significantly exceed that of the previous application of questionable “extrapolation rules”. Since the majority of objective hearing tests performed in practice serve to determine the hearing threshold, and because incorrect determination of the threshold can have serious consequences particularly in children [23], this effort is certainly justified.

It will certainly also help to increase the quality of objective audiometry if manufacturers and users pay increased attention to the relevant standards [5] and [6] and recommendations [1], [21], [25]. Finally, it should be noted that the lack of a uniform nomenclature for the methods and parameters unnecessarily complicates their use [20]. In this respect, as in the more in-depth study of the contents of the article now coming to an end, the manufacturers are certainly challenged to a greater extent than the users of their products.


Competing interests

The author declares that he has no competing interests.

Citation reference

This article is also available in German:

Hoth S. Leistungsoptimierte Signalverarbeitung in der objektiven Audiometrie – Digitale Werkzeuge für die effiziente Messung und Nutzung von AEP und OAE. GMS Z Audiol (Audiol Acoust). 2022;4:Doc06.
DOI: 10.3205/zaud000024


The author’s ORCID ID is: 0000-0002-3405-9751


References

[1] ADANO, Hoth S. Empfehlungen der Arbeitsgemeinschaft Deutschsprachiger Audiologen und Neurootologen (ADANO) zur Durchführung der Elektrischen Reaktions-Audiometrie mit transienten Potentialen. 2006.
[2] Aoyagi M, Suzuki Y, Yokota M, Furuse H, Watanabe T, Ito T. Reliability of 80-Hz amplitude-modulation-following response detected by phase coherence. Audiol Neurootol. 1999 Jan-Feb;4(1):28-37. DOI: 10.1159/000013817
[3] Beagley HA, Sayers BM, Ross AJ. Fully objective ERA by phase spectral analysis. Acta Otolaryngol. 1979 Mar-Apr;87(3-4):270-8. DOI: 10.3109/00016487909126420
[4] Böhnke F, Janssen T, Steinhoff HJ. Zeit-Frequenz-Darstellung evozierter otoakustischer Emissionen zur Diagnose kochleärer Funktionsstörungen. Otorhinolaryngol Nova. 1992;2(2):80-4. DOI: 10.1159/000312825
[5] DKE/GUK 821.6 - Hörgeräte, Audiometer und Kuppler. DIN EN 60645-6:2010-08: Akustik – Audiometer – Teil 6: Geräte zur Messung von otoakustischen Emissionen (IEC 60645-6:2009); Deutsche Fassung EN 60645-6:2010. Berlin: Beuth; 2010.
[6] DKE/GUK 821.6 - Hörgeräte, Audiometer und Kuppler. DIN EN 60645-7:2010-08: Akustik – Audiometer – Teil 7: Geräte zur Messung von akustisch evozierten Hirnstammpotentialen (IEC 60645-7:2009); Deutsche Fassung EN 60645-7:2010. Berlin: Beuth; 2010.
[7] Don M, Elberling C, Waring M. Objective detection of averaged auditory brainstem responses. Scand Audiol. 1984;13(4):219-28. DOI: 10.3109/01050398409042130
[8] Don M, Masuda A, Nelson R, Brackmann D. Successful detection of small acoustic tumors using the stacked derived-band auditory brain stem response amplitude. Am J Otol. 1997 Sep;18(5):608-21.
[9] Elberling C, Don M. Quality estimation of averaged auditory brainstem responses. Scand Audiol. 1984;13(3):187-97. DOI: 10.3109/01050398409043059
[10] Hönerloh HJ, Kletti J. Ein Verfahren zur Objektivierung von ERA-Messungen. Laryngol Rhinol Otol (Stuttg). 1981 Apr;60(4):178-80.
[11] Hönerloh HJ, Kletti J. Filterung und Glättung von ERA-Potentialen. Arch Otorhinolaryngol. 1978 Sep 28;221(2):135-41.
[12] Hoth S. Die Kategorisierung von Hörstörungen anhand der Latenzabweichung in der BERA. Laryngol Rhinol Otol. 1987 Dec;66(12):655-60.
[13] Hoth S. Die Schwelle in der Begriffswelt des Audiologen. Z Audiol. 2009;48(1):46-9.
[14] Hoth S. Ein alternatives Maß für die Amplitude elektrophysiologischer Reizantworten. Z Audiol. 2017;56(4):140-6.
[15] Hoth S. OAE in HD-Qualität. Omnimed Forum HNO. 2020;22(4):196-200.
[16] Hoth S. OAE in Klinik und Praxis – Neue Bewertung etablierter Regeln. Z Audiol. 2019;58(4):148-51.
[17] Hoth S. Reliability of latency and amplitude values of auditory-evoked potentials. Audiology. 1986;25(4-5):248-57. DOI: 10.3109/00206098609078390
[18] Hoth S. Warum sind TEOAE und DPOAE gegenüber cochleären Funktionsdefiziten unterschiedlich empfindlich? Z Audiol. 2003;42(2):48-50.
[19] Hoth S. Zeitlich differentielle Analyse des Korrelationskoeffizienten: Eine Bereicherung bei der Auswertung von akustisch evozierten Potentialen. Audiol Akustik. 1991;30(6):214-20.
[20] Hoth S, Böttcher P. Nomenklatur und Diagramme bei der Beschreibung und Interpretation von OAE-Messungen. Z Audiol. 2008;47(4):140-9.
[21] Hoth S, Janssen T, Mühler R, Walger M, Wiesner T. Empfehlungen der AGERA zum Einsatz objektiver Hörprüfmethoden im Rahmen der pädaudiologischen Konfirmationsdiagnostik (Follow-up) nach nicht bestandenem Neugeborenen-Hörscreening. HNO. 2012;60:1100-2.
[22] Hoth S, Lenarz T. Elektrische Reaktions-Audiometrie. Heidelberg: Springer; 1994.
[23] Hoth S, Mühler R, Neumann K, Walger M. Objektive Audiometrie im Kindesalter – Ein Lehrbuch für die Praxis. Heidelberg: Springer; 2014.
[24] Hoth S, Polzer M. Qualität in Zahlen. Signalnachweis in der objektiven Audiometrie. Z Audiol. 2006;45(3):100-10.
[25] IERASG. A specification for ABR systems used for post newborn hearing screening diagnostic testing. 2021.
[26] Kemp DT, Bray P, Alexander L, Brown AM. Acoustic emission cochleography – Practical aspects. Scand Audiol Suppl. 1986;25:71-95.
[27] Leitner H. Ein neues Verfahren zur automatischen Auswertung der ERA mit Hilfe der stochastisch-ergodischen Konversion (SEC). Laryng Rhinol. 1975;54:677-81.
[28] Mühler R, von Specht H. Sorted averaging – principle and application to auditory brainstem responses. Scand Audiol. 1999;28:145-9.
[29] Rahne T, von Specht H, Mühler R. Sorted averaging – application to auditory event-related responses. J Neurosc Meth. 2008;172(1):74-8.