gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17-21 September 2023, Heilbronn

Power and sample size estimation for comparing diagnostic methods with imperfect reference standards

Meeting Abstract


  • Irene Schmidtmann - IMBEI, Universitätsmedizin der Johannes Gutenberg-Universität, Mainz, Germany
  • Ahmed E. Othman - Klinik und Poliklinik für Neuroradiologie, Universitätsmedizin der Johannes Gutenberg-Universität, Mainz, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 264

doi: 10.3205/23gmds060, urn:nbn:de:0183-23gmds0603

Published: 15 September 2023

© 2023 Schmidtmann et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.



Text

Introduction: In the medical field, it is essential to evaluate the diagnostic performance of new imaging protocols. Ideally, this is done by comparing the new method to a (near) perfect reference standard. However, obtaining such a reference standard is not always feasible. As a result, it may be necessary to assess whether a new method can be used interchangeably with an existing method, particularly if the new method can provide faster results. This assessment is crucial in determining if the two methods can give similar outcomes for individual patients. Multi-reader multi-case (MRMC) studies are commonly used to perform comparative assessments of imaging protocols as the interpretation of images also depends on the ability of the reader.

Obuchowski [1] introduced a measure of interchangeability and derived bootstrap confidence intervals, which can be used to test for equivalence. In a subsequent paper, Obuchowski et al. [2] provided tests for interchangeability. One of these tests was recently used in a study on accelerated spine MRI [3]. However, a more general approach to power and sample size estimation than was possible in that study is desirable.

Methods: We considered an MRMC study for diagnostic methods with a binary outcome, assuming normally distributed case difficulty and normally distributed reader ability. Case difficulty and reader ability enter linear predictors that determine the underlying sensitivity and specificity via a logit link, thus allowing for heterogeneity of observations.
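The data-generating model described above can be sketched as follows. This is a minimal illustration in Python (the authors worked in R), and the abstract does not give the exact form of the linear predictor. The additive form `beta - difficulty + ability`, the parameter names, and the default standard deviations are all assumptions made for this sketch.

```python
import math
import random

def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_mrmc(n_cases, n_readers, prevalence, beta_sens, beta_spec,
                  sd_case=1.0, sd_reader=0.5, n_reads=1, seed=None):
    """Simulate binary readings for one diagnostic method in an MRMC design.

    Assumed (hypothetical) linear predictor for reader r on case c:
        eta = beta - difficulty_c + ability_r
    with P(correct call) = expit(eta). beta is beta_sens for diseased
    cases (governing sensitivity) and beta_spec for non-diseased cases
    (governing specificity).

    Returns (truth, readings); readings[c][r] is a list of n_reads calls.
    """
    rng = random.Random(seed)
    # Normally distributed case difficulty and reader ability.
    difficulty = [rng.gauss(0.0, sd_case) for _ in range(n_cases)]
    ability = [rng.gauss(0.0, sd_reader) for _ in range(n_readers)]
    truth = [1 if rng.random() < prevalence else 0 for _ in range(n_cases)]

    readings = []
    for c in range(n_cases):
        beta = beta_sens if truth[c] == 1 else beta_spec
        row = []
        for r in range(n_readers):
            p_correct = expit(beta - difficulty[c] + ability[r])
            calls = []
            for _ in range(n_reads):
                correct = rng.random() < p_correct
                # A correct call reproduces the truth; an incorrect one flips it.
                calls.append(truth[c] if correct else 1 - truth[c])
            row.append(calls)
        readings.append(row)
    return truth, readings
```

Calling `simulate_mrmc(100, 5, 0.4, 1.5, 2.0, n_reads=2, seed=1)` would, under these assumptions, produce two readings per reader and case for a method whose average sensitivity and specificity are set by `beta_sens` and `beta_spec`. Harder cases lower the probability of a correct call and more able readers raise it.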

We conducted MRMC simulations over a range of linear predictors, yielding a range of underlying sensitivities and specificities for the two methods being compared. To test for interchangeability, we estimated two probabilities: the probability of agreement when the existing method is used twice and the probability of agreement when each method is used once. We then obtained a confidence interval for their difference using the test suggested by Obuchowski et al. [2]. We fitted a generalized estimating equation (GEE) model to estimate these probabilities from dependent observations. The confidence interval for the difference can be obtained using either bootstrapping or the delta method. All simulations were performed in R 4.2.3.
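To make the quantity under test concrete, the following Python sketch computes the difference between the two agreement probabilities and a simple confidence interval. It is not the GEE model with delta-method or bootstrap intervals described above. As a deliberately simplified stand-in, it averages agreement over pairs of different readers within each case and treats cases as independent clusters under a normal approximation, ignoring reader-level variability. The function name, the pairing scheme, and the input layout are assumptions of this sketch.

```python
import math
from itertools import permutations

def agreement_difference_ci(existing, new, z=1.645):
    """Difference in agreement probabilities with a normal-approximation CI.

    existing[c][r], new[c][r]: binary call of reader r on case c under the
    existing and the new method (at least 2 cases and 2 readers required).

    Returns (diff, lower, upper), where
        diff = P(agree | existing vs existing) - P(agree | new vs existing),
    with agreement evaluated over ordered pairs of different readers.
    """
    n_cases = len(existing)
    per_case = []
    for c in range(n_cases):
        pairs = list(permutations(range(len(existing[c])), 2))
        # Agreement when both readers of a pair use the existing method.
        same = sum(existing[c][i] == existing[c][j] for i, j in pairs)
        # Agreement when one reader uses the new and the other the existing method.
        mixed = sum(new[c][i] == existing[c][j] for i, j in pairs)
        per_case.append((same - mixed) / len(pairs))

    diff = sum(per_case) / n_cases
    # Cases are treated as independent clusters (simplifying assumption).
    var = sum((d - diff) ** 2 for d in per_case) / (n_cases - 1)
    half_width = z * math.sqrt(var / n_cases)
    return diff, diff - half_width, diff + half_width
```

With `z = 1.645` the upper limit is a one-sided 95% bound under the normal approximation. In a planning exercise, interchangeability might be declared when that upper limit falls below a prespecified margin, for example `upper < 0.05`. The margin value here is purely illustrative and is not taken from the abstract.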

Results: Our simulations showed that there was little difference between the results based on bootstrapped confidence intervals and confidence intervals obtained using the delta method. However, the time required for bootstrapping confidence intervals was substantially longer than for delta method confidence intervals.

Discussion: Our tool allows for the estimation of the power of a test for interchangeability, given the number of cases, the number of readers, and some plausible assumptions about the distribution of case difficulty, reader ability, and their link to sensitivity and specificity. As bootstrapping is not mandatory to obtain the confidence intervals, the necessary simulations can be performed within a few minutes on a laptop computer.
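The simulation-based power estimate described here follows a generic Monte Carlo pattern, sketched below in Python. This is not the authors' tool: `run_one_study` is a hypothetical callable that would simulate one MRMC data set, apply the interchangeability test, and return whether interchangeability was declared. The helper names and defaults are likewise assumptions.

```python
import math
import random

def estimate_power(run_one_study, n_sim=1000, seed=1):
    """Estimate power as the share of simulated studies that reject.

    run_one_study(rng) must return True when the simulated study declares
    interchangeability. Returns (power, standard_error), where the standard
    error is the binomial Monte Carlo error of the estimate.
    """
    rng = random.Random(seed)
    hits = sum(bool(run_one_study(rng)) for _ in range(n_sim))
    power = hits / n_sim
    se = math.sqrt(power * (1.0 - power) / n_sim)
    return power, se

def smallest_sufficient_size(make_study, sizes, target=0.8, n_sim=1000, seed=1):
    """Return (n, power) for the smallest candidate size reaching the target.

    make_study(n) must return a run_one_study callable for n cases.
    Returns None if no candidate size reaches the target power.
    """
    for n in sorted(sizes):
        power, _ = estimate_power(make_study(n), n_sim=n_sim, seed=seed)
        if power >= target:
            return n, power
    return None
```

The standard error indicates how many simulation runs are needed: for a true power near 0.8, 1000 runs give a Monte Carlo standard error of about 0.013. The same search could be run over the number of readers instead of the number of cases.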

Conclusion: Our suggested tool will facilitate the planning of MRMC interchangeability studies, enabling researchers to determine the sample size required for such studies to achieve sufficient power.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Obuchowski NA. Can electronic medical images replace hard-copy film? Defining and testing the equivalence of diagnostic tests. Stat Med. 2001;20(19):2845-63.
2. Obuchowski NA, Subhas N, Schoenhagen P. Testing for interchangeability of imaging tests. Acad Radiol. 2014;21(11):1483-9.
3. Almansour H, Herrmann J, Gassenmaier S, Afat S, Jacoby J, Koerzdoerfer G, et al. Deep Learning Reconstruction for Accelerated Spine MRI: Prospective Analysis of Interchangeability. Radiology. 2023;306(3):e212922.