Power and sample size estimation for comparing diagnostic methods with imperfect reference standards
Published: 15 September 2023
Introduction: In the medical field, it is essential to evaluate the diagnostic performance of new imaging protocols. Ideally, this is done by comparing the new method to a (near) perfect reference standard. However, obtaining such a reference standard is not always feasible. As a result, it may be necessary to assess whether a new method can be used interchangeably with an existing method, particularly if the new method can provide faster results. This assessment is crucial in determining if the two methods can give similar outcomes for individual patients. Multi-reader multi-case (MRMC) studies are commonly used to perform comparative assessments of imaging protocols as the interpretation of images also depends on the ability of the reader.
Obuchowski [1] introduced a measure of interchangeability and derived bootstrap confidence intervals, which can be used to test for equivalence. In a subsequent paper, Obuchowski et al. [2] provided tests for interchangeability. One of these tests was recently used in a study on accelerated spine MRI [3]. However, a more general approach to power and sample size estimation than was possible in that study is desirable.
Methods: We considered an MRMC study for diagnostic methods with a binary outcome, assuming normally distributed case difficulty and normally distributed reader ability. Case difficulty and reader ability enter linear predictors that determine the underlying sensitivity and specificity via a logit link, thus allowing for heterogeneity of observations.
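As an illustration of the data-generating model described above, the following is a minimal sketch, not the authors' implementation (the abstract reports simulations in R; Python is used here purely for illustration). The function names, the sign convention for case difficulty, the prevalence parameter, and all numeric values are assumptions made for this example.

```python
import math
import random


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_mrmc(n_cases, n_readers, prevalence, beta_sens, beta_spec,
                  sd_case=1.0, sd_reader=0.5, rng=None):
    """Simulate binary ratings for one MRMC study (hypothetical helper).

    beta_sens / beta_spec map a method name to the intercept of its linear
    predictor for sensitivity / specificity on the logit scale.
    Returns (truth, ratings) with ratings[method][case][reader] in {0, 1}.
    """
    rng = rng or random.Random()
    truth = [1 if rng.random() < prevalence else 0 for _ in range(n_cases)]
    # Normally distributed case difficulty and reader ability.
    difficulty = [rng.gauss(0.0, sd_case) for _ in range(n_cases)]
    ability = [rng.gauss(0.0, sd_reader) for _ in range(n_readers)]
    ratings = {}
    for method in beta_sens:
        rows = []
        for i in range(n_cases):
            row = []
            for j in range(n_readers):
                # Diseased cases use the sensitivity predictor,
                # non-diseased cases the specificity predictor.
                intercept = beta_sens[method] if truth[i] else beta_spec[method]
                # Assumed convention: harder cases lower, abler readers
                # raise, the probability of a correct call.
                p_correct = expit(intercept - difficulty[i] + ability[j])
                correct = rng.random() < p_correct
                row.append(truth[i] if correct else 1 - truth[i])
            rows.append(row)
        ratings[method] = rows
    return truth, ratings


# Example: 200 cases, 5 readers, illustrative intercepts for two methods.
truth, ratings = simulate_mrmc(
    200, 5, prevalence=0.4,
    beta_sens={"existing": 2.0, "new": 1.8},
    beta_spec={"existing": 2.2, "new": 2.0},
    rng=random.Random(1),
)
```

Because difficulty is shared across readers within a case and ability is shared across cases within a reader, the simulated ratings are correlated in both directions, which is the dependence an MRMC analysis has to account for.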
We conducted various MRMC simulations with a range of linear predictors, producing a range of underlying values for sensitivity and specificity for the two methods being compared. To test for interchangeability, we estimated the probability of agreement when the existing method is used twice and the probability of agreement when each method is used once, and obtained a confidence interval for their difference using the test suggested by Obuchowski et al. [2]. We fitted a generalized estimating equation (GEE) model to estimate these probabilities from dependent observations. The confidence interval for the difference of these probabilities can be obtained using either bootstrapping or the delta method. All simulations were performed in R 4.2.3.
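The quantity being tested can be sketched as follows. This is a simplified illustration, not the GEE-based procedure used in the study: it replaces the GEE model with plain empirical proportions over pairs of distinct readers and uses a percentile bootstrap that resamples cases only. The function names and the toy data are assumptions for this example.

```python
import random


def case_agreement_difference(existing_row, new_row):
    """Per-case agreement difference over ordered pairs of distinct readers.

    Returns P(agree | both readers use the existing method)
          - P(agree | one reader uses the existing, the other the new method).
    """
    n = len(existing_row)
    agree_ee = agree_en = pairs = 0
    for j in range(n):
        for k in range(n):
            if j == k:
                continue
            pairs += 1
            agree_ee += existing_row[j] == existing_row[k]
            agree_en += existing_row[j] == new_row[k]
    return (agree_ee - agree_en) / pairs


def bootstrap_ci(existing, new, n_boot=1000, alpha=0.05, rng=None):
    """Point estimate and percentile bootstrap CI, resampling cases only."""
    rng = rng or random.Random()
    d = [case_agreement_difference(e_row, n_row)
         for e_row, n_row in zip(existing, new)]
    estimate = sum(d) / len(d)
    means = []
    for _ in range(n_boot):
        sample = [d[rng.randrange(len(d))] for _ in d]
        means.append(sum(sample) / len(sample))
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return estimate, lower, upper


# Toy data (assumed): 100 cases, 4 readers, readers correct with
# probability 0.90 (existing method) or 0.85 (new method).
rng = random.Random(7)
truth = [rng.randint(0, 1) for _ in range(100)]


def read(t, p_correct):
    return t if rng.random() < p_correct else 1 - t


existing = [[read(t, 0.90) for _ in range(4)] for t in truth]
new = [[read(t, 0.85) for _ in range(4)] for t in truth]
estimate, lower, upper = bootstrap_ci(existing, new, rng=rng)
print(f"difference = {estimate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Under this reading, interchangeability would be concluded if the upper confidence limit lies below a pre-specified margin; the choice of margin is a study-design decision that the abstract does not state.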
Results: Our simulations showed that there was little difference between the results based on bootstrapped confidence intervals and confidence intervals obtained using the delta method. However, the time required for bootstrapping confidence intervals was substantially longer than for delta method confidence intervals.
Discussion: Our tool allows for the estimation of the power of a test for interchangeability, given the number of cases, the number of readers, and some plausible assumptions about the distribution of case difficulty, reader ability, and their link to sensitivity and specificity. As bootstrapping is not mandatory to obtain the confidence intervals, the necessary simulations can be performed within a few minutes on a laptop computer.
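A simulation-based power estimate of the kind described here can be sketched as below. This is a rough illustration under stated assumptions, not the authors' tool: it uses one intercept per method for both sensitivity and specificity, a fixed prevalence of 50%, a normal-approximation confidence bound over cases in place of the GEE and delta-method machinery, and a hypothetical margin of 0.05. All function names and parameter values are invented for this example.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_differences(n_cases, n_readers, intercepts, sd_case, sd_reader, rng):
    """Simulate one study and return, per case, the agreement difference
    (existing vs existing) - (existing vs new) over distinct reader pairs."""
    ability = [rng.gauss(0.0, sd_reader) for _ in range(n_readers)]
    diffs = []
    for _ in range(n_cases):
        truth = rng.random() < 0.5  # assumed prevalence of 50%
        difficulty = rng.gauss(0.0, sd_case)
        calls = {}
        for method, b in intercepts.items():
            calls[method] = [
                truth if rng.random() < expit(b - difficulty + a) else not truth
                for a in ability
            ]
        ee = en = pairs = 0
        for j in range(n_readers):
            for k in range(n_readers):
                if j != k:
                    pairs += 1
                    ee += calls["existing"][j] == calls["existing"][k]
                    en += calls["existing"][j] == calls["new"][k]
        diffs.append((ee - en) / pairs)
    return diffs


def is_interchangeable(diffs, margin, alpha=0.05):
    """One-sided test: upper confidence bound of the mean difference < margin."""
    z = NormalDist().inv_cdf(1 - alpha)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) + z * se < margin


def estimate_power(n_cases, n_readers, intercepts, margin=0.05,
                   sd_case=1.0, sd_reader=0.5, n_sim=200, seed=1):
    """Fraction of simulated studies in which interchangeability is concluded."""
    rng = random.Random(seed)
    hits = sum(
        is_interchangeable(
            simulate_differences(n_cases, n_readers, intercepts,
                                 sd_case, sd_reader, rng),
            margin,
        )
        for _ in range(n_sim)
    )
    return hits / n_sim


# Example: power when both methods truly perform identically.
same = {"existing": 2.0, "new": 2.0}
for n in (50, 100, 200):
    print(n, estimate_power(n, n_readers=4, intercepts=same))
```

Scanning a grid of case and reader counts with such a function gives the smallest design that reaches a target power. The closed-form confidence bound keeps each simulated study cheap, which mirrors the abstract's point that avoiding the bootstrap makes the simulations fast.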
Conclusion: Our suggested tool will facilitate the planning of MRMC interchangeability studies, enabling researchers to determine the sample size required for such studies to achieve sufficient power.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Obuchowski NA. Can electronic medical images replace hard-copy film? Defining and testing the equivalence of diagnostic tests. Stat Med. 2001;20(19):2845-63.
2. Obuchowski NA, Subhas N, Schoenhagen P. Testing for interchangeability of imaging tests. Acad Radiol. 2014;21(11):1483-9.
3. Almansour H, Herrmann J, Gassenmaier S, Afat S, Jacoby J, Koerzdoerfer G, et al. Deep Learning Reconstruction for Accelerated Spine MRI: Prospective Analysis of Interchangeability. Radiology. 2023;306(3):e212922.