Simulation-based selection of a propensity-score weighting approach to adjust for unexpected self-selection in a cohort study – results from the PRAIM study
Published: 6 September 2024
Introduction: The PRAIM study is an observational non-inferiority study comparing the performance of AI-supported double reading with standard double reading (without AI) among women aged 50–69 years undergoing organised mammography screening at twelve screening sites in Germany. Within the viewer, the AI system provided (1) “normal” tags marking a subset of examinations that were most likely unsuspicious and (2) localisation prompts that triggered reassessment of highly suspicious examinations if the radiologists had judged them unsuspicious. The primary endpoint was the screen-detected breast cancer rate.
During data collection, it became apparent that radiologists sometimes chose the viewer in which to finalise the report of an examination depending on the AI triage prediction (normal/not normal), which is visible in advance. Because the consensus conference is more convenient when all participating radiologists have used the same viewer, some radiologists tended to report only normal-tagged cases in the AI-supported viewer and switched to other viewers otherwise. This self-selection was not anticipated in the original study design and could introduce severe confounding bias.
Methods: A simulation study, designed to reflect the expected real-world situation as closely as possible, was conducted to identify a statistical method that successfully corrects for this self-selection bias.
Five bias scenarios with different degrees of bias were considered, i.e. different probabilities that radiologists finalise the examination in the AI-supported viewer, given the AI normal prediction.
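The self-selection mechanism behind these scenarios can be sketched as a toy data-generating process. This is a minimal illustration, not the PRAIM simulation: the function names, the 60% share of normal tags, the detection probabilities and the number of radiologists are all hypothetical assumptions. Each scenario is defined by the probability of finalising in the AI-supported viewer for normal-tagged and for not-normal examinations.

```python
import numpy as np


def simulate_scenario(n, p_ai_if_normal, p_ai_if_not_normal, n_radiologists=10,
                      p_normal=0.6, risk_normal=0.001, risk_not_normal=0.01,
                      true_rr=1.0, seed=0):
    """Simulate one hypothetical screening data set with viewer self-selection.

    All default parameter values are illustrative assumptions, not PRAIM data.
    """
    rng = np.random.default_rng(seed)
    # Radiologist identifier (carried along for the PS model; in this toy
    # version it influences neither viewer choice nor detection).
    radiologist = rng.integers(0, n_radiologists, size=n)
    # AI "normal" tag, visible before the viewer is chosen.
    normal = rng.random(n) < p_normal
    # Self-selection: the chance of finalising in the AI-supported viewer
    # depends on the tag; equal probabilities mean no self-selection.
    p_ai = np.where(normal, p_ai_if_normal, p_ai_if_not_normal)
    ai = rng.random(n) < p_ai
    # Detection probability is higher for not-normal examinations; reading
    # in the AI-supported viewer multiplies it by true_rr.
    risk = np.where(normal, risk_normal, risk_not_normal) * np.where(ai, true_rr, 1.0)
    detected = rng.random(n) < risk
    return radiologist, normal.astype(int), ai.astype(int), detected.astype(int)


def naive_rr(ai, detected):
    """Unadjusted risk ratio: detection rate with AI over detection rate without."""
    return detected[ai == 1].mean() / detected[ai == 0].mean()
```

With these hypothetical defaults, `simulate_scenario(200_000, 0.9, 0.2)` yields an expected unadjusted RR of about 0.25 even though `true_rr = 1`, because the AI arm is enriched with normal-tagged, low-risk examinations. This is the kind of distortion the weighting methods must remove.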
Four regression approaches, all with a quasibinomial error distribution, were applied: regression without adjustment, with inverse propensity score weighting (IPW), with IPW after trimming of extreme propensity scores (PS), and with overlap PS weighting. The PS model included the minimal adjustment set identified in a causal graph analysis: the radiologists assessing the examination and the AI normal prediction. For each scenario, 1000 data sets were simulated, and risk ratios (RR) and the power to demonstrate non-inferiority were calculated.
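The three weighting schemes can be sketched as follows. This is a hedged illustration under stated assumptions: the PS is estimated as the share of AI-viewer reports within each (radiologist, normal tag) stratum, which matches the fitted values of a saturated logistic model with both covariates and their interaction; the trimming threshold `trim=0.05` is hypothetical, since the abstract does not state the rule used. Because the only regressor is the AI indicator, the ratio of weighted detection rates equals the point estimate of a weighted quasibinomial GLM with log link, so the GLM fit itself is skipped; standard errors (robust or bootstrap) are not shown.

```python
import numpy as np


def stratum_propensity(ai, radiologist, normal):
    """Saturated PS model: share of AI-viewer reports within each
    (radiologist, normal tag) stratum, returned for every examination."""
    keys = radiologist * 2 + normal  # unique integer per stratum (normal is 0/1)
    _, idx = np.unique(keys, return_inverse=True)
    n_ai = np.bincount(idx, weights=ai)
    n_all = np.bincount(idx)
    return (n_ai / n_all)[idx]


def balancing_weights(ai, ps, method, trim=0.05):
    """Per-examination weights for the three PS approaches."""
    with np.errstate(divide="ignore"):
        ipw = np.where(ai == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    if method == "ipw":
        return ipw
    if method == "trimmed_ipw":
        # Drop examinations with extreme PS (weight 0), keep IPW otherwise.
        keep = (ps >= trim) & (ps <= 1.0 - trim)
        return np.where(keep, ipw, 0.0)
    if method == "overlap":
        # Overlap weights: 1 - e for AI, e for non-AI; bounded in [0, 1].
        return np.where(ai == 1, 1.0 - ps, ps)
    raise ValueError(f"unknown method: {method}")


def weighted_rr(detected, ai, w):
    """Ratio of weighted detection rates (AI / non-AI)."""
    t = ai == 1
    rate_ai = np.sum(w[t] * detected[t]) / np.sum(w[t])
    rate_ctrl = np.sum(w[~t] * detected[~t]) / np.sum(w[~t])
    return rate_ai / rate_ctrl
```

With a saturated PS model, the overlap weights sum to the same total in both arms within every stratum, so the weighted share of normal-tagged examinations is identical in the AI and non-AI arms by construction.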
Results: On average, the regression with overlap PS weighting estimated the RR without relevant bias and, across all considered scenarios, had a power of at least 79·7% to reject inferiority of AI-supported reading (94·6% for the reading behaviour observed in preliminary usage data), independent of the direction and extent of the simulated self-selection bias. The trimmed IPW approach was also nearly unbiased but had wider confidence intervals and thus lower power. The two remaining approaches (unadjusted regression and untrimmed IPW) were clearly biased.
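The bias and power figures summarise the replicate data sets per scenario. The following is a minimal sketch of that summary under assumptions the abstract does not state: bias is measured on the log-RR scale, each data set yields a Wald confidence interval for the RR, and inferiority is rejected when the lower bound of the two-sided 95% interval lies above a non-inferiority margin. The function name and the `margin` value are hypothetical.

```python
import math
from statistics import NormalDist


def summarize_replicates(log_rr, se, true_log_rr, margin, level=0.95):
    """Mean bias of log RR and non-inferiority power across simulated data sets.

    A data set counts as rejecting inferiority when the lower bound of the
    two-sided `level` Wald CI for the RR lies above `margin` (hypothetical).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n = len(log_rr)
    mean_bias = sum(log_rr) / n - true_log_rr
    rejections = sum(1 for b, s in zip(log_rr, se) if b - z * s > math.log(margin))
    return mean_bias, rejections / n
```

Applied to the 1000 estimates of one scenario, the second return value corresponds to the reported power; a method with wider intervals (such as trimmed IPW) rejects less often even when its mean bias is close to zero.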
Discussion: IPW is a standard method for dealing with bias in intervention allocation in cohort studies. If the baseline covariates of the intervention groups overlap poorly, PS weights can become extreme and lead to biased estimates. An ad-hoc solution is to trim extreme PS. A more recent development is overlap weighting, which targets the population with the most overlap in observed covariates.
However, the approaches target different estimands: the unadjusted and the IPW regression target the average treatment effect in the entire population (ATE), the trimmed IPW regression targets the effect in the trimmed population, and the overlap-weighted regression targets the average treatment effect in the overlap population (ATO).
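The link between the weights and their estimands can be written in the balancing-weights framework of Li, Morgan and Zaslavsky, where a tilting function h(x) defines the target population (covariate density multiplied by h). In this sketch, T indicates reporting in the AI-supported viewer, Y screen detection, X the adjustment set (radiologist, normal tag), and the trimming threshold α is a hypothetical parameter:

```latex
\begin{aligned}
e(x) &= P(T = 1 \mid X = x), \qquad
w_1(x) = \frac{h(x)}{e(x)}, \qquad w_0(x) = \frac{h(x)}{1 - e(x)}, \\[4pt]
h(x) &= \begin{cases}
1 & \text{IPW (ATE, entire population)}, \\
\mathbf{1}\{\alpha \le e(x) \le 1 - \alpha\} & \text{trimmed IPW (trimmed population)}, \\
e(x)\,\{1 - e(x)\} & \text{overlap weights (ATO)},
\end{cases} \\[4pt]
\widehat{\mathrm{RR}}_h &=
\frac{\sum_{i:\,T_i = 1} w_1(x_i)\, Y_i \,\big/ \sum_{i:\,T_i = 1} w_1(x_i)}
     {\sum_{i:\,T_i = 0} w_0(x_i)\, Y_i \,\big/ \sum_{i:\,T_i = 0} w_0(x_i)}.
\end{aligned}
```

For overlap weighting, the weights reduce to w₁(x) = 1 − e(x) and w₀(x) = e(x). They are bounded between 0 and 1, so unlike IPW weights they cannot become extreme when e(x) approaches 0 or 1.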
Conclusion: The overlap weighting approach had the smallest bias and the largest power and was chosen for the main analysis.
Competing interests: The study was funded by Vara. SB, TM, and CL are current employees of Vara with stock options as part of the standard compensation package. AK received a payment from Vara for general consulting and speaker fees.
The authors declare that a positive ethics committee vote has been obtained.