gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Construction and assessment of prognostic rules in the presence of missing predictor data using multiple imputation: methodology and evaluation on two datasets and simulations

Meeting Abstract

  • Bart Mertens - Leiden University Medical Centre, Leiden, Netherlands
  • Liesbeth de Wreede - Leiden University Medical Centre, Leiden, Netherlands

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 459

doi: 10.3205/20gmds334, urn:nbn:de:0183-20gmds3340

Published: February 26, 2021

© 2021 Mertens et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Background: We investigate the calibration and assessment of prognostic rules when missing values are present in the predictors. This research has two key objectives. The first is to investigate how the calibration of any prediction rule can be combined with multiple imputation to account for missing predictor observations. The second is to propose methods that can be implemented pragmatically with current multiple imputation software, while allowing for unbiased predictive assessment through validation on new observations for which the outcome is not yet available.

Methods: We begin with a review of the methodological foundations of multiple imputation, viewed as a model estimation approach rather than as a purely algorithmic procedure. We specifically contrast the use of multiple imputation for parameter (effect) estimation with its use for predictive calibration. Based on this review, we formulate two approaches. The first averages predicted probabilities from models fitted on single imputations to directly approximate the predictive density for future observations. The second applies the classical Rubin's rules for parameter estimation. We present pragmatic implementations using current software which allow for validation and estimation of performance measures by cross-validation, as well as imputation of missing predictor data for future observations, for which the outcome is missing by definition. We discuss application to censored (survival) outcomes using the Cox model and to binary outcomes using logistic regression. Method performance is verified through application to two real datasets and through simulation. Accuracy (Brier scores) and the variance of predicted probabilities are investigated.
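The contrast between the two approaches can be illustrated for the logistic case with a minimal sketch. It is not the authors' implementation: the function names (`fit_logistic`, `impute_once`, `predict_both`) are hypothetical, the imputation step is a deliberately crude placeholder (independent normal draws per column) standing in for a real multiple imputation engine, and the way the pooled model is applied to the imputed new observation is one possible reading of the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson fit of a logistic model; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def impute_once(X, rng):
    """Placeholder for ONE draw of an imputation engine (assumption):
    fills NaNs in each column with normal draws matching the observed mean and sd."""
    X = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if miss.any():
            obs = X[~miss, j]
            X[miss, j] = rng.normal(obs.mean(), obs.std(ddof=1), miss.sum())
    return X

def predict_both(X_train, y_train, x_new, M=100, seed=0):
    """Return (prediction-averaging probability, Rubin's-rules probability) for x_new."""
    rng = np.random.default_rng(seed)
    # The new observation is imputed together with the training rows;
    # its (unknown) outcome is never used.
    stacked = np.vstack([X_train, x_new])
    betas, new_rows, probs = [], [], []
    for _ in range(M):
        completed = impute_once(stacked, rng)
        X_m, x_m = completed[:-1], completed[-1]
        beta_m = fit_logistic(X_m, y_train)
        betas.append(beta_m)
        new_rows.append(x_m)
        probs.append(sigmoid(x_m @ beta_m))  # prediction from the m-th single-imputation model
    # Approach 1: average the M predicted probabilities.
    p_averaged = float(np.mean(probs))
    # Approach 2: pool coefficients (Rubin's rules point estimate = mean of estimates),
    # then apply the single pooled model to the imputed versions of x_new.
    beta_pooled = np.mean(betas, axis=0)
    p_rubin = float(np.mean(sigmoid(np.array(new_rows) @ beta_pooled)))
    return p_averaged, p_rubin
```

When no predictor values are missing, every imputation is identical and the two returned probabilities coincide; they differ only through the imputation-induced variation in the fitted models.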

Results: The first (prediction-averaging) approach yields substantially lower variation of calibrated probabilities than the use of Rubin's rules. Furthermore, in contrast to the prediction-averaging approach, the variance of predictions from models pooled with Rubin's rules does not reduce to zero as the number of imputations increases. Irrespective of approach, the number of imputations must be increased substantially beyond current practice for variation to reduce to levels acceptable for clinical application, with numbers between 100 and 1000 being more realistic for reliable predictive calibration.
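The two performance measures named in the Methods could be computed along the following lines. This is a sketch under assumptions: the abstract does not state exactly how the variance of predicted probabilities was obtained, so the replicate-based definition below (repeating the whole imputation and prediction run with different random seeds) and the names `brier_score` and `replicate_variance` are illustrative only.

```python
import numpy as np

def brier_score(p, y):
    """Mean squared difference between predicted probabilities p and binary outcomes y."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def replicate_variance(predict_fn, n_rep=10):
    """Per-individual variance of predicted probabilities across replicate runs.

    predict_fn(seed) is a hypothetical callable that reruns the complete
    imputation-and-prediction procedure with the given seed and returns the
    predicted probabilities for the same set of individuals each time.
    """
    preds = np.array([predict_fn(seed) for seed in range(n_rep)])
    return preds.var(axis=0, ddof=1)  # sample variance over replicates
```

Evaluating `replicate_variance` for increasing numbers of imputations within `predict_fn` is one way to check whether the variation has fallen to an acceptable level.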

Conclusions: Prediction-averaging implementations should be preferred over the classical application of Rubin's rules when calibrating prognostic (logistic or Cox) models for prediction and multiple imputations are used to account for missing predictor data. Single-imputation based implementations are not suitable for clinical application, whether based on prediction-averaging or on regular Rubin's rules-based model calibration.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Mertens BJA, Banzato E, de Wreede LC. Construction and assessment of prediction rules for binary outcome in the presence of missing predictor data using multiple imputation and cross-validation: Methodological approach and data-based evaluation. Biom J. 2020 May;62(3):724-741. DOI: 10.1002/bimj.201800289