gms | German Medical Science

62nd Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17-21 September 2017, Oldenburg

Investigation of the added benefit of new markers for the prediction of adverse kidney events in patients after cardiac surgery using different measures in raw and cross-validated versions

Meeting Abstract


  • Siegfried Kropf - Otto-von-Guericke-Universität Magdeburg, Institute for Biometry and Medical Informatics, Magdeburg, Germany
  • Christian Albert - Otto-von-Guericke-Universität Magdeburg, University Clinic for Nephrology and Hypertension, Magdeburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62nd Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17-21 September 2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 155

doi: 10.3205/17gmds083, urn:nbn:de:0183-17gmds0836

Published: 29 August 2017

© 2017 Kropf et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Several measures have been proposed to quantify the added benefit of a new prognostic model relative to an established one. Usually, prediction is based on multivariable logistic regression, and the new model includes covariates in addition to those of the reference model.

A common measure of the improvement in prediction is the increase in the area under the ROC curve (AUC). To give more detailed information about the interplay of possibly opposite changes in sensitivity and specificity, the so-called net reclassification improvement (NRI), which is based on a given set of risk categories for the predicted probability of the target event, its category-free version (cfNRI), and the integrated discrimination improvement (IDI) were introduced by Pencina et al. [1], [2] and discussed by Pickering and Endre [3]. Each of these three metrics consists of separate components for the events and for the non-events, which are added up to a composite metric. However, the metrics also have limitations in use: for example, unlike the IDI, the cfNRI does not account for the magnitude of the change in predicted risk, only its direction. Moreover, changes in the composite metrics may be driven largely by either the event or the non-event subgroup and thus depend on the incidence of events in the study cohort.
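The AUC and the three reclassification metrics can all be computed from two vectors of predicted risks. The following is a minimal Python sketch, assuming binary outcomes coded 0/1, risks `p_old` from the reference model and `p_new` from the extended model; the function names, the risk cut-offs of 10% and 30%, and the toy data are hypothetical choices for illustration, not the settings used in this study. The event component counts net upward moves, the non-event component counts net downward moves, and each composite is the plain sum of its two components.

```python
from bisect import bisect_right
from statistics import mean


def auc(p, y):
    """AUC as the Mann-Whitney probability that a randomly chosen event has a
    higher predicted risk than a randomly chosen non-event (ties count 1/2)."""
    ev = [v for v, yi in zip(p, y) if yi == 1]
    ne = [v for v, yi in zip(p, y) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in ev for b in ne)
    return score / (len(ev) * len(ne))


def _nri(p_old, p_new, y, key):
    """Generic NRI: `key` maps a risk to an ordered value (category or raw risk).
    Returns (event component, non-event component, composite)."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    n = {0: 0, 1: 0}
    for po, pn, yi in zip(p_old, p_new, y):
        up[yi] += key(pn) > key(po)
        down[yi] += key(pn) < key(po)
        n[yi] += 1
    nri_e = (up[1] - down[1]) / n[1]    # events: moving up is correct
    nri_ne = (down[0] - up[0]) / n[0]   # non-events: moving down is correct
    return nri_e, nri_ne, nri_e + nri_ne


def nri(p_old, p_new, y, cutoffs):
    """Categorical NRI; risk categories are defined by sorted `cutoffs`."""
    return _nri(p_old, p_new, y, key=lambda p: bisect_right(cutoffs, p))


def cf_nri(p_old, p_new, y):
    """Category-free NRI: any increase or decrease in risk counts as a move."""
    return _nri(p_old, p_new, y, key=lambda p: p)


def idi(p_old, p_new, y):
    """IDI: mean risk change in events minus mean risk change in non-events.
    Returns (event component, non-event component, composite)."""
    d_e = [pn - po for po, pn, yi in zip(p_old, p_new, y) if yi == 1]
    d_ne = [pn - po for po, pn, yi in zip(p_old, p_new, y) if yi == 0]
    idi_e, idi_ne = mean(d_e), -mean(d_ne)
    return idi_e, idi_ne, idi_e + idi_ne


# Hypothetical toy data: outcome (1 = event) and risks from both models.
y = [1, 1, 1, 0, 0, 0, 0, 0]
p_old = [0.20, 0.35, 0.15, 0.10, 0.25, 0.05, 0.30, 0.12]
p_new = [0.22, 0.30, 0.25, 0.08, 0.20, 0.06, 0.18, 0.12]
CUTOFFS = [0.1, 0.3]  # hypothetical risk categories: <10%, 10-30%, >=30%

print(f"AUC old/new: {auc(p_old, y):.3f} / {auc(p_new, y):.3f}")
print("NRI   (events, non-events, total):", [round(v, 3) for v in nri(p_old, p_new, y, CUTOFFS)])
print("cfNRI (events, non-events, total):", [round(v, 3) for v in cf_nri(p_old, p_new, y)])
print("IDI   (events, non-events, total):", [round(v, 3) for v in idi(p_old, p_new, y)])
```

The toy data illustrate the limitation of the cfNRI mentioned above: the event whose risk rises from 0.20 to 0.22 counts as a full upward move for the cfNRI, leaves the categorical NRI unchanged because it stays in the 10-30% category, and contributes only its 0.02 change to the IDI.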

Because adverse clinical events such as acute kidney injury (AKI), renal replacement therapy (RRT), or composite endpoints (major adverse kidney events, MAKE) are relatively rare, large samples are needed to obtain reliable estimates of the regression parameters and, in particular, of the classification errors. As the model dimension increases, an apparent improvement in accuracy may be due to overfitting. We therefore supplemented the raw (apparent) estimates with cross-validated ones.
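Raw (apparent) estimates evaluate a model on the same patients that were used to fit it, whereas a cross-validated version predicts each patient's risk from a model fitted without that patient. The sketch below is a minimal, dependency-free Python illustration under stated assumptions: logistic regression is fitted by plain gradient ascent (a simplification; statistical software typically uses Newton-Raphson/IRLS), the folds are stratified by outcome so that each fold receives some of the rare events, and the simulated cohort, the 5-fold choice, and all function names are hypothetical, not the design of this study. The resulting out-of-fold risks can be passed to any of the metrics (AUC, NRI, cfNRI, IDI) in place of the apparent risks.

```python
import math
import random


def predict_one(beta, xi):
    """Predicted risk from coefficients [b0, b1, ...] via a numerically stable logistic."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Logistic regression by plain gradient ascent on the mean log-likelihood
    (a simplification without convergence check). X holds covariate rows
    without an intercept column; returns [b0, b1, ...]."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            r = yi - predict_one(beta, xi)  # residual on the probability scale
            grad[0] += r
            for j, xij in enumerate(xi):
                grad[j + 1] += r * xij
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def stratified_folds(y, n_folds=5, seed=1):
    """Deal events and non-events separately into folds so that every fold
    receives a share of the (rare) events."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds


def cross_val_predict(X, y, n_folds=5, seed=1):
    """Out-of-fold risks: each patient is predicted by a model fitted
    without the fold that contains this patient."""
    p = [None] * len(y)
    for test in stratified_folds(y, n_folds, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        beta = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in test:
            p[i] = predict_one(beta, X[i])
    return p


# Hypothetical simulated cohort: the established marker x1 drives the risk,
# the candidate marker x2 is pure noise (so any "gain" from it is overfit).
rng = random.Random(42)
X_ref, X_ext, y = [], [], []
for _ in range(120):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y.append(1 if rng.random() < predict_one([-2.0, 1.0], [x1]) else 0)
    X_ref.append([x1])
    X_ext.append([x1, x2])

# Apparent (raw) risks: model fitted and evaluated on the same patients.
beta_ext = fit_logistic(X_ext, y)
p_ext_apparent = [predict_one(beta_ext, x) for x in X_ext]

# Cross-validated (out-of-fold) risks for both models.
p_ref_cv = cross_val_predict(X_ref, y)
p_ext_cv = cross_val_predict(X_ext, y)
```

Because x2 carries no information in this simulation, any apparent gain of the extended model over the reference model reflects overfitting. Stratifying the folds is a design choice aimed at rare endpoints such as AKI or MAKE: with unstratified random folds, a fold could contain very few events, which makes the fold-wise fits and the event components of the metrics unstable.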

Comparing the reference model with the extended model for different endpoints, in different subsamples, and with regard to different measures of improvement in prediction accuracy showed that a relevant improvement is not easy to attain when the prediction model is already based on a strong reference model, even if the new markers address mechanisms that are not reflected by the established markers.

The different metrics (AUC, NRI, cfNRI, IDI) also differ in their sensitivity to small improvements, with the AUC being one of the least sensitive measures. Whether the other metrics carry the same relevance remains open to discussion. How large a change in each metric must be to be clinically relevant has to be judged with regard to the study's objective (i.e., screening or confirmation endpoint). It also requires further investigation whether the composite metrics are suitable candidates for summarizing model improvement, because they implicitly weight events and non-events equally, as if each made up 50% of the cohort, whereas the actual proportion of events (and non-events, respectively) will often be far from 50%. This issue is also discussed by Pencina et al. [4].
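The implicit equal weighting inside the composite metrics can be made explicit with reclassification counts. The Python sketch below uses hypothetical counts (20 events and 380 non-events, i.e. an event proportion of 5%). The composite NRI is the plain sum of its two components; as a contrast, it also weights the components by the event proportion π and 1 - π, an illustrative alternative assumed here and not a proposal of this abstract. This weighted version equals the net share of all patients who were reclassified in the correct direction.

```python
def nri_components(up_e, down_e, n_e, up_ne, down_ne, n_ne):
    """Event and non-event NRI components from reclassification counts."""
    nri_e = (up_e - down_e) / n_e      # events: upward moves are correct
    nri_ne = (down_ne - up_ne) / n_ne  # non-events: downward moves are correct
    return nri_e, nri_ne


def composite(nri_e, nri_ne):
    """Composite as usually reported: components simply added (equal weights)."""
    return nri_e + nri_ne


def prevalence_weighted(nri_e, nri_ne, event_prop):
    """Illustrative alternative (assumption): weight by the event proportion pi."""
    return event_prop * nri_e + (1.0 - event_prop) * nri_ne


# Hypothetical reclassification counts for a rare endpoint (5% events).
n_e, up_e, down_e = 20, 6, 2
n_ne, up_ne, down_ne = 380, 29, 10

nri_e, nri_ne = nri_components(up_e, down_e, n_e, up_ne, down_ne, n_ne)
pi = n_e / (n_e + n_ne)

# Net correct moves over the whole cohort, for comparison with the weighted value.
net_share = ((up_e - down_e) + (down_ne - up_ne)) / (n_e + n_ne)

print(f"NRI_e = {nri_e:+.4f}, NRI_ne = {nri_ne:+.4f}")
print(f"composite NRI:           {composite(nri_e, nri_ne):+.4f}")
print(f"prevalence-weighted NRI: {prevalence_weighted(nri_e, nri_ne, pi):+.4f}")
print(f"net share of cohort:     {net_share:+.4f}")
```

With these counts the composite NRI is positive (0.15) because the few events drive it, while the prevalence-weighted value is negative (-0.0375) because the classification of the many non-events worsens.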

As expected, the cross-validated measures were more conservative in detecting model improvements, indicating that overfitting cannot be neglected here.

The authors declare that they have no competing interests.

The authors declare that a positive ethics committee vote has been obtained.


[1] Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27:157-72.
[2] Pencina MJ, D’Agostino RB Sr, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011;30:11-21.
[3] Pickering JW, Endre ZH. New metrics for assessing diagnostic potential of candidate biomarkers. Clin J Am Soc Nephrol. 2012;7:1355-64.
[4] Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Comments on 'Integrated discrimination and net reclassification improvements – Practical advice'. Stat Med. 2008;27:207-12.