gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

Development of a survival prediction model – a case study

Meeting Abstract

  • Samuel Kilian - Institute of Medical Biometry, University of Heidelberg, Heidelberg, Germany
  • Kathrin Burgmaier - Department of Pediatrics, Faculty of Medicine, University Hospital Cologne and University of Cologne, Cologne, Germany
  • Max Liebau - Department of Pediatrics, Center for Family Health, Center for Rare Diseases, and Center for Molecular Medicine, University Hospital Cologne and Faculty of Medicine, University of Cologne, Cologne, Germany
  • Meinhard Kieser - Institute of Medical Biometry, Heidelberg University, Heidelberg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 60

doi: 10.3205/23gmds072, urn:nbn:de:0183-23gmds0728

Published: September 15, 2023

© 2023 Kilian et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Text

Introduction: When developing a prediction model, it is crucial to choose methods carefully. This is particularly important for survival endpoints, because the standard regression practice of predicting the mean of the outcome distribution may not be suitable. To guide the process, the TRIPOD statement provides a framework for specifying and reporting key aspects, including the outcome, predictors, missing data, model specification, and validation [1].

Methods: We discuss the advantages and disadvantages of different ways to handle each of these aspects. We also present the choices we made when developing and validating a prediction model for kidney survival in patients with autosomal recessive polycystic kidney disease (ARPKD). Patient data were collected within an observational registry study [2]. The methods discussed were selected based on personal experience and unsystematic literature searches.

Results: Several options exist for what to predict, including a relative risk score, a complete survival distribution, or something in between. Which option is appropriate depends on the prediction objective. Choosing the model is also not straightforward: the widely used Cox model is easy to apply in practice, but machine learning approaches such as random survival forests [3] may provide more accurate predictions owing to their flexibility. When a small set of predictors is desired, variable selection techniques are necessary, and the performance of the candidate predictor sets must then be estimated in an unbiased way, e.g. by cross-validation. Selecting an appropriate metric to evaluate model performance is also crucial, and the metric must align with the prediction objective. For example, concordance can be used as a metric if the objective is to predict the order of events correctly [4]. When missing values are handled by multiple imputation, technical issues such as how to pool Kaplan-Meier curves across the imputed datasets must be considered [5]. When splitting a dataset into development and validation sets, it is important that both sets are representative, which can be achieved by stratified splitting. If strata are too small to split, we recommend splitting by minimization. Model validation should be prespecified and can be performed within the development process via cross-validation or on a separate validation dataset. The latter option allows data-driven choices during development without introducing bias into the validation.
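To make the concordance metric concrete, here is a minimal Python sketch of Harrell's C-index [4] for right-censored data. It is an illustration only, not the authors' implementation: the function name `harrell_c` is hypothetical, and for simplicity the sketch treats every pair with tied observed times as non-comparable, whereas established implementations handle ties more carefully. A pair is comparable when the subject with the shorter observed time had an event. It is concordant when that subject also received the higher predicted risk, and tied risk scores count as one half.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Hypothetical helper: Harrell's C-index for right-censored data.

    times       -- observed times (event or censoring)
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk; higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter (or equal) observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: tied times, or a censored shorter time, make the
        # pair non-comparable because the true order of events is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was assigned the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]  # the third subject is censored
    print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.1]))  # 1.0 (perfect order)
    print(harrell_c(times, events, [0.1, 0.7, 0.5, 0.9]))  # 0.2 (mostly reversed)
```

A C-index of 1 means the predicted risks order all comparable pairs correctly, and random risk scores give about 0.5 on average. For real analyses, established implementations such as `concordance_index_censored` in scikit-survival or `concordance_index` in lifelines are preferable. Note that lifelines expects scores where higher values mean longer survival, so risk scores have to be negated there.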

Discussion: Developing and validating a survival prediction model involves several factors that must be taken into account. To exemplify these elements, we provide a real clinical case study.

The authors declare that they have no competing interests.

The authors declare that a positive ethics committee vote has been obtained.


References

1. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015 Jan 6;162(1):55-63. DOI: 10.7326/M14-0697
2. Ebner K, Feldkoetter M, Ariceta G, Bergmann C, Buettner R, Doyon A, Duzova A, Goebel H, Haffner D, Hero B, Hoppe B, Illig T, Jankauskiene A, Klopp N, König J, Litwin M, Mekahli D, Ranchin B, Sander A, Testa S, Weber LT, Wicher D, Yuzbasioglu A, Zerres K, Dötsch J, Schaefer F, Liebau MC; ESCAPE Study Group; GPN Study Group. Rationale, design and objectives of ARegPKD, a European ARPKD registry study. BMC Nephrol. 2015 Feb 18;16:22. DOI: 10.1186/s12882-015-0002-z
3. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2(3):841-60. DOI: 10.1214/08-AOAS169
4. Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA. 1982 May 14;247(18):2543-6. DOI: 10.1001/jama.1982.03320430047030
5. Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol. 2009 Jul 28;9:57. DOI: 10.1186/1471-2288-9-57