Development of a survival prediction model – a case study
Published: September 15, 2023
Introduction: When developing a prediction model, it is crucial to choose methods carefully. This is particularly important for survival endpoints, for which the standard regression practice of predicting the mean of a distribution may not be suitable. To guide the process, the TRIPOD statement outlines a framework for specifying and reporting the relevant aspects, including the outcome, predictors, missing data, model specification, and validation [1].
Methods: We discuss the advantages and disadvantages of different ways to handle each aspect, and we present the choices we made in developing and validating a prediction model for kidney survival in patients with autosomal recessive polycystic kidney disease (ARPKD). Patient data were collected within an observational registry study [2]. The selection of the discussed methods is based on personal experience and unsystematic literature searches.
Results: Several options exist for what to predict about a patient's outcome, ranging from a relative risk score to a complete survival distribution, with intermediate options in between. The choice among these options depends on the prediction objective. Choosing the right model is also not straightforward: the widely used Cox model is easy to apply in practice, but machine learning approaches, such as random survival forests [3], may provide more accurate predictions due to their adaptability. When a small set of predictors is desired, variable selection techniques are necessary. Here, the performance of different predictor sets has to be estimated in an unbiased way, e.g. by cross-validation. The metric used to evaluate model performance is equally important and must align with the prediction objective. For example, concordance can be used as a metric if the objective is to predict the order of events correctly [4]. When missing values are handled by multiple imputation, technical issues such as how to pool Kaplan-Meier curves must be considered [5]. When splitting a dataset into development and validation sets, it is important to ensure that both sets are representative, which can be accomplished by stratified splitting. If strata are too small to split, we recommend splitting by minimization. Model validation should be prespecified and can occur either within the development process via cross-validation or on a separate validation dataset. The latter option makes it possible to make data-driven choices during development without introducing bias into the validation process.
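As an illustration of two of the points above, the following is a minimal Python sketch of a concordance index in the spirit of Harrell's C [4] and of a stratified development/validation split. It is not the code used in the case study. The function names `harrell_c` and `stratified_split`, the 30% validation fraction, and the simple handling of ties (pairs with tied observation times are skipped, tied risk scores count as half concordant) are assumptions made for this example.

```python
import random
from collections import defaultdict
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data (illustrative sketch).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the observation is censored
    risk_scores: higher score means higher predicted risk (earlier event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Assumption: pairs with tied times are skipped for simplicity.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def stratified_split(strata, validation_fraction=0.3, seed=0):
    """Split indices into development and validation sets within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, stratum in enumerate(strata):
        by_stratum[stratum].append(idx)
    development, validation = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        n_val = round(len(members) * validation_fraction)
        validation.extend(members[:n_val])
        development.extend(members[n_val:])
    return sorted(development), sorted(validation)
```

Very small strata can end up with no validation members under this rounding rule, which is the situation in which the text recommends minimization instead.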
Discussion: Developing and validating a survival prediction model involves several factors that must be taken into account. To exemplify these elements, we provide a real clinical case study.
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
1. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015 Jan 6;162(1):55-63. DOI: 10.7326/M14-0697
2. Ebner K, Feldkoetter M, Ariceta G, Bergmann C, Buettner R, Doyon A, Duzova A, Goebel H, Haffner D, Hero B, Hoppe B, Illig T, Jankauskiene A, Klopp N, König J, Litwin M, Mekahli D, Ranchin B, Sander A, Testa S, Weber LT, Wicher D, Yuzbasioglu A, Zerres K, Dötsch J, Schaefer F, Liebau MC; ESCAPE Study Group; GPN Study Group. Rationale, design and objectives of ARegPKD, a European ARPKD registry study. BMC Nephrol. 2015 Feb 18;16:22. DOI: 10.1186/s12882-015-0002-z
3. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2(3):841-860. DOI: 10.1214/08-AOAS169
4. Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA. 1982 May 14;247(18):2543-6. DOI: 10.1001/jama.1982.03320430047030
5. Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol. 2009 Jul 28;9:57. DOI: 10.1186/1471-2288-9-57
