gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

A machine learning and feature selection pipeline for sepsis prediction on intensive care unit admission

Meeting Abstract

  • Alexandra Albu - Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany; Deutsches Krebsforschungszentrum (DKFZ) und Nationales Centrum für Tumorerkrankungen (NCT) Heidelberg, Heidelberg, Germany
  • Franziska Holke - Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  • Bianka Hahn - Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  • Han Cao - Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty, Heidelberg University, Mannheim, Germany; Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  • Lars Feuerbach - Deutsches Krebsforschungszentrum (DKFZ) und Nationales Centrum für Tumorerkrankungen (NCT) Heidelberg, Heidelberg, Germany
  • Holger A. Lindner - Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  • Verena Schneider-Lindner - Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 732

doi: 10.3205/24gmds136, urn:nbn:de:0183-24gmds1366

Published: September 6, 2024

© 2024 Albu et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Sepsis is a serious complication and a leading cause of death in intensive care units (ICUs). Better predictive approaches are needed to improve patient outcomes.

This study investigates the ability of machine learning (ML) algorithms to predict sepsis from surgical ICU patient data recorded at a single time point, on admission. It also assesses whether the number of features needed for accurate prediction can be reduced.

Methods: We analyzed the electronic medical records of 928 patients admitted to the surgical ICU of University Medical Center Mannheim between 2016 and 2022. We included 52 features from comprehensive routine clinical monitoring and documentation, comprising missing-free vital signs, laboratory results, clinical scores, and systemic inflammatory response syndrome (SIRS) descriptors [1]. As the outcome, we used sepsis labels from the Ground Truth for Sepsis Questionnaire [2], which expert intensivists completed daily to diagnose sepsis.

To predict sepsis, we explored five established supervised ML algorithms that have previously shown good classification performance, explainability, and interpretability: Random Forest, Support Vector Machine, Extreme Gradient Boosting, Ridge Regression, and Logistic Regression.

We set aside 10% of the data as a hold-out for final validation and then performed 10-fold cross-validation for feature selection. These steps yielded two feature sets: the 10 features selected most often across all algorithms, and the optimal feature set from the single selection strategy that achieved the highest area under the precision-recall curve (AUPRC) among the folds.
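The splitting and selection-counting steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the random seed, the unstratified shuffle, and the toy feature names are all hypothetical, and the per-fold feature selection itself is not shown.

```python
import random
from collections import Counter


def split_holdout_and_folds(n_samples, holdout_frac=0.10, n_folds=10, seed=0):
    """Return (holdout_indices, folds), where folds is a list of index lists.

    Assumption: a plain random split; the abstract does not say whether
    the split was stratified by outcome.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_samples * holdout_frac)
    holdout, rest = indices[:n_holdout], indices[n_holdout:]
    # Deal the remaining indices round-robin so fold sizes differ by at most one.
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def top_k_most_selected(selections, k=10):
    """Rank features by how often they were selected across all runs."""
    counts = Counter(f for selected in selections for f in selected)
    # Sort by descending count, then by name so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# 928 patients, as in the cohort described above.
holdout, folds = split_holdout_and_folds(928)

# Toy selections from three hypothetical (algorithm, fold) runs.
runs = [["lactate", "crp", "sofa_resp"], ["crp", "lactate"], ["crp", "wbc"]]
top2 = top_k_most_selected(runs, k=2)
```

In a real run, `selections` would hold one list of selected features per algorithm and fold, and `k` would be 10.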

To evaluate the two resulting feature sets, we retrained the ML algorithm with the highest average AUPRC on the full training dataset and tested it on the previously separated 10% hold-out. We measured the performance of both models by the area under the receiver operating characteristic curve (AUROC), AUPRC, the confusion matrix, and the true positive rate (TPR) with respect to time of sepsis onset.
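For readers unfamiliar with these metrics, the sketch below computes them from scratch on a toy example. It is an illustration only: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions, and AUPRC is approximated here by step-wise average precision, which may differ from the estimator the authors used.

```python
def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count ties as half a win (Mann-Whitney formulation).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at each new recall level
    return total / n_pos


def confusion_matrix(y_true, scores, threshold=0.5):
    """Counts of TP, FP, TN, FN at a fixed (assumed) decision threshold."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        key = ("t" if pred == y else "f") + ("p" if pred == 1 else "n")
        cm[key] += 1
    return cm


# Toy labels (1 = sepsis) and predicted scores; not study data.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)
cm = confusion_matrix(y_true, scores)
tpr = cm["tp"] / (cm["tp"] + cm["fn"])
```

AUPRC is often preferred alongside AUROC when the positive class is rare, because it is sensitive to false positives among the highest-scored patients.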

Results: The 10 most frequently selected features included laboratory tests, dimensions of the Sequential Organ Failure Assessment (SOFA) score, and one SIRS descriptor. The second feature set contained laboratory tests, one SOFA score dimension, and two SIRS descriptors. On the hold-out, the full feature set predicted sepsis with an AUROC of 0.7039 ± 0.020 (standard deviation). The 10 most frequently selected features yielded a higher AUROC of 0.7308 ± 0.018, and the second feature set an AUROC of 0.7493 ± 0.015. The mean AUPRC was similar for all three feature sets. The highest overall TPR was 74%; among patients who developed sepsis within the first 4 days, the TPR was higher, at 78%.
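The TPR with respect to time of sepsis onset can be understood as the share of septic patients correctly flagged on admission, optionally restricted to those whose sepsis began within a given window. A minimal sketch, assuming one onset day and one binary prediction per septic patient (the function name and the toy values are hypothetical and are not study data):

```python
def tpr_by_onset(onset_days, predicted, max_day=None):
    """TPR among septic patients, optionally restricted to onset <= max_day.

    onset_days: day of sepsis onset for each septic patient
    predicted:  1 if the model flagged that patient on admission, else 0
    """
    selected = [
        p for d, p in zip(onset_days, predicted)
        if max_day is None or d <= max_day
    ]
    if not selected:
        return None  # no septic patients in this window
    return sum(selected) / len(selected)


# Toy data for six septic patients.
onset_days = [1, 2, 3, 6, 9, 12]
predicted = [1, 1, 0, 1, 0, 0]

overall = tpr_by_onset(onset_days, predicted)
early = tpr_by_onset(onset_days, predicted, max_day=4)
```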

Discussion: The classification performance of the five ML algorithms was similar on average; we therefore cannot recommend a single best model for sepsis prediction. We expect the performance of our selected feature sets to differ in external validation, so they should be regarded as a starting point for further model development. Nevertheless, we show that our feature selection strategies improve correct sepsis prediction and reduce the number of measurements required for classification. Our models perform better for short-term predictions, which suggests that repeated parameter measurements and reclassification during ICU care are needed to better recognize a dynamically changing sepsis risk.

Conclusion: Predicting sepsis from on-admission ICU data remains an important challenge. Nevertheless, a data-driven subset of features offers the prospect of earlier intervention in, or closer monitoring of, patients at high risk of sepsis.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Lindner HA, Balaban U, Sturm T, Weiss C, Thiel M, Schneider-Lindner V. An algorithm for systemic inflammatory response syndrome criteria-based prediction of sepsis in a polytrauma cohort. Crit Care Med. 2016;44(12):2199-207.
2. Lindner HA, Schamoni S, Kirschning T, Worm C, Hahn B, Centner FS, et al. Ground truth labels challenge the validity of sepsis consensus definitions in critical illness. J Transl Med. 2022;20(1):27.