gms | German Medical Science

Gesundheit – gemeinsam (Health – together). Joint conference of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutsche Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutsche Gesellschaft für Epidemiologie (DGEpi), Deutsche Gesellschaft für Medizinische Soziologie (DGMS) and Deutsche Gesellschaft für Public Health (DGPH)

8–13 September 2024, Dresden

Calibrating expert knowledge derived models with patient data using low dimensional representations for synthetic patient trajectories

Meeting Abstract

  • Hanning Yang - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany
  • Meropi Karakioulaki - Department of Dermatology, Medical Faculty and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
  • Cristina Has - Department of Dermatology, Medical Faculty and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
  • Moritz Hess - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany
  • Harald Binder - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 800

doi: 10.3205/24gmds022, urn:nbn:de:0183-24gmds0221

Published: 6 September 2024

© 2024 Yang et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: In longitudinal clinical studies, such as those on the progression of rare diseases like epidermolysis bullosa (EB), several interrelated measures need to be considered, for example blisters, wounds, inflammation, anaemia, and physical growth [1]. However, modelling patient trajectories is often hampered by limited observations and complex missing-data patterns in individual patient data (IPD), because the data are usually gathered during routine clinical care. Representing expert knowledge from the scientific literature in quantitative models can therefore help to augment IPD with synthetic data. However, this process still lacks calibration against real data. Here, we construct ordinary differential equations (ODEs) that represent knowledge of the natural course of EB, specifically simulating the evolution of biomarkers over time. To address the complexity and noise in real-world clinical data, we use deep neural networks for ODE calibration, an approach that has shown promising results in other fields [2].

Methods: Drawing on expert knowledge from a comprehensive literature search, we developed a system of five ODEs to accurately capture the dynamics of key biomarkers such as C-reactive protein (a marker of inflammation), haemoglobin (indicative of anaemia), and other variables relevant to EB progression. Initial conditions are sampled from Gaussian distributions, whose means are also calibrated. To reduce the complexity of calibrating the ODE system with real data, we introduce an autoencoder, i.e., a neural network-based approach to dimension reduction. A dataset of 100 patients observed at 5 time points, giving up to 500 observations [1], is used for training. To calibrate the distributions of the initial conditions, we employ a centering approach in the latent space. We also simulate realistic data scenarios by introducing noise, missing-data patterns (including completely missing variables), and noise variables. Missing variables are handled by a separately trained imputation layer outside the autoencoder. As a performance metric, we use the mean squared error (MSE) to evaluate accuracy in an imputation task. In addition, we investigate the Euclidean distance between true and calibrated ODE parameters in a simulation study.
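The simulation set-up and the two evaluation metrics described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the toy right-hand side `rhs`, the rate and mean values, the noise level, the missingness probability, and the use of mean imputation as a naive stand-in for the separately trained imputation layer. Only the dimensions (five variables, 100 patients, 5 time points) follow the text.

```python
import math
import random

N_VARS = 5          # five state variables, as in the abstract
N_PATIENTS = 100    # cohort size stated in the abstract
N_TIMES = 5         # observation time points per patient


def rhs(state, rates):
    """Toy right-hand side (hypothetical): each variable relaxes towards 1.0
    and is weakly coupled to its neighbour in a ring."""
    return [rates[i] * (1.0 - x) + 0.1 * (state[i - 1] - x)
            for i, x in enumerate(state)]


def rk4_step(state, rates, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    k1 = rhs(state, rates)
    k2 = rhs(shifted(state, k1, dt / 2), rates)
    k3 = rhs(shifted(state, k2, dt / 2), rates)
    k4 = rhs(shifted(state, k3, dt), rates)
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate_patient(rates, init_means, init_sd, rng, steps_per_visit=10, dt=0.1):
    """Sample Gaussian initial conditions, then integrate to each visit."""
    state = [rng.gauss(m, init_sd) for m in init_means]
    visits = [state]
    for _ in range(N_TIMES - 1):
        for _ in range(steps_per_visit):
            state = rk4_step(state, rates, dt)
        visits.append(state)
    return visits  # N_TIMES lists of N_VARS values


def corrupt(visits, noise_sd, missing_prob, rng):
    """Add Gaussian measurement noise and mask entries at random (None = missing)."""
    return [[None if rng.random() < missing_prob else x + rng.gauss(0.0, noise_sd)
             for x in visit] for visit in visits]


def mse(pairs):
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)
true_rates = [0.5, 0.3, 0.8, 0.2, 0.6]   # hypothetical "true" ODE parameters
init_means = [0.2, 1.5, 0.7, 1.2, 0.4]   # hypothetical initial-condition means
clean = [simulate_patient(true_rates, init_means, 0.1, rng) for _ in range(N_PATIENTS)]
observed = [corrupt(p, 0.05, 0.3, rng) for p in clean]

# Naive baseline: impute each missing entry with the observed mean of the
# same variable at the same time point, then score against the clean value.
pairs = []
for t in range(N_TIMES):
    for v in range(N_VARS):
        seen = [p[t][v] for p in observed if p[t][v] is not None]
        fill = sum(seen) / len(seen)
        pairs += [(fill, c[t][v]) for c, p in zip(clean, observed) if p[t][v] is None]

imputation_mse = mse(pairs)
# Distance between the true parameters and a made-up calibrated parameter vector.
param_distance = euclidean(true_rates, [0.45, 0.35, 0.8, 0.25, 0.6])
```

In a simulation study of this kind the true parameters are known, so the Euclidean distance can be computed directly; with real patient data only the imputation MSE on held-out entries is available.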

Results: Our approach enables calibration of our extensive ODE system in a low-dimensional latent space. Even with a large amount of noise and complex missing-data patterns, the results remain reliable, as indicated by the stability of the gradients. Furthermore, our centering approach makes maximal use of the available information.

Discussion and conclusion: We demonstrate how to tackle the challenge of calibrating a complex expert-informed synthetic data model by investigating the similarity of real and generated observations in a low-dimensional latent space learned by a neural network. Combining ODEs with an autoencoder offers benefits such as effective dimension reduction and flexibility in handling various data complexities. This enables the generation of naturalistic synthetic IPD while retaining control over the generative process, e.g., for potentially synthesizing realistic control groups. The approach is applicable to observational clinical data with limited sample size, for which rare diseases like EB are a typical application scenario.
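The idea of calibrating by comparing real and generated observations in a low-dimensional space can be sketched as follows. This is a deliberately simplified toy and not the authors' method: a fixed linear map with non-negative weights stands in for the trained autoencoder, a grid search replaces gradient-based calibration, and the single shared rate, the closed-form ODE `dx/dt = rate * (1 - x)`, and all numeric values are hypothetical.

```python
import math
import random

TIMES = [0.0, 1.0, 2.0, 3.0, 4.0]        # five visits, as in the abstract
INIT_MEANS = [0.2, 0.3, 0.4, 0.5, 0.6]   # hypothetical initial-condition means
LATENT_DIM = 2


def trajectory(rate, rng, init_sd=0.1):
    """Closed-form solution of dx/dt = rate * (1 - x) for five variables,
    flattened to a 25-dimensional vector (variables x visits)."""
    x0 = [rng.gauss(m, init_sd) for m in INIT_MEANS]
    return [1.0 + (x - 1.0) * math.exp(-rate * t) for x in x0 for t in TIMES]


def encode(vector, weights):
    """Stand-in encoder: a fixed linear map to LATENT_DIM coordinates."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def latent_mean(cohort, weights):
    """Average latent code of a cohort of flattened trajectories."""
    codes = [encode(v, weights) for v in cohort]
    return [sum(c[k] for c in codes) / len(codes) for k in range(LATENT_DIM)]


def latent_loss(rate, target, weights, seed, n=100):
    """Squared distance between the latent mean of a synthetic cohort and the
    target. Reusing one seed for every candidate rate gives common random
    numbers, so the loss varies smoothly with the rate."""
    rng = random.Random(seed)
    synthetic = [trajectory(rate, rng) for _ in range(n)]
    centre = latent_mean(synthetic, weights)
    return sum((a - b) ** 2 for a, b in zip(centre, target))


rng = random.Random(1)
weights = [[rng.random() for _ in range(len(TIMES) * len(INIT_MEANS))]
           for _ in range(LATENT_DIM)]

TRUE_RATE = 0.5   # hypothetical ground truth used to create "observed" data
observed = [[x + rng.gauss(0.0, 0.05) for x in trajectory(TRUE_RATE, rng)]
            for _ in range(100)]
target = latent_mean(observed, weights)

grid = [round(0.05 * k, 2) for k in range(1, 21)]   # candidate rates 0.05 .. 1.00
estimate = min(grid, key=lambda r: latent_loss(r, target, weights, seed=2))
```

The toy is kept well-behaved on purpose: all variables start below their set point and the projection weights are non-negative, so each latent coordinate increases monotonically with the rate and the loss has a single minimum. A learned encoder offers no such guarantee.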

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Reimer A, Hess M, Schwieger-Briel A, Kiritsi D, Schauer F, Schumann H, Bruckner-Tuderman L, Has C. Natural history of growth and anaemia in children with epidermolysis bullosa: a retrospective cohort study. Br J Dermatol. 2020;182(6):1437-1448. DOI: 10.1111/bjd.18475
2. Grassi T, Nauman F, Ramsey JP, Bovino S, Picogna G, Ercolano B. Reducing the complexity of chemical networks via interpretable autoencoders. Astron Astrophys. 2022;668. DOI: 10.1051/0004-6361/202039956