gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

A Comparison of Deep Neural Network Approaches for Generating Synthetic Longitudinal Data

Meeting Abstract

  • Kiana Farhadyar - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany
  • Maren Hackenberg - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany
  • Harald Binder - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg im Breisgau, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 191

doi: 10.3205/21gmds123, urn:nbn:de:0183-21gmds1238

Published: September 24, 2021

© 2021 Farhadyar et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Longitudinal datasets are prevalent in clinical and biomedical research. Because of privacy concerns, generating high-quality synthetic data from these datasets is of great importance. Artificially imposing a cross-sectional structure on longitudinal data discards important information about the pattern of change between time points [1]. Researchers therefore need synthetic data generation methods designed specifically for longitudinal data. We consider generative deep learning approaches for this task. A major challenge in clinical longitudinal datasets is that the number of visits varies and the time intervals between them are irregular, reflecting the different needs of patients. Different deep generative approaches can accordingly be expected to behave differently depending on the number of time points, sample size, number of variables, and general complexity of the data structure. In this work, we therefore compare different approaches on simulated and real data. As a starting point, we investigate a standard variational autoencoder (VAE) [6], in which the measurements from different visits and the time intervals between visits are used simultaneously as input variables. As a more complex approach, we investigate a combination of ordinary differential equations with VAEs for dynamic modeling [2], which can infer individual-specific trajectories. As a third approach, we consider a combination of a transformer [3] with point processes. Specifically, we employ point processes for event modeling to handle the irregular time points in the longitudinal data (e.g. [4], [5]). We find that more complex, and thus more flexible, methods do not necessarily perform better on simple data structures. However, the performance of the rather naive simultaneous VAE degrades considerably as the data structure becomes more complex.
Furthermore, when the number of visits varies from patient to patient, the standard VAE needs more parameters and a more complex architecture, which is also seen to decrease performance. In contrast, we demonstrate how the more complex approaches become feasible with increasing sample size. We thus provide more general guidance on deep generative modeling of longitudinal data in clinical settings across different scenarios.
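
To make the input construction for the standard VAE concrete, the following is a minimal sketch of one way to turn a patient's irregular visits into a fixed-length vector that holds the measurements and the time intervals side by side. The abstract does not specify the encoding that was used; the zero-padding, the observation mask, and the names `encode_patient` and `max_visits` are assumptions made for illustration only.

```python
import numpy as np


def encode_patient(visit_times, measurements, max_visits):
    """Flatten one patient's visits into a fixed-length input vector.

    Assumed layout (illustrative, not taken from the abstract):
    [measurements per visit slot | interval before each visit | 0/1 mask].
    Unobserved visit slots are zero-padded and flagged by the mask.
    """
    visit_times = np.asarray(visit_times, dtype=float)
    measurements = np.asarray(measurements, dtype=float)
    n_visits, n_vars = measurements.shape
    if n_visits > max_visits:
        raise ValueError("patient has more visits than max_visits")

    # Time elapsed since the previous visit; the first visit gets interval 0.
    intervals = np.diff(visit_times, prepend=visit_times[0])

    values = np.zeros((max_visits, n_vars))
    values[:n_visits] = measurements
    gaps = np.zeros(max_visits)
    gaps[:n_visits] = intervals
    mask = np.zeros(max_visits)
    mask[:n_visits] = 1.0

    return np.concatenate([values.ravel(), gaps, mask])


# Two hypothetical patients with two variables each and different visit counts.
patient_a = encode_patient([0, 3, 10], [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]], max_visits=3)
patient_b = encode_patient([0, 6], [[0.5, 1.0], [0.7, 1.2]], max_visits=3)
print(patient_a.shape, patient_b.shape)
```

Under this assumed encoding, the input dimension is `max_visits * (n_vars + 2)`, so it grows with the largest number of visits in the dataset. This is one plausible reading of why a standard VAE needs more parameters when the number of visits varies between patients.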

The authors declare that a positive ethics committee vote has been obtained.


References

1.
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15:20170387.
2.
Hackenberg M, Harms P, Schmidt T, Binder H. Deep dynamic modeling with just two time points: Can we still allow for individual trajectories? [Preprint]. ArXiv. 2020. arXiv:2012.00634.
3.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need [Preprint]. ArXiv. 2017. arXiv:1706.03762.
4.
Zuo S, Jiang H, Li Z, Zhao T, Zha H. Transformer Hawkes process. In: International Conference on Machine Learning; 2020 Jul 13-18; Virtual. (PMLR Proceedings of Machine Learning Research; 119). p. 11692–11702.
5.
Chen RTQ, Amos B, Nickel M. Neural Spatio-Temporal Point Processes [Preprint]. ArXiv. 2020. arXiv:2011.04583.
6.
Kingma DP, Welling M. Auto-encoding variational bayes [Preprint]. ArXiv. 2013. arXiv:1312.6114.