gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

Adversarial machine learning lifts the level of detail in synthetic medical time series

Meeting Abstract

  • Sven Festag - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Germany
  • Sebastian Uschmann - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Germany
  • Cord Spreckelsen - Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 137

doi: 10.3205/23gmds098, urn:nbn:de:0183-23gmds0984

Published: September 15, 2023

© 2023 Festag et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: In the domain of image synthesis, it has been shown that adversarial training forces generative neural networks to attend to small details that would otherwise be ignored during generation [1].

The presented project investigates whether this characteristic transfers to the generation of synthetic low-dimensional multivariate medical time series. Such a generation process can be used for imputation and forecasting tasks as well as for the production of fully synthetic training data. The latter might be used to augment small data sets or to circumvent patient privacy concerns [2].

Methods: In a recent study, we evaluated the ability of conditional generative adversarial networks (GANs) to learn the imputation and forecasting of multivariate medical time series [3]. The generator and the adversarial critic of the analysed GAN setup both consist of a recurrent encoder that summarises the non-missing context of the series and a subsequent attention-driven recurrent decoder. In the generator, the decoder predicts the missing parts; in the critic, it estimates the difference between the real conditional data distribution and the generator's learned distribution.
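As a reading aid, the critic's role can be written in the standard conditional Wasserstein GAN form, which is assumed here only because reference [3] names a Wasserstein GAN in its title. The symbols are illustrative and not taken from the abstract: c is the observed context, x the real missing segment, G(c) the generator's infill and D the critic, restricted to 1-Lipschitz functions. How the Lipschitz constraint is enforced, and whether the generator also receives a noise input, is not stated here.

```latex
\min_{G}\;\max_{\lVert D \rVert_{L} \le 1}\;
\mathbb{E}_{(x,\,c)\sim p_{\mathrm{data}}}\bigl[D(x, c)\bigr]
\;-\;
\mathbb{E}_{c\sim p_{\mathrm{data}}}\bigl[D\bigl(G(c),\, c\bigr)\bigr]
```

The inner maximum estimates a distance between the real conditional distribution and the generator's distribution; the generator is trained to reduce it.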

The training and test sets consisted of intervals from synchronously measured ECG and blood pressure series of 275 and 65 different patients, respectively. In every 10 s window, both channels contained a missing interval of 0.8 s, located either at a random position (imputation) or at the end of the window (forecasting).
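The gap placement described above can be sketched as a boolean mask over one window. This is a minimal illustration, not the authors' preprocessing code: the function name make_gap_mask and the sampling rate of 125 Hz are assumptions, and the abstract does not state whether both channels share the same gap position.

```python
import random


def make_gap_mask(n_steps, gap_steps, mode, rng=None):
    """Return a list of booleans: True = observed, False = missing."""
    if not 0 < gap_steps <= n_steps:
        raise ValueError("gap_steps must be in (0, n_steps]")
    if mode == "forecasting":
        # The gap sits at the end of the window.
        start = n_steps - gap_steps
    elif mode == "imputation":
        # The gap starts at a uniformly drawn position (bounds inclusive).
        rng = rng or random.Random()
        start = rng.randint(0, n_steps - gap_steps)
    else:
        raise ValueError("mode must be 'imputation' or 'forecasting'")
    return [not (start <= t < start + gap_steps) for t in range(n_steps)]


SAMPLING_RATE_HZ = 125                      # assumed, not given in the abstract
N_STEPS = 10 * SAMPLING_RATE_HZ             # 10 s window
GAP_STEPS = round(0.8 * SAMPLING_RATE_HZ)   # 0.8 s missing interval

imputation_mask = make_gap_mask(N_STEPS, GAP_STEPS, "imputation", random.Random(0))
forecasting_mask = make_gap_mask(N_STEPS, GAP_STEPS, "forecasting")
```

Passing a seeded random.Random instance makes the imputation gap reproducible; one mask could be drawn per channel or shared across channels, depending on the intended setup.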

In the aforementioned study, we only considered mean metrics that summarise errors over all time steps. To test our current hypothesis, we visually inspected clinically relevant features in the synthesised infills. We compared the subseries produced by the generator after classical non-adversarial training alone with those produced after additional training guided by the critic.

Results: The evaluation shows that the inclusion of the critic in the learning process initially leads to an increase in the mean squared error (MSE) between the original data and the imputed/forecasted data points.

In contrast, visual inspection of the results shows that the classical non-adversarial approach leads to a smoothed generated time series that omits short-term details. This becomes particularly evident when looking at the QRS complex of the ECG time series: the typical short spike is missing from the prediction of the classical method without the critic. Hence, the qualitative inspection suggests that the addition of the critic makes the synthetic data harder to distinguish from the real data.

Discussion: The reason for the difference between the numerical and the visually perceived error might be that the classical approach of minimising an average distance yields a low total loss even if high local losses exist in short intervals. When a critic is included to compute the loss, it can use these short details to distinguish between synthetic and actual data because there is a measurable difference in the distributions. Therefore, the generator must model these details.
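The averaging effect can be illustrated with a toy calculation. The numbers below are invented for illustration and are not results from the study: a 100-step gap contains a three-step spike of height 1, one prediction is flat and omits the spike, and another contains a spike of the right shape that is displaced by four steps.

```python
def mse(truth, prediction):
    """Mean squared error over all time steps."""
    return sum((t - p) ** 2 for t, p in zip(truth, prediction)) / len(truth)


def max_abs_error(truth, prediction):
    """Largest pointwise deviation, i.e. the worst local error."""
    return max(abs(t - p) for t, p in zip(truth, prediction))


def spike_series(n_steps, spike_start, spike_width=3, height=1.0):
    """Flat baseline at 0 with one rectangular spike (a crude QRS stand-in)."""
    return [
        height if spike_start <= t < spike_start + spike_width else 0.0
        for t in range(n_steps)
    ]


N_STEPS = 100                           # hypothetical gap length in samples
truth = spike_series(N_STEPS, 50)       # real segment with a spike
smoothed = [0.0] * N_STEPS              # flat prediction that omits the spike
shifted = spike_series(N_STEPS, 54)     # spike present but slightly displaced

mse_smoothed = mse(truth, smoothed)                      # 3 / 100 = 0.03
mse_shifted = mse(truth, shifted)                        # 6 / 100 = 0.06
local_error_smoothed = max_abs_error(truth, smoothed)    # 1.0
```

In this toy setting, the flat prediction obtains the lower MSE although it misses the spike completely, while the prediction that contains a plausible spike is penalised twice, once for the missed spike and once for the displaced one.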

To our knowledge, this property of GANs has not yet been investigated in the domain of biomedical time series imputation and forecasting.

Further research is needed to quantify the perceived differences in the synthesised data. Ranking algorithms solely by averaged error scores might be inappropriate.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Lotter W, Kreiman G, Cox D. Unsupervised Learning of Visual Structure using Predictive Generative Networks [Preprint]. ArXiv. 2016. arXiv:1511.06380v2. DOI: 10.48550/arXiv.1511.06380
2.
Chen RJ, Lu MY, Chen TY, Williamson DFK, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nat Biomed Eng. 2021 Jun;5(6):493–7.
3.
Festag S, Spreckelsen C. Medical multivariate time series imputation and forecasting based on a recurrent conditional Wasserstein GAN and attention. J Biomed Inform. 2023 Mar;139:104320.