
68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

A representation of longitudinal data amenable for modeling with transformers

Meeting Abstract


  • Kiana Farhadyar - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany
  • Harald Binder - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 230

doi: 10.3205/23gmds048, urn:nbn:de:0183-23gmds0483

Published: September 15, 2023

© 2023 Farhadyar et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license details see http://creativecommons.org/licenses/by/4.0/.



Text

Introduction: Longitudinal cohort data play a crucial role in biomedical research, as they offer valuable insights into the dynamics of patient health. Novel modeling techniques developed in the context of deep learning for sequence data are therefore attractive, in particular transformer architectures, which are based on attention mechanisms. Yet, transformers have primarily been developed for natural language processing, i.e., other data types might need to be adapted to obtain promising performance.

Methods: We propose to convert longitudinal datasets into a discrete event format, thus making them compatible with the intrinsic characteristics of language, for which transformers were developed. Specifically, we consider patterns observed in the variables of a dataset as discrete events that occur in sequences. We then treat events that occur in a specific variable or group of variables as words, and keep an event in our vocabulary only if it occurs in a certain minimum number of sequences. To obtain the word embedding required by the transformer architecture, we consider transformations of the original variable space, obtained by dimension reduction. Specifically, we train an autoencoder to obtain an embedding that reflects the correlation between different events. The corresponding encoder maps the event vectors to an embedding, and the decoder transforms model predictions back to events.
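To illustrate the general idea (this is a minimal sketch under assumptions, not the authors' exact implementation), the conversion could look as follows in Python: each follow-up visit is assumed to be a vector of binary indicators, an event is the set of variables active at a visit, events occurring in too few sequences are dropped from the vocabulary, and a small autoencoder on multi-hot event vectors provides the embedding. All names here (e.g., sequences_to_events, min_sequence_count, EventAutoencoder) are hypothetical.

```python
from collections import Counter
import torch
import torch.nn as nn

def sequences_to_events(sequences):
    """Turn each participant's per-visit indicator vectors into a sequence of
    discrete events: the sorted tuple of variables active at that visit."""
    event_seqs = []
    for visits in sequences:                       # one list of visits per participant
        events = [tuple(sorted(i for i, v in enumerate(visit) if v))
                  for visit in visits]
        event_seqs.append([e for e in events if e])  # drop visits without any event
    return event_seqs

def build_vocabulary(event_seqs, min_sequence_count=5):
    """Keep an event as a 'word' only if it occurs in enough sequences
    (min_sequence_count is an illustrative threshold)."""
    seq_counts = Counter(e for seq in event_seqs for e in set(seq))
    return {e: idx for idx, (e, c) in enumerate(seq_counts.items())
            if c >= min_sequence_count}

class EventAutoencoder(nn.Module):
    """Autoencoder on multi-hot event vectors; the encoder output serves as
    the word embedding for the transformer, the decoder maps predictions back."""
    def __init__(self, n_variables, embed_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_variables, 64), nn.ReLU(),
                                     nn.Linear(64, embed_dim))
        self.decoder = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_variables), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z
```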

Results: For illustration, we consider data from a psychological resilience application, specifically a longitudinal dataset of self-reported stressors that participants experienced in the week preceding each follow-up time point. In the conversion process, we defined events as the occurrence of two or more consecutive stressors and employed heuristic algorithms to group simultaneous events into new events. To evaluate our method, we extracted the output of our customized embedding for each event and computed the pairwise similarity between all events. Additionally, we calculated attention scores for participants. Both the embedding (event level) and the attention scores (sequence level) show meaningful values compatible with subject matter knowledge and intuition on daily life stressors.
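The pairwise-similarity evaluation could be sketched as follows, continuing the hypothetical names from the sketch above; the multi-hot encoding of events and the use of cosine similarity are assumptions for illustration, not details stated in the abstract.

```python
import torch
import torch.nn.functional as F

def event_to_multihot(event, n_variables):
    """Encode an event (tuple of variable indices) as a multi-hot vector."""
    x = torch.zeros(n_variables)
    x[list(event)] = 1.0
    return x

def pairwise_event_similarity(vocabulary, model, n_variables):
    """Embed every vocabulary event with the trained encoder and return the
    cosine-similarity matrix between all pairs of events."""
    events = list(vocabulary)
    X = torch.stack([event_to_multihot(e, n_variables) for e in events])
    with torch.no_grad():
        Z = model.encoder(X)
    Z = F.normalize(Z, dim=1)
    return events, Z @ Z.T        # entry (i, j): similarity of events i and j
```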

Discussion: With the suggested adaptation of the datasets, transformers could be used in our application, which corresponds to typical longitudinal data from biomedicine. More generally, this may open up new opportunities for uncovering complex temporal patterns in longitudinal data.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.