A representation of longitudinal data amenable for modeling with transformers
Published: 15 September 2023
Introduction: Longitudinal cohort data play a crucial role in biomedical research, as they offer valuable insights into the dynamics of patient health. Novel modeling techniques developed in the context of deep learning for sequence data could therefore be attractive, in particular transformer architectures, which are based on attention mechanisms. Yet, transformers have primarily been developed for natural language processing, so other data types may need to be adapted to obtain promising performance.
Methods: We propose to convert longitudinal datasets into a discrete event format, thus making them compatible with the intrinsic characteristics of language, for which transformers were developed. Specifically, we consider patterns observed in the variables of a dataset as discrete events that occur in sequences. We treat events that occur in a specific variable or group of variables as words, and keep an event in our vocabulary only if it occurs in a certain number of sequences. To obtain the word embedding required by the transformer architecture, we consider transformations of the original variable space, obtained by dimension reduction. Specifically, we train an autoencoder to obtain an embedding that reflects the correlation between different events. The corresponding encoder maps event vectors to an embedding, and the decoder transforms model predictions back to events.
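To make the conversion concrete, the following is a minimal sketch in Python/PyTorch under stated assumptions: the function names, the frequency threshold, and the single-layer autoencoder are illustrative choices, not the authors' exact implementation.

```python
# Illustrative sketch of vocabulary construction from event sequences
# and an autoencoder embedding of multi-hot event vectors. All names
# and hyperparameters here are hypothetical assumptions.
from collections import Counter
import torch
import torch.nn as nn

def build_vocabulary(sequences, min_sequence_count=5):
    """Keep an event 'word' only if it occurs in at least
    `min_sequence_count` distinct sequences (participants)."""
    counts = Counter()
    for seq in sequences:          # one event sequence per participant
        counts.update(set(seq))    # count presence once per sequence
    return {event for event, c in counts.items() if c >= min_sequence_count}

class EventAutoencoder(nn.Module):
    """Encoder maps a multi-hot event vector to a dense embedding;
    decoder maps embeddings (or model predictions) back to events."""
    def __init__(self, n_events, embedding_dim=32):
        super().__init__()
        self.encoder = nn.Linear(n_events, embedding_dim)
        self.decoder = nn.Linear(embedding_dim, n_events)

    def forward(self, x):
        z = self.encoder(x)
        return torch.sigmoid(self.decoder(z)), z

def train(model, event_vectors, epochs=100, lr=1e-3):
    """Reconstruct 0/1 event vectors so that the learned embedding
    reflects co-occurrence (correlation) between events."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        recon, _ = model(event_vectors)
        loss = loss_fn(recon, event_vectors)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Here `event_vectors` is assumed to be a float tensor of shape (number of observations, number of vocabulary events) with 0/1 entries; reconstructing it forces events that tend to occur together to receive similar embeddings.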
Results: For illustration, we consider data from a psychological resilience application, specifically a longitudinal dataset on self-reported stressors that participants experienced in the week before each follow-up time point. In the conversion process, we defined events as the occurrence of two or more consecutive stressors and employed heuristic algorithms to group simultaneous events into new events. To evaluate our method, we extracted the output of our customized embedding for each event and computed the pairwise similarity between all events, as sketched below. Additionally, we calculated attention scores for participants. Both the embedding (event level) and the attention scores (sequence level) show meaningful values compatible with subject-matter knowledge and intuition about daily-life stressors.
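As a rough illustration of the evaluation step, a sketch of the pairwise-similarity computation, assuming the encoder from the previous block and one-hot event vectors as inputs; cosine similarity is one plausible choice of similarity measure, not necessarily the one used in the article.

```python
# Hypothetical evaluation sketch: encode each event and compare
# all pairs of event embeddings by cosine similarity.
import torch
import torch.nn.functional as F

def pairwise_event_similarity(encoder, n_events):
    """Return an (n_events, n_events) matrix of cosine similarities
    between the embeddings of all vocabulary events."""
    with torch.no_grad():
        one_hot = torch.eye(n_events)   # one row per event
        emb = encoder(one_hot)          # (n_events, embedding_dim)
    emb = F.normalize(emb, dim=1)       # unit-length rows
    return emb @ emb.T                  # cosine similarity matrix
```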
Discussion: Using the suggested adaptation of datasets, transformers could be used in our application, which corresponds to typical longitudinal data from biomedicine. More generally, this may open up new opportunities for uncovering complex temporal patterns in longitudinal data.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.