gms | German Medical Science

Towards reliable prediction of significant changes in microbial communities based on 16S time series data

Meeting Abstract

  • Ann-Kathrin Brüggemann - Institute for Artificial Intelligence in Medicine, University Hospital Essen, Department of Medicine, University of Duisburg-Essen
  • Sultan Imangaliyev - Institute for Artificial Intelligence in Medicine, University Hospital Essen, Department of Medicine, University of Duisburg-Essen
  • Jan Kehrmann - Institute for Medical Microbiology, University Hospital Essen, Department of Medicine, University of Duisburg-Essen
  • Folker Meyer - Institute for Artificial Intelligence in Medicine, University Hospital Essen, Department of Medicine, University of Duisburg-Essen
  • Ivana Kraiselburd - Institute for Artificial Intelligence in Medicine, University Hospital Essen, Department of Medicine, University of Duisburg-Essen

SMITH Science Day 2022. Aachen, 23.-23.11.2022. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocP32

doi: 10.3205/22smith43, urn:nbn:de:0183-22smith435

Published: January 31, 2023

© 2023 Brüggemann et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Early diagnosis is essential for the successful treatment of infectious diseases such as sepsis. The ability to detect significant changes in the bacterial communities of patients could be a step towards such early detection. Distinguishing significant changes from random fluctuations is key in microbiome analysis, because microbiome composition changes frequently and rapidly over time. We employ a predictive approach to identify significant changes, using 16S time series data for bacterial genera as model input. Because 16S rDNA analysis is low-cost, fast, widely used and compatible with non-invasive sampling, this approach has the potential to inform future clinical applications.

Methods: Time series prediction is used in many research areas [1]. One of the most commonly used models is the autoregressive integrated moving average (ARIMA) model, a linear model that combines autoregressive and moving average components with differencing of the series [2]. Another option for time series forecasting is the Long Short-Term Memory (LSTM) model. This type of recurrent neural network (RNN) was first used in language processing and quickly became an alternative to ARIMA and to other machine learning models for time series prediction [3].
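
To make the forecasting setup concrete, the following is a minimal sketch of how a multivariate abundance series could be framed as input-target pairs for a sequence model such as an LSTM. The window length, the function name `make_windows` and the toy values are illustrative assumptions; the abstract does not state how the model inputs were constructed.

```python
from typing import List, Tuple

Series = List[List[float]]  # one row per time point, one column per genus


def make_windows(series: Series, window: int) -> Tuple[List[Series], List[List[float]]]:
    """Split a multivariate time series into (input window, next step) pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # `window` consecutive time points
        targets.append(series[start + window])       # the time point to be predicted
    return inputs, targets


# Toy data (hypothetical): 5 time points, 2 genera.
toy = [[10.0, 1.0], [12.0, 2.0], [11.0, 3.0], [13.0, 4.0], [14.0, 5.0]]
X, y = make_windows(toy, window=3)
print(len(X), y)
```

Each input window holds three consecutive time points for all genera, and the target is the abundance vector at the following time point.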

Different LSTM architectures have been applied to a wide range of scientific problems, for example to design synthetic microbial communities that produce various metabolites [3] or to predict the occurrence of antibiotic resistance genes at a beach after rainfall [4]. For the prediction task at hand, a vector autoregressive moving average (VARMA) model, which is related to the ARIMA model, served as the baseline. Unlike ARIMA, a VARMA model can handle multivariate time series, such as the abundances of different microbial genera at each time point.
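
For reference, a VARMA(p, q) model in its standard textbook form can be written as below. The symbols are notation introduced here for illustration, and the orders p and q used for the baseline are not reported in the abstract.

```latex
\mathbf{y}_t = \mathbf{c}
  + \sum_{i=1}^{p} \mathbf{A}_i \, \mathbf{y}_{t-i}
  + \boldsymbol{\varepsilon}_t
  + \sum_{j=1}^{q} \mathbf{M}_j \, \boldsymbol{\varepsilon}_{t-j}
```

Here \mathbf{y}_t \in \mathbb{R}^k collects the abundances of k genera at time t, \mathbf{c} is a constant vector, \mathbf{A}_i and \mathbf{M}_j are k \times k coefficient matrices, and \boldsymbol{\varepsilon}_t is a white-noise vector. For k = 1 the expression reduces to the univariate ARMA(p, q) model, which ARIMA applies to a differenced series.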

Because an LSTM is effectively a black-box model, it is useful to pair it with a model analysis tool that reports how much each feature contributes to the predictions. Here, Shapley Additive Explanations (SHAP) was used to calculate feature importance both for the whole model and for individual time steps. SHAP is based on a game-theoretic approach and can be applied to different types of machine learning models [5], [6].
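
The game-theoretic idea behind SHAP can be illustrated with exact Shapley values for a tiny model. This is a sketch under stated assumptions, not the SHAP library and not the analysis used in this work: the function `shapley_values`, the toy model and the all-zero baseline are hypothetical. Exact enumeration is only feasible for a handful of features, and practical SHAP implementations rely on approximations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of `model` at `x`, relative to `baseline`."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition take their observed value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy "model": a weighted sum plus one interaction term.
def toy_model(features: Sequence[float]) -> float:
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split equally between the two features involved.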

The data used for the prediction task were generated and published by Caporaso et al. [7]. The dataset comprises 16S data from the gut microbiome of two individuals sampled over several months. The data were processed with a pipeline based on QIIME2 [8] to obtain the absolute abundance of each microbial genus found.

The workflow used to generate the data is embedded in a continuous integration and continuous delivery (CI/CD) environment. Such an environment applies ongoing automation, testing and monitoring to the development process, which yields reliable, functioning software and, in turn, reproducible and credible data.

Results and discussion: The VARMA baseline model was compared with an LSTM model with two hidden layers. The LSTM model achieved a lower mean absolute error (MAE) than the baseline and was therefore used for all further predictions. Currently, we are able to predict the overall range of abundance of bacterial genera in a patient sample over time with good results. This is useful because it allows us to determine whether the abundances of different microbial genera in a community lie within their normal range or deviate from it. A critical change in bacterial abundances could indicate an impending problematic development in the bacterial community and could be a first indication of possible sepsis.
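
As a rough illustration of the comparison and of the range check described above, the sketch below computes the MAE for two sets of predictions and flags time points where the observed abundance leaves a tolerance band around the prediction. All numbers, the function names and the fixed tolerance are hypothetical; the abstract does not specify how the normal range is defined.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def outside_range(observed: Sequence[float],
                  lower: Sequence[float],
                  upper: Sequence[float]) -> List[int]:
    """Indices of time points whose observed value leaves [lower, upper]."""
    return [i for i, (o, lo, hi) in enumerate(zip(observed, lower, upper))
            if o < lo or o > hi]


# Hypothetical values for one genus over five time points.
observed = [10.0, 12.0, 11.0, 30.0, 13.0]
pred_a = [11.0, 11.0, 12.0, 12.0, 12.0]  # predictions of a first model
pred_b = [10.0, 12.0, 12.0, 13.0, 13.0]  # predictions of a second model

mae_a = mean_absolute_error(observed, pred_a)
mae_b = mean_absolute_error(observed, pred_b)

# Assumed tolerance band of +/- 5 around the second model's prediction.
tolerance = 5.0
flagged = outside_range(observed,
                        [p - tolerance for p in pred_b],
                        [p + tolerance for p in pred_b])
print(mae_a, mae_b, flagged)
```

In this toy example the second model has the lower MAE, and only the fourth time point falls outside the band and would be reported as a candidate significant change.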

The SHAP output for the LSTM model gives an overview of the importance of the bacterial genera for the predictions made by the model.

Outlook: As next steps, we plan to evaluate several other LSTM architectures that have proven successful on other scientific problems in order to further improve model performance. These include models with more hidden layers as well as encoder-decoder models with or without an attention layer [3], [4]. Another goal is to integrate static metadata and to investigate whether it improves model performance. We also aim to establish and deepen collaborations that give us access to a larger number of datasets. Our overall aim is to optimize early therapeutic approaches and provide a treatment advantage to physicians and patients.


References

1.
Hirata T, Kuremoto T, Obayashi M, Mabu S, Kobayashi K. Time Series Prediction Using DBN and ARIMA. International Conference on Computer Application Technologies (CCATS); 2015 Aug 31-Sep 2; Matsue, Japan. New York, NY, USA: Institute of Electrical and Electronics Engineers (IEEE); 2015. p. 24-9. DOI: 10.1109/CCATS.2015.15
2.
Siami-Namini S, Tavakoli N, Siami Namin A. A comparison of ARIMA and LSTM in forecasting time series. 17th IEEE International Conference on Machine Learning and Applications (ICMLA); 2018 Dec 17-18; Orlando, Florida, USA. New York, NY, USA: Institute of Electrical and Electronics Engineers (IEEE); 2018. p. 1394-1401. DOI: 10.1109/ICMLA.2018.00227
3.
Baranwal M, Clark RL, Thompson J, Sun Z, Hero AO, Venturelli OS. Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics. Elife. 2022 Jun 23;11:e73870. DOI: 10.7554/eLife.73870
4.
Jang J, Abbas A, Kim M, Shin J, Kim YM, Cho KH. Prediction of antibiotic-resistance genes occurrence at a recreational beach with deep learning models. Water Res. 2021 May 15;196:117001. DOI: 10.1016/j.watres.2021.117001
5.
Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Guyon I, von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017 Dec 4-9; Long Beach, CA, USA. Red Hook, NY, USA: Curran Associates Inc.; 2017. p. 4768–4777.
6.
Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, Liston DE, Low DK, Newman SF, Kim J, Lee SI. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018 Oct;2(10):749-60. DOI: 10.1038/s41551-018-0304-0
7.
Caporaso JG, Lauber CL, Costello EK, Berg-Lyons D, Gonzalez A, Stombaugh J, Knights D, Gajer P, Ravel J, Fierer N, Gordon JI, Knight R. Moving pictures of the human microbiome. Genome Biol. 2011;12(5):R50. DOI: 10.1186/gb-2011-12-5-r50
8.
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS 2nd, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, Caporaso JG. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019 Aug;37(8):852-7. DOI: 10.1038/s41587-019-0209-9