gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Uncovering multivariable patterns in single cell RNA-Seq data using deep Boltzmann machines and log-linear models

Meeting Abstract

  • Moritz Hess - Universitätsklinikum Freiburg, Medizinische Fakultät, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
  • Stefan Lenz - Universitätsklinikum Freiburg, Medizinische Fakultät, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
  • Harald Binder - Universitätsklinikum Freiburg, Medizinische Fakultät, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 316

doi: 10.3205/19gmds064, urn:nbn:de:0183-19gmds0647

Published: September 6, 2019

© 2019 Hess et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

The high-dimensional gene expression profiles of single cells, obtained by deep RNA sequencing (RNA-Seq), should in principle allow precise characterization of the function and state of cells and, ultimately, modeling of intercellular interactions, which is crucial for a better understanding of disease pathogenesis. Deep feed-forward architectures such as convolutional neural networks can learn complex dependencies but are by design confined to learning structure relevant to a specific task. In contrast, deep generative models such as deep Boltzmann machines (DBMs) learn the joint distribution of the training data. A trained DBM can thus be regarded as a model of the complex co-expression patterns that are linked to the physiological and biochemical processes occurring in a cell.
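For orientation, the joint distribution learned by a DBM can be written down in its standard textbook form. The sketch below assumes binary units and two hidden layers; the abstract does not state the number of layers, and the symbols (v for the gene states, h^{(1)} and h^{(2)} for the hidden layers, W^{(1)} and W^{(2)} for the weights, a, b^{(1)}, b^{(2)} for the biases) are chosen here for illustration.

```latex
\begin{align}
E\bigl(v, h^{(1)}, h^{(2)}\bigr)
  &= - v^{\top} W^{(1)} h^{(1)}
     - h^{(1)\top} W^{(2)} h^{(2)}
     - a^{\top} v
     - b^{(1)\top} h^{(1)}
     - b^{(2)\top} h^{(2)}, \\
p\bigl(v, h^{(1)}, h^{(2)}\bigr)
  &= \frac{1}{Z} \exp\bigl(-E(v, h^{(1)}, h^{(2)})\bigr),
\qquad
Z = \sum_{v,\, h^{(1)},\, h^{(2)}} \exp\bigl(-E(v, h^{(1)}, h^{(2)})\bigr).
\end{align}
```

In this notation, h^{(2)} plays the role of the last hidden layer, and samples of (v, h^{(2)}) drawn from p are the kind of input the pattern extraction works on.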

To infer the relationship between the learned co-expression patterns and observed phenotypes such as the specific cell type, we need to identify the genes that are strongly related to cellular differentiation, as indicated by distinct multivariable patterns in the expression data. Here we propose an approach to extract these patterns from trained DBMs. In detail, we sample the states of visible and latent variables from a trained DBM. Using these samples, we infer the visible variables, i.e. genes, whose states are jointly associated with the states of latent variables in the last hidden layer of the network. To that end, we employ log-linear models, which we fit in a stepwise procedure to model the contingency table of a set of visible variables (the pattern) and one latent variable. In each step, a visible variable is added to the model if the interaction effect between that visible variable and the latent variable improves the model fit.
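The stepwise idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary states, replaces the interaction test with a simpler closed-form log-linear model (the candidate gene and the latent variable are independent given the current pattern), and uses a BIC-style penalty as the criterion for "improves the model fit". The function names `conditional_g2` and `select_pattern` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def conditional_g2(visible, latent, gene, pattern):
    """Deviance (G^2) and degrees of freedom of the log-linear model in which
    `gene` and the latent variable are independent given the pattern genes."""
    n_sgh, n_sg, n_sh, n_s = Counter(), Counter(), Counter(), Counter()
    for row, h in zip(visible, latent):
        s = tuple(row[j] for j in pattern)  # joint state of the current pattern
        g = row[gene]
        n_sgh[s, g, h] += 1
        n_sg[s, g] += 1
        n_sh[s, h] += 1
        n_s[s] += 1
    g2 = 0.0
    for (s, g, h), n in n_sgh.items():
        expected = n_sg[s, g] * n_sh[s, h] / n_s[s]  # closed-form fitted count
        g2 += 2.0 * n * math.log(n / expected)
    # One degree of freedom per stratum in which both binary variables vary.
    df = sum(
        1 for s in n_s
        if all(n_sg[s, a] > 0 for a in (0, 1))
        and all(n_sh[s, b] > 0 for b in (0, 1))
    )
    return g2, df


def select_pattern(visible, latent, max_genes=5):
    """Greedy forward selection: in each step, add the gene whose association
    with the latent variable (given the pattern so far) most improves the fit.
    Assumption: improvement is judged by a BIC-style penalised deviance."""
    n = len(visible)
    pattern = []
    candidates = set(range(len(visible[0])))
    while candidates and len(pattern) < max_genes:
        best_gene, best_gain = None, 0.0
        for gene in sorted(candidates):
            g2, df = conditional_g2(visible, latent, gene, pattern)
            gain = g2 - df * math.log(n)
            if gain > best_gain:
                best_gene, best_gain = gene, gain
        if best_gene is None:  # no remaining gene improves the fit
            break
        pattern.append(best_gene)
        candidates.remove(best_gene)
    return pattern


# Toy stand-in for samples drawn from a trained DBM (hypothetical data):
# genes 0 and 1 are noisy copies of the latent state, genes 2-5 are noise.
rng = random.Random(0)


def noisy_copy(bit, flip_prob):
    return bit ^ (rng.random() < flip_prob)


latent = [rng.randint(0, 1) for _ in range(2000)]
visible = [
    [noisy_copy(h, 0.1), noisy_copy(h, 0.2)] + [rng.randint(0, 1) for _ in range(4)]
    for h in latent
]
pattern = select_pattern(visible, latent)
print(pattern)
```

The conditional-independence model is used here because its fitted counts have a closed form; a model that adds only the two-way gene-by-latent interaction would need an iterative fit (for example iterative proportional fitting or a Poisson GLM).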

We demonstrate the potential of the method using gene expression data from different neuron types in the mouse brain. Using DBMs, we learn the joint distribution of gene expression across the different cell types. We then sample from the DBM and extract multivariable gene patterns from the samples using the log-linear modeling approach. Based on the extracted patterns, we assign the samples to training examples. In this application, the extracted patterns enable us to assign the samples to similar training examples, with similarity measured by Euclidean distance. This indicates that we are able to extract the genes that make up the essential patterns in the data and that discriminate between different sub-populations of cells.
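One possible reading of this evaluation can be sketched as follows. The abstract does not spell out the assignment rule, so the sketch assumes that each sample is matched to its nearest training example using only the pattern genes, and that this match is then compared with the nearest training example under Euclidean distance over all genes. The helper names and the toy profiles are hypothetical.

```python
import math


def euclidean(a, b, genes=None):
    """Euclidean distance between two profiles, optionally restricted to `genes`."""
    idx = range(len(a)) if genes is None else genes
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in idx))


def nearest(sample, training, genes=None):
    """Index of the training example closest to `sample`."""
    return min(
        range(len(training)),
        key=lambda i: euclidean(sample, training[i], genes),
    )


# Hypothetical binarised expression profiles (rows: cells, columns: genes).
training = [
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
samples = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
pattern = [0, 1, 2]  # assumed output of the pattern extraction step

by_pattern = [nearest(s, training, pattern) for s in samples]
by_full = [nearest(s, training) for s in samples]

# Fraction of samples for which both assignments pick the same training example.
agreement = sum(p == f for p, f in zip(by_pattern, by_full)) / len(samples)
print(by_pattern, by_full, agreement)
```

A high agreement on real data would be in line with the claim that the pattern genes carry the information that separates the cell sub-populations.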

These findings suggest that the proposed approach is valuable for detecting patterns in gene expression data that underlie cellular differentiation.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.