gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17-21 September 2023, Heilbronn

Dimension reduction for temporal patterns in time-series single-cell RNA-sequencing data

Meeting Abstract

  • Maren Hackenberg - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany
  • Laia Canal Guitart - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany
  • Harald Binder - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Freiburg im Breisgau, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 328

doi: 10.3205/23gmds071, urn:nbn:de:0183-23gmds0717

Published: 15 September 2023

© 2023 Hackenberg et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license details, see http://creativecommons.org/licenses/by/4.0/.


Text

Generating single-cell RNA-sequencing (scRNA-seq) data at several time points, e.g., during a developmental process, promises insights into mechanisms controlling cellular differentiation at the level of individual cells. As there is no one-to-one correspondence between cells at different time points, a first step in a typical analysis workflow is to reduce dimensionality in order to visually inspect temporal patterns. Here, one implicitly assumes that the resulting low-dimensional manifold captures the central gene expression dynamics of interest. Yet commonly used techniques are not specifically designed to do so, and their representations do not necessarily coincide with the one that best reflects the actual underlying dynamics.

We thus investigate how visual representations of different temporal patterns in time-series scRNA-seq data depend on the choice of dimension reduction technique, considering principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and single-cell variational inference (scVI), a popular deep learning-based approach. We focus on comparing the visual representations produced by each technique, because visual inspection of temporal patterns is often a crucial first step that guides the choice of downstream analyses.

To characterize the approaches in a controlled setting, we introduce an artificial time series into a real-world snapshot scRNA-seq dataset from one experimental time point by simulating an underlying low-dimensional developmental process and generating corresponding high-dimensional gene expression data. Specifically, we apply one dimension reduction approach (say, t-SNE) to the snapshot data and transform the resulting low-dimensional representation according to biologically meaningful temporal patterns, e.g., cell clusters dividing during a differentiation process. We then train a deep learning model to generate synthetic high-dimensional gene expression profiles corresponding to the simulated pattern at each time point, and apply the different dimension reduction approaches to the high-dimensional time-series data to compare how well they reflect the underlying temporal pattern introduced in, e.g., t-SNE space. Subsequently, to generalize our findings, we vary the dimension reduction method used to introduce the temporal pattern, the temporal pattern itself, and the underlying snapshot dataset on which the simulated development is based.
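The simulation design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a 2-D Gaussian cloud stands in for the low-dimensional representation of a snapshot dataset, `split_cluster` is a hypothetical temporal pattern (one cluster dividing into two), a fixed random nonlinear map (`decode`) replaces the trained deep learning model, and PCA, computed via NumPy's SVD, is the only dimension reduction method applied to the synthetic time series.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_cluster(latent, t, max_shift=4.0):
    """Hypothetical temporal pattern: one cluster divides into two as t goes 0 -> 1."""
    # Each cell is assigned to a branch by the sign of its first latent coordinate.
    branch = np.where(latent[:, 0] >= 0, 1.0, -1.0)
    shifted = latent.copy()
    shifted[:, 0] += branch * max_shift * t
    return shifted, branch

def make_decoder(latent_dim, n_genes, hidden=32):
    """Random nonlinear map standing in for a trained generative model (assumption)."""
    w1 = rng.normal(size=(latent_dim, hidden))
    w2 = rng.normal(size=(hidden, n_genes))

    def decode(z):
        # Softplus keeps the synthetic "expression" values non-negative.
        return np.log1p(np.exp(np.tanh(z @ w1) @ w2))

    return decode

def pca(x, n_components=2):
    """PCA via SVD of the centered data matrix."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def branch_gap(points, labels):
    """Euclidean distance between the centroids of the two branches."""
    return np.linalg.norm(points[labels > 0].mean(axis=0)
                          - points[labels < 0].mean(axis=0))

n_cells, n_genes = 200, 50
snapshot_latent = rng.normal(size=(n_cells, 2))  # stand-in for an embedded snapshot
decode = make_decoder(2, n_genes)

time_points = [0.0, 0.5, 1.0]
latents, branches, times = [], [], []
for t in time_points:
    z, b = split_cluster(snapshot_latent, t)
    latents.append(z)
    branches.append(b)
    times.append(np.full(n_cells, t))

expression = decode(np.vstack(latents))  # synthetic high-dimensional time series
embedding = pca(expression)              # one of the methods one might compare
branch = np.concatenate(branches)
time = np.concatenate(times)

for i, t in enumerate(time_points):
    mask = time == t
    print(f"t={t}: latent gap = {branch_gap(latents[i], branches[i]):.2f}, "
          f"PCA gap = {branch_gap(embedding[mask], branch[mask]):.2f}")
```

Comparing the two gaps per time point shows whether the embedding reproduces the divergence that was built into the latent space. Swapping `pca` for another method, or changing the space in which the pattern is introduced, would mirror the variations described in the paragraph above.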

We thus characterize the different perspective of each technique on a specific temporal pattern with respect to the dataset, the underlying representation in which the pattern was introduced, and the pattern itself. The results illustrate how the choice of dimension reduction approach can dramatically alter, i.e., distort, temporal structure. We further demonstrate this effect on a real-world time-series scRNA-seq dataset. To alleviate such problems, we provide directions for designing dimension reduction techniques that explicitly respect temporal structure.
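Visual comparison can be complemented by a simple numerical check of how strongly an embedding distorts a known temporal structure. The sketch below is an assumption for illustration, not a metric taken from this abstract: the hypothetical helper `distance_preservation` correlates pairwise cell-to-cell distances in a ground-truth representation with those in an embedding, and the toy "faithful" and "distorted" embeddings are constructed by hand.

```python
import numpy as np

def pairwise_distances(x):
    """Full matrix of Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_preservation(reference, embedding):
    """Pearson correlation between pairwise distances in two representations.

    Values near 1 mean relative distances (and hence the temporal layout)
    are preserved; lower values indicate distortion.
    """
    upper = np.triu_indices(reference.shape[0], k=1)
    d_ref = pairwise_distances(reference)[upper]
    d_emb = pairwise_distances(embedding)[upper]
    return np.corrcoef(d_ref, d_emb)[0, 1]

rng = np.random.default_rng(1)

# Toy ground truth: cells drift along the first axis over three time points.
time = np.repeat([0.0, 1.0, 2.0], 50)
reference = (np.column_stack([3.0 * time, np.zeros_like(time)])
             + rng.normal(scale=0.3, size=(time.size, 2)))

# A rotation plus uniform scaling preserves all relative distances.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
faithful = 2.0 * reference @ rotation.T

# A saturating warp collapses the two later time points onto each other.
distorted = np.column_stack([np.tanh(reference[:, 0]), reference[:, 1]])

faithful_score = distance_preservation(reference, faithful)
distorted_score = distance_preservation(reference, distorted)
print(f"faithful embedding:  {faithful_score:.3f}")
print(f"distorted embedding: {distorted_score:.3f}")
```

A global distance correlation is only one possible choice; it rewards linear methods by construction, so a neighborhood-based or per-time-point variant may be more appropriate for comparing nonlinear embeddings.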

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

