gms | German Medical Science

Gesundheit – gemeinsam (Health – together). Joint Conference of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), the German Society for Social Medicine and Prevention (DGSMP), the German Society for Epidemiology (DGEpi), the German Society for Medical Sociology (DGMS), and the German Society for Public Health (DGPH)

September 8-13, 2024, Dresden

Sustainable Data Management for Single-Cell Sequencing: Tools, Platforms, and Challenges

Meeting Abstract

  • Nils Rosenboom - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany
  • Robert Kossen - University Medical Center Göttingen, Göttingen, Germany
  • Ulrich Sax - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus-Institute Data Science (CIDAS), Göttingen, Germany
  • Sara Yasemin Nussbeck - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University Medical Center Göttingen, Central Biobank UMG, Göttingen, Germany
  • Harald Kusch - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University of Göttingen, Campus-Institute Data Science (CIDAS), Göttingen, Germany; University of Göttingen, Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), Göttingen, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 618

doi: 10.3205/24gmds160, urn:nbn:de:0183-24gmds1602

Published: September 6, 2024

© 2024 Rosenboom et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Introduction: Managing and integrating single-cell sequencing (SCS) data sustainably poses significant challenges for today's institutions and researchers. While a wide variety of pipelines and tools for handling sequencing data is available, this exploration focuses on the research data management (RDM) tools and platforms that can handle and analyze such data in a FAIR manner. The challenges include the sheer size of the data, storage and computational complexity, and the need for comprehensive metadata, especially with respect to spatial transcriptomics and the generally complex preprocessing. Different approaches are evaluated and discussed in terms of how well they match these requirements, with special regard to the scalability needed to handle SCS data effectively.

Prior to this work, multiple platforms had been examined. However, with respect to the FAIR handling of SCS data, none completely covered the collected requirements. Currently, data handling pipelines are mostly manual, which makes appropriate documentation tedious and requires additional effort. Comprehensive solutions that integrate documentation into the workflow do not yet appear to exist.

Methods: In the biomedical context of cardiovascular basic science [1], we examined locally available RDM tools with existing user communities, including FAIRDOM-SEEK [2] (SEEK, https://seek4science.org/), PanHunter (https://panhunter.com/), Heart Cell Atlas [3] (https://www.heartcellatlas.org/), and DHART [4] (https://dhart.dieterichlab.org/) for Omics data management, analysis, and visualization. To compare these tools, a defined test set of SCS data was integrated into each system, and the resulting pipelines and capabilities were thoroughly analyzed and evaluated.

Results: The selected RDM tools share a variety of features and aims. However, their focus varies, and consequently each offers distinct functionality. SEEK primarily addresses FAIR RDM tasks such as organizing and sharing the metadata and data of complex experimental approaches. PanHunter emphasizes data handling and integrated Multi-Omics data analysis. The Heart Cell Atlas focuses on visualizing analysis results, providing comprehensive insights into the cellular composition and molecular characteristics of the heart. DHART specializes in Omics visualization tailored for cardiovascular research. From a technical perspective, the selected tools also implement different strategies, such as standalone systems, all-inclusive platforms, and API-connected solutions.

The data integration tests reveal valuable insights into the efficacy of the examined tools. Our findings highlight the strengths and limitations of SEEK, PanHunter, Heart Cell Atlas, and DHART in handling SCS data. Further, the tests illuminate specific challenges, such as (meta-)data complexity and size.

Discussion: Our results show that each tool applies slightly different concepts and offers distinct advantages. SEEK has shown strengths in FAIR data management but limitations regarding large file support and the integration of analytical pipeline automation, whereas Heart Cell Atlas, DHART, and PanHunter demonstrate their strengths in addressing specific requirements for SCS data handling. In summary, all analyzed tools provide useful overlapping and complementary functionality to address RDM requirements regarding SCS data. However, to sufficiently support the highly dynamic demands in the expanding field of spatial Omics technologies, further developments are urgently needed.

Acknowledgements: Funded by the DFG through the CRC1002 infrastructure (INF) project, the Z projects of CRC1190 and CRC1565, and Germany’s Excellence Strategy - EXC 2067/1-390729940.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Kusch H, Kossen R, Suhr M, Freckmann L, Weber L, Henke C, et al. Management of Metadata Types in Basic Cardiological Research. Stud Health Technol Inform. 2021;283:59–68. DOI: 10.3233/SHTI210542
2.
Wolstencroft K, Owen S, Krebs O, Nguyen Q, Stanford NJ, Golebiewski M, et al. SEEK: A Systems Biology Data and Model Management Platform. BMC Systems Biol. 2015;9(1):33. DOI: 10.1186/s12918-015-0174-y
3.
Kanemaru K, Cranley J, Muraro D, Miranda AMA, Ho SY, Wilbrey-Clark A, et al. Spatially resolved multiomics of human cardiac niches. Nature. 2023;619(7971):801–810. DOI: 10.1038/s41586-023-06311-1
4.
Orvis J, Gottfried B, Kancherla J, Adkins RS, Song Y, Dror AA, et al. gEAR: Gene Expression Analysis Resource portal for community-driven, multi-omic data exploration. Nat Methods. 2021;18:843–844. DOI: 10.1038/s41592-021-01200-9