gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

02. - 06.09.2018, Osnabrück

From seamless acquisition and sustainable management to publication of next-generation sequencing data

Meeting Abstract

  • Nadine Umbach - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; DZHK (German Center for Cardiovascular Research), partner site Göttingen, Göttingen, Germany
  • Luca Freckmann - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany
  • Cornelius Knopp - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany
  • Tim Meyer - DZHK (German Center for Cardiovascular Research), partner site Göttingen, Göttingen, Germany; University Medical Center Göttingen, Institute of Pharmacology and Toxicology, Göttingen, Germany
  • Markus Suhr - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany
  • Harald Kusch - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Germany; University Medical Center Göttingen, Department of Molecular Biology, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 272

doi: 10.3205/18gmds098, urn:nbn:de:0183-18gmds0988

Published: 27 August 2018

© 2018 Umbach et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: New next-generation sequencing (NGS) methods have been introduced that allow an unprecedented view into complex biological systems. Here, we describe our approach to, and experiences with, building a seamless pipeline that ensures high availability of well-annotated, complete, and consistent NGS workflow data from experiment planning, sample preparation, and sequencing to publication.

State of the Art: A growing number of scientific journals ask for deposition of process and sequencing data in reliable, unrestricted, and publicly available databases [1], [2]. However, NGS methods are complex procedures that produce high-dimensional, high-volume data. Adequate annotation schemata and data curation tools are far from being readily available and applicable. At the same time, most stakeholders are involved only in individual sections of the processing pipeline. Thus, metadata are collected in various formats and levels of detail, are often difficult to access, and are analyzed only after a long delay. This significantly restricts the understanding, reproducibility, and publication of the data and thus limits their reusability.

Concept: Based on interviews with collaborating experts from CRC 1002 and the DZHK partner site Göttingen as well as on-site visits to sequencing facilities, we identified three biomedical use cases (DNA-seq, RNA-seq, single-cell-seq). We designed corresponding processes and carried out a requirements analysis. Conceptual structures of functional metadata annotations relating to samples and experimental configurations (e.g. MINSEQE [3], MIAME [4], ISO/TS 20428:2017), specifications of the International Nucleotide Sequence Database Collaboration [5], and publication instructions of journals were analyzed, evaluated, and harmonized with regard to suitability and practical application.
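As an illustration of what harmonizing metadata annotations can look like in practice, the following minimal Python sketch renames source-specific field names to one shared vocabulary and sets aside anything it does not recognize. All field names and the mapping itself are hypothetical placeholders chosen for this example; they are not taken from any of the standards or specifications cited above, nor from the authors' implementation.

```python
# Hypothetical mapping from source-specific field names to one harmonized
# vocabulary. These keys are illustrative assumptions only and are not
# defined by any of the cited standards.
HARMONIZED_KEYS = {
    "sample_name": "sample_id",
    "SampleID": "sample_id",
    "organism": "species",
    "Species": "species",
    "lib_prep": "library_protocol",
    "LibraryProtocol": "library_protocol",
}


def harmonize(record):
    """Rename known keys to the harmonized vocabulary.

    Returns a pair (harmonized, unmapped): fields with a known mapping,
    and fields that need manual curation because no mapping exists.
    """
    harmonized = {}
    unmapped = {}
    for key, value in record.items():
        target = HARMONIZED_KEYS.get(key)
        if target is None:
            unmapped[key] = value
        else:
            harmonized[target] = value
    return harmonized, unmapped


if __name__ == "__main__":
    raw = {"SampleID": "S1", "organism": "Homo sapiens", "note": "pilot run"}
    print(harmonize(raw))
```

Collecting unmapped fields separately, rather than dropping them, keeps information that a curator may still want to assign by hand.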

Implementation: To ensure correctness, completeness, and consistency of data along the NGS pipelines, a comprehensive approach has been conceptualized and implemented and is currently being evaluated by six research groups. For more than two years, a growing community of now eight research groups at University Medical Center Göttingen has been using electronic laboratory notebook (ELN) software to digitize and standardize the primary documentation of laboratory experiments. In addition, the SEEK platform [6] is used for efficient documentation and organized storage of project/experiment planning information, quality control reports, and meta-analysis data.
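The abstract does not describe how completeness and consistency are checked. The following Python sketch shows one way such a check could work, under stated assumptions: the stage names, the required fields per stage, and the record layout are all hypothetical and do not reflect the ELN, SEEK, or the authors' actual implementation.

```python
# Hypothetical required fields per pipeline stage (assumptions for this
# sketch, not a published schema).
REQUIRED_FIELDS = {
    "planning": {"project_id", "sample_id", "assay_type"},
    "sample_preparation": {"sample_id", "library_protocol"},
    "sequencing": {"sample_id", "instrument", "run_date"},
}


def check_records(records_by_stage):
    """Return a list of human-readable problems found in the records.

    Completeness: every record must carry non-empty values for the
    fields required at its stage.
    Consistency: every sample seen in a downstream stage must also
    have a planning record.
    """
    problems = []

    # Completeness check, stage by stage.
    for stage, required in REQUIRED_FIELDS.items():
        for record in records_by_stage.get(stage, []):
            present = {k for k, v in record.items() if v not in (None, "")}
            missing = sorted(required - present)
            if missing:
                sample = record.get("sample_id", "?")
                problems.append(
                    f"{stage}: record {sample} is missing {', '.join(missing)}"
                )

    # Consistency check across stages.
    planned = {r.get("sample_id") for r in records_by_stage.get("planning", [])}
    for stage in ("sample_preparation", "sequencing"):
        for record in records_by_stage.get(stage, []):
            sample = record.get("sample_id")
            if sample and sample not in planned:
                problems.append(f"{stage}: sample {sample} has no planning record")

    return problems


if __name__ == "__main__":
    example = {
        "planning": [
            {"project_id": "P1", "sample_id": "S1", "assay_type": "RNA-seq"}
        ],
        "sample_preparation": [{"sample_id": "S1", "library_protocol": ""}],
        "sequencing": [
            {"sample_id": "S2", "instrument": "X", "run_date": "2018-01-01"}
        ],
    }
    for problem in check_records(example):
        print(problem)
```

Running such a check whenever a record is entered, rather than shortly before publication, would surface gaps while the responsible stakeholder can still fill them.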

Lessons Learned: So far, specifications for seamless documentation of the processing pipeline for NGS experiments differ significantly in complexity, level of description, and public availability. We found that harmonizing existing standards for practical application is challenging, which underlines the urgent need for additional coordination of standardization at the (inter)national level. Given the lack of integrated software solutions that fulfill the requirements, the selection and implementation of an ELN and SEEK for collecting and organizing NGS data was facilitated by existing expertise with these tools. Active involvement of all relevant stakeholders, such as clinicians, biologists, biomedical informaticians, and biostatisticians, was crucial for comprehensively coordinating the design and implementation of the documentation process. A further major challenge lies in linking primary data with bioinformatic analysis data. Here, we demonstrate essential aspects of seamlessly collecting and sustainably managing NGS processing data from experimental design, sample sequencing, and quality control to publication.

Acknowledgements: This work was funded by the BMBF in the projects GenoPerspektiv (grant 01GP1402) and Data and Information Management (DZHK - German Center for Cardiovascular Research) (grant 81Z7300173) as well as by the DFG for the Collaborative Research Centre (CRC) 1002 on Modulatory Units in Heart Failure, subproject INF.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Recommended Data Repositories. Scientific Data. [last accessed on 09.04.2018]. Available from: https://www.nature.com/sdata/policies/repositories
2. Research Data Guidelines. [last accessed on 09.04.2018]. Available from: https://www.elsevier.com/authors/author-services/research-data/data-guideline
3. Functional Genomics Data Society (FGED). MINSEQE - Minimum Information about a high-throughput nucleotide SEQuencing Experiment. [last accessed on 09.04.2018]. Available from: http://fged.org/projects/minseqe/
4. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001;29(4):365–71.
5. International Nucleotide Sequence Database Collaboration INSDC. [last accessed on 09.04.2018]. Available from: http://www.insdc.org/
6. Wolstencroft K, Owen S, Krebs O, et al. SEEK: a systems biology data and model management platform. BMC Syst Biol. 2015;9:33.