gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Reproducible bioinformatics workflows: A case study with software containers and interactive notebooks

Meeting Abstract

  • Anja Eggert - Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany
  • Pål Olof Westermark - Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 374

doi: 10.3205/20gmds376, urn:nbn:de:0183-20gmds3763

Published: February 26, 2021

© 2021 Eggert et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Background: Specifying bioinformatics workflows reproducibly is challenging because of their complexity. We developed a new statistical method in the field of circadian rhythmicity that makes it possible to rigorously determine whether measured quantities such as gene expression are not rhythmic. The statistical method itself was implemented in the R package “HarmonicRegression”, available on CRAN. However, the bioinformatics workflow is much larger than the statistical test alone. For instance, to ensure the applicability and validity of the statistical method, we simulated data sets of 20,000 gene expression profiles sampled over two days, covering a wide range of parameter combinations (e.g. sampling interval, fraction of rhythmic genes, number of outliers, detection limit of rhythmicity).
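To make the kind of simulation described above concrete, the following is a minimal Python sketch, not the authors' workflow and not the API of the “HarmonicRegression” R package. It simulates expression profiles over two days for a grid of sampling intervals and rhythmic fractions, and fits a generic first-harmonic model y = m + a·cos(ωt) + b·sin(ωt). All function names, the 24 h period, the mesor of 10, the amplitude of 1, and the noise level are illustrative assumptions; outliers, detection limits, and the test for non-rhythmicity are not modelled.

```python
import itertools
import math
import random

PERIOD_H = 24.0  # assumed circadian period in hours (illustrative)
OMEGA = 2.0 * math.pi / PERIOD_H


def simulate_gene(times, mesor, amplitude, phase_h, noise_sd, rng):
    """One simulated profile: mesor + cosine with peak at phase_h + Gaussian noise."""
    return [
        mesor + amplitude * math.cos(OMEGA * (t - phase_h)) + rng.gauss(0.0, noise_sd)
        for t in times
    ]


def simulate_dataset(n_genes, interval_h, rhythmic_fraction, noise_sd, seed):
    """Simulate n_genes profiles over 48 h; the first genes are the rhythmic ones."""
    rng = random.Random(seed)  # fixed seed so the data set can be regenerated
    times = [k * interval_h for k in range(int(48 / interval_h))]
    n_rhythmic = round(n_genes * rhythmic_fraction)
    profiles = []
    for g in range(n_genes):
        amplitude = 1.0 if g < n_rhythmic else 0.0  # hypothetical amplitude
        phase_h = rng.uniform(0.0, PERIOD_H)
        profiles.append(simulate_gene(times, 10.0, amplitude, phase_h, noise_sd, rng))
    return times, profiles


def fit_harmonic(times, values):
    """Fit y = m + a*cos(wt) + b*sin(wt); return (mesor, amplitude, phase in hours).

    The closed-form projections below equal the least-squares solution only
    for equally spaced samples that cover whole periods with more than two
    samples per period, as produced by simulate_dataset with 2 h or 4 h steps.
    """
    n = len(times)
    m = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(OMEGA * t) for t, y in zip(times, values))
    b = 2.0 / n * sum(y * math.sin(OMEGA * t) for t, y in zip(times, values))
    amplitude = math.hypot(a, b)
    phase_h = (math.atan2(b, a) / OMEGA) % PERIOD_H
    return m, amplitude, phase_h


if __name__ == "__main__":
    # Small parameter grid; the abstract reports 20,000 genes per data set.
    for interval_h, fraction in itertools.product([2, 4], [0.1, 0.5]):
        times, profiles = simulate_dataset(200, interval_h, fraction, 0.3, seed=1)
        amplitudes = [fit_harmonic(times, p)[1] for p in profiles]
        mean_amp = sum(amplitudes) / len(amplitudes)
        print(f"interval={interval_h} h, rhythmic fraction={fraction}: "
              f"mean fitted amplitude={mean_amp:.2f}")
```

Fixing the seed per parameter combination is one simple way to make each simulated data set regenerable, which is the property the workflow described here depends on.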

Methods: We describe and demonstrate the use of Jupyter notebooks to document, specify, and distribute our statistical method and its application to both simulated and experimental data sets. Jupyter notebooks combine text documentation with dynamically editable and executable code.

Results: Parameters and code can thus be modified dynamically, allowing both verification of results and instant experimentation. The notebook runs inside a Docker software container, which mirrors the original software environment and avoids the need to install any software.
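As a complement to the container, a notebook can also record the software environment it actually ran in, so that results can be checked against the intended setup. The following Python sketch is an illustration under assumptions, not part of the authors' notebook: the function name and the package names queried are hypothetical, and only standard-library calls are used.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return interpreter, platform, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Package names are examples only; list whatever the notebook imports.
    snapshot = environment_snapshot(["notebook", "numpy"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Storing such a snapshot next to the results gives a simple way to confirm that a rerun took place in the same environment as the original analysis.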

Conclusion: The Docker container and the Jupyter notebook will be available on GitHub, accompanying our paper, a preprint of which is available on bioRxiv. This framework ensures complete long-term reproducibility of the workflow.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.