gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Reproducible bioinformatics workflows: A case study with software containers and interactive notebooks

Meeting Abstract

  • Anja Eggert - Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany
  • Pål Olof Westermark - Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 374

doi: 10.3205/20gmds376, urn:nbn:de:0183-20gmds3763

Published: 26 February 2021

© 2021 Eggert et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License (attribution). For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Background: Specifying bioinformatics workflows reproducibly is challenging because of their complexity. We developed a new statistical method in the field of circadian rhythmicity that makes it possible to rigorously determine whether measured quantities, such as gene expression levels, are not rhythmic. The statistical method itself is implemented in the R package “HarmonicRegression”, available from the CRAN repository. However, the bioinformatics workflow is much larger than the statistical test. For instance, to verify the applicability and validity of the statistical method, we simulated data sets of 20,000 gene expression profiles over two days, covering a wide range of parameter combinations (e.g. sampling interval, fraction of rhythmic genes, number of outliers, and detection limit of rhythmicity).
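The kind of simulation study described above can be sketched in a few lines. The following is a minimal, stdlib-only Python illustration; it is not the authors' code, not the API of the “HarmonicRegression” R package, and not their non-rhythmicity test. It fits the standard harmonic (cosinor) regression model y(t) = mu + a·cos(ωt) + b·sin(ωt), with an assumed period of 24 h, by ordinary least squares, and sweeps a small hypothetical grid of sampling interval, fraction of rhythmic genes and outlier probability over two days. The generative model (cosine plus Gaussian noise with additive outliers), the grid values and the default gene count are assumptions chosen for illustration.

```python
import math
import random
from itertools import product

PERIOD_H = 24.0  # assumed circadian period in hours
OMEGA = 2 * math.pi / PERIOD_H


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def harmonic_fit(times, values):
    """Least-squares fit of y = mu + a*cos(wt) + b*sin(wt); returns (mu, a, b, amplitude)."""
    rows = [(1.0, math.cos(OMEGA * t), math.sin(OMEGA * t)) for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mu, a, b = solve_linear(xtx, xty)
    return mu, a, b, math.hypot(a, b)


def simulate_profile(times, rhythmic, rng, amplitude=1.0, noise_sd=0.5, outlier_prob=0.0):
    """Simulate one log-expression profile under a hypothetical cosine-plus-noise model."""
    phase = rng.uniform(0.0, 2 * math.pi)
    profile = []
    for t in times:
        mean = 8.0 + (amplitude * math.cos(OMEGA * t - phase) if rhythmic else 0.0)
        y = rng.gauss(mean, noise_sd)
        if rng.random() < outlier_prob:
            y += rng.choice((-1.0, 1.0)) * 5 * noise_sd  # crude additive outlier
        profile.append(y)
    return profile


def simulate_grid(n_genes=200, seed=1):
    """Sweep a small hypothetical parameter grid; return mean fitted amplitudes per condition."""
    rng = random.Random(seed)
    summary = {}
    # Grid axes: sampling interval (h), fraction of rhythmic genes, outlier probability
    for dt, frac, p_out in product((2.0, 4.0), (0.1, 0.5), (0.0, 0.05)):
        times = [k * dt for k in range(int(48 / dt))]  # two days of sampling
        amps = {True: [], False: []}
        for _ in range(n_genes):
            rhythmic = rng.random() < frac
            y = simulate_profile(times, rhythmic, rng, outlier_prob=p_out)
            amps[rhythmic].append(harmonic_fit(times, y)[3])
        # Mean fitted amplitude for rhythmic (True) and non-rhythmic (False) genes
        summary[(dt, frac, p_out)] = {
            k: (sum(v) / len(v) if v else float("nan")) for k, v in amps.items()
        }
    return summary
```

A full study at the scale described (20,000 genes per condition) would normally use vectorized code or the R package itself; the explicit loops here only make the structure of the parameter sweep visible.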

Methods: We describe and demonstrate the use of Jupyter notebooks to document, specify, and distribute our statistical method and its application to both simulated and experimental data sets. Jupyter notebooks combine text documentation with dynamically editable and executable code.
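A Jupyter notebook is stored as a plain JSON document (nbformat version 4) in which Markdown cells carry the text documentation and code cells carry the executable analysis. The sketch below, in stdlib-only Python, assembles such a document for an R kernel. The kernel name "ir" is the one registered by IRkernel; the cell contents, parameter names and default values are hypothetical and are not taken from the authors' notebook.

```python
import json


def make_notebook(cells, kernel="ir", display_name="R", language="R"):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        if cell_type == "markdown":
            nb_cells.append({"cell_type": "markdown", "metadata": {}, "source": source})
        elif cell_type == "code":
            # Code cells start unexecuted: no outputs and no execution count yet.
            nb_cells.append({"cell_type": "code", "execution_count": None,
                             "metadata": {}, "outputs": [], "source": source})
        else:
            raise ValueError(f"unsupported cell type: {cell_type!r}")
    return {
        "cells": nb_cells,
        "metadata": {"kernelspec": {"name": kernel, "display_name": display_name,
                                    "language": language}},
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require per-cell ids
    }


# Hypothetical example: a documentation cell followed by an editable parameter cell.
notebook = make_notebook([
    ("markdown", "## Simulation parameters\nEdit the values below and re-run the notebook."),
    ("code", "library(HarmonicRegression)\n"
             "sampling_interval <- 2  # hours (hypothetical default)\n"
             "n_genes <- 20000"),
])
notebook_json = json.dumps(notebook, indent=1)
```

Saving `notebook_json` to a file with the `.ipynb` extension lets Jupyter open it; a reader can then change the parameter cell and re-execute the analysis, which is the interactive modification the abstract describes.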

Results: Parameters and code can thus be modified interactively, allowing both verification of the results and instant experimentation. The notebook runs inside a Docker software container that mirrors the original software environment, so that no software other than Docker itself needs to be installed.
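One way such a container could be described is a short Dockerfile built on a Jupyter Docker Stacks image. The Python sketch below renders that file as text to keep the examples in one language. The base image family `jupyter/r-notebook` and its default user `jovyan` come from the Jupyter Docker Stacks project; the tag `PINNED_TAG`, the notebook filename and the helper name `render_dockerfile` are hypothetical placeholders, and the authors' actual container may be set up differently.

```python
def render_dockerfile(base_image, cran_packages, notebook="workflow.ipynb",
                      cran_mirror="https://cloud.r-project.org"):
    """Return Dockerfile text for a notebook image (hypothetical layout)."""
    # Require an explicit tag or digest so rebuilds start from the same base image.
    if ":" not in base_image.rsplit("/", 1)[-1] or base_image.endswith(":latest"):
        raise ValueError("pin the base image to an explicit tag, not 'latest'")
    lines = [f"FROM {base_image}"]
    if cran_packages:
        pkgs = ", ".join(f"'{p}'" for p in cran_packages)
        # Installs the current CRAN versions at build time; the built image freezes them.
        lines.append(f"RUN R -e \"install.packages(c({pkgs}), repos = '{cran_mirror}')\"")
    # jovyan is the default non-root user in the Jupyter Docker Stacks images.
    lines.append(f"COPY --chown=jovyan:users {notebook} /home/jovyan/work/")
    return "\n".join(lines) + "\n"


# PINNED_TAG is a placeholder for a concrete, dated image tag.
example = render_dockerfile("jupyter/r-notebook:PINNED_TAG", ["HarmonicRegression"])
```

After writing `example` to a file named `Dockerfile`, the image could be built and started with `docker build -t harmonic-notebook .` and `docker run -p 8888:8888 harmonic-notebook` (the image name is hypothetical). Because package versions are resolved at build time, long-term reproducibility rests on publishing the built image itself rather than only the Dockerfile.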

Conclusion: The Docker container and the Jupyter notebook will be made available on GitHub alongside our paper, a preprint of which is available on bioRxiv. This framework ensures complete long-term reproducibility of the workflow.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.