Article
Reproducible and easy-to-use data analysis pipelines for biomedical research with singularity containers
Published: 6 September 2024
Text
Software containers such as Singularity are widely used tools that facilitate the reuse of software across different computing environments.
However, many biologists and other researchers find command line tools such as Singularity unfamiliar and do not feel productive when using software via the command line.
The aim of this work is to create a graphical user interface that can be used by biologists without programming experience. We evaluate the feasibility of our approach with a proof-of-concept implementation for the TRR156.
Singularity is a containerization technology that allows us to package complex multi-step bioinformatics workflows into single, immutable files that can be run like executables.
Singularity files contain not only the application logic, but also the necessary environment to run the application, eliminating dependency management to the greatest extent currently possible.
We containerize two software pipelines used in single-cell RNA-seq at TRR156 with Singularity.
Both pipelines take genomic sequencing data in fastq file format and quantify the number of occurrences of reads found in a reference genome or transcriptome.
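To make the input and output of such a quantification step concrete, the following is a toy sketch only. It parses four-line FASTQ records and counts reads that occur verbatim in a reference sequence. The real pipelines rely on proper alignment or mapping tools rather than exact substring matching, and the function names and example sequences here are hypothetical.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string, unused in this sketch
        yield header[1:].split()[0], sequence


def count_reads(handle, transcripts):
    """Count reads found verbatim in each reference transcript (toy matching)."""
    counts = Counter({name: 0 for name in transcripts})
    for _, sequence in read_fastq(handle):
        for name, reference in transcripts.items():
            if sequence in reference:
                counts[name] += 1
                break  # assign each read to the first matching transcript
    return counts


# Hypothetical toy data, not taken from the study.
transcripts = {"tx1": "ACGTACGTTTGACC", "tx2": "GGGCCCAAATTT"}
fastq = StringIO(
    "@r1\nACGTTTG\n+\nIIIIIII\n"
    "@r2\nCCCAAA\n+\nIIIIII\n"
    "@r3\nTTTTTTT\n+\nIIIIIII\n"
)
counts = count_reads(fastq, transcripts)
```

The resulting per-transcript counts correspond to one column of a count matrix; the third read matches neither reference and is left uncounted.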
We then create a graphical user interface to run these pipelines locally on a consumer Windows laptop.
We validate the correctness of our containerized adaptation of the wf-transcriptomes pipeline against the original developed by Oxford Nanopore, using a test dataset and a reference transcriptome.
We compare count matrices from six runs of the adaptation with six runs of the original and find the runs of the original to be all identical.
Taking the original's output as ground truth, we find a MARDS below 0.2 for the counts we consider reliable, i.e. those greater than 5. This result is consistent with previous findings on the reliability of RNA quantification.
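The comparison metric can be sketched as follows. This assumes the definition of the mean absolute relative difference that is common in the RNA-seq quantification literature, |t - e| / (t + e) averaged over entries, and it assumes that the reliability threshold is applied to the ground-truth count. The abstract does not spell out either detail, and the example values are made up.

```python
def mard(truth, estimate, min_count=5):
    """Mean absolute relative difference over entries whose truth count exceeds min_count."""
    diffs = []
    for t, e in zip(truth, estimate):
        if t <= min_count:
            continue  # skip counts considered unreliable (assumed: filter on truth)
        diffs.append(abs(t - e) / (t + e))
    return sum(diffs) / len(diffs) if diffs else 0.0


# Hypothetical counts for four transcripts; the third is below the threshold.
truth = [10, 20, 3, 100]
estimate = [12, 20, 9, 90]
score = mard(truth, estimate)
```

With this definition each term lies between 0 and 1, so a value below 0.2 indicates that the estimated counts stay close to the reference counts on average.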
The other pipeline is a containerization of the simpleaf wrapper on top of alevin-fry and can be assumed to be correct, since no changes to the software were needed. Three runs on a full-size dataset result in identical quantification matrices.
We re-implement the full reproducibility feature for the wf-transcriptomes pipeline by masking /dev/random and /dev/urandom inside the container with a pseudo-random number generator.
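One conceivable way to approximate this masking is sketched below; it is not necessarily how the authors implemented it. The sketch expands a fixed seed into a reproducible byte stream and builds a `singularity exec` call that bind-mounts a file holding those bytes over both devices. The file path, image name and function names are hypothetical, and a regular file, unlike a real device, is exhausted once all of its bytes have been read.

```python
import hashlib


def deterministic_bytes(seed: bytes, n_bytes: int) -> bytes:
    """Expand a seed into n_bytes of reproducible data (SHA-256 in counter mode)."""
    n_blocks = -(-n_bytes // 32)  # ceiling division, 32 bytes per digest
    blocks = [
        hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        for counter in range(n_blocks)
    ]
    return b"".join(blocks)[:n_bytes]


def masked_exec_command(image, entropy_file, command):
    """Build a `singularity exec` call that binds entropy_file over both devices."""
    return [
        "singularity", "exec",
        "--bind", f"{entropy_file}:/dev/random",
        "--bind", f"{entropy_file}:/dev/urandom",
        image, *command,
    ]


# Hypothetical names; the command is only constructed here, not executed.
stream = deterministic_bytes(b"fixed-seed", 100)
argv = masked_exec_command("pipeline.sif", "/tmp/entropy.bin", ["run-pipeline"])
```

Because the byte stream depends only on the seed, every run that reads from the masked devices sees the same values, which is the property needed for identical outputs.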
Six local runs and six runs on the HPC cluster bwForClusterHelix result in identical count matrices.
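A check of this kind can be sketched by comparing file digests, assuming each run writes its count matrix to a file. The file names and contents below are hypothetical. Byte-level identity is a stricter criterion than numerical equality, since differences in row order or formatting would also be flagged.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def all_identical(paths):
    """True if every file in paths has the same digest."""
    return len({file_digest(p) for p in paths}) <= 1


# Demonstration with made-up matrices written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    runs = []
    for i in range(3):
        path = Path(tmp) / f"run{i}_counts.tsv"
        path.write_text("gene\tcount\nG1\t12\nG2\t0\n")
        runs.append(path)
    identical = all_identical(runs)

    # Changing a single count in one run breaks the identity.
    runs[2].write_text("gene\tcount\nG1\t13\nG2\t0\n")
    changed = all_identical(runs)
```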
A GUI that allows a user to select and start locally installed Singularity containers was implemented in Rust using the Dioxus fullstack library. It runs as a hybrid application with a web-based frontend.
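The core logic behind such a launcher can be sketched independently of the GUI framework. The following Python sketch is not the authors' Rust implementation. It assumes that containers are stored as `.sif` image files in a single directory, and the function names are hypothetical.

```python
from pathlib import Path


def list_containers(directory):
    """Return the Singularity image files (*.sif) found in a directory, sorted by name."""
    return sorted(p for p in Path(directory).glob("*.sif") if p.is_file())


def run_command(image, args=()):
    """Build the argument list for starting a container with `singularity run`."""
    return ["singularity", "run", str(image), *args]
```

A frontend would present the result of `list_containers` as a selection list and pass the command for the chosen image to a process launcher such as `subprocess.run`.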
Singularity containers enable research collaboration at the technical level simply by sharing a file, without any additional installations. They can be published on services such as Figshare, Zenodo or heiData/heiArchive together with all other citable assets of the same publication. They largely eliminate dependency management and contain their own definition file describing how they were built. In this way, choosing Singularity increases both software interoperability and reusability.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533(7604):452–4. DOI: 10.1038/533452a
2. Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017;12(5):e0177459. DOI: 10.1371/journal.pone.0177459
3. Oxford Nanopore Technologies Plc: epi2me-labs/wf-transcriptomes. GitHub. 2024 [cited 2024 Apr 26]. Available from: https://github.com/epi2me-labs/wf-transcriptomes
4. He D, Zakeri M, Sarkar H, Soneson C, Srivastava A, Patro R. Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods. 2022;19(3):316–22. DOI: 10.1038/s41592-022-01408-3
5. Zhang C, Zhang B, Lin L-L, Zhao S. Evaluation and comparison of computational tools for RNA-seq isoform quantification. BMC Genomics. 2017;18(1). DOI: 10.1186/s12864-017-4002-1