gms | German Medical Science

GMDS 2015: 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

6-9 September 2015, Krefeld

Systematically evaluating interfaces for RNA-seq analysis from a life scientist perspective

Meeting Abstract

  • Alicia Poplawski - University Medical Center Johannes Gutenberg University Mainz, Mainz, Germany
  • Federico Marini - University Medical Center Johannes Gutenberg University Mainz, Mainz, Germany
  • Moritz Hess - University Medical Center Johannes Gutenberg University Mainz, Mainz, Germany
  • Tanja Zeller - University Heart Center Hamburg-Eppendorf, Germany; German Center for Cardiovascular Research (DZHK e.V.), Partner Site Hamburg/Lübeck/Kiel, Hamburg, Germany
  • Johanna Mazur - University Medical Center Johannes Gutenberg University Mainz, Mainz, Germany
  • Harald Binder - University Medical Center Johannes Gutenberg University Mainz, Mainz, Germany

GMDS 2015. 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Krefeld, 06.-09.09.2015. Düsseldorf: German Medical Science GMS Publishing House; 2015. DocAbstr. 109

doi: 10.3205/15gmds123, urn:nbn:de:0183-15gmds1231

Published: 27 August 2015

© 2015 Poplawski et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Massively parallel RNA sequencing (RNA-seq) has become the instrument of choice for transcriptome analysis and a cornerstone of modern life science laboratories, but the data analysis is generally carried out by bioinformaticians. One reason is the huge amount of data generated: sequencing platforms can produce terabytes of data in a single run, and many datasets are often sequenced within a single project. Furthermore, a complete RNA-seq analysis requires many working steps. To test for differential expression, the sequenced short reads first need to be filtered and mapped to the genome. The next steps are quantification followed by a test for differential expression. Subsequently, a gene set enrichment analysis can be performed. For every step many tools are available, and each one needs input files in a specific format, so results from different programs and in different formats have to be combined. Extracting useful information from this massive amount of data therefore requires substantial computational skills and resources.
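The chain of working steps and the format hand-offs between them can be made concrete with a minimal Python sketch. The step names follow the paragraph above; the format labels (FASTQ, BAM, count table and so on) are typical choices assumed here for illustration, and the `Step` class and `format_mismatches` helper are hypothetical names, not part of any evaluated interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    consumes: str  # input format (assumed, typical choice)
    produces: str  # output format (assumed, typical choice)


# Hypothetical description of the analysis chain; real tools may use other formats.
PIPELINE = [
    Step("filter reads", "FASTQ", "FASTQ"),
    Step("map to genome", "FASTQ", "BAM"),
    Step("quantify", "BAM", "count table"),
    Step("test differential expression", "count table", "gene statistics"),
    Step("gene set enrichment", "gene statistics", "enriched gene sets"),
]


def format_mismatches(steps):
    """Return (producer, consumer) name pairs whose formats do not line up."""
    return [
        (a.name, b.name)
        for a, b in zip(steps, steps[1:])
        if a.produces != b.consumes
    ]


# An empty list means every step can read what the previous step wrote.
print(format_mismatches(PIPELINE))
```

Removing a step from `PIPELINE`, or swapping in a tool that writes a different format, makes the corresponding pair show up in the result. That is the kind of hand-off an integrated interface has to manage for the user.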

With sequencing costs being constantly reduced and sequencing speed and efficiency rising exponentially, it might become more important to shift at least some analysis steps from core facilities to life scientists, using standardized tools with easy-to-use interfaces.

We performed a systematic search and evaluation of such interfaces to investigate to what extent they can indeed facilitate RNA-seq data analysis, even for users without an extensive computer science background.

Material and methods: We performed a systematic search in PubMed using the keywords “RNA”, “seq”/“RNA-seq” and “pipeline”/“workflow”/“integrated solution”, and a complementary search using Google and Wikipedia. In consultation with biologists and bioinformaticians, we defined criteria for a detailed evaluation of the more widely used interfaces. The central criteria were ease of configuration, documentation, usability, computational demand, and reporting.

Results: We found a total of 29 open-source interfaces, of which 6 of the more widely used were evaluated in detail. The interfaces differ in the number of working steps covered. Only 13 interfaces allow for a complete analysis, while 20 integrate the mapping and quantification steps. Other interfaces focus only on quantification and differential expression analysis. At first glance, most of the evaluated interfaces make RNA-seq analyses easier to perform than manual execution of the analysis steps, but most require a moderate to considerable amount of time to become familiar with, and problems that arise cannot be solved without considerable effort or IT skills.

Discussion: No interface scored best on all criteria, indicating that the final choice will depend on the specific perspective of the user and the corresponding weighting of the criteria. Considerable technical hurdles had to be overcome in our evaluation. For many users, this will diminish the potential benefits compared to command-line tools, leaving room for future improvement of interfaces.
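How the weighting of criteria can change the preferred interface is illustrated by the following minimal sketch. The five criteria are those named in the methods. The interface names, the scores (1 = poor, 5 = good) and both weight profiles are invented purely for illustration and are not results of the evaluation.

```python
# Hypothetical scores on the five evaluation criteria (1 = poor, 5 = good).
# These numbers are illustrative placeholders, NOT data from the evaluation.
SCORES = {
    "interface_A": {"configuration": 5, "documentation": 4, "usability": 2,
                    "computational demand": 3, "reporting": 2},
    "interface_B": {"configuration": 2, "documentation": 3, "usability": 5,
                    "computational demand": 2, "reporting": 5},
}

# Hypothetical weight profiles for two kinds of users.
CORE_FACILITY = {"configuration": 3, "documentation": 1, "usability": 1,
                 "computational demand": 3, "reporting": 1}
LIFE_SCIENTIST = {"configuration": 1, "documentation": 2, "usability": 3,
                  "computational demand": 1, "reporting": 3}


def rank(scores, weights):
    """Order interface names by weighted mean score, best first."""
    total = sum(weights.values())
    weighted = {
        name: sum(weights[c] * s[c] for c in weights) / total
        for name, s in scores.items()
    }
    return sorted(weighted, key=weighted.get, reverse=True)


print(rank(SCORES, CORE_FACILITY))
print(rank(SCORES, LIFE_SCIENTIST))
```

With these made-up numbers neither interface is best on every criterion, and the ranking flips between the two weight profiles. This mirrors the point that the final choice depends on the user's perspective.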