gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Integrative analysis of RNA-Seq and DNA-Seq data via boosting

Meeting Abstract

  • Alicia Poplawski - Institut für Medizinische Biometrie, Epidemiologie und Informatik, Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz, Germany
  • Harald Binder - Universitätsklinikum Freiburg, Freiburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 265

doi: 10.3205/19gmds063, urn:nbn:de:0183-19gmds0633

Published: September 6, 2019

© 2019 Poplawski et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at



Introduction and question: Identifying SNPs and InDels by comparing tumor and normal samples often suffers from low power and a high false positive rate. One way to address this problem is to increase the sample size, but genome sequencing is expensive and the sample size is often limited, especially for rare diseases. Another is to integrate data from other molecular levels. Assuming that a mutation at the DNA level leads to a modified transcription level, RNA-Seq and DNA-Seq data can be combined.

Material and methods: We present a boosting approach for performing such a combined analysis. In a first step, RNA-Seq data are analyzed to identify differentially expressed genes. The reciprocals of the resulting p-values are then used as weights in a likelihood-based boosting algorithm to identify SNPs and InDels in DNA-Seq data. The same weight is used for every SNP within a gene and within 200 kb upstream and downstream of the gene body, so that regulatory regions are included. The boosting algorithm uses resampling techniques, and SNPs/InDels are selected based on their inclusion frequencies. This approach was developed on simulated RNA-Seq and DNA-Seq data for tumor and control samples.
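
The weighting and selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (snp_weights, boost_logistic, inclusion_frequencies), the logistic tumor-versus-control model, the rule that a weight w divides the penalty of a SNP, the subsampling of half the samples, and all default values are assumptions made for illustration only. The abstract states only that reciprocal p-values serve as weights, that the window is 200 kb around the gene body, and that selection relies on inclusion frequencies.

```python
import numpy as np

FLANK = 200_000  # 200 kb up- and downstream of the gene body, as in the abstract


def snp_weights(snp_pos, genes, flank=FLANK):
    """Give each SNP the weight 1/p of a gene whose flanked body covers it.

    genes: iterable of (start, end, p_value) on one chromosome (illustrative layout).
    Assumptions of this sketch: SNPs outside every window keep weight 1,
    overlapping windows use the largest weight, and p-values are floored
    at 1e-12 to avoid division by zero.
    """
    snp_pos = np.asarray(snp_pos)
    weights = np.ones(len(snp_pos))
    for start, end, p_value in genes:
        covered = (snp_pos >= start - flank) & (snp_pos <= end + flank)
        weights[covered] = np.maximum(weights[covered], 1.0 / max(p_value, 1e-12))
    return weights


def boost_logistic(X, y, weights, steps=5, penalty=10.0):
    """Componentwise likelihood-based boosting for a logistic model.

    Assumption: a weight w_j reduces the penalty of SNP j to penalty / w_j,
    so SNPs near differentially expressed genes are selected more easily.
    The intercept is held fixed for brevity.
    """
    Xc = X - X.mean(axis=0)                      # center genotype columns
    pen = penalty / np.asarray(weights, dtype=float)
    beta = np.zeros(X.shape[1])
    y_bar = np.clip(np.mean(y), 1e-6, 1 - 1e-6)
    intercept = np.log(y_bar / (1.0 - y_bar))
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(intercept + Xc @ beta)))
        score = Xc.T @ (y - prob)                # score per SNP
        info = (Xc ** 2).T @ (prob * (1.0 - prob))  # Fisher information per SNP
        j = np.argmax(score ** 2 / (info + pen))    # penalised score statistic
        beta[j] += score[j] / (info[j] + pen[j])    # one penalised Newton step
    return beta


def inclusion_frequencies(X, y, weights, n_resamples=20, seed=0, **kwargs):
    """Fraction of subsamples (half the samples, drawn without replacement)
    in which each SNP receives a non-zero coefficient."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(X.shape[1])
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += boost_logistic(X[idx], y[idx], weights, **kwargs) != 0
    return counts / n_resamples
```

In this sketch, X would be a samples-by-SNPs genotype matrix (for example, minor-allele counts), y a 0/1 tumor indicator, and SNPs would be reported when their inclusion frequency exceeds a chosen threshold. Because a small weight only increases the penalty rather than removing the SNP, a mutation without an expression effect can still be selected if its association with tumor status is strong enough.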

Results: In the simulated data, the integrated analysis of DNA-Seq and RNA-Seq data increased the power and reduced the false positive rate for identifying SNPs and InDels, compared with an analysis of DNA-Seq data alone.

Discussion: A mutation at the DNA level does not necessarily lead to a changed transcription level; it may affect only the protein level. Such mutations are down-weighted by the proposed boosting approach but can still potentially be identified, in contrast to approaches that filter mutations based on differentially expressed genes.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.