gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Big Data Requires Big Caution – OMICS Data Are No Exception

Meeting Abstract

  • Jens Allmer - Hochschule Ruhr-West University of Applied Sciences, Mülheim an der Ruhr, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 145

doi: 10.3205/20gmds012, urn:nbn:de:0183-20gmds0128

Published: 26 February 2021

© 2021 Allmer.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Background: Changes in gene expression are a hallmark of disease. Transcripts can be regulated by transcription factors (proteins), and protein abundance can be regulated by microRNAs (miRNAs). These post-transcriptional regulators of gene expression are co-transcribed with genes or originate from their own loci. They act either by slicing the target mRNA or by repressing its translation. MicroRNAs can be measured in many ways, for example with microarrays or RNA-seq. Mature miRNAs (the key elements) are very short sequences (18-24 nt). Relative to the size of the human genome, several perfect matches can be statistically expected for sequences of length 18 or 19, while for longer sequences the expectation drops below one. At the same time, sequencing is not perfect, and base-calling accuracy varies with the method and sequencing approach. As if these two confounding factors were not enough, a large number of miRNAs exist throughout the phylogenetic tree, and tens of thousands are listed in miRBase.
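The two confounding factors named above, chance matches of short sequences and imperfect base calling, can be explored with a small back-of-the-envelope calculation. The following Python sketch is illustrative only and is not the author's computation. It assumes a uniform, independent base model, a genome size of roughly 3.1 billion bases searched on both strands, and a constant Phred quality score per base; the function names and all parameter values are hypothetical. Because real genomes are repetitive and compositionally skewed, and because the result depends strongly on the mismatch tolerance, the printed values need not reproduce the figures quoted in the abstract.

```python
from math import comb

# Assumption: approximate haploid human genome length in bases.
GENOME_SIZE = 3.1e9


def expected_chance_matches(length, mismatches=0,
                            genome_size=GENOME_SIZE, strands=2):
    """Expected number of genome positions that match a random sequence of
    the given length with at most `mismatches` substitutions, assuming
    uniform and independent bases (a strong simplification)."""
    # Number of distinct sequences within the allowed Hamming distance.
    neighbours = sum(comb(length, k) * 3 ** k for k in range(mismatches + 1))
    # Number of candidate start positions across the searched strands.
    positions = strands * (genome_size - length + 1)
    return positions * neighbours / 4 ** length


def error_free_probability(length, phred_quality):
    """Probability that every base of a read is called correctly, assuming
    independent errors and one constant Phred quality score."""
    per_base_error = 10 ** (-phred_quality / 10)
    return (1 - per_base_error) ** length


if __name__ == "__main__":
    print("nt  exact-match  <=1-mismatch  P(error-free, Q20)")
    for n in range(18, 25):
        print(n,
              round(expected_chance_matches(n), 4),
              round(expected_chance_matches(n, mismatches=1), 4),
              round(error_free_probability(n, 20), 4))
```

Comparing the exact-match column with the one-mismatch column shows how quickly tolerance for sequencing errors inflates the number of spurious hits for sequences as short as mature miRNAs.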

Methods: The challenges in a dataset, and how to explain or deal with them, are often obvious to its creators. However, once the data are deposited in online repositories such as the Sequence Read Archive at NCBI, they are available to anyone, and knowledge about the particular problems of the dataset usually cannot be captured exhaustively. If these problems are paired with ignorance of biological processes and the determination to uncover something novel, interesting claims arise. One such claim is that plant miRNAs regulate human gene expression. Nothing could be further from the truth. While pre-miRNAs are relatively stable and a few may bypass digestion, their abundance would be so low that only a few cells out of trillions could be affected.

Results: These claims follow from a lack of caution in the analysis of big data. Big data is defined by volume, variety, velocity, and veracity. Once variability and viability are added to the equation, the challenges mentioned above can be placed in the workflow. For example, the volume of available sequence data is tremendous (petabytes), and the veracity is rather low due to problems in methodology and sequencing accuracy. Hypotheses, if any, are not viable and are devoid of biological intuition. Unfortunately, once published, myths such as plant miRNAs regulating human gene expression are hard to curb.

Conclusion: The danger is that a lack of scientific rigor erodes the exceptional utility of OMICS data for health research. Therefore, big caution is needed when big data is analyzed and big scrutiny is needed from editors and reviewers when big data is used to support claims.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.