gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Quality control in genome-wide association studies revisited: a critical evaluation of the standard methods

Meeting Abstract


  • Hanna Brudermann - Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
  • Tanja K. Rausch - Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany; Department of Pediatrics, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
  • Inke R. König - Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 84

doi: 10.3205/20gmds245, urn:nbn:de:0183-20gmds2457

Published: February 26, 2021

© 2021 Brudermann et al.
This article is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.



Background: Genome-wide association studies (GWAS), which investigate the relationship between millions of genetic markers and a clinically relevant phenotype, were originally based on the common disease–common variant assumption [1] and thus aimed at identifying a small number of common genetic loci as causes of common diseases. In recent years, the focus of GWAS has shifted: the task is no longer only the discovery of common genetic loci, but also the discovery of loci with small effects through (mega-)meta-analyses, or the aggregation of genomic information into genetic risk prediction scores. Because even low error rates can distort association results, extensive and accurate quality control of the data is mandatory. However, after extensive discussions about standards for quality control in GWAS in the early years [2], little further work has addressed how to control data quality and how to adapt data cleaning to the new aims and sizes of GWAS.

Methods: The aim of this study was to perform an extensive literature review to evaluate the quality control criteria currently applied and their justification. Based on the findings of the literature search, a workflow was developed that includes justified quality control steps, such as checks of the proportion of missing data or of the mean heterozygosity, which are applied iteratively to the data. This workflow is then illustrated using a real data set.
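
The iterative application of quality control steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the genotype coding, the function names (`missing_rate`, `heterozygosity`, `iterative_qc`), the order of the filters and all default cutoffs are hypothetical choices, and principal component analysis and Hardy-Weinberg filtering are left out.

```python
from statistics import mean, pstdev


def missing_rate(calls):
    """Fraction of missing calls (None); 0.0 for an empty list."""
    return sum(c is None for c in calls) / len(calls) if calls else 0.0


def heterozygosity(calls):
    """Observed heterozygosity: share of non-missing calls coded 1."""
    called = [c for c in calls if c is not None]
    return sum(c == 1 for c in called) / len(called) if called else 0.0


def iterative_qc(genotypes, snp_missing_max=0.05, sample_missing_max=0.05,
                 het_sd_max=3.0):
    """Repeat SNP and sample filters until neither removes anything.

    genotypes: dict mapping sample ID -> list of calls per SNP, coded as the
    number of minor alleles (0, 1, 2) or None if missing; all lists share
    the same SNP order. The default cutoffs are hypothetical placeholders.
    Returns (kept sample IDs, kept SNP indices).
    """
    samples = list(genotypes)
    snps = list(range(len(genotypes[samples[0]]))) if samples else []
    while True:
        # SNP missing rate, evaluated on the samples still in the data.
        kept_snps = [j for j in snps
                     if missing_rate([genotypes[s][j] for s in samples])
                     <= snp_missing_max]
        # Sample missing rate, evaluated on the SNPs that survived.
        kept_samples = [s for s in samples
                        if missing_rate([genotypes[s][j] for j in kept_snps])
                        <= sample_missing_max]
        # Heterozygosity outliers: more than het_sd_max SDs from the mean.
        het = {s: heterozygosity([genotypes[s][j] for j in kept_snps])
               for s in kept_samples}
        if len(het) > 1:
            values = list(het.values())
            mu, sd = mean(values), pstdev(values)
            if sd > 0:
                kept_samples = [s for s in kept_samples
                                if abs(het[s] - mu) <= het_sd_max * sd]
        # Stop once a full pass changes nothing.
        if kept_snps == snps and kept_samples == samples:
            return samples, snps
        snps, samples = kept_snps, kept_samples


# Toy data: s3 has many missing calls; SNP 4 is missing only in s2.
toy = {
    "s1": [0, 1, 2, 0, 1],
    "s2": [0, 1, 2, 0, None],
    "s3": [None, None, None, 0, 1],
    "s4": [1, 1, 2, 0, 1],
}
print(iterative_qc(toy, snp_missing_max=0.3, sample_missing_max=0.3))
# → (['s1', 's2', 's4'], [0, 1, 2, 3])
```

In the toy run, SNP 4 passes the SNP filter in the first pass and is removed only in the second, after sample s3 has been dropped and the SNP's missing rate is recomputed over the remaining three samples. This order dependence is the reason such filters are repeated until the data no longer change rather than applied once.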

Results: Our results show that most published GWAS give no scientific reasons for the quality control steps they apply. The cutoffs for the most common quality measures are mostly not explained. In particular, principal component analysis and the test for deviation from Hardy-Weinberg equilibrium are frequently used as quality criteria without a careful analysis of the conditions of the given study and a corresponding adjustment of the quality control.

However, if these quality measures are applied without awareness of the threshold values that the given data require, the user risks losing relevant data and thus reduces the chance of finding an association. Conversely, careless use of different quality criteria may fail to identify erroneous data, which then biases the results in an avoidable way.
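
To make the threshold argument concrete, the following sketch implements a Pearson chi-square test for deviation from Hardy-Weinberg equilibrium at one biallelic SNP and applies a fixed cutoff to the same genotype proportions at different sample sizes. The function name `hwe_chisq`, the cutoff `HWE_P_CUTOFF = 1e-6` and the genotype proportions are hypothetical choices for illustration and are not taken from the abstract; the chi-square approximation is used here only for brevity.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_bb: observed counts of the two homozygous genotypes (AA, BB);
    n_ab: observed count of heterozygotes (AB), all at one biallelic SNP.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # AA, AB, BB under HWE
    if min(expected) == 0:
        raise ValueError("monomorphic SNP: test undefined")
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


HWE_P_CUTOFF = 1e-6  # hypothetical cutoff, for illustration only

# The same mild heterozygote deficit (genotype proportions 0.27 / 0.46 / 0.27
# instead of 0.25 / 0.50 / 0.25 expected at allele frequency 0.5),
# observed at increasing sample sizes.
for n in (1_000, 10_000, 100_000):
    counts = (round(0.27 * n), round(0.46 * n), round(0.27 * n))
    stat, pval = hwe_chisq(*counts)
    verdict = "removed" if pval < HWE_P_CUTOFF else "kept"
    print(f"n = {n:>7,}: chi2 = {stat:7.1f}, p = {pval:.1e} -> {verdict}")
```

For fixed genotype proportions the statistic grows linearly with the sample size, so the same deviation that is kept at n = 1,000 falls below the cutoff at n = 10,000 and above. This is one concrete way in which a suitable threshold depends on the data at hand. In practice, an exact test is often preferred for SNPs with low genotype counts, and in case-control data the test is often restricted to controls, since a deviation in cases can itself reflect a true association.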

Conclusion: Researchers still have to choose between universal and study-specific parameters, and therefore between optimal comparability with other analyses and optimal conditions within the specific study. They should keep in mind that a strict quality control, which removes all data with a high risk of bias, always carries the risk that the remaining data are too homogeneous to reveal small effects. The developed workflow can be used for a uniform execution of GWAS and provides a suitable basis for customizing the quality control if required. To allow a better evaluation of GWAS results in the future, the scientific rationale for each quality control step should always be reported.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Ziegler A, König IR, Pahlke F. A Statistical Approach to Genetic Epidemiology: Concepts and Applications, with an e-Learning Platform. 4th ed. Weinheim: John Wiley & Sons; 2012. Available from: http://gbv.eblib.com/patron/FullRecord.aspx?p=708055
2. Ziegler A, König IR, Thompson JR. Biostatistical aspects of genome-wide association studies. Biometrical Journal: Journal of Mathematical Methods in Biosciences. 2008;50(1):8-28. DOI: 10.1002/bimj.200710398