gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Testing methods to analyze small sample size GWAS with different settings

Meeting Abstract


  • Alicia Poplawski - Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI), Universitätsmedizin der Johannes-Gutenberg-Universität Mainz, Mainz, Germany
  • Konstantin Strauch - Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI), Universitätsmedizin der Johannes-Gutenberg-Universität Mainz, Mainz, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 446

doi: 10.3205/20gmds380, urn:nbn:de:0183-20gmds3800

Published: 26 February 2021

© 2021 Poplawski et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.



Text

Background: Because of small effect sizes and the huge number of tested single-nucleotide polymorphisms (SNPs), genome-wide association studies (GWAS) require very large sample sizes to identify genetic variants associated with a genetic disease. Often, however, only fairly small samples are available. To determine the best way to identify SNPs and indels when comparing different experimental conditions (for example, tumor vs. matched normal samples) with only a small sample size, different analytical tools were tested on simulated data. Power and false discovery rate were calculated at different threshold levels.
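As a minimal sketch of how power and false discovery rate could be evaluated at a given threshold, assume each simulated variant has a p-value and a known causal or non-causal label. The function name, argument names, and toy data below are illustrative assumptions and are not taken from the study. For a single simulated data set the second value is the observed false discovery proportion; averaging it over replicates would estimate the false discovery rate.

```python
def power_and_fdr(p_values, is_causal, threshold):
    """Return (power, false discovery proportion) for one p-value threshold.

    p_values  : per-variant p-values (hypothetical input)
    is_causal : booleans, True where the variant was simulated as causal
    threshold : variants with p-value <= threshold are called significant
    """
    called = [p <= threshold for p in p_values]
    true_pos = sum(c and t for c, t in zip(called, is_causal))
    false_pos = sum(c and not t for c, t in zip(called, is_causal))
    n_causal = sum(is_causal)
    n_called = true_pos + false_pos
    # Power: share of causal variants that were called.
    power = true_pos / n_causal if n_causal else float("nan")
    # False discovery proportion: share of calls that are not causal
    # (defined as 0 when nothing is called).
    fdr = false_pos / n_called if n_called else 0.0
    return power, fdr


# Toy data (made up): four variants, the first two simulated as causal.
p_values = [1e-9, 0.02, 3e-8, 0.6]
is_causal = [True, True, False, False]
for threshold in (5e-8, 0.05):
    print(threshold, power_and_fdr(p_values, is_causal, threshold))
```

Evaluating the same function over a grid of thresholds gives the trade-off between power and false discoveries that the comparison of methods relies on.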

Methods: Whole-genome sequencing data of germline DNA were simulated for leukemia patients and unaffected controls by resampling 1000 Genomes Project data using hapgen2 [1]. Known leukemia SNPs were simulated across a wide range of odds ratio (OR) values. Different methods, such as rvtests [2], logistic regression, and likelihood-based boosting [3], were employed to analyze the simulated data.
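To illustrate what a range of OR values means for the simulated data, the following sketch converts an allelic odds ratio into the risk-allele frequency expected in cases, given the frequency in controls. This is a generic textbook relation shown under the assumption of an allelic (per-allele) odds ratio; it is not a description of how hapgen2 is parameterized, and the function name and example frequency are hypothetical.

```python
def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio.

    Uses odds_cases = OR * odds_controls, where odds = p / (1 - p).
    """
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


# Assumed control frequency of 0.2, evaluated for several odds ratios.
for odds_ratio in (1.1, 1.5, 2.0, 5.0):
    print(odds_ratio, round(case_allele_frequency(0.2, odds_ratio), 3))
```

For odds ratios close to 1 the case and control frequencies differ only slightly, which is one way to see why small effects are hard to detect in small samples.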

Results: The ability to identify disease-causing variants at the SNP and gene level was assessed in terms of power and false discovery rate. The boosting algorithm used resampling, and SNPs were selected based on how often they were included in the model (inclusion frequency). Boosting results were compared across different inclusion-frequency thresholds, and results of the other methods across different p-value thresholds. Regardless of the selected threshold and of the OR, boosting failed to detect variants at the SNP level and also appears to be inferior to logistic regression for identification at the gene level. Results for rvtests are not yet available.
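The selection by inclusion frequency can be sketched as follows, assuming the boosting model has already been fitted on each resample and has returned the set of SNPs it included. The fitting step itself is omitted; the function names and SNP identifiers are placeholders, not the authors' implementation.

```python
from collections import Counter


def inclusion_frequencies(selected_per_resample):
    """Fraction of resamples in which each SNP entered the model."""
    n_resamples = len(selected_per_resample)
    counts = Counter(
        snp for selected in selected_per_resample for snp in set(selected)
    )
    return {snp: count / n_resamples for snp, count in counts.items()}


def select_by_frequency(frequencies, threshold):
    """SNPs whose inclusion frequency reaches the threshold, sorted by name."""
    return sorted(snp for snp, freq in frequencies.items() if freq >= threshold)


# Placeholder output of four hypothetical resampling runs.
resamples = [
    {"rs_a", "rs_b"},
    {"rs_a"},
    {"rs_a", "rs_c"},
    {"rs_b"},
]
freqs = inclusion_frequencies(resamples)
print(select_by_frequency(freqs, 0.5))
```

Raising the threshold keeps only SNPs that are selected consistently across resamples, which plays the same role for boosting as a stricter p-value threshold does for the other methods.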

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Su Z, Marchini J, Donnelly P. HAPGEN2: simulation of multiple disease SNPs. Bioinformatics. 2011 Aug 15;27(16):2304-5. DOI: 10.1093/bioinformatics/btr341
2. Zhan X, Hu Y, Li B, Abecasis GR, Liu DJ. RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics. 2016 May 1;32(9):1423-6. DOI: 10.1093/bioinformatics/btw079
3. Binder H, Binder MH. Package ‘GAMBoost’. 2015.