gms | German Medical Science

64th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS)

German Association for Medical Informatics, Biometry and Epidemiology (Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie)

8–11 September 2019, Dortmund

Kernel-based tests integrating variant effect predictions from deep learning for genetic association tests of rare variants

Meeting Abstract


  • Stefan Konigorski - Hasso Plattner Institute, Potsdam, Germany
  • Remo Monti - Hasso Plattner Institute, Potsdam, Germany; Max Delbrück Center for Molecular Medicine, Berlin, Germany
  • Christoph Lippert - Hasso Plattner Institute, Potsdam, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 322

doi: 10.3205/19gmds067, urn:nbn:de:0183-19gmds0677

Published: 6 September 2019

© 2019 Konigorski et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Genome-wide association studies aim to identify genetic variants that are associated with disease outcomes or complex traits; one of their main challenges is the low power of the statistical association tests. Several kernel-based tests have been established as powerful tools for this analysis [1], [2], [3], [4]. However, they still yield low power in association tests of rare genetic variants (i.e., variants with a low minor allele frequency in the population) [5], and how to optimally analyze rare variants remains an open research question. In a recent study, we proposed a rare-variant association test based on a generalized set of kernels that aims to increase statistical power by incorporating information from biological annotations and intermediate molecular traits [6].
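As a rough illustration of what a kernel-based set test computes, the following sketch evaluates a SKAT-style score statistic Q = r^T K r with a weighted linear kernel K = G W G^T, and checks that it equals the per-variant form sum_j w_j (sum_i g_ij r_i)^2. It is a minimal sketch, not the implementation used in [1] or [6]: the function names, the toy data, the weights, and the intercept-only null model are all assumptions made for illustration, and the p-value computation is omitted.

```python
def weighted_linear_kernel(genotypes, weights):
    """K[i][k] = sum_j w_j * g_ij * g_kj for an n x m genotype matrix."""
    return [
        [sum(w * a * b for w, a, b in zip(weights, row_i, row_k))
         for row_k in genotypes]
        for row_i in genotypes
    ]


def residuals(phenotype):
    """Residuals under an intercept-only null model (an assumption of this sketch)."""
    mean = sum(phenotype) / len(phenotype)
    return [y - mean for y in phenotype]


def quadratic_form(kernel, resid):
    """Q = r^T K r, the kernel form of the score statistic."""
    size = len(resid)
    return sum(
        resid[i] * kernel[i][k] * resid[k]
        for i in range(size)
        for k in range(size)
    )


def per_variant_form(genotypes, resid, weights):
    """The same Q written as sum_j w_j * (sum_i g_ij * r_i)^2."""
    total = 0.0
    for j, w in enumerate(weights):
        s_j = sum(row[j] * r for row, r in zip(genotypes, resid))
        total += w * s_j ** 2
    return total


# Hypothetical toy data: 6 individuals, 3 rare variants coded as minor-allele counts.
genotypes = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
phenotype = [1.2, 2.5, 0.9, 1.1, 2.8, 1.0]
weights = [1.0, 0.5, 2.0]  # hypothetical non-negative variant weights

resid = residuals(phenotype)
kernel = weighted_linear_kernel(genotypes, weights)
q_kernel = quadratic_form(kernel, resid)
q_variant = per_variant_form(genotypes, resid, weights)
print(q_kernel, q_variant)  # the two forms agree up to floating-point rounding
```

The per-variant form makes the role of the weights explicit: a variant with a larger weight contributes more to Q, which is where annotation-derived information can enter the test.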

Here, our goal was to apply and evaluate these new kernel-based tests in three studies. First, we investigated their empirical type I error and power in simulation studies, in comparison to the popular rare-variant tests SKAT and SKAT-O [1], [2], under scenarios that reflect different potential molecular mechanisms underlying the genotype-phenotype association. Second, we applied and evaluated the tests in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. In the available sample of n=556 individuals, we performed a genome-wide association study of more than 36 million variants from whole-genome sequencing data with nine Alzheimer's disease traits (biomarkers and the volume of brain regions from MRI scans), incorporating gene expression measures from RNA-sequencing experiments as intermediate molecular traits in the test. In the third and main application, we applied and evaluated the tests in the UK Biobank study and incorporated variant effect predictions on intermediate epigenetic traits from deep learning. For this, in a preliminary step, a deep neural network was trained to model the binding of transcription factors and other epigenetic regulatory events based on the DNA sequence. The model was then used to predict the variants' effects on epigenetic processes in their surrounding sequence context [7], [8]. Finally, we performed an association analysis of approximately 4 million genetic variants from exome-sequencing experiments with the metabolic traits BMI and type II diabetes in n=50,000 individuals of the UK Biobank, using the predicted regulatory effects of the genetic variants as weights in the kernel-based tests.
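To make the idea of using predicted regulatory effects as weights more concrete, the sketch below maps hypothetical per-variant effect predictions to non-negative kernel weights and tests the variant set with a permutation p-value. The abstract does not specify how the predictions enter the kernel or how p-values are obtained, so the squared-magnitude mapping in `effects_to_weights`, the permutation scheme (swapped in here for an analytic null distribution), the simulated data, and all names are assumptions for illustration only.

```python
import random


def effects_to_weights(predicted_effects):
    """Assumed mapping: squared magnitude of the predicted effect as the weight."""
    return [e * e for e in predicted_effects]


def score_statistic(genotypes, resid, weights):
    """Q = sum_j w_j * (sum_i g_ij * r_i)^2 for a weighted linear kernel."""
    total = 0.0
    for j, w in enumerate(weights):
        s_j = sum(row[j] * r for row, r in zip(genotypes, resid))
        total += w * s_j ** 2
    return total


def permutation_p_value(genotypes, resid, weights, n_perm=999, seed=0):
    """Permutation p-value: share of shuffled residual vectors with Q >= observed."""
    rng = random.Random(seed)
    observed = score_statistic(genotypes, resid, weights)
    shuffled = list(resid)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_statistic(genotypes, shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 50 individuals, 4 rare variants (minor-allele counts).
data_rng = random.Random(42)
n, maf = 50, 0.05
genotypes = [
    [(data_rng.random() < maf) + (data_rng.random() < maf) for _ in range(4)]
    for _ in range(n)
]
# Only the first variant affects the simulated phenotype.
phenotype = [data_rng.gauss(0.0, 1.0) + 1.0 * row[0] for row in genotypes]
mean = sum(phenotype) / n
resid = [y - mean for y in phenotype]

# Hypothetical deep-learning effect predictions, one signed score per variant.
predicted_effects = [0.9, 0.1, 0.0, -0.4]
weights = effects_to_weights(predicted_effects)

p_value = permutation_p_value(genotypes, resid, weights)
print(p_value)
```

With this mapping, a variant whose predicted effect is zero drops out of the statistic entirely, so the power of the test depends on how well the predictions single out the truly associated variants.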

The results of the simulation study indicate that the new kernel-based tests maintain a valid type I error rate and can improve the power of association tests compared to SKAT and SKAT-O if the genetic variants are associated with the incorporated intermediate molecular traits. The biological relevance of these synthetic analyses is supported by the results of both real-data applications, which indicate that the new kernels integrating domain knowledge can yield higher power in association tests of rare variants than existing tests and can identify novel candidate variants. The identification of novel disease-associated variants and the additional information gained from incorporating intermediate molecular traits can help to understand disease etiology and yield candidate targets for treatment interventions.
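Empirical type I error and power are typically estimated as rejection rates over repeated simulated data sets, without and with a true genotype effect. The sketch below shows that procedure for a simple set test with a permutation p-value. It does not reproduce the simulation design of this study: the sample size, minor allele frequency, effect size, flat weights, and all function names are assumptions chosen so that the example runs quickly.

```python
import random


def simulate_carriers(rng, n, m, maf):
    """Sparse genotypes: for each variant, a list of (individual, allele count)."""
    carriers = []
    for _ in range(m):
        column = []
        for i in range(n):
            dosage = (rng.random() < maf) + (rng.random() < maf)
            if dosage:
                column.append((i, dosage))
        carriers.append(column)
    return carriers


def score_statistic(carriers, resid, weights):
    """Q = sum_j w_j * (sum_i g_ij * r_i)^2, summing over carriers only."""
    return sum(
        w * sum(g * resid[i] for i, g in column) ** 2
        for w, column in zip(weights, carriers)
    )


def permutation_p_value(carriers, phenotype, weights, rng, n_perm=99):
    """Permutation p-value under an intercept-only null model."""
    mean = sum(phenotype) / len(phenotype)
    resid = [y - mean for y in phenotype]
    observed = score_statistic(carriers, resid, weights)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(resid)
        hits += score_statistic(carriers, resid, weights) >= observed
    return (hits + 1) / (n_perm + 1)


def rejection_rate(effect, n_rep=100, n=80, m=5, maf=0.05, alpha=0.05, seed=1):
    """Share of simulated data sets with p <= alpha for a per-allele effect."""
    rng = random.Random(seed)
    weights = [1.0] * m  # flat weights; annotation-based weights could go here
    rejections = 0
    for _ in range(n_rep):
        carriers = simulate_carriers(rng, n, m, maf)
        phenotype = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for column in carriers:
            for i, g in column:
                phenotype[i] += effect * g
        p_value = permutation_p_value(carriers, phenotype, weights, rng)
        rejections += p_value <= alpha
    return rejections / n_rep


type1_error = rejection_rate(effect=0.0)  # null scenario: no genotype effect
power = rejection_rate(effect=1.5)        # alternative: every variant is causal
print(type1_error, power)
```

With only 100 replicates, the estimated type I error fluctuates noticeably around the nominal level; a real evaluation would use far more replicates and several significance levels.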

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


[1] Wu MC, Lee S, Cai T, et al. Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics. 2011;89(1): 82–93.
[2] Lee S, Wu MC, Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics. 2012;13(4): 762–775.
[3] Listgarten J, Lippert C, Kang EY, et al. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 2013;29(12): 1526–1533.
[4] Lippert C, Xiang J, Horta D, et al. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 2014;30(22): 3206–3214.
[5] Konigorski S, Yilmaz YE, Pischon T. Comparison of single-marker and multi-marker tests in rare variant association studies of quantitative traits. PLoS One. 2017;12(5): e0178504.
[6] Konigorski S, Khorasani S, Lippert C. Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer's disease. 2018 Dec 2. arXiv preprint: 1812.00448.
[7] Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods. 2015;12(10): 931–934.
[8] Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Research. 2016;26(7): 990–999.