gms | German Medical Science

GMDS 2012: 57. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

16. - 20.09.2012, Braunschweig

Comparison of methods aiming to detect causal genes in datasets including rare variants

Meeting Abstract


  • Holger Kirsten - Universität Leipzig / LIFE, Leipzig, Deutschland
  • Markus Scholz - Universität Leipzig / LIFE, Leipzig, Deutschland

GMDS 2012. 57. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Braunschweig, 16.-20.09.2012. Düsseldorf: German Medical Science GMS Publishing House; 2012. Doc12gmds162

DOI: 10.3205/12gmds162, URN: urn:nbn:de:0183-12gmds1628

Published: September 13, 2012

© 2012 Kirsten et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.



Background: Rare causal variants are believed to account for a significant part of the observed gap between the heritability estimates of common diseases or quantitative traits and the variance explained by the common genetic variants discovered so far. Next-generation sequencing methods allow rare variants to be identified in a reasonable number of individuals, and these data can be analyzed to screen for disease variants or genetic modifiers of traits. Statistical methods to detect such variants are required, and they should be appropriately compared and characterized.

Method: The publicly available GAW17 dataset is based on a preliminary 1000 Genomes Project dataset of 697 individuals, combined with a simulated complex disease model comprising intermediate quantitative phenotypes. This dataset was used to assess and compare strategies for selecting candidate loci. Our approaches are either genome-wide, considering all markers simultaneously, or gene-centric, i.e. they aim to select candidate genes rather than markers. For this purpose, we analyse and compare a number of univariate and multivariate methods, including marginal correlation, the Hotelling test, a combination of both, LASSO, Boosting, the correlation-adjusted t-score (CAT score) and the correlation-adjusted marginal correlation (CAR score). Methods are evaluated on the basis of top-gene lists for three different phenotypes, including both categorically and continuously distributed traits.
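To make the CAR score concrete: following Zuber and Strimmer, CAR scores are ω = P^{-1/2} r, where P is the correlation matrix among the markers and r is the vector of marginal marker-phenotype correlations. For the plain empirical estimate of P, the squared CAR scores sum to the R² of the ordinary least-squares fit. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a complete, numerically coded genotype matrix without monomorphic markers. The function name `car_scores` and its fixed `shrinkage` weight are hypothetical. Zuber and Strimmer use a shrinkage estimate of P in high-dimensional settings, whereas here the shrinkage weight is simply set by the user.

```python
import numpy as np


def car_scores(X, y, shrinkage=0.0):
    """Sketch of CAR scores, omega = P^{-1/2} r (hypothetical helper).

    X: (n, p) marker matrix, e.g. allele dosages 0/1/2, with no monomorphic
       columns (a zero standard deviation would cause division by zero).
    y: (n,) phenotype values.
    shrinkage: hypothetical fixed weight in [0, 1] that pulls the marker
       correlation matrix towards the identity; it must be > 0 whenever the
       empirical matrix is singular (e.g. more markers than individuals).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Standardize so that scaled cross-products are correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    P = (Xs.T @ Xs) / n                      # marker-marker correlations
    P = (1.0 - shrinkage) * P + shrinkage * np.eye(p)
    r = (Xs.T @ ys) / n                      # marginal marker-phenotype correlations
    # Inverse symmetric square root of P from its eigendecomposition.
    eigval, eigvec = np.linalg.eigh(P)
    P_inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return P_inv_sqrt @ r


# Toy data (not GAW17): marker 0 is causal, marker 1 is correlated with it.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(300, 4)).astype(float)
X[:, 1] = np.clip(X[:, 0] + rng.binomial(1, 0.2, size=300), 0, 2)
y = 0.8 * X[:, 0] + rng.normal(size=300)
print("CAR scores:           ", np.round(car_scores(X, y), 3))
print("marginal correlations:", np.round(car_scores(X, y, shrinkage=1.0), 3))
```

Setting `shrinkage=1.0` replaces P by the identity, so the scores reduce to the marginal correlations. The decorrelation by P^{-1/2} is therefore the only difference from the marginal correlation ranking that is also compared here. In the toy data, this decorrelation is what separates the causal marker from its correlated neighbour.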

Results and Discussion: We detect clear differences between the methods. A detailed analysis of the characteristics of the causal genes reveals the conditions under which particular methods perform well. For example, in the gene-wise analysis, the marginal statistic was superior when a gene contains a single causal marker with a dominating effect and a relatively liberal cut-off of the gene list is used, whereas the Hotelling test was superior when a gene contains several independent causal markers and a stringent cut-off of the gene list is used. Interestingly, in the gene-wise analysis, more elaborate regression methods (LASSO, Boosting, and CAT/CAR scores) did not generally perform better than these statistics. We discuss recommendations for applying the methods to screen for disease variants or genetic modifiers of quantitative traits.
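A small simulation can illustrate the contrast between a marginal gene statistic and the Hotelling test. The Python sketch below assumes a case-control phenotype and represents a gene by its genotype columns. It defines the marginal gene statistic as the largest absolute pooled two-sample t statistic over the gene's markers, and applies the two-sample Hotelling T² test to all markers jointly. These definitions, the helper names (`pooled_t`, `marginal_gene_score`, `hotelling_gene_test`, `simulate`) and the simulation settings are hypothetical. They are not taken from the study, and the sketch does not reproduce its GAW17 results.

```python
import numpy as np


def pooled_t(x1, x2):
    """Pooled-variance two-sample t statistic for a single marker."""
    n1, n2 = len(x1), len(x2)
    s2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return (x1.mean() - x2.mean()) / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))


def marginal_gene_score(G, case):
    """Assumed marginal gene statistic: largest |t| over the gene's markers."""
    G = np.asarray(G, dtype=float)
    case = np.asarray(case, dtype=bool)
    return max(abs(pooled_t(G[case, j], G[~case, j])) for j in range(G.shape[1]))


def hotelling_gene_test(G, case):
    """Two-sample Hotelling T^2 test over all markers of one gene.

    Returns (T2, F, df1, df2); under H0 and multivariate normality,
    F follows an F(df1, df2) distribution.
    """
    G = np.asarray(G, dtype=float)
    case = np.asarray(case, dtype=bool)
    A, B = G[case], G[~case]
    n1, n2, p = len(A), len(B), G.shape[1]
    diff = A.mean(axis=0) - B.mean(axis=0)
    # Pooled within-group covariance; atleast_2d handles single-marker genes.
    S = np.atleast_2d(((n1 - 1) * np.cov(A, rowvar=False)
                       + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2))
    t2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(S, diff)
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, p, n1 + n2 - p - 1


def simulate(effects, n=400, maf=0.05, seed=0):
    """Hypothetical toy gene: additive liability, top half of liability = cases."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, maf, size=(n, len(effects))).astype(float)
    liability = G @ np.asarray(effects, dtype=float) + rng.normal(size=n)
    return G, liability > np.median(liability)


for label, effects in [("single dominant marker", [1.5, 0, 0, 0, 0, 0]),
                       ("several independent markers", [0.5] * 6)]:
    G, case = simulate(effects)
    t2, f, df1, df2 = hotelling_gene_test(G, case)
    print(f"{label}: max|t| = {marginal_gene_score(G, case):.2f}, "
          f"Hotelling F = {f:.2f} on ({df1}, {df2}) df")
```

The two scores are on different scales. Ranking genes with either statistic, or comparing them across genes with different numbers of markers, would require converting them to p-values, for example the F statistic to its F(df1, df2) tail probability. The Hotelling test also needs a non-singular pooled covariance matrix, which can fail for very rare or collinear variants.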

