gms | German Medical Science

MAINZ//2011: 56th GMDS Annual Meeting and 6th DGEpi Annual Meeting

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V.
Deutsche Gesellschaft für Epidemiologie e. V.

26-29 September 2011 in Mainz

Global two sample summary statistics for (high-dimensional) categorical profiles with application to ICF Core Sets

Meeting Abstract


  • Monika Jelizarow - Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie LMU, München
  • Ulrich Mansmann - Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie LMU, München

Mainz//2011. 56. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 6. Jahrestagung der Deutschen Gesellschaft für Epidemiologie (DGEpi). Mainz, 26.-29.09.2011. Düsseldorf: German Medical Science GMS Publishing House; 2011. Doc11gmds097

doi: 10.3205/11gmds097, urn:nbn:de:0183-11gmds0972

Published: 20 September 2011

© 2011 Jelizarow et al.
This is an Open Access article distributed under the terms of the Creative Commons license. It may be reproduced, distributed and made publicly available, provided that the author and source are credited.



Over the past decade, a plethora of methods has been developed to detect global effects in high-dimensional groups of metric variables such as gene sets, but less effort has been invested in the case where the variables of interest are categorical. For metric variables, the construction of a global test statistic as the sum of (transformed) univariate test statistics has been discussed at length; see Ackermann and Strimmer [1] for an extensive review. The rationale behind these so-called summary statistics is to construct global tests in a straightforward way that both handles arbitrary dimension and exploits the wide range of well-investigated univariate tests, each of which assesses a marginal effect. By computing permutation-based p-values, global tests obtained through this approach can account for possible correlations among the univariate statistics that make up the summary statistic. A first model-based attempt at two-sample summary statistics for binary variables was made by Mansmann and Meister [2], who employ a deviance summary statistic. In our work, we propose further summary statistics for categorical (i.e. nominal and ordinal) variables and thus extend the range of application of summary statistics from metric to arbitrary scales of measurement. Unlike Mansmann and Meister [2], we do not pursue model-based approaches but focus on established two-sample tests such as the chi-square test for homogeneity. We examine the properties of these summary statistics by means of simulated data and demonstrate their potential usefulness for analyzing International Classification of Functioning, Disability and Health (ICF) Core Sets, where many ordinal covariates are usually collected [3]. Beyond this, single nucleotide polymorphisms arising in genome-wide association studies are another relevant field of application for the approach we present.
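To make the construction concrete, the following is a minimal sketch of one such summary statistic: the sum of per-variable chi-square statistics for homogeneity, with a permutation-based p-value. It is an illustration under stated assumptions, not the authors' implementation: the function names, the unweighted sum, the number of permutations and the toy data are all hypothetical. Group labels are permuted while each subject's profile stays intact, which is what lets the permutation distribution reflect correlations between variables.

```python
import random
from collections import Counter


def chi_square_stat(values, labels):
    """Pearson chi-square statistic for homogeneity of one categorical
    variable across the groups given in `labels`."""
    n = len(values)
    cell = Counter(zip(labels, values))   # observed counts per (group, category)
    row = Counter(labels)                 # group sizes
    col = Counter(values)                 # category totals
    stat = 0.0
    for g in row:
        for c in col:
            expected = row[g] * col[c] / n
            stat += (cell[(g, c)] - expected) ** 2 / expected
    return stat


def summary_stat(profiles, labels):
    """Sum of the univariate chi-square statistics over all variables.
    `profiles` is a list of subjects, each a list of categorical values."""
    return sum(chi_square_stat(column, labels) for column in zip(*profiles))


def permutation_p_value(profiles, labels, n_perm=999, seed=0):
    """Permutation p-value for the summary statistic. Only the group labels
    are shuffled; whole profiles are kept together."""
    rng = random.Random(seed)
    observed = summary_stat(profiles, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if summary_stat(profiles, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Made-up toy data: 8 subjects, 2 categorical variables, 2 groups.
toy_profiles = [[0, 1], [0, 1], [0, 1], [0, 1],
                [1, 0], [1, 0], [1, 0], [1, 0]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_stat, toy_p = permutation_p_value(toy_profiles, toy_labels)
```

Other univariate two-sample statistics (for example, one that respects the ordering of ordinal categories) could be substituted for `chi_square_stat` without changing the permutation step.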
For hierarchically structured data such as the ICF Core Sets, we illustrate how the proposed summary statistics might be used to reveal significant subsets according to the hierarchical testing approach introduced by Meinshausen [4]. Our analysis of ICF Core Sets for stroke patients [5] suggests strong differences between the functional profiles of patients whose stroke is located in the right versus the left brain hemisphere. Applying the hierarchical testing procedure allowed us to assign significant differences to subsets concerning specific mental functions, the ability to apply knowledge, and the use of communication devices.
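The top-down logic of such a hierarchical procedure can be sketched as follows. This is a simplified illustration assuming the adjustment commonly attributed to Meinshausen [4]: the raw p-value of a cluster C is multiplied by m/|C|, where m is the total number of variables, and may not fall below the adjusted p-value of its parent. The tree, the function name `hierarchical_test` and the raw p-values are invented for the example and are not results from this study; in practice the raw p-value of each cluster would come from a global test on the variables in that cluster.

```python
def hierarchical_test(root, children, p_value_of, alpha=0.05):
    """Top-down testing of nested variable clusters.

    root       -- tuple of all variable indices
    children   -- dict mapping a cluster to the list of its sub-clusters
    p_value_of -- callable returning the raw p-value of a cluster
    Returns a dict of significant clusters and their adjusted p-values.
    """
    m = len(root)
    significant = {}
    stack = [(root, 0.0)]
    while stack:
        cluster, parent_p = stack.pop()
        # Multiplicity adjustment: smaller clusters are penalised more.
        adjusted = min(1.0, p_value_of(cluster) * m / len(cluster))
        # Enforce monotonicity along the tree.
        adjusted = max(adjusted, parent_p)
        if adjusted <= alpha:
            significant[cluster] = adjusted
            # Only descend below clusters that were themselves rejected.
            for child in children.get(cluster, []):
                stack.append((child, adjusted))
    return significant


# Hypothetical hierarchy over four variables and made-up raw p-values.
toy_root = (0, 1, 2, 3)
toy_children = {
    (0, 1, 2, 3): [(0, 1), (2, 3)],
    (0, 1): [(0,), (1,)],
    (2, 3): [(2,), (3,)],
}
toy_raw_p = {
    (0, 1, 2, 3): 0.001,
    (0, 1): 0.004,
    (2, 3): 0.4,
    (0,): 0.01,
    (1,): 0.3,
}
toy_significant = hierarchical_test(toy_root, toy_children, toy_raw_p.__getitem__)
```

In this toy run the cluster `(2, 3)` is not rejected, so its sub-clusters are never tested; that is why no raw p-values are needed for `(2,)` and `(3,)`.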


[1] Ackermann M, Strimmer K. A general modular framework for gene set enrichment analysis. BMC Bioinformatics. 2009;10:47.
[2] Mansmann U, Meister R. A marginal global two-sample test for multivariate (high-dimensional) binary data. ISCB Montpellier. 2010.
[3] World Health Organization. International Classification of Functioning, Disability and Health: ICF. 2001.
[4] Meinshausen N. Hierarchical testing of variable importance. Biometrika. 2008;95:265-78.
[5] Grill E, Lipp B, Boldt C, Stucki G, Koenig E. Identification of relevant ICF categories by patients with neurological conditions in early postacute rehabilitation facilities. Disability and Rehabilitation. 2005;27:459-66.