gms | German Medical Science

MAINZ//2011: 56. GMDS-Jahrestagung und 6. DGEpi-Jahrestagung

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V.
Deutsche Gesellschaft für Epidemiologie e. V.

26. - 29.09.2011 in Mainz

A systematic framework to analyse and classify measures of association in 2x2 probability tables

Meeting Abstract

  • Markus Scholz - Universität Leipzig, Leipzig
  • Dirk Hasenclever - Universität Leipzig, Leipzig

Mainz//2011. 56. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 6. Jahrestagung der Deutschen Gesellschaft für Epidemiologie (DGEpi). Mainz, 26.-29.09.2011. Düsseldorf: German Medical Science GMS Publishing House; 2011. Doc11gmds063

DOI: 10.3205/11gmds063, URN: urn:nbn:de:0183-11gmds0631

Published: September 20, 2011

© 2011 Scholz et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Background: Measures of association are used to select, from high-dimensional binary data, those 2x2 tables that exhibit strong association. Several measures are in use, namely mutual information (MutInf), the correlation coefficient (R), odds ratio (OR) based measures such as Yule’s Q and Y, and, in genetics, Lewontin’s D’. These measures differ markedly on specific tables and in their dependence on the margins. There is no consensus on which measure to use for which purpose.

Methods: We study a 2-dimensional group of margin transformations on the 3-dimensional manifold T of all 2x2 probability tables. All measures of association independent of the margins are monotone functions of the odds ratio. The margin transformations allow introducing natural coordinates that identify T with real 3-space such that the z-axis corresponds to log(sqrt(OR)) and margins vary on planes z=const. We use these coordinates to visualise how each measure of association depends on the margins by plotting the measure restricted to tables with constant odds ratio.
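A minimal sketch of the invariance described above, assuming that a margin transformation rescales one row and one column of the table and then renormalises to total probability 1 (the abstract does not spell out the parameterisation, so the function names and this particular form are illustrative assumptions). The odds ratio, and hence z = log(sqrt(OR)), is unchanged while the margins move.

```python
import math

def normalise(table):
    """Rescale a 2x2 table of non-negative weights to sum to 1."""
    (a, b), (c, d) = table
    total = a + b + c + d
    return [[a / total, b / total], [c / total, d / total]]

def margin_transform(table, row_factor, col_factor):
    """Hypothetical margin transformation: scale row 1 and column 1,
    then renormalise. Two free parameters, matching a 2-dimensional group."""
    (a, b), (c, d) = table
    scaled = [[a * row_factor * col_factor, b * row_factor],
              [c * col_factor, d]]
    return normalise(scaled)

def z_coordinate(table):
    """z = log(sqrt(OR)) for a table with all cells positive."""
    (a, b), (c, d) = table
    return 0.5 * math.log((a * d) / (b * c))

def row_margin(table):
    """Probability of the first row."""
    return table[0][0] + table[0][1]

symmetric = [[0.4, 0.1], [0.1, 0.4]]          # OR = 16
skewed = margin_transform(symmetric, 3.0, 0.2)

print(z_coordinate(symmetric), z_coordinate(skewed))  # same z
print(row_margin(symmetric), row_margin(skewed))      # different margins
```

Both tables lie on the same plane z = const; only their position within that plane differs.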

Results: The measures listed above represent different selection criteria for interesting tables. MutInf is maximal only for the table with ½ in each diagonal cell and, given the odds ratio, down-weights tables with any skewed margins. R is maximal for diagonal tables and down-weights any deviation from diagonal shape. D’ is maximal whenever one cell goes to zero and up-weights L-shaped tables with one small entry. Unfortunately, although D’ is used extensively in genetics, some degenerate tables with a small row or column also receive high weights. This explains the well-known fact that D’ behaves erratically when estimated for tables with skewed margins.
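The contrast between these measures can be checked on a few small tables. The sketch below uses the standard textbook definitions (phi coefficient for R, natural-log mutual information, signed D’ = D / Dmax); the helper names and the example tables are illustrative choices, not taken from the abstract.

```python
import math

def margins(t):
    """Return row margins r1, r2 and column margins c1, c2."""
    (a, b), (c, d) = t
    return a + b, c + d, a + c, b + d

def yule_q(t):
    (a, b), (c, d) = t
    return (a * d - b * c) / (a * d + b * c)

def phi(t):
    """Correlation coefficient R of the two binary variables."""
    (a, b), (c, d) = t
    r1, r2, c1, c2 = margins(t)
    return (a * d - b * c) / math.sqrt(r1 * r2 * c1 * c2)

def d_prime(t):
    """Lewontin's D' = D / Dmax, keeping the sign of D."""
    (a, b), (c, d) = t
    r1, r2, c1, c2 = margins(t)
    disequilibrium = a * d - b * c
    if disequilibrium >= 0:
        d_max = min(r1 * c2, r2 * c1)
    else:
        d_max = min(r1 * c1, r2 * c2)
    return disequilibrium / d_max

def mutual_information(t):
    """Mutual information in nats, with 0 * log(0) taken as 0."""
    r1, r2, c1, c2 = margins(t)
    rows, cols = (r1, r2), (c1, c2)
    return sum(t[i][j] * math.log(t[i][j] / (rows[i] * cols[j]))
               for i in range(2) for j in range(2) if t[i][j] > 0)

diagonal = [[0.5, 0.0], [0.0, 0.5]]
l_shaped = [[1 / 3, 1 / 3], [0.0, 1 / 3]]
degenerate = [[0.01, 0.0], [0.49, 0.5]]   # very small first row

for name, t in [("diagonal", diagonal), ("L-shaped", l_shaped),
                ("degenerate", degenerate)]:
    print(name, round(d_prime(t), 3), round(phi(t), 3),
          round(mutual_information(t), 3), round(yule_q(t), 3))
```

All three tables have a zero cell, so D’ and Yule’s Q equal 1 for each of them, whereas R and MutInf are large only for the diagonal table and small for the degenerate one.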

As an alternative to D’ without these defects, we develop a novel measure of association HS based on the odds ratio in which tables with skewed margins are weighted according to the relative entropy among tables with the same odds ratio. Entropy is a principled measure of the combinatorial plausibility of a table. Relative entropy given the odds ratio is maximal on symmetric tables for odds ratio ≤ 12.89. We show analytically that at about 12.89 a bifurcation occurs such that for large odds ratios higher weights are given to L-shaped tables. HS behaves well in down-weighting tables with very skewed margins.
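The bifurcation can be explored numerically. The sketch below is not the authors' definition of HS, which the abstract does not give; it only assumes Shannon entropy (in nats) and a hypothetical parameterisation of the tables with a fixed odds ratio by two log-scale margin shifts a and b. Under these assumptions, a second-order expansion around the symmetric table (a side calculation, not taken from the abstract) suggests that the curvature towards L-shaped tables changes sign where s·ln(s) = s + 1 with s = sqrt(OR), which lands close to the quoted value.

```python
import math

def orbit_table(odds_ratio, a, b):
    """Cells (p11, p12, p21, p22) of a table with the given odds ratio.
    a and b are hypothetical log-scale row and column shifts;
    a = b = 0 gives the symmetric table."""
    s = math.sqrt(odds_ratio)
    weights = [s * math.exp(a + b), math.exp(a - b),
               math.exp(-a + b), s * math.exp(-a - b)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(cells):
    """Shannon entropy in nats, with 0 * log(0) taken as 0."""
    return -sum(p * math.log(p) for p in cells if p > 0)

def critical_odds_ratio():
    """Solve s * ln(s) = s + 1 by bisection and return s squared."""
    lo, hi = 2.0, 6.0          # the function changes sign on this bracket
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid * math.log(mid) - (mid + 1.0) < 0:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2

print(critical_odds_ratio())

# Small step towards an L-shaped table (p12 grows, p21 shrinks).
for odds in (9.0, 25.0):
    symmetric = entropy(orbit_table(odds, 0.0, 0.0))
    shifted = entropy(orbit_table(odds, 0.05, -0.05))
    print(odds, symmetric > shifted)
```

Below the threshold the symmetric table has the larger entropy; above it, the slightly L-shaped table does, in line with the behaviour described above.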

Conclusion: We present a mathematical framework for investigating the relative merits of measures of association and propose a new measure, based on entropy and the odds ratio, that is useful when interest lies in L-shaped tables.