gms | German Medical Science

GMDS 2014: 59. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)


7-10 September 2014, Göttingen

A genetic algorithm for generating correlated binary variables – with applications to logistic regression models

Meeting Abstract


  • K. Jung - Universitätsmedizin Göttingen, Göttingen

GMDS 2014. 59. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Göttingen, 07.-10.09.2014. Düsseldorf: German Medical Science GMS Publishing House; 2014. DocAbstr. 109

doi: 10.3205/14gmds176, urn:nbn:de:0183-14gmds1765

Published: 4 September 2014

© 2014 Jung.
This is an Open Access article distributed under the terms of the Creative Commons license (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.de). It may be reproduced, distributed and made publicly available, provided that the author and source are credited.



Text

Introduction: Correlated binary variables arise in many scientific studies. Typical examples are the risk predictors of a disease in logistic regression models, or repeated measures in a longitudinal setting. Evaluating statistical methods for the analysis of such data requires techniques for simulating random numbers from a multivariate binary distribution. One of the first techniques proposed for this purpose draws a sample from the multivariate normal distribution and then dichotomizes the data according to the desired marginal probabilities [1]. Other algorithms are based, for example, on correlated Poisson variables [2] or on multinomially distributed variables [3]. What these approaches have in common is that each proposes a way to fully determine the joint distribution of all variables from their marginal means and correlation structure. However, they generally either restrict the range of the input parameters or are not feasible for larger numbers of variables. The advantages and disadvantages of existing approaches are reviewed in [4].
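The dichotomization idea behind [1] can be illustrated with a minimal sketch. It is not the published procedure: the function names `cholesky` and `dichotomized_normal` are hypothetical, and the correlation matrix passed in is the correlation of the latent normal variables, which is an assumption made for brevity. The method in [1] additionally solves for the latent correlations that yield a desired correlation between the binary variables; that step is omitted here, so the binary correlations produced below will generally be smaller in magnitude than the latent ones.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def dichotomized_normal(n, probs, latent_corr, rng):
    """Draw n binary vectors by thresholding correlated standard normals.

    probs       -- desired marginal probabilities P(X_j = 1), each in (0, 1)
    latent_corr -- correlation matrix of the latent normals (assumption:
                   NOT the correlation of the resulting binary variables)
    """
    low = cholesky(latent_corr)
    # X_j = 1 exactly when Z_j <= threshold_j, so P(X_j = 1) = probs[j].
    thresholds = [NormalDist().inv_cdf(p) for p in probs]
    d = len(probs)
    sample = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals with the Cholesky factor.
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        sample.append([1 if z[j] <= thresholds[j] else 0 for j in range(d)])
    return sample


if __name__ == "__main__":
    rng = random.Random(2014)
    data = dichotomized_normal(5000, [0.3, 0.6], [[1.0, 0.5], [0.5, 1.0]], rng)
    print([sum(row[j] for row in data) / len(data) for j in range(2)])
```

The sketch shows why the marginal probabilities are easy to match (they depend only on the thresholds), whereas matching a prescribed binary correlation needs the extra calibration step that restricts the admissible parameter range.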

Methods: This talk proposes a new genetic algorithm for iteratively determining the joint distribution of correlated binary variables. The iteration performance of the algorithm is evaluated with respect to the prespecified marginal means, the correlation structure and the number of variables. The applicability of the new approach is further studied in different scenarios of sample size planning for logistic regression models.
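The abstract does not describe the algorithm's internals, so the following is only one possible reading of a genetic algorithm for this task, not the author's method. Everything in it is an assumption: each individual is a pseudo-population of n binary rows stored column-wise, every column holds a fixed number of ones so the marginal means are matched by construction, fitness is the largest absolute deviation from the target correlation matrix, crossover exchanges whole columns, and mutation swaps two entries within a column. All function names are hypothetical.

```python
import math
import random


def pair_corr(x, y):
    """Pearson (phi) correlation of two 0/1 lists; margins must lie in (0, 1)."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a & b for a, b in zip(x, y)) / n
    return (pxy - px * py) / math.sqrt(px * (1 - px) * py * (1 - py))


def max_deviation(cols, target):
    """Fitness: largest absolute gap between empirical and target correlation."""
    d = len(cols)
    return max(abs(pair_corr(cols[j], cols[k]) - target[j][k])
               for j in range(d) for k in range(j + 1, d))


def random_individual(n, probs, rng):
    """One column per variable with exactly round(n * p) ones, shuffled."""
    cols = []
    for p in probs:
        ones = round(n * p)
        col = [1] * ones + [0] * (n - ones)
        rng.shuffle(col)
        cols.append(col)
    return cols


def crossover(pa, pb, rng):
    """Copy each whole column from one of the two parents."""
    return [list(rng.choice((pa[j], pb[j]))) for j in range(len(pa))]


def mutate(cols, rng):
    """Swap two positions inside one column; the column sum is unchanged."""
    col = rng.choice(cols)
    a, b = rng.randrange(len(col)), rng.randrange(len(col))
    col[a], col[b] = col[b], col[a]


def evolve(n, probs, target, tol=0.02, pop_size=20, max_gen=2000, seed=1):
    """Return the best individual found and its deviation from the target."""
    rng = random.Random(seed)
    pop = [random_individual(n, probs, rng) for _ in range(pop_size)]
    scored = sorted(((max_deviation(ind, target), ind) for ind in pop),
                    key=lambda t: t[0])
    for _ in range(max_gen):
        if scored[0][0] <= tol:  # prespecified precision reached
            break
        elite = scored[: pop_size // 2]  # the better half survives unchanged
        parents = [ind for _, ind in elite]
        children = []
        for _ in range(pop_size - len(elite)):
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            mutate(child, rng)
            children.append((max_deviation(child, target), child))
        scored = sorted(elite + children, key=lambda t: t[0])
    return scored[0][1], scored[0][0]


if __name__ == "__main__":
    target = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
    best, dev = evolve(200, [0.3, 0.5, 0.4], target)
    print([sum(col) / 200 for col in best], dev)
```

Because the better half of each generation is carried over unchanged, the best deviation never gets worse from one generation to the next. The rows of the returned individual can then be resampled to simulate data from the approximated joint distribution.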

Results: The proposed technique can cope with a wide range of input parameters. For small numbers of variables (≤5) the joint distribution is obtained within an acceptable number of steps, i.e. within seconds; for larger numbers of variables the iteration can take minutes or longer. The precision of the iteration can be prespecified in terms of the deviation from the given correlation matrix. The method usually reproduces the marginal means exactly.

Discussion: Compared with existing methods, the genetic algorithm is more flexible with regard to the input parameters, although it can lack precision for larger numbers of variables. The new technique is also useful for determining, by simulation, the sample size required for logistic regression models.
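How simulated correlated binary predictors might feed into sample size planning can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the talk: it uses only two predictors, whose joint distribution follows directly from the margins and the correlation via P(X1 = 1, X2 = 1) = p1 p2 + rho sqrt(p1 (1 - p1) p2 (1 - p2)); it fits the logistic model by Newton-Raphson on the four covariate patterns; and it estimates power with a two-sided Wald test at the 5% level. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_pair(p1, p2, rho, rng):
    """Draw (x1, x2) from the 2x2 table implied by p1, p2 and rho.

    Assumes rho is admissible, i.e. all four cell probabilities are >= 0.
    """
    p11 = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    u = rng.random()
    if u < p11:
        return 1, 1
    if u < p1:                # cell (1, 0) has probability p1 - p11
        return 1, 0
    if u < p1 + p2 - p11:     # cell (0, 1) has probability p2 - p11
        return 0, 1
    return 0, 0


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def fit_logistic(cells, iters=25):
    """Newton-Raphson fit on grouped data.

    cells maps a covariate tuple to [subjects, events]. Returns the
    coefficients and the Fisher information evaluated in the last iteration.
    """
    k = len(next(iter(cells)))
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, (m, y) in cells.items():
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                grad[i] += (y - m * mu) * x[i]
                for j in range(k):
                    info[i][j] += m * mu * (1 - mu) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta, info


def power(n, beta, p1, p2, rho, reps=200, seed=1):
    """Share of simulated studies of size n that reject H0: beta_1 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        cells = {}
        for _ in range(n):
            x = (1,) + sample_pair(p1, p2, rho, rng)  # leading 1 = intercept
            prob = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            cell = cells.setdefault(x, [0, 0])
            cell[0] += 1
            cell[1] += rng.random() < prob
        try:
            est, info = fit_logistic(cells)
            var1 = solve(info, [0.0, 1.0, 0.0])[1]  # variance of beta_1 estimate
            rejections += abs(est[1]) / math.sqrt(var1) > 1.96
        except (ZeroDivisionError, OverflowError, ValueError):
            pass  # degenerate sample: counted as a non-rejection
    return rejections / reps


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, power(n, [-0.5, 0.8, 0.5], 0.3, 0.5, 0.3))
```

The required sample size would then be read off as the smallest n whose estimated power reaches the planning target. With more than two predictors, the direct 2x2 sampler would have to be replaced by a generator for the full joint distribution, which is where an approach such as the one proposed here would come in.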


References

1. Emrich LJ, Piedmonte MR. A method for generating high-dimensional multivariate binary variates. Am Stat. 1991;45:302-4.
2. Park C, Park T, Shin D. A simple method for generating correlated binary variates. Am Stat. 1996;50:306-10.
3. Kang SH, Jung SH. Generating binary variables with complete specification of the joint distribution. Biom J. 2001;43:263-9.
4. Farrell P, Rogers-Stewart K. Methods for generating longitudinally correlated binary data. Int Stat Rev. 2008;76:28-38.