gms | German Medical Science

GMS Medizinische Informatik, Biometrie und Epidemiologie

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)

ISSN 1860-9171

SIBSIM - quantitative phenotype simulation in extended pedigrees

SIBSIM - Simulation of quantitative phenotypes in extended pedigrees

Original Article


  • Daniel Franke (corresponding author) - University Hospital Schleswig-Holstein, Campus Lübeck, Institute of Medical Biometry and Statistics, Lübeck, Germany
  • André Kleensang - University Hospital Schleswig-Holstein, Campus Lübeck, Institute of Medical Biometry and Statistics, Lübeck, Germany
  • Andreas Ziegler - University Hospital Schleswig-Holstein, Campus Lübeck, Institute of Medical Biometry and Statistics, Lübeck, Germany

GMS Med Inform Biom Epidemiol 2006;2(1):Doc02

The electronic version of this article is the complete one and can be found online at: http://www.egms.de/en/journals/mibe/2006-2/mibe000021.shtml

Published: February 21, 2006

© 2006 Franke et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Abstract

A tool (SIBSIM) for quantitative phenotype simulation in extended pedigrees is described. Download and installation information is provided, and the advantages and limitations of the tool are discussed. The input format is based on XML, and the different sections of an input file are explained. The simulation algorithm is briefly outlined. Links to the download site, the user manual, and related literature, as well as a detailed example, are included.

Availability: The software is available at: http://www.imbs.uni-luebeck.de/pub/sibsim.

Keywords: computer simulation, QTL, phenotype

Summary

A program (SIBSIM) for simulating quantitative phenotypes in extended families is presented. Information on download and installation is given, and the advantages and limitations of the implementation are described. The input format is XML-based; its individual sections are explained in the text. The simulation algorithm itself is outlined. References to the user manual and further literature, as well as a detailed example, are provided.

Availability: The software is available at: http://www.imbs.uni-luebeck.de/pub/sibsim.


Aim

This work introduces SIBSIM, a computer program for simulating genotype and quantitative trait data in extended pedigrees. The current release (2.1.2) focuses on the simulation of a quantitative trait in pedigrees of arbitrary size without monozygotic twins. Well-known software such as the SIMULATE package [1] does not scale as well as SIBSIM. Unlike G.A.S.P. [2] and SIMLA [3], SIBSIM imposes no predefined limits on either genome size or family size.

Instead, SIBSIM is designed to scale as far as possible to meet the needs of a given study. Besides simulation studies, SIBSIM can be used to validate, verify, and test other software that implements statistical analyses of genomic data. We used SIBSIM successfully for the latter purpose and detected a bug in a widely used genetic epidemiological software package.

The following sections describe the compile-time and runtime requirements and recommendations for SIBSIM, the XML configuration file format, and the phenotype simulation model.


Implementation

SIBSIM is written entirely in C++ and is available as a source code distribution under the GNU General Public License (GPL). It is organized as a GNU autoconf/automake project and is therefore portable to virtually any Linux or Unix platform. SIBSIM requires the XML parsing library libxml2, which is freely available at http://www.xmlsoft.org/. The package may be compiled with any standard C++ compiler, but GNU gcc version 3.2 or later is recommended. Please refer to the online manual for further information on requirements and on building and installing SIBSIM.


User interface

To keep SIBSIM flexible for further development, we chose an XML-based input file format. A set of tags was defined to specify the various facets of a simulation. The document is divided into multiple sections: one for the genotype description, an optional one for the trait, and one or more for the family structures.

The genotype section describes marker loci and quantitative trait loci by name, position in centiMorgans, alleles, and the corresponding allele frequencies. The user may specify the map function used to convert genetic distance into the recombination fraction θ; currently, haldane and kosambi (the Haldane and Kosambi map functions) are implemented. Optionally, the code for missing genotypes and the fraction of genotypes that are set missing completely at random may be specified. The current release supports the simulation of only one diallelic quantitative trait locus.
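Both map functions have standard closed forms: Haldane's $\theta = \tfrac{1}{2}(1 - e^{-2d})$ and Kosambi's $\theta = \tfrac{1}{2}\tanh(2d)$, with $d$ in Morgans. The following Python sketch is our own illustration, not SIBSIM code; the names `haldane`, `kosambi`, and `adjacent_thetas` are hypothetical. It converts marker positions in centiMorgans into recombination fractions between neighbouring loci.

```python
import math
from typing import List, Sequence


def haldane(distance_cm: float) -> float:
    """Recombination fraction under Haldane's map function (no interference)."""
    d = distance_cm / 100.0  # centiMorgan -> Morgan
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(distance_cm: float) -> float:
    """Recombination fraction under Kosambi's map function (positive interference)."""
    d = distance_cm / 100.0  # centiMorgan -> Morgan
    return 0.5 * math.tanh(2.0 * d)


MAP_FUNCTIONS = {"haldane": haldane, "kosambi": kosambi}


def adjacent_thetas(positions_cm: Sequence[float],
                    map_function: str = "haldane") -> List[float]:
    """Recombination fractions between consecutive loci (positions sorted, in cM)."""
    f = MAP_FUNCTIONS[map_function]
    return [f(b - a) for a, b in zip(positions_cm, positions_cm[1:])]


print(adjacent_thetas([0.0, 10.0, 25.0], "kosambi"))
```

For the same distance, Kosambi's function yields a slightly larger θ than Haldane's, and both approach 0.5 for unlinked loci.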

The phenotype section is optional; if it is omitted, only genotypes are simulated. Phenotype simulation is based on the general variance-analytic model; see, e.g., Falconer and Mackay [4] or Ziegler and König [5]. The phenotypic value $x_{ik}$ of individual $k$ in family $i$ is decomposed additively into an overall mean $\mu$; a major gene effect $g_{ik}$, determined by the genotype at the quantitative trait locus together with its specified inheritance model; a polygenic effect $G_{ik}$, which summarizes the contribution of multiple genes to the phenotype in question; an environmental effect $E_{ik}$; and, finally, an error term $\varepsilon_{ik}$:

$$x_{ik} = \mu + g_{ik} + G_{ik} + E_{ik} + \varepsilon_{ik}$$
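If, in addition, the random components are assumed to be mutually independent (our assumption for illustration; the text does not state it), the phenotypic variance splits into the corresponding variance components. For a diallelic trait locus in Hardy-Weinberg equilibrium with allele frequencies $p$ and $q = 1 - p$ and genotypic values $+a$, $d$, and $-a$ for the three genotypes (Falconer's parameterization [4]), the major gene variance is the sum of an additive and a dominance part:

$$\operatorname{Var}(x_{ik}) = \sigma_g^2 + \sigma_G^2 + \sigma_E^2 + \sigma_\varepsilon^2, \qquad \sigma_g^2 = 2pq\,\bigl[a + d(q - p)\bigr]^2 + \bigl(2pqd\bigr)^2 .$$

For a purely additive locus ($d = 0$) this reduces to $\sigma_g^2 = 2pqa^2$.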

The environmental effect is simulated either as a family effect $E_i$, which assigns the same random value to every member of the pedigree, or as a true environmental effect $E_{ik}$, which assigns the same random value to all siblings of a sibship but different values to distinct sibships. The polygenic component is determined individually using average breeding values, analogously to G.A.S.P. [2]: polygenic effects of founders are drawn from a given distribution with given mean and variance, and effects of non-founders are drawn from the same distribution, but with its mean set to the average of the polygenic effects of their parents.
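The two environmental options and the average-breeding-value scheme can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not SIBSIM's implementation: the pedigree is a hypothetical dictionary mapping each individual to its (father, mother) pair, all effects are normally distributed, the environmental effect has mean zero, and, because the text does not say how founders are handled in the sibship mode, each founder receives its own value.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical pedigree representation: individual -> (father, mother);
# founders are stored as (None, None).
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def simulate_polygenic(ped: Pedigree, mean: float, sd: float,
                       rng: random.Random) -> Dict[str, float]:
    """Average breeding values: founders ~ N(mean, sd^2);
    non-founders ~ N((G_father + G_mother) / 2, sd^2)."""
    G: Dict[str, float] = {}
    pending = list(ped)
    while pending:
        progressed = False
        for ind in list(pending):
            father, mother = ped[ind]
            if father is None and mother is None:
                G[ind] = rng.gauss(mean, sd)
            elif father in G and mother in G:
                G[ind] = rng.gauss((G[father] + G[mother]) / 2.0, sd)
            else:
                continue  # parents not simulated yet; retry in the next pass
            pending.remove(ind)
            progressed = True
        if not progressed:
            raise ValueError("pedigree has a cycle, a missing parent, or one unknown parent")
    return G


def simulate_environment(ped: Pedigree, sd: float, mode: str,
                         rng: random.Random) -> Dict[str, float]:
    """mode="family": one value E_i shared by all pedigree members.
    mode="sibship": one value per sibship (same father and mother);
    founders get their own value (our assumption)."""
    if mode == "family":
        e = rng.gauss(0.0, sd)
        return {ind: e for ind in ped}
    if mode == "sibship":
        per_sibship: Dict[tuple, float] = {}
        E: Dict[str, float] = {}
        for ind, parents in ped.items():
            key = (ind,) if parents == (None, None) else parents
            if key not in per_sibship:
                per_sibship[key] = rng.gauss(0.0, sd)
            E[ind] = per_sibship[key]
        return E
    raise ValueError(f"unknown mode: {mode!r}")
```

Individuals are processed in repeated passes so that parents are always simulated before their children, regardless of the order in which the pedigree lists them.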

The pedigree section defines the relationships between family members, both founders and non-founders. There is no limit on the number of individuals per family and only one restriction on family structure: families may not contain monozygotic twins. Families with consanguinity and/or marriage loops are supported. The pedigree section is the only section that may occur multiple times. Each pedigree appears in the output file(s) as many times as specified by its replicates attribute. For example, let ped1 be a nuclear family of two offspring and their parents, and let ped2 be an extended family of six individuals. If the replicates attributes are set to 200 and 100, respectively, 1400 individuals in 300 families are simulated.
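The totals follow directly from the replicates attributes: each pedigree contributes its replicate count to the number of families and its size times its replicate count to the number of individuals. A small Python check (the function name is hypothetical):

```python
from typing import Iterable, Tuple


def replicate_totals(pedigrees: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """pedigrees: (pedigree size, replicates) pairs.
    Returns (number of families, number of individuals) written to the output."""
    pedigrees = list(pedigrees)  # allow iterating twice
    families = sum(reps for _, reps in pedigrees)
    individuals = sum(size * reps for size, reps in pedigrees)
    return families, individuals


# ped1: nuclear family (2 parents + 2 offspring), 200 replicates
# ped2: extended family of 6 individuals, 100 replicates
print(replicate_totals([(4, 200), (6, 100)]))  # (300, 1400)
```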

Finally, global attributes in the XML document define the number of simulated files, the location where the files are stored, and the output format. Currently, only the linkage format described in [6] is available. Data description files in linkage DAT format, e.g., for use with mapping software, can also be generated from the XML input. An interface to S.A.G.E. [7] has also been implemented. Please see the section SIBSIM Usage of the online manual for further options.
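For orientation, a line of a LINKAGE pedigree file in the common pre-MAKEPED layout lists the pedigree, individual, father, and mother identifiers (0 for unknown parents), the sex (1 = male, 2 = female), the phenotype, and one allele pair per marker. The Python sketch below formats one such line; the function name, the four-decimal trait formatting, and the use of 0 0 for a missing genotype are our assumptions and need not match SIBSIM's actual output files.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, int]]  # None marks a missing genotype


def linkage_ped_line(ped_id: int, ind_id: int, father: Optional[int],
                     mother: Optional[int], sex: int, trait: float,
                     genotypes: Sequence[Genotype]) -> str:
    """Format one individual as a whitespace-separated LINKAGE pedigree line:
    pedigree, individual, father, mother, sex, trait, allele pairs."""
    fields = [
        str(ped_id),
        str(ind_id),
        str(father if father is not None else 0),  # 0 = founder / unknown
        str(mother if mother is not None else 0),
        str(sex),                                   # 1 = male, 2 = female
        f"{trait:.4f}",                             # quantitative phenotype
    ]
    for genotype in genotypes:
        a1, a2 = genotype if genotype is not None else (0, 0)  # 0 0 = missing
        fields += [str(a1), str(a2)]
    return " ".join(fields)


# A founder father and one of his daughters, typed at three markers:
print(linkage_ped_line(1, 1, None, None, 1, -1.25, [(1, 1), (2, 3), (1, 2)]))
# 1 1 0 0 1 -1.2500 1 1 2 3 1 2
print(linkage_ped_line(1, 3, 1, 2, 2, 0.51234, [(1, 2), None, (2, 3)]))
# 1 3 1 2 2 0.5123 1 2 0 0 2 3
```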

SIBSIM supports both internal and external general entities. Please refer to the online documentation for further information on entities and examples of their use; a more complete introduction to entities is given, e.g., in [8].
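To illustrate what an internal general entity does, the following Python sketch declares one in the document type declaration and references it twice; Python's standard `xml.etree.ElementTree` parser replaces each reference with the declared text. The element names (`simulation`, `marker`) and the entity name `freq4` are hypothetical and do not reproduce SIBSIM's input schema; external entities are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag and entity names: they only illustrate the XML mechanism.
DOC = """<?xml version="1.0"?>
<!DOCTYPE simulation [
  <!ENTITY freq4 "0.25 0.25 0.25 0.25">
]>
<simulation>
  <marker name="M1">&freq4;</marker>
  <marker name="M2">&freq4;</marker>
</simulation>
"""

root = ET.fromstring(DOC)
for marker in root.iter("marker"):
    # The entity reference has been expanded to its replacement text.
    print(marker.get("name"), marker.text)
```

Declaring a shared value once in this way avoids repeating it, e.g., across several markers or pedigrees, and keeps the values consistent when the input file is edited.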

Monte Carlo simulations rely heavily on pseudo-random number generation. We therefore follow the recommendations of [9] and employ the "long period ($>2 \times 10^{18}$) random number generator of L'Ecuyer with Bays-Durham shuffle".
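The quoted generator is the routine `ran2` in [9]. The Python class below is a sketch modelled on that routine, not SIBSIM's C++ source: it combines two multiplicative congruential generators (moduli 2147483563 and 2147483399) and passes the result through a 32-entry Bays-Durham shuffle table. Because Python integers do not overflow, Schrage's factorization used in the C original is replaced by plain modular arithmetic, which gives the same values for ordinary seeds. The class and method names are hypothetical.

```python
class Ran2:
    """L'Ecuyer's combined generator with Bays-Durham shuffle (after ran2 in [9])."""

    IM1, IM2 = 2147483563, 2147483399
    IA1, IA2 = 40014, 40692
    NTAB = 32
    IMM1 = IM1 - 1
    NDIV = 1 + IMM1 // NTAB
    RNMX = 1.0 - 1.2e-7  # largest value returned, as in ran2

    def __init__(self, seed: int):
        s = abs(seed) % self.IM1 or 1        # keep the seed in [1, IM1 - 1]
        self.idum2 = s % self.IM2 or 1       # state of the second generator
        self.iv = [0] * self.NTAB
        # Eight warm-up steps, then fill the shuffle table from the end.
        for j in range(self.NTAB + 7, -1, -1):
            s = (self.IA1 * s) % self.IM1
            if j < self.NTAB:
                self.iv[j] = s
        self.idum = s                        # state of the first generator
        self.iy = self.iv[0]

    def uniform(self) -> float:
        """Return a uniform deviate in the open interval (0, 1)."""
        self.idum = (self.IA1 * self.idum) % self.IM1
        self.idum2 = (self.IA2 * self.idum2) % self.IM2
        j = self.iy // self.NDIV             # shuffle-table index in [0, NTAB)
        self.iy = self.iv[j] - self.idum2    # combine the two generators
        self.iv[j] = self.idum               # refill the slot just used
        if self.iy < 1:
            self.iy += self.IMM1
        return min(self.iy / self.IM1, self.RNMX)


rng = Ran2(12345)
print([round(rng.uniform(), 6) for _ in range(3)])
```

Uniform deviates of this kind would typically be transformed, e.g., by the Box-Muller method, into the normal deviates needed for the phenotype components.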


Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (ZI 591/12-1).


Appendix - example of simulation setup

In this example, we simulate a quantitative trait whose phenotype has three components: a genetic effect $g_{ik}$ accounting for 20% of the phenotypic variance, an overall shared environmental effect ($E_i$) contributing another 30%, and the remaining variance, summarized as white noise ($\varepsilon_{ik}$), i.e., a normal distribution with mean zero and variance 0.5, so that the total phenotypic variance is 1. The trait is simulated in three-generation families of 15 individuals each (Figure 1). The relationships between these individuals were chosen arbitrarily. We selected nine markers on chromosome 22 (Figure 2) whose descriptions, alleles, and allele frequencies were freely available from online databases [10], [11]. Given this setup, we prepared an input file for SIBSIM; portions of this file are displayed in Figure 3.
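The example does not state the allele frequency or the inheritance model of the trait locus. Assuming, for illustration only, a purely additive diallelic locus in Hardy-Weinberg equilibrium with trait-allele frequency $p$ and genotypic values $+a$, $0$, $-a$, the value of $a$ that yields a major gene variance of 0.2 follows from $2pqa^2 = 0.2$. The Python sketch below computes $a$ and checks by simulation that $g$, $E_i$, and $\varepsilon$ together give a phenotypic variance of about 1; it uses one unrelated individual per family, $\mu = 0$, and no polygenic term, and all function names are hypothetical.

```python
import math
import random
from typing import List


def additive_effect(p: float, target_variance: float) -> float:
    """Genotypic value a (A1A1 = +a, A1A2 = 0, A2A2 = -a) such that the QTL
    variance 2pq a^2 equals target_variance (additive model, HWE assumed)."""
    q = 1.0 - p
    return math.sqrt(target_variance / (2.0 * p * q))


def simulate_founder_phenotypes(n_families: int, p: float, var_g: float = 0.2,
                                var_e: float = 0.3, var_eps: float = 0.5,
                                seed: int = 1) -> List[float]:
    """One unrelated individual per family: x = g + E_i + eps (mu = 0)."""
    rng = random.Random(seed)
    a = additive_effect(p, var_g)
    xs = []
    for _ in range(n_families):
        n_a1 = (rng.random() < p) + (rng.random() < p)  # number of A1 alleles
        g = (n_a1 - 1) * a                               # -a, 0 or +a
        e = rng.gauss(0.0, math.sqrt(var_e))             # shared family effect E_i
        eps = rng.gauss(0.0, math.sqrt(var_eps))         # white noise
        xs.append(g + e + eps)
    return xs


xs = simulate_founder_phenotypes(50000, p=0.3)
mean = sum(xs) / len(xs)
print(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))  # close to 1
```

With $p = 0.5$, for example, $a = \sqrt{0.4} \approx 0.632$. Sharing $E_i$ among relatives leaves each individual's variance unchanged; it only induces covariance between members of the same family.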

After processing and validating the input file, SIBSIM either reports an error message or silently simulates the genotype and phenotype data. Finally, the data are written in linkage format to sequentially numbered files (sample output is shown in Figures 4 and 5).


References

1. Terwilliger JD, Speer M, Ott J. Chromosome-based method for rapid computer simulation in human genetic linkage analysis. Genet Epidemiol. 1993;10(4):217-24.
2. Wilson AF, Bailey-Wilson JE, Pugh EW, Sorant AJM. The Genometric Analysis Simulation Program (G.A.S.P.): a software tool for testing and investigating methods in statistical genetics. Am J Hum Genet. 1996;59:A193.
3. Bass MP, Martin ER, Hauser ER. Pedigree Generation for Analysis of Genetic Linkage and Association. Pac Symp Biocomput. 2004:93-103. Available from: http://helix-web.stanford.edu/psb04/bass.pdf.
4. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4th ed. Prentice Hall; 1996.
5. Ziegler A, König IR. A Statistical Approach to Genetic Epidemiology: Concepts and Applications. Weinheim: Wiley-VCH; 2006.
6. Terwilliger JD, Ott J. Handbook of Human Genetic Linkage. Baltimore: The Johns Hopkins University Press; 1994.
7. S.A.G.E. Statistical Analysis for Genetic Epidemiology v4.6. 2004. Available from: http://darwin.cwru.edu/sage/.
8. Harold ER, Means WS. XML in a Nutshell. 2nd ed. O'Reilly; 2002.
9. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C. 2nd ed. Cambridge University Press; 1999. p. 282.
10. MarkerSearch [database on the internet]. Bethesda (MD): National Cancer Institute (US). Available from: http://lpgws.nci.nih.gov/cgi-bin/MarkerSearch.
11. Search for Markers [database on the internet]. Marshfield (WI): Marshfield Clinic (US). Available from: http://www2.marshfieldclinic.org/RESEARCH/GENETICS/Map_Markers/mapmaker/SearchFormFrames.html.
12. Search for Maps [database on the internet]. Research Triangle Park (NC): RTI International (US). Available from: www.gdb.org/jmqp/queryByPos.html.