Testing for genetic association taking into account phenotypic information of relatives
BMC Proceedings volume 3, Article number: S123 (2009)
We investigated efficient case-control association analysis using family data. The outcome of interest was coronary heart disease. We employed existing and new methods that take into account the correlations among related individuals to obtain the proper type I error rates. The methods considered for autosomal single-nucleotide polymorphisms were: 1) generalized estimating equations-based methods, 2) variance-modified Cochran-Armitage (MCA) trend test incorporating kinship coefficients, and 3) genotypic modified quasi-likelihood score test. Additionally, for X-linked single-nucleotide polymorphisms we proposed a two-degrees-of-freedom test. Performance of these methods was tested using Framingham Heart Study 500 k array data.
Several single-gene variants associated with coronary heart disease (CHD) were previously reported using Framingham Heart Study (FHS) 100 k array data [1]. Regression models with generalized estimating equations (GEE) [2] as well as family-based association testing using FBAT [3] were used. Neither method uses all of the available family information. While the FBAT test statistic is based on offspring genotypes conditional on (informative) parental genotypes, the GEE association test uses all individuals with genotype and phenotype data. The latter usually uses an exchangeable working correlation matrix to account for correlation within each sibship. Hence, available parental information is not optimally used.
Our aim is to use family information efficiently. In this paper we study the association between CHD and candidate genes using the binary outcome of CHD directly. The following methods were investigated: 1) a logistic regression model taking into account familial dependence of the observations using GEE, 2) the Cochran-Armitage (CA) trend test, taking into account the correlations among related individuals when computing the variance, and 3) extensions of the modified quasi-likelihood score (MQLS) test [4]. The latter methods also use phenotypic information of ungenotyped family members for an optimal weighting scheme, and can be used for sibships as well as for nuclear families. Because the first two methods are genotypic tests, we extended the allelic MQLS test to the corresponding genotypic test (gMQLS), assuming a multiplicative model [5].
Until now, little has been reported on the performance of such test statistics for association on the X chromosome [6, 7]. Because the X chromosome represents 2.5% of the human genome for males and 5% for females, information coming from the X chromosome cannot be ignored. To identify X-linked markers for susceptibility to a disease, we investigate statistics to test for association on the X chromosome in a related sample using GEE and a sex-stratified allelic MQLS test.
We analyzed Problem 2 of Genetic Analysis Workshop 16 data, using GeneChip® Human Mapping 500 k Array Set provided by the FHS SHARe (SNP Health Association Resource) project. The large pedigrees (n = 841) were broken up into nuclear family units (n = 1,902). The data consist of 2,878 subjects in the Offspring Cohort (n = 2,555) and their parents in the Original Cohort (n = 323). A binary outcome variable was created as any event of hard CHD (n = 225). The details of data sets created and used are described in Table 1.
Single-nucleotide polymorphism (SNP) selection
We checked for inheritance errors. PLINK version 1.02 [8] was used to preprocess the data with the following inclusion thresholds: minor allele frequency ≥ 0.01, missing rate per person ≤ 0.1, missing rate per SNP ≤ 0.1, and Hardy-Weinberg equilibrium p ≥ 0.001. For chromosome 8, ignoring relatedness between subjects, we conducted allelic tests for the 22,207 preprocessed SNPs (out of 27,362 in the FHS 500 k SNP resource) using PLINK. Then, 121 SNPs were selected using a threshold of allelic p-values < 0.005. For chromosome X, 8,020 SNPs (out of 9,828) were tested, and 35 SNPs were selected using the same threshold.
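The per-SNP inclusion thresholds above can be illustrated with a minimal Python sketch. It is not PLINK's implementation: the function name `snp_passes_qc` and its arguments are hypothetical, and the Hardy-Weinberg check uses the asymptotic one-degree-of-freedom chi-squared test as a stand-in for whatever test PLINK applies. The per-person missing-rate filter is not shown.

```python
from math import erfc, sqrt


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  maf_min=0.01, miss_max=0.1, hwe_p_min=0.001):
    """Return True if a SNP passes the MAF, missingness and HWE thresholds.

    n_aa, n_ab, n_bb: genotype counts among genotyped subjects.
    n_missing: number of subjects with a missing genotype at this SNP.
    (Hypothetical helper; thresholds default to the values quoted in the text.)
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return False
    miss_rate = n_missing / (n + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    maf = min(p, 1 - p)
    if miss_rate > miss_max or maf < maf_min:
        return False
    # Asymptotic HWE test: compare observed and expected genotype counts.
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = erfc(sqrt(chi2 / 2))             # upper tail of chi-squared, 1 df
    return hwe_p >= hwe_p_min
```

A SNP with counts in exact Hardy-Weinberg proportions and no missing calls passes, whereas a monomorphic SNP or one with no heterozygotes at all is excluded.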
GEE-based and modified CA trend test
One merit of using pedigrees in a case-control study is that cases with affected relatives might have a higher expected frequency of associated alleles than cases without affected relatives. For GEE, an exchangeable working correlation matrix was used to account for correlation within each sibship and each family. However, this correlation is prone to misspecification, and the subsequent loss of efficiency may be substantial [9].
Under the null hypothesis of no association between genotype and disease, the CA trend test statistic is $U^2 / \widehat{\mathrm{Var}}(U)$, where $U$ is a sum of weighted differences of genotype counts between cases and controls. When subjects are biologically related, their correlations must be accounted for when computing the variance of $U$. Slager and Schaid [10] proposed a method in which the variance and covariance terms are calculated from identity-by-descent sharing probabilities. We calculated the covariance using the expected identity-by-descent sharing (2 times the kinship coefficient); we call this method the modified Cochran-Armitage (MCA) test.
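The idea of a variance-modified trend test can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the exact variance of Slager and Schaid or the authors' R code: genotypes are coded additively as 0/1/2, the genotype variance is estimated by the naive sample variance, and the correlation between the genotypes of two subjects is taken to be 2 times their kinship coefficient. The function name `trend_test` and its arguments are hypothetical.

```python
from math import erfc, sqrt


def trend_test(genotypes, affected, corr=None):
    """Trend test with an optional genotype correlation matrix.

    genotypes: list of 0/1/2 allele counts.
    affected:  list of 1 (case) / 0 (control).
    corr:      n x n matrix with 1 on the diagonal and 2 * kinship off the
               diagonal; None means all subjects are treated as unrelated.
    Returns (chi-squared statistic with 1 df, p-value).
    """
    n = len(genotypes)
    y_bar = sum(affected) / n
    g_bar = sum(genotypes) / n
    w = [a - y_bar for a in affected]                    # centred case status
    u = sum(wi * g for wi, g in zip(w, genotypes))       # score U
    var_g = sum((g - g_bar) ** 2 for g in genotypes) / n
    if corr is None:
        quad = sum(wi * wi for wi in w)                  # independence: w'w
    else:
        quad = sum(w[i] * corr[i][j] * w[j]
                   for i in range(n) for j in range(n))  # relatedness: w'Kw
    stat = u * u / (var_g * quad)
    return stat, erfc(sqrt(stat / 2))                    # chi-squared, 1 df
```

With `corr=None` the statistic reduces to the usual CA trend statistic (N times the squared correlation between genotype and case status). When two related subjects are both cases, the quadratic form grows and the statistic shrinks, which is the correction for relatedness that the unmodified test lacks.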
MQLS test and its extensions
Alternatively, we considered the MQLS test proposed by Thornton and McPeek [4], which is reported to be more powerful and more widely applicable. It distinguishes between unaffected controls and controls of unknown phenotype (general population controls), and it also incorporates phenotypic data of relatives with missing genotypes.
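Suppose we have $n + m$ sampled individuals with phenotypic information. Let $Y = (Y_1, \ldots, Y_n)^T$ denote the genotype data of the $n$ individuals with non-missing genotypes, so that $m$ individuals have missing genotypes. Let $\Phi$ be the kinship matrix of the individuals with non-missing genotypes, and $\Phi_{N,M}$ the kinship matrix between the individuals with non-missing and missing genotypes. The entries of the matrix are 1 on the diagonal and $2\phi_{ij}$ off the diagonal, where $\phi_{ij}$ is the kinship coefficient between the $i$th and $j$th individuals. $A_N$ and $A_M$ are the column vectors of the phenotypes of the individuals with non-missing and missing genotypes, respectively. The entry in $A$ for the $i$th individual from the $j$th family is

$$a_{ij} = \begin{cases} 1 & \text{if affected,} \\ -k/(1-k) & \text{if unaffected,} \\ 0 & \text{if of unknown phenotype,} \end{cases} \tag{1}$$

with $0 < k < 1$ specified to be the population prevalence of the trait. Then, the statistic is given by

$$\mathrm{MQLS} = \frac{\left[\alpha^T (Y - \hat{p}\,\mathbf{1})\right]^2}{\hat{\sigma}^2\, \Gamma},$$

where $\alpha = A_N + \Phi^{-1} \Phi_{N,M} A_M$, $\Gamma = \alpha^T (\Phi A_N + \Phi_{N,M} A_M) - (\mathbf{1}^T \alpha)^2 (\mathbf{1}^T \Phi^{-1} \mathbf{1})^{-1}$,
$\hat{p} = (\mathbf{1}^T \Phi^{-1} \mathbf{1})^{-1} \mathbf{1}^T \Phi^{-1} Y$, $\hat{\sigma}^2 = \tfrac{1}{2}\hat{p}(1 - \hat{p})$, and $\mathbf{1}$ is a vector of ones, with $Y_i$ coded, as in Thornton and McPeek [4], as half the number of copies of the reference allele carried by individual $i$.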
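To make the roles of α and Γ concrete, the following Python sketch computes an allelic MQLS-type statistic. It is an illustration under assumptions rather than the authors' R functions: the phenotype coding (1, −k/(1−k), 0), the generalized least-squares allele frequency estimate, and the variance estimate ½p̂(1−p̂) follow Thornton and McPeek's formulation as recalled here, genotypes are coded as half the allele count, and all function names are hypothetical.

```python
def solve(mat, rhs):
    """Solve mat x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def phenotype_score(status, k):
    """Phenotype coding: 1 affected, -k/(1-k) unaffected, 0 unknown."""
    if status == "affected":
        return 1.0
    if status == "unaffected":
        return -k / (1.0 - k)
    return 0.0


def mqls(y, phi, a_n, phi_nm=None, a_m=None):
    """Allelic MQLS-type statistic (1 df).

    y:      genotypes of the n genotyped subjects, coded 0, 0.5 or 1.
    phi:    n x n matrix, 1 on the diagonal, 2 * kinship off the diagonal.
    a_n:    phenotype scores of genotyped subjects.
    phi_nm: n x m matrix of 2 * kinship between genotyped and ungenotyped.
    a_m:    phenotype scores of the m ungenotyped subjects.
    """
    n = len(y)
    if phi_nm and a_m:
        b = [sum(row[j] * a_m[j] for j in range(len(a_m))) for row in phi_nm]
        extra = solve(phi, b)                 # Phi^{-1} Phi_{N,M} A_M
    else:
        b, extra = [0.0] * n, [0.0] * n
    alpha = [a_n[i] + extra[i] for i in range(n)]
    phi_inv_1 = solve(phi, [1.0] * n)         # Phi^{-1} 1
    denom = sum(phi_inv_1)                    # 1' Phi^{-1} 1
    p_hat = sum(v * yi for v, yi in zip(phi_inv_1, y)) / denom
    sigma2 = 0.5 * p_hat * (1.0 - p_hat)
    phi_alpha = [sum(phi[i][j] * a_n[j] for j in range(n)) + b[i]
                 for i in range(n)]           # Phi A_N + Phi_{N,M} A_M
    gamma = sum(al * v for al, v in zip(alpha, phi_alpha)) - sum(alpha) ** 2 / denom
    score = sum(al * (yi - p_hat) for al, yi in zip(alpha, y))
    return score * score / (sigma2 * gamma)
```

For example, a genotyped sib pair with one ungenotyped affected parent would use `phi_nm = [[0.5], [0.5]]` and `a_m = [1.0]`, so the parent's disease status increases the weight of both sibs even though the parent contributes no genotype.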
We extended the allelic MQLS test to the corresponding genotypic test, gMQLS, assuming a multiplicative model and using the genotypic mean and the corresponding variance.
For the X-linked SNPs, a simple allele-based test can be constructed by counting alleles, with males contributing a single allele and females two alleles. Because the assumption that the allele frequency does not vary with sex could not be met, we stratified the analysis by sex and used the allelic MQLS test within each stratum. We then combined the two chi-squared tests to obtain a two-degrees-of-freedom test (xMQLS).
The analyses using the new methods were conducted with functions written by the authors in R [11].
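Combining two sex-specific one-degree-of-freedom chi-squared statistics into a two-degrees-of-freedom test can be sketched as follows. The sketch assumes that the male and female statistics are summed and treated as independent under the null hypothesis, which is an assumption of this illustration (in family data the two strata contain relatives); the function name is hypothetical. It uses the fact that the upper tail of a chi-squared distribution with 2 degrees of freedom is exp(−x/2).

```python
from math import exp


def combine_sex_stratified(chi2_male, chi2_female):
    """Sum two 1-df chi-squared statistics and return (statistic, p-value).

    The p-value is the upper tail of a chi-squared distribution with
    2 degrees of freedom, which has the closed form exp(-x / 2).
    """
    total = chi2_male + chi2_female
    return total, exp(-total / 2.0)
```

A modest signal in both sexes can therefore reach a smaller p-value than either stratum alone, while a signal in only one sex pays the price of the extra degree of freedom.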
Association study for autosomal SNPs on chromosome 8
We compared the following methods: CA, MCA, GEE, and gMQLS. These tests were performed 1) using the Offspring Cohort and 2) using the Original and Offspring Cohorts, as described in Table 1. Note that for gMQLS, phenotypic information of ungenotyped individuals was also incorporated. The population prevalence of CHD (k in Eq. (1)) was set to 5%. To compare type I error rates, the quantile-quantile plots of 0.5-percentiles (the percentage of SNPs selected) are depicted in Figure 1. The points below the diagonal indicate that the allelic tests in PLINK, which ignore relatedness, overestimated the association. The results of the four methods are comparable for these selected SNPs.
In Table 2, the top ten ranking SNPs detected by gMQLS using nuclear families are reported. The gMQLS test gave more significant results when information on the parental generation was included: for example, the p-value decreased from $9.80 \times 10^{-5}$ to $1.05 \times 10^{-5}$ for RS17094201. None of the SNPs tested reached genome-wide significance (nominal $p < 5 \times 10^{-8}$).
Testing association for X-linked SNPs
We performed the analysis using GEE adjusted for sex and the two-degrees-of-freedom test, xMQLS. The results for the top ten ranking SNPs using xMQLS are reported in Table 3. The xMQLS test gave more significant results than the other methods (minimum p-value = $6.05 \times 10^{-7}$).
The behavior of the GEE-based methods sometimes deviates from that of the other methods, which may be because the working correlation matrix was not specified correctly, especially for nuclear families [9]. This can be a disadvantage of GEE-based methods in family-based genome-wide association studies.
We did not perform simulation studies regarding type 1 error rates of the new methods. However, a good performance of the allelic variants has been reported [4, 12], and it is reasonable to expect similar performance from the new tests.
The extended MQLS tests can be used for different types of families, and also to incorporate phenotypic information of ungenotyped relatives. Therefore, a better performance can be expected by increasing the number of cases. For this, selecting families with many cases might be more efficient.
The use of an allelic test for X-linked SNPs invites the criticism that males have only half the impact of females on the analysis. Instead, Clayton [7] proposed genotype-based tests for association that treat males as homozygous females. For females, genotypes are coded as 0, 1, and 2, and genotypes of males are coded as 0 and 2. Then, X-chromosome-specific covariances can be used to calculate genotypic trend tests that take the family relationships into account.
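The genotype coding described above, in which males are treated as homozygous females, can be written as a small helper. This is only an illustration of the coding step; the function name and the "F"/"M" sex labels are hypothetical, and the X-chromosome-specific covariances needed for the trend test are not shown.

```python
def code_x_genotype(sex, n_alleles):
    """Code an X-linked genotype so that males look like homozygous females.

    sex:       "F" or "M".
    n_alleles: copies of the reference allele (0-2 for females, 0-1 for males).
    Returns 0, 1 or 2 for females and 0 or 2 for males.
    """
    if sex == "F":
        if n_alleles not in (0, 1, 2):
            raise ValueError("female allele count must be 0, 1 or 2")
        return n_alleles
    if sex == "M":
        if n_alleles not in (0, 1):
            raise ValueError("male allele count must be 0 or 1")
        return 2 * n_alleles
    raise ValueError("sex must be 'F' or 'M'")
```

With this coding a hemizygous male carrier receives the same score as a homozygous female carrier, so each male contributes on the same scale as a female in a genotypic trend test.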
The extended MQLS methods are promising. However, these may not be computationally feasible for family-based genome-wide association study. We recommend these tests to be used in a two-stage approach.
Analyzing family data using all available information in a case-control association study may improve efficiency. Two different subsets of the data were considered: one consists of the Offspring Cohort, and the other of nuclear families (Original and Offspring Cohorts). To account for relatedness among individuals, we first considered the GEE-based methods. As an alternative, we proposed new methods by extending the CA trend test.
To gain efficiency, we also considered extensions of the MQLS test. These methods use most of the family information, and therefore might be more efficient than the others. Using these methods, we analyzed the real FHS data. The new methods performed well compared with the GEE-based methods.
Adding family information seemed to improve the results. Although only a small number of subjects (n = 323) was added, the proportion of cases among them (20%) was relatively large compared with that in the sibling-only data (6%). Moreover, the gMQLS test might be more efficient because it incorporates all available phenotypic information, including CHD cases among ungenotyped parents.
For X-linked SNPs, equivalent results were obtained: the xMQLS test outperformed the GEE-based methods on these specific data. Further work should be done to evaluate the new methods.
CHD: Coronary heart disease
FHS: Framingham Heart Study
GEE: Generalized estimating equations
gMQLS: Genotypic test corresponding to the modified quasi-likelihood score
MQLS: Modified quasi-likelihood score
1. Larson MG, Atwood LD, Benjamin EJ, Cupples LA, D'Agostino RB, Fox CS, Govindaraju DR, Guo CY, Heard-Costa NL, Hwang SJ, Murabito JM, Newton-Cheh C, O'Donnell CJ, Seshadri S, Vasan RS, Wang TJ, Wolf PA, Levy D: Framingham Heart Study 100 K project: genome-wide associations for cardiovascular disease outcomes. BMC Med Genet. 2007, 8 (suppl 1): S5. doi:10.1186/1471-2350-8-S1-S5.
2. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. doi:10.1093/biomet/73.1.13.
3. Laird NM, Horvath S, Xu X: Implementing a unified approach to family-based tests of association. Genet Epidemiol. 2000, 19 (suppl 1): S36-S42. doi:10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
4. Thornton T, McPeek MS: Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet. 2007, 81: 321-337. doi:10.1086/519497.
5. Sasieni P: From genotypes to genes: doubling the sample size. Biometrics. 1997, 53: 1253-1261. doi:10.2307/2533494.
6. Zheng G, Joo J, Zhang C, Geller NL: Testing association for markers on the X chromosome. Genet Epidemiol. 2007, 31: 834-843. doi:10.1002/gepi.20244.
7. Clayton D: Testing for association on the X chromosome. Biostatistics. 2008, 9: 593-600. doi:10.1093/biostatistics/kxn007.
8. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. doi:10.1086/519795.
9. Wang YG, Carey V: Working correlation structure misspecification, estimation and covariance design: implications for generalised estimating equations performance. Biometrika. 2003, 90: 29-41. doi:10.1093/biomet/90.1.29.
10. Slager SL, Schaid DJ: Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001, 68: 1457-1462. doi:10.1086/320608.
11. R Development Core Team: A Language and Environment for Statistical Computing. [http://www.r-project.org]
12. Bourgain C, Hoffjan S, Nicolae R, Newman D, Steiner L, Walker K, Reynolds R, Ober C, McPeek MS: Novel case-control test in a founder population identifies P-selectin as an atopy-susceptibility locus. Am J Hum Genet. 2003, 73: 612-626. doi:10.1086/378208.
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. H-WU was supported by grants from IOP Genomics/SenterNovem (IGE05007).
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://0-www.biomedcentral.com.brum.beds.ac.uk/1753-6561/3?issue=S7.
The authors declare that they have no competing interests.
H-WU performed the analyses and wrote the manuscript. H-WU and JJH-D participated in the development of the methods, and interpreted the results of the analysis. HJvdW participated in data preprocessing. All authors read and approved the final manuscript.
Uh, H., Wijk, H.J. & Houwing-Duistermaat, J.J. Testing for genetic association taking into account phenotypic information of relatives. BMC Proc 3, S123 (2009) doi:10.1186/1753-6561-3-S7-S123