Van Tassell, Curtis - Curt
Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/9/2012
Publication Date: 10/6/2012
Citation: Ma, L., Wiggans, G.R., Wang, S., Sonstegard, T.S., Yang, J., Crooker, B.A., Cole, J.B., Van Tassell, C.P., Da, Y. 2012. Effect of sample stratification on dairy GWAS results. Biomed Central (BMC) Genomics. 13:536.
Interpretive Summary: Genome-wide association studies are used to identify genetic factors associated with phenotypes. Population stratification, systematic differences in allele frequencies between subpopulations, can cause false-positive results in association studies. False-positive results suggest that there is a relationship of a marker to a gene when no such relationship exists. Artificial insemination has been widely used in dairy cattle breeding for the past 50 years, which has resulted in population stratification. Several methods to account for stratification were examined in this study. Stratification was largely attributable to the large number of half-sib families in the population. All three methods for stratification correction reduced the number of significant effects. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction eliminated most of the effects detected by a method without stratification correction, and could have removed some true effects associated with genetic selection.
Technical Abstract:
Background: Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.
Results: Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10-15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. An elite cluster of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cluster. A large half-sib family in this elite cluster contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method produced the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and the GLS method had similar significance levels. For 31 dairy traits, the three methods had a small number of common effects with genome-wide significance, including the DGAT1-NIBP region of BTA14 for fat percentage, a SNP in DGAT1 for fat yield, a SNP 45 kb upstream from PREY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. Among the top 100 effects per trait, the three methods for stratification correction and a method without stratification correction had 41 common effects for 13 of the 31 traits.
Conclusions: Genetic selection and extensive use of artificial insemination contributed to overlapping genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction eliminated most of the effects detected by a method without stratification correction and could have removed some true effects associated with genetic selection.
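Illustrative sketch: the mechanism described above (allele frequencies that differ between an elite cluster and the remaining animals producing associations that stratification correction removes) can be reproduced in a small simulation. The following Python code is a toy example under stated assumptions, not the analysis used in the study: the data are simulated, no SNP has a true effect, the phenotype differs only by cluster membership, and the function names (simulate, top_pcs, assoc_tstats, count_hits), sample sizes, and the |t| > 3 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_elite=200, n_rest=800, n_snps=300, seed=0):
    """Simulate two clusters with different allele frequencies; no SNP is causal."""
    rng = np.random.default_rng(seed)
    cluster = np.concatenate([np.ones(n_elite), np.zeros(n_rest)])
    p_rest = rng.uniform(0.2, 0.5, n_snps)
    # Assumption: elite-cluster frequencies drift away from the rest of the population.
    p_elite = np.clip(p_rest + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
    freqs = np.where(cluster[:, None] == 1, p_elite, p_rest)
    geno = rng.binomial(2, freqs).astype(float)  # allele counts 0/1/2
    # Phenotype depends on cluster membership only, never on a genotype.
    pheno = cluster + rng.normal(0.0, 1.0, cluster.size)
    return geno, pheno


def top_pcs(geno, k):
    """Scores of the first k principal components of the standardized genotypes."""
    z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def assoc_tstats(geno, pheno, covars=None):
    """Single-SNP regression t-statistics, adjusting for an intercept and covariates."""
    n = pheno.size
    x = np.ones((n, 1)) if covars is None else np.column_stack([np.ones(n), covars])
    q, _ = np.linalg.qr(x)
    # Remove the covariate space from phenotype and genotypes, then regress.
    y = pheno - q @ (q.T @ pheno)
    g = geno - q @ (q.T @ geno)
    gg = (g ** 2).sum(axis=0)
    beta = (g * y[:, None]).sum(axis=0) / gg
    df = n - x.shape[1] - 1
    resid_ss = (y ** 2).sum() - beta ** 2 * gg
    return beta / np.sqrt(resid_ss / df / gg)


def count_hits(tstats, threshold=3.0):
    """Number of SNPs whose absolute t-statistic exceeds the threshold."""
    return int(np.sum(np.abs(tstats) > threshold))


if __name__ == "__main__":
    geno, pheno = simulate()
    print("hits without correction:", count_hits(assoc_tstats(geno, pheno)))
    pcs = top_pcs(geno, 1)
    print("hits with 1 PC as covariate:", count_hits(assoc_tstats(geno, pheno, pcs)))
```

Because every association in this toy data set is spurious, the drop in hits after adding the principal component reflects removal of false positives only. A single component is used here because only one cluster is simulated, whereas the abstract reports 20 principal components for the real data. The sketch does not model the case raised in the conclusions, where correction may also remove true effects associated with genetic selection.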