Submitted to: Journal of Dairy Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: January 1, 2010
Publication Date: May 1, 2010
Repository URL: http://hdl.handle.net/10113/42137
Citation: Weigel, K.A., Van Tassell, C.P., O'Connell, J.R., Van Raden, P.M., Wiggans, G.R. 2010. Prediction of unobserved single nucleotide polymorphism genotypes of Jersey cattle using reference panels and population-based imputation algorithms. Journal of Dairy Science. 93(5):2229-2238.
Interpretive Summary: Single nucleotide polymorphism (SNP) genotypes of Jersey cattle were used to evaluate the accuracy of imputation of missing genotypes. Reference panels of 2542 animals with genotypes for 43,385 SNP were used in conjunction with study samples of 604 animals for which genotypes were available for random subsets of 1, 2, 5, 10, 20, 40, or 80% of loci. Up to 74.3, 80.1, 90.5, 94.2, 96.4, 98.9, or 99.5% of the masked genotypes, respectively, were imputed correctly for animals in the study samples using two publicly available algorithms based on hidden Markov models that do not require the genotyping of parents or other close relatives.
Technical Abstract: The availability of dense single nucleotide polymorphism (SNP) genotypes for dairy cattle has created exciting research opportunities and revolutionized practical breeding programs. Broader application of this technology will lead to situations in which genotypes from different low-, medium-, or high-density platforms must be combined. In such cases, missing SNP genotypes can be imputed using family- or population-based algorithms. Our objective was to evaluate the accuracy of imputation in Jersey cattle, using reference panels comprising 2542 animals with 43,385 SNP genotypes and study samples of 604 animals for which genotypes were available for 1, 2, 5, 10, 20, 40, or 80% of loci. Two population-based algorithms, fastPHASE 1.2 (P. Scheet and M. Stevens; University of Washington TechTransfer Digital Ventures Program, Seattle, WA) and IMPUTE 2.0 (B. Howie and J. Marchini; Department of Statistics, University of Oxford, United Kingdom), were used to impute genotypes on Bos taurus autosomes 1, 15, and 28. The mean proportion of genotypes imputed correctly ranged from .659 to .801 when 1 to 2% of genotypes were available in the study samples, from .733 to .964 when 5 to 20% of genotypes were available, and from .896 to .995 when 40 to 80% of genotypes were available. In the absence of pedigrees or genotypes of close relatives, the accuracy of imputation may be modest (e.g., < .80) when low-density platforms with fewer than 1000 SNP are used, but population-based algorithms can provide reasonably good accuracy (e.g., > .90) when medium-density platforms of 2000 to 4000 SNP are used in conjunction with high-density genotypes (e.g., > 40,000 SNP) from a reference population. Accurate imputation of high-density genotypes from inexpensive, low- or medium-density platforms could greatly enhance the efficiency of whole genome selection programs in dairy cattle.
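The evaluation design described in the abstract (keep a random subset of loci in the study sample, impute the rest from a reference panel, and report the proportion of masked genotypes recovered) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, genotypes are assumed to be coded 0, 1, or 2, and a per-locus most-common-genotype fill stands in for fastPHASE and IMPUTE, whose hidden Markov models are not reproduced here.

```python
import random
from collections import Counter


def mask_genotypes(genotypes, observed_fraction, rng):
    """Keep a random subset of loci and mask the rest with None.

    Returns the masked copy and the list of masked locus indices.
    The same loci are masked for every animal (an assumption of this sketch).
    """
    n_loci = len(genotypes[0])
    n_observed = min(n_loci, max(1, round(n_loci * observed_fraction)))
    observed = set(rng.sample(range(n_loci), n_observed))
    masked_loci = [j for j in range(n_loci) if j not in observed]
    masked = [
        [g if j in observed else None for j, g in enumerate(row)]
        for row in genotypes
    ]
    return masked, masked_loci


def impute_by_reference_mode(masked, reference):
    """Placeholder imputer: fill each masked genotype with the most common
    genotype at that locus in the reference panel. This is NOT fastPHASE or
    IMPUTE; it only gives the accuracy function something to score."""
    modes = [Counter(column).most_common(1)[0][0] for column in zip(*reference)]
    return [
        [modes[j] if g is None else g for j, g in enumerate(row)]
        for row in masked
    ]


def proportion_correct(true_genotypes, imputed_genotypes, masked_loci):
    """Proportion of masked genotypes that were imputed correctly."""
    total = 0
    correct = 0
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        for j in masked_loci:
            total += 1
            correct += int(true_row[j] == imputed_row[j])
    return correct / total if total else float("nan")


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the toy run is repeatable
    # Toy data only: random genotypes, not real cattle genotypes.
    reference = [[rng.choice([0, 1, 2]) for _ in range(200)] for _ in range(50)]
    study = [[rng.choice([0, 1, 2]) for _ in range(200)] for _ in range(10)]
    for fraction in (0.01, 0.02, 0.05, 0.10, 0.20, 0.40, 0.80):
        masked, masked_loci = mask_genotypes(study, fraction, rng)
        imputed = impute_by_reference_mode(masked, reference)
        accuracy = proportion_correct(study, imputed, masked_loci)
        print(f"{fraction:.0%} observed: proportion correct = {accuracy:.3f}")
```

For scale, 2% of 43,385 SNP is roughly 868 markers and 5% is roughly 2,169, which lines up with the abstract's "fewer than 1000 SNP" and "2000 to 4000 SNP" platform ranges. Accuracy from the placeholder imputer on random toy data says nothing about the reported results; swapping in a real imputation tool would only require replacing `impute_by_reference_mode`.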