Title: Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants

SWARTS, KELLY - Cornell University
LI, HUIHUI - International Maize & Wheat Improvement Center (CIMMYT)
NAVARRO, J. ALBERTO - Cornell University
AN, DONG - China Agricultural University
ROMAY, MARIA CINTA - Cornell University
HEARNE, SARAH - International Maize & Wheat Improvement Center (CIMMYT)
ACHARYA, CHARLOTTE - Cornell University
GLAUBITZ, JEFFREY - Cornell University
MITCHELL, SHARON - Cornell University
ELSHIRE, ROBERT - Agresearch
Buckler, Edward - Ed
Bradbury, Peter

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/29/2014
Publication Date: 9/26/2014
Citation: Swarts, K., Li, H., Navarro, J., An, D., Romay, M., Hearne, S., Acharya, C., Glaubitz, J.C., Mitchell, S., Elshire, R.J., Buckler IV, E.S., Bradbury, P. 2014. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. The Plant Genome. 7(3). DOI: 10.3835/plantgenome2014.05.0023.

Interpretive Summary: Next-generation sequencing of DNA from crop plants provides a low-cost method to produce data on a very large number of nucleotide variants and, as a result, holds great promise for plant geneticists and breeders. To keep per-sample costs low, the resulting data is often low-coverage, and some of the heterozygous loci may not be identified accurately. Swarts et al. describe two computational methods that can be used to overcome these problems. The methods start by identifying the population haplotypes, then use a hidden Markov model to find the most likely genotype of each sample analyzed. Using large maize populations representing collections of diverse inbreds, full-sib families, and landraces, they show that the methods are very accurate and compare favorably to Beagle 4.0, a widely used software package for imputing genotypes.

Technical Abstract: Next-generation sequencing technology such as genotyping-by-sequencing (GBS) has made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high-quality source of phased haplotypes for imputing missing genotypes. We introduce Full-Sib Family Haplotype Imputation (FSFHap), optimized for full-sib populations, and a generalized method, Fast Inbred Line Library ImputatioN (FILLIN), to rapidly and accurately impute missing genotypes in GBS-type data with ordered markers. FSFHap and FILLIN impute missing genotypes with high accuracy in GBS-genotyped maize (Zea mays L.) inbred lines and breeding populations, while Beagle v. 4 is still preferable for diverse heterozygous populations. FILLIN and FSFHap are implemented in TASSEL 5.0.
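
Illustrative sketch: The interpretive summary describes identifying haplotypes first and then using a hidden Markov model to find the most likely genotype at each site. The Python sketch below illustrates that general idea with a plain Viterbi pass over ordered markers. It is not the FSFHap or FILLIN code in TASSEL. The function name `viterbi_impute`, the `error_rate` and `switch_rate` parameters, the 0/1 allele coding, and the two toy haplotypes are all assumptions made for illustration, and the sketch handles only a single haploid sequence with at least two candidate haplotypes.

```python
import math


def viterbi_impute(observed, haplotypes, error_rate=0.01, switch_rate=0.05):
    """Toy Viterbi imputation (hypothetical helper, not the TASSEL API).

    observed   : list of alleles at ordered markers, None where the call is missing
    haplotypes : list of known haplotypes (hidden states), each the same length
    Returns (path, imputed): the most likely donor haplotype index per site and
    the observed sequence with missing calls filled from that donor.
    """
    n_states = len(haplotypes)  # assumed to be >= 2
    n_sites = len(observed)
    log_stay = math.log(1.0 - switch_rate)
    log_switch = math.log(switch_rate / (n_states - 1))

    def log_emit(state, site):
        allele = observed[site]
        if allele is None:  # a missing call is uninformative
            return 0.0
        if allele == haplotypes[state][site]:
            return math.log(1.0 - error_rate)
        return math.log(error_rate)

    # Initialise with a uniform prior over donor haplotypes.
    score = [math.log(1.0 / n_states) + log_emit(s, 0) for s in range(n_states)]
    back = []  # back[i][s] = best previous state for state s at site i + 1

    for site in range(1, n_sites):
        new_score, pointers = [], []
        for s in range(n_states):
            candidates = [
                score[p] + (log_stay if p == s else log_switch)
                for p in range(n_states)
            ]
            best_prev = max(range(n_states), key=lambda p: candidates[p])
            new_score.append(candidates[best_prev] + log_emit(s, site))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back the most likely path of donor haplotypes.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    imputed = [
        haplotypes[path[i]][i] if observed[i] is None else observed[i]
        for i in range(n_sites)
    ]
    return path, imputed


# Toy data: two made-up haplotypes and a sample that switches donor midway.
hap_a = [0, 0, 1, 0, 1, 1, 0, 1]
hap_b = [1, 1, 0, 1, 0, 0, 1, 0]
sample = [0, None, 1, 0, 0, None, 1, 0]

path, imputed = viterbi_impute(sample, [hap_a, hap_b])
print(path)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(imputed)  # [0, 0, 1, 0, 0, 0, 1, 0]
```

In this toy case the first four sites match `hap_a` and the last four match `hap_b`, so the most likely path contains a single switch and each missing call is filled from the donor haplotype on its side of that switch.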