
Title: Filling in missing genotypes using haplotypes

VanRaden, Paul
O'Connell, J - University of Maryland
Wiggans, George
Weigel, K - University of Wisconsin

Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 3/8/2010
Publication Date: 6/24/2010
Citation: VanRaden, P.M., O'Connell, J.R., Wiggans, G.R., Weigel, K.A. 2010. Filling in missing genotypes using haplotypes. Journal of Dairy Science. 93(E-Suppl. 1):534-35 (abstr. 622).

Technical Abstract: Unknown genotypes can be made known (imputed) from observed genotypes at the same or nearby loci of relatives using pedigree haplotyping, or from matching allele patterns (regardless of pedigree) using population haplotyping. The Fortran program findhap.f90 was designed to combine population and pedigree haplotyping. Each chromosome was divided into segments of about 100 markers each. Each genotype was matched to the list of currently known haplotypes, sorted from most to least frequent for efficiency. If a match was found (no conflicting homozygote), any remaining unknown alleles in the found haplotype were imputed from homozygous genotypes. The individual's second haplotype was obtained by subtracting its first from its genotype, and the second was checked against remaining haplotypes. If no match was found, the new genotype (or haplotype) was added to the list. After completing population haplotyping, pedigrees were examined to resolve conflicts between parent and progeny haplotypes, locate crossovers that created new haplotypes, and impute haplotypes of nongenotyped ancestors from their genotyped descendants. One processor took 2 h to find haplotypes for 43,385 actual markers of 33,414 Holsteins. For the same population, time increased to only 2.5 h with 500,000 simulated markers and 500 markers per segment. Computing time increased much less than linearly because most haplotypes were excluded as not matching after just the first few markers. Genotype storage required 13 GB for 500,000 markers, but haplotype storage required only 2.5 GB. Shared haplotypes were stored just once, and only index numbers were stored for individuals instead of full haplotypes. Paternal alleles were determined correctly for 95% of heterozygous markers, and linkage was determined correctly for 98% of adjacent pairs of heterozygous markers in simulated data. Population haplotyping correctly filled 95% of missing high-density marker genotypes. Pedigree haplotyping can fill missing genotypes efficiently for nongenotyped ancestors or progeny with lower marker density.
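
The population haplotyping step described in the abstract can be illustrated with a minimal Python sketch for a single segment. This is not the findhap.f90 code. The allele encoding, the names `HaplotypeLibrary`, `assign`, and `impute`, the handling of the no-match case (only alleles revealed by homozygous markers are stored), and the choice to let the second haplotype match the same list entry as the first are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Encoding assumed for this sketch (not taken from findhap.f90):
#   genotype  = copies of allele 1 at a marker: 0, 1, 2, or None if missing
#   haplotype = allele 0 or 1 at a marker, or None if still unknown
Genotype = List[Optional[int]]
Haplotype = List[Optional[int]]


def conflicts(hap: Haplotype, geno: Genotype) -> bool:
    """True if a homozygous genotype rules out an allele on the haplotype."""
    return any(
        h is not None and ((g == 0 and h == 1) or (g == 2 and h == 0))
        for h, g in zip(hap, geno)
    )


def fill_from_homozygotes(hap: Haplotype, geno: Genotype) -> None:
    """Impute unknown haplotype alleles wherever the genotype is homozygous."""
    for i, g in enumerate(geno):
        if hap[i] is None and g in (0, 2):
            hap[i] = g // 2


def subtract(geno: Genotype, hap: Haplotype) -> Haplotype:
    """Second haplotype = genotype minus first haplotype, where both are known."""
    return [None if g is None or h is None else g - h for g, h in zip(geno, hap)]


def compatible(a: Haplotype, b: Haplotype) -> bool:
    """True if two haplotypes agree at every marker where both are known."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


class HaplotypeLibrary:
    """Shared haplotypes are stored once; individuals keep only index pairs."""

    def __init__(self) -> None:
        self.haps: List[Haplotype] = []
        self.counts: List[int] = []  # number of times each haplotype was assigned

    def _add(self, hap: Haplotype) -> int:
        self.haps.append(list(hap))
        self.counts.append(1)
        return len(self.haps) - 1

    def assign(self, geno: Genotype) -> Tuple[int, int]:
        """Return indices of the two haplotypes assigned to one genotype."""
        # Most frequent first; a real implementation would keep the list sorted.
        order = sorted(range(len(self.haps)), key=lambda i: -self.counts[i])
        pos = next(
            (p for p, i in enumerate(order) if not conflicts(self.haps[i], geno)),
            None,
        )
        if pos is None:
            # No match: store what the homozygous markers reveal (simplification).
            new: Haplotype = [None] * len(geno)
            fill_from_homozygotes(new, geno)
            idx = self._add(new)
            self.counts[idx] += 1  # placeholder stands in for both haplotypes
            return idx, idx
        first = order[pos]
        fill_from_homozygotes(self.haps[first], geno)
        self.counts[first] += 1
        second = subtract(geno, self.haps[first])
        # Check the second haplotype against the rest of the list
        # (starting at the first match itself, an assumption of this sketch).
        for i in order[pos:]:
            if compatible(self.haps[i], second):
                self.counts[i] += 1
                return first, i
        return first, self._add(second)

    def impute(self, pair: Tuple[int, int]) -> Genotype:
        """Rebuild a genotype from two stored haplotypes."""
        a, b = self.haps[pair[0]], self.haps[pair[1]]
        return [None if x is None or y is None else x + y for x, y in zip(a, b)]
```

In this sketch, calling `assign` on each individual's genotype in turn and then `impute` on the returned index pair fills a missing marker whenever both assigned haplotypes carry a known allele there. Because each individual keeps only two indices, the storage saving mentioned in the abstract follows directly from the data layout.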