Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 3/19/2015
Publication Date: 7/12/2015
Citation: Bickhart, D.M., Cole, J.B., Hutchison, J.L. 2015. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced. Journal of Dairy Science. 98(Suppl. 2)/Journal of Animal Science 93(Suppl. 3):649(abstr. W86).
Technical Abstract: Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high-accuracy genetic variants found within diverse haplotypes that were in a homozygous state, as identified from animal genotypes in the national database. The value of sequencing an animal was estimated with an inverse weight function that summed the rarity of the haplotypes in its SNP-based genotype, because more common haplotypes are likely to be represented among animals sequenced in other iterations. A weight was assigned to each 75-SNP haplotype based on the inverse of its frequency among genotyped animals in the national database. Each individual’s haplotype weights were summed, and the highest-scoring animal was selected for sequencing. Haplotypes carried by selected animals were removed from future consideration, and the cumulative scores of all remaining animals were recalculated in the absence of those haplotypes. This iteration continued until all haplotypes above a frequency threshold of 4% had been selected for sequencing. The national database contained 484,522 genotyped Holstein animals and a total of 3,680 75-SNP haplotypes above a frequency of 4%. We compared this method against the selection of animals for sequencing based on three additional algorithms: (1) an ascending relatedness weight function, (2) an unbiased predictor of imputation accuracy, and (3) a random selection of animals from the population. By calculating an iterative summed score based on the inverse frequency of an animal’s unsequenced haplotypes, one can quickly determine the value of sequencing a new individual and avoid the data redundancy that plagues projects focused on sequencing highly related individuals in a population.
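The iterative scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `select_animals`, the input layout (a dictionary mapping each animal ID to the set of haplotype IDs it carries), and the definition of frequency as the fraction of animals carrying a haplotype are all assumptions made for illustration; the abstract does not specify how frequency was computed or how ties were broken.

```python
from collections import Counter


def select_animals(animal_haplotypes, min_freq=0.04):
    """Greedy selection sketch (hypothetical helper, not the published code).

    animal_haplotypes: dict of animal ID -> set of haplotype IDs (assumed layout).
    min_freq: haplotypes must exceed this frequency to be sequencing targets.
    """
    n_animals = len(animal_haplotypes)
    # Assumption: frequency = fraction of genotyped animals carrying the haplotype.
    counts = Counter(h for haps in animal_haplotypes.values() for h in set(haps))
    freq = {h: c / n_animals for h, c in counts.items()}

    # Target haplotypes are those above the threshold; weight = 1 / frequency.
    remaining = {h for h, f in freq.items() if f > min_freq}
    weight = {h: 1.0 / freq[h] for h in remaining}

    selected = []
    while remaining:
        # Score each unselected animal on its not-yet-covered haplotypes only.
        scores = {
            animal: sum(weight[h] for h in set(haps) & remaining)
            for animal, haps in animal_haplotypes.items()
            if animal not in selected
        }
        # Ties are broken by animal ID order (an arbitrary choice for this sketch).
        best = max(sorted(scores), key=scores.get)
        selected.append(best)
        # Remove the chosen animal's haplotypes from future consideration.
        remaining -= set(animal_haplotypes[best])
    return selected


# Toy data: four animals, four haplotypes (made-up IDs).
toy = {
    "A": {"h1", "h2", "h3"},
    "B": {"h1", "h2"},
    "C": {"h3", "h4"},
    "D": {"h1"},
}
print(select_animals(toy))  # ['C', 'A']
```

In the toy data, animal C is picked first because it carries the rarest haplotype (h4, weight 4) in addition to h3; once its haplotypes are removed, A covers the remaining h1 and h2, so B and D add nothing new and are never selected.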