Submitted to: Journal of Dairy Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/23/2015
Publication Date: 7/1/2016
Citation: Bickhart, D.M., Hutchison, J.L., Null, D.J., Van Raden, P.M., Cole, J.B. 2016. Reducing animal sequencing redundancy by preferentially selecting animals with low-frequency haplotypes. Journal of Dairy Science. 99(7):5526-5534.
Interpretive Summary: Whole-genome sequencing studies often sequence the same DNA segments (haplotypes) repeatedly because those segments are common in the population. We have developed a method for dairy cattle that maximizes the unique information obtained from sequencing studies by preferentially selecting individuals that carry haplotypes with low frequency in the population. This method selects samples more efficiently than previously published methods and will benefit dairy geneticists and computational biologists.
Technical Abstract: Many studies use targeted whole-genome sequencing (WGS) experiments to identify rare and causal variants within populations. As a natural consequence of experimental design, many of these surveys sequence redundant haplotype segments because those segments are at high frequency in the base population, and the variants discovered in the data are difficult to phase. We propose a new algorithm, called Inverse Weight Selection (IWS), that preferentially selects individuals based on the cumulative presence of low-frequency haplotypes in order to maximize the efficiency of WGS surveys. To test the efficacy of this method, we used genotype data from 112,113 registered Holstein bulls derived from the US national dairy database. We demonstrate that IWS is at least 6.8% more efficient than previously published methods at selecting the smallest number of individuals required to sequence all haplotype segments with a frequency of at least 4% in the US Holstein population. We also suggest that future surveys focus on sequencing homozygous haplotype segments as a first pass in order to achieve a 50% reduction in cost, with the added benefit of phasing variant calls efficiently. Together, this new selection algorithm and experimental design suggestion significantly reduce the overall cost of variant discovery through WGS experiments, making surveys for causal variants influencing disease and production more efficient.
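The abstract does not give the details of IWS, so the following is only a minimal sketch of the general idea it describes: score each animal by the sum of the inverse frequencies of the haplotypes it carries that have not yet been covered, pick the highest-scoring animal, and repeat until every haplotype at or above the frequency threshold is covered. The function name, the input layout (animal ID mapped to a set of haplotype IDs), the definition of frequency as the fraction of animals carrying a haplotype, and the greedy loop are all assumptions made for illustration, not the published implementation.

```python
from collections import Counter


def inverse_weight_selection(animals, min_freq=0.04):
    """Greedy, illustrative selection of animals to sequence.

    animals: dict mapping an animal ID to the set of haplotype IDs it carries
             (hypothetical input layout).
    min_freq: only haplotypes at or above this frequency must be covered.
    """
    n = len(animals)
    counts = Counter(h for haps in animals.values() for h in haps)
    # Assumption: frequency = fraction of animals carrying the haplotype.
    freq = {h: c / n for h, c in counts.items()}
    remaining = {h for h, f in freq.items() if f >= min_freq}

    selected = []
    while remaining:
        # Rare haplotypes contribute large weights (1 / frequency);
        # haplotypes that are already covered contribute nothing.
        scores = {
            animal_id: sum(1.0 / freq[h] for h in haps & remaining)
            for animal_id, haps in animals.items()
        }
        # Sorting the IDs makes tie-breaking deterministic.
        best = max(sorted(scores), key=scores.get)
        selected.append(best)
        remaining -= animals[best]
    return selected


# Toy data, invented for illustration only.
toy = {
    "A": {"h1", "h2"},
    "B": {"h1", "h3"},
    "C": {"h1"},
    "D": {"h1", "h2"},
}
picked = inverse_weight_selection(toy)
print(picked)
```

In this toy example the animal carrying the rarest haplotype is chosen first, because the inverse weighting rewards rarity rather than the number of haplotypes carried. A real survey would also need to handle phased diplotypes and chromosome segments, which this sketch ignores.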