Title: Tools to exploit sequence data to find new markers and disease loci in dairy cattle

Bickhart, Derek
Harris, Lewin - University of California
Liu, Ge - George

Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 2/23/2013
Publication Date: 7/8/2013
Citation: Bickhart, D.M., Harris, L., Liu, G. 2013. Tools to exploit sequence data to find new markers and disease loci in dairy cattle. Journal of Dairy Science. 96(E-Suppl. 1):624 (abstr. 557).

Interpretive Summary:

Technical Abstract: The decrease in cost of next-generation sequencing has brought the technology into the realm of practical applications in livestock genomics. Recently, the 1000 Bulls Project has heralded the possibility of using full sequence data to improve imputation and detect disease loci within select founder bulls. However, informatics tools designed to utilize such data have not yet reached maturity: many currently available programs are hard-coded to call variants only in human subjects or take an inordinate amount of time for analysis. With these challenges and prospects in mind, we have developed a comprehensive variant detection pipeline that uses a variety of information derived from sequence data to call SNP, INDEL and structural variants within the genomes of individuals. The pipeline is designed to be fully automated, can be restarted in the case of errors, and can be run on different computing architectures. We have run our pipeline on sequence data derived from a famous Holstein bull. Despite having 87 gigabases (30X coverage of the genome) of sequence for this bull, our pipeline took only 48 hours to fully analyze the data using 20 processor cores and less than 32 gigabytes of RAM. Initial filtering of these data revealed one million candidate SNP and 759 copy number variants (CNV). An annotation program incorporated into the pipeline has also revealed putative functional impacts of these variants and has identified more than 17,000 non-synonymous SNP that could alter protein function in this individual. The pipeline provides an efficient and freely available tool for researchers to process cattle genomic sequence data to detect genetic variants for use in the dairy industry.
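
The figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline: the function names are hypothetical, and the core-hour figure assumes all 20 cores were busy for the full 48 hours, which makes it an upper bound.

```python
def implied_genome_size_gb(total_gigabases: float, coverage: float) -> float:
    """Genome size (in gigabases) implied by total sequence yield and mean fold coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_gigabases / coverage


def core_hours(wall_hours: float, cores: int) -> float:
    """Upper bound on compute used, assuming every core is busy for the whole run."""
    return wall_hours * cores


# Values taken from the abstract: 87 gigabases at 30X, 48 hours on 20 cores.
genome_gb = implied_genome_size_gb(87, 30)
cpu_hours = core_hours(48, 20)

print(f"Implied genome size: {genome_gb:.1f} Gb")
print(f"Compute cost (upper bound): {cpu_hours:.0f} core-hours")
```

Dividing 87 gigabases by 30X coverage implies a genome of about 2.9 gigabases, and 48 hours on 20 cores corresponds to at most 960 core-hours.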