
Research Project: Agroecological Approach to Enhance U.S. Sheep Industry Viability and Rangeland Ecosystem Conservation

Location: Range Sheep Production Efficiency Research

Title: Using whole genome sequence to compare variant callers and breed differences of US sheep

STEGEMILLER, MORGAN - University Of Idaho
REDDEN, REID - Texas A&M University
NOTTER, DAVID - Virginia Polytechnic Institution & State University
TAYLOR, TODD - University Of Wisconsin
Taylor, Joshua - Bret
COCKETT, NOELLE - Utah State University
Heaton, Michael - Mike
KALBFLEISCH, THEODORE - University Of Kentucky
MURDOCH, BRENDA - University Of Idaho

Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/22/2022
Publication Date: 1/4/2023
Citation: Stegemiller, M.R., Redden, R.R., Notter, D.R., Taylor, T., Taylor, J.B., Cockett, N.E., Heaton, M.P., Kalbfleisch, T.S., Murdoch, B.M. 2023. Using whole genome sequence to compare variant callers and breed differences of US sheep. Frontiers in Genetics. 13. Article 1060882.

Interpretive Summary: As livestock genome sequences become abundant and widely available, they can be used as a genetic resource to improve selection for healthier and more productive animals. However, obtaining accurate genetic information requires specialized software and custom approaches for identifying and tracking variation in DNA sequences. The aim of this study was to determine the most accurate approach for identifying DNA sequence variation and to use it to identify breed-associated DNA markers in U.S. sheep. We identified 10.5 million markers across all 14 breeds, of which 1,849 were unique to the Romanov breed. The markers identified from genome data improved the resolution of breed analysis and were critical for identifying Romanov breed-associated markers. The new Romanov markers can be used to estimate the approximate Romanov composition of cross-bred animals with unknown pedigrees. Because Romanov ewes are known to have litters of five or more lambs, managing the amount of Romanov germplasm is important for optimizing reproduction in the flock.
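
As a rough illustration of how breed-specific markers might be turned into a composition estimate, the sketch below averages the share of Romanov-associated alleles across markers for one animal. This is a simplified assumption, not the method used in the study: the function name `romanov_fraction`, the 0/1/2 allele-count encoding, and the example genotypes are all hypothetical.

```python
def romanov_fraction(genotypes):
    """Estimate the Romanov proportion of one animal.

    genotypes: one entry per breed-specific marker, giving the number of
    Romanov-associated alleles the animal carries at that marker (0, 1, or 2),
    or None when the genotype is missing. This encoding is an assumption
    made for the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes to estimate from")
    # Each marker contributes two alleles, so divide by twice the marker count.
    return sum(called) / (2 * len(called))


# Hypothetical animals: heterozygous at every marker (as expected for a
# first-generation cross) and homozygous for the Romanov allele throughout.
print(romanov_fraction([1, 1, 1, 1]))
print(romanov_fraction([2, 2, None, 2]))
```

An animal that is heterozygous at every marker scores 0.5 under this scheme, and missing genotypes are simply left out of the average.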

Technical Abstract: As whole genome sequence (WGS) data sets have become abundant and widely available, the need for variant detection and scoring has grown. The aim of this study was to compare the accuracy of two commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep representing 14 U.S. breeds were filtered, and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared to each other and to genotypes from bead arrays. To analyze genetic diversity, the WGS SNPs were compared to the bead array data in terms of breed heterozygosity, principal component analysis (PCA), and the identification of breed-associated SNPs. The average sequence read depth was 2.78 greater, and 6.11% more SNPs were identified, in Freebayes compared to GATK-HC. The genotype concordance of the variant callers with bead array data was 96.0% for Freebayes and 95.5% for GATK-HC. Genotyping with WGS identified 10.5 million SNPs across all 145 sheep. This resulted in an 8% increase in measured heterozygosity and greater breed separation in the PCA compared to the bead array analysis. There were 1,849 SNPs identified only in the Romanov sheep, where all 10 rams were homozygous for one allele and the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance with bead array data, and either was suitably accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of the PCA and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information.
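
The two comparisons described in the abstract, genotype concordance between call sets and the search for SNPs fixed for opposite alleles in one breed versus all others, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: genotypes are assumed to be encoded as alternate-allele counts (0, 1, 2, or None when missing), and the function names, SNP identifiers, animal identifiers, and the "Other" breed label are hypothetical.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared, non-missing sites where both call sets agree.

    calls_a, calls_b: dicts mapping SNP id -> alternate-allele count
    (0, 1, 2) or None for a missing call (assumed encoding).
    """
    shared = [
        snp for snp in calls_a
        if snp in calls_b and calls_a[snp] is not None and calls_b[snp] is not None
    ]
    if not shared:
        raise ValueError("no sites genotyped in both call sets")
    matches = sum(calls_a[snp] == calls_b[snp] for snp in shared)
    return matches / len(shared)


def breed_diagnostic_snps(genotypes, breed_of, target_breed):
    """SNPs where every target-breed animal is homozygous for one allele
    and every other animal is homozygous for the opposite allele.

    genotypes: dict mapping SNP id -> {animal id: 0, 1, 2, or None}
    breed_of:  dict mapping animal id -> breed name
    """
    diagnostic = []
    for snp, calls in genotypes.items():
        target = {g for animal, g in calls.items() if breed_of[animal] == target_breed}
        others = {g for animal, g in calls.items() if breed_of[animal] != target_breed}
        # A missing (None) or heterozygous (1) call in either group disqualifies the SNP.
        if (target, others) in (({0}, {2}), ({2}, {0})):
            diagnostic.append(snp)
    return diagnostic


# Hypothetical data: one animal's WGS calls versus its bead array genotypes.
wgs_calls = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": None}
array_calls = {"snp1": 0, "snp2": 2, "snp3": 2, "snp4": 1}
print(genotype_concordance(wgs_calls, array_calls))

# Hypothetical data: two Romanov and two non-Romanov animals.
breed_of = {"r1": "Romanov", "r2": "Romanov", "x1": "Other", "x2": "Other"}
genotypes = {
    "snpA": {"r1": 2, "r2": 2, "x1": 0, "x2": 0},
    "snpB": {"r1": 2, "r2": 1, "x1": 0, "x2": 0},
    "snpC": {"r1": 0, "r2": 0, "x1": 2, "x2": 2},
}
print(breed_diagnostic_snps(genotypes, breed_of, "Romanov"))
```

Treating a missing call as disqualifying keeps the filter strict, in line with the abstract's requirement that all Romanov rams and all remaining sheep be homozygous for opposite alleles. A real analysis would read genotypes from VCF files and might relax this rule for low-coverage sites.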