|Smith, Timothy - Tim|
|Van Tassell, Curtis - Curt|
Submitted to: Plant and Animal Genome Conference Proceedings
Publication Type: Abstract Only
Publication Acceptance Date: 10/3/2007
Publication Date: 1/2/2008
Citation: Smith, T.P., Matukumalli, L., Sonstegard, T.S., Schnabel, R., Taylor, J., Haudenschild, C., Lawley, C., Moore, S., Van Tassell, C.P. 2008. Generation of large numbers of SNP in cattle by coupling reduced genome representation with high throughput sequencing (abstract). Plant and Animal Genome XVI Conference Proceedings. Poster No. P90.
Technical Abstract: Whole genome sequencing projects have produced draft sequences for species from diverse evolutionary clades for comparative evolutionary studies. Generally, these projects have not simultaneously created extensive single nucleotide polymorphism (SNP) resources for genetic studies within the species sequenced. For example, the bovine genome sequencing project produced a draft sequence and an initial set of putative SNP for genetic studies in cattle, but those SNP come primarily from comparison of whole genome shotgun reads and have a relatively low validation rate, at or below 50%. We present an approach for de novo SNP discovery that uses restriction enzyme digestion to create reduced representation libraries (RRL), which are then sequenced by high-throughput, short-read sequencing to generate high-confidence SNP and simultaneously estimate minor allele frequencies (MAF). First, in silico digests of the bovine sequence assembly were performed to evaluate the size distribution, repetitive element content, and genomic coordinates of fragments produced by alternative restriction enzymes. Next, RRL representing ~2% of the genome were created by isolating DNA fragments in the 70-120 bp range from HaeIII digests of DNA pools from three cattle populations. These RRL were sequenced to an average 10-fold coverage of fragment ends, which identified >70,000 putative SNP. Genotyping validated 88% of 25,834 SNP sampled genome-wide, and observed MAF had a correlation of 0.67 with allele frequency estimated from the sequence data. This approach and its derivatives have utility for efficient generation of SNP resources for any species, regardless of the availability of a genome sequence.
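The in silico digest and size-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`digest`, `size_select`, `represented_fraction`) and the toy sequence are hypothetical, and the only details taken from the abstract are the enzyme (HaeIII) and the 70-120 bp size window. The sketch assumes the standard HaeIII recognition site GGCC with a blunt cut between GG and CC, scans a single strand, and ignores repetitive element content and genomic coordinates, which the abstract says were also evaluated.

```python
# Assumption: HaeIII recognizes GGCC and cuts bluntly between GG and CC.
HAEIII_SITE = "GGCC"
HAEIII_CUT_OFFSET = 2


def digest(sequence, site=HAEIII_SITE, cut_offset=HAEIII_CUT_OFFSET):
    """Cut `sequence` at every occurrence of `site`; return the fragments in order."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # trailing fragment after the last site
    return fragments


def size_select(fragments, min_len=70, max_len=120):
    """Keep fragments whose length falls inside the size window (inclusive)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def represented_fraction(sequence, min_len=70, max_len=120):
    """Fraction of the input sequence covered by size-selected fragments."""
    selected = size_select(digest(sequence), min_len, max_len)
    return sum(len(f) for f in selected) / len(sequence)


if __name__ == "__main__":
    # Hypothetical toy sequence with two HaeIII sites, not real bovine data.
    toy = "A" * 98 + "GGCC" + "T" * 96 + "GGCC" + "A" * 300
    fragments = digest(toy)
    print([len(f) for f in fragments])
    print(len(size_select(fragments)))
    print(round(represented_fraction(toy), 3))
```

Running the same functions over each chromosome of an assembly, once per candidate enzyme, would give the kind of fragment size distribution that lets one compare enzymes and pick a size window matching a target genome fraction.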