|Van Tassell, Curtis - Curt|
|Smith, Timothy - Tim|
Submitted to: Nature Methods
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/7/2007
Publication Date: 2/24/2008
Citation: Van Tassell, C.P., Smith, T.P., Matukumalli, L.K., Taylor, J.F., Schnabel, R.D., Lawley, C.T., Haudenschild, C., Moore, S.S., Warren, W.C., Sonstegard, T.S. 2008. SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nature Methods. 5:247-252.
Interpretive Summary: One of the complementary projects of the Bovine Genome Sequence Project was to produce SNP marker resources to enhance genetic analysis and improvement of cattle populations for economically important traits. Based on previous studies using 10,000 of the SNP markers from the genome project, it was predicted that a high-density SNP assay of more than 30,000 informative markers would be needed to implement genome-wide selection studies in dairy cattle. Our manuscript addresses this need by describing an economical, single-step method for SNP discovery and validation that uses “next generation” sequencing technologies to characterize reduced representation libraries (RRL) from specific target populations. This method provides sufficient coverage to generate genome-wide informative SNP resources with fairly accurate estimates of minor allele frequencies (MAF) from each source population. Furthermore, this approach can be practically applied to species for which only a low-coverage draft genome sequence is available. Our results from cattle show that a completely separate effort to determine MAF for large numbers of SNPs is not necessary before designing high-density genotyping assays for genome-wide association (GWA) studies.
Technical Abstract: Genome projects routinely produce draft sequences for species from diverse evolutionary clades, but generally do not create single nucleotide polymorphism (SNP) resources. We present an approach for de novo SNP discovery based on short-read sequencing of reduced representation libraries (RRL) to generate high-confidence SNP and simultaneously estimate minor allele frequencies (MAF). In silico digests of the bovine sequence assembly evaluated the size distribution, repetitive element content and genomic coordinates of fragments produced by various restriction enzymes. Subsequently, RRL representing ~2% of the genome were created by HaeIII digestion and size fractionation of DNA pools from three cattle populations. Each library was sequenced to an average 10-fold coverage of fragment ends, which identified >70,000 putative SNP. Genotyping validated 88% of a subset of 25,834 genome-wide SNP, and observed MAF was correlated 0.67 with sequence allele frequency. This approach and derivatives have utility for efficient generation of SNP resources for any species, regardless of the availability of a genome sequence.
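The in silico digestion, size fractionation and allele frequency steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a complete digest of a linear sequence at the HaeIII recognition site (GGCC, cut between GG and CC), and it estimates MAF at a site as the less frequent allele's share of the reads covering that site in a pooled library. All function names, the toy sequence, the size window and the read counts are hypothetical and chosen only for illustration.

```python
import re

HAEIII_SITE = "GGCC"  # HaeIII recognition sequence (palindromic)
CUT_OFFSET = 2        # blunt cut between GG and CC


def digest(sequence, site=HAEIII_SITE, cut_offset=CUT_OFFSET):
    """Return the fragments of a complete digest of a linear sequence."""
    sequence = sequence.upper()
    # A lookahead pattern reports every site start, including overlapping ones.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments inside a size window (the window is an assumption)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def represented_fraction(selected, genome_length):
    """Fraction of the input sequence covered by the selected fragments."""
    return sum(len(f) for f in selected) / genome_length


def minor_allele_frequency(allele_a_reads, allele_b_reads):
    """Estimate MAF from read counts of two alleles at one site."""
    total = allele_a_reads + allele_b_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return min(allele_a_reads, allele_b_reads) / total


if __name__ == "__main__":
    toy = "AAAGGCCTTTTTGGCCA"  # hypothetical sequence with two HaeIII sites
    fragments = digest(toy)
    selected = size_select(fragments, min_len=6, max_len=20)
    print(fragments)
    print(represented_fraction(selected, len(toy)))
    print(minor_allele_frequency(7, 3))  # hypothetical read counts
```

On a real assembly, the same digest would be run per chromosome and the size window tuned until the selected fragments cover the desired share of the genome (about 2% in the work summarized here). With roughly 10-fold coverage, read-count estimates of MAF are noisy, which is consistent with the moderate correlation reported between sequence-based and genotyped allele frequencies.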