Submitted to: BioMed Central (BMC) Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/14/2008
Publication Date: 12/4/2008
Citation: Wiedmann, R.T., Smith, T.P., Nonneman, D.J. 2008. SNP discovery in swine by reduced representation and high throughput pyrosequencing. BioMed Central (BMC) Genetics. 9:81.
Interpretive Summary: The availability of the pig genome sequence, a high density of markers, and cost-effective single nucleotide polymorphism (SNP) genotyping will allow genome-wide association studies in swine. A major limitation to the development of highly parallel genotyping assays for swine is a lack of suitable SNPs. This project was designed to identify large numbers of SNPs distributed throughout the pig genome that could be used for high-density genotyping. We used reduced representation sequencing to reduce the complexity of the genome and massively parallel second-generation sequencing to identify large numbers of high-confidence SNPs for high-density genotyping on a cost-effective platform. The library was composed of DNA pooled from twenty-six industry-relevant boars, which was digested to completion with a restriction enzyme; a representative portion of the individual genomes was then obtained by DNA fragment size selection. About five million sequence reads were collected and assembled into over 420,000 fragments with an average depth of about 8-fold. Putative SNPs were identified if each of two alleles appeared at least twice. The average depth of the assemblies containing SNPs was 12.5-fold. About 130,000 SNPs were found in over 51,000 unique fragments randomly distributed over the pig genome. This process was a highly effective way to identify a large number of pig SNPs to produce a high-density, cost-effective genotyping platform.
Technical Abstract: A reduced representation library (RRL) of porcine genomic fragments was used to identify SNPs from a pool of DNA isolated from 26 animals (52 chromosomes) relevant to current pork production. Treatment of the pooled DNA with a restriction enzyme, coupled with gel-based size selection of 450 base pair fragments, produced an RRL representing 4% of the swine genome (an estimated 300,000 unique genomic fragments). Approximately 5 million sequence reads representing the fragment ends were assembled into contigs having an overall observed depth of 7.65-fold coverage. Differences between the reference assembly and sequence from individual chromosomes in the DNA pool were identified as putative SNPs when alternate alleles were observed at least twice. The approximate minor allele frequency was estimated from the number of observations of the alternate alleles. The average coverage at the SNPs was 12.6-fold. This approach identified 130,499 SNPs in 51,852 contigs (one SNP in 849 bp of unique non-repetitive sequence). Comparison to the draft swine genome sequence indicated that 47,481 SNPs (36%) and 16,696 contigs (32%) mapped to a position on a sequenced pig chromosome, and the distribution was essentially random. We genotyped a sample of 176 putative SNPs and 168 (95.5%) were confirmed to have segregating alleles; the correlation of the observed minor allele frequency (MAF) to that predicted from the sequence data was 0.58. The process was a highly efficient means to identify a large number of porcine SNPs with a high validation rate, to be used in an ongoing international collaboration to produce a highly parallel genotyping assay for swine.
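The calling rule described in the abstract (an alternate allele must be observed at least twice, and the minor allele frequency is approximated from the observation counts) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `call_snp`, the input format (a string of bases observed at one contig position), and the decision to reject positions with more than two supported alleles are all assumptions made for illustration.

```python
from collections import Counter

# Threshold taken from the abstract: an allele counts only if seen at least twice.
MIN_ALLELE_OBSERVATIONS = 2


def call_snp(column_bases, min_obs=MIN_ALLELE_OBSERVATIONS):
    """Classify one alignment column as a putative SNP (hypothetical helper).

    column_bases: string of bases observed at a single contig position,
                  one character per read (assumed input format).
    Returns (major_allele, minor_allele, estimated_maf), or None when the
    position does not have exactly two alleles each seen >= min_obs times.
    """
    # Count only unambiguous bases; N and gap characters are ignored.
    counts = Counter(b for b in column_bases.upper() if b in "ACGT")

    # Keep alleles with enough supporting reads, most frequent first.
    supported = [(base, n) for base, n in counts.most_common() if n >= min_obs]

    # Assumption: only bi-allelic positions are reported.
    if len(supported) != 2:
        return None

    (major, major_n), (minor, minor_n) = supported
    # Approximate MAF from read observations of the two supported alleles.
    maf = minor_n / (major_n + minor_n)
    return major, minor, maf


if __name__ == "__main__":
    print(call_snp("AAAAAAGGG"))  # two alleles, each seen at least twice
    print(call_snp("AAAAAAAG"))   # alternate allele seen once: not called
```

Because the reads come from a pool of 52 chromosomes sampled at modest depth, a read-count MAF of this kind is only a rough estimate, which is consistent with the moderate correlation (0.58) reported between predicted and genotyped frequencies.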