
Title: Large-scale enrichment and discovery of gene-associated SNPs

Author
Buckler, Edward - Ed
GORE, MICHAEL - CORNELL UNIVERSITY
WRIGHT, MARK - CORNELL UNIVERSITY
ERSOZ, ELHAN - CORNELL UNIVERSITY
BOUFFARD, PASCAL - 454 CORP.
JARVIE, THOMAS - 454 CORP.
HURWITZ, BONNIE - COLD SPRING HARBOR LAB
NARECHANIA, APURVA - COLD SPRING HARBOR LAB
HARKINS, TIMOTHY - 454 CORP.
GRILLS, GEORGE - CORNELL UNIVERSITY
Ware, Doreen

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/23/2009
Publication Date: 7/10/2009
Citation: Buckler IV, E.S., Gore, M., Wright, M., Ersoz, E., Bouffard, P., Jarvie, T., Hurwitz, B., Narechania, A., Harkins, T., Grills, G., Ware, D. 2009. Large-scale enrichment and discovery of gene-associated SNPs. The Plant Genome. 2:121-133.

Interpretive Summary: Investigating the connections between genetic variation and phenotypic variation requires high-quality methods for scoring genetic variation. This research develops a cost-effective method that uses next-generation sequencing technology to identify thousands of genetic variants in maize. The approach combines genome filtration, next-generation sequencing, and new analysis approaches. We have also tested this approach on other species, and it appears to work for a wide range of grasses, including biofuel grasses.

Technical Abstract: With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated partial restriction (HMPR) technology to construct three gene-enriched HpaII libraries for maize inbred lines B73 (husk and root) and Mo17 (root), which were sequenced using Roche’s 454 Genome Sequencer FLX System. A custom bioinformatics pipeline was developed that dramatically reduced the number of false positive SNPs by identifying and preventing SNP calls from paralogous reference sequence alignments. With this implementation, 108,269 putative SNPs were identified between Mo17 and B73 at an estimated false discovery rate (FDR) of 11.9%. Restricting SNP calls to bases with 2X or greater coverage resulted in the identification of 68,960 putative SNPs at a 6.5% FDR. There was a high concordance (91%) between SNPs identified by our process and SNPs identified by Sanger sequencing of B73 and Mo17 amplicons. This approach has wide applicability to rapidly and reliably detect high quality gene-associated SNPs in large, complex plant genomes.
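
The two filters described in the abstract (rejecting SNP calls that come from paralogous reference alignments, and requiring 2X or greater coverage) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `CandidateSNP` record, its fields, and the rule that "paralogous" means a supporting read aligns to more than one reference locus are all assumptions made for illustration. The second function only multiplies a call count by an FDR to give the expected number of false calls, using the figures reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSNP:
    # All fields are hypothetical; the source does not describe a record layout.
    position: int        # coordinate on the reference sequence
    coverage: int        # number of reads covering this base
    reference_hits: int  # number of reference loci the supporting read aligns to


def filter_snps(candidates, min_coverage=2, max_reference_hits=1):
    """Keep calls with enough coverage whose reads align to a single locus.

    Assumption: a read that aligns to more than one reference locus is
    treated as a possible paralog and its SNP call is discarded.
    """
    return [
        c for c in candidates
        if c.coverage >= min_coverage and c.reference_hits <= max_reference_hits
    ]


def expected_false_calls(n_calls, fdr):
    """Expected number of false SNP calls at a given false discovery rate."""
    return n_calls * fdr


if __name__ == "__main__":
    candidates = [
        CandidateSNP(position=101, coverage=3, reference_hits=1),  # kept
        CandidateSNP(position=202, coverage=1, reference_hits=1),  # low coverage
        CandidateSNP(position=303, coverage=4, reference_hits=2),  # possible paralog
    ]
    kept = filter_snps(candidates)
    print([c.position for c in kept])

    # Figures taken from the abstract: call counts and estimated FDRs.
    print(round(expected_false_calls(108_269, 0.119)))
    print(round(expected_false_calls(68_960, 0.065)))
```

Under these assumptions, the stricter coverage threshold trades a smaller call set for a lower expected share of false calls, which matches the direction of the numbers reported in the abstract.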