Author
Buckler, Edward - Ed
GORE, MICHAEL - CORNELL UNIVERSITY
WRIGHT, MARK - CORNELL UNIVERSITY
ERSOZ, ELHAN - CORNELL UNIVERSITY
BOUFFARD, PASCAL - 454 CORP.
JARVIE, THOMAS - 454 CORP.
HURWITZ, BONNIE - COLD SPRING HARBOR LAB
NARECHANIA, APURVA - COLD SPRING HARBOR LAB
HARKINS, TIMOTHY - 454 CORP.
GRILLS, GEORGE - CORNELL UNIVERSITY
Ware, Doreen
Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/23/2009
Publication Date: 7/10/2009
Citation: Buckler IV, E.S., Gore, M., Wright, M., Ersoz, E., Bouffard, P., Jarvie, T., Hurwitz, B., Narechania, A., Harkins, T., Grills, G., Ware, D. 2009. Large-scale enrichment and discovery of gene-associated SNPs. The Plant Genome. 2:121-133.
Interpretive Summary: Investigating the connections between genetic variation and phenotypic variation requires high-quality methods for scoring genetic variation. This research explores and develops a cost-effective method for using next-generation sequencing technology to identify thousands of genetic variants in maize. The approach combines genome filtration, next-generation sequencing, and new analysis approaches. We have also tested this approach on other species, and it appears to work for a wide range of grasses, including biofuel grasses.
Technical Abstract: With the recent advent of massively parallel pyrosequencing by 454 Life Sciences, it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated partial restriction (HMPR) technology to construct three gene-enriched HpaII libraries for maize inbred lines B73 (husk and root) and Mo17 (root), which were sequenced using Roche’s 454 Genome Sequencer FLX System. A custom bioinformatics pipeline was developed that dramatically reduced the number of false positive SNPs by identifying and preventing SNP calls from paralogous reference sequence alignments. With this implementation, 108,269 putative SNPs were identified between Mo17 and B73 at an estimated false discovery rate (FDR) of 11.9%. Restricting SNP calls to bases with 2X or greater coverage resulted in the identification of 68,960 putative SNPs at a 6.5% FDR. There was a high concordance (91%) between SNPs identified by our process and SNPs identified by Sanger sequencing of B73 and Mo17 amplicons. This approach has wide applicability to rapidly and reliably detect high-quality gene-associated SNPs in large, complex plant genomes.
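The two filtering ideas named in the abstract (skipping SNP calls that come from paralogous alignments, and requiring 2X or greater coverage) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `SnpCall` record, its `alignment_hits` field as a stand-in for paralogy, and both function names are assumptions introduced only for illustration. The last helper simply multiplies a call count by an FDR to show roughly how many false positives the reported rates would imply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one putative SNP (not the paper's data model)."""
    contig: str
    position: int
    ref_base: str
    alt_base: str
    coverage: int        # number of reads covering this base
    alignment_hits: int  # reference loci the supporting reads align to (assumed proxy for paralogy)


def filter_snp_calls(calls, min_coverage=2, max_alignment_hits=1):
    """Keep calls with enough coverage whose reads align to a single reference locus."""
    return [
        call
        for call in calls
        if call.coverage >= min_coverage and call.alignment_hits <= max_alignment_hits
    ]


def expected_false_positives(n_calls, fdr):
    """Approximate number of false positives implied by a call count and an FDR."""
    return round(n_calls * fdr)


if __name__ == "__main__":
    calls = [
        SnpCall("contig1", 10, "A", "G", coverage=1, alignment_hits=1),  # dropped: 1X coverage
        SnpCall("contig1", 20, "C", "T", coverage=3, alignment_hits=1),  # kept
        SnpCall("contig2", 5, "G", "A", coverage=4, alignment_hits=2),   # dropped: multiple loci
    ]
    for call in filter_snp_calls(calls):
        print(call.contig, call.position, call.ref_base, call.alt_base)

    # Figures from the abstract: 108,269 calls at 11.9% FDR; 68,960 calls at 6.5% FDR.
    print(expected_false_positives(108_269, 0.119))
    print(expected_false_positives(68_960, 0.065))
```

Under these assumptions, raising the coverage threshold trades a smaller call set for a lower expected share of false positives, which is the pattern the abstract reports.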