
Title: SEAN: SNP PREDICTION AND DISPLAY PROGRAM UTILIZING EST SEQUENCE CLUSTERS

Author
HUNTLEY, DEREK - IMPERIAL COLLEGE
BALDO, ANGELA
JOHRI, SAURABH - IMPERIAL COLLEGE
SERGOT, MAREK - IMPERIAL COLLEGE

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/23/2005
Publication Date: 12/15/2005
Citation: Huntley, D., Baldo, A.M., Johri, S., Sergot, M. 2005. SEAN: SNP prediction and display program utilizing EST sequence clusters. Bioinformatics. doi:10.1093/bioinformatics/btk006.

Interpretive Summary: Expressed sequence tags (ESTs) are an important resource for identifying polymorphisms in transcribed regions. In humans, for example, polymorphisms are estimated to occur about once every 1.3 kb, but in crops with a narrow genetic base, such as cultivated tomato, they are much less frequent, as low as once every 7 kb. Computational methods are necessary to identify genomic regions likely to contain these polymorphisms. We developed a method that uses multiple sequence alignments and rules of single nucleotide polymorphism (SNP) abundance and sequence identity to predict SNPs from EST data. A unique feature of this tool is its ability to distinguish within- and between-cultivar and library polymorphisms. SNP predictions have been validated with public human and mouse data and by resequencing tomato cultivars conserved at the USDA-ARS PGRU germplasm repository in Geneva, New York. Overall efficiency of SNP discovery increased 10-fold relative to resequencing arbitrary regions of the tomato genome. This paper supports the NP301 vision statement: "Furnishing genetic and bioinformatic tools, genomic information, and genetic raw materials to enhance American agricultural productivity to ensure a high quality, safe supply of food, fiber, feed, ornamentals, and industrial products."
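
The within- versus between-cultivar distinction mentioned above can be illustrated with a minimal Python sketch. It is not SEAN's implementation: the function name `classify_polymorphism`, the `(cultivar, base)` input format, and the labels `cv1` and `cv2` are hypothetical, and the same logic would apply if reads were grouped by library instead of cultivar.

```python
from collections import defaultdict


def classify_polymorphism(observations):
    """Classify one alignment column from (cultivar, base) pairs.

    Hypothetical helper for illustration only; SEAN's actual rules are
    not reproduced here. Returns a dict with two booleans:
      within  - at least one cultivar shows two or more alleles
      between - at least two cultivars have different allele sets
    """
    alleles_by_cultivar = defaultdict(set)
    for cultivar, base in observations:
        alleles_by_cultivar[cultivar].add(base.upper())

    allele_sets = list(alleles_by_cultivar.values())
    within = any(len(alleles) > 1 for alleles in allele_sets)
    # Compare every cultivar's allele set with the first one seen.
    between = any(alleles != allele_sets[0] for alleles in allele_sets[1:])
    return {"within": within, "between": between}


if __name__ == "__main__":
    # Two reads from cv1 agree with each other; the cv2 read differs.
    column = [("cv1", "A"), ("cv1", "A"), ("cv2", "G")]
    print(classify_polymorphism(column))
```

A column where one cultivar carries both alleles would be flagged as `within`, which in EST data may reflect heterozygosity, paralogs, or sequencing error rather than a true varietal difference.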

Technical Abstract: SEAN is an application that predicts SNPs using multiple sequence alignments produced from EST clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.
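
As a rough illustration of how rules of SNP abundance and sequence identity might be applied to an EST alignment, the following Python sketch scans alignment columns for candidate SNPs. It is a sketch under stated assumptions, not SEAN's algorithm: the function `predict_snps`, its parameters (`min_minor_count`, `min_flank_identity`, `flank`), and their default values are hypothetical choices made for this example.

```python
from collections import Counter


def predict_snps(alignment, min_minor_count=2, min_flank_identity=0.9, flank=5):
    """Report candidate SNP columns in a multiple sequence alignment.

    alignment: list of equal-length strings, '-' marking gaps.
    Abundance rule (assumed): at least two alleles, each seen in
    min_minor_count or more reads.
    Identity rule (assumed): mean majority-base fraction over the
    flanking columns is at least min_flank_identity.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    def column_counts(i):
        # Count only unambiguous bases; gaps and N are ignored.
        return Counter(s[i].upper() for s in alignment if s[i].upper() in "ACGT")

    def column_identity(i):
        counts = column_counts(i)
        total = sum(counts.values())
        return max(counts.values()) / total if total else 1.0

    snps = []
    for i in range(length):
        alleles = [(b, n) for b, n in column_counts(i).most_common()
                   if n >= min_minor_count]
        if len(alleles) < 2:
            continue  # abundance rule not met
        neighbours = [j for j in range(max(0, i - flank), min(length, i + flank + 1))
                      if j != i]
        mean_identity = (sum(column_identity(j) for j in neighbours) / len(neighbours)
                         if neighbours else 1.0)
        if mean_identity >= min_flank_identity:
            snps.append({"position": i, "alleles": dict(alleles),
                         "flank_identity": mean_identity})
    return snps


if __name__ == "__main__":
    ests = ["ACGTACGTAC", "ACGTACGTAC", "ACGTGCGTAC", "ACGTGCGTAC", "ACGTACGTAT"]
    for snp in predict_snps(ests):
        print(snp)
```

Requiring each allele to appear in more than one read filters out isolated sequencing errors, while the flanking-identity check discards columns in poorly aligned or low-quality regions. Both thresholds would need tuning against validated SNPs for a given data set.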