Submitted to: Plant Physiology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/6/2005
Publication Date: 2/15/2005
Citation: Yao, H., Guo, L., Fu, Y., Wen, T., Borsuk, L.A., Skibbe, D.S., Cui, X., Scheffler, B.E., Cao, J., Ashlock, D.A., Schnable, P.S. 2005. Evaluation of seven ab initio gene prediction programs for the discovery of maize genes. Plant Physiology. 57(3): 445-460.
Interpretive Summary: Large DNA sequencing projects require that putative genes be identified among the millions of base pairs that make up an organism's genome. This characterization is the starting point for discovering and studying new genes. Predicting genes within long stretches of DNA is difficult. Several computer programs have been created for this task, but comparing them directly is also difficult because the programs are 'taught' using known genes from public databases. Proper testing therefore requires DNA sequences that were not used in the development of these programs. This paper evaluates seven such programs using sequences that were not used in their development.
Technical Abstract: Seven ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR, Grail, NetGene2 and SplicePredictor) were evaluated for their accuracy in predicting maize genes using nine maize genes (gl8a, pdc2, pdc3, rf2b, rf2c, rf2d, rf2e1, rth1, rth3). These genes could not have been included in the training sets of the seven prediction programs because their sequences were not released to the public prior to this analysis. On this data set, FGENESH was the most accurate program and GeneMark.hmm was the second most accurate. The seven programs were also used to identify and establish the structures of two genes in the maize a1-sh2 interval. The performance of FGENESH was further tested on a larger data set consisting of maize genomic survey sequences clustered from 1,815 methylation-filtration and 1,201 high Cot sequences using an EST-guided strategy. In this evaluation, FGENESH maintained high specificity relative to that obtained with the nine maize genes, but its sensitivity was reduced. This reduction may be due to the presence of partial exons in the genomic survey sequences and to the short lengths of these sequences.
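
The abstract reports sensitivity and specificity without defining them. As a minimal sketch, assuming the exon-level convention common in gene-prediction benchmarks (sensitivity is the fraction of annotated exons predicted exactly, specificity is the fraction of predicted exons that are correct), the two measures could be computed as follows. The function name and the coordinates are hypothetical illustrations, not data or code from the paper.

```python
def exon_level_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the exon level.

    Assumed convention (not stated in the abstract):
      sensitivity = exactly matched exons / annotated exons
      specificity = exactly matched exons / predicted exons
    Exons are (start, end) tuples; a match requires both ends to agree.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    matched = len(annotated_set & predicted_set)
    sensitivity = matched / len(annotated_set) if annotated_set else 0.0
    specificity = matched / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
annotated_exons = [(100, 250), (400, 520), (700, 910)]
predicted_exons = [(100, 250), (400, 520), (650, 910), (1200, 1300)]

sn, sp = exon_level_accuracy(annotated_exons, predicted_exons)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under this exact-match convention, an exon cut off at the end of a short survey sequence cannot be recovered with both boundaries correct, which is consistent with the abstract's suggestion that partial exons and short sequences lower sensitivity.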