Liang, Chengzhi
Mao, Long
Stein, Lincoln
Submitted to: Genome Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: September 3, 2009
Publication Date: October 19, 2009
Citation: Liang, C., Mao, L., Ware, D., Stein, L. 2009. Evidence-based gene predictions in plant genomes. Genome Research. 10(2):1912-1923.
Interpretive Summary: The sequence of a genome is the starting point, or blueprint, that describes the “parts” of an organism, including its genes. In this work we present improvements in computationally generating protein-coding gene structures by using information from expressed genes and proteins from the same species as well as from closely related organisms. The method builds upon existing open source software and allows researchers to combine evidence from different organisms to build gene structures in another. We present the performance of the software compared with the existing annotations in Arabidopsis and rice, as well as a preliminary analysis of a small region in maize.
Technical Abstract: Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed sequence tags (ESTs)—in the species of interest. This limitation is of particular concern for plant genomes, where the rate of genome sequencing is greatly outpacing the rate of EST- and cDNA-sequencing projects. To overcome this limitation, we have developed an evidence-based gene build system (the Gramene pipeline) that can use gene expression evidence across related species. Using the previously annotated plant genomes, the dicot Arabidopsis thaliana and the monocot Oryza sativa, we show that the cross-species ESTs from within monocot or dicot class are a valuable source of evidence for gene predictions. We compare the Gramene pipeline to several widely used gene prediction programs in rice; this comparison shows the pipeline performs favorably at both the gene and exon levels with cross-species gene products only. We discuss the results of testing the pipeline on a 22-Mb region of the newly sequenced maize genome and discuss potential application of the pipeline to other genomes.
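As a rough illustration of what an exon-level comparison against a reference annotation can look like, the sketch below scores predicted exons by exact coordinate match. It is not the evaluation code used in the paper: the function name `exon_level_accuracy`, the tuple layout, and the toy coordinates are all assumptions made for this example. It follows the common convention in the gene-prediction literature where "specificity" means the fraction of predicted exons that are correct.

```python
from typing import Iterable, Set, Tuple

# Hypothetical exon representation: (chromosome, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(
    predicted: Iterable[Exon], reference: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from exact-boundary exon matches.

    sensitivity = matched exons / reference exons
    specificity = matched exons / predicted exons
    """
    pred: Set[Exon] = set(predicted)
    ref: Set[Exon] = set(reference)
    matched = len(pred & ref)  # exons with identical coordinates and strand
    sensitivity = matched / len(ref) if ref else 0.0
    specificity = matched / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy data (invented): four reference exons, three predictions,
# one of which has a shifted start coordinate and so does not match.
reference = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 500, 600, "+"),
    ("chr1", 700, 800, "+"),
]
predicted = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 510, 600, "+"),
]

sn, sp = exon_level_accuracy(predicted, reference)
print(f"exon sensitivity={sn:.2f} specificity={sp:.2f}")
```

A gene-level score can be computed the same way by treating each gene as the full set of its exons and requiring every exon to match.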