Submitted to: Genomics
Publication Type: Peer reviewed journal
Publication Acceptance Date: 11/24/2005
Publication Date: 1/9/2006
Citation: Rep, M., Duyvesteijn, R.G., Gale, L., Usgaard, T.R., Cornelissen, B.J., Ma, L., Ward, T.J. 2006. The presence of GC-AG introns in N. crassa and other Euascomycetes determined from analysis of complete genomes: Implications for automated gene prediction. Genomics. 87:338-347.
Interpretive Summary: Numerous fungal genome sequencing projects have recently been completed or are currently underway. Information from these projects is expected to advance medical, agricultural and biotechnological research. However, the vast majority of gene models have not been experimentally confirmed, making accurate methods for automated gene prediction essential. Correct intron recognition and placement is one of the most critical problems for automated genome annotation. This problem is compounded by the presence of introns with non-standard splice sequences, which current software for automated gene prediction does not take into account. A combination of experimental and computational approaches was used to determine the frequency of introns with non-standard GC-AG splice sequences in the model fungus Neurospora crassa and in the genome of the plant pathogen Fusarium graminearum. Our results indicate that at least 1% of all introns in these fungi have non-standard GC-AG splice sites, which are incorrectly annotated in current gene models. This means that in each of the fungal genomes examined, as many as 200 genes are incorrectly annotated or missed altogether because GC-AG introns are not accounted for. To improve automated gene prediction, we developed a consensus search motif for computational prediction of fungal GC-AG introns. This motif was successfully applied to the N. crassa genome, indicating that incorporating this or a similar consensus motif into automated gene prediction software will substantially improve the quality and utility of fungal genome annotations.
Technical Abstract: A combination of experimental and computational approaches was employed to identify introns with non-canonical GC-AG splice sites (GC-AG introns) within euascomycete genomes. Evaluation of 2335 cDNA-confirmed introns from Neurospora crassa revealed 27 such introns (1.2%). A similar frequency (1.0%) of GC-AG introns was identified in Fusarium graminearum, where 3 of 292 cDNA-confirmed introns contained GC-AG splice sites. Computational analyses of the N. crassa genome using a GC-AG intron consensus sequence identified an additional 20 probable GC-AG introns in this fungus. For eight of the 47 GC-AG introns identified in N. crassa, a GC donor site is also present in a homolog from M. grisea, F. graminearum or A. nidulans. In most cases, however, homologs in these fungi contain a GT-AG intron or no intron at the corresponding position. These findings have important implications for fungal genome annotation, as the automated annotations of euascomycete genomes incorrectly identified intron boundaries for all of the confirmed and probable GC-AG introns reported here.
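Illustrative note: the frequencies quoted above follow directly from the reported counts, and the general idea of scanning a sequence for GC...AG candidates can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' method: the consensus motif used in the study is not given in this record, so the scanner below checks only the two terminal dinucleotides, and the function names and the length thresholds `min_len` and `max_len` are hypothetical choices made for illustration.

```python
# Counts reported in the abstract: (GC-AG introns, cDNA-confirmed introns).
CONFIRMED = {
    "Neurospora crassa": (27, 2335),
    "Fusarium graminearum": (3, 292),
}


def gc_ag_frequency(gc_ag, total):
    """Return the percentage of introns that have GC-AG splice sites."""
    return 100.0 * gc_ag / total


def find_gc_ag_candidates(seq, min_len=40, max_len=500):
    """List (start, end) spans that begin with GC and end with AG.

    Assumption: only the terminal dinucleotides are checked. The
    published consensus motif is not reproduced here, and the default
    length bounds are illustrative values, not figures from the study.
    Spans are 0-based and half-open, so seq[start:end] is the candidate.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GC":
            continue  # not a GC donor site
        first_end = start + min_len
        last_end = min(start + max_len, len(seq))
        for end in range(first_end, last_end + 1):
            if seq[end - 2:end] == "AG":  # AG acceptor site
                hits.append((start, end))
    return hits


if __name__ == "__main__":
    for species, (gc_ag, total) in CONFIRMED.items():
        print(f"{species}: {gc_ag_frequency(gc_ag, total):.1f}%")
    # Toy sequence with short bounds, purely to show the call shape.
    print(find_gc_ag_candidates("TTGCAAGTCCAGTT", min_len=6, max_len=20))
```

A scan this permissive would return far more false positives than true introns on a real genome, which is why a fuller consensus motif of the kind described in the abstract is needed to narrow the candidates.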