Location: Corn Insects and Crop Genetics Research
Title: Constructing Zea mays genes from RNA-Seq expression data using FINDER - a fully automated gene annotator
BANERJEE, SAGNIK - Iowa State University
BHANDARY, PRIYANKA - Iowa State University
Submitted to: Maize Annual Meetings
Publication Type: Abstract Only
Publication Acceptance Date: 3/6/2021
Publication Date: 3/8/2021
Citation: Banerjee, S., Bhandary, P., Woodhouse, M.H., Sen, T.Z., Wise, R.P., Andorf, C.M. 2021. Constructing Zea mays genes from RNA-Seq expression data using FINDER - a fully automated gene annotator. Maize Annual Meetings. 41.
Technical Abstract: Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of expression data. Transposable elements and sequence repeats in eukaryotic genomes add to this complexity, as do overlapping genes and genes that produce numerous transcripts. Currently available software annotates genomes by relying on full-length cDNA or on a database of splice junctions, which makes it susceptible to errors in the input. We present FINDER, which automates downloading of expression data from NCBI, optimizes read alignment, assembles transcripts, and performs gene prediction. FINDER maps reads with several different settings to capture all biologically relevant alignments, with special attention to micro-exons (exons shorter than 51 nucleotides). We configured FINDER to apply statistical changepoint detection to read coverage data, which led to the discovery of overlapping genes on the same strand and accurately redefined the boundaries of some overlapping genes on opposite strands. FINDER further reports transcripts and recognizes genes that are expressed under specific conditions. FINDER integrates prediction results from BRAKER2 with assemblies constructed from expression data to approach the goal of exhaustive genome annotation. FINDER accurately reconstructed 22,198 and 25,156 transcripts in Arabidopsis thaliana and Zea mays, respectively, about 4,000 more transcripts than BRAKER2, MAKER2, and PASA. FINDER also performed better within specific subgroups, such as transcripts with micro-exons and overlapping transcripts. The pipeline scores genes as high confidence or low confidence based on the available evidence. It can process eukaryotic genomes of all sizes and requires no manual supervision, which makes it well suited to bench researchers with limited experience in handling computational tools.
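The changepoint idea mentioned in the abstract can be illustrated with a minimal sketch. This is not FINDER's implementation, whose statistical method and parameters are not described here; it is a generic single-changepoint search that picks the split of a per-base coverage vector minimizing the within-segment squared error. The function name best_changepoint and the toy coverage values are hypothetical.

```python
def best_changepoint(coverage):
    """Find the split that best separates coverage into two constant-mean segments.

    Illustrative only (hypothetical helper, not part of FINDER).
    Returns (index, gain): coverage[:index] and coverage[index:] are the two
    segments, and gain is the reduction in squared error versus no split.
    index is None when no split improves on a single segment.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two coverage values")

    total = float(sum(coverage))
    total_sq = float(sum(x * x for x in coverage))
    # Squared error of modelling the whole region with one mean.
    full_sse = total_sq - total * total / n

    best_idx, best_sse = None, full_sse
    left_sum = left_sq = 0.0
    for i in range(1, n):
        x = coverage[i - 1]
        left_sum += x
        left_sq += x * x
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared errors of the left and right segments around their own means.
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if sse < best_sse:
            best_idx, best_sse = i, sse
    return best_idx, full_sse - best_sse


# Toy example: two adjacent regions expressed at different levels.
toy_coverage = [10] * 50 + [40] * 50
split, gain = best_changepoint(toy_coverage)
```

On the toy vector, the abrupt shift in mean coverage is the kind of signal that could indicate a boundary between two adjacent or overlapping genes on the same strand. A practical pipeline would additionally need a significance threshold on the gain and a way to handle multiple changepoints and noisy coverage.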