Submitted to: Insect Biochemistry and Molecular Biology
Publication Type: Peer reviewed journal
Publication Acceptance Date: 9/17/2007
Publication Date: 3/1/2008
Citation: Park, Y., Aikins, J., Wang, L.J., Beeman, R.W., Oppert, B.S., Lord, J.C., Brown, S.J., Lorenzen, M.D., Richards, S., Weinstock, G., Gibbs, R. 2008. Analysis of transcriptome data in the red flour beetle, Tribolium castaneum. Insect Biochemistry and Molecular Biology 38: 380-386.
Interpretive Summary: Although the genome sequence of the red flour beetle is now available, identifying the actual genes it contains (annotation) remains a challenging task. To help with genome annotation, we analyzed sequenced gene transcripts from several red flour beetle tissues. The more than 61,000 transcript sequences matched about 39% of the genes predicted by an automated gene prediction program. However, about 13% of the assembled transcript sequences fell outside any predicted gene, indicating that the automated program missed many real genes. Our analysis suggests that these transcripts represent at least 7,500 genes of the red flour beetle. These data demonstrate the value of high-throughput transcript sequencing for refining genome annotation. Knowledge of red flour beetle genes should enable improved methods of controlling this and other insect pests.
Technical Abstract: The genome sequence of Tribolium castaneum, a coleopteran pest of stored products, has recently been determined. To facilitate accurate annotation and detailed functional analysis of this genome, we compiled and analyzed all available expressed sequence tag (EST) data. The raw data consist of 61,228 EST sequences: 10,704 downloaded from NCBI and an additional 50,524 derived from 32,544 clones generated in our laboratories. These sequences came from six tissue- or stage-specific cDNA libraries, namely embryo, hindgut and Malpighian tubules, ovary, and head (from adult insects), and carcasses of mixed-stage whole larvae. The 61,228 sequences were assembled into 12,269 clusters (UniESTs), of which 10,134 mapped onto 6,463 (39%) of the 16,422 predicted genes that make up the GLEAN set. Another ~1,600 UniESTs (13% of the total) matched the genome closely but fell outside any GLEAN gene model, suggesting that the GLEAN prediction algorithm missed a considerable number of transcribed sequences. We conservatively estimate that the current EST set represents more than 7,500 transcription units.
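The core bookkeeping behind the GLEAN comparison is an overlap test. Each UniEST that aligns to the genome either overlaps one or more predicted gene models (counted toward the genes "hit") or overlaps none (a candidate missed gene). The abstract does not state the alignment or overlap criteria the authors used, so the following is only a minimal sketch under assumed conditions. It assumes each UniEST has already been reduced to a single genomic interval. The names `Interval`, `overlaps`, `classify`, and all coordinates are hypothetical illustrations, not data or code from the study. The final lines simply recompute the percentages quoted in the abstract from its own counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic span on one scaffold, using half-open coordinates [start, end)."""
    seqid: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base on the same scaffold."""
    return a.seqid == b.seqid and a.start < b.end and b.start < a.end


def classify(unigene_hits: dict[str, Interval], gene_models: dict[str, Interval]):
    """Split UniEST genome hits into those inside predicted genes and those outside.

    Returns the set of gene models touched by at least one UniEST, plus the list
    of UniESTs that overlap no gene model (candidates for missed genes).
    """
    genes_hit: set[str] = set()
    unmatched: list[str] = []
    for name, hit in unigene_hits.items():
        # Naive all-against-all scan; a genome-scale run would use an interval tree.
        matched = [g for g, model in gene_models.items() if overlaps(hit, model)]
        if matched:
            genes_hit.update(matched)
        else:
            unmatched.append(name)
    return genes_hit, unmatched


# Hypothetical toy coordinates, NOT data from the study.
GENES = {
    "g1": Interval("scaffold1", 100, 500),
    "g2": Interval("scaffold1", 1000, 2000),
    "g3": Interval("scaffold2", 0, 300),
}
HITS = {
    "u1": Interval("scaffold1", 450, 600),  # overlaps g1
    "u2": Interval("scaffold1", 700, 900),  # falls between g1 and g2
    "u3": Interval("scaffold2", 100, 200),  # inside g3
}

genes_hit, unmatched = classify(HITS, GENES)
print(sorted(genes_hit), unmatched)  # ['g1', 'g3'] ['u2']

# The percentages in the abstract follow from its own reported counts.
assert 10_704 + 50_524 == 61_228
print(f"{6_463 / 16_422:.0%} of GLEAN genes hit")        # 39% of GLEAN genes hit
print(f"{1_600 / 12_269:.0%} of UniESTs outside GLEAN")  # 13% of UniESTs outside GLEAN
```

In this toy run, `u2` plays the role of the ~1,600 UniESTs that matched the genome but no GLEAN model. Because the abstract gives that count only approximately, the 13% figure is approximate as well.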