Matthews, Benjamin - Ben
Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/11/2010
Publication Date: 7/15/2010
Publication URL: http://www.biomedcentral.com/1756-0500/3/183
Citation: Hosseini, P., Tremblay, A., Matthews, B.F., Alkharouf, N. 2010. An efficient annotation and gene expression derivation tool for Illumina Solexa datasets. Biomed Central (BMC) Genomics. 3:183.
Interpretive Summary: Soybean cyst nematode and soybean rust are two devastating pathogens that reduce soybean yields worldwide. We are determining gene expression patterns and discovering new genes in soybean infected with nematodes and with rust, using a new DNA sequencing platform developed by Solexa. This platform generates extremely large amounts of DNA sequence data, which can be overwhelming if computer software is not available to analyze it. The soybean data we are generating are not well supported by available software. We developed a computer software program that allows the user to determine the level of expression of soybean genes and to identify new soybean genes rapidly from this large amount of data. This software, TASE, also works with data from other organisms. It provides the user with an extremely fast means of calculating gene expression and annotates each gene with its function, if known. This software will be of use to researchers who are analyzing large amounts of sequence data produced by new DNA sequencing platforms.
Technical Abstract: Next-generation DNA sequencing platforms such as 454, Solexa and SOLiD provide unprecedented depth and coverage of the genome. Solexa sequencing produces well over a terabyte of images and gigabytes of short nucleotide sequences ranging from 40 to 120 nt. Translating the sequenced DNA into meaningful annotation for biological application is therefore of great importance. One can easily be overwhelmed by such a volume of textual, unannotated data, irrespective of read quality or size. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool designed specifically for Illumina sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag counts while annotating sequenced reads with the gene’s presumed function, from any given CASAVA build. TASE is a user-friendly and freely available application that allows rapid analysis and annotation of DNA sequence data.
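The general idea of deriving expression from tag counts can be illustrated with a minimal Java sketch. This is not TASE's code: the abstract describes a SQL Server backend, whereas this example works in memory, and the class name `TagCountSketch`, the methods `countTags` and `annotate`, the gene identifiers and the annotation strings are all hypothetical. It assumes each aligned read has already been assigned to a gene identifier.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and data are hypothetical, not taken from TASE.
final class TagCountSketch {

    // Count how many aligned reads (tags) were assigned to each gene identifier.
    static Map<String, Integer> countTags(List<String> alignedGeneIds) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String geneId : alignedGeneIds) {
            counts.merge(geneId, 1, Integer::sum);
        }
        return counts;
    }

    // Pair each tag count with the gene's presumed function, or "unknown" if none is on record.
    static Map<String, String> annotate(Map<String, Integer> counts,
                                        Map<String, String> annotations) {
        Map<String, String> report = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            String function = annotations.getOrDefault(entry.getKey(), "unknown");
            report.put(entry.getKey(), entry.getValue() + "\t" + function);
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical input: one gene identifier per aligned read.
        List<String> reads = List.of("geneA", "geneB", "geneA", "geneC", "geneA");
        Map<String, String> annotations = Map.of(
                "geneA", "hypothetical kinase",
                "geneB", "hypothetical transporter");
        annotate(countTags(reads), annotations)
                .forEach((gene, row) -> System.out.println(gene + "\t" + row));
    }
}
```

In a database-backed design such as the one the abstract describes, the counting step would presumably be expressed as a grouped query rather than an in-memory map; that detail is an assumption here.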