Submitted to: Mid-South Computational Biology and Bioinformatics Society Conference
Publication Type: Abstract Only
Publication Acceptance Date: March 11, 2011
Publication Date: N/A
Mapping millions of short DNA sequences to a reference genome is a necessary step in many experiments designed to investigate the expression of genes involved in disease resistance. The task is difficult, and several challenges often arise that result in a suboptimal mapping. The mapping process is even more complex in the presence of genetic material from multiple organisms. In such a metatranscriptomic study, it is important to infer the likely source of each sequence read and to map the reads accurately to the candidate genomes and genomic locations in the correct proportions. A metatranscriptomic analysis and sequence mapping pipeline is currently being developed that combines available short-read sequencing tools with in-house computational and probabilistic methods to perform this task. Sequences are initially mapped using the open-source software program Bowtie, and these initial mappings are used to compute probabilistic assignment scores for each sequence that maps to multiple genomes. These scores allow each sequence to be assigned to its most likely (“correct”) genome. As sequences are assigned, the mapping probabilities are updated. This method has been applied to an Illumina short-read data set to study disease resistance in Zea mays, and preliminary results are reported. Future work includes extending these methods to data sets that may contain sequences from more than two organisms.
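The abstract does not specify how the assignment scores are computed or updated, so the following is only a minimal illustrative sketch of one common way to do this kind of probabilistic assignment: an expectation-maximization style loop that alternates between splitting each ambiguous read across its candidate genomes and re-estimating the genome proportions. It is not the pipeline's actual method. The function name `assign_reads`, the input layout (a dictionary of per-read alignment weights, as might be derived from Bowtie output), the genome labels, and the toy reads are all hypothetical.

```python
def assign_reads(candidates, n_iter=50):
    """Assign multi-mapped reads to genomes (illustrative sketch only).

    candidates: dict mapping read id -> dict of {genome: positive weight},
    where each weight is an assumed per-alignment quality score.
    Returns (estimated genome proportions, best genome per read).
    """
    genomes = sorted({g for hits in candidates.values() for g in hits})
    # Start from uniform genome proportions.
    props = {g: 1.0 / len(genomes) for g in genomes}

    for _ in range(n_iter):
        totals = dict.fromkeys(genomes, 0.0)
        for hits in candidates.values():
            # Score each candidate genome by proportion times alignment weight.
            scores = {g: props[g] * w for g, w in hits.items()}
            norm = sum(scores.values())
            if norm == 0.0:
                continue  # read carries no usable evidence
            # Split the read fractionally across its candidate genomes.
            for g, s in scores.items():
                totals[g] += s / norm
        assigned = sum(totals.values())
        if assigned == 0.0:
            break
        # Update the proportions from the fractional assignments.
        props = {g: totals[g] / assigned for g in genomes}

    # Final call: the highest-scoring genome for each read.
    assignment = {
        read: max(hits, key=lambda g: props[g] * hits[g])
        for read, hits in candidates.items()
    }
    return props, assignment


# Hypothetical toy input: three uniquely mapped reads and one ambiguous read.
toy_reads = {
    "read1": {"zea_mays": 1.0},
    "read2": {"zea_mays": 1.0},
    "read3": {"other_organism": 1.0},
    "read4": {"zea_mays": 1.0, "other_organism": 1.0},
}
proportions, assignment = assign_reads(toy_reads)
print(proportions)
print(assignment["read4"])
```

In this toy input the ambiguous read is pulled toward the genome that the uniquely mapped reads already favor, which is the intuition behind updating the mapping probabilities as reads are assigned. Because the loop works over any number of candidate genomes, the same sketch would apply unchanged to data sets with more than two organisms.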