The objective of this research is to analyze data obtained with 454 sequencing technology (performed through an RSA with the University of Illinois Biotechnology Center (UIBC)) on several diverse nursery crops. These data will be used to develop molecular markers, to understand taxonomic relationships and evolutionary history, and to mine for viruses of concern in ornamental plants.

Sequencing data will be obtained from the UIBC 454-Titanium platform for several grass species (for comparative genomics) and for crapemyrtle, elm, redbud, hackberry, and ash (for marker development or mining for genes related to stress tolerance). In addition, data from at least one taxon with a high likelihood of harboring viruses of concern (such as beautyberry) and from a plant pathogenic fungus (Rhizoctonia) may be included. Students, postdocs, and/or faculty at the Cooperator's facility who have experience and expertise in bioinformatics and comparative genomics will help analyze these data and compare them with sequence data acquired from other sources (GenBank, other sequencing projects).

Cercis chinensis is a species of redbud that is becoming increasingly popular in the landscape. An understanding of its genome and the identification of molecular markers will assist in breeding new cultivars. This project has two objectives:
1) Assemble a partial genome of Cercis chinensis using data from a 454 DNA sequencer; and
2) Identify molecular markers, i.e., single-nucleotide polymorphisms (SNPs), from this partial, preliminary genome.

DNA sequence reads from the Cercis chinensis genome were obtained from the 454 DNA sequencer and assembled independently with two assemblers, ABySS and CAP3. An Entrez search of the NCBI databases for Cercis chinensis returned no genome records, three Protein database records, and 15 Nucleotide database records; it also returned no SNPs for Cercis chinensis. A BLAST analysis of the CAP3 contigs against a database of dissimilar species returned hits with apparent identities. However, a BLAST analysis against a database of very similar species returned no similarity. RepeatMasker identified about 1,603 retroelements, 1,578 LTR elements, and 335 DNA transposons in the assembled sequence.
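The report does not state how SNPs would be called from the assembly. One common approach is to align reads back to the assembled contigs and flag positions where a second base appears in a substantial fraction of the reads. The Python sketch below illustrates only that idea under stated assumptions: the input format (a dictionary mapping contig position to the bases observed in aligned reads), the function name `candidate_snps`, and the `min_depth` and `min_minor_frac` thresholds are all hypothetical, not values from this project. Real 454 data would also need filtering for homopolymer-associated errors.

```python
from collections import Counter


def candidate_snps(pileup, min_depth=4, min_minor_frac=0.2):
    """Flag candidate SNP positions in a contig pileup (hypothetical input format).

    pileup: dict mapping contig position -> list of bases (A/C/G/T) seen in
            reads aligned at that position.
    Returns a list of (position, major_base, minor_base, minor_fraction).
    """
    snps = []
    for pos in sorted(pileup):
        # Keep only unambiguous nucleotide calls.
        bases = [b.upper() for b in pileup[pos] if b.upper() in "ACGT"]
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to judge this position
        counts = Counter(bases).most_common()
        if len(counts) < 2:
            continue  # monomorphic position
        (major, _), (minor, minor_count) = counts[0], counts[1]
        frac = minor_count / depth
        if frac >= min_minor_frac:
            snps.append((pos, major, minor, round(frac, 2)))
    return snps


example = {
    10: list("AAAAGGG"),     # 3 of 7 reads show G
    11: list("CCCCCC"),      # monomorphic
    12: list("TTA"),         # below min_depth
    13: list("GGGGGGGGGT"),  # minor base below the threshold
}
print(candidate_snps(example))  # [(10, 'A', 'G', 0.43)]
```

Only position 10 passes both the depth and minor-allele filters in this toy example; the thresholds trade sensitivity against false positives from sequencing error.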

BLAST analysis was repeated against the current version of the GenBank database, on the presumption that researchers worldwide had since deposited additional entries. There were a few matches, for example, malate dehydrogenase (Arabidopsis thaliana; bit score 165; E-value e-49) and NAD kinase 2 (Arabidopsis thaliana; bit score 236; E-value e-71). The malate dehydrogenase match falls in the conserved domain of the LDH-MDH-like superfamily, with 80 of 92 amino acids identical (87%) and no gaps in the alignment.
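Percent identity is the number of identical aligned residues divided by the alignment length (80/92 ≈ 87% for the malate dehydrogenase match). The sketch below computes it from a pair of gapped alignment rows, and a separate helper applies the standard relation between bit score S′ and expected value, E ≈ m·n·2^(−S′), where m and n are the (effective) query and database lengths. The helper names are hypothetical and the code is illustrative only; BLAST computes these values internally using effective lengths, so this sketch is not expected to reproduce the E-values reported here.

```python
def alignment_identity(query_aln: str, subject_aln: str):
    """Count identical columns, alignment length, and gap columns for two gapped rows."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("alignment rows must have equal length")
    identical = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query_aln, subject_aln) if q == "-" or s == "-")
    return identical, len(query_aln), gaps


def percent_identity(identical: int, length: int) -> float:
    """Percent of alignment columns that are identical."""
    return 100.0 * identical / length


def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: E ~ m * n * 2^(-S'), with m, n the (effective) lengths."""
    return query_len * db_len * 2.0 ** (-bit_score)


print(alignment_identity("MKV-L", "MKVAI"))  # (3, 5, 1)
print(round(percent_identity(80, 92)))       # 87
```

Because E scales with the search space m·n, the same bit score yields a larger E-value against a larger database, which is why repeating a search against an updated GenBank can shift reported E-values.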

Although the genome assemblies did not produce a single contig (i.e., gaps remained, corresponding to unsequenced and presumably bridging sequences), MUMmer was used to attempt alignments to whole and partial genome sequences in GenBank. Again, none of these alignments showed any overlap, illustrating the limits of partial sequence analysis. Because the data do not yield a single contig, the options for analyzing these assemblies are limited. In other genome projects using different DNA sequencing strategies and methodologies, other assembly programs have proven successful. It is therefore possible that the 454-Titanium sequence data could be assembled more completely with other assembly software.
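MUMmer aligns sequences by first finding long exact matches between them (maximal unique matches, located with a suffix tree) and then clustering and extending those anchors. The sketch below is a simplified stand-in, not MUMmer's algorithm: it uses a k-mer index to find maximal exact matches of at least `min_len` bases and does not enforce uniqueness. The function name and the `min_len` default are hypothetical. In this scheme, a contig that shares no exact match of at least `min_len` bases with a reference yields no anchors and therefore no alignment.

```python
def maximal_exact_matches(ref: str, qry: str, min_len: int = 20):
    """Return sorted (ref_start, qry_start, length) maximal exact matches >= min_len.

    Seed-and-extend with a k-mer index (k = min_len); a simplified stand-in for
    MUMmer's suffix-tree search that does not require matches to be unique.
    """
    k = min_len
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)

    matches = set()
    for j in range(len(qry) - k + 1):
        for i in index.get(qry[j:j + k], []):
            # Skip seeds that could extend left: they lie inside a longer match.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend right until a mismatch or the end of either sequence.
            length = k
            while (i + length < len(ref) and j + length < len(qry)
                   and ref[i + length] == qry[j + length]):
                length += 1
            matches.add((i, j, length))
    return sorted(matches)


ref = "GGGGACGTACGTCCCC"
qry = "TTACGTACGTAA"
print(maximal_exact_matches(ref, qry, min_len=6))  # [(4, 2, 8)]
```

Raising `min_len` reduces spurious short anchors between unrelated sequences but also drops true matches in divergent regions, a trade-off that matters when comparing a fragmented assembly against distantly related genomes.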

Last Modified: 4/20/2014
