Location: Floral and Nursery Plants Research Unit
2012 Annual Report
1) Assemble the partial genome of Cercis chinensis using data from a 454 DNA sequencer; and
2) Identify molecular markers, i.e., single-nucleotide polymorphisms (SNPs), from this preliminary partial genome.
DNA sequence reads from the Cercis chinensis genome (454 DNA sequencer) were obtained and assembled independently with two sequence assemblers, ABySS and CAP3. An Entrez search of the NCBI website for Cercis chinensis revealed no records for the genome, three records in the Protein database, and 15 records in the Nucleotide database. The database search also revealed no SNPs for Cercis chinensis. A BLAST analysis of the CAP3-assembled contigs against a database of dissimilar species returned hits with apparent identities. However, a BLAST analysis against a database of very similar species did not return any similarity. RepeatMasker indicated that the genome contains about 1,603 retroelements, 1,578 LTR elements, and 335 DNA transposons.
The BLAST analysis was repeated with the current, updated version of the GenBank database, on the presumption that researchers worldwide had contributed additional entries. There were a few matches, for example to malate dehydrogenase (Arabidopsis thaliana; bit score 165; e-49) and NAD kinase 2 (Arabidopsis thaliana; bit score 236; e-71). The former matches the conserved domain of the LDH-MDH-like superfamily, with 80/92 (87%) positions matching at the amino acid level and no gaps in the alignment.
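The screening of BLAST matches described above can be sketched in a few lines of Python. This is only an illustration, not the workflow used in this project: it assumes the hits were saved in the default BLAST+ tabular layout (-outfmt 6), and the function names and threshold values are hypothetical choices made for the example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def percent_identity(matches, aligned_length):
    """Matching positions as a percentage of the alignment length."""
    return 100.0 * matches / aligned_length


def filter_hits(tabular_text, max_evalue=1e-10, min_bitscore=100.0):
    """Return hits (as dicts) passing both thresholds.

    The default thresholds are assumptions for illustration only.
    """
    kept = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue and hit["bitscore"] >= min_bitscore:
            kept.append(hit)
    return kept


# The 80/92 match reported for malate dehydrogenase rounds to 87%.
print(round(percent_identity(80, 92)))
```

Filtering on both E-value and bit score keeps only the stronger matches, such as the two Arabidopsis thaliana hits named above, and discards weak chance alignments.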
Although the genome assemblies did not yield a single contig, i.e., gaps remained that correspond to unsequenced and presumably bridging sequences, MUMmer was used to attempt alignments to whole and partial genome sequences in GenBank. Again, none of these alignments provided overlap, indicating the limits of partial sequence analysis. Because the data do not yield a single contig, there are limited options for analyzing these assemblies. In other genome assemblies that used different DNA sequencing strategies and methodologies, the application of other assembly programs has proven successful. It is possible that the results from these sequence data, generated on the 454-Titanium platform, may be improved by applying other assembly software.
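If other assembly software is tried, one simple way to compare the resulting assemblies is by contig count and N50. The Python sketch below assumes each assembler writes its contigs as FASTA text. The function names are hypothetical, and N50 is a common contiguity summary rather than a measure used in this report.

```python
def read_contig_lengths(fasta_text):
    """Return the length of each sequence in FASTA-formatted text."""
    lengths = []
    current = 0
    in_record = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if in_record:
                lengths.append(current)
            current = 0
            in_record = True
        elif line and in_record:
            current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def summarize(fasta_text):
    """Contig count, total bases, and N50 for one assembly."""
    lengths = read_contig_lengths(fasta_text)
    return {"contigs": len(lengths), "total": sum(lengths), "n50": n50(lengths)}
```

Running summarize on the output of each assembler gives a quick side-by-side view of how fragmented each assembly is. A higher N50 with fewer contigs suggests a more contiguous result, but it does not by itself show that the assembly is correct.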