2013 Annual Report
1a. Objectives (from AD-416):
To create and maintain a curated and integrated web-based relational database of cacao genetic and genomic data.
1b. Approach (from AD-416):
The cacao genome sequencing project will generate a large amount of sequence data, physical map data, and single-nucleotide polymorphism (SNP) data. These data will be produced by USDA and other scientists and will require a website for the deposition, curation, manipulation and distribution of the data.
The cacao genome database will contain comprehensive data of the genetically anchored cacao physical map, annotated expressed sequence tag (EST) databases of cacao, cacao maps and markers, all publicly available cacao sequences, and the raw and assembled output of the ongoing genome sequencing project. Annotations of ESTs and genomic sequence will include contig assembly, putative function, simple sequence repeats, open reading frames (ORFs), Gene Ontology terms, and, where applicable, position anchored to the cacao physical map. The integrated map viewer will provide a graphical interface to the genetic, transcriptome and physical mapping information. New cacao map data will be added to CMap, a web-based tool that allows users to view comparisons of genetic and physical maps. ESTs, bacterial artificial chromosomes (BACs) and markers will be searchable by various categories, and the search result pages will be linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated cacao sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm, search their sequences for microsatellites using the SSR server, or assemble their ESTs using the CAP3 server.
This project is related to the in-house objective: The development and implementation of an international Marker-Assisted Selection (MAS) program for cacao is the major objective of this project. This objective involves a combination of hypothesis-driven and non-hypothesis-driven research and includes the training of scientists from cacao-producing countries in plant breeding, genetics, and the use of molecular markers in a MAS program.
The objective of this agreement is to create and maintain a curated and integrated web-based relational database of cacao genetic and genomic data. The cacao genome database is complete and contains comprehensive data of the genetically anchored cacao physical map, annotated Expressed Sequence Tags (EST) databases of cacao, cacao maps and markers, all publicly available cacao sequences and the raw and assembled output of the ongoing genome sequencing project.
On September 10, 2010, the cacao genome website (www.cacaogenomedb.org) was publicly launched with a draft version of the cacao genome (version 0.9) as well as physical and genetic recombination maps, ESTs, and genetic marker data. At that time, users had to enter the website through a Public Intellectual Property Resource for Agriculture (PIPRA) portal and register on the site before they could access the data. This restriction was intended to prevent publication of the data by anyone other than the members of our group who had sequenced the genome.
In November 2011, the improved assembly of the cacao genome (version 1.0) was added to the website, along with new tools such as a simple sequence repeat (SSR) finder.
In July 2012, the cacaogenomedb.org website was migrated to a USDA-ARS server, where long-term maintenance of the website continues.
In June 2013, the paper describing the sequencing of the cacao genome was published in Genome Biology. The WSU collaborators were co-authors and contributed to the writing of the paper. In addition, the annotated version 1.0 genome was deposited in Phytozome as the reference genome for cacao. The PIPRA portal was removed from the website. A publication about the creation of the website by the WSU collaborators is in preparation.