Location: Subtropical Horticulture Research
2013 Annual Report
1a. Objectives (from AD-416):
The objectives are to: 1) identify genetic variations (single nucleotide polymorphisms, or SNPs) in 20 diverse cacao germplasms plus Matina 1-6 for the development of SNP markers; 2) generate transcript profiles for seven different stages during cacao floral development; 3) build a set of cacao reference sequences from data generated in this cooperative agreement and from other publicly available DNA sequences; and 4) generate and host web and FTP sites for data analysis and distribution to project participants, the cacaogenomedb website, and to the scientific public when analyses are complete.
1b. Approach (from AD-416):
RNA will be isolated from the leaves of 20 diverse cacao varieties (plus Matina 1-6) by USDA-ARS SHRS staff. NCGR will generate 21 Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will have more than 4 million 90 bp sequence reads, for a total of more than 7.5 gigabases of mRNA sequence. The 454-generated and publicly available cacao ESTs will be assembled to generate a reference sequence against which the Illumina sequence data will be aligned. A database of the cacao Illumina data will be generated and made web-accessible to project collaborators and ultimately to the scientific public. This database will enable the discovery of SNPs and expression profiling analysis. A full SNP report will be generated. This report will identify:
• Reference: the reference sequence against which the variant was detected.
• Variant: an internal unique identifier for the variant.
• Class: S=SNP, I=insertion, D=deletion (relative to the reference).
• Position: the base pair at which the variant occurs.
• RefAllele: the allele for the reference sequence.
• VarAllele: the allele for the variant reported.
• Context: sub-sequence of the reference around the variant position.
• BothStrands: whether the variant has been reported in reads from both strands.
• AvgQual: the average quality of the bases in which the variant was reported.
• MaxQual: the maximum quality of all bases in which the variant was reported.
• NumReadsWithAllele: the total number of reads that exhibited the variant allele.
• UniqAlns: the number of reads exhibiting the allele that came from unique alignments.
• FreqDiff: the absolute value of the difference in frequency between varieties.
Results from this analysis will be used to generate SNP panels for use in USDA-ARS cacao breeding programs.
RNA isolated from seven cacao floral developmental timepoints will be provided by USDA-ARS SHRS staff. NCGR will generate seven Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will have more than 4 million 90 bp sequence reads, for a total of more than 2.5 gigabases of mRNA sequence. Sequence data for expression profiling analysis will be made available through the above-mentioned database to project participants, the cacaogenomedb website, and ultimately to the scientific public.
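The SNP report fields described in the approach can be sketched as a simple record type. This is only an illustrative sketch, not the report format actually produced by NCGR: the Python field names, the types, the `VariantClass` enum, the `freq_diff` helper, and all example values are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class VariantClass(Enum):
    """Variant class codes as listed in the report description."""
    SNP = "S"
    INSERTION = "I"
    DELETION = "D"


@dataclass(frozen=True)
class VariantRecord:
    """One row of the SNP report (field types are assumed, not documented)."""
    reference: str               # reference sequence the variant was detected against
    variant: str                 # internal unique identifier for the variant
    variant_class: VariantClass  # S, I, or D relative to the reference
    position: int                # base pair at which the variant occurs
    ref_allele: str              # allele in the reference sequence
    var_allele: str              # allele reported for the variant
    context: str                 # reference sub-sequence around the position
    both_strands: bool           # seen in reads from both strands?
    avg_qual: float              # average quality of bases showing the variant
    max_qual: int                # maximum quality of bases showing the variant
    num_reads_with_allele: int   # total reads exhibiting the variant allele
    uniq_alns: int               # reads with the allele from unique alignments
    freq_diff: float             # |frequency difference| between varieties


def freq_diff(freq_a: float, freq_b: float) -> float:
    """Absolute difference in variant-allele frequency between two varieties."""
    return abs(freq_a - freq_b)


# Hypothetical example row; every value below is made up for illustration.
example = VariantRecord(
    reference="contig_0001",
    variant="var_000001",
    variant_class=VariantClass("S"),
    position=1042,
    ref_allele="A",
    var_allele="G",
    context="TTGCAAGGCTA",
    both_strands=True,
    avg_qual=31.5,
    max_qual=40,
    num_reads_with_allele=18,
    uniq_alns=15,
    freq_diff=freq_diff(0.25, 0.75),
)
```

A typed record like this makes it straightforward to filter candidate markers, for example by keeping only variants that were seen on both strands and that show a large frequency difference between varieties.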
3. Progress Report:
This research is directly related to in-house objective 1: Identify, map, and characterize host-plant resistance genes for priority cacao diseases and insect pests, and develop genetic markers for those genes. USDA-ARS and NCGR collaborated on a project to develop single nucleotide polymorphism (SNP) markers for Theobroma cacao. USDA-ARS isolated RNA from cacao leaves and flowers and sent it to NCGR. NCGR prepared cDNA libraries from the RNA and sequenced them on both a Roche 454 platform, to create a reference transcriptome, and the Illumina GAII platform, to generate enough sequencing reads to call SNPs. The initial variant report identified ~450,000 SNPs, which were eventually used to design an Illumina SNP chip containing 6,000 SNP assays. The chip was used to genotype three cacao mapping populations and generate a saturated genetic recombination map for cacao. These maps were then used to aid the assembly of the cacao genome sequence, which was first released on the cacaogenomedb.org website in September 2010 and published in Genome Biology in June 2013.