Start Date: May 13, 2009
End Date: Jun 30, 2013
RNA will be isolated by USDA-ARS SHRS staff from the leaves of 20 diverse cacao varieties plus Matina1-6 (21 in total). NCGR will generate 21 Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will yield more than 4 million 90-bp sequence reads, for a total of more than 7.5 gigabases of mRNA sequence. The 454-generated and publicly available cacao ESTs will be assembled into a reference sequence, against which the Illumina reads will be aligned. A database of the cacao Illumina data will be built and made web-accessible to project collaborators and, ultimately, to the scientific public. This database will enable SNP discovery and expression profiling analysis. A full SNP report will be generated, listing the following fields for each variant:
• Reference: the reference sequence against which the variant was detected.
• Variant: an internal unique identifier for the variant.
• Class: S = SNP, I = insertion, D = deletion (relative to the reference).
• Position: the base pair at which the variant occurs.
• RefAllele: the allele in the reference sequence.
• VarAllele: the allele of the reported variant.
• Context: the sub-sequence of the reference surrounding the variant position.
• BothStrands: whether the variant was reported in reads from both strands.
• AvgQual: the average quality of the bases at which the variant was reported.
• MaxQual: the maximum quality of all bases at which the variant was reported.
• NumReadsWithAllele: the total number of reads exhibiting the variant allele.
• UniqAlns: the number of reads exhibiting the variant allele that came from unique alignments.
• FreqDiff: the absolute difference in variant allele frequency between varieties.

Results from this analysis will be used to develop SNP panels for the USDA-ARS cacao breeding programs.

RNA isolated from cacao flowers at seven floral developmental timepoints will be provided by USDA-ARS SHRS staff.
NCGR will generate seven Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will yield more than 4 million 90-bp sequence reads, for a total of more than 2.5 gigabases of mRNA sequence. Sequence data for expression profiling analysis will be made available through the above-mentioned database to project participants, through the cacaogenomedb website, and, ultimately, to the scientific public.
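The sequence totals quoted for the leaf and floral experiments follow from multiplying the number of libraries by the minimum reads per library and the read length. Because each library is expected to exceed 4 million reads, these figures are lower bounds:

```latex
\begin{aligned}
\text{Leaf libraries:}\quad & 21 \times (4\times10^{6}\ \text{reads}) \times 90\ \text{bp} = 7.56\times10^{9}\ \text{bp} \approx 7.6\ \text{Gb} \\
\text{Floral libraries:}\quad & 7 \times (4\times10^{6}\ \text{reads}) \times 90\ \text{bp} = 2.52\times10^{9}\ \text{bp} \approx 2.5\ \text{Gb}
\end{aligned}
```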
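To make the SNP report fields concrete, the following Python sketch models one report row and computes a FreqDiff value. It is illustrative only: the snake_case field names are hypothetical renderings of the report columns, and the report does not say how allele frequency is defined or how more than two varieties are compared. The sketch assumes that frequency is the share of reads covering the position that carry the variant allele, and that FreqDiff compares one pair of varieties. All identifiers and counts in the usage example are made up.

```python
from dataclasses import dataclass

# Hypothetical model of one SNP report row; field names are illustrative
# snake_case versions of the report columns, not a published schema.
VALID_CLASSES = {"S": "snp", "I": "insertion", "D": "deletion"}


@dataclass
class SnpReportRow:
    reference: str              # Reference: sequence the variant was called against
    variant: str                # Variant: internal unique identifier
    var_class: str              # Class: "S", "I" or "D" relative to the reference
    position: int               # Position: base pair on the reference
    ref_allele: str             # RefAllele
    var_allele: str             # VarAllele
    context: str                # Context: reference sub-sequence around the variant
    both_strands: bool          # BothStrands: seen in reads from both strands?
    avg_qual: float             # AvgQual: mean base quality where the variant was seen
    max_qual: int               # MaxQual: highest base quality where the variant was seen
    num_reads_with_allele: int  # NumReadsWithAllele
    uniq_alns: int              # UniqAlns: supporting reads from unique alignments
    freq_diff: float            # FreqDiff: absolute frequency difference between varieties

    def __post_init__(self) -> None:
        # Reject class codes outside the three the report defines.
        if self.var_class not in VALID_CLASSES:
            raise ValueError(f"unknown variant class: {self.var_class!r}")
        # Uniquely aligned supporting reads are a subset of all supporting reads.
        if self.uniq_alns > self.num_reads_with_allele:
            raise ValueError("uniq_alns cannot exceed num_reads_with_allele")


def allele_frequency(reads_with_allele: int, reads_covering: int) -> float:
    """Assumed definition: share of covering reads that carry the variant allele."""
    if reads_covering == 0:
        return 0.0
    return reads_with_allele / reads_covering


def freq_diff(variety_a: tuple[int, int], variety_b: tuple[int, int]) -> float:
    """Absolute allele-frequency difference between two varieties.

    Each argument is (reads_with_allele, reads_covering) for one variety.
    """
    return abs(allele_frequency(*variety_a) - allele_frequency(*variety_b))


# Usage with made-up counts: 15 of 20 reads vs. 5 of 20 reads carry the allele.
row = SnpReportRow(
    reference="contig_0001",
    variant="var_000001",
    var_class="S",
    position=412,
    ref_allele="A",
    var_allele="G",
    context="TTGCAAGCT",
    both_strands=True,
    avg_qual=31.5,
    max_qual=38,
    num_reads_with_allele=20,
    uniq_alns=18,
    freq_diff=freq_diff((15, 20), (5, 20)),
)
print(row.freq_diff)  # 0.5
```

A dataclass keeps each report column as a typed, named attribute, and the checks in `__post_init__` catch rows that contradict the report's own definitions (an unknown Class code, or more uniquely aligned reads than supporting reads).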