2010 Annual Report
1a. Objectives (from AD-416)
The objectives are to:
1) identify genetic variations (single nucleotide polymorphisms, or SNPs) in 20 diverse cacao germplasms plus Matina 1-6 for the development of SNP markers;
2) generate transcript profiles for seven different stages during cacao floral development;
3) build a set of cacao reference sequences from data generated in this cooperative agreement and from other publicly available DNA sequences; and
4) generate and host web and FTP sites for data analysis and distribution to project participants, the cacaogenomedb website, and to the scientific public when analyses are complete.
1b. Approach (from AD-416)
RNA will be isolated from the leaves of 20 diverse cacao varieties (plus Matina 1-6) by USDA-ARS SHRS staff. NCGR will generate 21 Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will yield more than 4 million 90 bp sequence reads, for a total of more than 7.5 gigabases of mRNA sequence. The 454-generated and publicly available cacao ESTs will be assembled to generate a reference sequence against which the Illumina sequence data will be aligned. A database of the cacao Illumina data will be generated and made web-accessible to project collaborators and ultimately to the scientific public. This database will enable the discovery of SNPs and expression profiling analysis. A full SNP report will be generated. This report will identify:
• Reference: the reference sequence against which the variant was detected.
• Variant: an internal unique identifier for the variant.
• Class: S=snp, I=insertion, D=deletion (relative to the reference).
• Position: the base pair at which the variant occurs.
• RefAllele: the allele for the reference sequence.
• VarAllele: the allele for the variant reported.
• Context: sub-sequence of the reference around the variant position.
• BothStrands: has the variant been reported in reads from both strands?
• AvgQual: the average quality of the bases in which the variant was reported.
• MaxQual: the maximum quality of all bases in which the variant was reported.
• NumReadsWithAllele: how many reads total exhibited the variant allele.
• UniqAlns: how many reads exhibiting the allele were from unique alignments.
• FreqDiff: the absolute value of the difference in frequency between varieties.
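The report fields listed above map naturally onto a simple record type. The following is a minimal sketch in Python, not the project's actual report format: the field types, the `VariantRecord` name, and both helper functions are assumptions made for illustration, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantRecord:
    """One row of the SNP report; field names follow the list above.

    The Python types are assumptions; the report does not specify them.
    """
    reference: str              # reference sequence the variant was detected against
    variant: str                # internal unique identifier for the variant
    var_class: str              # "S" = SNP, "I" = insertion, "D" = deletion
    position: int               # base pair at which the variant occurs
    ref_allele: str             # allele in the reference sequence
    var_allele: str             # allele reported for the variant
    context: str                # reference sub-sequence around the position
    both_strands: bool          # seen in reads from both strands?
    avg_qual: float             # average quality of the supporting bases
    max_qual: float             # maximum quality of the supporting bases
    num_reads_with_allele: int  # total reads exhibiting the variant allele
    uniq_alns: int              # supporting reads from unique alignments
    freq_diff: float            # |frequency difference| between varieties


def freq_diff(freq_a: float, freq_b: float) -> float:
    """Absolute difference in variant-allele frequency between two varieties."""
    return abs(freq_a - freq_b)


def is_snp_on_both_strands(rec: VariantRecord) -> bool:
    """Hypothetical pre-filter: keep only SNPs supported on both strands."""
    return rec.var_class == "S" and rec.both_strands


# Invented example values, for illustration only.
example = VariantRecord(
    reference="contig_0001", variant="var_000001", var_class="S",
    position=152, ref_allele="A", var_allele="G", context="ACGTAGCTA",
    both_strands=True, avg_qual=30.0, max_qual=38.0,
    num_reads_with_allele=12, uniq_alns=10,
    freq_diff=freq_diff(0.25, 0.75),
)
```

A filter such as `is_snp_on_both_strands` is only one plausible screening step; the criteria actually used to select SNPs for a panel are not stated in the list above.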
Results from this analysis will be used to generate SNP panels for use in USDA-ARS cacao breeding programs. RNA isolated at seven cacao floral developmental timepoints will be provided by USDA-ARS SHRS staff. NCGR will generate seven Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will yield more than 4 million 90 bp sequence reads, for a total of more than 2.5 gigabases of mRNA sequence. Sequence data for expression profiling analysis will be made available through the above-mentioned database to project participants, the cacaogenomedb website, and ultimately to the scientific public.
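The sequence-yield totals quoted in the approach follow directly from the per-library minimums (more than 4 million reads of 90 bp each). A short check in Python is sketched below; the constant and function names are illustrative assumptions, not part of the project's tooling.

```python
READ_LENGTH_BP = 90                # read length stated in the approach
MIN_READS_PER_LIBRARY = 4_000_000  # lower bound on reads per library


def min_yield_gigabases(num_libraries: int) -> float:
    """Lower bound on total sequence yield, in gigabases (1e9 bases)."""
    total_bases = num_libraries * MIN_READS_PER_LIBRARY * READ_LENGTH_BP
    return total_bases / 1e9


leaf_yield = min_yield_gigabases(21)   # 21 leaf libraries (20 varieties + Matina 1-6)
floral_yield = min_yield_gigabases(7)  # 7 floral-development libraries

print(f"Leaf libraries:   > {leaf_yield:.2f} Gb")
print(f"Floral libraries: > {floral_yield:.2f} Gb")
```

Both lower bounds (7.56 Gb and 2.52 Gb) are consistent with the "more than 7.5" and "more than 2.5" gigabase totals given in the text.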
This research relates to the in-house objective: The development and implementation of an international Marker Assisted Selection (MAS) program for cacao is the major objective of this project. This objective involves a combination of hypothesis-driven and non-hypothesis-driven research and includes the training of scientists from cacao-producing countries in plant breeding, genetics, and the use of molecular markers in a MAS program.
High-quality leaf RNA was isolated from 16 genetically diverse cacao accessions, including Matina 1-6, by USDA-ARS SHRS staff. Matina 1-6 leaf RNA was sequenced on the 454 Titanium platform to provide the transcriptome reference sequence. All other leaf RNAs were sequenced on the Illumina platform, and the shorter reads were aligned to the reference sequence with GSNAP for SNP identification. Approximately 275,000 variants were identified; filtering for SNPs appropriate to the Illumina Infinium chip reduced these to ~100,000 SNPs. After further quality filtering, 6,000 SNPs were chosen for the Infinium chip, sufficient for genotyping. Expression data for the ten timepoints of the self-incompatibility time course have been filtered and are currently under analysis.
Monitoring Activities: Project management has been accomplished through regular conference calls, e-mails and two meetings per year.