SNP DISCOVERY BY AUTOMATED SEQUENCING OF RNA
Subtropical Horticulture Research
2012 Annual Report
1a. Objectives (from AD-416):
The objectives are to:
1) identify genetic variations (single nucleotide polymorphisms, or SNPs) in 20 diverse cacao germplasms plus Matina 1-6 for the development of SNP markers;
2) generate transcript profiles for seven different stages during cacao floral development;
3) build a set of cacao reference sequences from data generated in this cooperative agreement and from other publicly available DNA sequences; and
4) generate and host web and FTP sites for data analysis and distribution to project participants, the cacaogenomedb website, and to the scientific public when analyses are complete.
1b. Approach (from AD-416):
RNA will be isolated from the leaves of 20 diverse cacao varieties (plus Matina 1-6) by USDA-ARS SHRS staff. NCGR will generate 21 Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will yield more than 4 million 90-bp sequence reads, for a total of more than 7.5 gigabases of mRNA sequence. The 454-generated and publicly available cacao ESTs will be assembled to generate a reference sequence against which the Illumina sequence data will be aligned. A database of the cacao Illumina data will be generated and made web-accessible to project collaborators and ultimately to the scientific public. This database will enable SNP discovery and expression profiling analysis. A full SNP report will be generated. This report will identify:
• Reference: the reference sequence against which the variant was detected.
• Variant: an internal unique identifier for the variant.
• Class: S=snp, I=insertion, D=deletion (relative to the reference).
• Position: the base pair at which the variant occurs.
• RefAllele: the allele for the reference sequence.
• VarAllele: the allele for the variant reported.
• Context: sub-sequence of the reference around the variant position.
• BothStrands: has the variant been reported in reads from both strands?
• AvgQual: the average quality of the bases in which the variant was reported.
• MaxQual: the maximum quality of all bases in which the variant was reported.
• NumReadsWithAllele: how many reads total exhibited the variant allele.
• UniqAlns: how many reads exhibiting the allele were from unique alignments.
• FreqDiff: the absolute value of the difference in frequency between varieties.
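As an illustration only, one row of the SNP report described above could be represented as follows. This is a minimal sketch, not the format NCGR produces: the tab-delimited layout, the column order, the Y/N encoding of BothStrands, and the names `VariantRecord`, `parse_record`, and `freq_diff` are all assumptions made for the example, and the sample row is invented.

```python
from dataclasses import dataclass

# Class codes as listed in the report description.
VARIANT_CLASSES = {"S": "snp", "I": "insertion", "D": "deletion"}


@dataclass(frozen=True)
class VariantRecord:
    """One row of the SNP report (hypothetical in-memory layout)."""
    reference: str               # Reference
    variant: str                 # Variant (internal unique identifier)
    var_class: str               # Class: "S", "I", or "D"
    position: int                # Position (base pair)
    ref_allele: str              # RefAllele
    var_allele: str              # VarAllele
    context: str                 # Context
    both_strands: bool           # BothStrands
    avg_qual: float              # AvgQual
    max_qual: int                # MaxQual
    num_reads_with_allele: int   # NumReadsWithAllele
    uniq_alns: int               # UniqAlns
    freq_diff: float             # FreqDiff


def parse_record(line: str) -> VariantRecord:
    """Parse one tab-delimited row (assumed column order, see above)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 13:
        raise ValueError(f"expected 13 columns, got {len(fields)}")
    (ref, var, cls, pos, ref_a, var_a, ctx,
     both, avg_q, max_q, n_reads, uniq, fdiff) = fields
    if cls not in VARIANT_CLASSES:
        raise ValueError(f"unknown variant class: {cls!r}")
    return VariantRecord(
        reference=ref,
        variant=var,
        var_class=cls,
        position=int(pos),
        ref_allele=ref_a,
        var_allele=var_a,
        context=ctx,
        both_strands=(both == "Y"),  # assumed Y/N encoding
        avg_qual=float(avg_q),
        max_qual=int(max_q),
        num_reads_with_allele=int(n_reads),
        uniq_alns=int(uniq),
        freq_diff=float(fdiff),
    )


def freq_diff(freq_a: float, freq_b: float) -> float:
    """Absolute difference in variant-allele frequency between two varieties."""
    return abs(freq_a - freq_b)


# Invented example row, for illustration only.
sample = "contig_1\tvar_1\tS\t120\tA\tG\tCCTAGTT\tY\t31.5\t38\t12\t10\t0.5"
record = parse_record(sample)
```

Keeping BothStrands, the quality fields, and UniqAlns as typed values would make it straightforward to filter candidate markers (for example, variants seen on both strands with a minimum number of unique alignments) before building a SNP panel; any such thresholds would be project decisions and are not specified here.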
Results from this analysis will be used to generate SNP panels for use in USDA-ARS cacao breeding programs. RNA isolated from seven cacao floral developmental timepoints will be provided by USDA-ARS SHRS staff. NCGR will generate seven Illumina mRNA-Seq libraries and sequence each library to a depth of one flowcell channel. Each library will yield more than 4 million 90-bp sequence reads, for a total of more than 2.5 gigabases of mRNA sequence. Sequence data for expression profiling analysis will be made available through the above-mentioned database to project participants, the cacaogenomedb website, and ultimately to the scientific public.
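The sequence totals quoted in the approach follow directly from the per-library minimums. The short check below reproduces that arithmetic; the function name `minimum_yield_bases` is hypothetical, and the inputs are the lower bounds stated in the text (library count, 4 million reads per library, 90-bp reads), not measured yields.

```python
def minimum_yield_bases(libraries: int, reads_per_library: int, read_length_bp: int) -> int:
    """Lower bound on total sequenced bases for a set of libraries."""
    return libraries * reads_per_library * read_length_bp


# Leaf libraries: 20 diverse varieties plus Matina 1-6.
leaf_bases = minimum_yield_bases(21, 4_000_000, 90)

# Floral development libraries: seven timepoints.
floral_bases = minimum_yield_bases(7, 4_000_000, 90)

print(f"leaf:   {leaf_bases / 1e9:.2f} gigabases")
print(f"floral: {floral_bases / 1e9:.2f} gigabases")
```

The results, 7.56 and 2.52 gigabases, are consistent with the stated totals of more than 7.5 and more than 2.5 gigabases.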
This project is related to the in-house objective: The development and implementation of an international Marker Assisted Selection (MAS) program for cacao is the major objective of this project. This objective involves a combination of hypothesis-driven and non-hypothesis-driven research and includes the training of scientists from cacao-producing countries in plant breeding, genetics, and the use of molecular markers in a MAS program.
The single nucleotide polymorphism (SNP) data were analyzed to generate haplotypes for all parents of the cacao mapping populations. Using these haplotypes in quantitative trait loci (QTL) analyses and in the analysis of qualitative traits improved the identification of candidate genes.