Location: Subtropical Horticulture Research
Project Number: 6038-21000-022-03-S
Project Type: Non-Assistance Cooperative Agreement
Start Date: Jul 1, 2012
End Date: Dec 31, 2015
1. Develop a reference sequence of expressed genes in leaf and flower of various tropical and subtropical fruit trees including mango (Mangifera indica) and longan (Dimocarpus longan).
2. Identify genetic variations (single nucleotide polymorphisms, or SNPs) in expressed genes from genetically diverse cultivars of these tree species for the development of SNP markers.
3. Generate and host web-based visualization of data and FTP sites for data analysis and distribution to project participants and to the scientific public when analyses are complete and a manuscript describing the work has been accepted.
USDA-ARS SHRS staff will isolate RNA from the leaves, flowers, and developing fruit of a reference cultivar of each tree species, usually the most important commercial cultivar. The collaborator will generate transcriptome (mRNA-Seq) libraries using protocols developed in-house. The reference cultivar library will be normalized, sequenced on the 454 platform, and assembled to provide a reference transcriptome dataset. Individual read lengths are expected to average >400 nt, and assembled transcript sequences are expected to range up to ~12 kb. RNA from the other genetically diverse cultivars of that tree species will be sequenced on the Illumina platform using the most recent mRNA-Seq protocol improvements and read lengths of up to 120 nt. Sequence reads will be mapped to the reference transcriptome, and a variant report will be generated that describes the positions and statistical significance of mismatches. From these analyses, single nucleotide polymorphisms and insertions/deletions (indels) will be identified, and a full SNP report will be generated. This report will identify:
• Reference: the reference sequence against which the variant was detected.
• Variant: an internal unique identifier for the variant.
• Class: S = SNP, I = insertion, D = deletion (relative to the reference).
• Position: the base-pair position at which the variant occurs.
• RefAllele: the allele in the reference sequence.
• VarAllele: the allele for the variant reported.
• Context: the sub-sequence of the reference around the variant position.
• BothStrands: whether the variant has been reported in reads from both strands.
• AvgQual: the average quality of the bases in which the variant was reported.
• MaxQual: the maximum quality of all bases in which the variant was reported.
• NumReadsWithAllele: the total number of reads that exhibited the variant allele.
• UniqAlns: the number of reads exhibiting the allele that came from unique alignments.
• FreqDiff: the absolute value of the difference in allele frequency between varieties.
Results from this analysis will be used to generate single nucleotide polymorphism (SNP) panels for use in USDA-ARS breeding programs for these tree species, for the generation of genetic maps when mapping populations are available, and for the evaluation of genetic diversity in the germplasm collection of these tree species.
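As an illustration only, the SNP report fields listed above could be represented and screened roughly as follows. This is a minimal sketch, not the project's actual pipeline: the class name `VariantRecord`, the helper functions, and every threshold value are hypothetical and were chosen purely for demonstration; the project description does not specify any filtering criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VariantRecord:
    """One row of the SNP report; field names mirror the report columns."""
    reference: str               # Reference: sequence the variant was detected against
    variant: str                 # Variant: internal unique identifier
    var_class: str               # Class: "S" (SNP), "I" (insertion), "D" (deletion)
    position: int                # Position: base pair at which the variant occurs
    ref_allele: str              # RefAllele
    var_allele: str              # VarAllele
    context: str                 # Context: reference sub-sequence around the position
    both_strands: bool           # BothStrands: seen in reads from both strands?
    avg_qual: float              # AvgQual
    max_qual: int                # MaxQual
    num_reads_with_allele: int   # NumReadsWithAllele
    uniq_alns: int               # UniqAlns
    freq_diff: float             # FreqDiff


def freq_diff(freq_a: float, freq_b: float) -> float:
    """Absolute difference in variant-allele frequency between two varieties."""
    return abs(freq_a - freq_b)


def select_candidate_snps(
    records: Iterable[VariantRecord],
    min_avg_qual: float = 20.0,   # hypothetical threshold
    min_uniq_alns: int = 4,       # hypothetical threshold
    min_freq_diff: float = 0.5,   # hypothetical threshold
) -> List[VariantRecord]:
    """Keep SNP-class variants seen on both strands that pass all thresholds."""
    return [
        r for r in records
        if r.var_class == "S"
        and r.both_strands
        and r.avg_qual >= min_avg_qual
        and r.uniq_alns >= min_uniq_alns
        and r.freq_diff >= min_freq_diff
    ]
```

In a sketch like this, requiring support from both strands and from uniquely aligned reads is one common way to reduce false positives before assembling a marker panel; the appropriate cutoffs would depend on sequencing depth and the cultivars compared.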