Location: Subtropical Horticulture Research
Project Number: 6038-21000-022-03
Start Date: Jul 01, 2012
End Date: Dec 31, 2015
RNA will be isolated by USDA-ARS SHRS staff from the leaves, flowers, and developing fruit of a reference cultivar of the tree species, which is usually the most important commercial cultivar. The collaborator will generate transcriptome (mRNA-Seq) libraries using protocols developed in-house. The reference cultivar library will be normalized, sequenced on the 454 platform, and assembled to provide a reference transcriptome dataset. Individual reads are expected to average >400 nt in length, and assembled transcript sequences are expected to range up to ~12 kb. RNA from the other, genetically diverse cultivars of that tree species will be sequenced on the Illumina platform using the most recent mRNA-Seq protocol improvements, with read lengths of up to 120 nt.

Sequence reads will be mapped to the reference transcriptome, and a variant report will be generated that describes the positions and statistical significance of mismatches. From these analyses, single nucleotide polymorphisms (SNPs) and indels will be identified, and a full SNP report will be generated. This report will include the following fields:
• Reference: the reference sequence against which the variant was detected.
• Variant: an internal unique identifier for the variant.
• Class: S = SNP, I = insertion, D = deletion (relative to the reference).
• Position: the base position at which the variant occurs.
• RefAllele: the allele in the reference sequence.
• VarAllele: the allele for the reported variant.
• Context: the sub-sequence of the reference around the variant position.
• BothStrands: whether the variant was observed in reads from both strands.
• AvgQual: the average quality of the bases in which the variant was reported.
• MaxQual: the maximum quality of all bases in which the variant was reported.
• NumReadsWithAllele: the total number of reads exhibiting the variant allele.
• UniqAlns: the number of reads exhibiting the allele that came from unique alignments.
• FreqDiff: the absolute value of the difference in frequency between varieties.
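The report fields above can be derived from the reads that carry each variant allele. The following is a minimal Python sketch, not the collaborator's pipeline. It assumes a hypothetical per-read input (`ReadObservation`), 1-based positions, a fixed flank length for Context, and that FreqDiff is the absolute difference between variant-allele frequencies supplied for two varieties. `ReadObservation`, `VariantRecord`, and `summarize_variant` are illustrative names only.

```python
from dataclasses import dataclass
from statistics import mean

VALID_CLASSES = {"S", "I", "D"}  # S = SNP, I = insertion, D = deletion


@dataclass
class ReadObservation:
    """One aligned read carrying the variant allele (hypothetical input type)."""
    base_qual: int   # Phred quality of the variant base in this read
    strand: str      # "+" or "-"
    unique: bool     # True if the read aligned to only one location


@dataclass
class VariantRecord:
    """One row of the SNP report; fields mirror the report columns."""
    reference: str
    variant: str
    var_class: str
    position: int
    ref_allele: str
    var_allele: str
    context: str
    both_strands: bool
    avg_qual: float
    max_qual: int
    num_reads_with_allele: int
    uniq_alns: int
    freq_diff: float


def summarize_variant(reference, variant_id, var_class, position, ref_seq,
                      ref_allele, var_allele, observations,
                      freq_a, freq_b, flank=10):
    """Build a report row from the reads that show the variant.

    Assumptions: `position` is 1-based, and `freq_a`/`freq_b` are the
    variant-allele frequencies already measured in two varieties.
    """
    if var_class not in VALID_CLASSES:
        raise ValueError(f"unknown variant class: {var_class}")
    if not observations:
        raise ValueError("at least one read must carry the variant allele")
    quals = [obs.base_qual for obs in observations]
    # Context: up to `flank` reference bases on each side of the variant.
    start = max(0, position - 1 - flank)
    context = ref_seq[start:position + flank]
    return VariantRecord(
        reference=reference,
        variant=variant_id,
        var_class=var_class,
        position=position,
        ref_allele=ref_allele,
        var_allele=var_allele,
        context=context,
        # True only if reads from both strands support the allele.
        both_strands={obs.strand for obs in observations} >= {"+", "-"},
        avg_qual=float(mean(quals)),
        max_qual=max(quals),
        num_reads_with_allele=len(observations),
        uniq_alns=sum(obs.unique for obs in observations),
        freq_diff=abs(freq_a - freq_b),
    )
```

For example, three reads with qualities 30, 40, and 20 on mixed strands, two of them uniquely aligned, yield AvgQual 30.0, MaxQual 40, NumReadsWithAllele 3, UniqAlns 2, and BothStrands true. Keeping the summary as a plain dataclass makes it straightforward to write rows out as a tab-delimited report.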
Results from this analysis will be used to generate single nucleotide polymorphism (SNP) panels for use in USDA-ARS breeding programs for these tree species, to generate genetic maps when mapping populations are available, and to evaluate the genetic diversity of the germplasm collections of these tree species.
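One way the SNP report could feed panel design is a simple screen on its quality columns. The Python sketch below is illustrative only: the thresholds (minimum AvgQual, UniqAlns, and FreqDiff) are hypothetical values rather than project criteria, ranking by FreqDiff is just one possible choice, and `ReportRow` and `select_panel_candidates` are assumed names.

```python
from dataclasses import dataclass


@dataclass
class ReportRow:
    """Subset of SNP-report columns used for screening (hypothetical type)."""
    variant: str
    var_class: str      # "S", "I" or "D"
    both_strands: bool
    avg_qual: float
    uniq_alns: int
    freq_diff: float


def select_panel_candidates(rows, min_avg_qual=20.0, min_uniq_alns=5,
                            min_freq_diff=0.3, max_markers=None):
    """Keep well-supported SNPs, most differentiating first.

    All default thresholds are illustrative assumptions.
    """
    passing = [
        r for r in rows
        if r.var_class == "S"            # SNPs only; indels excluded
        and r.both_strands               # seen on both strands
        and r.avg_qual >= min_avg_qual
        and r.uniq_alns >= min_uniq_alns
        and r.freq_diff >= min_freq_diff
    ]
    # Rank by FreqDiff so markers that best separate varieties come first.
    passing.sort(key=lambda r: r.freq_diff, reverse=True)
    return passing if max_markers is None else passing[:max_markers]
```

In practice the cutoffs would be tuned to the sequencing depth and the genotyping platform chosen for the panel.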