Location: Fruit and Nut Research
Title: Mining and comparison of haplotype-based expressed sequence tag single nucleotide polymorphisms among citrus cultivars
Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer reviewed journal
Publication Acceptance Date: 10/22/2013
Publication Date: 11/1/2013
Citation: Chen, C., Gmitter, F.J. 2013. Mining and comparison of haplotype-based expressed sequence tag single nucleotide polymorphisms among citrus cultivars. Biomed Central (BMC) Genomics. www.biomedcentral.com/1471-2164/14/746.
Interpretive Summary: Conventional hybridization breeding is hindered by a lack of genetic knowledge of most traits. Many available genomic sequences and resources have not been fully exploited, which in turn has limited our understanding of trait genetics and slowed breeding progress. Single nucleotide polymorphisms (SNPs), the most abundant polymorphisms in a genome, have been mined from gene-derived sequences, called expressed sequence tags (ESTs), and compared among citrus cultivars to select the most useful sets of SNPs. Some of these gene-derived SNPs, likely associated with certain traits, can be exploited in large-scale genetic studies and thereby enhance the selection efficiency of trait-targeted breeding.
Technical Abstract: In this paper, haplotype-based SNPs were mined from publicly available citrus expressed sequence tags (ESTs) of different citrus cultivars (genotypes), individually and collectively, for comparison. A total of 567,297 ESTs belonged to 27 cultivars in varying numbers, consequently yielding different numbers of haplotype-based quality SNPs. These so-called quality SNPs had both high SNP and allele confidence scores as defined by the QualitySNP mining algorithm, and were distinguished from potential SNPs, which represent all nucleotide discrepancies among ESTs in a contig. Sweet orange (SO) had the most ESTs (213,830), generating 11,182 quality SNPs in 3,327 of 4,228 usable contigs. Summed over all the individual mining results, a total of 25,417 quality SNPs were discovered: 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletion events (indels). On average, there were 2.4 SNPs per contig and one SNP every 1,064 bp across all the SNP-containing contig sequences. The vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentage of 2-haplotype contigs varied widely among these citrus cultivars. The variation might largely result from the ESTs under each citrus type being generated from different cultivar lines, which was supported by the fact that substantially more SNPs and >2 haplotypes were discovered in contigs assembled from ESTs combined across cultivars than in the sum of the separately mined cultivars. BLAST of the 25,417 25-mer SNP oligos against the Clementine reference genome scaffolds revealed that 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit and alignment, 1,571 had one hit with 2+ alignments, and 956 had 2+ hits with 1+ alignments per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds.
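The transition, transversion, and indel breakdown reported above can be illustrated with a minimal Python sketch. It is not the QualitySNP algorithm and not the authors' pipeline; the function name `classify_snp` and the use of "-" to mark a gap allele are assumptions made only for this illustration. The counts are the ones given in the abstract.

```python
# Illustrative sketch only: classify a biallelic variant by its two alleles.
# `classify_snp` and the "-" gap symbol are hypothetical conventions,
# not part of QualitySNP or the published workflow.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES | {"-"}


def classify_snp(allele_a: str, allele_b: str) -> str:
    """Return 'transition', 'transversion', or 'indel' for two alleles."""
    a, b = allele_a.upper(), allele_b.upper()
    if a not in VALID or b not in VALID:
        raise ValueError("alleles must be one of A, C, G, T, or '-'")
    if a == b:
        raise ValueError("alleles must differ")
    if "-" in (a, b):
        return "indel"
    pair = {a, b}
    # Purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    # Any purine<->pyrimidine change (A/C, G/T, C/G, A/T) is a transversion.
    return "transversion"


# Counts as reported in the abstract.
counts = {"transition": 15010, "transversion": 9114, "indel": 1293}
total = sum(counts.values())  # 25417, the reported number of quality SNPs
shares = {kind: 100 * n / total for kind, n in counts.items()}

if __name__ == "__main__":
    for kind, share in shares.items():
        print(f"{kind}: {counts[kind]} of {total} ({share:.2f}%)")
```

Percentages recomputed this way are rounded independently per class, so the last digit may differ slightly from the published figures.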
Most alignments had 100% (25/25) or 96% (24/25) nucleotide identity, together accounting for 93% of all the alignments. Because almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, these alignments served as in silico validation of the SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. The distribution of uniquely and/or multiply aligned SNPs in each main scaffold was determined and could facilitate selection of core sets of SNPs for different genotyping applications. Development of an appropriate high-throughput SNP genotyping platform using these high-quality, haplotype-based, well-characterized, and double-validated SNPs is underway.
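The in silico validation idea described above (a 24/25 alignment whose single mismatch falls on the SNP site) can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure: it compares two ungapped strings of equal length rather than parsing BLAST output, and the position of the SNP within the 25-mer (`snp_index`, here the central base) is an assumption, since the abstract does not state it.

```python
# Illustrative sketch only: ungapped comparison of a 25-mer SNP oligo with the
# aligned reference segment. `summarize_alignment` and `snp_index` are
# hypothetical names; the SNP's position in the oligo is an assumption.
from typing import List, Tuple


def mismatch_positions(oligo: str, target: str) -> List[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(oligo) != len(target):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(oligo.upper(), target.upper()))
            if x != y]


def summarize_alignment(oligo: str, target: str,
                        snp_index: int) -> Tuple[float, bool]:
    """Return (percent identity, True if the only mismatch is the SNP site)."""
    mismatches = mismatch_positions(oligo, target)
    identity = 100.0 * (len(oligo) - len(mismatches)) / len(oligo)
    return identity, mismatches == [snp_index]


if __name__ == "__main__":
    oligo = "ACGT" * 6 + "A"                   # made-up 25-mer
    target = oligo[:12] + "G" + oligo[13:]     # differs only at index 12
    identity, at_snp = summarize_alignment(oligo, target, snp_index=12)
    print(f"identity = {identity:.0f}%, mismatch at SNP site: {at_snp}")
```

An alignment with 24 of 25 matching bases gives 96% identity, which corresponds to the 24/25 class in the abstract; the boolean flag indicates whether that one discrepancy is the SNP itself.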