Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: June 15, 2007
Publication Date: June 15, 2007
Citation: Crane, C.F. 2007. Patterned sequence in the transcriptome of vascular plants. Biomed Central (BMC) Genomics [abstract]. Paper No. 173.
Interpretive Summary: Microsatellites are direct tandem repeats of a short series of bases (a motif), one to six bases long, in the DNA of an organism. Minisatellites are the same kind of repeat, but with motifs longer than six bases. Both are useful as genetic markers because they are frequently polymorphic and their length variants can be detected after polymerase chain reaction (PCR). However, software tools for recognizing minisatellites in published DNA sequence are less developed than those for microsatellites. This paper reports and statistically analyzes the output of custom-written software that was used to identify a nonredundant set of micro- and minisatellite loci in 6.7 million publicly available expressed gene DNA sequences drawn from 88 genera of plants. The motifs were one to 250 bases long, and the analysis was replicated with zero, one, or two allowed deviations from perfect repetition within the locus. Frequency, repeat count, and length polymorphism were compared among genera by distance measures, rankings, and the reconstruction of phylogenetic trees. The results show that microsatellites evolve faster and more reversibly than the genera themselves. This work will benefit geneticists and breeders who use microsatellites for genetic mapping, varietal fingerprinting, identification of quantitative trait loci, marker-assisted selection, and analysis of the evolution of populations and species. It will especially benefit researchers working on minor crops or wild species for which microsatellite studies have not been published, and it presents a broad picture of microsatellite characteristics under uniform conditions across the vascular plants.
Technical Abstract: The large number of EST sequences now available in public databases offers an opportunity to compare microsatellite and minisatellite properties and evaluate their evolution over a broad range of plant taxa. Repeated motifs from one to 250 nucleotides long were identified in 6,793,306 expressed sequence tags (ESTs) from 88 genera of vascular plants, using a custom data-processing pipeline that allowed limited variation among repeats. The pipeline processed trimmed but otherwise unfiltered sequence and output nonredundant loci of at least 15 nucleotides, with degree of polymorphism and PCR primers wherever possible. Motifs whose length was an integral multiple of three were more abundant and richer in G/C than other motifs. From 80 to 85% of minisatellite motifs represented repeats within proteins, but not all of these repeats preserved reading frame. The remaining 15 to 20% of minisatellite motifs were associated with repetitive elements, e.g., retrotransposons. Microsatellites were less frequent in the transcriptome of genera with large genomes, but there was no evidence for greater dilution of the transcriptome with transposable element transcripts in these genera. Although evolution of increased microsatellite and EST GC content was evident within the grasses, relative microsatellite motif frequencies did not correlate tightly with phylogenetic relationship. This suggests that repeat loci evolve more rapidly than the surrounding sequence, although tissue specificity of the different EST libraries is a complicating factor. Motifs of four to six nucleotides are as polymorphic in EST collections as the commonly used motifs of two and three nucleotides, and they can be exploited as genetic markers with little additional effort.
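To make the definition of a repeat locus concrete, the following is a minimal Python sketch of tandem-repeat detection. It is not the custom pipeline described above, which is not reproduced in this record. The sketch assumes perfect repetition only (zero allowed deviations), restricts motifs to the microsatellite range of one to six nucleotides, and applies the 15-nucleotide minimum locus length stated in the abstract. The function names `is_primitive` and `find_perfect_repeats` are hypothetical, and redundancy removal, polymorphism scoring, and primer design are out of scope.

```python
def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_perfect_repeats(seq, max_motif=6, min_len=15):
    """Find perfect tandem repeats in a DNA sequence.

    Returns a sorted list of (start, end, motif, whole_copies) tuples,
    where start/end are 0-based, end-exclusive coordinates.
    Assumptions of this sketch: no mismatches or indels are tolerated,
    and motifs containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k <= len(seq):
            # Extend the run for as long as each base equals the base
            # k positions earlier, i.e. while period k is preserved.
            j = i + k
            while j < len(seq) and seq[j] == seq[j - k]:
                j += 1
            length = j - i
            motif = seq[i:i + k]
            if (
                length >= max(min_len, 2 * k)
                and is_primitive(motif)
                and set(motif) <= set("ACGT")
            ):
                hits.append((i, j, motif, length // k))
            # Any start inside the same run would give a shorter run with
            # the same end, so jump to the first start that can differ.
            i = j - k + 1
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 8 + "CCT" + "AAG" * 5 + "T"
    for start, end, motif, copies in find_perfect_repeats(example):
        print(start, end, motif, copies)
```

Run on the example sequence, the sketch reports a dinucleotide locus `(3, 19, 'AT', 8)` and a trinucleotide locus `(22, 37, 'AAG', 5)`. The primitive-motif check prevents the same locus from being reported again under a longer period, for example `ATAT` for an `AT` repeat.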