EVALUATION, IMPROVEMENT, AND DEVELOPMENT OF NEW/ALTERNATIVE INDUSTRIAL CROPS
Location: Plant Physiology and Genetics Research
Title: DATA MINING FOR MICROSATELLITES IN EXPRESSED SEQUENCE TAGS (ESTS) FROM ARABIDOPSIS THALIANA AND BRASSICA SPECIES FOR USE IN LESQUERELLA (BRASSICACEAE)
Submitted to: Association for the Advancement of Industrial Crops Conference
Publication Type: Abstract Only
Publication Acceptance Date: August 15, 2004
Publication Date: September 19, 2004
Citation: Salywon, A.M., Barber, M., Herling, N., Stewart, W., Dierig, D.A. 2004. Data mining for microsatellites in expressed sequence tags (ESTs) from Arabidopsis thaliana and Brassica species for use in Lesquerella (Brassicaceae). Association for the Advancement of Industrial Crops Conference. p. 20.
Lesquerella (Brassicaceae) species are being developed as sources of hydroxy fatty acids, with potential for industrial applications. Microsatellite markers can accelerate selection for favorable traits in breeding programs and may be useful in identifying trait loci. However, developing microsatellite markers de novo is costly and time-consuming. A cheaper and faster alternative is to data-mine public sequence databases for microsatellites and design primers in their flanking regions. For some taxa, the frequency of microsatellites is higher in transcribed regions than in genomic DNA; data-mining expressed sequence tags (ESTs) has therefore proven to be an attractive method for many research groups. Because the primers developed for these microsatellite-ESTs are derived from transcribed regions, they have a higher probability of transferring between related species and genera, which significantly extends their usefulness.
The purpose of this study was to use databases of expressed sequence tags from Arabidopsis thaliana L. and Brassica crop species to determine the potential for development of microsatellite markers that amplify across generic boundaries within Brassicaceae and especially in Lesquerella.
In April 2004, 347,844 Arabidopsis thaliana and 44,851 Brassica EST sequences were downloaded from public databases. Sequences containing microsatellites were identified using a Perl script. The microsatellite-ESTs were then masked using the RepeatMasker program and clustered using the StackPACK 2.0 system (with its associated d2_cluster, Phrap, and CRAW programs). The EST database was then queried again with the microsatellite-containing EST clusters to extend the consensus sequences and to reduce redundancy by clustering significantly similar ESTs. Information from all stages was stored in a relational database.
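The microsatellite identification step can be illustrated with a minimal sketch. The study used a Perl script whose details are not given here; the Python version below is an assumption-laden stand-in, not the authors' code. The function names (`find_microsatellites`, `is_redundant`) and the minimum repeat thresholds in `MIN_REPEATS` are hypothetical, and only perfect di-, tri-, and tetra-nucleotide repeats are detected.

```python
import re

# Assumed minimum repeat counts per motif length; the abstract does not
# state the thresholds the authors actually used.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AAA, ATAT)."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) for each perfect repeat found.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_redundant(motif):
                continue  # skip homopolymers and motifs built from shorter units
            repeats = (m.end() - m.start()) // motif_len
            hits.append((m.start(), m.end(), motif, repeats))
    return sorted(hits)


# Toy sequence for illustration only, not data from the study.
example = "GGCC" + "AT" * 7 + "CGTACG" + "GAA" * 5 + "TTGC"
print(find_microsatellites(example))
```

The start and end coordinates returned for each hit mark where flanking sequence begins, which is the region in which primers would be designed.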
After clustering, the StackPACK output files identified 2,058 microsatellite-ESTs for Arabidopsis and 540 microsatellite-ESTs for Brassica spp. In both Arabidopsis and Brassica, tri-nucleotide repeat motifs were the most abundant (69% and 59%, respectively), followed by di-nucleotide repeat motifs (23% and 36%, respectively) and tetra-nucleotide repeat motifs (8% and 5%, respectively). Preliminary results from clustering of orthologous microsatellite-ESTs from Arabidopsis and Brassica indicate that this method can be used to develop microsatellite primers that may amplify across genera in the family for population or genomic studies, although the number of orthologous regions available for comparison is currently limited by the number of ESTs available.
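A breakdown by motif class like the one reported above could be tallied as in the following sketch. The function name `motif_class_percentages` and the input list are hypothetical; the motifs shown are toy values chosen for illustration and are not drawn from the study's results.

```python
from collections import Counter

# Labels for the motif lengths considered in the abstract.
CLASS_NAMES = {2: "di", 3: "tri", 4: "tetra"}


def motif_class_percentages(motifs):
    """Return the percentage of motifs in each length class, rounded to 0.1."""
    counts = Counter(CLASS_NAMES[len(m)] for m in motifs if len(m) in CLASS_NAMES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Toy input for illustration only.
toy_motifs = ["AT", "GAA", "CTT", "AGG", "ACGT"]
print(motif_class_percentages(toy_motifs))
```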
This study indicates that EST data from the model organism Arabidopsis thaliana and from the agriculturally important Brassica spp. can potentially benefit molecular genetics studies of other related genera, such as Lesquerella, for which little, if any, sequence data may exist.