Submitted to: Genome Research
Publication Type: Peer reviewed journal
Publication Acceptance Date: 10/1/2005
Publication Date: 3/1/2006
Citation: Udall, J.A., Taliercio, E.W., Turley, R.B., Payton, P.R., Scheffler, J.A., Wendel, J. 2006. A Global Assembly of Cotton ESTs. Genome Research. 16(3):441-450.
Interpretive Summary: Over the past 10 years, advances in technology and decreased costs have made it relatively easy to clone and sequence large numbers of genes. This, in turn, has led to the generation of novel collections of expressed sequence tags (ESTs) representing genes from specific tissues of interest to particular labs. In the absence of large amounts of funding directed toward whole genome sequencing in cotton, several laboratories have invested in the creation of EST libraries from a variety of cultivars, tissues, developmental stages, and stress treatments. While the NSF-sponsored cotton genome project resulted in the generation of a unique set of 14,000 genes isolated from fiber, our laboratory and others have generated thousands of additional ESTs from cotton. We report here the sequencing, clustering, and analysis of thirty EST libraries generated by an international consortium of research groups. We assembled over 170,000 ESTs into over 50,000 tentative consensus sequences. Further analysis has resulted in the identification of 33,665 sequences representing unique genes. This collection enables an examination of sequence divergence within a well-defined system of diploid and polyploid plant species on an unprecedented scale, provides insight into gene expression in numerous different tissues and environmental conditions, and sets the stage for the development of a cotton oligonucleotide microarray with deep genomic coverage. These projects have provided useful tools for genomic comparisons, gene identification, molecular marker studies, microarray development, and gene discovery. Additionally, they represent a community-wide effort to establish an in-depth resource for cotton genomics research.
Technical Abstract: Approximately 170,000 Gossypium EST sequences comprising >94,800,000 nucleotides were amassed from 30 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including drought stress and pathogen challenges. These libraries were derived from allopolyploid cotton (Gossypium hirsutum; AT and DT genomes) as well as its two diploid progenitors, G. arboreum (A-genome) and G. raimondii (D-genome). ESTs were assembled using a novel Program for Assembling and Viewing ESTs (PAVE), resulting in 22,030 contigs and 29,077 singletons (51,107 unigenes). Further comparisons among the singletons and contigs led to recognition of 33,665 exemplar sequences that represent a non-redundant set of putative Gossypium genes containing partial or full-length coding regions and usually one or two UTRs. The assembly may be viewed at http://agcol.arizona.edu/pave/cotton/, along with its UniProt BLAST hits, GO annotation, and Pfam analysis. Because ESTs from diploid and allotetraploid Gossypium were combined in a single assembly, we were in many cases able to bioinformatically distinguish duplicated genes in allotetraploid cotton and assign them to either the A or D genome. The assembly and associated information provide a framework for future investigation of cotton functional and evolutionary genomics using both long and short oligonucleotide microarrays.
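The counts quoted in the technical abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the EST and nucleotide totals are the rounded figures from the abstract ("approximately 170,000" and ">94,800,000"), so the derived mean length is a rough estimate and not a value reported by the authors.

```python
# Counts reported in the technical abstract.
contigs = 22_030
singletons = 29_077
exemplars = 33_665

# Rounded inputs from the abstract (assumption: treated here as exact values
# purely for illustration).
approx_ests = 170_000
min_nucleotides = 94_800_000

# Unigenes are the contigs plus the singletons.
unigenes = contigs + singletons

# Difference between the unigene count and the non-redundant exemplar set.
unigenes_minus_exemplars = unigenes - exemplars

# Rough mean EST length in nucleotides, derived from the rounded totals.
approx_mean_est_length = min_nucleotides / approx_ests

print(f"unigenes: {unigenes}")
print(f"unigenes minus exemplars: {unigenes_minus_exemplars}")
print(f"approximate mean EST length: {approx_mean_est_length:.0f} nt")
```

The sum reproduces the 51,107 unigenes stated in the abstract; the other two values are derived here and should be read only as back-of-the-envelope figures.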