Submitted to: Genome
Publication Type: Peer reviewed journal
Publication Acceptance Date: 11/16/2001
Publication Date: 5/1/2002
Citation: SHOEMAKER, R.C., VODKIN, L., KEIM, P., RETZEL, E., CLIFTON, S.W., SMOLLER, D., CORYELL, V., KHANNA, A., ERPELDING, J.E., GRANGER, C.L. A COMPILATION OF SOYBEAN ESTS: GENERATION AND ANALYSIS. GENOME. 2002. v. 45. p. 329-338.
Interpretive Summary: The hereditary material of any organism comprises individual genes. Each gene produces a product that is distinct from all other gene products. Random sampling of these gene messages and partial decoding of the messages is an efficient way to sample the genetic make-up of an organism. In this study, the research team evaluated more than 120,000 soybean gene messages. This represents the largest public plant gene discovery project of this type. By using several statistical methods, the research team identified genes that seemed to act in concert and that may be co-regulated. This may be a method by which the function of genes can be resolved. This type of gene discovery project provides a huge source of cloned genes for public and private researchers, and provides insight into the structure, function and evolution of a major crop legume. The results of this project will greatly increase the free availability of soybean genes and gene codes for private and public researchers, thus saving many millions of dollars in gene discovery costs.
Technical Abstract: Whole-genome sequencing is fundamental to understanding the genetic composition of an organism. Because of the size and complexity of the soybean genome, targeted random gene sequencing provides an immediate and productive method of gene discovery. In this study, more than 120,000 soybean expressed sequence tags (ESTs) generated from more than 50 cDNA libraries were evaluated. These ESTs collapsed into 16,928 contigs with an additional 17,336 singletons. On average, each contig comprised 6 ESTs and spanned 788 bases. The average sequence length submitted to dbEST was 414 bases. When the analysis was restricted to libraries with more than 800 ESTs and contigs with 10 or more ESTs, correlated patterns of gene expression among libraries and among genes were discerned. Two-dimensional qualitative representations of contig and library similarities were generated based on expression profiles. Genes with similar expression patterns and, potentially, similar functions were identified. These studies provide a rich source of publicly available genes and gene sequences and valuable insight into the structure, function and evolution of a model crop legume genome.
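The filtering and correlation step described in the abstract can be illustrated with a minimal Python sketch. The thresholds (libraries with more than 800 ESTs, contigs with 10 or more ESTs) come from the abstract; everything else is an assumption made for illustration: the toy library and contig names, the counts, the normalization of counts by library size, and the choice of Pearson correlation, which may differ from the statistical methods the authors actually used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)

def expression_profiles(counts, library_sizes,
                        min_library_ests=800, min_contig_ests=10):
    """Filter libraries and contigs, then build per-contig profiles.

    counts: {contig: {library: EST count}} (hypothetical layout).
    library_sizes: {library: total ESTs sequenced from that library}.
    Assumption: contig size is its EST total over all libraries, and
    each count is normalized by library size to give a frequency.
    """
    kept_libs = sorted(lib for lib, size in library_sizes.items()
                       if size > min_library_ests)
    profiles = {}
    for contig, per_lib in counts.items():
        if sum(per_lib.values()) >= min_contig_ests:
            profiles[contig] = [per_lib.get(lib, 0) / library_sizes[lib]
                                for lib in kept_libs]
    return kept_libs, profiles

# Hypothetical toy data, not taken from the study.
library_sizes = {"root": 1200, "flower": 950, "seed": 2000, "tiny": 300}
counts = {
    "contigA": {"root": 12, "flower": 1, "seed": 2},
    "contigB": {"root": 24, "flower": 2, "seed": 4, "tiny": 1},
    "contigC": {"root": 1, "flower": 9, "seed": 3},
    "contigD": {"root": 2, "seed": 1},  # fewer than 10 ESTs: filtered out
}

kept_libs, profiles = expression_profiles(counts, library_sizes)
r_ab = pearson(profiles["contigA"], profiles["contigB"])
r_ac = pearson(profiles["contigA"], profiles["contigC"])
print(kept_libs, sorted(profiles), r_ab, r_ac)
```

In this toy data, contigA and contigB are both root-enriched and correlate strongly, while contigC is flower-enriched and correlates negatively with contigA. Pairwise correlations of this kind could serve as the similarity measure behind a two-dimensional representation of contigs or libraries.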