
United States Department of Agriculture

Agricultural Research Service

Research Project: CURATION AND DEVELOPMENT OF THE SOYBEAN BREEDER'S TOOLBOX AND ITS INTEGRATION WITH OTHER PLANT GENOME DATABASES

Location: Corn Insects and Crop Genetics Research

2009 Annual Report


1a. Objectives (from AD-416)
Objective 1: Implement web-accessible computational and visualization tools, including semantic web technologies, to enable comparison and transfer of agronomically important genetic information among soybean and other legume and related dicot species.
Objective 2: Continue to curate and enhance SoyBase and the Soybean Breeder’s Toolbox (SBT), more fully integrating the genetic, phenotypic, physical map, and whole-genome sequence data from soybean and other legumes.
Objective 3: Coordinate the quality assembly and annotation of the soybean whole-genome sequence.


1b. Approach (from AD-416)
Soybean ontologies will be prepared to describe selected data types from the Soybean Breeder's Toolbox (SBT). Data exchange descriptions (“RDF graphs”) will be developed to allow integration of the data into the Virtual Plant Information Network (VPIN). To let researchers transparently find and retrieve data contained in the SBT, or apply analytical methods to those data, web services will be developed and made accessible through a single portal. SoyBase and the SBT will be maintained and updated with new data classes as needed. The Williams 82 physical map and the soybean whole-genome sequence, new sequence-based data types in SoyBase, and comparative data from other legumes will be integrated and displayed. The project works closely with DOE-JGI to enhance the quality of the soybean whole-genome sequence assembly. This work will include analysis of sequence-based genetic markers, comparative analyses with other genomes, and various informatic analyses.


3. Progress Report
Dissemination of information about phenotypic characteristics is often hampered by imprecise descriptions that rely on “field” terms. This makes discovery of information by computer programs difficult. Applying a numerical labeling system, ARS scientists at Ames, Iowa have constructed a terminology to describe the development of soybean plants. This activity has produced a descriptive vocabulary that covers the growth and development of both vegetative and reproductive components of a soybean plant. The terminology comprises approximately 1000 terms in SoyBase, which have been submitted to the Plant Ontology Consortium. Two hundred forty-three soybean trait terms were identified and linked to their nearest synonym in the greater plant trait ontology. Common names for the same phenotype have been collected and compiled into a searchable database available on the SoyBase website (soybase.org) that will allow researchers to identify phenotypic traits using common or field terms. This controlled vocabulary will be important in associating genetically mapped phenotypes (i.e., Quantitative Trait Loci (QTL)) with genes identified in the genome sequence. Semantic web technologies provide a way of exchanging data based on meaning and not on labels for the data. Thus, databases that describe similar data in different ways can make their data available to others based on common semantics rather than a common vocabulary. This system will facilitate the discovery of pertinent data, which will greatly reduce the time curatorial staff and researchers spend in literature analysis. We have developed and deployed six Simple Semantic Web Architecture and Protocol (SSWAP) semantic web services that will allow researchers and programs to systematically discover and transfer all data in the Soybean Breeder's Toolbox (SBT) QTL class.
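The idea of exchanging data by meaning rather than by label can be illustrated with a minimal sketch. It is not the SSWAP services or the SoyBase schema: the database names, labels, and term identifiers below are invented for illustration. It assumes only that each database maps its own local labels to a shared term identifier, so that two records can be matched even when their labels differ.

```python
from typing import Optional

# Hypothetical data: each database maps its own local label to a shared
# term identifier. The identifiers and labels are invented for this sketch.
LABEL_TO_TERM = {
    "db_a": {"pod color": "TRAIT:0001", "seed wt": "TRAIT:0002"},
    "db_b": {"pod colour": "TRAIT:0001", "100-seed weight": "TRAIT:0002"},
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def to_term(source: str, label: str) -> Optional[str]:
    """Return the shared term ID for a local label, or None if unmapped."""
    return LABEL_TO_TERM.get(source, {}).get(normalize(label))


def same_meaning(source_a: str, label_a: str, source_b: str, label_b: str) -> bool:
    """True when both labels resolve to the same shared term ID."""
    term_a = to_term(source_a, label_a)
    return term_a is not None and term_a == to_term(source_b, label_b)


if __name__ == "__main__":
    # Different labels, same underlying term.
    print(same_meaning("db_a", "Pod Color", "db_b", "pod colour"))
```

In this sketch the comparison is made on the shared identifier, so neither database has to adopt the other's vocabulary.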
The SBT database and display engine have been modified to provide links to the Germplasm Resources Information Network (GRIN) database based on the use of both GRIN accession numbers and germplasm common names. This allows researchers to incorporate GRIN data into the context of SoyBase genetic and genomic information. As the amount of available data has increased, it has been necessary to add new web pages to accommodate the changes. We developed a two-tier system of navigational tabs that both briefly summarize the contents of each of the major sections in SoyBase and allow rapid movement between them. Access to the soybean genomic sequence permits the visualization of both the soybean physical map and the soybean genomic sequence in the context of the mature soybean genetic map. In response to stakeholder requests, SoyBase displays have been modified through the use of contextual menus to allow a seamless transition between the SBT data and displays of the soybean physical, sequence, and genetic maps, and between the map displays themselves. In cooperation with ARS-BARC personnel, we have increased the density of the soybean genetic map by the inclusion of 1600 single nucleotide polymorphism (SNP) markers identified by BARC personnel. The new genetic markers were positioned in the soybean genome sequence.


4. Accomplishments
1. Development of a repetitive element database and web interface. Working in collaboration with researchers at Purdue University, ARS scientists in Ames, Iowa have developed a relational database that contains the location, identification, and description of the almost 40,000 transposable elements in the soybean genome. The database also has a comprehensive web interface for analyzing the features of the transposable elements. Among other functions, the web interface allows access to these data based on evolutionary relationship, primary DNA sequence, or chromosomal location. This database provides one of the primary annotations applied to the soybean genome sequence and will aid in the understanding of the evolution of a complex genome.

2. 1600 SNP markers placed on the soybean physical and sequence map. In cooperation with ARS-BARC personnel, the density of the soybean genetic map was increased by the inclusion of 1600 single nucleotide polymorphism (SNP) markers identified by BARC personnel. The new genetic markers were also positioned in the soybean genome sequence. Both maps are available to the public through the SoyBase web site (soybase.org). A genetic map with many molecular markers is an essential component of the efforts of soybean breeders to precisely introduce important traits into elite soybean cultivars. Using a high-density map allows more precise mapping of the traits and thus allows breeders to modify the soybean germplasm more efficiently. This mapping will also greatly facilitate allele discovery, which forms the basis of genetic stock improvement.

3. Updated the Affymetrix SoyChip annotation database. The Affymetrix SoyChip contains probesets representing tens of thousands of genes on a small glass slide and is used by many researchers to investigate soybean gene function. Having a source for obtaining the identity of the probesets on the chip will facilitate the analysis of soybean gene expression studies for many researchers. The SoyBase Affymetrix GeneChip annotation database was re-analyzed by ARS scientists at Ames, Iowa, using the gene annotation provided by the Joint Genome Institute (JGI) soybean genome sequencing project. This correlation of the GeneChip probesets with the JGI soybean gene calls enables each probeset to be associated with a physical location on the soybean chromosomes. This will allow researchers to analyze gene function in the context of genomic location. This in turn may lead to more insights into gene regulation in soybean and how the expression of these genes is related to agronomic traits.

4. Preparation of a soybean developmental ontology and inclusion into SoyBase. The ability to accurately describe soybean growth and development is critical to the translation of information between soybean molecular geneticists and plant breeders. Both groups need a common terminology that uses biologically important concepts to connect these two views of the soybean. Applying a numerical labeling system used previously by model plant researchers, ARS scientists at Ames, Iowa have constructed a preliminary vocabulary to describe the development of soybean plants. This activity has produced an ontology that covers the growth and development of both vegetative and reproductive components of a soybean plant. The ontology currently comprises approximately 1000 terms in SoyBase, which have been submitted to the Plant Ontology Consortium. This controlled vocabulary will be important in associating genetically mapped traits (i.e., QTL) with genes identified in the genome sequence. The ability to translate soybean trait differences between molecular researchers and field observers is critical for the improvement of the US soybean germplasm collection, both private and public.

5. Completion and description of the soybean genome sequence assembly. The sequencing and assembly of the soybean genome was a very large, multiagency effort. The ARS Corn Insects and Crop Genetics Research Unit used markers, developed with ARS-BARC researchers, as well as novel computational methods, to integrate sequence assemblies generated by the Department of Energy-Joint Genome Institute sequencing group into full chromosome-scale assemblies. The resulting 950 million letters of DNA sequence, in 20 chromosomes, contain more than 46 thousand predicted and at least partially validated genes, as well as the codes that describe how those genes are used in the plant. The genome sequence was made publicly available in October 2008 at http://soybase.org. There, a genome browser displays features of interest on the chromosomes, including genetic markers, gene boundaries, and locations of related sequences from other legume crops (such as pea, common bean, cowpea, and peanut). Genetic markers are also linked to the genetic maps in the Soybean Breeder's Toolbox, where the locations of many agronomic traits are associated with the markers. The impact of the assembled, annotated genome of soybean will be enormous, as it contains every gene and regulatory sequence used in the plant's development. The genome sequence is expected to dramatically speed progress in identifying the basis for many important traits. Already, researchers have used the genome sequence to identify genes involved in disease resistance (including resistance to Asian Soybean Rust), nutrition (including the "phytate" anti-nutritive compound), and protein and oil production.


6. Technology Transfer

Number of Web Sites Managed: 2

Last Modified: 8/1/2014