
Research Project: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

2014 Annual Report


1a. Objectives (from AD-416):
Objective 1: Support stewardship of soybean and other major reference legume genetic, genomic, and phenotypic datasets.
Sub-objective 1.A: Develop and deploy infrastructure to support both the current reference soybean genome sequence, improved versions of that sequence, and new re-sequenced soybean genomes and haplotype data.
Sub-objective 1.B: Develop processes and tools to provide access to soybean gene model structural and functional annotations as these are revised over time.
Sub-objective 1.C: Provide standardized access to reference genome and affiliated sequences for the major crop and model legume species.
Sub-objective 1.D: Curate high-quality soybean datasets created by the community at large. These may include expression, mutant, phenotype, epigenetic, haplotype, small-RNA, QTL, and other data types.
Sub-objective 1.E: Maintain infrastructure to enable acquisition, storage, and community access to major public data sets for various legume species.
Objective 2: Cooperate with other database developers and plant researchers to develop gene and trait ontologies and open, standardized data exchange mechanisms to enhance database interoperability.
Objective 3: Provide community support and research coordination services for the research and breeding communities for soybean and other legumes. Expand outreach activities through workshops, web-based tutorials, and other communications.
Objective 4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.


1b. Approach (from AD-416):
Incorporate revised primary reference genome sequence for soybean into SoyBase. House and provide access to genome sequences for other soybean accessions, haplotype data, and related annotations. Incorporate revised gene models and annotations into SoyBase. Install or implement web-based tools for curation and improvement of soybean gene models and gene annotations. Incorporate available legume genome sequences and annotations. Working with collaborators, collect and add genetic map and QTL data for crop legumes. Extend web-based tools for navigation among biological sequence data across the legumes. Extend and develop methods and storage capacity for accepting genomic data sets for soybean and other legume species. Develop a complete set of descriptors (ontologies) for soybean biology (anatomy, traits, and development), and for other significant crop legumes as needed. Work with the relevant ontology communities-of-practice to incorporate these descriptors into broadly accessible ontologies. Develop web tutorials for important typical uses of SoyBase and the Legume Clade Database. Present and train about features at relevant conferences and workshops. Regularly seek feedback from users about desired features and usability.


3. Progress Report:
Soybean genomics and SoyBase. The U.S. soybean crop, valued in excess of $35 billion (USDA-NASS), depends on continued breeding improvements to achieve yield gains and avoid losses due to pathogens and environmental stresses. The USDA-ARS soybean genetics database, http://soybase.org, provides access to the complete genome sequence for soybean, as well as to predicted genes, markers, valuable traits and their locations, and many other genetic features. We have worked with the Department of Energy Joint Genome Institute, the HudsonAlpha Institute for Biotechnology, and the National Center for Biotechnology Information to assess and improve gene predictions for soybean. These gene models have been incorporated into the genome browser at SoyBase, along with detailed report pages for all gene predictions. A translation tool allows researchers to find and compare soybean gene predictions from older and newer research. This work addresses Project Sub-objective 1.B: Develop processes and tools to provide access to soybean gene model structural and functional annotations as these are revised over time.
The SoyBase database continues to be actively extended through the addition of publications that describe the locations of traits, genes, and features of interest. A new interface provides access to soybean mutant and gene knockout data (http://www.soybase.org/mutants). This enables researchers to browse through images of known mutants, or to find plants with known genetic lesions. A new version of the SoyCyc soybean metabolic database, based on the latest plant metabolic pathways, has also been incorporated into SoyBase. This work addresses Project Objective 4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.
A video tutorial page has been deployed at SoyBase. It provides how-to instruction for using SoyBase, links to community-developed tutorials, and a permanent repository for video content covering many aspects of soybean research and production. This resource addresses Objective 3: Provide community support and research coordination services for the research and breeding communities for soybean and other legumes, and expand outreach activities through workshops, web-based tutorials, and other communications.
Crop legume genomics, the Legume Information System, and allied databases. Approximately two dozen species in the bean and pea family are grown as protein-rich crops. These help meet a significant portion of the increasing global demand for protein and nutrition. In the U.S. alone, these crops have a market value in excess of $13 billion (USDA-NASS). We have worked in the past year with international collaborators to analyze and publish the first formal description of the genome sequence of common bean (Nature Genetics 2014), and to assemble and make available the genome sequences of two wild relatives of peanut, which will be used to assemble the genome sequence of cultivated peanut. We have continued improving two genome databases for diverse crop legume species: the Legume Information System, http://legumeinfo.org, and PeanutBase, http://peanutbase.org. These Web resources now provide access to the genome and gene sequences of eight legume species: common bean, pigeonpea, soybean (via SoyBase), chickpea, Medicago truncatula and Lotus japonicus (two forage and model research species), and Arachis duranensis and Arachis ipaensis (two wild relatives of peanut). These resources will enable plant breeders and researchers to more rapidly develop new crop varieties with favorable yield, disease resistance, or stress tolerance characteristics. This work addresses Project Sub-objective 1.C: Provide standardized access to reference genome and affiliated sequences for the major crop and model legume species.
Work on the Legume Information System has had three primary focuses in the past year: first, to transfer the database and web site to a widely used genomic web and database framework (Tripal and Chado), in order to better integrate with other genomic databases and make use of the large group of developers for these tools; second, to extend the collection of genetic maps and quantitative trait loci (to more than 250 QTL in common bean and more than 390 QTL in peanut); and third, to develop and share data-collection templates with collaborators, in order to take advantage of specialized knowledge across many crop research communities. This work has particularly focused on meeting Objective 2: Cooperate with other database developers and plant researchers to develop gene and trait ontologies and open, standardized data exchange mechanisms to enhance database interoperability.


4. Accomplishments
1. Assembly of peanut genome sequences. Global demand continues to increase for protein-rich, nutrient-dense foods and oil crops. Peanut is unusual in being both very nutrient-dense and consumable with minimal or no processing. The farm value of the U.S. crop is in excess of $1 billion annually (USDA NASS). We have worked over the past year with other U.S. and international researchers to help assemble the genome sequences of the two closest wild relatives of peanut, as part of an ongoing effort to sequence the genome of cultivated peanut. We have made these genome sequences and related genetic resources available at PeanutBase. The availability of these genome sequences will make it possible for researchers to more rapidly breed peanut varieties that have improved yield, disease resistance, and stress tolerance.

2. Publication of the common bean genome sequence, with web access to the bean genome and genes. Common bean is the most important grain legume for human consumption worldwide, and plays an important role in agriculture due to its ability to utilize atmospheric nitrogen as a natural fertilizer. The farm value of the U.S. common bean crop (dry bean and snap bean) is approximately $1 billion annually (USDA NASS). The USDA has worked with U.S. and international partners to report the genome sequence of common bean. This research confirms that common bean was independently domesticated in South America and in Mesoamerica, and also reports that important traits such as seed size are based on different genetic factors in the two distinct domestications. This result means that plant breeders have a larger genetic toolbox available from the South American and Mesoamerican seed collections.

3. A new on-line tool for plant researchers to access information about soybean lines with known DNA mutations. A continuing research challenge for soybean breeders is to integrate the wealth of genomic data into a coherent plant breeding scheme. A particular difficulty has been determining the function of predicted genes. Working with other USDA researchers, we have developed a comprehensive tool for searching, visualizing, and reporting information for plant lines that contain DNA mutations (derived from radiation, transposons, or other sources). The mutated plant lines have been extensively characterized in terms of their growth patterns, the chromosomal locations of the DNA mutations, and the predicted genes contained in the mutated DNA. The new tools in SoyBase enable users to explore and use these data and associations. This information will be used by breeders and other researchers to generate improved soybean varieties.


Review Publications
Kudapa, H., Azam, S., Sharpe, A.G., Taran, B., Li, R., Deonovic, B., Cameron, C., Farmer, A.D., Cannon, S.B., Varshney, R.K. 2014. Comprehensive transcriptome assembly of chickpea (Cicer arietinum L.) using Sanger and next generation sequencing platforms: development and applications. PLoS One. 9(1):e86039. DOI: 10.1371/journal.pone.0086039.

Schmutz, J., McClean, P., Mamidi, S., Wu, A., Cannon, S.B., Grimwood, J., Jenkins, J., Shu, S., Song, Q., Chavarro, C., Geffroy, V., Moghaddam, S.M., Gao, D., Abernathy, B., Barry, K., Blair, M., Brick, M.A., Chovatia, M., Gepts, P., Goodstein, D.M., Gonzales, M., Hellsten, U., Hyten, D.L., Jia, G., Kelly, J., Kudrna, D., Lee, R., Richard, M.M., Miklas, P.N., Osorno, J.M., Rodrigues, J., Thareau, V., Urrea, C.A., Wang, M., Yu, Y., Zhang, M., Wing, R.A., Cregan, P.B., Rokhsar, D.S., Jackson, S.A. 2014. A reference genome for common bean and genome-wide analysis of dual domestications. Nature Genetics. 46: 707-713. DOI: 10.1038/ng.3008.

Anderson, J.E., Kantar, M.B., Kono, T.Y., Fu, F., Stec, A.O., Song, Q., Cregan, P.B., Specht, J.E., Diers, B.W., Cannon, S.B., McHale, L.K., Stupar, R.M. 2014. A roadmap for functional structural variants in the soybean genome. G3: Genes, Genomes, Genetics. DOI: 10.1534/g3.114.011551.