
Research Project: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

2015 Annual Report

1a. Objectives (from AD-416):
Objective 1: Support stewardship of soybean and other major reference legume genetic, genomic, and phenotypic datasets.
Sub-objective 1.A: Develop and deploy infrastructure to support both the current reference soybean genome sequence, improved versions of that sequence, and new re-sequenced soybean genomes and haplotype data.
Sub-objective 1.B: Develop processes and tools to provide access to soybean gene model structural and functional annotations as these are revised over time.
Sub-objective 1.C: Provide standardized access to reference genome and affiliated sequences for the major crop and model legume species.
Sub-objective 1.D: Curate high-quality soybean datasets created by the community at large. These may include expression, mutant, phenotype, epigenetic, haplotype, small-RNA, QTL, and other data types.
Sub-objective 1.E: Maintain infrastructure to enable acquisition, storage, and community access to major public data sets for various legume species.
Objective 2: Cooperate with other database developers and plant researchers to develop gene and trait ontologies and open, standardized data exchange mechanisms to enhance database interoperability.
Objective 3: Provide community support and research coordination services for the research and breeding communities for soybean and other legumes. Expand outreach activities through workshops, web-based tutorials, and other communications.
Objective 4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.

1b. Approach (from AD-416):
Incorporate revised primary reference genome sequence for soybean into SoyBase. House and provide access to genome sequences for other soybean accessions, haplotype data, and related annotations. Incorporate revised gene models and annotations into SoyBase. Install or implement web-based tools for curation and improvement of soybean gene models and gene annotations. Incorporate available legume genome sequences and annotations. Working with collaborators, collect and add genetic map and QTL data for crop legumes. Extend web-based tools for navigation among biological sequence data across the legumes. Extend and develop methods and storage capacity for accepting genomic data sets for soybean and other legume species. Develop a complete set of descriptors (ontologies) for soybean biology (anatomy, traits, and development), and for other significant crop legumes as needed. Work with the relevant ontology communities-of-practice to incorporate these descriptors into broadly accessible ontologies. Develop web tutorials for important typical uses of SoyBase and the Legume Clade Database. Present and train about features at relevant conferences and workshops. Regularly seek feedback from users about desired features and usability.

3. Progress Report:
Soybean genomics and SoyBase. The U.S. soybean crop, valued in excess of $35 billion (USDA-NASS), depends on continued breeding improvements in order to achieve yield gains and avoid losses due to pathogens and environmental stresses. The USDA-ARS soybean genetics database, SoyBase, provides access to the complete genome sequence for soybean, as well as to predicted genes, markers, valuable traits and their locations, and many other genetic features. The SoyBase database continues to be actively extended through the addition of publications that describe the locations of traits, genes, and features of interest. A section of SoyBase was developed to present the SoyNAM Project, a multi-institution, multi-year nested association study designed to identify genomic regions of agronomic interest. Pedigrees for all of the entries in the Northern and Southern Uniform Testing Trials were collected, and a database and web site were developed to present them to the soybean breeding community. A database and web site were also developed for the Uniform Testing Trials, and their population has been initiated. Two additional RNA-seq gene atlases were added to SoyBase. These, along with the other data in SoyBase, allow users to use tissue and developmental stage expression profiles to help identify genes conditioning important soybean traits. The SoyBase video tutorial page has been updated with additional and improved tutorials. A detailed site navigation page was developed to better expose the SoyBase content.
Crop legume genomics, the Legume Information System, and allied databases. Approximately two dozen species in the bean and pea family are grown as protein-rich crops. These meet a significant portion of the increasing global demand for protein and nutrition. In the U.S. alone, these crops have a market value in excess of $13 billion (USDA-NASS).
We have worked in the past year with international collaborators to assemble and analyze the genome sequences of the two closest wild progenitors of cultivated peanut. We have continued improving two genome databases for diverse crop legume species: the Legume Information System and PeanutBase. These Web resources now provide access to the genome and gene sequences of eight legume species: common bean, pigeonpea, soybean (via SoyBase), chickpea, Medicago truncatula and Lotus japonicus (two forage and model research species), and Arachis duranensis and Arachis ipaensis (two wild relatives of peanut). These resources will enable plant breeders and researchers to more rapidly develop new crop varieties with favorable yield, disease resistance, or stress tolerance characteristics.
Work on the Legume Information System in the past year has included: a new set of gene families (i.e., sets of related genes) in the legume crops, and viewers for these gene families; new search capabilities for features such as plant traits, genetic markers, and genes; new gene descriptors for chickpea and pigeonpea; new sequence search tools; and many improvements and features in genome browsers for the eight legume species in LegumeInfo, PeanutBase, and SoyBase.
Work on PeanutBase in the last year has included: analysis of the genome sequences of the diploid progenitors of cultivated peanut (for submission in July 2015); submission of these genome sequences to GenBank for long-term maintenance; a gene expression atlas for peanut, showing which genes are expressed in 22 different tissues and developmental stages; new genetic trait and marker information; new marker-assisted selection pages for peanut breeders and other researchers; and addition of more than a thousand germplasm images of peanut varieties (seeds, pods, and plants).
A tutorial for PeanutBase was also presented to about 160 peanut researchers at the main international peanut research meeting, “Advances in Arachis Genetics and Genomics”, in Savannah, Georgia. Both the PeanutBase and Legume Information System projects have also included substantial outreach to other software database developers, by sharing data-collection templates and software modules that can be used in other contexts – for example, modules for sequence search and display, and viewers for visualizing evolutionary relationships among related genes from different species.

4. Accomplishments
1. Identification of a key step in the origin of the nitrogen-fixing capacity of legume plants. The high protein content in legume plants such as soybean, chickpea, and alfalfa derives from symbiotic relationships that these plants have with nitrogen-fixing soil bacteria called rhizobia. How this plant-bacteria symbiosis evolved has been a biological puzzle with important practical implications. Are there ways to increase the efficiency of this symbiosis, or even introduce this relationship into other crop plants? ARS researchers in Ames, IA sequenced and examined the genes of 20 diverse legume plants, including both those with and those without the nitrogen-fixing symbiosis. They determined that the symbiosis arose at least twice, separated by about 55 million years; and that prior to the origin of the symbiotic capacity in these two groups, the chromosomes (the genomes) doubled in the ancestral legume species in each of these groups. Thus, genome duplication may have been an important precursor step for the evolution of nitrogen-fixing symbiosis in plants. Understanding this fundamental capability of this group of plants is important for breeding new crop varieties with improved nitrogen-fixing ability. In turn, this is important for farmers, consumers, and the environment, because nitrogen fertilization derived from this symbiosis is more cost-effective, reduces dependence on fossil-fuel-derived nitrogen fertilizer, and reduces excess nitrogen runoff into waterways.

2. Characterization of a group of genes that help soybean and other plants respond to drought and salinity. Drought and salinity are major concerns for farmers throughout the world. Salinity often comes with irrigation, so it occurs in many arid parts of the world. ARS researchers in Ames, IA characterized a family of related genes in soybean, and identified several of these that are particularly active during soybean response to salt and desiccation stress. The corresponding genes in other species, including rice and the model plant Arabidopsis thaliana, have also been shown to help those plants respond to salt and drought. These genes are candidates for enhancement in soybean and other crops.

3. Description of a breeding collection for "potato bean": a potential new legume crop. Reliance on a small number of plant species for our major food sources increases our vulnerability to failures in the food system. Therefore, new crops should be of great interest to farmers and consumers. A native North American bean relative, Apios americana (sometimes called "potato bean" or "ground nut"), was once a staple crop of Native Americans. This plant produces high-protein, potato-like tubers, which grow along underground shoots (stolons). ARS researchers from Ames, IA evaluated lines selected from an Apios breeding program that began in the 1980s, and identified several high-yielding cultivars. The plant is also able to grow in some challenging soil conditions, including wet or waterlogged soil. The selected varieties show good promise as a new crop that is nutritious, disease resistant, high-yielding, easy to cook, and pleasing to eat, and that can help diversify our food system.

Review Publications
Oellrich, A., Walls, R.L., Cannon, E., Cannon, S.B., Cooper, L., Gardiner, J., Gkoutos, G.V., Harper, E.C., He, M., Hoehndorf, R., Jaiswal, P., Kalberer, S.R., Lloyd, J., Meinke, D., Menda, N., Moore, L., Nelson, R., Pujar, A., Lawrence, C.J., Huala, E. 2015. An ontology approach to comparative phenomics in plants. Plant Methods. 11:10. DOI: 10.1186/s13007-015-0053-y.

Cannon, S.B., McKain, M.R., Harkess, A., Nelson, M.N., Dash, S., Deyholos, M.K., Peng, Y., Joyce, B., Stewart, C.N., Rolf, M., Kutchan, T., Xuemei, T., Chen, C., Zhang, Y., Carpenter, E., Wong, G., Doyle, J., Leebens-Mack, J. 2014. Multiple polyploidy events in the early radiation of nodulating and non-nodulating legumes. Molecular Biology and Evolution. 32(1):193-210. DOI: 10.1093/molbev/msu296.

Belamkar, V., Wenger, A., Kalberer, S.R., Bhattacharya, G.V., Blackmon, W.J., Cannon, S.B. 2015. Evaluation of phenotypic variation in a collection of Apios americana: an edible tuberous legume. Crop Science. 55(2):712-726. DOI: 10.2135/cropsci2014.04.0281.

Belamkar, V., Weeks, N.T., Bharti, A.K., Farmer, A.D., Graham, M.A., Cannon, S.B. 2014. Comprehensive characterization and RNA-Seq profiling of the HD-Zip transcription factor family in soybean (Glycine max) during dehydration and salt stress. BMC Genomics. 15:950. DOI: 10.1186/1471-2164-15-950.

Martin, K.M., Hill, J.H., Cannon, S.B. 2014. Occurrence and characterization of Bean common mosaic virus strain NL1 in Iowa. Plant Disease. 98(11):1593. DOI: 10.1094/PDIS-07-14-0673-PDN.