
Research Project #425040: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

2016 Annual Report

1a. Objectives (from AD-416):
Objective 1: Support stewardship of soybean and other major reference legume genetic, genomic, and phenotypic datasets.
Sub-objective 1.A: Develop and deploy infrastructure to support the current reference soybean genome sequence, improved versions of that sequence, and new re-sequenced soybean genomes and haplotype data.
Sub-objective 1.B: Develop processes and tools to provide access to soybean gene model structural and functional annotations as these are revised over time.
Sub-objective 1.C: Provide standardized access to reference genome and affiliated sequences for the major crop and model legume species.
Sub-objective 1.D: Curate high-quality soybean datasets created by the community at large. These may include expression, mutant, phenotype, epigenetic, haplotype, small-RNA, QTL, and other data types.
Sub-objective 1.E: Maintain infrastructure to enable acquisition, storage, and community access to major public datasets for various legume species.
Objective 2: Cooperate with other database developers and plant researchers to develop gene and trait ontologies and open, standardized data exchange mechanisms that enhance database interoperability.
Objective 3: Provide community support and research coordination services for the soybean and legume research and breeding communities. Expand outreach activities through workshops, web-based tutorials, and other communications.
Objective 4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, empowering ARS scientists and partners to use a new generation of computational tools and resources.

1b. Approach (from AD-416):
Incorporate the revised primary reference genome sequence for soybean into SoyBase. House and provide access to genome sequences for other soybean accessions, haplotype data, and related annotations. Incorporate revised gene models and annotations into SoyBase. Install or implement web-based tools for curation and improvement of soybean gene models and gene annotations. Incorporate available legume genome sequences and annotations. Working with collaborators, collect and add genetic map and QTL data for crop legumes. Extend web-based tools for navigating among biological sequence data across the legumes. Extend and develop methods and storage capacity for accepting genomic datasets for soybean and other legume species. Develop a complete set of descriptors (ontologies) for soybean biology (anatomy, traits, and development), and for other significant crop legumes as needed. Work with the relevant ontology communities of practice to incorporate these descriptors into broadly accessible ontologies. Develop web tutorials for common uses of SoyBase and the Legume Clade Database. Present database features and provide training at relevant conferences and workshops. Regularly seek feedback from users about desired features and usability.

3. Progress Report:
Soybean genomics and SoyBase.
The U.S. soybean crop, valued in excess of $35 billion (USDA-NASS), depends on continued breeding improvements to achieve yield gains and avoid losses due to pathogens and environmental stresses. The USDA-ARS soybean genetics database, SoyBase, provides access to the complete genome sequence for soybean, as well as to predicted genes, markers, valuable traits and their locations, and many other genetic features. SoyBase continues to be actively extended through the addition of publications that describe the locations of traits, genes, and features of interest. Updated data for fast neutron- and transposable element-generated mutants were added to SoyBase. In collaboration with GRIN (the USDA Germplasm Resources Information Network), a section of SoyBase was developed that allows users to search GRIN data and other community-submitted data through a user-friendly search page; the search results are presented in the context of other data in SoyBase. Data for micro-RNAs (miRNAs) in soybean were added to the SoyBase genome browser. The SoyBase video tutorial page has been updated with additional and improved tutorials. Poster and oral presentations about SoyBase were made at the Plant and Animal Genome meeting, the Soybean Precision Genomics and Mutant Finder Workshop, the Crop Science Society of America meeting, and the Soybean Breeders Workshop. SoyBase tutorials were held at the Soybean Breeders Workshop and at Soy2016: Biennial Molecular and Cellular Biology of the Soybean Conference.
Crop legume genomics, the Legume Information System, and allied databases.
Approximately two dozen species in the bean and pea family are grown as protein-rich crops. These crops supply a significant portion of the growing global demand for protein and nutrition. In the U.S. alone, they have a market value in excess of $13 billion (USDA-NASS).
We have worked in the past year with international collaborators to assemble and analyze the genome sequence of narrow-leafed lupin (cultivated as a high-protein seed crop, much like soybean), and have added the genome sequences for red clover, mungbean, and adzuki bean to the Legume Information System (LIS). We have also continued improving other aspects of LIS and PeanutBase for a better user experience and greater data-handling capacity. These web resources now provide access to the genome and gene sequences of eleven legume species: common bean, pigeonpea, soybean (via SoyBase), chickpea, Medicago truncatula and Lotus japonicus (two forage and model research species), Arachis duranensis and Arachis ipaensis (two wild relatives of peanut), red clover, mungbean, and adzuki bean. LIS also has a new viewer for interactively displaying wild peanut species and accessions on a geographical map. These resources will enable plant breeders and researchers to develop new crop varieties with favorable yield, disease resistance, or stress tolerance characteristics more rapidly. Work on LIS in the past year has included: improved viewers for genes in gene families (groups of related genes); new search capabilities for features such as plant traits, genetic markers, and genes; new gene descriptors for red clover, mungbean, and adzuki bean; and many improvements and features in the genome browsers for the eight legume species in LegumeInfo, PeanutBase, and SoyBase.
Work on PeanutBase in the last year has included: publication of the genome sequences of the diploid progenitors of cultivated peanut (Nature Genetics, February 2016); two new gene expression atlases for peanut, showing which genes are expressed under drought and under nematode attack in resistant and susceptible varieties; a viewer for interactively displaying wild peanut species and accessions on a geographical map; new genetic trait and marker information; and new search tools for publications and datasets. Outreach to describe PeanutBase included a presentation to about 200 peanut researchers, growers, and industry representatives at the main U.S. peanut research meeting (American Peanut Research and Education Society) in Clearwater, Florida. Both the PeanutBase and Legume Information System projects have also included substantial outreach to other database software developers through sharing data-collection templates and software modules that can be reused in other contexts, for example, modules for sequence search and display, and viewers for visualizing evolutionary relationships among related genes from different species. In the past year, this outreach has increased through an NSF-funded project with ARS participation.

4. Accomplishments
1. First published description of the genome sequences of the two wild ancestors of cultivated peanut. Peanut is very important in human nutrition, providing a calorie-dense, versatile, high-protein food that is unusual in being palatable without cooking or other preparation. An international consortium of researchers, with key USDA-ARS participation by scientists from Ames, Iowa; Tifton and Athens, Georgia; and Stoneville, Mississippi, has sequenced the genomes of the two closest wild ancestors of cultivated peanut. These ancestors merged to form a new species, which was domesticated to become modern cultivated peanut. An important finding of this research is that the unusual hybridization of these two species was likely a direct result of the activities of early agriculturalists in South America. Because cultivated peanut arose from the merger of these two species, their genome sequences together represent essentially all of the genetic material in modern cultivated peanut. Plant researchers and breeders will use this research to select improved peanut varieties more efficiently and to speed development of varieties well suited to growing in various regions of the world. The genome sequences have already helped identify mechanisms of resistance to root-knot nematodes and rust (a fungal disease), which are serious challenges for many peanut farmers.

2. Genome sequence of a newly identified strain of Bean common mosaic virus and the responses of two bean varieties to infection. Common bean, grown both for dry seed and as a fresh vegetable (snap beans), is the legume crop with the highest direct consumption worldwide. Beans also play an important role in sustainable agriculture through their ability to fix atmospheric nitrogen, which reduces the need for nitrogen fertilizer. Viruses are a significant problem for bean growers. ARS researchers in Ames, Iowa, sequenced the genes and measured gene expression in two bean varieties in response to infection by two strains of a major bean virus, Bean common mosaic virus (BCMV). This work also determined the full genome sequences of two BCMV strains: one previously known and one newly reported in this work. An important result is that bean plants respond to BCMV by generating new gene forms through alternative splicing of gene sequences. Understanding these varied plant responses and the distinct BCMV strains will help in diagnosing and avoiding viral infection in bean crops in the U.S. and worldwide.

5. Significant Activities that Support Special Target Populations:

Review Publications
Bertioli, D., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E., Dash, S., Liu, X., Barkley, N.L., Guo, B., Scheffler, B.E., et al. 2016. The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nature Genetics. 48:438-446. doi: 10.1038/ng.3517.

Yangcheng, H., Belamkar, V., Cannon, S.B., Jane, J. 2016. Characterization and development mechanism of Apios americana tuber starch. Carbohydrate Polymers. 151:198-205. doi:10.1016/j.carbpol.2016.05.062.

Dash, S., Campbell, J.D., Cannon, E., Cleary, A.M., Huang, W., Kalberer, S.R., Karingula, V., Rice, A.G., Singh, J., Umale, P.E., Weeks, N.T., Wilkey, A.P., Farmer, A.D., Cannon, S.B. 2016. Legume Information System: a key component of a set of federated data resources for the legume family. Nucleic Acids Research. 44(D1):D1181-D1188. doi: 10.1093/nar/gkv1159.

Gazave, E., Tassone, E.E., Ilut, D.C., Wingerson, M., Datema, M., Witsenboer, H., Davis, J.B., Grant, D.M., Dyer, J.M., Jenks, M.A., Brown, J., Gore, M.A. 2016. Population genomic analysis reveals differential evolutionary histories and patterns of diversity across subgenomes and subpopulations of Brassica napus L. Frontiers in Plant Science. 7:525. doi: 10.3389/fpls.2016.00525.

Dash, S., Cannon, E., Kalberer, S.R., Farmer, A.D., Cannon, S.B. 2016. PeanutBase and other bioinformatic resources for peanut. In: Stalker, T.H., Wilson, R.F., editors. Peanuts: Genetics, Processing, and Utilization (AOCS Monograph Series on Oilseeds). Waltham, MA: Academic Press and AOCS Press. p. 241-252.