
Research Project: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

2018 Annual Report

Objective 1: Accelerate trait analyses, germplasm analyses, genetic studies, and breeding of soybean and other economically important legume crops through stewardship of genomes, genetic data, genotype data, and phenotype data.

Objective 2: Develop an infrastructure that enhances the integration of genotype and phenotype information and corresponding data sets with query and visualization tools to facilitate efficient plant breeding for soybean and select legume crops.

Objective 3: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability.

Objective 4: Provide support and research coordination services for the soybean and other legume research and breeding communities; train new scientists and expand outreach activities through workshops, web-based tutorials, and other communications.

Incorporate revised primary reference genome sequence for soybean into SoyBase. House and provide access to genome sequences for other soybean accessions, haplotype data, and related annotations. Incorporate revised gene models and annotations into SoyBase. Install or implement web-based tools for curation and improvement of soybean gene models and gene annotations. Incorporate available legume genome sequences and annotations. Working with collaborators, collect and add genetic map and QTL data for crop legumes. Extend web-based tools for navigation among biological sequence data across the legumes. Extend and develop methods and storage capacity for accepting genomic data sets for soybean and other legume species. Develop a complete set of descriptors (ontologies) for soybean biology (anatomy, traits, and development), and for other significant crop legumes as needed. Work with the relevant ontology communities-of-practice to incorporate these descriptors into broadly accessible ontologies. Develop web tutorials for important typical uses of SoyBase and the Legume Clade Database. Present and train about features at relevant conferences and workshops. Regularly seek feedback from users about desired features and usability.

Progress Report
Work in the Legume Clade Database project in this project period has focused on finalizing the genome assemblies for three new soybean reference genomes: the cultivars Williams 82 (an updated assembly of the primary reference accession) and Lee (a new assembly of a southern U.S. accession), and Glycine soja (a wild soybean relative). This work was conducted in collaboration with researchers at the University of Missouri, the University of Western Australia, and the HudsonAlpha Institute for Biotechnology. These improved genome assemblies will help researchers predict gene function more accurately and select for important agricultural traits more efficiently. Also in the project period, the genome assembly and gene predictions for cultivated peanut were made available at PeanutBase and LegumeInfo. This represents the culmination of half a decade of concerted effort by U.S. and international researchers through the International Peanut Genome Initiative project. Another major project completed in the first quarter of the project period was a rewrite of the CMap software for displaying genetic maps. The CMap-js software is a highly responsive web application that replaces CMap, which uses much older and less responsive web technologies. The CMap-js software is anticipated to replace CMap in hundreds of instances worldwide. Lastly, the cowpea genome assembly and genes were incorporated into LegumeInfo. All of the above genomes and genes are available for browsing and searching through the respective project web portals: SoyBase, LegumeInfo, and PeanutBase.

1. Genetic maps are widely used in plant breeding and research to determine how genetic markers and genetic traits are localized on chromosomes. Visualizing these relationships requires software for displaying and exploring mapped traits and markers. For more than a decade, the predominant visualization software for this purpose has been CMap, but it was developed using older visualization methods and needed to be rewritten to accommodate higher marker densities and new characteristics of genomic data. The CMap-js software, written by ARS researchers in Ames, Iowa, is a complete rewrite of the widely used CMap software, employing current web technologies. CMap-js is available online. One example of the software, in use with genetic and genomic maps for common bean, is available at LegumeInfo. The CMap-js software is anticipated to replace CMap in hundreds of instances worldwide, and to play an important role in crop genetic improvement by helping researchers identify markers for important traits and identify corresponding genetic factors in related species.

Review Publications
Blair, M.W., Cortes, A.J., Farmer, A.D., Huang, W., Penmetsa, V., Cannon, S.B. 2018. Uneven recombination and linkage disequilibrium across a reference SNP map for common bean (Phaseolus vulgaris L.). PLoS One. 13(3):e0189597.