Research Project: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

2019 Annual Report

Objective 1: Accelerate trait analyses, germplasm analyses, genetic studies, and breeding of soybean and other economically important legume crops through stewardship of genomes, genetic data, genotype data, and phenotype data.
Objective 2: Develop an infrastructure that enhances the integration of genotype and phenotype information and corresponding data sets with query and visualization tools to facilitate efficient plant breeding for soybean and select legume crops.
Objective 3: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability.
Objective 4: Provide support and research coordination services for the soybean and other legume research and breeding communities; train new scientists and expand outreach activities through workshops, web-based tutorials, and other communications.

Incorporate revised primary reference genome sequence for soybean into SoyBase. House and provide access to genome sequences for other soybean accessions, haplotype data, and related annotations. Incorporate revised gene models and annotations into SoyBase. Install or implement web-based tools for curation and improvement of soybean gene models and gene annotations. Incorporate available legume genome sequences and annotations. Working with collaborators, collect and add genetic map and QTL data for crop legumes. Extend web-based tools for navigation among biological sequence data across the legumes. Extend and develop methods and storage capacity for accepting genomic data sets for soybean and other legume species. Develop a complete set of descriptors (ontologies) for soybean biology (anatomy, traits, and development), and for other significant crop legumes as needed. Work with the relevant ontology communities-of-practice to incorporate these descriptors into broadly accessible ontologies. Develop web tutorials for important typical uses of SoyBase and the Legume Clade Database. Present and train about features at relevant conferences and workshops. Regularly seek feedback from users about desired features and usability.

Progress Report
Work in the Legume Clade Database project had four main focuses over the past year: (1) completing, describing, and incorporating several genome assemblies; (2) examining and describing major evolutionary events in the legume family; (3) developing methods for comparing and incorporating large genotyping data sets for soybean; and (4) website and database development.
In the first focus area, the group participated in the completion and publication of four soybean genomes: two wild soybean accessions and two widely used cultivated varieties. This work was conducted in collaboration with researchers at the University of Missouri, the University of Western Australia, the HudsonAlpha Institute for Biotechnology, and the Chinese University of Hong Kong. Also completed in this project period were high-quality genome assemblies of peanut and cowpea, both involving international collaborations. The genome assemblies for these important crops will help researchers predict gene function more accurately and select for important agricultural traits more efficiently.
In the second focus area, the group reported important features of the evolutionary history of the legume family, including a reconstruction of chromosome evolutionary history for all crop legume species (e.g. chickpea, common bean, cowpea, peanut, alfalfa). This work provides a roadmap for researchers to trace which genes correspond among these species. Because genes are typically under local control within chromosomes, chromosomal disruptions that occur over the course of evolution can change how genes are regulated and which genetic markers they are associated with. The group also reported older evolutionary features in the legume family, including an early-diverging group of species that have a simpler genomic structure than all other legume species examined to date ("simpler" in the sense of lacking a genome duplication that is seen in other legume species). This discovery provides a basis for understanding how genes that affect important agronomic traits have changed and diversified among the many crop legume species.
In the third focus area, the group worked out methods for incorporating and comparing large genotyping data sets in soybean (and, in principle, in any species). Genotyping experiments involve determining the DNA letters at many (potentially millions of) chromosomal locations for many (potentially thousands of) cultivars or varieties. Preparing the data to allow comparisons between experiments is nontrivial but important. Enabling these comparisons helps unlock results from disparate research projects and from different countries and research groups. Comparisons can be used to test and validate research results across studies, and can help identify germplasm resources (cultivars or wild accessions) for use in breeding projects, for example to identify and incorporate traits such as salt tolerance or disease resistance.
In the fourth focus area, the group continued development of the project websites: SoyBase and LegumeInfo/Legume Information System (LIS). SoyBase has added four new genome assemblies and associated gene predictions, along with tools for browsing and exploring these resources. Two of the assemblies are for important U.S. cultivars and two are for distinct wild soybean relatives (from the species Glycine soja). The wild relatives can be crossed with cultivated soybean and may be used for breeding new varieties with improved disease resistance and drought and salt tolerance. LegumeInfo/LIS has also received numerous additions and updates, including genome assemblies and associated resources for peanut and cowpea, and a new interactive viewer for comparing multiple genome assemblies. The data has also been made available in a search tool called InterMine for exploring data for five crop legumes (common bean, chickpea, cowpea, peanut, and soybean). The work on LegumeInfo/LIS has been carried out under a Cooperative Agreement with the National Center for Genome Resources in Santa Fe, New Mexico.
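As a rough illustration of the genotype comparisons described in the Progress Report above, the Python sketch below counts how often two genotyping experiments agree at the chromosomal locations and accessions they share. It is not the project's actual method. The in-memory data layout, the accession names, and the function names `normalize` and `concordance` are hypothetical, and the sketch assumes both data sets already use coordinates from the same reference genome assembly and report alleles on the same strand, which in practice is much of the nontrivial preparation.

```python
from typing import Dict, Tuple

# Hypothetical layout: {(chromosome, position): {accession: genotype_call}}
Calls = Dict[Tuple[str, int], Dict[str, str]]


def normalize(call: str) -> str:
    """Make an unphased diploid call order-independent, e.g. 'G/A' -> 'A/G'."""
    return "/".join(sorted(call.upper().split("/")))


def concordance(a: Calls, b: Calls) -> Tuple[int, int]:
    """Return (matching, compared) over sites and accessions found in both sets."""
    matching = 0
    compared = 0
    for site in a.keys() & b.keys():              # shared chromosomal locations
        for acc in a[site].keys() & b[site].keys():  # shared accessions
            compared += 1
            if normalize(a[site][acc]) == normalize(b[site][acc]):
                matching += 1
    return matching, compared


# Made-up example data for two experiments (not real genotypes).
exp1: Calls = {
    ("Chr01", 1000): {"accession_A": "A/A", "accession_B": "A/G"},
    ("Chr01", 2000): {"accession_A": "C/C"},
}
exp2: Calls = {
    ("Chr01", 1000): {"accession_A": "A/A", "accession_B": "G/A"},
    ("Chr02", 500): {"accession_A": "T/T"},
}

print(concordance(exp1, exp2))  # (2, 2)
```

Only the site and accessions present in both experiments are compared, so the second number reports how much overlap there actually was. A low overlap is itself a useful signal that the two data sets need further harmonization before results are compared.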

1. Publication of the genome sequence for cowpea. Cowpea is a highly nutritious food, used both for its seed and leaves. It is relatively tolerant of drought and heat and is a vital food source in many countries, particularly in Sub-Saharan Africa, where cowpea originated. Cowpea is also valuable for its ability, shared with many legume crops, to fix atmospheric nitrogen into a form of fertilizer that is directly usable by plants, enabling cowpea to be grown on relatively nutrient-poor soils. ARS researchers in Ames, Iowa, through a collaboration with researchers at the University of California, Davis, reported the complete genome sequence and predicted genes for cowpea. This research identifies a large genomic change in a chromosomal region that is responsible for resistance to a parasitic plant (Striga, or "witchweed") that is a serious problem in cowpea fields in Africa. The research also identifies a probable genetic factor that is responsible for the desirable traits of large seed and pod size in cowpea. This work will help breeders and other scientists develop improved cowpea varieties more rapidly, to benefit both small-holder farmers and consumers worldwide.

2. Publication of the genome sequence for peanut. Peanut is a major crop in the U.S. and in many countries around the world. It is particularly important as a subsistence crop in many developing countries, where its high protein and oil content and its edibility without cooking make it especially valuable. Working with a large international research consortium, including ARS collaborators in Stoneville, Mississippi, and Tifton, Georgia, ARS researchers in Ames, Iowa, reported the complete genome sequence for a variety of peanut that is important in the U.S. The genome sequence consists of all of the DNA letters for all of the chromosomes. It provides a valuable road map for breeders and plant biologists. The genome sequence contains all the genes that control the characteristics of the plant. The sequence can also be used to identify genetic markers that plant breeders use to more efficiently target and improve traits of interest. Among the new findings from this research is an explanation for a unique way in which new variation arises to generate new peanut varieties: sequences are exchanged between sets of corresponding chromosomes. This mechanism is uncommon. It is seen in peanut because domesticated peanut contains two complete sets of chromosomes that are highly similar, but not identical. The similarity allows occasional pairings and exchanges between the similar chromosomes. The genome sequence of peanut will help breeders and other researchers improve this important crop and develop varieties with improved nutrition and better response to environmental stresses.

3. Publication of the genome sequence for wild soybean. Plant breeders and scientists work to identify which genes are responsible for important traits (yield, nutrition, etc.) and where these genes are located within the DNA of the species of interest. Sequencing a species' genome, down to the level of individual DNA bases, helps researchers link genes with traits. The soybean genome sequence, developed from one variety, has been available for the last eight years and has enabled many discoveries about gene function. However, more rapid progress could be made if multiple genome sequences, for distinct soybean varieties, could be examined to see how DNA changes alter particular traits. The work reported by ARS researchers in Ames, Iowa, conducted as part of a collaboration with researchers at the Chinese University of Hong Kong, is the complete, high-resolution sequence of approximately one billion DNA bases for the closest wild relative of soybean (Glycine soja). This sequence will allow researchers to determine how changes occurred during domestication, for example by identifying the genes that changed wild soybean, a vine with small, hard seeds, into domesticated soybean, a robust, short plant with larger, softer seeds. The genome sequence for wild soybean is also useful because it will help researchers pinpoint more precisely the regions responsible for valuable traits, and will help breeders move desirable genes into cultivated varieties more efficiently, to benefit farmers and consumers worldwide.

Review Publications
Min, X., Yik-Lok, C., Man-Wah, L., Fuk-Ling, W., Xin, W., Ailin, L., Zhili, W., King-Yung, A., Tin-Hang, W., Suk-Wah, I., Zhixia, X., Kejing, F., Ming-Sin, N., Linfeng, Y., Tianquan, D., Lijuan, H., Lu, C., Aisi, F., Qiong, D., Junxian, H., Gyuhwa, C., Sachiko, I., Babu, V., Nguyen, H., Cannon, S.B., Foyer, C.H., Ting-Fung, C., Hon-Ming, L. 2019. A reference-grade wild soybean genome. Nature Communications. 10:1216.
Stai, J.S., Yadav, A., Sinou, C., Bruneau, A., Doyle, J.J., Fernandez-Baca, D., Cannon, S.B. 2019. Cercis: A non-polyploid genomic relic within the generally polyploid legume family. Frontiers in Plant Science. 10:345.
Ren, L., Huang, W., Cannon, S.B. 2019. Reconstruction of ancestral genome reveals chromosome evolution history for selected legume species. New Phytologist.
Assefa, T., Rubyogo, J., Mahama, A.A., Brown, A.V., Cannon, E., Blair, M.W., Cannon, S.B. 2019. A review of breeding objectives, genomic resources, and marker-assisted methods in common bean (Phaseolus vulgaris L.). Molecular Breeding. 39:20.
Bertioli, D.J., Jenkins, J., Clevenger, J., Dudchenko, O., Gao, D., Seijo, G., Leal-Bertioli, S., Ren, L., Farmer, A., Pandey, M., Samoluk, S.S., Abernathy, B., Agarwal, G., Ballen-Taborda, C., Cameron, C., Campbell, J., Chavarro, C., Chitikineni, A., Chu, Y., Dash, S., El Baidouri, M., Guo, B., Huang, W., Kim, K.D., Korani, W., Lanciano, S., Lui, C.G., Mirouze, M., Moretzsohn, M.C., Pham, M., Shin, J.H., Shirasawa, K., Sinharoy, S., Sreedasyam, A., Weeks, N.T., Zhang, X., Zheng, Z., Sun, Z., Froenicke, L., Aiden, E.L., Michelmore, R., Varshney, R.K., Holbrook Jr, C.C., Cannon, E.K., Scheffler, B.E., Grimwood, J., Ozias-Akins, P., Cannon, S.B., Jackson, S.A., Schmutz, J. 2019. The genome sequence of segmental allotetraploid peanut Arachis hypogaea. Nature Genetics. 51:877-884.
Brown, A.V., Campbell, J.D., Assefa, T., Grant, D.M., Nelson, R., Weeks, N.T., Cannon, S.B. 2018. Ten quick tips for sharing open genomic data. PLoS Computational Biology.
Lonardi, S., Munoz-Amatriain, M., Liang, Q., Shu, S., Wanamaker, S., Lo, S., Tanskanen, J., Zhu, T., Schulman, A.H., Luo, M., Alhakami, H., Ounit, R., Abid, H., Verdier, J., Roberts, P.A., Santos, J., Ndeve, A., Dolezel, J., Vrana, J., Hokin, S.A., Farmer, A.D., Cannon, S.B., Close, T.J. 2019. The genome of cowpea (Vigna unguiculata [L.] Walp.). Plant Journal. 98(5):767-782.