
Research Project: Characterization of Genetic Diversity in Soybean and Common Bean, and Its Application toward Improving Crop Traits and Sustainable Production

Location: Soybean Genomics & Improvement Laboratory

2022 Annual Report

Objective
Objective 1: Discover QTL and genes controlling biotic and abiotic stress tolerance, and agronomic and quality traits in soybean and common bean and develop new DNA markers that define haplotype variation across new and previously identified genomic regions. [NP301, C1, PS1A; C3, PS3B] The aim of Objective 1 is to develop community resources for efficient identification of genes/QTL affecting a range of traits and to facilitate marker-assisted selection of alleles in soybean and common bean in collaboration with breeders. These resources include highly polymorphic markers, core germplasm collections, and genotypic datasets of new exotic elite germplasm introduced to the USDA Soybean Germplasm Collection.

Objective 2: Evaluate diverse soybean populations developed from hybridization with wild soybean to discover unique QTL controlling seed protein and oil content, develop molecular markers, and make these available to breeders for improving soybean quality. [NP301, C1, PS1A; C3, PS3B] Because many wild soybean accessions may carry alleles for high protein and oil content that differ from those in cultivated soybean, we will explore wild soybean for the improvement of U.S. soybean seed protein and oil content, using the markers developed under Objective 1 and genomic tools previously developed in our laboratory.

Objective 3: Characterize genetic diversity of the Soybean Rhizobium Germplasm Collection using whole genome sequencing, evaluate nitrogen fixation efficiency of the core strains, and use the information to identify rhizobium genes associated with host-specific nodulation and nitrogen fixation in specific soybean genotype/rhizobium symbioses. [NP301, C1, PS1A; C3, PS3B] The genetic diversity of the rhizobia will be evaluated using genomic information, and their influence on nitrogen fixation efficiency in soybean will be analyzed. The research will identify efficient strains and genes for enhanced nitrogen fixation in soybean, allowing better use of the diversity of rhizobium strains and soybean ancestors to improve biological nitrogen fixation in commercial soybean cultivars.

Approach
Objective 1: Solexa short genomic DNA sequences from 16 diverse genotypes of different common bean market classes will be aligned to the common bean whole genome sequence (WGS) for SSR marker discovery. After filtering, primer sets will be designed to amplify the SSRs. A subset of 100 primer pairs will be randomly selected and tested for polymorphism using genomic DNA from the 16 diverse common bean genotypes. A total of 12 pairs of diverse genotypes from different market classes of the Andean Diverse Panel of common bean will be sequenced. Called SNPs will be filtered on a number of factors for the beadchip assay. SNPs that are polymorphic within multiple market classes will be added to the Illumina Infinium BARCBean6K_3 BeadChip pool or used as KASP markers to fine map genes/QTL in targeted genomic regions. Based on the SNP data of the >18,000 cultivated soybean accessions assayed with the SoySNP50K BeadChip, core sets of soybean accessions will be created for each soybean maturity group. The software Core Hunter 3 will be used to select core collections with high allelic richness.

Objective 2: A nested association mapping panel consisting of 150-300 F6 lines from each of 10 crosses of NC-Raleigh x wild soybean from the wild soybean core collection will be developed. The parents and the RILs will be grown in the field at two locations in two years. DNA isolated from the RILs and parents will be genotyped with Illumina BARCSoySNP6K BeadChips. Protein content and oil content of the parents and lines will be measured using a DA 7250 NIR Analyzer. The dataset will be used to identify QTL, genes and haplotypes controlling high seed protein and oil content in wild soybean, which can be used to improve cultivated soybean, and to estimate the accuracy of genomic selection.

Objective 3: Genomic DNA of 760 soybean Bradyrhizobium strains will be isolated and sequenced using a NextSeq500 sequencer. The resulting sequences will be aligned to the WGS of the B. japonicum strain USDA110 for variant discovery. Redundant or highly similar strains, defined as those with 99.9% similarity, will be identified among the soybean rhizobia. One accession from each cluster of strains with 99.9% similarity will be evaluated for nitrogen fixation efficiency using 8 ancestral cultivars, which contribute more than 70% of the genetic diversity of the Southern and Northern American elite cultivars. Plants will be measured for chlorophyll content and biomass with and without inoculation with the strains, and scored for vegetative growth relative to plants inoculated with USDA110, a recommended soybean strain. The tests on the eight ancestors will be carried out in a greenhouse with replications.
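The redundancy step described for Objective 3 can be illustrated with a minimal sketch. It assumes pre-aligned, equal-length sequences, a simple per-position identity measure, and single-linkage grouping; none of these choices is stated in the report, and the strain names `S1` to `S3` and all function names are hypothetical.

```python
from itertools import combinations

def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_strains(seqs: dict[str, str], threshold: float = 0.999) -> list[list[str]]:
    """Single-linkage clustering: two strains share a cluster if identity >= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

def representatives(clusters: list[list[str]]) -> list[str]:
    """Pick one accession per cluster (here simply the first name alphabetically)."""
    return [members[0] for members in clusters]

def mutate(seq: str, positions) -> str:
    """Toy helper: change the base at each given position to a different base."""
    chars = list(seq)
    for p in positions:
        chars[p] = "C" if chars[p] == "A" else "A"
    return "".join(chars)

# Hypothetical toy data: S2 differs from S1 at 1 of 2000 sites, S3 at 10 of 2000.
base = "ACGT" * 500
seqs = {"S1": base, "S2": mutate(base, [0]), "S3": mutate(base, range(100, 110))}
clusters = cluster_strains(seqs)
print(clusters)
print(representatives(clusters))
```

In practice the representative of each cluster would be chosen on criteria such as DNA quality or strain history rather than name order; the alphabetical rule here only keeps the sketch deterministic.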

Progress Report
Progress was made in developing a database of Short Sequence Repeat (SSR) markers for common bean. We screened the whole genome sequence of the Phaseolus vulgaris v1.0 assembly and identified a total of 85,646 SSRs with di-, tri- or tetra-nucleotide motifs repeated five or more times. Among these, 16,225, 2,597 and 97 SSRs had di-nucleotide (≥10), tri-nucleotide (≥8) and tetra-nucleotide (≥7) repeat units, respectively. As in soybean, (AT)n, (ATT)n and (AAAT)n are the most abundant motifs among di-, tri- and tetra-nucleotide SSRs, respectively. In addition, we previously sequenced a set of 52 diverse Andean common bean accessions from different market classes (beige, dark red kidney, small red, yellow, red mottled, brown, purple mottled, cranberry, light red kidney, white and purple speckled) of the Andean Diverse Panel, with an average of 25x genome sequence coverage per genotype, and identified 1.2 million indels among the genotypes. The indels associated with the SSRs were used to further evaluate the polymorphism of the SSRs. After screening on a number of factors, including locus specificity as determined by e-PCR and polymorphism among the accessions, a common bean SSR database (BARCPvSSR_1.0) containing the genome position, primer sequences and motif type of 13,700 SSRs was created. To examine the likelihood that primers in the database would amplify locus-specific polymorphic products, 1,152 primer sets were evaluated by amplifying DNA from 18 diverse common bean accessions from the following market classes: dark red kidney, light red kidney, black, navy, pinto, great northern, yellow, and tan. A total of 1,064 (92.4%) of the primer sets amplified a single polymerase chain reaction (PCR) product, and 759 (65.9%) produced polymorphic amplicons as determined by 3.5% agarose gel electrophoresis.
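As an illustration of the kind of motif screen described above, the following sketch scans a sequence for perfect di-, tri- and tetra-nucleotide repeats with a regular expression. It is not the pipeline used for BARCPvSSR_1.0: the minimum repeat counts of 10, 8 and 7 are assumed from the thresholds quoted in the text, the function name `find_ssrs` and the toy sequence are hypothetical, and e-PCR and polymorphism filtering are not modeled.

```python
import re

# Assumed minimum number of repeat units per motif length (di, tri, tetra).
MIN_REPEATS = {2: 10, 3: 8, 4: 7}

def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run such as AAAA, not a di/tri/tetra SSR
            if motif_len == 4 and motif[:2] == motif[2:]:
                continue  # ATAT is a di-nucleotide repeat, not a tetra motif
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)

# Hypothetical toy sequence: (AT)12, (ATT)8 and (AAAT)7 separated by spacers,
# plus an (AG)5 run that falls below the di-nucleotide threshold.
toy = ("GGC" + "AT" * 12 + "CCGC" + "ATT" * 8 + "GACG"
       + "AAAT" * 7 + "CC" + "AG" * 5 + "TT")
ssrs = find_ssrs(toy)
for start, motif, count in ssrs:
    print(f"{start}\t({motif}){count}")
```

Start positions are 0-based offsets into the input string; a real screen would also record the chromosome and convert to the coordinate system of the assembly.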
We have provided some SSR markers to USDA-ARS scientists at East Lansing, Michigan, Prosser, Washington, and Beltsville, Maryland, for mapping genes that control the common bean slow darkening trait and disease resistance.

Under Objective 2, we completed the second-year field test of the populations derived from crosses between cultivated and wild soybean, which were developed and genotyped to discover quantitative trait loci (QTL) controlling seed protein and oil content in wild soybean. More than 1,000 recombinant inbred lines (RILs) derived from the ten G. max × G. soja crosses (NC-Raleigh × PI 549032, NC-Raleigh × PI378684B, NC-Raleigh × PI378690, NC-Raleigh × PI378696B, NC-Raleigh × PI407020, NC-Raleigh × PI407228, NC-Raleigh × PI424007, NC-Raleigh × PI424045, NC-Raleigh × PI424083A, and NC-Raleigh × PI562551) were grown in North Carolina with one replication and at Beltsville with two replications in the second year. Protein content and oil content for >3,000 plots were obtained in collaboration with researchers at the University of Georgia. Because all lines and parents were genotyped with the BARCSoySNP6K assay and genomic variants of the parents were obtained by whole genome sequencing, we imputed the RIL genotypes from the parents' genotypes using optimized parameters in the computer program AlphaPlantImpute. The imputed dataset is being used to detect alleles from wild soybean that control protein and oil content. In addition, in collaboration with a scientist at USDA-ARS, St. Louis, Missouri, we identified a CCT-motif gene underlying the most important QTL for protein and oil content on chromosome 20. The analysis drew on linkage maps and QTL identified in one of the G. max × G. soja populations developed at USDA-ARS, Beltsville, Maryland, together with transcriptome analysis and other evidence.
A 304-bp transposable element insertion in the CCT gene was associated with a 5.5% oil increase and a 5.7% protein decrease. Similar regulation of seed traits by the CCT gene was also observed in Arabidopsis. The results lay a foundation for understanding the molecular mechanism underlying protein and oil content and for designing strategies to improve soybean seed quality effectively. The manuscript describing the study has been published in Nature Communications.

Progress was made in characterizing the genetic diversity of the USDA Rhizobium Germplasm Collection using whole genome sequencing under Objective 3. All 760 soybean Bradyrhizobium strains were cultured for genomic DNA extraction, and high-quality DNA was successfully obtained from 650 strains. Sufficient high-quality DNA could not be obtained from the remaining strains because of slow growth, low DNA yield or DNA degradation. The 650 strains are being sequenced in collaboration with the Joint Genome Institute (JGI), Department of Energy. Assembled whole genome sequences for approximately 100 of the isolates are available for public access at the JGI website. Of the 650 accessions, a diverse set of 100 soybean rhizobium accessions was selected for evaluation on eight soybean ancestors (Lincoln, Mandarin (Ottawa), CNS, Richland, S-100, Ogden, AK (Harrow) and Dunfield) as well as Williams 82. Leaf chlorophyll content, the number of root nodules and plant growth were measured in the greenhouse with four replications. Distinct differences in chlorophyll content and vegetative growth were consistently observed among soybean plants inoculated with different strains. The results showed that some strains, such as USDA 20, 27, 36, 45, 48, 50, 64, and 70, fixed nitrogen efficiently across all ancestors. Other strains were restricted by specific soybean genotypes: for example, CNS restricts USDA 1 and USDA 7, Dunfield restricts USDA 67 and USDA 83, and Ogden restricts USDA 74.
The rhizobium sequence variants and phenotypic information will be used to discover candidate variants and genes associated with nitrogen fixation efficiency in rhizobium.

Progress was made in the discovery of genes or QTL controlling disease resistance, agronomic and seed composition traits in soybean and common bean. Molecular markers and assays developed by USDA-ARS scientists at Beltsville, Maryland, such as the BARCSoySNP6K and 3K for soybean and the BARCBean12K for common bean, were used to analyze soybean and common bean genetic populations created by collaborators across the U.S. and in other countries. In soybean, the analyses resulted in the mapping of genomic regions or genes controlling numerous traits, including aluminum tolerance, seed protein and major seed composition traits, in collaboration with researchers at the University of Missouri and the University of Georgia, and in the cloning of a high protein gene from wild soybean in collaboration with researchers at the Danforth Plant Science Center in St. Louis. In common bean, the analyses led to the mapping of loci associated with soybean cyst nematode resistance and with seed quality traits, including cooking time, flavor, and texture in yellow dry bean, in collaboration with researchers at USDA-ARS, East Lansing, Michigan, and the University of Arkansas.

Accomplishments
1. Discovery of the gene controlling a major protein content locus in soybean. The seed protein content of U.S. commercial cultivars released over the past several decades has gradually decreased to a level below the threshold used in the animal feed industry, which reduces the competitiveness of U.S. soybean in the world market. Wild soybean is a valuable source for improving the seed quality of cultivated soybean because of its higher protein content. Previous studies reported a major locus on chromosome 20 controlling high protein content from wild soybean; however, the gene underlying the locus was unknown. Using a multi-disciplinary approach, USDA-ARS scientists at Beltsville, Maryland, and St. Louis, Missouri, discovered a key gene on that chromosome regulating soybean seed protein content. A 304-bp transposable element insertion in the CCT gene was responsible for the decreased protein content in soybean. The results lay a foundation for understanding the molecular mechanism underlying the trait and for designing new strategies to improve soybean seed quality efficiently through genome editing and molecular breeding approaches.

2. Identification of the best tools for soybean genotype imputation. In modern breeding programs, germplasm frequently must be genotyped with a large number of molecular markers in order to track or identify genomic regions associated with traits of interest. Although sequencing is cheaper than ever before, it is still costly for large-scale studies. Genotype imputation is a method to infer the marker genotypes of breeding lines using markers in a reference population, but the best software packages and their imputation accuracy for self-pollinated crops like soybean were not known. USDA-ARS scientists and university collaborators compared the imputation performance of three commonly used imputation software packages in soybean populations. They identified the most efficient software and optimized its parameters for imputation in soybean. They also demonstrated that imputed datasets could significantly narrow the intervals of genomic regions controlling seed quality traits, thus improving the efficiency of candidate gene identification. The information will help breeders and geneticists improve genotype imputation accuracy not only in soybean but in other self-pollinated crops as well. The results will facilitate fine-mapping of genes controlling different traits and their downstream use by soybean breeders.
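The idea of imputing line genotypes from a sparse marker set, and of scoring imputation accuracy, can be sketched as follows. This is a deliberately simplified illustration and not the method of any of the packages compared: it assumes a recombinant inbred line whose markers are coded by parental origin ("A" or "B"), fills an untyped marker only when the nearest typed markers on both sides agree, and counts markers left unimputed as errors. All names and the toy data are hypothetical.

```python
from typing import Optional

Call = Optional[str]  # "A" or "B" = parental origin; None = marker not typed

def impute_ril(calls: list[Call]) -> list[Call]:
    """Fill untyped markers lying between two typed markers of the same parent."""
    imputed = list(calls)
    typed = [i for i, c in enumerate(calls) if c is not None]
    for left, right in zip(typed, typed[1:]):
        if calls[left] == calls[right]:
            # No evidence of a crossover between the flanking markers.
            for i in range(left + 1, right):
                imputed[i] = calls[left]
        # Otherwise a crossover lies somewhere in the interval; leave it untyped.
    return imputed

def accuracy(truth: list[str], imputed: list[Call], masked: list[int]) -> float:
    """Fraction of masked markers whose imputed call equals the true call."""
    correct = sum(imputed[i] == truth[i] for i in masked)
    return correct / len(masked)

# Hypothetical line with one crossover between marker 4 and marker 5.
truth = list("AAAAABBBBB")
typed_idx = [0, 3, 6, 9]                      # markers on the low-density assay
masked = [i for i in range(len(truth)) if i not in typed_idx]
sparse: list[Call] = [c if i in typed_idx else None for i, c in enumerate(truth)]

imputed = impute_ril(sparse)
acc = accuracy(truth, imputed, masked)
print(imputed)
print(f"accuracy on masked markers: {acc:.3f}")
```

Masking known genotypes and comparing them with the imputed calls is one common way to estimate accuracy; production tools additionally use map distances and population structure, which this sketch ignores.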

3. Identification of soybean genetic resources for aluminum tolerance. About 8% of Earth's crust is aluminum, mostly in the form of harmless oxides and aluminosilicates. In acidic soil, however, aluminum can be solubilized into forms toxic to plants. Aluminum toxicity interferes with nutrient and water absorption, disrupts calcium homeostasis, inhibits root elongation and lateral root initiation, and consequently reduces crop yield or causes plant death. Genes that help alleviate aluminum toxicity have been studied in model plants like Arabidopsis and rice, but not in soybean. Researchers at the University of Missouri and USDA-ARS, Beltsville, Maryland, identified aluminum-tolerant soybean genetic resources and found two novel aluminum tolerance genes. They discovered that the aluminum tolerance resulted from leaf tissue tolerance rather than from aluminum exclusion or altered transport. The soybeans and novel genes controlling aluminum tolerance give breeders new resources to help eliminate the effects of aluminum toxicity on commercial soybean production.

Review Publications
Chen, L., Yang, S., Araya, S., Quigley, C.V., Taliercio, E.W., Mian, R.M., Specht, J., Diers, B., Song, Q. 2022. Genotype imputation for soybean nested association mapping population to improve precision of QTL detection. Theoretical and Applied Genetics. 135(5):1797-1810.
Mian, R.M., Mcneece, B.T., Gillen, A.M., Carter Jr, T.E., Bagherzadi, L. 2021. Registration of USDA-N6005 germplasm combining high yield, elevated protein and 25% pedigree from Japanese cultivar Tamahikari. Journal of Plant Registrations.
Zhang, H., Jiang, H., Hu, Z., Song, Q., An, Y. 2022. Development of a versatile resource for post-genomic research through consolidating and characterizing 1500 diverse wild and cultivated soybean genomes. BMC Genomics. 23. Article 250.
Wei, H., Zhang, H.Y., Liang, Y., Li, J.Y., Li, H.C., Song, Q., Wu, Y., Lei, C.F., Wang, S.W., Wang, J.S., Lu, W.G. 2022. Identification of candidate genes controlling soybean cyst nematode resistance in ‘Handou 10’ based on genomic and transcriptome analyses. Frontiers in Plant Science. 13. Article 860034.
Goettel, H.W., Zhang, H., Li, Y., Qiao, Z., Jiang, H., Hou, D., Song, Q., Pantalone, V., Song, B., Yu, D., An, Y. 2022. POWR1 is a domestication gene pleiotropically regulating seed quality and yield in soybean. Nature Communications. 13. Article 3051.
Gomes Viana, J.P., Fang, Y., Avalos, A., Song, Q., Nelson, R., Hudson, M.E. 2022. Impact of multiple selective breeding programs on genetic diversity in soybean germplasm. Theoretical and Applied Genetics. 135(5):1591-1602.
Arnold, B., Menke, E., Mian, R.M., Song, Q., Buckley, B., Li, Z. 2021. Mining QTL for elevated protein and other major seed composition traits from diverse soybean germplasm. Molecular Breeding.
Bayer, P.E., Valliyodan, B., Hu, H., Marsh, J., Yuan, Y., Vuong, T.D., Patil, G., Song, Q., Batley, J., Varshney, R.K., Lam, H., Edwards, D., Nguyen, H.T. 2021. Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding. The Plant Genome. 15(1):e20109.
Viscarra-Torrico, R., Pajak, A., Garzon, A., Zhang, B., Pandurangan, S., Diapari, M., Song, Q., Conner, R.L., House, J.D., Miklas, P.N., Hou, A., Marsolais, F. 2021. Common bean (Phaseolus vulgaris L.) with increased cysteine and methionine concentration. Legume Science.
Shi, A., Gepts, P., Song, Q., Xiong, H., Michaels, T.E. 2021. Genome-wide association study and genomic prediction for soybean cyst nematode resistance in USDA common bean (Phaseolus vulgaris) core collection. Frontiers in Plant Science. 12:624156.
Li, Y., Ye, H., Song, L., Vuong, T.D., Song, Q., Zhao, L., Shannon, J., Li, Y., Nguyen, H.T. 2021. Identification and characterization of novel QTL conferring internal detoxification of aluminium in soybean. Journal of Experimental Botany. 72(13):4993-5009.
Bassett, A.N., Katuuramu, D.N., Song, Q., Cichy, K.A. 2021. QTL mapping of seed quality traits including cooking time, flavor, and texture in a yellow dry bean (Phaseolus vulgaris L.) population. Frontiers in Plant Science. 12:670284.
He, T., Ding, X., Zhang, H., Chen, L., Wang, T., Yang, L., Nie, Z., Song, Q., Gai, J., Yang, S. 2021. Comparative analysis of mitochondrial genomes of soybean cytoplasmic male sterile lines and their maintainer lines. Frontiers in Plant Science. 21(1):43-57.