Research Project: Characterization and Utilization of Genetic Diversity in Soybean and Common Bean and Management and Utilization of the National Rhizobium Genetic Resource Collection

Location: Soybean Genomics & Improvement Laboratory

Title: Long-read sequencing reveals novel structural variation markers for key agronomic and quality traits of soybeans

Author
WANG, ZHIBO - Virginia Polytechnic Institution & State University
BELAY, KASSAYE - Virginia Polytechnic Institution & State University
PATERSON, JOE - Virginia Polytechnic Institution & State University
BEWICK, PATRICK - Virginia Polytechnic Institution & State University
SONGER, WILLIAM - Virginia Polytechnic Institution & State University
Song, Qijian
ZHANG, BO - Virginia Polytechnic Institution & State University
LI, SONG - Virginia Polytechnic Institution & State University

Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/17/2025
Publication Date: 4/8/2025
Citation: Wang, Z., Belay, K., Paterson, J., Bewick, P., Songer, W., Song, Q., Zhang, B., Li, S. 2025. Long-read sequencing reveals novel structural variation markers for key agronomic and quality traits of soybeans. Frontiers in Plant Science. 16. Article e1557748. https://doi.org/10.3389/fpls.2025.1557748.
DOI: https://doi.org/10.3389/fpls.2025.1557748

Interpretive Summary: Decades of research have shown that structural variation (SV), including deletions, insertions, duplications and chromosomal rearrangements, is an important element in plant evolution, affecting traits such as branch structure, flowering time, seed size and stress resistance. Third-generation long-read sequencing technology is revolutionizing plant genomics, providing unprecedented opportunities to identify SVs that short-read sequencing cannot reliably capture. Although a number of soybean genotypes have been sequenced, significant gaps remain in the available soybean whole-genome sequences. Most of the resequenced genotypes came from Chinese breeding programs and are not available in the United States, and sequencing efforts have focused primarily on animal feed genotypes rather than genotypes intended for human consumption as natto, edamame, bean sprouts, tofu and soy milk. Furthermore, almost all previously reported SVs were identified using short DNA sequence reads, which may be less reliable for identifying SVs. We resequenced 29 soybean varieties used for food consumption using nanopore long-read sequencing technology, identified SVs in these varieties from the long-read sequences, experimentally verified the association of SVs with soybean production and food quality traits, and deposited the sequences and SVs in the public domain. This study not only adds valuable resources for marker development, but also aids in understanding the mechanisms that control soybean traits and supports other basic and applied genetic research.

Technical Abstract: In plant genomic research, long-read sequencing has been widely used to detect structural variations that are not captured by short-read sequencing. In this letter, we describe an analysis of whole-genome re-sequencing of 29 soybean varieties using nanopore long-read sequencing. The compiled germplasm reflects diverse applications, including livestock feed, soy milk and tofu production, and consumption as natto, sprouts, and vegetable soybeans (edamame). We identified 365,497 structural variations in these newly re-sequenced genomes and found that the newly identified structural variations are associated with important agronomic traits. These traits include seed weight, flowering time, plant height, oleic acid content, methionine content, and trypsin inhibitor content, all of which significantly impact soybean production and quality. Experimental validation supports the roles of the predicted candidate genes and structural variants in these biological processes. Our research provides a new resource for rapid marker discovery in crop genomes using structural variation and whole-genome sequencing.