Location: Soybean and Nitrogen Fixation Research
Title: Soybean genome-wide association study of seed weight, protein, and oil content in the southeastern USA
Author
PATEL, JINESH - Auburn University
PATEL, SEJAL - Auburn University
COOK, LAUREN - Auburn University
Fallen, Benjamin
KOEBERNICK, JENNY - Auburn University
Submitted to: Molecular Genetics and Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/22/2025
Publication Date: N/A
Citation: N/A
Interpretive Summary: Soybean is a principal legume crop because of its two main seed components, protein and oil. With an expanding global population and rising demand for oil and protein feed, soybean cultivation has increased substantially worldwide. Over the last two decades, soybean production has more than doubled, from 161 to nearly 353 million tons. Improving soybean yield and quality has been a major objective of soybean breeding programs for decades, and recent efforts have focused on increasing seed oil and protein content. Soybean seeds consist of 40% protein, 20% oil, 35% carbohydrates, and 5% minerals on a dry weight basis, which underscores their nutritional significance. Improvements to protein or oil content could therefore have significant economic value. Seed weight is a pivotal yield component that could also be beneficial for improving seed composition. This study aimed to identify genomic regions and genes that control seed weight, oil, and protein content in maturity group (MG) V soybeans in the southeastern United States. A replicated field study of 285 diverse accessions was conducted over three years to measure seed weight, oil, and protein content. The data from this multi-year trial were analyzed with a genome-wide association study (GWAS) to characterize the genetic architecture of these traits. GWAS is a commonly used method for pinpointing genetic variants that affect complex traits: it combines genome-wide markers with phenotypic data to identify variants associated with the trait(s) of interest. The study provides a basis for selecting germplasm that can be integrated into breeding programs to optimize seed weight and improve oil and protein content simultaneously.
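The idea of combining genome-wide markers with phenotypic data can be illustrated with a minimal single-marker sketch in Python. This is not the model used in the study: the function name `marker_effect`, the 0/1/2 dosage coding, and the six data points are all hypothetical, and a real GWAS would also account for population structure and kinship and would test every marker with a multiple-testing correction.

```python
from statistics import mean


def marker_effect(dosages, phenotypes):
    """Regress a phenotype on allele dosage (0, 1, 2) for a single marker.

    Returns (slope, r_squared). Toy example only: no correction for
    population structure, kinship, or multiple testing.
    """
    if len(dosages) != len(phenotypes):
        raise ValueError("dosages and phenotypes must have the same length")
    g_mean, p_mean = mean(dosages), mean(phenotypes)
    sxx = sum((g - g_mean) ** 2 for g in dosages)
    syy = sum((p - p_mean) ** 2 for p in phenotypes)
    sxy = sum((g - g_mean) * (p - p_mean) for g, p in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic marker or constant phenotype
    slope = sxy / sxx                      # phenotype change per allele copy
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance explained
    return slope, r_squared


# Hypothetical data: six accessions, seed weight in grams per 100 seeds.
dosages = [0, 0, 1, 1, 2, 2]
seed_weight = [12.0, 12.4, 13.1, 13.3, 14.0, 14.4]
slope, r2 = marker_effect(dosages, seed_weight)
```

In this made-up example each additional copy of the allele is associated with roughly one extra gram per 100 seeds. Repeating such a test across all markers is what makes the scan genome-wide.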
Technical Abstract: Soybean is a globally significant legume crop that provides an essential source of protein and oil for human and livestock nutrition. Improving soybean yield and quality has always been a primary objective of soybean breeding. Recently, considerable effort has gone into increasing seed oil and protein content and into deciphering the genetic basis of these traits. Biparental populations have been instrumental in this process, and many new and improved cultivars have been developed. However, they come with inherent limitations, particularly limited genetic diversity and the labor-intensive process of identifying quantitative trait loci (QTLs). In this study, a genome-wide association study (GWAS) was conducted on 285 diverse soybean accessions genotyped with a 50K SoySNP array to analyze three traits across three years: seed weight, protein content, and oil content. Markers with minor allele frequency >0.05 were used to estimate linkage disequilibrium (LD) and subpopulations. As a result, a 249 kb block of LD and nine subpopulations were identified. The study identified 18, 23, and 26 significant SNPs associated with seed weight, seed oil, and protein content, respectively. Comparison of the significantly associated regions with previously reported QTLs revealed 10, 7, and 9 overlapping regions for seed oil, protein, and seed weight, respectively. Within a 250-kilobase window of the significant SNPs, 394, 317, and 414 potential candidate genes were found to be linked with seed weight, seed oil, and protein content, respectively.
For example, SWEET2, cytochrome P450, protein phosphatase 2C, WRKY transcription factor, MYB transcription factor, basic leucine-zipper, cellulose synthase-like B protein, tonoplast intrinsic protein, UDP-D-glucuronate 4-epimerase, and galacturonosyl transferase were identified, all of which are known to influence seed size, weight, oil, and protein content. The study provides a basis for selecting germplasm that can be integrated into breeding programs to optimize seed weight and improve oil and protein content simultaneously. Moreover, the study provides novel alleles that might prove effective in speeding up the breeding process. Modern gene-editing techniques can target the genes identified in this study to improve these traits without the need for conventional breeding.
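Two steps named in the abstract, the minor allele frequency filter and the candidate-gene search around significant SNPs, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names, the 0/1/2 dosage coding, the marker and gene records, and the reading of the 250 kb window as a distance on each side of the SNP are all assumptions made here for the example.

```python
def minor_allele_frequency(dosages):
    """MAF from allele dosages (0, 1, or 2 copies of one allele); None = missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def genes_near_snp(snp, genes, window_bp=250_000):
    """Names of genes overlapping [pos - window_bp, pos + window_bp] on the SNP's chromosome.

    Assumption: the window extends window_bp on each side of the SNP.
    """
    chrom, pos = snp
    lo, hi = pos - window_bp, pos + window_bp
    return [name for name, g_chrom, start, end in genes
            if g_chrom == chrom and start <= hi and end >= lo]


# Hypothetical markers: snp_a is common, snp_b is rare (MAF = 2/42, below 0.05).
markers = {
    "snp_a": [0, 0, 2, 2, 0, 2, None, 2],
    "snp_b": [0] * 20 + [2],
}
kept = [name for name, d in markers.items() if minor_allele_frequency(d) > 0.05]

# Hypothetical gene models: (name, chromosome, start, end).
genes = [
    ("gene_1", "Chr05", 1_000_000, 1_004_000),
    ("gene_2", "Chr05", 1_500_000, 1_510_000),
    ("gene_3", "Chr08", 1_100_000, 1_105_000),
]
nearby = genes_near_snp(("Chr05", 1_200_000), genes)
```

Here only `snp_a` passes the frequency filter, and only `gene_1` falls inside the window around the example SNP. Changing `window_bp` shows how sensitive a candidate-gene list is to the chosen window size.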