Research Project: Characterization of Genetic Diversity in Soybean and Common Bean, and Its Application toward Improving Crop Traits and Sustainable Production

Location: Soybean Genomics & Improvement Laboratory

Title: Genotype imputation for soybean nested association mapping population to improve precision of QTL detection

CHEN, LINFENG - Forest Service (FS)
YANG, SHOUPING - Nanjing Agricultural University
ARAYA, SUSAN - Oak Ridge Institute For Science And Education (ORISE)
Quigley, Charles - Chuck
Taliercio, Earl
Mian, Rouf
SPECHT, JAMES - University Of Nebraska
DIERS, BRIAN - University Of Illinois
Song, Qijian

Submitted to: Theoretical and Applied Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/25/2022
Publication Date: 3/11/2022
Citation: Chen, L., Yang, S., Araya, S., Quigley, C.V., Taliercio, E.W., Mian, R.M., Specht, J., Diers, B., Song, Q. 2022. Genotype imputation for soybean nested association mapping population to improve precision of QTL detection. Theoretical and Applied Genetics. 135(5), pp.1797-1810.

Interpretive Summary: Modern breeding programs frequently require germplasm to be genotyped with a large number of molecular markers in order to track or identify genomic regions associated with traits of interest. Although sequencing approaches are cheaper than ever before, they are still costly for large-scale studies. Genotype imputation is a method to infer the marker genotypes of breeding lines using markers in a reference population, but knowledge of the best software package and of imputation accuracy for self-pollinated crops like soybean is lacking. ARS-USDA scientists and university collaborators compared the imputation performance of three commonly used imputation software packages in soybean populations. They demonstrated that the software could be used to estimate marker locations in soybean and that the imputed dataset could significantly narrow the intervals of genomic regions controlling seed quality traits, thus improving the efficiency of candidate gene identification. This is the first study to identify the best software and optimize parameters for imputation in soybean. The information will help breeders and geneticists improve genotype imputation accuracy not only in soybean but also in other self-pollinated crops. The results will facilitate fine-mapping of genes controlling different traits and downstream applications in soybean.

Technical Abstract: Genotype imputation is a strategy to increase the marker density of existing datasets without additional genotyping. We compared the imputation performance of the software packages BEAGLE5.0, IMPUTE5 and AlphaPlantImpute and tested software parameters that may help to improve imputation accuracy in soybean populations. Several factors, including marker density of individual lines, extent of linkage disequilibrium (LD), minor allele frequency (MAF) and genetic map distance vs. physical distance, were examined for their effects on imputation accuracy in soybean across the different software packages. Our results showed that AlphaPlantImpute had higher imputation accuracy than BEAGLE5.0 or IMPUTE5 in each soybean family, especially when the study progeny were genotyped with an extremely low number of markers. The results also showed that LD extent, MAF and reference panel size were positively correlated with imputation accuracy; a minimum of 50 markers per chromosome and a SNP MAF greater than 0.2 in soybean lines are required to avoid a significant loss of imputation accuracy. Using the software, we imputed the genotypes of 5176 soybean recombinant inbred lines (RILs) in the soybean nested association mapping (NAM) population from the genotypes of the 40 parents, which were sequenced and beadchip-assayed with high-density markers. The dataset containing 423,419 SNP markers for the 5176 RILs and 40 parents was deposited at SoyBase for public access. The imputed NAM dataset was further examined for its improvement of the mapping of quantitative trait loci (QTL) controlling soybean seed protein content in linkage mapping analysis. Most of the QTL identified were at identical or similar positions in the initial and imputed datasets; however, the QTL intervals were greatly narrowed. The resulting high-quality genotypic dataset of the soybean NAM population will facilitate QTL mapping of soybean traits and downstream applications. The information will also help to improve genotype imputation accuracy in soybean populations.
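
Illustrative sketch: The two thresholds reported in the abstract (at least 50 markers per chromosome and a SNP MAF greater than 0.2) can be sketched in Python, along with one simple way of scoring imputation accuracy. This is not the authors' pipeline. The function names, the 0/1/2 genotype coding, and the use of exact-match concordance as the accuracy measure are all assumptions made for illustration; the record above does not state how accuracy was computed.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP.

    Assumed coding: 0/1/2 = count of the alternate allele, None = missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of compared genotypes where the imputed call equals the true call.

    Assumption: accuracy is measured as exact-match concordance on genotypes
    that were masked before imputation and are known from the original data.
    """
    pairs = [
        (t, i)
        for t, i in zip(true_genotypes, imputed_genotypes)
        if t is not None and i is not None
    ]
    if not pairs:
        return 0.0
    return sum(t == i for t, i in pairs) / len(pairs)


def meets_reported_thresholds(markers_on_chromosome, maf,
                              min_markers=50, min_maf=0.2):
    """Check the abstract's thresholds: >= 50 markers per chromosome, MAF > 0.2."""
    return markers_on_chromosome >= min_markers and maf > min_maf
```

In a masking experiment of this kind, a subset of known genotypes would be hidden, imputed by the software under test, and then compared against the hidden values with `imputation_accuracy`.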