CHEN, LINFENG - Forest Service (FS)
YANG, SHOUPING - Nanjing Agricultural University
ARAYA, SUSAN - Oak Ridge Institute For Science And Education (ORISE)
Quigley, Charles - Chuck
SPECHT, JAMES - University Of Nebraska
DIERS, BRIAN - University Of Illinois
Submitted to: Theoretical and Applied Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/25/2022
Publication Date: 3/11/2022
Citation: Chen, L., Yang, S., Araya, S., Quigley, C.V., Taliercio, E.W., Mian, R.M., Specht, J., Diers, B., Song, Q. 2022. Genetic architecture of soybean seed and meal protein and oil. Theoretical and Applied Genetics. https://doi.org/10.1007/s00122-022-04070-7.
Interpretive Summary: In modern breeding programs, germplasm frequently must be genotyped with a large number of molecular markers in order to track or identify genomic regions associated with traits of interest. Although sequencing approaches are cheaper than ever before, they are still costly for large-scale studies. Genotype imputation is a method to infer the marker genotypes of breeding lines using markers in a reference population, but knowledge of the best software package and its imputation accuracy for self-pollinated crops like soybean is lacking. ARS-USDA scientists and university collaborators compared the imputation performance of three commonly used imputation software packages in soybean populations. They demonstrated that the software could be used to estimate marker locations in soybean and that an imputed dataset could significantly reduce the interval of genomic regions controlling seed quality traits, thus improving the efficiency of candidate gene identification. This is the first study to identify the best software and optimize parameters for imputation in soybean. The information will help breeders and geneticists improve genotype imputation accuracy not only in soybean but also in other self-pollinated crops. The results will facilitate fine-mapping of genes controlling different traits and downstream applications in soybean.
Technical Abstract: Genotype imputation is a strategy to increase the marker density of existing datasets without additional genotyping. We compared the imputation performance of the software packages BEAGLE5.0, IMPUTE5 and AlphaPlantImpute and tested software parameters that may help to improve imputation accuracy in soybean populations. Several factors, including marker density of individual lines, extent of linkage disequilibrium (LD), minor allele frequency (MAF) and genetic map distance vs. physical distance, were examined for their effects on imputation accuracy in soybean across the different software packages. Our results showed that AlphaPlantImpute had a higher imputation accuracy than BEAGLE5.0 or IMPUTE5 in each soybean family, especially if the study progeny were genotyped with an extremely low number of markers. The results also showed that LD extent, MAF and reference panel size were positively correlated with imputation accuracy. A minimum of 50 markers per chromosome and a MAF greater than 0.2 for SNPs in soybean lines are required to avoid a significant loss of imputation accuracy. Using the software, we imputed the genotypes of 5176 soybean recombinant inbred lines (RILs) in the soybean nested association mapping (NAM) population from the genotypes of the 40 parents, which were sequenced and BeadChip-assayed with high-density markers. The dataset containing 423,419 SNP markers for 5176 RILs and 40 parents was deposited at SoyBase for public access. The imputed NAM dataset was further examined for its improvement of the mapping of quantitative trait loci (QTL) controlling soybean seed protein content in linkage mapping analysis. Most of the QTL identified were at identical or similar positions in the initial and imputed datasets; however, the QTL intervals were greatly narrowed. The resulting high-quality genotypic dataset of the soybean NAM population will facilitate QTL mapping of soybean traits and downstream applications. The information will also help to improve genotype imputation accuracy in soybean populations.
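The two thresholds reported above (at least 50 markers per chromosome, and SNP MAF greater than 0.2) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the 0/2 genotype coding for inbred lines, the chromosome labels, and the use of simple concordance as the accuracy measure are all assumptions made for illustration; the abstract does not state how accuracy was computed.

```python
from collections import Counter

# Thresholds taken from the abstract; everything else here is hypothetical.
MIN_MAF = 0.2
MIN_MARKERS_PER_CHROM = 50


def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP.

    Assumes genotypes are coded as alternate-allele counts (0 or 2 for
    homozygous inbred lines, 1 for heterozygotes) with None for missing calls.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def imputation_accuracy(true_calls, imputed_calls):
    """Fraction of known genotypes that the imputed calls match exactly.

    Simple concordance is an assumed metric, chosen only for illustration.
    """
    pairs = [(t, i) for t, i in zip(true_calls, imputed_calls) if t is not None]
    if not pairs:
        return float("nan")
    return sum(t == i for t, i in pairs) / len(pairs)


def chromosomes_below_density(marker_chroms, minimum=MIN_MARKERS_PER_CHROM):
    """List chromosomes that carry fewer than `minimum` markers."""
    counts = Counter(marker_chroms)
    return sorted(chrom for chrom, n in counts.items() if n < minimum)


if __name__ == "__main__":
    snp = [0, 0, 0, 2]
    print("MAF:", minor_allele_frequency(snp))
    print("passes MAF filter:", minor_allele_frequency(snp) > MIN_MAF)
    print("accuracy:", imputation_accuracy([0, 2, 2, 0], [0, 2, 0, 0]))
    print("sparse chromosomes:",
          chromosomes_below_density(["Gm01"] * 50 + ["Gm02"] * 49))
```

Under these assumptions, a marker set would be screened with `chromosomes_below_density` and `minor_allele_frequency` before imputation, and `imputation_accuracy` would be applied afterwards to genotypes that were deliberately masked.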