Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer reviewed journal
Publication Acceptance Date: 12/21/2013
Publication Date: 1/2/2014
Citation: Hwang, E., Song, Q., Jia, G., Specht, J.E., Hyten, D.L., Costa, J., Cregan, P.B. 2014. A genome-wide association study in soybean. Biomed Central (BMC) Genomics. 15:1-12.
Interpretive Summary: A genome-wide association study (GWAS) analyzes a set of individuals (humans, plants, animals, or other organisms) with a large number of DNA markers distributed across all chromosomes of the species. Data are also collected on the same individuals for genetically controlled traits, such as disease resistance or, in the current study, the protein and oil content of seeds produced by 298 soybean accessions collected in Asia over the past 60 years. Seed protein and oil content were measured in seeds harvested from experiments grown in both Maryland and Nebraska. DNA from each of the 298 accessions was analyzed with 55,159 single nucleotide polymorphism (SNP) markers. Using the combined seed protein and oil data and the marker data, GWAS was applied to identify regions of the 20 pairs of soybean chromosomes that contain genes associated with seed protein and oil content. In total, 17 chromosome regions were associated with seed protein content and 13 regions with seed oil content. This information will be useful to scientists applying GWAS to gene discovery. In addition, plant breeders can use the markers associated with seed protein and oil content to select soybean breeding lines with altered seed protein and oil content.
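The core GWAS idea described above (testing each marker for association with a trait) can be illustrated with a minimal single-marker scan. This is a sketch under stated assumptions, not the study's pipeline: it assumes genotypes coded 0/1/2 (count of one allele) per accession, uses the squared Pearson correlation between genotype and trait as the association score, and omits the population-structure and kinship covariates that GWAS software typically includes. The function names `marker_trait_r2` and `scan` and the toy data are hypothetical.

```python
from statistics import mean

def marker_trait_r2(genotypes, phenotypes):
    """Squared Pearson correlation between genotype codes and trait values."""
    gx, py = mean(genotypes), mean(phenotypes)
    sxy = sum((g - gx) * (p - py) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - gx) ** 2 for g in genotypes)
    syy = sum((p - py) ** 2 for p in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait carries no signal
    return (sxy * sxy) / (sxx * syy)

def scan(markers, phenotypes):
    """Score every marker and return (name, r2) pairs, strongest first."""
    scores = [(name, marker_trait_r2(g, phenotypes)) for name, g in markers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical toy data: five accessions, two SNPs, seed protein (%).
markers = {"snpA": [0, 0, 1, 2, 2], "snpB": [0, 2, 0, 2, 1]}
protein = [40.1, 40.5, 42.0, 44.2, 44.0]
for name, r2 in scan(markers, protein):
    print(f"{name}\tr2={r2:.3f}")
```

In a real analysis each score would be converted to a p-value and compared with a multiple-testing threshold before a marker is declared significant.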
Technical Abstract: A genome-wide association study (GWAS) was performed to assess the feasibility of identifying genes controlling two quantitative traits, seed protein and oil concentration, in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using several methods, including Illumina Infinium and GoldenGate assays, and 31,954 markers with minor allele frequency (MAF) > 0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 600 Kbp, whereas in heterochromatic regions the mean LD remained greater than 0.2 at 10,000 Kbp. The GWAS identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20, where a major seed protein quantitative trait locus (QTL) has been mapped and potential candidate genes were recently identified. The GWAS also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs were significantly associated with both protein and oil. This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also defined narrower genomic regions than those previously reported to contain these QTL. The narrower GWAS-defined genomic regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).
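The MAF filter and LD-decay summary described in the abstract can be sketched as follows. This assumes inbred (homozygous) accessions, so each accession's genotype can be treated as a haplotype with alleles coded 0/1. Pairwise LD is computed as r² = D² / (pA(1 - pA) pB(1 - pB)), pairs are binned by physical distance, and the mean r² per bin is compared with a 0.2 threshold. The function names, the 100 Kbp bin width, and the example data are hypothetical; in practice euchromatic and heterochromatic SNPs would be summarized separately, and the study's exact procedure may differ.

```python
from itertools import combinations
from statistics import mean

def minor_allele_frequency(alleles):
    """alleles: 0/1 codes, one per (assumed homozygous) accession."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)

def ld_r2(a, b):
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)) for two biallelic loci coded 0/1."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    # Frequency of the haplotype carrying allele 1 at both loci.
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0

def ld_decay(snps, maf_min=0.10, bin_kbp=100):
    """snps: list of (position_bp, alleles). Returns {bin_start_kbp: mean r^2}."""
    kept = [(pos, al) for pos, al in snps if minor_allele_frequency(al) > maf_min]
    bins = {}
    for (p1, a1), (p2, a2) in combinations(kept, 2):
        dist_kbp = abs(p2 - p1) / 1000
        key = int(dist_kbp // bin_kbp) * bin_kbp
        bins.setdefault(key, []).append(ld_r2(a1, a2))
    return {k: mean(v) for k, v in sorted(bins.items())}

def decay_distance(decay, threshold=0.2):
    """Start (Kbp) of the first bin whose mean r^2 is below threshold, else None."""
    for start, r2 in decay.items():
        if r2 < threshold:
            return start
    return None

# Hypothetical toy data: four accessions, four SNPs on one chromosome.
snps = [
    (0, [0, 0, 1, 1]),
    (50_000, [0, 0, 1, 1]),
    (250_000, [0, 1, 0, 1]),
    (300_000, [0, 0, 0, 0]),  # monomorphic: removed by the MAF filter
]
decay = ld_decay(snps)
print(decay)
print(decay_distance(decay))
```

With a real dataset one would call `ld_decay` once per region class (euchromatic or heterochromatic) and report the distance at which the mean r² falls below 0.2.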