Author
Gillman, Jason
KIM, WON-SEOK - University Of Missouri
SONG, BO - Northeast Agricultural University
Oehrle, Nathan
TAWARI, NILESH - Genome Institute Of Singapore
LIU, SHANSHAN - Northeast Agricultural University
Krishnan, Hari
Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/16/2017
Publication Date: 6/6/2017
Citation: Gillman, J.D., Kim, W., Song, B., Oehrle, N.W., Tawari, N.R., Liu, S., Krishnan, H.B. 2017. Whole-genome resequencing identifies the molecular genetic cause for the absence of a Gy5 glycinin protein in soybean PI 603408. G3, Genes/Genomes/Genetics. 7(7):2345-2352. doi:10.1534/g3.117.039347.

Interpretive Summary: Soybean seeds are a major source of edible oil and protein, but are among the eight most significant allergenic foods for humans. Several soybean seed storage proteins have been shown to be allergenic, and they are present in all high-yielding soybeans. In an effort to identify soybean lines that lack the major allergens, we screened a soybean germplasm collection and identified a line that is missing one of the allergenic proteins. Genomic resequencing allowed us to identify the molecular genetic basis for the absence of this allergen. Additionally, a new computational method was developed to predict the impact of gene changes in soybean seed composition. This new information was used to develop an experimental soybean line whose seeds lack three of the major soybean allergens. The results of this study will be useful to soybean breeders for the development of high-yielding soybean lines with fewer allergenic proteins.

Technical Abstract: During ongoing proteomic analysis of the soybean (Glycine max (L.) Merr.) germplasm collection, PI 603408 was identified as a landrace whose seeds lack accumulation of one of the major seed storage glycinin protein subunits. Whole-genome resequencing was used to identify a two-base deletion affecting glycinin 5. We confirmed that the newly discovered deletion was causative through immunological, genetic, and proteomic analyses, and determined that there were no significant differences in total seed protein content due to the glycinin 5 loss-of-function mutation.
In addition to focused studies on this one glycinin subunit-encoding gene, we identified a total of 1,858,185 nucleotide variants, of which 39,344 were predicted to affect protein-coding regions. To semi-automate analysis of a large number of soybean gene variants, we developed a new SIFT 4G database designed to predict the impact of non-synonymous single-nucleotide variants in soybean genes, which may enable more rapid analysis of soybean resequencing data in the future.