Location: Plant Genetics Research
Title: Major soybean maturity gene haplotypes revealed by SNPViz analysis of 72 sequenced soybean genomes
Submitted to: PLoS One
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: March 14, 2014
Publication Date: April 11, 2014
Repository URL: http://handle.nal.usda.gov/10113/58986
Citation: Langewisch, T., Zhang, H., Vincent, R., Joshi, T., Xu, D., Bilyeu, K.D. 2014. Major soybean maturity gene haplotypes revealed by SNPViz analysis of 72 sequenced soybean genomes. PLoS One. 9(4):e94150. Available: doi:10.1371/journal.pone.0094150.
Interpretive Summary: The pace at which complete genomes are being sequenced has necessitated the development of new tools to fully capture and use the information revealed by the analysis of genome sequences. Until recently, we could only compare the sequences of a limited number of genes in a small number of genomes. However, we now have access to the sequence of every gene within the genome of some crops, as well as the genome sequences of several cultivars (varieties) within those crops. In many cases, the genome sequences that have been generated are made publicly available. The objective of this research was to analyze and compare four soybean genes involved in photoperiod responses (light-responsive genes that determine in which parts of the US the plants will develop seed within the growing season) from seventy-two available sequenced soybean lines. We developed computer software that allows the user to select a specific region of a genome in all of the sequenced varieties, compare the regions, and identify sequences that change from one variety to another: gene sequence variations. The software can determine the number and relatedness of each gene type within the complete set of seventy-two sequenced genomes. The results revealed the subgrouping of genes by characteristic sequence changes that we refer to as "allele haplotypes". The impact of this research is a better understanding of the genetic diversity for each gene, which may enhance the ability to target soybean breeding for particular environments. In addition, the software can be used to evaluate other genes in soybean or in other crops that have been targeted for genome sequencing.
Technical Abstract: In this Genomics Era, vast amounts of next generation sequencing data have become publicly available for multiple genomes across hundreds of species. Analysis of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset and among different datasets or organisms. To facilitate the exploration of allelic haplotypes, we have developed and deployed computer software to categorize and visualize these haplotypes. The SNPViz software enables analysis of whole genome sequence SNP datasets for haplotypes of user-defined gene regions across different sequenced genomes. The examination of allelic variation and diversity of important soybean [Glycine max (L.) Merr.] flowering time and maturity genes may provide additional insight into flowering time regulation and enhance researchers’ ability to target soybean breeding for particular environments. For this study, we utilized two available soybean genomic datasets for a total of seventy-two soybean genotypes encompassing cultivars, landraces, and the wild species Glycine soja. The major soybean maturity genes E1, E2, and E3, along with the Dt1 gene for plant growth architecture, were analyzed to determine the number of major haplotypes for each gene and the consistency of the allele haplotypes with characterized variant alleles, and to look for evidence of genetic bottlenecks or adaptive selection. The results indicated classification of a small number of predominant allele haplotypes for each gene and provided important insights into possible genetic bottlenecks and diversity of alleles for each gene within the context of known causative mutations. The software can be used to analyze other genes, with additional soybean datasets, or it can be used with similar genome sequence SNP datasets from other species.
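Illustrative sketch: The idea of categorizing haplotypes within a user-defined gene region can be sketched in a few lines of Python. This is not the SNPViz implementation, which this record does not describe in detail; it is a minimal example under stated assumptions. The function name `group_haplotypes`, the tuple-based SNP layout, the sample names, and all positions and alleles below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_haplotypes(snps, samples, chrom, start, end):
    """Group samples by their concatenated SNP alleles within a region.

    snps: list of (chrom, pos, {sample: allele}) tuples.
          This layout is an assumption made for the sketch.
    Returns a list of (haplotype_string, [samples]) pairs, largest group first.
    """
    # Keep only SNPs inside the requested region, ordered by position.
    in_region = sorted(
        ((pos, calls) for c, pos, calls in snps
         if c == chrom and start <= pos <= end),
        key=lambda item: item[0],
    )

    groups = defaultdict(list)
    for sample in samples:
        # A missing call is written as "N" so it stays visible in the haplotype.
        haplotype = "".join(calls.get(sample, "N") for _, calls in in_region)
        groups[haplotype].append(sample)

    # Report the predominant haplotypes first.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


# Made-up data: three samples and four SNPs (not taken from the study).
example_snps = [
    ("chr1", 100, {"line_A": "G", "line_B": "G", "line_C": "T"}),
    ("chr1", 250, {"line_A": "C", "line_B": "C", "line_C": "C"}),
    ("chr1", 900, {"line_A": "A", "line_B": "T", "line_C": "A"}),
    ("chr2", 150, {"line_A": "T", "line_B": "T", "line_C": "T"}),
]
example_samples = ["line_A", "line_B", "line_C"]

result = group_haplotypes(example_snps, example_samples, "chr1", 50, 300)
for haplotype, members in result:
    print(haplotype, len(members), members)
```

Grouping on the exact allele string is the simplest possible choice; a real analysis would likely also need to handle heterozygous calls, missing data, and some measure of relatedness between haplotypes, none of which is attempted here.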