Genome-Wide Discovery of the Genes/locus Determining the Oil Composition and Oil "functional" Markers by Exploring Soybean Genetic Diversity
Plant Genetics Research
2012 Annual Report
1a.Objectives (from AD-416):
1. Determine DNA sequences and expression levels of genes expressed in soybean seeds from fourteen genotypes.
2. Discover variations of gene sequences and expression levels in soybean seeds of the fourteen genotypes.
3. Discover a set of soybean oil "functional" markers for the fourteen genotypes.
1b.Approach (from AD-416):
The overall objective of this collaborative research proposal is to identify oil and meal traits, and the genes that influence those traits, in order to improve the quality and value of U.S. soybean in the target area of composition. The proposed research will use genomics and bioinformatics approaches to predict genes important for seed storage oil composition and to identify gene markers that facilitate oil gene discovery. Fourteen genotypes with diverse levels of oleic, linolenic, and stearic acids will be selected, and their seed storage lipid profiles will be determined. In addition, deep sequencing technology will be used to determine the sequences and accumulation levels of genes expressed in seeds of the fourteen genotypes and to identify variations in sequence and/or accumulation level among the genotypes. Association analysis and a variety of data-mining approaches will be used to predict genes that potentially regulate or participate in seed storage oil production. The sequence and expression variations found in these oil-related genes will then be developed into a set of oil "functional" markers.
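The report does not say which association method will be used. As a minimal sketch, assuming a simple per-gene Pearson correlation between seed expression level and oleic acid content across genotypes, candidate genes could be ranked as shown below. The gene IDs, values, and the `min_abs_r` cutoff are hypothetical placeholders, not project data or the project's actual method.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no association signal
    return cov / (sx * sy)


def rank_candidates(expression, trait, min_abs_r=0.8):
    """Return (gene, r) pairs with |r| >= min_abs_r, strongest first.

    expression: dict mapping a gene ID to its expression values, one per genotype
    trait: trait values (e.g. % oleic acid) listed in the same genotype order
    min_abs_r: hypothetical cutoff chosen for illustration only
    """
    hits = []
    for gene, values in expression.items():
        r = pearson(values, trait)
        if abs(r) >= min_abs_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: normalized expression in four genotypes and their
# seed oleic acid content (% of total fatty acids).
oleic = [20.0, 25.0, 45.0, 80.0]
expression = {
    "geneA": [100.0, 90.0, 40.0, 5.0],  # falls as oleic acid rises
    "geneB": [10.0, 12.0, 11.0, 10.5],  # no clear trend
    "geneC": [5.0, 7.0, 15.0, 30.0],    # rises with oleic acid
}

for gene, r in rank_candidates(expression, oleic):
    print(f"{gene}\tr = {r:+.3f}")
```

Correlation across a handful of genotypes only flags candidates; with fourteen lines, many genes will pass a loose cutoff by chance, so any real analysis would need multiple-testing control and the validation the approach describes.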
The project is designed to apply next-generation sequencing technologies to determine the gene sequences and expression patterns in developing soybean seeds that underlie variation in seed oil composition and content among major soybean lines. The project shares the objectives and goals of its parent project, which aims to discover genes important for seed oil quality traits and to develop new germplasm with superior seed quality traits.
To date, we have constructed cDNA libraries from seeds at a mid-maturation stage for nine genotypes. The libraries were sequenced using Illumina HiSeq technology, generating an average of 35 million sequence reads per library. We aligned the reads to the soybean genome and annotated the derived transcript sequences. On average, 30 million of these reads mapped to annotated genes in the soybean genome. We detected an average of 30,167 expressed transcript variants in the seeds of each line, and a total of 4,337 genes were differentially expressed among the nine germplasm lines. M23 is a previously characterized mutant line with a mid-oleic acid phenotype; it carries a deletion of a 160-kb genomic segment encoding an important fatty acid desaturase (FAD) that affects oil composition. The transcriptome-sequencing strategy could be applicable to screening the fast neutron mutants that the soybean community has developed. We have begun developing additional bioinformatics tools and experiments to identify and validate other genetic variations among the nine genotypes.
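The idea behind screening fast neutron mutants by transcriptome sequencing is that genes normally expressed in seeds but lying inside a deleted segment, such as the 160-kb region in M23, should show little or no expression in the mutant while remaining expressed in a reference line. A minimal sketch of such a screen is shown below. It assumes read counts have already been normalized (for example, to counts per million) and that genes are sorted by chromosome position. The gene IDs, coordinates, expression values, and thresholds (`absent_cpm`, `present_cpm`, `min_genes`) are all hypothetical, and this is not the project's actual pipeline.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # gene start coordinate (bp)
    end: int    # gene end coordinate (bp)


def candidate_deletions(genes, mutant_cpm, reference_cpm,
                        absent_cpm=0.5, present_cpm=5.0, min_genes=2):
    """Find runs of neighboring genes silent in the mutant but expressed in the reference.

    genes: Gene records sorted by chromosome and start position
    mutant_cpm / reference_cpm: dicts mapping gene_id to normalized expression
    Returns a list of (chrom, start, end, [gene_ids]) tuples.
    """
    runs, current = [], []

    def flush():
        # Keep a run only if it spans enough genes to suggest a deletion.
        if len(current) >= min_genes:
            runs.append((current[0].chrom, current[0].start,
                         current[-1].end, [g.gene_id for g in current]))

    for gene in genes:
        if reference_cpm.get(gene.gene_id, 0.0) < present_cpm:
            continue  # not expressed in the reference line: uninformative, skip
        silent = mutant_cpm.get(gene.gene_id, 0.0) <= absent_cpm
        if silent and current and current[-1].chrom == gene.chrom:
            current.append(gene)  # extend the run on the same chromosome
        else:
            flush()
            current = [gene] if silent else []
    flush()
    return runs


# Hypothetical example: g2-g4 are expressed in the reference but silent in the mutant.
genes = [
    Gene("g1", "chr1", 1_000, 3_000),
    Gene("g2", "chr1", 10_000, 12_000),
    Gene("g3", "chr1", 40_000, 43_000),
    Gene("g4", "chr1", 90_000, 95_000),
    Gene("g5", "chr1", 150_000, 152_000),
]
reference = {"g1": 20.0, "g2": 15.0, "g3": 30.0, "g4": 8.0, "g5": 12.0}
mutant = {"g1": 18.0, "g2": 0.0, "g3": 0.1, "g4": 0.0, "g5": 11.0}

print(candidate_deletions(genes, mutant, reference))
# → [('chr1', 10000, 95000, ['g2', 'g3', 'g4'])]
```

Genes not expressed in the reference line are skipped rather than allowed to break a run, because they carry no information about whether the region is present. A run reported this way is only a candidate; it would still need confirmation at the DNA level.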