Location: Plant Genetics Research
Project Number: 5070-21000-037-03-R
Project Type: Reimbursable Cooperative Agreement
Start Date: Apr 1, 2014
End Date: Dec 31, 2016
The following specific objectives will be achieved in the project.
1. Sequence RNA populations (transcriptomes) in seeds of 100 germplasm lines that are important to US soybean breeding and agriculture and that vary in seed oil content and composition.
2. Discover genes that specifically regulate oil quality traits, and the variations in those genes that lead to differences in oil content and composition among these germplasm lines, using a variety of data analysis and mining strategies; then develop and validate a set of highly effective functional markers.
3. Develop a relational database and G-browser that allow the soybean community to access and use the sequencing data, genes, functional markers and networks.
4. Develop transgenic plants with altered activity for two identified genes that specifically control oil content.
The difference in seed oil composition and content among soybean germplasm is caused by variation in the protein-coding sequences and/or expression levels of the genes involved in oil synthesis and storage. To identify those oil genes and their variations, we proposed to use next-generation sequencing technology to determine the transcript sequences and accumulation levels of all genes expressed in seeds, and to develop a bioinformatic pipeline that identifies the transcript sequence variations leading to seed quality variation among the germplasm. Although the soybean genome is large, transcribed sequences account for less than 5% of it. Compared with whole-genome sequencing, transcriptome sequencing not only dramatically reduces the cost of determining both the protein-coding sequences and the expression levels of seed genes in soybean germplasm, but also significantly reduces the background noise from non-functional genome sequence when identifying the genes and gene variants for oil quality traits. The low cost of sequencing each germplasm line makes it feasible to sequence a large number of lines and to identify the genes and gene variants using genome-wide association studies.
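
The sequence-space argument above can be illustrated with a rough back-of-the-envelope calculation. This is only a sketch under stated assumptions: the genome size of about 1.1 Gb is an approximate figure that does not appear in the project description, the 5% value is the upper bound given in the text, and the function name `target_bases` is hypothetical. It ignores sequencing depth and the uneven coverage typical of transcriptome data, so it says nothing about actual cost.

```python
def target_bases(genome_size_bp: float, transcribed_fraction: float) -> float:
    """Return the number of bases in the transcribed portion of a genome."""
    if not 0.0 <= transcribed_fraction <= 1.0:
        raise ValueError("transcribed_fraction must be between 0 and 1")
    return genome_size_bp * transcribed_fraction


# Assumed values, for illustration only.
GENOME_SIZE_BP = 1.1e9       # approximate soybean genome size (assumption)
TRANSCRIBED_FRACTION = 0.05  # upper bound from the text ("less than 5%")
N_LINES = 100                # germplasm lines named in objective 1

# Transcribed sequence space for a single germplasm line.
per_line = target_bases(GENOME_SIZE_BP, TRANSCRIBED_FRACTION)

# How much smaller the transcribed target is than the whole genome.
reduction_factor = GENOME_SIZE_BP / per_line

# Transcribed sequence space summed over all lines in the panel.
total_target = per_line * N_LINES

print(f"Transcribed target per line: {per_line / 1e6:.0f} Mb")
print(f"Reduction in sequence space: at least {reduction_factor:.0f}-fold")
print(f"Transcribed target over {N_LINES} lines: {total_target / 1e9:.1f} Gb")
```

Because the text gives 5% as an upper bound, the reduction computed here is a lower bound on how much smaller the transcribed target is than the whole genome.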