Location: Plant Genetics Research
Project Number: 5070-21000-037-03
Start Date: Apr 01, 2014
End Date: Mar 31, 2016
Differences in seed oil composition and content among soybean germplasm are caused by variation in the protein-coding sequences and/or expression levels of the genes involved in oil synthesis and storage. To identify those oil genes and their variants, we proposed to use next-generation sequencing technology to determine the transcript sequences and accumulation levels of all genes expressed in seeds, and to develop a bioinformatic pipeline that identifies the transcript sequence variations leading to seed quality variation among the germplasm. Although the soybean genome is large, transcribed sequences account for less than 5% of it. Compared with a whole-genome sequencing approach, transcriptome sequencing not only dramatically reduces the cost of determining both the protein-coding sequences and the expression levels of seed genes in soybean germplasm, but also significantly reduces the background noise from non-functional genome sequence when identifying the genes and gene variants underlying oil quality traits. The low sequencing cost per germplasm accession makes it feasible to sequence a large number of accessions and to identify those genes and gene variants using genome-wide association studies.