1a. Objectives (from AD-416):
1. Use state-of-the-art genomic tools to reveal the key genetic changes behind seventy years of soybean improvement.
2. Evaluate key soybean genotypes developed over seventy years using modern agronomic practices.
3. Reveal the changes in patterns of gene expression in soybean with an emphasis on yield and stress.
4. Reveal patterns of methylation and histone modification in the soybean genome.
5. Identify and evaluate transcription factors and small RNAs in the soybean transcriptome.
1b. Approach (from AD-416):
State-of-the-art genomic tools will be used to improve breeding strategies for soybean improvement. Next-generation sequencing technologies will be used to 're-sequence' the genomes of landrace ancestors contributing to soybean germplasm, milestone cultivars representing 70 years of incremental increases in genetic yield potential, and 40 parents used in development of a Nested Association Mapping (NAM) population. The genomic sequences will be overlaid onto the whole-genome sequence of Williams 82. Chromosomal segments and allele combinations that have been selected for over decades of breeding will be identified. These breeder 'signatures' will tell us what we did that was 'right' and what we changed in the genome to achieve yield improvement. NAM parents and progeny extremes (high yield vs. low yield) will be selected, and changes in their transcriptomes will be evaluated in an attempt to identify metabolic pathways contributing to 'yield'. The same lines will be evaluated for epigenetic changes in expression by mapping their methylomes. Data will be analyzed and entered into SoyBase for public distribution. Personnel on the project will coordinate with the Department of Energy Joint Genome Institute in the development of an in-depth gene expression atlas.
We now have sequencing data on 39 soybean milestone cultivars. Sequencing coverage of these lines ranged from 7-fold to 42-fold. We have developed a bioinformatic pipeline that aligns sequences to the reference genome and performs downstream bioinformatic and statistical analyses. The pipeline has been automated so that future incoming data are processed in exactly the same manner, allowing direct comparisons between data sets. The aligned sequence data are being used to identify single nucleotide polymorphisms (SNPs) for use in various analyses. A major focus has been to examine the effects of the different filters used to clean up the data. As proof of concept for how the data can be used, we used the SNP data generated from more than 20 lines to confirm pedigree information, make phylogenetic comparisons between lines, and conduct a genome-wide association study using yield data. These proof-of-concept studies confirm that the data generated by the project are of sufficient quality for the intended studies.
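To illustrate what examining the effects of different filters can look like in practice, the following minimal Python sketch counts how many SNP calls survive under two filter settings. It is not the project's pipeline: the record fields, the filter names, the threshold values, and the example calls are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """One candidate SNP (hypothetical record layout, not the project's format)."""
    chrom: str    # chromosome name
    pos: int      # position on the reference genome
    depth: int    # number of reads covering the site
    qual: float   # Phred-scaled call quality


def passes(call, min_depth, max_depth, min_qual):
    """Keep a call only if its depth is within bounds and its quality is high enough."""
    return min_depth <= call.depth <= max_depth and call.qual >= min_qual


def count_retained(calls, filters):
    """Return, for each named filter setting, how many calls it retains."""
    return {
        name: sum(passes(call, **params) for call in calls)
        for name, params in filters.items()
    }


# Made-up example calls; real input would come from the aligned sequence data.
calls = [
    SnpCall("Gm01", 1050, 3, 18.0),
    SnpCall("Gm01", 20412, 12, 55.0),
    SnpCall("Gm02", 733, 40, 99.0),
    SnpCall("Gm02", 9120, 250, 80.0),
]

# Hypothetical thresholds: the upper depth bound guards against repeat regions,
# the lower bound against sites with too few supporting reads.
filters = {
    "lenient": dict(min_depth=2, max_depth=500, min_qual=10.0),
    "strict": dict(min_depth=7, max_depth=100, min_qual=30.0),
}

retained = count_retained(calls, filters)
for name, count in retained.items():
    print(f"{name}: {count} of {len(calls)} calls retained")
```

Running the same fixed filter settings over every incoming data set is what keeps results directly comparable across lines sequenced at different coverage.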