2012 Annual Report
1a.Objectives (from AD-416):
1. Epigenetic data, in the form of DNA methylation and histone modification data, will be collected on one subset of a nested association mapping (NAM) population. This will be the same population from which RNA-Seq gene expression data will be collected; 2. Loci at which epigenetic changes appear to be correlated with agronomic traits will be verified in other NAM populations.
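Objective 2 depends on deciding whether the epigenetic state at a locus tracks a trait across lines. One simple screen, shown here only as an illustration, is a Pearson correlation between per-line methylation levels at a locus and per-line trait values. This is a swapped-in technique, not the analysis specified for this project, and the function name and example values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for six lines: fraction of reads methylated at one
    # locus, and a trait measurement for the same lines (illustrative only).
    methylation = [0.10, 0.15, 0.80, 0.85, 0.20, 0.90]
    trait = [95.0, 97.0, 110.0, 112.0, 96.0, 115.0]
    print(round(pearson_r(methylation, trait), 3))
```

In a genome-wide scan over many loci, a screen like this would also need to account for population structure among the NAM lines and correct for multiple testing.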
1b.Approach (from AD-416):
Seventy lines will be sequenced to 20x coverage to identify methylated cytosines. DNA from the 70 lines will be treated with sodium bisulfite conversion kits and then sequenced on an Illumina HiSeq, one lane per sample (70 lanes of sequencing). This will be done in two replicates to reduce the effect of sample variation and provide more confidence in 'epialleles' (epigenetic variants of the same gene). Seventy lines will be sequenced to 5x coverage using chromatin immunoprecipitation (ChIP) pulldowns for H3K9 methylation: commercial antibodies for H3K9 methylation will be used to pull down DNA associated with this epigenetic mark, and the DNA will be sequenced to 5x per accession, four accessions per lane. Seventy lines will also be sequenced to 5x using ChIP pulldowns for H3K27 methylation, again using commercial antibodies to pull down DNA associated with this mark, with DNA sequenced to 5x per accession, four accessions per lane. The informaticists and postdoc on this project will analyze the data to identify epigenetic marks (epialleles) that appear to be correlated with specific agronomic traits identified by ARS. A collaboration with the University of Delaware will make all the data generated publicly accessible. We will then confirm the association of putative epialleles with agronomic traits. Putative epialleles are of limited use until they are validated, and the NAM populations provide a unique opportunity for validation. Validation will be done by using the polymerase chain reaction (PCR) to amplify the putative epialleles, and the amplicons will then be sequenced in an indexed (barcoded) format. We will first focus on replicates of the initial population we targeted, then expand to other NAM subpopulations to determine how robust these epialleles are.
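The principle behind bisulfite sequencing and the arithmetic behind the coverage targets can each be shown in a few lines. The sketch below (a) simulates bisulfite treatment of one DNA strand, where an unmethylated C is ultimately read as T and a methylated C stays C, and (b) estimates the read pairs needed for a target coverage as coverage × genome size / (2 × read length). The function names are hypothetical, and the genome size (~1.1 Gbp) and 2 × 100 bp read length are assumptions, not figures from this report. The estimate ignores duplicate, unmapped, and filtered reads, so real runs need headroom.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate bisulfite treatment of one strand (hypothetical helper).

    Unmethylated C is deaminated to U and read as T after PCR;
    a C whose index is in `methylated` is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def read_pairs_for_coverage(coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Smallest number of read pairs with pairs * 2 * read_length >= coverage * genome_size."""
    bases_needed = coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return -(-bases_needed // bases_per_pair)  # integer ceiling division


# Assumed inputs (not from this report): ~1.1 Gbp soybean genome, 2 x 100 bp reads.
GENOME_BP = 1_100_000_000
READ_BP = 100

if __name__ == "__main__":
    print(bisulfite_convert("ACGTCC", {1}))                 # C at index 1 is methylated
    print(read_pairs_for_coverage(20, GENOME_BP, READ_BP))  # 20x methylome target
    print(read_pairs_for_coverage(5, GENOME_BP, READ_BP))   # 5x ChIP target
```

Under these assumptions, a 20x methylome needs about 110 million read pairs per line and a 5x ChIP library about 27.5 million.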
The first steps of a robust protocol for methylome sequencing have been implemented. Several protocols were tested and compared using Williams 82 DNA to refine our protocol, and we now have a DNA library preparation for whole-genome bisulfite sequencing that produces consistent results. The Williams 82 library was test-sequenced on an Illumina MiSeq System; a 151-bp paired-end run generated 9,271,554 reads and ~1.4 Gbp of methylome sequence. The bisulfite conversion rate was calculated to be more than 99.8 percent, showing that nearly all unmethylated cytosines were converted to thymines. In addition, only 0.062 percent of the reads were reported as duplicated alignments, indicating a high-complexity, robust library. Other methylome libraries were constructed by the same method and are being sequenced on an Illumina HiSeq 2000. To establish the data analysis pipeline, sequence data from four test libraries generated from different soybean tissues (leaf, root, root hair, and seed) were used. After quality filtering, the reads were aligned to the soybean genome to assign methylation calls. Among the three DNA methylation contexts, CpG methylation (45-47 percent) was the most prevalent, followed by CHG (35-38 percent) and CHH (17-19 percent), in all samples analyzed, which is similar to what has been reported in other plants. Methylated cytosines were also equally distributed on both DNA strands. Further development of the analysis pipeline is in progress and will be undertaken once sequencing data with high read depth are obtained from a few tissue samples.
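Two of the quantities reported above can be sketched in code. The example classifies a cytosine by sequence context (CG, CHG, or CHH, where H is A, C, or T) on either strand of a reference, tallies the share of methylated cytosines in each context, and computes a bisulfite conversion rate as the fraction of cytosines read as T at positions assumed to be unmethylated. It is a minimal sketch, not the pipeline used here: the function names are hypothetical, it assumes an uppercase A/C/G/T reference, and the choice of unmethylated control positions (for example, a chloroplast or spike-in sequence) is an assumption. The last line checks the MiSeq yield: 9,271,554 reads × 151 bp is about 1.40 Gbp, consistent with the ~1.4 Gbp reported.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Complement table for an uppercase A/C/G/T reference (assumption: no N or IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cytosine_context(ref: str, pos: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for a cytosine at ref[pos] on either strand.

    A 'C' is a cytosine on the forward strand; a 'G' is a cytosine on the
    reverse strand. Returns None for A/T or when too close to an end to classify.
    """
    base = ref[pos]
    if base == "C":
        ctx = ref[pos + 1 : pos + 3]  # next two bases on the forward strand
    elif base == "G":
        # Next two bases on the reverse strand, read 5'->3'.
        ctx = reverse_complement(ref[max(0, pos - 2) : pos])
    else:
        return None
    if not ctx:
        return None
    if ctx[0] == "G":
        return "CG"
    if len(ctx) < 2:
        return None
    return "CHG" if ctx[1] == "G" else "CHH"


def context_shares(ref: str, methylated_positions: Iterable[int]) -> Dict[str, float]:
    """Fraction of methylated cytosines that fall in each context."""
    counts = Counter(cytosine_context(ref, p) for p in methylated_positions)
    counts.pop(None, None)  # drop positions that could not be classified
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctx: counts[ctx] / total for ctx in ("CG", "CHG", "CHH")}


def conversion_rate(converted: int, unconverted: int) -> float:
    """Share of cytosines read as T at positions assumed to be unmethylated."""
    return converted / (converted + unconverted)


if __name__ == "__main__":
    ref = "CGACAGCTT"  # hypothetical reference fragment
    print(context_shares(ref, [0, 1, 3, 6]))  # hypothetical methylated positions
    print(conversion_rate(converted=998, unconverted=2))  # hypothetical counts
    print(9_271_554 * 151)  # bases from the MiSeq test run
```

Handling the reverse strand through the G positions matters because, as noted above, methylated cytosines occur on both strands.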