Location: Children's Nutrition Research Center
Title: Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data
Authors:
SCOTT, C. ANTHONY - Children's Nutrition Research Center (CNRC)
DURYEA, JACK - Children's Nutrition Research Center (CNRC)
MACKAY, HARRY - Children's Nutrition Research Center (CNRC)
BAKER, MARIA - Children's Nutrition Research Center (CNRC)
LARITSKY, ELEONORA - Children's Nutrition Research Center (CNRC)
GUNASEKARA, CHATHURA - Children's Nutrition Research Center (CNRC)
COARFA, CRISTIAN - Baylor College of Medicine
WATERLAND, ROBERT - Children's Nutrition Research Center (CNRC)
Submitted to: Genome Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/29/2020
Publication Date: 7/1/2020
Citation: Scott, A.C., Duryea, J.D., MacKay, H., Baker, M.S., Laritsky, E., Gunasekara, C.J., Coarfa, C., Waterland, R.A. 2020. Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data. Genome Biology. 21(1):156. https://doi.org/10.1186/s13059-020-02065-5.
Interpretive Summary: Epigenetics is a system for molecular marking of DNA that tells the different cells in the body which genes to turn on or off in each cell type. A key epigenetic mechanism is methylation of cytosine nucleotides in DNA, which in mammals occurs mainly at cytosine-guanine (so-called 'CpG') sites. Once established during development, DNA methylation can stably silence gene expression. The current gold-standard method for studying DNA methylation across the entire genome is whole-genome bisulfite sequencing (WGBS). We noted that WGBS data inherently carry molecule-specific information: each sequencing read comes from a single DNA molecule and can therefore be traced back to an individual cell in the tissue being studied. Standard approaches for analyzing WGBS data, however, average methylation across all reads at each CpG site and so do not capture this information. We therefore developed new open-access software called Cluster-Based analysis of CpG methylation (CluBCpG). We show that analyzing WGBS data with CluBCpG reveals DNA methylation signatures that are specific to individual cell types within tissues. This new computational tool should help investigators better understand how DNA methylation contributes to cell type-specific gene regulation.
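To make the point about molecule-specific information concrete, the following minimal Python sketch shows how per-read CpG calls are read off bisulfite data and how the usual per-CpG average can hide read-level differences. It assumes reads already aligned to the top strand of a short reference window, starting at its first base and without indels; the helper names (`cpg_positions`, `call_read`, `bulk_fraction`) and the toy reads are hypothetical and are not taken from CluBCpG.

```python
"""Sketch: per-read CpG calls from bisulfite reads versus the bulk per-CpG average.

Assumptions (illustrative only, not CluBCpG code): reads are aligned to the
top strand of the reference window, start at its first base, and have no
indels. On this strand bisulfite turns unmethylated C into T, while
methylated C is protected and still reads as C.
"""

from typing import List, Optional, Sequence

Call = Optional[int]  # 1 = methylated, 0 = unmethylated, None = no call


def cpg_positions(reference: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def call_read(reference: str, read: str) -> List[Call]:
    """Methylation call for each reference CpG, as seen on one read (one molecule)."""
    calls: List[Call] = []
    for pos in cpg_positions(reference):
        if pos >= len(read):
            calls.append(None)   # read ends before this CpG
        elif read[pos] == "C":
            calls.append(1)      # protected from conversion: methylated
        elif read[pos] == "T":
            calls.append(0)      # converted by bisulfite: unmethylated
        else:
            calls.append(None)   # other base (error or variant): no call
    return calls


def bulk_fraction(patterns: Sequence[Sequence[Call]]) -> List[Optional[float]]:
    """Per-CpG fraction methylated, pooling all reads (the standard bulk summary)."""
    n_cpg = len(patterns[0])
    result: List[Optional[float]] = []
    for j in range(n_cpg):
        covered = [p[j] for p in patterns if p[j] is not None]
        result.append(sum(covered) / len(covered) if covered else None)
    return result


if __name__ == "__main__":
    ref = "ACGTTCGAACGT"  # toy reference with CpGs at positions 1, 5 and 9
    # Two hypothetical samples: each read is one molecule from one cell.
    sample_a = [call_read(ref, r) for r in ["ACGTTCGAACGT", "ATGTTTGAATGT"]]
    sample_b = [call_read(ref, r) for r in ["ACGTTTGAACGT", "ATGTTCGAATGT"]]
    print(sample_a, bulk_fraction(sample_a))  # [[1, 1, 1], [0, 0, 0]] [0.5, 0.5, 0.5]
    print(sample_b, bulk_fraction(sample_b))  # [[1, 0, 1], [0, 1, 0]] [0.5, 0.5, 0.5]
```

Both toy samples give 0.5 at every CpG in the bulk summary, yet their read-level patterns differ completely. Real pipelines would also need to handle bottom-strand reads (where the informative base is G/A), soft-clipping, and indels, which this sketch deliberately ignores.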
Technical Abstract: The traditional approach to studying CpG methylation, an epigenetic mechanism, in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites (differentially methylated regions, DMRs). Variation limited to single CpGs or small numbers of CpGs has been assumed to reflect stochastic processes. To test this assumption, we developed software, Cluster-Based analysis of CpG methylation (CluBCpG), and explored variation in read-level CpG methylation patterns in whole genome bisulfite sequencing (WGBS) data. Analysis of both human and mouse WGBS datasets reveals read-level signatures that are mostly orthogonal to classical DMRs, are enriched at cell type-specific enhancers, allow estimation of proportional cell composition in synthetic mixtures, and improve prediction of gene expression. In tandem, we developed a machine learning algorithm, Precise Read-Level Imputation of Methylation (PReLIM), to increase the coverage of existing WGBS datasets by imputing CpG methylation states on individual sequencing reads. PReLIM both improves CluBCpG coverage and performance and enables identification of novel DMRs, which we independently validate. Our data indicate that read-level CpG methylation patterns in tissue WGBS libraries reflect cell type rather than stochastic variation. Accordingly, these new computational tools should lead to an improved understanding of epigenetic regulation by DNA methylation.
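The abstract does not spell out CluBCpG's clustering rules or the PReLIM model. As a rough illustration of why read-level patterns can carry cell-composition information that per-CpG averages lose, the Python sketch below groups fully covered reads in one bin by their exact methylation pattern (a simplification standing in for clustering) and fits a two-cell-type mixing fraction by ordinary least squares. The function names, the complete-coverage filter, and the least-squares fit are assumptions for illustration, not the published method.

```python
"""Sketch: read-level pattern frequencies and a two-cell-type mixture estimate.

Illustrative assumptions only (not the published CluBCpG or PReLIM methods):
- each read is a tuple of per-CpG calls (1, 0 or None) within one genomic bin;
- only reads covering every CpG in the bin are counted;
- reads are grouped by identical pattern, a simple stand-in for clustering;
- the mixing fraction is fit by ordinary least squares over pattern frequencies.
"""

from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple

Pattern = Tuple[int, ...]


def pattern_frequencies(reads: Iterable[Sequence[Optional[int]]]) -> Dict[Pattern, float]:
    """Fraction of fully covering reads that carry each methylation pattern."""
    complete = [tuple(r) for r in reads if all(c is not None for c in r)]
    if not complete:
        return {}
    counts = Counter(complete)
    return {pat: n / len(complete) for pat, n in counts.items()}


def estimate_fraction_a(mix: Dict[Pattern, float],
                        ref_a: Dict[Pattern, float],
                        ref_b: Dict[Pattern, float]) -> float:
    """Least-squares p in mix ~ p * ref_a + (1 - p) * ref_b, clipped to [0, 1]."""
    num = den = 0.0
    for pat in set(mix) | set(ref_a) | set(ref_b):
        d = ref_a.get(pat, 0.0) - ref_b.get(pat, 0.0)  # how the two references differ
        r = mix.get(pat, 0.0) - ref_b.get(pat, 0.0)    # how the mixture departs from B
        num += d * r
        den += d * d
    if den == 0.0:
        raise ValueError("reference profiles are identical; fraction not identifiable")
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    # Hypothetical references: both average 0.5 methylation at every CpG,
    # so a bulk per-CpG summary cannot tell them apart.
    ref_a = pattern_frequencies([(1, 0, 1), (0, 1, 0)] * 2)
    ref_b = pattern_frequencies([(1, 1, 1), (0, 0, 0)] * 2)
    # Synthetic mixture: 2 reads from A, 6 from B, plus one incomplete read.
    mix_reads = [(1, 0, 1), (0, 1, 0)] + [(1, 1, 1), (0, 0, 0)] * 3 + [(1, None, 0)]
    mix = pattern_frequencies(mix_reads)
    print(estimate_fraction_a(mix, ref_a, ref_b))  # 0.25
```

Because both hypothetical references average 0.5 at each CpG, only the pattern frequencies make the 25% share of cell type A recoverable here. The closed form comes from minimizing the sum over patterns of (r - p·d)², giving p = Σ r·d / Σ d². Real data would add sequencing noise, sparse coverage (the gap PReLIM is described as addressing), and more than two cell types.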