Publication #395860

Research Project: Metabolic and Epigenetic Regulation of Nutritional Metabolism

Location: Children's Nutrition Research Center

Title: A machine learning case-control classifier for schizophrenia based on DNA methylation in blood

GUNASEKARA, CHATHURA - Children's Nutrition Research Center (CNRC)
HANNON, ELLIS - University of Exeter
MACKAY, HARRY - Children's Nutrition Research Center (CNRC)
COARFA, CRISTIAN - Baylor College of Medicine
MCQUILLIN, ANDREW - University College London
ST. CLAIR, DAVID - University of Aberdeen
MILL, JONATHAN - University of Exeter
WATERLAND, ROBERT - Children's Nutrition Research Center (CNRC)

Submitted to: Translational Psychiatry
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/22/2021
Publication Date: 8/3/2021
Citation: Gunasekara, C., Hannon, E., MacKay, H., Coarfa, C., McQuillin, A., St. Clair, D., Mill, J., Waterland, R.A. 2021. A machine learning case-control classifier for schizophrenia based on DNA methylation in blood. Translational Psychiatry. 11(1). Article 412.

Interpretive Summary: Epigenetics is a system for molecular marking of DNA that tells the different cells in the body which genes to turn on or off in each cell type. A key epigenetic mechanism is methylation of cytosine nucleotides in DNA. Once established during development, DNA methylation can stably silence gene expression. Over the last decade we have identified human genomic regions that show interindividual differences in DNA methylation that are consistent across all tissues and cell types in the body. We refer to these as correlated regions of systemic interindividual epigenetic variation (CoRSIVs). Early embryonic establishment of DNA methylation at CoRSIVs is influenced by maternal nutrition before pregnancy. To better understand the long-term consequences of such epigenetic changes, we re-analyzed an existing data set from a genome-scale analysis of DNA methylation in blood of individuals with schizophrenia (SZ). We obtained genome-scale methylation data on 414 SZ cases and 433 matched controls and, using only data mapping to CoRSIVs, trained a machine-learning classification algorithm to identify SZ cases. We then tested the algorithm on an independent methylation data set of 353 SZ cases and 322 matched controls. The model classified 303 individuals as SZ cases with a positive predictive value of 80%, far exceeding the performance of a classifier based only on genetic variants. These findings do not appear to be explained by differences in cigarette smoking or medication use in SZ patients. Our results suggest that SZ has two innate dimensions of risk: one based on genetic variants and the other on systemic epigenetic variants.

Technical Abstract: Epigenetic dysregulation is thought to contribute to the etiology of schizophrenia (SZ), but the cell type-specificity of DNA methylation makes population-based epigenetic studies of SZ challenging. To train an SZ case-control classifier based on DNA methylation in blood, we therefore focused on human genomic regions of systemic interindividual epigenetic variation (CoRSIVs), a subset of which are represented on the Illumina Human Methylation 450K (HM450) array. HM450 DNA methylation data on whole blood of 414 SZ cases and 433 non-psychiatric controls were used as training data for sparse partial least squares discriminant analysis (SPLS-DA), a classification algorithm with built-in feature selection; application of SPLS-DA to HM450 data has not been previously reported. Using the first two SPLS-DA dimensions, we calculated a "risk distance" to identify individuals with the highest probability of SZ. The model was then evaluated on an independent HM450 data set of 353 SZ cases and 322 non-psychiatric controls. Our CoRSIV-based model classified 303 individuals as cases with a positive predictive value (PPV) of 80%, far surpassing the performance of a model based on polygenic risk score (PRS). Importantly, risk distance (based on CoRSIV methylation) was not associated with medication use, arguing against reverse causality. Risk distance and PRS were positively correlated (Pearson r = 0.28, P = 1.28 × 10^-12), and mediation analysis suggested that genetic effects on SZ are partially mediated by altered methylation at CoRSIVs. Our results indicate two innate dimensions of SZ risk: one based on genetic variants and the other on systemic epigenetic variants.
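
To put the reported PPV in context, the short Python sketch below compares it with the fraction of cases in the independent test set, which is the PPV expected if every individual were labeled a case. The counts (353 cases, 322 controls, 303 individuals classified as cases) and the 80% PPV come from the abstract. The true-positive and false-positive counts are not reported here; they are back-calculated from the rounded PPV and should be read as approximate. The function and variable names are illustrative only.

```python
def positive_predictive_value(true_positives: int, predicted_positives: int) -> float:
    """PPV = true positives / all individuals classified as positive."""
    if predicted_positives <= 0:
        raise ValueError("predicted_positives must be greater than zero")
    return true_positives / predicted_positives


# Values stated in the abstract.
test_cases = 353
test_controls = 322
predicted_cases = 303
reported_ppv = 0.80  # rounded figure as reported

# PPV of a trivial rule that labels every test individual a case:
# this equals the proportion of cases in the test set.
baseline_ppv = positive_predictive_value(test_cases, test_cases + test_controls)

# ASSUMPTION: counts implied by the rounded 80% PPV, not reported values.
implied_true_positives = round(reported_ppv * predicted_cases)
implied_false_positives = predicted_cases - implied_true_positives

print(f"Baseline PPV (label everyone a case): {baseline_ppv:.3f}")
print(f"Implied true positives (approximate): {implied_true_positives}")
print(f"Implied false positives (approximate): {implied_false_positives}")
```

Because the test set is nearly balanced, the all-cases baseline is close to 0.5, so a PPV of 80% among the 303 flagged individuals is a substantial enrichment over that baseline.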