Location: Jean Mayer Human Nutrition Research Center On Aging
Title: Using machine learning to predict obesity based on genome-wide, epigenome-wide gene-gene and gene-diet interactions
Author:
Lee, Yu Chi
CHRISTENSEN, JACOB - University Of Oslo
PARNELL, LAURENCE
SMITH, CAREN - Jean Mayer Human Nutrition Research Center On Aging At Tufts University
SHAO, JONATHAN
MCKEOWN, NICOLA - Jean Mayer Human Nutrition Research Center On Aging At Tufts University
ORDOVAS, JOSE - Jean Mayer Human Nutrition Research Center On Aging At Tufts University
LAI, CHAO QIANG
Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/29/2021
Publication Date: 1/3/2022
Citation: Lee, Y., Christensen, J.J., Parnell, L.D., Smith, C.E., Shao, J.Y., McKeown, N.M., Ordovas, J.M., Lai, C. 2022. Using machine learning to predict obesity based on genome-wide, epigenome-wide gene-gene and gene-diet interactions. Frontiers in Genetics. 12:783845. https://doi.org/10.3389/fgene.2021.783845.
DOI: https://doi.org/10.3389/fgene.2021.783845
Interpretive Summary: The incidence of obesity is rising worldwide and has reached epidemic proportions in some countries. Predicting an individual’s risk of becoming obese and the likelihood of developing chronic diseases has therefore become an important objective in Public Health and Precision Nutrition/Medicine. Towards that goal, we developed machine learning-based computational approaches that combine large datasets of genomic, epigenomic, and dietary data and account for their complex interactions. Through an iterative process of training and testing, we built a machine learning model that showed 72% accuracy in predicting overweight and obesity status. This result is on par with other approaches, but it is notable for relying exclusively on genomic, epigenomic, and dietary information rather than on anthropometric parameters and clinical conditions. Besides identifying critical genetic and epigenetic elements that influence overweight and obesity, we identified several highly informative nutritional factors. The list, which does not indicate the direction of each effect, includes processed meat, diet soda, fried potatoes, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components, such as calcium and flavonols. The strategy employed here chiefly shows that it is feasible to build an accurate predictive algorithm for obesity status using genetic, epigenetic, and dietary data.
This represents an important step towards the development of Precision Nutrition.
Technical Abstract: Obesity is associated with multiple chronic diseases that hamper healthy aging and is defined by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to better characterize these relations and interactions, focusing on diet-related factors. For this purpose, we conducted a combined genome-wide and epigenome-wide scan for body mass index (BMI) and up to three-way interactions among 402,793 single nucleotide polymorphisms (SNPs), 415,202 DNA methylation sites (DMS), and 397 dietary and lifestyle factors using the Generalized Multifactor Dimensionality Reduction (GMDR) method. The training set consisted of 1,573 participants in exam 8 of the Framingham Offspring Study (FOS) cohort. After identifying genetic, epigenetic, and dietary factors that passed statistical significance thresholds, we applied machine learning (ML) algorithms to predict participants’ obesity status in the testing set, taken as a subset of independent samples (n = 394) from the same cohort. The quality of the prediction models was evaluated using Area Under the Receiver Operating Characteristic Curve (ROC-AUC) and accuracy. The GMDR method identified 213 SNPs, 530 DMS, and 49 dietary and lifestyle factors as significant predictors of obesity. Comparing several ML algorithms, we found that the stochastic gradient boosting model provided the best prediction accuracy for obesity in the training set and achieved an overall accuracy of 72% and a ROC-AUC of 0.70 in the test set samples. Top predictors of the best-fit model were 21 SNPs, 230 DMS in genes such as CPT1A, ABCG1, SLC7A11, RNF145, and SREBF1, and 26 diet-related factors, including processed meat, diet soda, french fries, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components such as calcium and flavonols. Further studies will be needed to define the roles of these top predictors.
In conclusion, we developed an integrated approach with ML to predict obesity using omics and dietary data. This extends our knowledge of the drivers of obesity. Such knowledge can inform precision nutrition strategies for the prevention and treatment of obesity.
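
Illustrative note: the abstract reports two evaluation metrics, accuracy and ROC-AUC. The minimal Python sketch below shows how each can be computed for a binary obesity classifier. It is not the authors' code. The labels, scores, and the 0.5 decision threshold are made-up toy values chosen only for illustration, and the function names are hypothetical. ROC-AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the study): 1 = obese, 0 = not obese.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2, 0.7, 0.1]

# Assumed threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("ROC-AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.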