Location: Immunity and Disease Prevention Research
Title: Machine learning identifies stool pH as predictor of bone mineral density in healthy multiethnic US adults
VAN LOAN, MARTA - University Of California, Davis
BONNEL, ELLEN - University Of California, Davis
Submitted to: Journal of Nutrition
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/21/2021
Publication Date: 9/6/2021
Citation: Chin, E.L., Van Loan, M., Spearman, S., Bonnel, E.L., Laugero, K.D., Stephensen, C.B., Lemay, D.G. 2021. Machine learning identifies stool pH as predictor of bone mineral density in healthy multiethnic US adults. Journal of Nutrition. 151(11):3379-3390. https://doi.org/10.1093/jn/nxab266.
Interpretive Summary: Bone mineral content (BMC) and bone mineral density (BMD) are related to bone health and can be used to determine risk of osteoporosis. A variety of directly-modifiable and not-directly-modifiable variables have been shown to influence BMC or BMD. However, previous studies have usually been limited to the analysis of a few variables of interest or to a specific subset of the population. Machine learning models can find complex patterns in data and identify the variables that are important for making predictions. In this study, we used several machine learning models to predict whole-body, femoral neck, and spine BMC and BMD in healthy men and women using not-directly-modifiable and directly-modifiable variables. Not-directly-modifiable variables are those that cannot be altered without direct intervention, such as anthropometric, physiological, and demographic measurements. Directly-modifiable variables are variables or biomarkers for which there is a recommendation in the Dietary Guidelines for Americans; they include nutrient and food group dietary data, serum 25(OH)D (a biomarker for sunlight exposure), and stool pH (a biomarker for fermentable fiber consumption). Machine learning models performed better (had higher predictive value) than linear regression when using only the directly-modifiable variables, but performed similarly to linear regression when using not-directly-modifiable variables. This indicates that directly-modifiable variables are weaker predictors than not-directly-modifiable variables, and that sophisticated machine learning models are useful for predicting BMC/BMD from directly-modifiable variables. Specifically, body mass index, body fat percent, height, and menstruation history were consistent not-directly-modifiable predictors of BMC and BMD.
For the directly modifiable features, betaine, cholesterol, hydroxyproline, menaquinone-4, dihydrophylloquinone, eggs, cheese, cured meat, refined grains, fruit juice, and alcohol consumption were predictors of BMC and BMD. Low stool pH was also predictive of higher whole-body, femoral neck, and spine BMC and BMD. These results show the utility of machine learning to find previously unforeseen features that predict bone health. Specifically, we show that stool pH may be a useful predictor of BMC/BMD and future studies to investigate this relationship are warranted.
Technical Abstract: Objective: Previous studies of bone health have been largely limited to analyses focused on a few a priori variables. The objective of this study was to use dietary, physiological, and lifestyle data to identify directly modifiable and non-modifiable variables predictive of bone mineral content (BMC) and bone mineral density (BMD) in healthy US men and women using machine learning models. Methods: Ridge, lasso, elastic net, and random forest models were used to predict whole-body, femoral neck, and spine BMC and BMD in healthy US adults (n = 313) using non-modifiable anthropometric, physiological, and demographic variables, directly modifiable lifestyle (physical activity, tobacco use) and dietary (nutrient or food group intake via food frequency questionnaire) variables, and variables approximating directly modifiable behavior (circulating vitamin D and stool pH). Results: Machine learning models using non-modifiable variables explained more variation in BMC and BMD (highest R² = 0.750) than models using only directly modifiable variables (highest R² = 0.107). When using directly modifiable variables only, machine learning models outperformed multivariate linear regression, which had lower predictive value (highest R² = 0.063). BMI, body fat percent, height, and menstruation history were predictors of BMC and BMD. For the directly modifiable features, betaine, cholesterol, hydroxyproline, menaquinone-4, dihydrophylloquinone, eggs, cheese, cured meat, refined grains, fruit juice, and alcohol consumption were predictors of BMC and BMD. Low stool pH, a proxy for fermentable fiber intake, was also predictive of higher BMC and BMD. Conclusion: Modifiable factors, such as diet, explained less variation in the data than non-modifiable factors, such as age, sex, and ethnicity. Low stool pH was associated with higher BMC and BMD in a healthy US population.
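Illustrative note: the abstract compares models by their R-squared values (the coefficient of determination, the share of outcome variance a model explains) and names ridge regression as one of the penalized models used. The sketch below shows, under simplifying assumptions, how R-squared is computed and how a ridge penalty shrinks a coefficient relative to ordinary least squares. It uses synthetic data and a single hypothetical predictor. The function names, the penalty value, and the train/test split are assumptions made for illustration and are not taken from the study, whose actual pipeline, features, and tuning are not described here. Lasso, elastic net, and random forest are not shown.

```python
import random
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge_1d(x, y, alpha):
    """Single-predictor ridge regression with an unpenalized intercept.

    Closed form on centered data: slope = Sxy / (Sxx + alpha).
    alpha = 0 reduces to ordinary least squares.
    """
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to each value in x."""
    return [intercept + slope * xi for xi in x]


# Synthetic data only: a hypothetical predictor and outcome, NOT study data.
rng = random.Random(0)
x_all = [rng.gauss(0.0, 1.0) for _ in range(200)]
y_all = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x_all]

# Hold out the last 50 points so R^2 is measured on data not used for fitting.
x_train, y_train = x_all[:150], y_all[:150]
x_test, y_test = x_all[150:], y_all[150:]

ols_slope, ols_intercept = fit_ridge_1d(x_train, y_train, alpha=0.0)
ridge_slope, ridge_intercept = fit_ridge_1d(x_train, y_train, alpha=50.0)

r2_ols = r_squared(y_test, predict(x_test, ols_slope, ols_intercept))
r2_ridge = r_squared(y_test, predict(x_test, ridge_slope, ridge_intercept))
print(f"OLS   slope={ols_slope:.3f}  held-out R^2={r2_ols:.3f}")
print(f"Ridge slope={ridge_slope:.3f}  held-out R^2={r2_ridge:.3f}")
```

An R-squared of 0 corresponds to predicting the mean outcome for every participant, which gives a sense of scale for the low values reported for directly modifiable variables alone.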