Location: Immunity and Disease Prevention Research
Title: Nutrient estimation from 24-hour food recalls using machine learning and database mapping: a case study with lactose
SIMMONS, GABRIEL - University of California, Davis
BOUZID, YASMINE - University of California, Davis
KAN, ANNIE - University of California, Davis
BURNETT, DUSTIN - University of California, Davis
TAGKOPOULOS, ILIAS - University of California, Davis
Submitted to: Nutrients
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/6/2019
Publication Date: 12/6/2019
Publication URL: https://handle.nal.usda.gov/10113/6811098
Citation: Chin, E.L., Simmons, G., Bouzid, Y.Y., Kan, A., Burnett, D.J., Tagkopoulos, I., Lemay, D.G. 2019. Nutrient estimation from 24-hour food recalls using machine learning and database mapping: a case study with lactose. Nutrients. 11(12):3045. https://doi.org/10.3390/nu11123045.
Interpretive Summary: A 24-hour dietary recall is commonly used in nutrition research to assess dietary intake over the previous 24 hours. The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) and the Nutrition Data System for Research (NDSR) are two commonly used 24-hour dietary recall programs. NDSR uses the Nutrition Coordinating Center’s Food and Nutrient Database (NCC database). Sixty-two nutrients are shared between the ASA24 output and the NCC database, but the NCC database also includes about one hundred nutrients that the ASA24 output lacks, such as lactose, soluble fiber, and individual amino acids. ASA24 foods can be looked up in NDSR to obtain these NCC-exclusive nutrient values, but doing so is time-consuming. In this study, we used lactose as an example to assess prediction and database matching methods for estimating an NCC-exclusive nutrient from ASA24-reported foods. ASA24 foods were manually looked up in NDSR to obtain lactose estimates, which were then used for method development and evaluation. For the predictive method, nine machine learning models were developed to predict the amount of lactose from the nutrients shared between the ASA24 output and the NCC database. For the database matching method, ASA24 foods were matched to NCC foods based either on the shared nutrients alone (“Nutrient-Only”) or on both the nutrients and the text of the food descriptions (“Nutrient + Text”). The lactose values predicted by some of the simpler machine learning models correlated well with those from the manual lookup. Overall, Nutrient + Text database matching returned lactose values most similar to those of the manual lookup, and the best NCC food matched to each ASA24 food was also closer to the manual lookup selection than the match returned by the Nutrient-Only method. Although all methods required some review of the output, they substantially reduced the time needed to obtain lactose estimates compared with the status quo of manual lookup.
These results suggest that computational methods can successfully be used to estimate an NCC-exclusive nutrient for foods reported in ASA24.
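The two database-matching strategies described above can be illustrated with a minimal sketch. The summary does not specify the distance or text-similarity measures, so this sketch assumes a standardized Euclidean distance over the shared nutrients and Jaccard word overlap between food descriptions, combined through a hypothetical `text_weight` parameter. The NCC entries, nutrient values, and function names are hypothetical and are not taken from the NCC database or the authors' code. The example uses lactose-free milk to show why description text can matter: its nutrient profile is nearly identical to that of regular milk, but its lactose content is not.

```python
import math
import re
import statistics
from typing import Dict, List, Set, Tuple

# Hypothetical NCC entries (illustrative numbers only, not real NCC data):
# description -> (shared-nutrient vector, lactose in g per 100 g).
# Assumed shared nutrients: energy (kcal), total sugars (g), total fat (g), calcium (mg).
NCC_FOODS: Dict[str, Tuple[List[float], float]] = {
    "Milk, reduced fat (2%)": ([50.0, 5.0, 2.0, 120.0], 5.0),
    "Milk, reduced fat (2%), lactose free": ([52.0, 5.2, 2.0, 125.0], 0.0),
    "Yogurt, plain, whole milk": ([61.0, 4.7, 3.3, 121.0], 4.1),
    "Cheese, cheddar": ([403.0, 0.5, 33.0, 710.0], 0.1),
}

# Scale each nutrient by its spread across the NCC foods so that large-unit
# nutrients (e.g., calcium in mg) do not dominate the distance.
_N = len(next(iter(NCC_FOODS.values()))[0])
_SCALES = [
    statistics.pstdev([vec[i] for vec, _ in NCC_FOODS.values()]) or 1.0
    for i in range(_N)
]


def nutrient_distance(a: List[float], b: List[float]) -> float:
    """Standardized Euclidean distance between two shared-nutrient vectors."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, _SCALES)))


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric words of a food description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two descriptions (0 = none, 1 = identical sets)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_nutrient_only(nutrients: List[float]) -> str:
    """Nutrient-only: the NCC food with the closest shared-nutrient profile."""
    return min(NCC_FOODS, key=lambda d: nutrient_distance(nutrients, NCC_FOODS[d][0]))


def match_nutrient_text(description: str, nutrients: List[float], text_weight: float = 1.0) -> str:
    """Nutrient + text: nutrient distance minus a weighted description similarity (lower is better)."""
    return min(
        NCC_FOODS,
        key=lambda d: nutrient_distance(nutrients, NCC_FOODS[d][0]) - text_weight * jaccard(description, d),
    )


asa24_desc = "Milk, lactose free, reduced fat (2%)"  # hypothetical ASA24 food
asa24_nutrients = [50.0, 5.0, 2.0, 121.0]
only = match_nutrient_only(asa24_nutrients)
both = match_nutrient_text(asa24_desc, asa24_nutrients)
print(only, NCC_FOODS[only][1])  # Milk, reduced fat (2%) 5.0
print(both, NCC_FOODS[both][1])  # Milk, reduced fat (2%), lactose free 0.0
```

In this toy case, the nutrient-only match assigns the lactose value of regular milk, while the description overlap steers the combined score toward the lactose-free entry. How strongly text should outweigh nutrients (`text_weight` here) is a tuning choice that this sketch does not attempt to settle.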
Technical Abstract: The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database; both NDSR and the NCC database require a license. Manual lookup of ASA24 foods in NDSR is time-consuming but is currently the only way to obtain NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and then split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common to ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using either the nutrients alone (“nutrient-only”) or both the nutrients and the food descriptions (“nutrient+text”). For both methods, the lactose predictions were compared with the manual curation. Among the machine learning models, Bounded-LASSO and Bounded-Ridge performed best on held-out test data (R² = 0.52 and 0.50, respectively). For the database matching method, nutrient+text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in the ASA24.
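The abstract reports the bounded ridge and LASSO regressors as the best-performing machine learning models and scores every method by R² against the manual curation. The following is a minimal, dependency-free sketch of a bounded ridge regressor and of the R² metric. It assumes that "bounded" means clipping each prediction to the interval from 0 to the food's total sugars (lactose is a sugar, so it cannot exceed total sugars); the study's actual features, penalty selection, and bounding rule may differ. LASSO, which uses an L1 penalty and is usually fit iteratively (for example, by coordinate descent), is not shown. All foods, values, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> Tuple[float, List[float]]:
    """Ridge regression; centering X and y leaves the intercept unpenalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Penalized normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


def predict_bounded(intercept: float, beta: List[float], row: Sequence[float], upper: float) -> float:
    """Linear prediction clipped to [0, upper]; upper is assumed to be the food's total sugars."""
    raw = intercept + sum(b * v for b, v in zip(beta, row))
    return min(max(raw, 0.0), upper)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical foods; features = [total sugars (g), calcium (mg)] per 100 g.
train_X = [[4.8, 120.0], [12.0, 100.0], [0.5, 10.0], [20.0, 300.0], [3.0, 50.0], [9.0, 200.0]]
train_y = [3.12, 5.8, 0.3, 11.0, 1.7, 5.6]  # hypothetical lactose (g)
test_X = [[10.0, 150.0], [1.0, 5.0], [0.2, 400.0]]
test_y = [5.5, 0.5, 0.1]

intercept, beta = fit_ridge(train_X, train_y, lam=0.1)
preds = [predict_bounded(intercept, beta, row, upper=row[0]) for row in test_X]
print(preds, r_squared(test_y, preds))
```

In this toy data, the third test food has little sugar but a lot of calcium, so the unbounded linear prediction would exceed its total sugars; the bound caps it at 0.2 g. A practical pipeline would more likely rely on a library implementation (such as scikit-learn's `Ridge` and `Lasso`) and choose the penalty by cross-validation on the training split.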