
Research Project: Advanced Technology for Rapid Comprehensive Analysis of the Chemical Components

Location: Methods and Application of Food Composition Laboratory

Title: Deep learning accurately predicts food categories and nutrients based on ingredient statements

Author
Ma, Peihua - University of Maryland
Wang, Qin - University of Maryland
Yu, Ning - University of Maryland
Li, Ying - University of Maryland
Zhang, Zhikun - Helmholtz Centre
Sheng, Jiping - Renmin University of China
Ahuja, Jaspreet
McGinty, Hande - University of Maryland

Submitted to: Food Chemistry
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/16/2022
Publication Date: 5/19/2022
Citation: Ma, P., Wang, Q., Yu, N., Li, Y., Zhang, Z., Sheng, J., Ahuja, J.K., McGinty, H. 2022. Deep learning accurately predicts food categories and nutrients based on ingredient statements. Food Chemistry 2022, 133243. https://doi.org/10.1016/j.foodchem.2022.133243.
DOI: https://doi.org/10.1016/j.foodchem.2022.133243

Interpretive Summary: Determining attributes of foods, such as taxonomy and nutrient content, is challenging and resource-intensive, yet important for a better understanding of foods. In this study, a novel strategy was developed to predict food categories and nutrient values from the ingredient statements in the USDA Branded Food Products Database using deep learning models. The multi-layer perceptron (MLP) method, combined with ingredient encoding by term frequency-inverse document frequency and dataset rebalancing by synthetic minority oversampling technique-edited nearest neighbors, obtained the highest learning efficiency for AI food natural language processing tasks, achieving up to 99% accuracy for food classification and an R² of 0.98 for calcium estimation. The deep learning approach has great potential to be embedded in other food classification and regression tasks and to be extended to other applications in the food and nutrient domain. Automating these resource-intensive tasks can support precision nutrition and a better understanding of dietary intakes, and can be useful to food composition database managers, dietitians, and epidemiologists.
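For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how classification accuracy and the coefficient of determination (R²) are conventionally computed. The function names and sample values are illustrative assumptions, not taken from the study, and the study's own evaluation code may differ.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance around the mean of the true values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical food-category labels and calcium values (mg), for illustration only.
labels_true = ["snack", "dairy", "beverage", "snack"]
labels_pred = ["snack", "dairy", "snack", "snack"]
calcium_true = [100.0, 200.0, 300.0]
calcium_pred = [110.0, 190.0, 300.0]

print(accuracy(labels_true, labels_pred))
print(r_squared(calcium_true, calcium_pred))
```

An R² of 1 means the predictions reproduce the measured values exactly, while an R² of 0 means they do no better than always predicting the mean.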

Technical Abstract: Determining attributes of foods, such as taxonomy and nutrient content, is challenging and resource-intensive, yet important for a better understanding of foods. In this study, a novel strategy was developed to predict food categories and nutrient values from ingredient statements using deep learning models. A novel dataset, 134k BFPD, was derived with modification from the USDA Branded Food Products Database (BFPD) and labeled with three food taxonomies and nutrient values, yielding an artificial intelligence (AI) dataset that covers the largest range of food types to date. The deep learning strategy encompassed ingredient parsing, encoding of high-frequency ingredients using term frequency-inverse document frequency (TF-IDF, TF in short), and dataset rebalancing with synthetic minority oversampling technique-edited nearest neighbors (SMOTEENN, SE in short). Overall, the multi-layer perceptron (MLP)-TF-SE method obtained the highest learning efficiency for AI food natural language processing tasks, achieving up to 99% accuracy for food classification and an R² of 0.98 for calcium estimation (0.79 to 0.97 for calories, protein, sodium, total carbohydrate, total lipids, etc.). The deep learning approach has great potential to be embedded in other food classification and regression tasks and to be extended to other applications in the food and nutrient domain.
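The TF-IDF encoding step named in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: the comma-based tokenizer, the smoothed IDF formula, the function names, and the sample ingredient statements are all assumptions made for illustration. The SMOTEENN rebalancing and MLP training steps are not shown.

```python
import math
from collections import Counter


def tokenize(statement):
    """Split an ingredient statement on commas (an assumed, simplified parser)."""
    return [t.strip().lower() for t in statement.split(",") if t.strip()]


def tfidf_vectors(statements):
    """Return (vocabulary, vectors) with one TF-IDF vector per statement."""
    docs = [tokenize(s) for s in statements]
    n_docs = len(docs)
    # Document frequency: number of statements that contain each ingredient.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency is the ingredient's share of the statement.
        vectors.append(
            [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        )
    return vocab, vectors


# Hypothetical ingredient statements, for illustration only.
statements = [
    "Water, Sugar, Salt",
    "Water, Milk",
    "Sugar, Cocoa Butter, Milk",
]
vocab, vectors = tfidf_vectors(statements)
for statement, vec in zip(statements, vectors):
    print(statement, [round(v, 3) for v in vec])
```

Ingredients that appear in few statements receive a higher weight than common ones such as water, which is what lets a downstream classifier or regressor focus on the more distinctive ingredients.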