Research Project: USDA National Nutrient Databank for Food Composition

Location: Methods and Application of Food Composition Laboratory

Title: IngID: a framework for parsing and systematic reporting of ingredients used in commercially packaged foods

Author
Ahuja, Jaspreet
Li, Ying - University of Maryland
Bahadur, Rahul - University of Maryland
Nguyen, Quynhanh - University of Maryland
Haile, Ermias - University of Maryland
Pehrsson, Pamela

Submitted to: Journal of Food Composition and Analysis
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/31/2021
Publication Date: 4/1/2021
Citation: Ahuja, J.K., Li, Y., Bahadur, R., Nguyen, Q., Haile, E., Pehrsson, P.R. 2021. IngID: a framework for parsing and systematic reporting of ingredients used in commercially packaged foods. Journal of Food Composition and Analysis. 100. http://doi.org/10.1016/j.jfca.2021.103920.
DOI: https://doi.org/10.1016/j.jfca.2021.103920

Interpretive Summary: Commercially packaged foods are an integral part of the US diet; however, the scientific literature offers little information on the types of ingredients used in them. This paper reports on the development of a framework for parsing and systematic reporting of ingredients used in commercially packaged foods (IngID) in the US, and describes the complexity and challenges of current ingredient lists, using baked products as an illustration. The major steps in the development of IngID are: identifying top-selling foods; obtaining their ingredient lists; parsing individual ingredients after several preprocessing steps; building a thesaurus by assigning a preferred descriptor (PD) to equivalent terms such as synonyms and spelling errors; and assigning broader terms, such as flour or sweeteners, based on the research question. The current version includes 3 main files: an input Food details file, an output file of parsed text strings, and a thesaurus of 6,533 parsed ingredients. IngID can help improve our understanding of commercial ingredients, characterize foods in dimensions other than the traditional nutrient profiles, and support the development of automated systems and tools. It will be useful for nutritionists, food scientists, food ontologists and computer programmers.

Technical Abstract: The scientific literature offers little information on the types of ingredients used in packaged foods. USDA’s Global Branded Food products Database for the first time makes publicly available a compiled dataset of ingredient lists for over a quarter million commercial food products. This paper reports on the development of a framework for parsing and systematic reporting of ingredients used in commercially packaged foods (IngID) in the US and delineates the complexity and challenges of current ingredient lists, using baked products as an illustration. The major steps in the development of the IngID prototype were 1) identifying top-selling baked products, 2) obtaining their ingredient lists, 3) parsing individual ingredients after several pre-processing steps, because ingredient lists were inconsistent and varied, 4) building a thesaurus by assigning a preferred descriptor to equivalent terms such as synonyms and spelling errors, and 5) assigning broader terms, such as flour or sweeteners. The current version of IngID includes 3 main files: an input Food details file, an output file of parsed text strings, and a thesaurus of 6,533 parsed ingredients. These tools can potentially help improve our understanding of commercial ingredients, characterize foods in dimensions other than the traditional nutrient profiles, and support the development of food ontologies, computer programs and artificial intelligence tools.
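
The parsing, thesaurus and broader-term steps described above can be illustrated with a minimal Python sketch. This is not the IngID implementation: the function names, the sample ingredient list, and the tiny THESAURUS and BROADER_TERMS tables are all hypothetical, chosen only to show how a raw label string might be pre-processed, split into individual ingredients, mapped to a preferred descriptor, and then assigned a broader term.

```python
import re

# Hypothetical thesaurus: variant term -> preferred descriptor (PD).
# The real IngID thesaurus covers 6,533 parsed ingredients; these rows are invented.
THESAURUS = {
    "enriched wheat flour": "wheat flour, enriched",
    "enriched flour": "wheat flour, enriched",   # synonym
    "sugar": "sugar",
    "suger": "sugar",                            # spelling error
    "high fructose corn syrup": "corn syrup, high fructose",
    "salt": "salt",
}

# Hypothetical mapping from preferred descriptor to a broader term.
BROADER_TERMS = {
    "wheat flour, enriched": "flour",
    "sugar": "sweeteners",
    "corn syrup, high fructose": "sweeteners",
}


def preprocess(raw):
    """Lowercase, drop a leading 'Ingredients:' label and trailing period."""
    text = raw.lower().strip()
    text = re.sub(r"^ingredients?\s*:\s*", "", text)
    text = text.rstrip(". ")
    return re.sub(r"\s+", " ", text)


def split_top_level(text):
    """Split on commas/semicolons that are not nested inside brackets."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)
        if ch in ",;" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_ingredients(raw):
    """Return one row per top-level ingredient with its PD and broader term."""
    rows = []
    for item in split_top_level(preprocess(raw)):
        # Keep only the head term; a bracketed sub-ingredient list is dropped here.
        head = re.sub(r"\s*[\(\[\{].*$", "", item).strip()
        preferred = THESAURUS.get(head)  # None when the term is not in the thesaurus
        rows.append({
            "parsed": head,
            "preferred": preferred,
            "broader": BROADER_TERMS.get(preferred),
        })
    return rows


label = ("INGREDIENTS: Enriched Wheat Flour (Wheat Flour, Niacin, Reduced Iron), "
         "Suger, High Fructose Corn Syrup, Salt.")
for row in parse_ingredients(label):
    print(row)
```

In this sketch the bracketed sub-ingredients are simply discarded, and terms missing from the thesaurus get no preferred descriptor; a fuller treatment would need to handle both, along with the other inconsistencies in ingredient lists that the abstract mentions.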