Location: Methods and Application of Food Composition Laboratory
Title: IngID: a framework for parsing and systematic reporting of ingredients used in commercially packaged foods
Author
Ahuja, Jaspreet
LI, YING - University of Maryland
BAHADUR, RAHUL - University of Maryland
NGUYEN, QUYNHANH - University of Maryland
HAILE, ERMIAS - University of Maryland
Pehrsson, Pamela
Submitted to: Journal of Food Composition and Analysis
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/31/2021
Publication Date: 4/1/2021
Publication URL: https://handle.nal.usda.gov/10113/7331915
Citation: Ahuja, J.K., Li, Y., Bahadur, R., Nguyen, Q., Haile, E., Pehrsson, P.R. 2021. IngID: a framework for parsing and systematic reporting of ingredients used in commercially packaged foods. Journal of Food Composition and Analysis. 100. http://doi.org/10.1016/j.jfca.2021.103920.
DOI: https://doi.org/10.1016/j.jfca.2021.103920
Interpretive Summary: Commercially packaged foods are an integral part of the US diet; however, the scientific literature lacks information on the types of ingredients used in packaged foods. This paper reports on the development of a framework for parsing and systematic reporting of ingredients used in commercially packaged foods (IngID) in the US, including the complexity and challenges of current ingredient lists, using baked products to illustrate. The major steps in the development of IngID include: identifying top-selling foods; obtaining their ingredient lists; parsing individual ingredients after several preprocessing steps; building a thesaurus by assigning a preferred descriptor (PD) to equivalent terms such as synonyms and spelling errors; and assigning broader terms, such as flour or sweeteners, based on the research question. The current version includes 3 main files: an input Food details file, an output file of parsed text strings, and a thesaurus of 6,533 parsed ingredients. IngID can help improve our understanding of commercial ingredients, characterize foods in dimensions other than the traditional nutrient profiles, and support the development of automated systems and tools. It will be useful for nutritionists, food scientists, food ontologists and computer programmers.
Technical Abstract: The scientific literature lacks information on the types of ingredients used in packaged foods. USDA’s Global Branded Food Products Database makes publicly available, for the first time, a compiled dataset of ingredient lists for over a quarter million commercial food products. This paper reports on the development of a framework for parsing and systematic reporting of ingredients used in commercially packaged foods (IngID) in the US and delineates the complexity and challenges of current ingredient lists, using baked products to illustrate. The major steps in the development of the IngID prototype were 1) identifying top-selling baked products, 2) obtaining their ingredient lists, 3) parsing individual ingredients after several preprocessing steps, because ingredient lists were inconsistent and varied, 4) building a thesaurus by assigning a preferred descriptor to equivalent terms such as synonyms and spelling errors, and 5) assigning broader terms such as flour or sweeteners. The current version of IngID includes 3 main files: an input Food details file, an output file of parsed text strings, and a thesaurus of 6,533 parsed ingredients. These tools can potentially help improve our understanding of commercial ingredients, characterize foods in dimensions other than the traditional nutrient profiles, and support the development of food ontologies, computer programs and artificial intelligence tools.
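Steps 3 to 5 above (parsing, assigning a preferred descriptor, and assigning a broader term) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the IngID implementation: the function names, the thesaurus entries, the broader-term mapping, and the sample ingredient list are all hypothetical, and parenthetical sub-ingredient lists are simply dropped rather than parsed.

```python
import re

# Hypothetical thesaurus: parsed text string -> preferred descriptor (PD).
# These entries are illustrative only and are not taken from the IngID files.
THESAURUS = {
    "enriched wheat flour": "wheat flour, enriched",
    "enriched flour": "wheat flour, enriched",   # synonym
    "sugar": "sugar",
    "suger": "sugar",                            # spelling error
    "high fructose corn syrup": "corn syrup, high fructose",
}

# Hypothetical mapping: preferred descriptor -> broader term.
BROADER_TERMS = {
    "wheat flour, enriched": "flour",
    "sugar": "sweeteners",
    "corn syrup, high fructose": "sweeteners",
}


def preprocess(raw):
    """Lowercase, drop a leading 'ingredients:' label, collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"^\s*ingredients\s*:\s*", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip(" .")


def split_top_level(text):
    """Split on commas that are not nested inside parentheses or brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_ingredients(raw):
    """Return (parsed string, preferred descriptor, broader term) tuples.

    Strings missing from the thesaurus get None for both lookups.
    """
    results = []
    for part in split_top_level(preprocess(raw)):
        # Simplification: discard the parenthetical sub-ingredient list.
        name = re.sub(r"[\(\[].*[\)\]]", "", part).strip()
        pd = THESAURUS.get(name)
        results.append((name, pd, BROADER_TERMS.get(pd)))
    return results


if __name__ == "__main__":
    sample = ("INGREDIENTS: Enriched Wheat Flour (Wheat Flour, Niacin, Iron), "
              "Suger, High Fructose Corn Syrup, Salt.")
    for row in parse_ingredients(sample):
        print(row)
```

Keeping the thesaurus and the broader-term mapping as separate lookups mirrors the description above, where broader terms are assigned after the preferred descriptor and can vary with the research question.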