IngID Thesaurus
USDA IngID Thesaurus: an application dataset for systematic reporting of ingredients used in commercially packaged foods

The scientific literature offers little information on the types of ingredients used in commercially packaged foods. USDA's Global Branded Food Products Database (GBFPD), part of FoodData Central, makes publicly available the ingredient lists of more than 0.25 million commercially packaged food products. A review of these ingredient terms revealed the need for a thesaurus of the terms used on commercially packaged food labels. We obtained ingredient lists (blocks of free text) from GBFPD for top-selling food categories, selected for the variety and diversity of the ingredients used in their products. Ingredient terms that were equivalent, similar, spelling or usage variants, spelling errors, or synonyms were assigned a common Preferred Descriptor (PD) for systematic reporting of ingredients. The first publicly available version, IngID Thesaurus Version 1 (2023), contains ~26,000 parsed ingredient terms that have been assigned ~3,000 PDs, categorized in a taxonomic hierarchy of 16 broad groups. IngID Thesaurus makes publicly available, for the first time, a tool that can potentially reduce pre-processing and data clean-up time in studies of ingredients as listed on commercially packaged food labels. It will enable characterization of what is in the food we eat using a standardized vocabulary and can potentially improve our understanding of commercial ingredients.
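
The mapping from parsed ingredient terms to Preferred Descriptors can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the thesaurus entries, the descriptor labels, the function names `parse_ingredients` and `to_preferred`, and the simple delimiter-based parsing rule are hypothetical and are not taken from the IngID Thesaurus or from the method used to build it.

```python
import re

# Hypothetical term -> Preferred Descriptor entries, for illustration only.
# These are NOT actual IngID Thesaurus records.
THESAURUS = {
    "sugar": "SUGAR",
    "cane sugar": "SUGAR",    # usage variant
    "suger": "SUGAR",         # spelling error
    "salt": "SALT",
    "sea salt": "SALT",
    "wheat flour": "WHEAT FLOUR",
}


def parse_ingredients(text):
    """Split a free-text ingredient list into normalized terms.

    Assumed rule: terms are separated by commas, semicolons,
    parentheses, or square brackets.
    """
    parts = re.split(r"[,;()\[\]]", text)
    terms = []
    for part in parts:
        # Collapse whitespace, trim spaces and periods, lowercase.
        term = re.sub(r"\s+", " ", part).strip(" .").lower()
        if term:
            terms.append(term)
    return terms


def to_preferred(terms, thesaurus):
    """Pair each term with its Preferred Descriptor, or None if unmapped."""
    return [(term, thesaurus.get(term)) for term in terms]


label = "WHEAT FLOUR, Cane Sugar, suger, Sea Salt, xanthan gum."
mapped = to_preferred(parse_ingredients(label), THESAURUS)
for term, descriptor in mapped:
    print(f"{term!r} -> {descriptor}")
```

Terms that have no entry are returned with `None` rather than guessed, so unmapped ingredients stay visible for manual review.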