Publication : USDA ARS

Publication #224357 (Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, Beltsville, Maryland)

Title: Methods of Imputation used in the USDA National Nutrient Database for Standard Reference

Author

	Gebhardt, Susan
	Thomas, Robin

Submitted to: National Nutrient Databank Conference
Publication Type: Abstract Only
Publication Acceptance Date: 3/6/2008
Publication Date: 5/12/2008
Citation: Gebhardt, S.E., Thomas, R.G. 2008. Methods of imputation used in the USDA National Nutrient Database for Standard Reference. 32nd National Nutrient Data Bank Conference, May 12-14, 2008, Ottawa, Ontario, Canada.

Technical Abstract:

Objective: To present the predominant imputation methods used to estimate nutrient values for foods in the USDA National Nutrient Database for Standard Reference (SR20).

Materials and Methods: The USDA Nutrient Data Laboratory developed standard methods for imputing nutrient values for foods where analytical data were not available. Beginning with SR14, a field for derivation codes was included in the Nutrient Data File. There are 54 derivation codes. Derivation code A indicates analytical data, whereas most of the other codes identify imputation methods. This field is being populated as data for more foods are processed through the new Nutrient Data Bank System. Currently, about 60% of the nutrient values in SR20 have derivation codes. This field was queried to determine the most commonly used imputation methods for different types of foods and nutrients.

Results: About 200,000 nutrient values in SR20 have derivation codes indicating that the value is calculated (not analytical). About 20% of these have derivation code Z, meaning an assumed zero. Code Z is used for nutrients such as retinol and cholesterol that do not occur naturally in plant foods. About 17% have BF codes, meaning the value is based on analytical data for a similar food. These procedures are mainly used for commodity foods such as fruits, vegetables, and grains. About 16% have FL codes, indicating calculations based on a formulation. Formulations are used for multi-ingredient foods such as baked products. Code NC, which accounts for about 15% of the imputed values, indicates a nutrient that is always calculated rather than analyzed, such as carbohydrate by difference and calories.

Significance: Users of the database want to know the source of the nutrient values. This information is particularly useful to other database developers who may have to use imputation in their own database applications.
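The query described under Materials and Methods can be illustrated with a minimal Python sketch. It is not the laboratory's actual procedure: the record layout, the function name `imputed_code_shares`, and the sample rows are all assumptions made for illustration, and the plain strings "BF" and "FL" stand in for the BF and FL code families mentioned in the abstract. The sketch skips values with no derivation code, excludes analytical values (code A), and reports each remaining code as a share of the imputed values.

```python
from collections import Counter

# Hypothetical records: (food_id, nutrient, derivation_code).
# The code letters follow the abstract; the rows themselves are invented.
SAMPLE_VALUES = [
    ("food-1", "cholesterol", "Z"),    # assumed zero
    ("food-1", "carbohydrate", "NC"),  # always calculated
    ("food-1", "protein", "A"),        # analytical, not imputed
    ("food-2", "retinol", "Z"),
    ("food-2", "iron", "BF"),          # based on a similar food
    ("food-3", "protein", "FL"),       # calculated from a formulation
    ("food-3", "energy", "NC"),
    ("food-3", "calcium", ""),         # derivation code not yet populated
]


def imputed_code_shares(values, analytical_code="A"):
    """Return each imputation code's share of all imputed values.

    Values with an empty code or the analytical code are ignored.
    """
    codes = [code for _, _, code in values if code and code != analytical_code]
    if not codes:
        return {}
    counts = Counter(codes)
    return {code: n / len(codes) for code, n in counts.items()}


shares = imputed_code_shares(SAMPLE_VALUES)
for code, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{code}: {share:.0%}")
```

Applied to a real release, the same tally would be run over the derivation code field of the Nutrient Data File rather than over an in-memory list.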