Publication : USDA ARS

Publication #193576

Title: SENSITIVITY ANALYSIS OF THE NON-PARAMETRIC NEAREST NEIGHBOR APPROACH TO ESTIMATE WATER RETENTION

Author

	NEMES, ATTILA - UNIV. OF CA, RIVERSIDE
	Rawls, Walter
	Pachepsky, Yakov
	Van Genuchten, Martinus

Submitted to: Vadose Zone Hydrol
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/1/2006
Publication Date: 11/1/2006
Citation: Nemes, A., Rawls, W.J., Pachepsky, Ya. A., van Genuchten, M. Th. 2006. Sensitivity analysis of the nonparametric nearest neighbor technique to estimate soil water retention. Vadose Zone Journal. 5:1222-1235.

Interpretive Summary: A k-nearest neighbor (k-NN) technique was developed that identifies and retrieves, from a stored data set, the stored objects most similar to a target object. The approach compared very well with other techniques for estimating water contents at –33 and –1500 kPa matric potentials. The k-NN technique showed little sensitivity to the choice of sample weighting technique or to potentially sub-optimal settings for input attribute weighting. Differences in data density across parts of the reference data set do not seem to substantially affect the magnitude of the estimation error. Estimations for locally specific data improved substantially when some local samples were included in the reference data set, while estimations for other samples remained almost unaffected. The technique is an effective alternative to other techniques and shows a large degree of stability and insensitivity to non-optimal settings and to the choice among different options.

Technical Abstract: A non-parametric lazy learning algorithm, the k-Nearest Neighbor (k-NN) algorithm, was applied to estimate water retention at –33 and –1500 kPa matric potentials. We used a hierarchical set of input attributes (soil texture, bulk density, and organic matter content) and varied the size of the data set used to run the estimation algorithm. To obtain a measure of the uncertainty of the estimations, we developed an ensemble of predictions by performing multiple randomized subset selections from the main data set and running the subsequent calculations on each data subset. We examined the sensitivity of this approach to: (1) estimations made for soils with differing distributions of properties; (2) the use of different sample weighting techniques; (3) the number of ensemble members used in formulating the final estimate; (4) data density in the reference data set; (5) the presence of outliers in the reference data set; (6) the unequal weighting of input attributes; and (7) the addition of new, locally specific data to the reference data set. The k-NN technique performed nearly as well as neural network models developed on the same data when making estimations for data sets of different origin. With approximately 50 ensemble members, the estimation results were no longer significantly affected by adding further ensemble members. The k-NN technique showed little sensitivity to the choice of sample weighting technique or to potentially sub-optimal settings for input attribute weighting. Differences in data density across parts of the reference data set do not seem to substantially affect the magnitude of the estimation error. Estimations for locally specific data improved substantially when some local samples were included in the reference data set, while estimations for other samples remained almost unaffected. The k-NN technique is an effective alternative to other techniques for developing pedotransfer functions (PTFs). The technique shows a large degree of stability and insensitivity to non-optimal settings and to the choice among different options.
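
The ensemble k-NN idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The weighted Euclidean distance, the inverse-distance sample weights, the 80% subset fraction, the value k = 3, and all function names and data values are assumptions made only for illustration; only the use of randomized subsets and of roughly 50 ensemble members comes from the abstract.

```python
import math
import random
import statistics


def knn_estimate(reference, target, k=3, attr_weights=None):
    """Estimate a response for `target` from its k nearest reference samples.

    `reference` is a list of (attributes, response) pairs. Attributes are
    assumed to be already scaled to comparable ranges (an assumption here).
    """
    if attr_weights is None:
        attr_weights = [1.0] * len(target)
    # Weighted Euclidean distance from the target to every reference sample.
    scored = []
    for attrs, response in reference:
        dist = math.sqrt(sum(w * (a - t) ** 2
                             for w, a, t in zip(attr_weights, attrs, target)))
        scored.append((dist, response))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # Inverse-distance sample weights (assumed scheme); the small constant
    # avoids division by zero when the target matches a reference sample.
    weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
    return sum(w * r for w, (_, r) in zip(weights, nearest)) / sum(weights)


def ensemble_estimate(reference, target, n_members=50,
                      subset_fraction=0.8, k=3, seed=0):
    """Return the mean and standard deviation of k-NN estimates made on
    randomly selected subsets of the reference data."""
    rng = random.Random(seed)
    size = min(len(reference), max(k, int(subset_fraction * len(reference))))
    estimates = []
    for _ in range(n_members):
        subset = rng.sample(reference, size)  # sampling without replacement
        estimates.append(knn_estimate(subset, target, k=k))
    return statistics.mean(estimates), statistics.stdev(estimates)


# Hypothetical, made-up reference data: (scaled sand, clay, bulk density)
# paired with a water content value. These are not measurements.
toy_reference = [
    ((0.90, 0.05, 0.60), 0.10),
    ((0.80, 0.10, 0.55), 0.14),
    ((0.60, 0.20, 0.50), 0.22),
    ((0.40, 0.30, 0.45), 0.28),
    ((0.30, 0.40, 0.40), 0.33),
    ((0.20, 0.55, 0.35), 0.39),
    ((0.10, 0.65, 0.30), 0.43),
    ((0.50, 0.25, 0.50), 0.25),
]
target = (0.45, 0.28, 0.47)
mean_est, spread = ensemble_estimate(toy_reference, target)
print(f"ensemble mean = {mean_est:.3f}, spread = {spread:.3f}")
```

In this sketch the spread across ensemble members stands in for the uncertainty measure mentioned in the abstract, and `attr_weights` is the place where unequal input attribute weighting would enter.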