Publication : USDA ARS

Publication #182750

Title: SENSITIVITY ANALYSIS OF THE NON-PARAMETRIC NEAREST NEIGHBOR TECHNIQUE TO ESTIMATE SOIL HYDRAULIC PROPERTIES

Author

	Rawls, Walter
	Pachepsky, Yakov
	van Genuchten, R. - USDA-ARS, Riverside, CA

Submitted to: Agronomy Abstracts
Publication Type: Abstract Only
Publication Acceptance Date: 9/9/2005
Publication Date: 11/6/2005
Citation: Rawls, W., Pachepsky, Y.A., van Genuchten, R. 2005. Sensitivity analysis of the non-parametric nearest neighbor technique to estimate soil hydraulic properties [abstract]. Soil Science Society of America Annual Meeting. 2005 CDROM.

Interpretive Summary:

Technical Abstract: A k-nearest neighbor (k-NN) algorithm, one type of non-parametric lazy learning algorithm, has been applied to estimate water retention at –33 and –1500 kPa matric potentials. To obtain a measure of the uncertainty of the estimates, we developed an ensemble of predictions by repeatedly selecting randomized subsets from the main data set and performing the subsequent calculations on each subset. Running the k-NN algorithm means searching a ‘reference’ data set (analogous to the development or training data sets used for classic pedotransfer functions, PTFs) for the soils that are most similar to the target soil, based on the selected input attributes, and using the weighted average of their output attribute as the estimate. The performance of such a technique depends on how well the most similar (nearest) soils are selected, on how they are weighted when the output is formulated, and on a number of other algorithm settings and properties of the data sets used to apply the technique. We characterized the sensitivity of this technique to (1) unequal weighting of input attributes; (2) the choice between different weighting methods to set the influence of each selected soil on the output; (3) the number of ensembles/replicates run on randomly selected data subsets; (4) the presence of outliers in the reference data set; (5) the influence of local data density; (6) the addition of new, locally specific data to the reference data set; and (7) estimations made for soils with differing distributions of properties.
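
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the distance metric, the weighting scheme, the subset size, or the value of k, so the weighted Euclidean distance, the inverse-distance weights, the 80% subset fraction, and all function and parameter names (knn_estimate, ensemble_estimate, attr_weights, power) are assumptions made for illustration. The sample soils are made-up toy values, not measurements.

```python
import math
import random
import statistics


def knn_estimate(reference, target, k=3, attr_weights=None, power=1.0):
    """Estimate the output attribute of `target` from its k nearest reference soils.

    reference: list of (inputs, output) pairs, inputs being a sequence of floats.
    attr_weights: per-attribute weights in the distance (assumed scheme).
    power: exponent of the inverse-distance weights; 0 gives a plain average.
    """
    if attr_weights is None:
        attr_weights = [1.0] * len(target)
    scored = []
    for inputs, output in reference:
        # Weighted Euclidean distance between a reference soil and the target.
        dist = math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(attr_weights, inputs, target)))
        scored.append((dist, output))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    eps = 1e-12  # avoids division by zero when a reference soil matches exactly
    weights = [1.0 / (dist + eps) ** power for dist, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def ensemble_estimate(reference, target, n_replicates=50,
                      subset_fraction=0.8, seed=0, **knn_kwargs):
    """Run k-NN on random subsets of the reference set.

    Returns (mean, population standard deviation) of the replicate estimates;
    the spread serves as a simple uncertainty measure.
    """
    rng = random.Random(seed)
    size = max(1, round(subset_fraction * len(reference)))
    predictions = []
    for _ in range(n_replicates):
        subset = rng.sample(reference, size)  # sampling without replacement
        predictions.append(knn_estimate(subset, target, **knn_kwargs))
    return statistics.mean(predictions), statistics.pstdev(predictions)


if __name__ == "__main__":
    # Toy reference set: (sand fraction, clay fraction) -> water content.
    # These numbers are invented for illustration only.
    toy_reference = [
        ((0.80, 0.05), 0.10), ((0.60, 0.15), 0.20), ((0.40, 0.25), 0.28),
        ((0.30, 0.35), 0.33), ((0.20, 0.45), 0.38), ((0.10, 0.55), 0.42),
    ]
    mean, spread = ensemble_estimate(toy_reference, (0.35, 0.30), k=3)
    print(f"estimate = {mean:.3f}, spread = {spread:.3f}")
```

In this sketch, sensitivity item (1) corresponds to varying attr_weights, item (2) to varying power, and item (3) to varying n_replicates. In practice the input attributes would normally be brought to a common scale before distances are computed, a step the sketch leaves out.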