Submitted to: Soil Science Society of America Journal
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: November 6, 2005
Publication Date: March 1, 2006
Citation: Rawls, W.J., Nemes, A., Pachepsky, Y.A. 2006. Use of the non-parametric nearest neighbor approach to estimating soil hydraulic properties. Soil Science Society of America Journal. 70:327-336.
Interpretive Summary: A technique was developed that identifies and retrieves, from a stored data set, the objects most similar to a target object. In estimating water contents at –33 and –1500 kPa matric potentials, the approach compared very well with other techniques. The technique showed little sensitivity to the choice of input data set, to reductions in the size of the data set used for estimation, or to certain design-parameter settings. It provides an efficient tool for estimating missing soil water retention data for applications in different fields.
Technical Abstract: Non-parametric approaches are used in various fields to address classification problems as well as to estimate continuous variables. One type of non-parametric lazy learning algorithm, the k-Nearest Neighbor (k-NN) algorithm, was applied to estimate water retention at –33 and –1500 kPa matric potentials. Performance of the algorithm was then tested against estimations made by a neural network (NNet) model developed using the same data and input soil attributes. We used a hierarchical set of inputs based on soil texture, bulk density and organic matter content to avoid possible bias towards one set of inputs, and varied the size of the data set used to develop the NNet models and to run the k-NN estimation algorithms. Different ‘design-parameter’ settings, analogous to model parameters, were optimized. The k-NN technique showed little sensitivity to potentially sub-optimal settings for the number of nearest soils selected and for how those soils were weighted in formulating the output of the algorithm, as long as extremes were avoided. The optimal settings were, however, dependent on the size of the development/reference data set. The non-parametric k-NN technique performed about as well as the NNet models in terms of root-mean-squared residuals and mean residuals. Gradual reduction of the data set size from 1600 to 100 resulted in only a slight loss of accuracy for both the k-NN and NNet approaches. The k-NN technique is a competitive alternative to other techniques for developing pedotransfer functions (PTFs), especially since no re-development of PTFs is needed as new data become available.
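
The general idea of a k-NN estimate can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the distance measure or the weighting scheme, so the range-normalized Euclidean distance, the inverse-distance weights, the function name `knn_estimate`, the parameters `k` and `p`, and the tiny reference data set below are all assumptions made for illustration only.

```python
import math


def knn_estimate(target, reference_inputs, reference_outputs, k=3, p=1.0):
    """Estimate an output for `target` as a weighted mean of its k nearest
    reference soils. Distance and weighting choices here are assumptions."""
    n_attr = len(target)

    # Scale each attribute by its range in the reference set so that no
    # single attribute (e.g. sand %) dominates the distance.
    lows = [min(row[j] for row in reference_inputs) for j in range(n_attr)]
    highs = [max(row[j] for row in reference_inputs) for j in range(n_attr)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    def distance(row):
        return math.sqrt(
            sum(((row[j] - target[j]) / spans[j]) ** 2 for j in range(n_attr))
        )

    # Keep the k reference soils closest to the target.
    ranked = sorted(
        (distance(row), out)
        for row, out in zip(reference_inputs, reference_outputs)
    )[:k]

    # Inverse-distance weights; `p` controls how strongly closer soils count.
    # The small constant avoids division by zero for an exact match.
    weights = [1.0 / (d + 1e-12) ** p for d, _ in ranked]
    return sum(w * out for w, (_, out) in zip(weights, ranked)) / sum(weights)


# Hypothetical reference soils: (sand %, clay %, bulk density in g/cm3),
# with made-up water contents at -33 kPa (cm3/cm3). Not data from the study.
reference_inputs = [
    (65.0, 10.0, 1.55),
    (40.0, 20.0, 1.45),
    (20.0, 35.0, 1.35),
    (10.0, 50.0, 1.25),
]
reference_outputs = [0.18, 0.27, 0.34, 0.41]

estimate = knn_estimate((30.0, 28.0, 1.40), reference_inputs, reference_outputs, k=3)
print(f"Estimated water content: {estimate:.3f}")
```

Because the estimate is computed directly from the stored reference set at query time, appending new soils to `reference_inputs` and `reference_outputs` changes later estimates without any refitting step, which is the property the abstract highlights. In this sketch `k` and `p` play the role of the 'design-parameter' settings.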