Submitted to: Development of Pedotransfer Functions in Soil Hydrology
Publication Type: Book / Chapter
Publication Acceptance Date: December 22, 2003
Publication Date: November 30, 2004
Citation: Pachepsky, Y.A. 2004. Developing pedotransfer functions with database exploration methods. Development of Pedotransfer Functions in Soil Hydrology. pp. 21-32.
Interpretive Summary: Soil hydraulic properties, i.e., the soil's ability to retain and transmit water, nutrients, and contaminants, are characterized by measurable parameters. The measurements are extremely time- and labor-consuming. In many cases, using estimates of soil hydraulic properties instead of measurements becomes the only feasible alternative. Because the quality of the estimates affects engineering and management decisions, the methodology of estimation has recently attracted special attention. Traditional statistical methods have proved limited because they require an equation to be known a priori and cannot suggest how to find the best equation. New methods are needed to uncover the relationships in soil data and to suggest the type of equations and predictors to be used. This chapter presents two such methods that have been successfully used to quantify the relationships among soil water retention, soil hydraulic conductivity, and basic soil properties, such as texture, structure, and organic matter content. The group method of data handling generates flexible regression equations and sieves out the basic soil properties that have little effect on soil hydraulic properties. The regression tree method ranks the basic soil properties by their effect on soil hydrology, and splits the database into parts in which soil hydraulic properties appear to be influenced by different basic soil properties. Both methods are heuristic, and therefore require judgment to decide on the desirable complexity of the predictive models they can build. The chapter presents cross-validation techniques for that purpose.
Technical Abstract: Pedotransfer functions (PTFs) relate soil hydraulic properties to more easily measured data on basic soil properties, e.g., data readily available from soil surveys. After the database is assembled and the basic properties to serve as predictors are selected, the subsequent PTF development may (a) retain all assumed predictors in the PTF, (b) eliminate some of them based on statistical tests, or (c) eliminate some predictors and define the relative importance of the remaining ones. This chapter presents two new PTF development methods that do not assume a type of dependence and can automatically eliminate the less influential and the least influential predictors. Both methods have been successfully used in PTF development. The group method of data handling (GMDH) constructs a flexible equation of the neural-network type to relate the inputs to the outputs, and at the same time has a built-in algorithm to retain only the essential input variables. Regression tree modeling uncovers structure in data by partitioning the data first into two groups, then into four groups, and so on, providing groups that are as homogeneous as possible at each level of partitioning. Each partitioning can be viewed as a branching, and the final fit of the model to the data looks like a tree with two branches originating at each node. Both regression trees and the GMDH iteratively build models of progressively increasing complexity. The processes have to be stopped to prevent over-fitting; otherwise the predictive capability of the resulting models with respect to new data will be poor. Cross-validation techniques are presented to address this issue.
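The regression tree partitioning and the cross-validation stopping rule described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names (`best_split`, `grow`, `cv_error`), the use of the within-group sum of squares as the homogeneity measure, the fold assignment, and the synthetic data (a hypothetical clay fraction and organic matter fraction as predictors, with an invented step response standing in for water retention) are all assumptions made for illustration. The GMDH is not sketched here.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Return (predictor index, threshold) giving the two most homogeneous
    groups, or None if no split reduces the summed squared deviations."""
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (j, t), cost
    return best

def grow(X, y, depth):
    """Grow a binary tree with at most `depth` levels of partitioning.
    A node is (index, threshold, left, right); a leaf is the group mean."""
    split = best_split(X, y) if depth > 0 and len(y) >= 2 else None
    if split is None:
        return mean(y)
    j, t = split
    left = [(r, v) for r, v in zip(X, y) if r[j] <= t]
    right = [(r, v) for r, v in zip(X, y) if r[j] > t]
    return (j, t,
            grow([r for r, _ in left], [v for _, v in left], depth - 1),
            grow([r for r, _ in right], [v for _, v in right], depth - 1))

def predict(tree, row):
    """Follow the branches until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def cv_error(X, y, depth, k=5):
    """Mean squared prediction error on held-out folds (k-fold cross-validation)."""
    n = len(y)
    errors = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        tree = grow([X[i] for i in train], [y[i] for i in train], depth)
        errors += [(predict(tree, X[i]) - y[i]) ** 2 for i in test]
    return mean(errors)

# Synthetic, illustrative data only: the response depends on the first
# predictor (hypothetical clay fraction) and not on the second one.
rng = random.Random(0)
X = [[rng.uniform(0.0, 0.6), rng.uniform(0.0, 0.05)] for _ in range(100)]
y = [(0.40 if clay > 0.30 else 0.20) + rng.gauss(0.0, 0.02) for clay, _ in X]

# Compare tree complexities by their error on data not used for fitting.
errors_by_depth = {d: cv_error(X, y, d) for d in range(5)}
best_depth = min(errors_by_depth, key=errors_by_depth.get)
print(errors_by_depth, best_depth)
```

In this sketch the tree complexity is chosen as the depth with the smallest held-out error, which is one simple way to stop the partitioning before the model starts fitting noise. The predictor chosen at the first split also gives a rough indication of which basic property is most influential in the synthetic data.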