Research Project: Enhancing Breeding of Small Grains through Improved Bioinformatics

Location: Plant, Soil and Nutrition Research

Title: Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

Author
Akdemir, Deniz - Cornell University - New York
Jannink, Jean-Luc

Submitted to: Intelligent Data Analysis (An International Journal)
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/14/2013
Publication Date: 9/1/2014
Publication URL: http://DOI: 10.3233/ISA-140672
Citation: Akdemir, D., Jannink, J. 2014. Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised. Intelligent Data Analysis (An International Journal). 18(5):857-872.

Interpretive Summary: One approach to prediction uses predictor variables to develop a large number of rules, called an ensemble, that split observations into different, divergent groups. These rules are then combined in a processing step to obtain a prediction for each individual. In this article, we propose several new approaches for processing a large ensemble of rules to generate accurate predictions and to estimate relationships among individuals. We show with various examples that, for regression problems with many predictors, models constructed by processing the rules with a statistical method called partial least squares regression have significantly better prediction performance than models obtained by processing the rules with other existing methods. When rule ensembles are used to estimate relationships among individuals and to cluster them, measures of cluster validity indicate high-quality groupings.

Technical Abstract: In this article, we propose several new approaches for post-processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that, for high-dimensional regression problems, the models constructed by post-processing the rules with partial least squares regression have significantly better prediction performance than those produced by the random forest or RuleFit algorithms, which use equal weights or weights estimated by lasso regression, respectively. When rule ensembles are used for semi-supervised and unsupervised learning, the internal and external measures of cluster validity point to high-quality groupings.
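
The post-processing idea described above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration rather than the authors' method: random two-condition rules stand in for conjunctive rules extracted from trees, the partial least squares fit is a plain single-response NIPALS routine, and the function names (`make_rules`, `apply_rules`, `pls1_fit`, `pls1_predict`, `demo`) and the synthetic data are hypothetical.

```python
import math
import random


def make_rules(X, n_rules, rng, depth=2):
    """Draw random conjunctive rules; a condition is (feature, threshold, is_greater)."""
    rules = []
    for _ in range(n_rules):
        rule = []
        for _ in range(depth):
            j = rng.randrange(len(X[0]))
            threshold = rng.choice(X)[j]  # threshold taken from an observed value
            rule.append((j, threshold, rng.random() < 0.5))
        rules.append(rule)
    return rules


def apply_rules(rules, X):
    """Return the 0/1 matrix whose entry (i, k) says whether row i satisfies rule k."""
    return [[1.0 if all((row[j] > thr) == gt for j, thr, gt in rule) else 0.0
             for rule in rules] for row in X]


def pls1_fit(X, y, n_components):
    """Single-response PLS (NIPALS): extract components by repeated deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered predictors
    f = [v - y_mean for v in y]                              # centered response
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in E]  # component scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(a * b for a, b in zip(f, t)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w), P.append(load), Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, X):
    """Predict by applying the same sequence of projections and deflations."""
    x_mean, y_mean, W, P, Q = model
    preds = []
    for row in X:
        e = [v - m for v, m in zip(row, x_mean)]
        y_hat = y_mean
        for w, load, q in zip(W, P, Q):
            t = sum(a * b for a, b in zip(e, w))
            y_hat += q * t
            e = [a - t * b for a, b in zip(e, load)]
        preds.append(y_hat)
    return preds


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def demo(seed=0):
    """Fit PLS weights on a rule matrix built from synthetic data (illustrative only)."""
    rng = random.Random(seed)

    def sample(n):
        X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(n)]
        y = [(2.0 if r[0] > 0 and r[1] > 0 else 0.0) + r[2] + rng.gauss(0, 0.1)
             for r in X]
        return X, y

    X_train, y_train = sample(150)
    X_test, y_test = sample(50)
    rules = make_rules(X_train, 60, rng)
    R_train, R_test = apply_rules(rules, X_train), apply_rules(rules, X_test)
    model = pls1_fit(R_train, y_train, n_components=3)
    return {
        "baseline_train_mse": mse(y_train, [model[1]] * len(y_train)),
        "pls_train_mse": mse(y_train, pls1_predict(model, R_train)),
        "pls_test_mse": mse(y_test, pls1_predict(model, R_test)),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

Fitting equal rule weights or a lasso model on the same rule matrix and comparing held-out error would mirror the comparison the abstract describes; that comparison is not reproduced here.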