Publication #153085

Title: LOGISTIC REGRESSION MODELING - JMP START(TM) YOUR ANALYSIS WITH A TREE

Author
SIMPSON, PIPPA - ACHRI - DAC
GOSSETT, JEFF - ACHRI - DAC
PARKER, JAMES - ACHRI - DAC
HALL, RENEE - ACHRI - DAC

Submitted to: Proceedings of SAS Users Group International Meeting
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/1/2003
Publication Date: 3/31/2003
Citation: SIMPSON, P., GOSSETT, J.M., PARKER, J.G., HALL, R.A. Logistic Regression Modeling - JMP Start(TM) Your Analysis with a Tree. PROCEEDINGS OF SAS USERS GROUP INTERNATIONAL MEETING. 2003. Paper No. 257-28.

Interpretive Summary: This exploratory study shows how regression analysis alone can lead to misleading conclusions, and how combining it with tree analysis can yield a more appropriate model. The study uses NHANES 1999-2000 data to demonstrate the benefits of classification and regression trees in model selection. The dataset was used to develop a logistic model with obesity as the outcome. Variables included diet, demographics, income status, federal aid, and obesity risk factors, as well as their interactions. Statistical software (the Partition platform in JMP®, Version 5) was used to create a regression tree that examined the relationships among these variables. The tree analysis identified a cutpoint for age, which made the results easy to interpret. The analysis also helped identify the subpopulations best targeted for obesity-related interventions. The tree approach described is an exploratory technique and requires some form of confirmatory analysis or validation procedure.

Technical Abstract: In regression, deciding which variables to include in the model, and in what form, can be very difficult. Screening variables can be tedious; perhaps that is why many published models, for example in nutrition, include only main effects. Because a tree is useful for screening, it can JMPstart a regression analysis. All types of variables can be included in a tree, including variables with missing values and variables that are highly interrelated. This makes it possible to consider the form in which each variable should enter the model. The tree methodology reports the cutpoints that optimize a splitting function, so new variables can be generated from the original ones. Trees are also useful for exploring interactions between variables. For example, if a variable appears on one side of a tree and not on the other, this suggests an interaction effect. Using a large nutrition dataset, we show how regression alone can lead to misleading conclusions, whereas tree analysis used in conjunction with logistic regression can help build an appropriate model.
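The cutpoint idea in the abstract can be illustrated with a minimal sketch. It assumes a binary outcome and a single continuous predictor, and it scores candidate splits by weighted Gini impurity. The abstract does not say which splitting function was optimized, so Gini is a stand-in here, not the criterion used in the paper. The data values and the names `gini`, `best_cutpoint`, `ages`, and `obese` are hypothetical and for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_cutpoint(values, labels):
    """Find the split 'value < cut' vs 'value >= cut' with the lowest
    weighted Gini impurity. Returns (cut, impurity); cut is None if no
    split improves on the unsplit data."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate tied values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            # Place the cut midway between the two neighboring values.
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut, best_score


# Hypothetical toy data, not taken from NHANES.
ages = [22, 25, 31, 38, 44, 47, 52, 58, 63, 70]
obese = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]

cut, impurity = best_cutpoint(ages, obese)

# A new binary variable derived from the cutpoint, which could then be
# entered into a logistic regression in place of raw age.
age_over_cut = [int(a >= cut) for a in ages]
```

The derived indicator `age_over_cut` is an example of a new variable generated from an old one. As the summary notes, a cutpoint found this way is exploratory and would still need confirmation on independent data.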