Title: Structured variable selection with q-values

Author
GARCIA, TANYA - Texas A&M University
MULLER, SAMUEL - University of Sydney
CARROLL, RAYMOND - Texas A&M University
DUNN, TAMARA - University of California
THOMAS, ANTHONY - University of California
Adams, Sean
PILLAI, SURESH - Texas A&M University
WALZEM, ROSEMARY - Texas A&M University

Submitted to: Biostatistics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/4/2013
Publication Date: 4/10/2013
Citation: Garcia, T.P., Muller, S., Carroll, R.J., Dunn, T.N., Thomas, A.P., Adams, S.H., Pillai, S.D., Walzem, R.L. 2013. Structured variable selection with q-values. Biostatistics. 23580317.

Interpretive Summary: In high-content datasets with many variables, some variables can both influence and be influenced by other variables used in the statistical analysis. This makes the already challenging problem of selecting variables when the number of covariates exceeds the sample size more difficult. To address this problem, statistical approaches are needed that take these relationships into account while also correcting for potential error due to multiple comparisons. An illustrative example is a metabolic study in mice with diet groups and gut microbial percentages that may affect changes in multiple phenotypes related to body weight regulation. The dataset has more variables than observations, and diet is known to act directly on the phenotypes as well as on some or potentially all of the microbial percentages. Interest lies in determining which gut microflora influence the phenotypes while accounting for the direct relationship between diet and the other variables. A new methodology for variable selection in this context is presented that links the concept of q-values from multiple hypothesis testing to the recently developed weighted Lasso.

Technical Abstract: When some of the regressors can act on both the response and other explanatory variables, the already challenging problem of selecting variables when the number of covariates exceeds the sample size becomes more difficult. A motivating example is a metabolic study in mice with diet groups and gut microbial percentages that may affect changes in multiple phenotypes related to body weight regulation. The data set has more variables than observations, and diet is known to act directly on the phenotypes as well as on some or potentially all of the microbial percentages. Interest lies in determining which gut microflora influence the phenotypes while accounting for the direct relationship between diet and the other variables. A new methodology for variable selection in this context is presented that links the concept of q-values from multiple hypothesis testing to the recently developed weighted Lasso.
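
To make the two ingredients named in the abstract concrete, the following Python sketch shows one way q-values could feed a weighted Lasso. It is an illustration under stated assumptions, not the authors' procedure: the q-values are approximated by Benjamini-Hochberg adjusted p-values, the penalty weight of each microbe is simply set to its q-value, diet is given weight 0 so that it is never dropped, and the function names, the weighting rule, and the toy numbers are all hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, used here as a conservative
    stand-in for q-values (assumption: the paper may estimate them differently)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, lam, n_sweeps=200):
    """Coordinate descent for
    (1 / 2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    X is a list of rows; a weight of 0 leaves that coefficient unpenalized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


# Toy, made-up inputs: column 0 plays the role of diet, columns 1-2 of microbes.
X = [[1, 0, 0], [0, 1, 0], [-1, 0, 1], [0, -1, 0], [0, 0, -1]]
y = [2.0, 1.0, -2.0, -1.0, 0.1]
microbe_pvalues = [0.01, 0.60]  # hypothetical per-microbe test results
weights = [0.0] + bh_qvalues(microbe_pvalues)  # diet unpenalized (assumption)
beta = weighted_lasso(X, y, weights, lam=0.5)
```

In this sketch a microbe with a small q-value receives a light penalty and is more likely to stay in the model, while one with a large q-value is shrunk harder. Any other monotone mapping from q-values to weights could be substituted.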