Publication #143906

Title: SAS MACRO LANGUAGE PROGRAM FOR PARTIAL LEAST SQUARES REGRESSION OF SPECTRAL DATA

Author
Delwiche, Stephen (Steve)
Reeves III, James

Submitted to: Near Infrared Spectroscopy Journal
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/15/2003
Publication Date: 5/1/2004
Citation: Delwiche, S.R., Reeves III, J.B. 2004. SAS macro language program for partial least squares regression of spectral data [computer program]. Available: Journal of Near Infrared Spectroscopy software archive at http://www.nirpublications.com/software/index.html.

Interpretive Summary: A computer program was written in the SAS language to examine the effect of spectral pretreatments on partial least squares (PLS) regression of near-infrared (or similarly structured) data. The program runs in an unattended batch mode in which the user may specify a number of commonly used spectral pretreatments, alone or in combination. These include the two common corrections for particle size variation, namely multiplicative scatter (or signal) correction and the standard normal variate transformation, as well as a running mean smooth, a Savitzky-Golay smooth or derivative, and a wavelength region truncation option. The size of the convolution window and the order of the polynomial used in the Savitzky-Golay transformation are selectable. The program relies on the SAS macro programming language, specifically nested loops and global variables that span the common constructs of the SAS language (data steps and procedures). The user has great flexibility in selecting the transformations to examine, so that hundreds of pretreatment combinations may be evaluated in one run of the program. For each pretreatment regime, or trial, a full leave-one-out cross-validation PLS regression is performed. Program output is written both to the SAS output window and to two text files. In the text files, the output consists of one line of model performance statistics (e.g., standard error of cross-validation, optimal number of PLS factors, coefficient of determination) for each trial. These files are designed for import into spreadsheet programs so that the user may compare the relative merits of the trials using the spreadsheet's sorting and graphing features. The beneficiaries of this computer program are scientists developing quantitative models from spectral data.
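The way a handful of options grows into many trials can be sketched in a few lines. The example below is written in Python rather than SAS macro language, and it is only an illustration of the nested-loop idea: the option names, window sizes, polynomial orders, and derivative orders are hypothetical and are not taken from the published program.

```python
from itertools import product

# Hypothetical option lists (assumptions for illustration only).
SCATTER = ["none", "msc", "snv"]              # particle size corrections
SMOOTH = ["none", "running_mean", "savgol"]   # smoothing / derivative choice
WINDOWS = [5, 11, 21]                         # convolution window sizes, in points
POLY_ORDERS = [2, 3]                          # Savitzky-Golay polynomial orders
DERIVS = [0, 1, 2]                            # 0 means smooth only


def enumerate_trials():
    """Return one dict per pretreatment trial.

    Window, polynomial, and derivative settings are varied only for the
    pretreatments that actually use them.
    """
    trials = []
    for scatter, smooth in product(SCATTER, SMOOTH):
        if smooth == "none":
            trials.append({"scatter": scatter, "smooth": smooth,
                           "window": None, "poly": None, "deriv": None})
        elif smooth == "running_mean":
            for window in WINDOWS:
                trials.append({"scatter": scatter, "smooth": smooth,
                               "window": window, "poly": None, "deriv": None})
        else:  # Savitzky-Golay smooth or derivative
            for window, poly, deriv in product(WINDOWS, POLY_ORDERS, DERIVS):
                trials.append({"scatter": scatter, "smooth": smooth,
                               "window": window, "poly": poly, "deriv": deriv})
    return trials


def trial_line(number, trial):
    """Format the settings of one trial as a comma-separated line."""
    fields = [number, trial["scatter"], trial["smooth"],
              trial["window"], trial["poly"], trial["deriv"]]
    return ",".join("" if f is None else str(f) for f in fields)


if __name__ == "__main__":
    all_trials = enumerate_trials()
    print("trial,scatter,smooth,window,poly,deriv")
    for i, t in enumerate(all_trials, start=1):
        print(trial_line(i, t))
```

In a full run, each line would also carry the cross-validation statistics for its trial, which is what makes the resulting file easy to sort and graph in a spreadsheet.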

Technical Abstract: A computer program was written in the SAS language to examine the effect of spectral pretreatments on partial least squares (PLS) regression of near-infrared (or similarly structured) data. The program runs in an unattended batch mode in which the user may specify a number of commonly used spectral pretreatments, alone or in combination. These include the two common corrections for particle size variation, namely multiplicative scatter (or signal) correction and the standard normal variate transformation, as well as a running mean smooth, a Savitzky-Golay smooth or derivative, and a wavelength region truncation option. The size of the convolution window and the order of the polynomial used in the Savitzky-Golay transformation are selectable. The program relies on the SAS macro programming language, specifically nested loops and global variables that span the common constructs of the SAS language (data steps and procedures). The user has great flexibility in selecting the transformations to examine, so that hundreds of pretreatment combinations may be evaluated in one run of the program. For each pretreatment regime, or trial, a full leave-one-out cross-validation PLS regression is performed. Program output is written both to the SAS output window and to two text files. In the text files, the output consists of one line of model performance statistics (e.g., standard error of cross-validation, optimal number of PLS factors, coefficient of determination) for each trial. These files are designed for import into spreadsheet programs so that the user may compare the relative merits of the trials using the spreadsheet's sorting and graphing features.
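For readers unfamiliar with the two particle size corrections named above, the following is a minimal sketch of their textbook forms, written in Python with only the standard library. It is not the SAS code of the published program; the function names `snv` and `msc` are hypothetical. It assumes that the standard normal variate transformation uses the sample standard deviation of each spectrum, and that multiplicative scatter correction regresses each spectrum on the mean spectrum of the set.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 divisor)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted by least squares as x = a + b * ref, where
    ref is the mean spectrum of the set, and is then corrected to
    (x - a) / b.
    """
    n_points = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


if __name__ == "__main__":
    # Two made-up spectra that differ only by an offset and a scale factor.
    base = [0.2, 0.5, 0.9, 0.4]
    scaled = [2.0 * x + 1.0 for x in base]
    print(snv(base))
    print(snv(scaled))
    print(msc([base, scaled]))
```

Spectra that differ only by an additive offset and a positive multiplicative factor, as in the made-up example, come out of either correction essentially identical, which is the property that makes these pretreatments useful against particle size effects.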