Publication #164901

Title: PARTIAL LEAST SQUARES REGRESSION OF SPECTRAL DATA, WITH VALIDATION, USING SAS MACRO LANGUAGE

Author
Delwiche, Stephen - Steve

Submitted to: Near Infrared Spectroscopy Journal
Publication Type: Other
Publication Acceptance Date: 2/1/2005
Publication Date: 3/6/2005
Citation: Delwiche, S.R. 2005. Partial least squares regression of spectral data, with validation, using SAS macro language. NIR Publications Website (SAS Section) http://www.nirpubications.om/software/index.html.

Interpretive Summary: Near-infrared (NIR) spectroscopy is a commonly used analytical technique for rapidly measuring the composition and properties of food, agricultural, chemical, and pharmaceutical products. The essential components of this technology are the spectrometer, typically operating at wavelengths just beyond those visible to the human eye, and a computer that both controls the spectrometer and processes the acquired signals. One part of the processing is the multivariate regression method used to relate the spectral readings to the composition or property of interest. Typically a linear regression method known as partial least squares (PLS) is used, which, until the author's work, was available only as custom, typically expensive software. A PLS program was developed that runs within the SAS (Cary, NC) environment, a general-purpose research and business statistics package that runs on many computer platforms and is widely available to commercial, university, and government facilities. The program allows the user to examine the effect of applying a mathematical transformation to the spectra just before PLS regression. Known as spectral pretreatments, these transformations have long been recognized by NIR researchers as beneficial to calibration development. The relative effects of the various pretreatments, which include corrections for particle size variation, smooths, and numerical approximations to mathematical derivatives, are explored by this program in one batch submission. Originally developed by the author about two years ago, the program has since been revised, and the newest version offers improved output, data compression prior to regression, and testing of the regression equation by application to an independent data set. The primary beneficiaries of this program are the data analysts in charge of developing meaningful calibration models.
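
The pretreatments named in the summary can be illustrated with a minimal sketch. The program itself is written in SAS macro language and its code is not reproduced here; the Python below is only an illustration under stated assumptions. The function names (`snv`, `moving_average`, `first_difference`, `pretreat`) are hypothetical, and the standard normal variate (SNV) transform is assumed as one common scatter correction for particle size variation, since the summary does not say which correction the program applies.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own
    mean and sample standard deviation (assumed scatter correction).
    Needs at least two points and a non-constant spectrum."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def moving_average(spectrum, window=3):
    """Smooth with a centered moving average. The window must be odd;
    the output is shorter than the input by window - 1 points."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    return [
        mean(spectrum[i - half:i + half + 1])
        for i in range(half, len(spectrum) - half)
    ]


def first_difference(spectrum, gap=1):
    """Approximate a first derivative by differencing points `gap` apart."""
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]


def pretreat(spectrum, steps):
    """Apply a sequence of pretreatment functions in the given order."""
    for step in steps:
        spectrum = step(spectrum)
    return spectrum


# Example: smooth, then differentiate, a made-up absorbance spectrum.
raw = [0.50, 0.52, 0.57, 0.66, 0.70, 0.69, 0.65]
treated = pretreat(raw, [moving_average, first_difference])
```

Chaining the functions through `pretreat` mirrors the idea of applying pretreatments alone or in combination; the order of the steps matters, which is one reason to compare several combinations in a single batch run.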

Technical Abstract: A computer program written in 2002 in the SAS language to examine the effect of spectral pretreatments on partial least squares regression of near-infrared (or similarly structured) data has been substantially expanded. Many features of the newest version are carried over from the older version: 1) operation in an unattended batch mode; 2) user specification of a number of commonly used spectral pretreatments, alone or in combination; 3) selectable smooths, derivatives, and wavelength windowing; 4) full cross-validation, with two methods used to determine the optimal number of PLS factors; 5) output of results to the SAS output window, as well as to text files for ready import by spreadsheet software; and 6) reliance on SAS macro programming. Features absent from the original version but present in the newest version include data compression within each spectrum through averaging of neighboring wavelengths, user specification of the number of PLS factors, and, most important, the ability to validate the PLS regression equations on separate, independent test sets. The program is intended for spectral analysts and chemometricians who wish to explore the effects of various spectral pretreatments and their combinations in developing optimal regression equations for the analyte under study.
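
The workflow in the abstract (compression by averaging neighboring wavelengths, PLS regression, cross-validation to pick the number of factors, and validation on an independent test set) can be sketched as follows. This is not the author's SAS code. It is a small Python illustration that assumes a single-analyte PLS (PLS1, NIPALS-style deflation), leave-one-out cross-validation, and a minimum-RMSECV rule for choosing the number of factors; the abstract mentions two selection methods but does not name them. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def average_neighbors(spectrum, block):
    """Compress a spectrum by averaging consecutive, non-overlapping blocks
    of `block` points; leftover points at the end are dropped."""
    return [sum(spectrum[i:i + block]) / block
            for i in range(0, len(spectrum) - block + 1, block)]


def fit_pls1(X, y, n_factors):
    """Fit a single-response PLS model by successive deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    factors = []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:      # residual y no longer covaries with X
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def predict_pls1(model, x):
    """Predict one sample by repeating the training deflation on it."""
    x_mean, y_mean, factors = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def rmsep(model, X_test, y_test):
    """Root mean square error of prediction on an independent test set."""
    sq = sum((predict_pls1(model, x) - v) ** 2 for x, v in zip(X_test, y_test))
    return (sq / len(y_test)) ** 0.5


def rmsecv(X, y, n_factors):
    """Leave-one-out root mean square error of cross-validation."""
    sq = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        sq += (predict_pls1(model, X[i]) - y[i]) ** 2
    return (sq / len(X)) ** 0.5


def choose_n_factors(X, y, max_factors):
    """Assumed selection rule: the factor count with the lowest RMSECV."""
    return min(range(1, max_factors + 1), key=lambda a: rmsecv(X, y, a))
```

In use, each spectrum would be compressed with `average_neighbors`, the number of factors would be chosen on the calibration set with `choose_n_factors` (or fixed by the user), and the final model from `fit_pls1` would be scored on the held-out samples with `rmsep`.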