Submitted to: Near Infrared Spectroscopy Journal
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: October 15, 2003
Publication Date: December 31, 2003
Citation: Reeves III, J.B., Delwiche, S.R. 2004. Statistical Analysis System partial least squares regression for analysis of spectroscopic data. Near Infrared Spectroscopy Journal. 11(3):415-431.
Interpretive Summary: Relating spectroscopic data to the composition of materials of interest requires a statistical method such as partial least squares (PLS) regression. Many software packages designed specifically for this purpose are available, but they often lack the ability to rapidly try out many different data manipulations at one time. SAS, on the other hand, is a general statistical package that allows easy programming of multiple data manipulations but lacks many of the data treatments routinely used in spectroscopy. The objective of this work was to investigate the potential of SAS PLS to perform chemometric analysis of spectroscopic data and to implement the data pre-treatments routinely used in spectroscopy. A program was written to implement derivatives and various corrections for particle size effects (scatter correction). Results demonstrated that SAS can perform PLS in the same manner as the dedicated programs, with the same results in terms of accuracy. Moreover, the SAS program can generate in a matter of a day the results for thousands of pre-treatment variations, which would take weeks or months with dedicated programs. It does not, however, offer many of the conveniences of the dedicated programs, such as easily generated plots. In conclusion, SAS can be used in conjunction with a package designed specifically for PLS: SAS to rapidly test many possible PLS variations and select a few, and the dedicated package to study those few further.
Technical Abstract: The objective was to investigate the potential of SAS PLS to perform chemometric analysis of spectroscopic data. As implemented, SAS can perform only type II PLS, principal component regression (PCR), and reduced rank regression (RRR). SAS offers several PLS algorithms, various cross-validation options, the ability to mean center and variance scale data either prior to PLS analysis or for each cross-validation, and various options for determining the number of factors to use, but it does not provide any of the other spectral pre-treatments routinely used in spectroscopy. A program was written in the SAS macro language to implement first and second gap derivatives, Savitzky-Golay derivatives and smoothing, skipping or averaging of spectral data points, scatter correction of spectra by either multiplicative scatter correction (MSC) or standard normal variate (SNV) with or without detrending, and, finally, mean centering of all data prior to PLS. In addition, an F-test method for factor selection was added. These macros can be applied alone or in differing combinations or orders, and produce a summary report containing results for hundreds or thousands of different data pre-treatments. A second program applies the macros in a fixed order. Results for a set of 67 forage samples scanned in the near-infrared demonstrated that the same results can be achieved as with commercial chemometrics packages. In conclusion, SAS PLS, while not possessing all the data pre-treatments of standard chemometric programs, can quickly and conveniently test many different data pre-treatments while producing a single summary results file.
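
The idea of chaining pre-treatments in differing combinations and orders can be illustrated with a minimal sketch. It is written in Python rather than the SAS macro language and is not the authors' program. All function and variable names (snv, msc, gap_derivative, STEPS, pretreatment_orders, apply_steps, DEMO_SPECTRA) are hypothetical, the demo spectra are synthetic, and the gap derivative is assumed to be a plain difference between points a fixed gap apart with no segment averaging. The PLS step itself is omitted; each pre-treated data set would be passed to a PLS routine and its results appended to one summary.

```python
from itertools import permutations
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_points = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of this spectrum on the reference: s = a + b * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def gap_derivative(spectrum, gap=1):
    """First gap derivative (assumed form): difference across `gap` points."""
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]


def mean_center(spectra):
    """Subtract the mean spectrum from every spectrum."""
    n_points = len(spectra[0])
    col_means = [mean(s[j] for s in spectra) for j in range(n_points)]
    return [[x - m for x, m in zip(s, col_means)] for s in spectra]


# Each step maps a list of spectra to a list of spectra.
STEPS = {
    "snv": lambda spectra: [snv(s) for s in spectra],
    "msc": msc,
    "deriv1": lambda spectra: [gap_derivative(s, gap=2) for s in spectra],
}


def pretreatment_orders(names, max_steps=2):
    """Yield every ordered combination of up to `max_steps` distinct steps."""
    yield ()  # no pre-treatment other than the final mean centering
    for k in range(1, max_steps + 1):
        yield from permutations(names, k)


def apply_steps(spectra, order):
    """Apply the named steps in order, then mean center prior to PLS."""
    for name in order:
        spectra = STEPS[name](spectra)
    return mean_center(spectra)


# Synthetic spectra: one base shape with different offsets and multipliers.
BASE = [0.1, 0.4, 0.9, 0.5, 0.2, 0.3]
DEMO_SPECTRA = [[a + b * x for x in BASE]
                for a, b in [(0.0, 1.0), (0.2, 1.5), (-0.1, 0.7)]]

if __name__ == "__main__":
    for order in pretreatment_orders(list(STEPS)):
        treated = apply_steps(DEMO_SPECTRA, order)
        label = " > ".join(order) or "none"
        print(f"{label}: {len(treated[0])} points per spectrum")
```

Because the demo spectra differ only by an offset and a multiplier, both SNV and MSC collapse them onto a single curve, which is the behavior a scatter correction is meant to show. With three steps and at most two per chain the sketch produces ten variants; adding further steps, gap sizes, or longer chains makes the count grow quickly, which is how a single run can cover hundreds or thousands of pre-treatments.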