Submitted to: Near Infrared Spectroscopy Journal
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: October 15, 2003
Publication Date: December 31, 2003
Citation: Reeves III, J.B., Delwiche, S.R. 2003. Statistical Analysis System program for spectral pre-treatments and partial least squares analysis [computer program]. Available: Journal of Near Infrared Spectroscopy software archive at http://www.nirpublications.com/software/index.html.
Interpretive Summary: Near-infrared spectroscopy uses light to determine the composition of products ranging from food to gasoline. To determine the composition of such materials by near-infrared spectroscopy, a calibration must first be developed for the product in question. Calibration development is the process of relating the spectral information to the composition of the materials of interest using various statistical methods. While several different procedures are used for calibration development, in almost all cases the spectral data must be pre-treated to obtain the best results. These data pre-treatments are designed both to extract as much information as possible from the spectra and to reduce or remove differences in the spectra that are often unrelated to composition and arise instead from physical properties of the sample, such as particle size differences. Unfortunately, a great variety of such pre-treatments is possible, and it is often impossible to know which treatment should be used without actually testing them all, a time-consuming and tedious process. This work designed a program using SAS, a general statistical and programming package, which can test over 1000 different pre-treatments and produce summary reports showing how well each pre-treatment works. These summaries can then be used to narrow the pre-treatments down to a few that can be studied in depth before deciding on the best calibration procedure to use.
The objective of this work was to create a SAS program for the pre-treatment of spectral data and its subsequent use in an automated program for performing partial least squares (PLS) analysis. The resulting program requires two initial data sets: the first is a GRAMS multifile (Galactic Industries, Inc., Salem, NH) containing the spectral information only; the second is an ASCII file in the form of the old GRAMS CFL file, without the header information, containing the file identifications and analyte values. The essential function of the program is to create a file containing pre-treated and original spectral data which can then be used as input data to SAS PLS. The program as written is capable of creating over 1000 different data pre-treatments consisting of various combinations of derivatives (gap and Savitzky-Golay), scatter correction (multiplicative, standard normal variate with and without detrend, and detrend alone), mean centering and variance scaling, and averaging or skipping of data points. PLS is run on each combination of data pre-treatments for each analyte, and two summaries are produced, each ordered by goodness of fit, for each analyte analyzed. The first summary is based on the final calibration r-square, with the number of factors determined by a Monte-Carlo type selection process. The second is based on the RMSE of the one-out cross validation, with the number of factors determined by the F-test. In conclusion, this program allows for the rapid and simple testing of thousands of possible PLS calibrations using SAS.
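The idea of combining pre-treatments into a grid can be illustrated with a minimal sketch. It is written in Python rather than SAS and is not the published program: the function names, the toy spectra, and the choice to apply scatter correction before the derivative are all assumptions made for illustration. Only two of the pre-treatment families named above are shown (standard normal variate and a gap derivative), and no PLS step is included.

```python
from itertools import product
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def gap_derivative(spectrum, gap=1):
    """First derivative as the difference between points `gap` positions either side."""
    return [spectrum[i + gap] - spectrum[i - gap]
            for i in range(gap, len(spectrum) - gap)]


def identity(spectrum):
    """No treatment; returns a copy of the spectrum."""
    return list(spectrum)


# Hypothetical option tables; the real program offers many more choices.
SCATTER = {"none": identity, "snv": snv}
DERIVATIVE = {
    "none": identity,
    "gap1": lambda s: gap_derivative(s, 1),
    "gap2": lambda s: gap_derivative(s, 2),
}


def pretreatment_grid(spectra):
    """Yield (label, treated spectra) for every scatter x derivative combination.

    Assumption: scatter correction is applied first, then the derivative.
    """
    for (s_name, s_fn), (d_name, d_fn) in product(SCATTER.items(),
                                                  DERIVATIVE.items()):
        yield f"{s_name}+{d_name}", [d_fn(s_fn(row)) for row in spectra]


# Toy absorbance values, invented for the example.
spectra = [
    [0.10, 0.12, 0.18, 0.30, 0.45, 0.50],
    [0.20, 0.23, 0.31, 0.48, 0.70, 0.78],
]
grid = dict(pretreatment_grid(spectra))
```

Each entry in `grid` would then be passed to a calibration routine and scored, so that the combinations can be ranked. Adding further option tables (mean centering, variance scaling, point averaging) multiplies the number of combinations, which is how a count above 1000 is reached.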