Steve E. Naranjo
Arid-Land Agricultural Research Center, USDA-ARS, Maricopa, AZ
William D. Hutchison
Department of Entomology, University of Minnesota, St. Paul, MN
The Resampling Approach
Resampling for Validation of Sampling Plans:
Enumerative Sampling Plans
Binomial Sampling Plans
Field Data Requirements
General Input Requirements
Sample Plan Input Requirements
- Reliable and cost-effective sampling methods are critical to the development of monitoring systems for pest management and can enhance research activities that address issues in population ecology and population dynamics. Validation and evaluation of sampling plans are central to their development and implementation in the field. Sampling plans are often developed from a restricted range of observations from a small area, but are then used over a wide area representing a novel array of environmental and agronomic conditions.
Sets of tools for sample plan evaluation have recently been developed (e.g. Nyrop & Binns 1991). These Monte Carlo simulations can be used to evaluate sampling models during the developmental phase; however, they may not be adequate for testing model validity and performance under field conditions, primarily because they assume an underlying statistical distribution (e.g. negative binomial, normal) that may not adequately represent the actual distribution of insects in every instance. Here we present a method in which actual field data are resampled to evaluate sample plan performance (Hutchison 1994, Hutchison et al. 1988, Naranjo and Hutchison 1997). We originally developed DOS-based computer software for this purpose. A new version that runs as an Excel plug-in is now available. It has the same functionality, so the general instructions below will still help you use RVSP. Please click the Help button on the new plug-in for additional instructions.
- Strengths: The major benefit of a resampling approach is that the underlying spatial distribution of the insect population is defined by actual field data rather than a theoretical model. By using independent field data it is possible to simultaneously test the accuracy of the basic model underlying the sampling plan (e.g. Taylor's power law, Iwao's patchiness regression, various proportion infested-mean density models) and the sampling error associated with the selection (sequential or otherwise) of sample observations from the field.
Limitations: The major limitation of a resampling approach is that additional field data, independent of those used to develop the sampling plan, must be collected. Ideally, these independent data should also cover the range of population densities under which the sample plan is likely to be used. Often, this can be accomplished by withholding a certain amount of data during the developmental phase. We are currently evaluating the amount of data (both the number of data sets and the number of observations per data set) needed to perform a robust analysis of sample plan performance.
- Resampling for Validation of Sample Plans (RVSP) is a user-friendly computer program designed specifically for the analysis and validation of commonly used sample plans. The current version of RVSP can be used to evaluate two fixed-precision sequential sampling plans based on enumerative counts (Green's and Kuno's plans) and two sample plans based on binomial counts (Wald's SPRT and a fixed-sample-size plan). The program is menu-based and permits easy entry of sample plan parameters and field data. Items within each menu are accessed using the up and down cursor keys; there is currently no mouse support. RVSP also contains help screens that advise the user on program operation. RVSP is a DOS program that will run on any PC, and it can also be executed from within Windows. A detailed example of using the program is given in Appendix A.
Basically, RVSP randomly selects observations from an actual data set until either the sequential stop rule is satisfied or a fixed number of samples has been drawn, depending on the sample plan tested. This process is repeated many times (default = 500) for each data set. Based on these iterations, RVSP then calculates averages and variances for precision and sample size, as well as operating characteristics for the binomial plans, which classify population densities relative to a specified threshold. RVSP provides both summary and detailed output that can easily be imported into spreadsheet and graphics programs for further examination and analysis.
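As a rough illustration of the resampling loop just described, the Python sketch below draws observations from one field data set until a stop rule is met, repeats this for a number of iterations, and records the sample size and mean of each iteration. It is not RVSP's source code: the function names, the stop-rule interface, the default minimum sample size of 5 and the `max_draws` guard are hypothetical, and only the default of 500 iterations comes from RVSP. Without replacement, observations are read in order from a shuffled copy of the data set; with replacement, each draw uses `random.Random.choice`.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stop-rule interface: receives the observations drawn so far
# and returns True when the sampling plan says to stop.
StopRule = Callable[[Sequence[float]], bool]


def resample_once(data: Sequence[float], stop_rule: StopRule, min_n: int,
                  replace: bool, rng: random.Random,
                  max_draws: int = 10_000) -> List[float]:
    """Draw observations one at a time until the stop rule is satisfied."""
    pool = list(data)
    if not replace:
        rng.shuffle(pool)  # reading a shuffled copy in order = no replacement
    drawn: List[float] = []
    while True:
        if not replace and len(drawn) == len(pool):
            # Like RVSP, give up on a data set whose observations run out.
            raise ValueError("data set exhausted before the stop rule was met")
        if len(drawn) >= max_draws:
            # Hypothetical guard so a never-satisfied rule cannot loop forever.
            raise ValueError("stop rule not met within max_draws")
        drawn.append(rng.choice(pool) if replace else pool[len(drawn)])
        if len(drawn) >= min_n and stop_rule(drawn):
            return drawn


def resample(data: Sequence[float], stop_rule: StopRule, iterations: int = 500,
             min_n: int = 5, replace: bool = False,
             seed: Optional[int] = None) -> Tuple[List[int], List[float]]:
    """Repeat resample_once; return per-iteration sample sizes and means."""
    rng = random.Random(seed)
    sizes: List[int] = []
    means: List[float] = []
    for _ in range(iterations):
        drawn = resample_once(data, stop_rule, min_n, replace, rng)
        sizes.append(len(drawn))
        means.append(statistics.mean(drawn))
    return sizes, means
```

A fixed-size plan corresponds to a stop rule such as `lambda s: len(s) >= 30`; a sequential plan would instead compare the cumulative count with its stop line. Averages and variances of the returned sizes can then be taken with `statistics.mean` and `statistics.variance`.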
- RVSP can be used to analyze two fixed-precision sequential sampling plans based on enumerative counts.
Green's Plan: Green's (1970) fixed-precision sampling algorithm uses Taylor's power law, S^2 = am^b, to model the relationship between the mean (m) and variance (S^2). The sequential stop line is given as: Tn = (an^[1-b]/D^2)^[1/(2-b)], where Tn is the cumulative count from n samples and D is the desired precision, expressed as the ratio of the standard error to the mean.
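Green's stop line can be evaluated directly from the formula above. In this hypothetical sketch (the function names are ours), sampling is assumed to stop once the cumulative count reaches or exceeds the line for the current number of samples, which is the usual way Green's plan is applied. As a check, with a = b = 1 (Poisson variance) and D = 0.25 the line is flat at Tn = 1/D^2 = 16.

```python
from typing import Sequence


def green_stop_line(n: int, a: float, b: float, D: float) -> float:
    """Green's (1970) stop line: Tn = (a * n**(1 - b) / D**2) ** (1 / (2 - b))."""
    if b == 2:
        raise ValueError("Green's stop line is undefined when b == 2")
    return (a * n ** (1 - b) / D ** 2) ** (1 / (2 - b))


def green_should_stop(counts: Sequence[float], a: float, b: float, D: float) -> bool:
    """True once the cumulative count reaches the stop line for n = len(counts)."""
    return sum(counts) >= green_stop_line(len(counts), a, b, D)
```

With a = 2, b = 1.5 and D = 0.25 the line at n = 4 is 256: a mean of 64 then has variance 2(64^1.5) = 1024 and SE = sqrt(1024/4) = 16, giving SE/mean = 0.25 as intended.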
Kuno's Plan: Kuno's (1969) fixed-precision sampling algorithm uses Iwao's patchiness regression, m* = a + bm, to model the mean-variance relationship, where m* is Lloyd's mean crowding index; the corresponding variance is S^2 = (a + 1)m + (b - 1)m^2. The sequential stop line is given as: Tn = (a + 1)/(D^2 - [b - 1]/n), with Tn, n and D defined as for Green's plan.
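Kuno's stop line can be sketched in the same style. The function names are hypothetical, and sampling is assumed to stop once the cumulative count reaches or exceeds the line. When b > 1 the denominator D^2 - (b - 1)/n is not positive for small n, so no stop is possible yet; the sketch returns infinity in that case. With the Appendix A parameters (Iwao's a = -0.53, b = 2.03, D = 0.25), (b - 1)/D^2 = 16.48, so the first finite line occurs at n = 17 (about 246 insects).

```python
import math
from typing import Sequence


def kuno_stop_line(n: int, a: float, b: float, D: float) -> float:
    """Kuno's (1969) stop line: Tn = (a + 1) / (D**2 - (b - 1) / n)."""
    denom = D ** 2 - (b - 1) / n
    if denom <= 0:
        # For b > 1 the line is undefined until n > (b - 1) / D**2,
        # so no finite cumulative count can stop sampling yet.
        return math.inf
    return (a + 1) / denom


def kuno_should_stop(counts: Sequence[float], a: float, b: float, D: float) -> bool:
    """True once the cumulative count reaches the stop line for n = len(counts)."""
    return sum(counts) >= kuno_stop_line(len(counts), a, b, D)
```

With a = 0 and b = 1 (m* = m, i.e. Poisson variance) the line reduces to 1/D^2, the same value Green's plan gives for a Poisson population.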
- RVSP can be used to analyze sequential and fixed-sample-size plans based on binomial counts. Any of a number of proportion infested-mean density models can be used because RVSP only requires proportion infested as input.
Wald's SPRT: Wald's (1947) sequential probability ratio test allows population density to be classified as either above or below a threshold density. For binomial count data, the upper and lower sampling stop lines are defined as: Tn = Bn ± A, where n is the number of sample units examined, Tn is the cumulative number of those units infested with at least t insects, and B and A are parameters derived, by standard formulas, from the specified type I and II error rates and the upper and lower boundaries bracketing the threshold density.
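The source does not list the formulas for B and A, so the sketch below uses the standard expressions for Wald's binomial SPRT as an assumption about what RVSP computes. With lower and upper boundaries p0 < p1, let k = ln[p1(1 - p0)/(p0(1 - p1))]; then B = ln[(1 - p0)/(1 - p1)]/k, the upper intercept is ln[(1 - beta)/alpha]/k and the lower intercept is ln[beta/(1 - alpha)]/k. The two intercepts are +A and -A exactly when alpha = beta. Function names are hypothetical; a unit counts as infested when it holds at least t insects.

```python
import math
from typing import Optional, Sequence, Tuple


def wald_binomial_lines(p0: float, p1: float, alpha: float,
                        beta: float) -> Tuple[float, float, float]:
    """Return (slope B, lower intercept, upper intercept) for a binomial SPRT."""
    if not 0 < p0 < p1 < 1:
        raise ValueError("bounds must satisfy 0 < p0 < p1 < 1")
    k = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    slope = math.log((1 - p0) / (1 - p1)) / k
    lower = math.log(beta / (1 - alpha)) / k
    upper = math.log((1 - beta) / alpha) / k
    return slope, lower, upper


def sprt_classify(counts: Sequence[int], t: int, p0: float, p1: float,
                  alpha: float, beta: float,
                  min_n: int = 1) -> Tuple[Optional[str], int]:
    """Return (decision, n) at the first stop-line crossing.

    decision is None if neither line is crossed before the counts run out.
    """
    slope, lower, upper = wald_binomial_lines(p0, p1, alpha, beta)
    infested = 0
    for n, count in enumerate(counts, start=1):
        infested += int(count >= t)  # tally threshold t
        if n < min_n:
            continue
        if infested >= slope * n + upper:
            return "action", n
        if infested <= slope * n + lower:
            return "no action", n
    return None, len(counts)
```

With p0 = 0.2, p1 = 0.4 and alpha = beta = 0.1, a run of infested units triggers "action" at n = 4 and a run of uninfested units triggers "no action" at n = 8.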
Fixed-Sample-Size Plan: This plan uses a user-specified number of samples to estimate the proportion of sample units infested with at least t insects. This proportion is then compared to a specified action threshold.
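A fixed-sample-size evaluation can be sketched as repeated draws of n units from a field data set, each summarized as the proportion of units holding at least t insects. The sketch also reports the fraction of iterations in which that proportion exceeded the action threshold, which is how the Output section describes RVSP's OC for this plan. Names and defaults other than the 500 iterations are hypothetical; `random.Random.sample` draws without replacement and `random.Random.choices` with replacement.

```python
import random
from typing import Optional, Sequence, Tuple


def proportion_infested(sample: Sequence[int], t: int) -> float:
    """Fraction of sample units holding at least t insects."""
    return sum(1 for c in sample if c >= t) / len(sample)


def fixed_size_resample(data: Sequence[int], n: int, t: int, threshold: float,
                        iterations: int = 500, replace: bool = False,
                        seed: Optional[int] = None) -> Tuple[float, float]:
    """Return (mean proportion infested, fraction of iterations above threshold)."""
    rng = random.Random(seed)
    props = []
    for _ in range(iterations):
        # choices() draws with replacement; sample() draws without.
        draw = rng.choices(data, k=n) if replace else rng.sample(list(data), n)
        props.append(proportion_infested(draw, t))
    above = sum(p > threshold for p in props) / iterations
    return sum(props) / iterations, above
```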
- The number of independent field data sets needed to run RVSP depends on the particular sampling model being tested and on how rigorously one wishes to test it. At a minimum, the data sets should cover the range of population densities likely to be encountered by users of the sample plan. Likewise, the required number of observations in each data set depends on the sample model being tested, the specified precision or error rates, the width of the decision boundaries, and whether resampling is run with or without replacement. We have found that data sets with around 100 observations satisfy most testing requirements. When running in "resample without replacement" mode, RVSP tests the adequacy of each data set before executing. A warning message is given if the anticipated sample size exceeds 75% of the observations available in a given data set, the data set is not executed if the anticipated sample size exceeds 90% of the observations available, and execution terminates for a data set if the sample size exceeds the number of observations during any resampling iteration. These safeguards do not operate in "resample with replacement" mode.
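The adequacy checks above reduce to comparing the anticipated sample size with the number of observations. The source does not say how RVSP anticipates the sample size, so, as an assumption, this sketch estimates it for Green's plan from Taylor's power law: setting D = sqrt(am^b/n)/m and solving for n gives n = am^(b-2)/D^2. Function names are hypothetical.

```python
def anticipated_n_taylor(mean: float, a: float, b: float, D: float) -> float:
    """Approximate samples needed for precision D under Taylor's power law.

    From D = sqrt(a * m**b / n) / m, solved for n: n = a * m**(b - 2) / D**2.
    """
    return a * mean ** (b - 2) / D ** 2


def check_adequacy(anticipated_n: float, n_obs: int, replace: bool) -> str:
    """Apply the 75% and 90% rules; return 'ok', 'warn' or 'skip'."""
    if replace:
        return "ok"  # the safeguards do not operate with replacement
    share = anticipated_n / n_obs
    if share > 0.90:
        return "skip"  # data set is not executed
    if share > 0.75:
        return "warn"  # warning message, but the data set still runs
    return "ok"
```

For example, a data set of 100 observations with an anticipated sample size of 80 would run with a warning, while one needing 95 would be skipped.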
- Input Data File: The name of the ASCII batch file that lists, one pair per line, the name of each individual field data file and the name of an output file in which to store the results of each resampling iteration for that data set. The input file name occupies columns 1-12 and the output file name occupies columns 14-25; this format must be followed exactly for RVSP to read the file names correctly. Each individual field data file has one sample observation per line. These files can easily be made using DOS-Edit, or with a word processor or spreadsheet capable of saving ASCII text files. An example batch file (BATCH.DAT) and field data sets are provided with the software (Appendix B).
- Output Data File: The name of the ASCII file containing the summary statistics for each field data set specified in the input file above. RVSP also gives the user the option of printing this summary.
- Resampling With or Without Replacement: Specifies how samples will be drawn by the computer. Resampling without replacement is more representative of how samples would be taken in a field. Preliminary analyses suggest that with data sets containing around 100 samples, there is little difference in output regardless of how samples are drawn. Resampling with replacement can be useful if data sets contain relatively few observations and/or the user wishes to test high levels of precision or low error rates.
- Minimum Sample Size: Specifies the minimum number of samples that will be drawn before sampling is terminated.
- Resample Iterations: The number of times each data set is resampled.
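The fixed-column batch format described under Input Data File can be read by slicing each line. In this hypothetical sketch, columns 1-12 (counting from 1) map to Python's `line[0:12]` and columns 14-25 to `line[13:25]`; twelve characters is the length of a full DOS 8.3 file name. Field data files are assumed to contain one numeric observation per line.

```python
from typing import List, Tuple


def parse_batch_line(line: str) -> Tuple[str, str]:
    """Split one batch-file line into (input file, output file) names."""
    return line[0:12].strip(), line[13:25].strip()


def read_batch(path: str) -> List[Tuple[str, str]]:
    """Read every non-blank line of a batch file such as BATCH.DAT."""
    with open(path) as fh:
        return [parse_batch_line(line.rstrip("\n")) for line in fh if line.strip()]


def read_observations(path: str) -> List[float]:
    """Read a field data file holding one sample observation per line."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]
```

A line such as `TEST183.92   T183.OUT` (the output name here is invented for illustration) parses to the pair `("TEST183.92", "T183.OUT")`.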
- Green's or Kuno's Plan
- Precision (D): The desired fixed-precision of the sample plan.
- Taylor's a and b: Parameters of Taylor's power law.
- Iwao's a and b: Parameters of Iwao's patchiness regression.
Wald's SPRT or Fixed-Sample-Size Plan
- Alpha and Beta Error: Type I and II error rates.
- Upper and Lower Bounds: Boundaries about the action threshold, given in terms of proportion infested. Together with alpha and beta errors these parameters determine the upper and lower sequential stop lines for Wald's plan.
- Action Threshold: Given as a proportion infested.
- Tally Threshold: The minimum number of insects necessary to consider a sample unit infested.
- RVSP creates a summary containing the sample plan parameters and statistics summarizing results over all iterations for a given field data set. Optionally, RVSP can also create an output file of results for each individual iteration of a data set if the user wishes to perform further analyses. Output can be imported into spreadsheet and graphics programs for further examination and analysis.
Green's or Kuno's Plan: The summary tabulates the mean, SD and n of the original data set, and the mean, SD, maximum, and minimum of precision and required sample size over all resampling iterations.
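The enumerative summary can be approximated from the observations drawn in each iteration. This sketch assumes that realized precision is SE/mean with SE = s/sqrt(n), and that SDs use the n - 1 denominator; the source does not state RVSP's exact formulas, and the function names are hypothetical. Each resampled set needs at least two observations for its SD to exist.

```python
import math
import statistics
from typing import Dict, List, Sequence


def realized_precision(sample: Sequence[float]) -> float:
    """Precision D = SE / mean for one resampled set of observations."""
    m = statistics.mean(sample)
    if m == 0:
        return math.inf  # precision is undefined for a zero mean
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return se / m


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Mean, SD, maximum and minimum over all resampling iterations."""
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": statistics.mean(values), "sd": sd,
            "max": max(values), "min": min(values)}


def enumerative_summary(data: Sequence[float],
                        samples: List[Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Summary for one field data set and its resampled sets."""
    return {
        "data": {"mean": statistics.mean(data), "sd": statistics.stdev(data),
                 "n": len(data)},
        "precision": describe([realized_precision(s) for s in samples]),
        "sample_size": describe([len(s) for s in samples]),
    }
```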
Wald's SPRT: The summary tabulates the mean, n and proportion infested of the original data set, and the mean, SD, maximum and minimum of proportion infested and required sample size over all resampling iterations. The operating characteristic (probability of taking no action) is calculated directly as the proportion of iterations in which the proportion infested exceeded the upper boundary. The OC function can be estimated by plotting these probabilities against the observed mean.
Fixed-Sample-Size: Similar to Wald's SPRT, except that sample size statistics are not given and the OC is calculated as the proportion of iterations in which the proportion infested exceeded the threshold.
- Green, R. H. 1970. On fixed precision sequential sampling. Res. Popul. Ecol. (Kyoto) 12: 249-251.
Hutchison, W. D. 1994. Sequential sampling to determine population density. P. 207-244. In L. Pedigo & G. Buntin (eds.), Handbook of Sampling Methods for Arthropods in Agriculture. CRC Press.
Hutchison, W. D., D. B. Hogg, M. A. Poswall, R. C. Berberet & G. W. Cuperus. 1988. Implications of the stochastic nature of Kuno's and Green's fixed-precision stop-lines: Sampling plans for the pea aphid (Homoptera: Aphididae) in alfalfa as an example. J. Econ. Entomol. 81: 749-758.
Kuno, E. 1969. A new method of sequential sampling to obtain the population estimates with a fixed level of precision. Res. Popul. Ecol. (Kyoto) 11: 127-136.
Naranjo, S. E. & H. M. Flint. 1995. Spatial distribution of adult Bemisia tabaci in cotton and development and validation of fixed-precision sequential sampling plans for estimating population density. Environ. Entomol. 24: 261-270.
Naranjo, S. E., H. M. Flint & T. J. Henneberry. 1996. Binomial sampling plans for estimating and classifying population density of adult Bemisia tabaci in cotton. Entomol. Exp. Appl. 80: 343-353.
Naranjo, S. E. & W. D. Hutchison. 1997. Validation of arthropod sampling plans using a resampling approach: software and analysis. Am. Entomol. 43: 48-57.
Nyrop, J. P. & M. Binns. 1991. Quantitative methods for designing and analyzing sampling programs for use in pest management. P. 67-132. In D. Pimentel (ed.), Handbook of Pest Management in Agriculture, Vol. 2. CRC Press.
Wald, A. 1947. Sequential Analysis. John Wiley & Sons, New York.
- Example analyses are presented here in order to familiarize the user with the operation of RVSP. The example field data sets used here were those used by Naranjo and Flint (1995, Tables 2-3) to validate two fixed-precision sampling plans for adult Bemisia tabaci in cotton.
After you have copied all the files (see APPENDIX B) onto your hard drive, simply type RVSP to execute the program. You will see the startup screen, from which you can use the up and down cursor keys to highlight the menu item of interest. Start by highlighting Kuno's Plan and pressing the [ENTER] key. This will bring up another menu screen with 5 choices. Highlight Display/Modify Initialization and press [ENTER]. This brings up a third screen in which you can input data for a resampling analysis. The cursor is automatically positioned in the input data file field. Type BATCH.DAT in this field and use the down arrow to move to the next field. For this example we will accept the default value of D = 0.25. Continue by entering data into the rest of the fields as follows: Iwao's a = -0.53, Iwao's b = 2.03, and Output data file = SAMPLE.OUT. Use the default values for the remaining fields. When all the data are entered, press the [ESC] key. This will register your input and return you to the previous menu. At this point you could save the data you just entered in an ASCII file by selecting Save Initialization File; you will be prompted for a file name. You could also retrieve an existing initialization file if one had been previously saved. This is handy for repetitive runs and saves the time of re-entering the same data. You can verify the data entered at any time by selecting Display/Modify Initialization. Once you are satisfied that the data are correct, select Run Simulation. This will begin the resampling analysis.
Before resampling begins, you will be asked whether you want to print a copy of the summary table and whether you want the program to generate raw data files for each field data set specified in BATCH.DAT. These are handy if you want to do additional analysis beyond the summary statistics already provided. For this example answer Y (this will require about 600K of disk space for the 20 example field data files). RVSP will then begin resampling each data set in turn, flashing the name of the data set being resampled as it proceeds. RVSP has several error traps that permit continuation of the program if errors occur. These include checking to see that file names listed in BATCH.DAT exist, and testing the adequacy of individual data sets for analyses as previously discussed.
Once all data sets have been resampled, you will be returned to the Kuno screen and the summary output table will be saved to disk under the name SAMPLE.OUT. This same table will also be printed if you asked the program to do so. Pressing the [ESC] key at this point will return you to the main menu, where you can select another sample plan to test or exit the program. Work through the remaining sampling plans using the parameters given on the following summary tables. In all cases use BATCH.DAT as the input file, but rename the output file or RVSP will overwrite the summary file generated in a previous execution. Any valid DOS filename is acceptable.
- The zip file contains the following files:
RVSP20.xlam Excel Plug-in file
RVSP20.chm Excel help file
RVSP Installation Instructions
RVSPMAN.DOC Software documentation (V1.2/2.0)
BATCH.DAT Example input file
TEST183.92 Example field data set files
It is recommended that you create a separate directory for RVSP and load all these files in that directory. RVSP20.xlam and RVSP20.chm must be in the same directory.
Questions regarding this material can be directed to: Dr. Steve Naranjo firstname.lastname@example.org