Submitted to: Meeting Proceedings
Publication Type: Review Article
Publication Acceptance Date: 1/24/2002
Publication Date: 4/17/2002
Citation: Gossett, J.M., Simpson, P., Parker, J.G., Simon, W.L. 2002. How complex can complex survey analysis be with SAS? In: Proceedings, 27th annual SAS User's Group International Conference, April 17, 2002, Orlando, Florida. Paper No. 266-27.
Interpretive Summary: The objective of this study was to show that SAS can be used to analyze large, complex data sets by applying simple algorithms that estimate variances with replicate methods. The USDA Continuing Survey of Food Intakes of Individuals (CSFII) 1994-1998 is an example of a database with replicate weights, and it can provide valuable information to scientists about what people in the US are eating. Other publicly available government data sets can also be analyzed in SAS if they have replicate weights, which many of them do. In the past, many analysts did not analyze data sets with replicate weights because specialized software was needed. Using the CSFII database, this study shows a simple method of analysis that gives results comparable to those of more specialized and complicated software, such as WesVar or SUDAAN. Means and medians are calculated with their standard errors. The approach allows wider use of SAS in analyzing large, complex data sets, and it could also be used for other statistics such as odds ratios and regression coefficients.
Technical Abstract: Introduction: In version 7.0, SAS included routines for analyzing complex survey data: the SURVEYMEANS and SURVEYREG procedures. These additions are welcome, since SAS is the program of choice for many analysts, who would prefer not to purchase and learn other programs. However, the SAS routines, which use Taylor linearization to estimate variances, are limited in their capabilities. Several survey data sets, particularly government public-release data, include replicate weights. For these data sets it is possible to extend the standard procedures with DATA step programming to calculate better variance estimates. Aim: To show that SAS can be used for complex survey data sets that have replicate weights, particularly those built for balanced repeated replication (BRR) and jackknife type II (JK2); specifically, to show that SAS can calculate properly weighted variance estimates for summary statistics such as the mean and median using JK2 estimates. Method: We demonstrate that fairly simple algorithms can be implemented to estimate variances using replicate methods. The results are shown to be equivalent to those obtained from SUDAAN or WesVar. Conclusion: This approach allows wider use of SAS in complex surveys.
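Illustrative sketch: The replicate-weight idea described in the abstract can be outlined in a few lines. The block below is written in Python rather than SAS and is not the authors' code; it is a minimal sketch of the general technique. The statistic is computed once with the full-sample weight and once with each replicate weight, and the squared deviations are summed and scaled by a factor. The factor is assumed to be 1 for JK2 and 1/R for BRR with R replicates, which are the usual conventions. All function names and the four-observation data set are hypothetical.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean of values under the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def replicate_standard_error(values, full_weights, replicate_weights,
                             statistic, factor=1.0):
    """Return (estimate, standard error) from replicate weights.

    factor is assumed to be 1.0 for JK2 and 1.0 / R for BRR.
    """
    estimate = statistic(values, full_weights)
    squared_deviations = sum(
        (statistic(values, rep) - estimate) ** 2 for rep in replicate_weights
    )
    return estimate, sqrt(factor * squared_deviations)


# Hypothetical data: two variance strata, each holding a pair of units.
intake = [2.0, 4.0, 6.0, 8.0]
full_weight = [1.0, 1.0, 1.0, 1.0]
# JK2 replicates: in one stratum, drop one unit and double its partner.
jk2_weights = [
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0],
]

mean_est, mean_se = replicate_standard_error(
    intake, full_weight, jk2_weights, weighted_mean)
median_est, median_se = replicate_standard_error(
    intake, full_weight, jk2_weights, weighted_median)
```

Because the statistic is passed in as a function, the same loop over replicate weights could in principle serve other estimates mentioned in the summary, such as odds ratios or regression coefficients, provided each is recomputed under every replicate weight.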