Assignment 7: Variance Estimation for Complex Surveys
Due: April 30, 1996

Most surveys taken by the Census Bureau or by large survey organizations such as Gallup or Research Triangle Institute are complex, with several stages of clustering as well as stratification. In addition, weights are often used to make the demographic information of the sample agree with information for the population from the U.S. Decennial Census, and to adjust for nonresponse. In such a situation, calculating variances by hand for each response would be extremely tedious, since a new computer program would be needed for each response and each survey. Because many of the quantities of interest are nonlinear functions of population means (e.g. mu_y/mu_x), we need a flexible method for calculating variances.

Several general methods have been derived for estimating the variances of complex functions of population means, and they can be applied to almost any survey design. These are described in Wolter (1985). One of the simplest is the random group method, also known as interpenetrating subsampling. In this method, the basic survey design is replicated k times. This may be done by drawing k different samples, or by drawing one sample and later splitting it into k parts, each part being a miniature version of the basic sampling design. The statistic of interest is then computed separately in each of the k groups, and the spread of the k group estimates is used to estimate its variance.

The jackknife is also useful for estimating the variance of nonlinear quantities. Suppose we are interested in the parameter theta. As in the random group method, we divide the sample into k subgroups, but instead of estimating theta in each subgroup separately, we estimate theta using all of the data except those in the jth subgroup, for j = 1, ..., k. Call the estimate obtained by omitting the jth subgroup theta-hat(j), and let theta-hat(.) be the average of the k values theta-hat(j). Then theta-hat(.) also estimates theta, and the variance of theta-hat, the estimate of theta using all of the data, can be estimated by ((k-1)/k) sum_{j=1}^{k} (theta-hat(j) - theta-hat(.))^2.

For these exercises, draw a simple random sample of size 200 from Lockhart City. We want to estimate R = mu_y/mu_x, the ratio of the price a household is willing to pay for cable TV (Y) to the assessed value of the house (X). We will apply these methods to a simple random sample and an estimated ratio, although in practice the techniques are usually applied to more complicated designs.

1. Randomly divide your sample into 10 different subsamples, each of size 20. Be sure to explain how you randomly divided your sample. Find the ratio r_i = ybar_i/xbar_i for each group. The r_i are 10 (almost) independent observations; if we use rbar, the average of the r_i, to estimate R, then the estimated variance of rbar is sum_{i=1}^{10} (r_i - rbar)^2/90, where 90 = k(k-1) with k = 10. For the purpose of grading, rearrange your output file from SURVEY so that the first group is the first 20 observations, the second group is the next 20 observations, and so on.

2. Calculate 200 different estimates of R, each using all but one of the 200 data points (this is the jackknife with k = 200 groups of size 1). One way of doing the calculations is to define two new variables, SUMX and SUMY, to be the sums of all 200 observations of X and Y, respectively. Then the variable YJACK = (SUMY - Y)/199 contains the 200 values of ybar(j), and similarly XJACK = (SUMX - X)/199 contains the 200 values of xbar(j). Use the variables YJACK and XJACK to calculate the jackknife estimate of the variance of r = ybar/xbar.

3. How do your variance estimates from 1 and 2 compare with the usual Taylor series-based estimate of the variance of this ratio?
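The three variance estimates asked for in Exercises 1 through 3 can be sketched in Python as below, which may help in checking the SURVEY output by hand. This is a minimal sketch, not the SURVEY program's method: the function names (ratio, random_group_variance, jackknife_variance, taylor_variance) are hypothetical, the demo data are synthetic random numbers standing in for the Lockhart City sample (not reproduced here), and the Taylor series estimate assumes the standard linearization formula for an SRS ratio, v(r) = (1 - n/N) s_e^2/(n xbar^2) with residuals e_i = y_i - r x_i. The population size N is an input you would have to supply; if it is omitted, the finite population correction is taken to be 1.

```python
import random


def ratio(ys, xs):
    """Ratio estimate r = ybar/xbar; the 1/n factors cancel, so sums suffice."""
    return sum(ys) / sum(xs)


def random_group_variance(ys, xs, k=10, seed=None):
    """Exercise 1: random group (interpenetrating subsample) estimate.

    Shuffles the indices, splits them into k equal groups, computes r_i in
    each group, and returns (rbar, sum (r_i - rbar)^2 / (k(k-1))).
    """
    n = len(ys)
    if n % k != 0:
        raise ValueError("sample size must be divisible by k")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # random assignment of units to groups
    m = n // k
    rs = []
    for g in range(k):
        group = idx[g * m:(g + 1) * m]  # group g = positions g*m .. (g+1)*m - 1
        rs.append(ratio([ys[i] for i in group], [xs[i] for i in group]))
    rbar = sum(rs) / k
    return rbar, sum((r - rbar) ** 2 for r in rs) / (k * (k - 1))


def jackknife_variance(ys, xs):
    """Exercise 2: delete-one jackknife for r = ybar/xbar (k = n groups of size 1)."""
    n = len(ys)
    sumy, sumx = sum(ys), sum(xs)  # SUMY and SUMX
    # (sumy - y)/(n-1) and (sumx - x)/(n-1) play the roles of YJACK and XJACK;
    # their ratio is the leave-one-out estimate r(j).
    r_j = [((sumy - y) / (n - 1)) / ((sumx - x) / (n - 1)) for y, x in zip(ys, xs)]
    r_dot = sum(r_j) / n
    return (n - 1) / n * sum((r - r_dot) ** 2 for r in r_j)


def taylor_variance(ys, xs, N=None):
    """Exercise 3: linearization estimate (1 - n/N) s_e^2 / (n xbar^2).

    N is the population size; if it is None, the fpc is taken to be 1.
    """
    n = len(ys)
    xbar = sum(xs) / n
    r = ratio(ys, xs)
    s2_e = sum((y - r * x) ** 2 for y, x in zip(ys, xs)) / (n - 1)
    fpc = 1.0 if N is None else 1.0 - n / N
    return fpc * s2_e / (n * xbar ** 2)


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the Lockhart City sample.
    rng = random.Random(1996)
    xs = [rng.uniform(50_000, 200_000) for _ in range(200)]
    ys = [0.0002 * x + rng.gauss(0, 3) for x in xs]
    print("r          =", ratio(ys, xs))
    print("random grp =", random_group_variance(ys, xs, k=10, seed=7))
    print("jackknife  =", jackknife_variance(ys, xs))
    print("taylor     =", taylor_variance(ys, xs))
```

A quick sanity check: if every x equals 1, then r reduces to ybar, and both the jackknife and the Taylor estimate (without the fpc) reduce to s_y^2/n. The random group estimate depends on the particular random split, so it will change from seed to seed; recording the seed is one way to document how the sample was divided.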