Design effects in the analysis of longitudinal survey data
The design effect measures the inflation of the sampling variance of an estimator as a result of the use of a complex sampling scheme. It is usually measured relative to the variance of the estimator under simple random sampling. Many social survey designs employ multi-stage sampling, which clusters the sample and tends to produce design effects greater than unity. There is some empirical evidence that design effects from clustering tend to decrease as the analysis becomes more complex. For example, design effects for regression coefficients are often found to be less than design effects for the mean of the dependent variable in the regression. Evidence of design effects close to unity for such analyses may be used by some analysts of survey data to justify ignoring the sampling design in complex analyses. In this paper we present some evidence of the opposite tendency: design effects that are higher for complex longitudinal analyses than for corresponding cross-sectional analyses. Our empirical evidence is based upon data from the British Household Panel Study. This survey follows a sample of individuals longitudinally; the sample was selected in 1991 by two-stage sampling, with clustering by area. Data are collected in annual waves. Our analyses are based upon a subsample of women aged 16-39. The dependent variable is a gender role attitude score, derived from responses to six five-point questions, and treated as a continuous variable. Covariates include age group, economic activity and educational qualifications. Longitudinal regression models include random effects for women. Data are analysed for five waves of the survey in which the gender role attitude questions were asked. The design effects for the regression coefficients are found to increase as more waves are included in the analysis. A similar tendency is observed for estimates of the time-averaged mean of the dependent variable. A possible theoretical explanation is provided.
The implication of these findings is that standard errors in analyses of longitudinal survey data may be very misleading if the initial sample was clustered and if this clustering is ignored in the analysis.
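As an illustration of the definition given at the start of this abstract, the sketch below computes a design effect as the ratio of the variance under the actual design to the variance under simple random sampling, and shows how a standard error that ignores the design would be rescaled. It is not the estimation method used in the paper. The function names and the input values are hypothetical, and `kish_deff` uses the textbook Kish approximation for a sample mean under equal-sized clusters, which is an assumption introduced here rather than a result from the study.

```python
import math


def design_effect(var_complex: float, var_srs: float) -> float:
    """Ratio of the estimator's variance under the actual (complex) design
    to its variance under simple random sampling of the same size."""
    if var_srs <= 0:
        raise ValueError("var_srs must be positive")
    return var_complex / var_srs


def kish_deff(cluster_size: float, icc: float) -> float:
    """Kish approximation 1 + (m - 1) * rho for the design effect of a mean,
    assuming equal-sized clusters of size m and intracluster correlation rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_se(naive_se: float, deff: float) -> float:
    """Rescale a standard error computed as if the sample were a simple
    random sample: the design-based SE is naive_se * sqrt(deff)."""
    return naive_se * math.sqrt(deff)


# Hypothetical values, for illustration only (not taken from the survey).
deff = kish_deff(cluster_size=20, icc=0.05)
se = adjusted_se(naive_se=0.10, deff=deff)
```

With these hypothetical inputs, a modest intracluster correlation already yields a design effect well above unity, which is why a standard error computed as if the sample were unclustered can understate the true uncertainty.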