Analysing linked employer-employee data with Stata
The use of datasets that contain information on both workers and the firms they work for is growing rapidly, especially in applied econometrics and labour economics. Similar data structures also arise in the analysis of patients and doctors, or of students and schools. Many of these datasets are extremely large, and some cover a substantial fraction of the population of firms and workers. Analysing this kind of data poses two related problems. The first is one of computing power, memory and storage. The second is the statistical problem of how to control for and estimate the "unobserved effects" (also known as "fixed effects") for both workers and firms. In this presentation we explain the basic issues and how we have dealt with them using Stata. We illustrate using both simulated data and a large linked employer-employee panel collected by the Institut für Arbeitsmarkt- und Berufsforschung in Germany. We show how to implement several candidate methods, and point out problems and limitations that the analyst using Stata may encounter.
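As a sketch of the statistical problem, one common way to write a model with unobserved effects for both workers and firms is the two-way error-components form below. The notation is an assumption made for illustration and is not taken from the presentation: $i$ indexes workers, $t$ time periods, $J(i,t)$ denotes the firm employing worker $i$ in period $t$, $\theta_i$ is the worker effect and $\psi_j$ the firm effect.

```latex
% Illustrative notation (assumed, not the authors'):
%   y_{it}      outcome (e.g. log wage) of worker i in period t
%   x_{it}      row vector of observed time-varying covariates
%   \theta_i    unobserved worker effect
%   \psi_j      unobserved firm effect, with j = J(i,t)
\begin{equation}
  y_{it} = x_{it}\beta + \theta_i + \psi_{J(i,t)} + \varepsilon_{it},
  \qquad i = 1,\dots,N, \quad t = 1,\dots,T_i .
\end{equation}
```

Written this way, both problems are visible. With $N$ workers and $J$ firms, including a full set of dummy variables adds roughly $N + J$ columns to the regressor matrix, which is where the demands on memory and computing power come from. In addition, the firm effects $\psi_j$ can be separated from the worker effects $\theta_i$ only when some workers are observed in more than one firm.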