Using Stata to Manage and Create a Research Data Bank
We manage a longitudinal research data bank that contains 3,000 variables and grows by 25,000 observations per year. Data are batch converted from SQL to Stata daily, producing 20 preliminary data sets. We then use Stata to run quality control on the data and to prepare a single research data set, which the data analyst can augment as required by calling specialized programs that access the additional data sets. Our philosophy is that most of the quality control, programming, and data set preparation should be built into the data set creation process rather than left to the data user. For example, data quality checks and the complex preparation of items such as costs and hospital and mortality codes are programmed into the data set creation process, and the relevant additional data sets are created automatically to reflect such new data. The basic data set consists of the research and control variables needed for most analyses. Simple programming statements such as -getwork- and -getcosts- merge preprocessed work and cost data, for example, with the basic set. Global macros identify file locations, database versions, and variable sets, which makes updating and sharing simple.
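The add-on pattern described above can be sketched outside Stata. The following is a minimal Python illustration only, not the authors' programs: `SETTINGS`, `dataset_path`, `merge_add_on`, the key fields `id` and `wave`, and all paths and values are hypothetical. A single settings dictionary stands in for the global macros, and a small helper plays the role of a command such as -getcosts- by joining preprocessed rows onto the basic set.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence

# Hypothetical stand-in for the global macros: one place that records
# where the data sets live and which database version is current.
SETTINGS = {"data_dir": "/data/bank", "version": "v1"}

Row = Dict[str, object]


def dataset_path(name: str) -> PurePosixPath:
    """Build the path of a preprocessed data set from the shared settings."""
    return PurePosixPath(SETTINGS["data_dir"]) / f"{name}_{SETTINGS['version']}.dta"


def merge_add_on(
    base: List[Row], extra: List[Row], keys: Sequence[str] = ("id", "wave")
) -> List[Row]:
    """Left-join add-on rows onto the basic set using the given key fields.

    Every base row is kept; a base row with no match is returned unchanged.
    """
    lookup = {tuple(row[k] for k in keys): row for row in extra}
    merged = []
    for row in base:
        combined = dict(row)  # copy so the basic set is not modified
        combined.update(lookup.get(tuple(row[k] for k in keys), {}))
        merged.append(combined)
    return merged


# Illustrative data only: a two-row basic set and one preprocessed cost row.
base = [
    {"id": 1, "wave": 1, "age": 60},
    {"id": 2, "wave": 1, "age": 45},
]
costs = [{"id": 1, "wave": 1, "total_cost": 1200.0}]

merged = merge_add_on(base, costs)
```

Keeping locations and versions in one shared structure means a new database version requires changing a single value, which is the benefit the abstract attributes to global macros.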
Year of publication: | 2003-01-08 |
---|---|
Authors: | Wolfe, Frederick ; Michaud, Kaleb |
Institutions: | Stata User Group |
Similar items by person
- Controlling for time-dependent confounding using marginal structural models
  Fewell, Zoe, (2004)
- Charlett, André, (2005)
- How to face lists with fortitude
  Cox, Nicholas J., (2002)