Data Editing and Imputation in Business Surveys Using “R”
Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison of two statistical software packages, R and SPSS, in order to take full advantage of existing automated methods for data editing and imputation in business surveys (with properly designed consistency rules) as a partial alternative to manual editing of data. Approach – Different methods for editing survey data are compared: in R, the ‘editrules’ and ‘survey’ packages, chosen because they contain transformations commonly used in official statistics; visualization of missing-value patterns using the "Amelia" and "VIM" packages; and imputation approaches for longitudinal data using "VIMGUI". The performance of another statistical software package, SPSS, is compared on the same features. Findings – Data on business statistics received by National Institutes of Statistics (NISs) are not ready for direct analysis because of in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages make it possible to identify the erroneous fields in edit-violating records and to verify the results after imputation of missing values, giving users a flexible, less time-consuming approach that is easier to automate in R than with SPSS macro syntax in situations where macros are very handy.
Year of publication: | 2014 |
---|---|
Authors: | Romascanu, Elena |
Published in: | Romanian Statistical Review. - Institutul National de Statistica şi Studii Economice (INSSE). - Vol. 62.2014, 2, p. 129-146 |
Publisher: | Institutul National de Statistica şi Studii Economice (INSSE) |
Subject: | Automated Edit Rules | Business Surveys | Missing Values | Multiple Imputation | Non-Response Weights | Pattern of Missing | Random vs. Systematic Errors | SPSS | SQL | Statistical software R |
freely available
Similar items by subject
- A Bayesian approach to parameter estimation in the presence of spatial missing data. Panzera, Domenica (2016)
- Evaluating the impact of a new product on the sales of other products. Vasilev, Julian Andreev (2014)
- Should a Normal Imputation Model be Modified to Impute Skewed Variables? Hippel, Paul T. von (2013)