Semiparametric regression in Stata
Semiparametric regression introduces very general nonlinear functional forms into regression analysis. This class of models is generally used to fit a parametric model in which the functional form of a subset of the explanatory variables is not known and/or in which the distribution of the error term cannot be assumed beforehand to be of a specific type. To fix ideas, consider the partial linear model y = zb + f(x) + e, in which the shape of the potentially nonlinear function of predictor x is of particular interest. Two approaches to modeling f(x) are to use splines or fractional polynomials. This talk reviews other, more general approaches and the commands available in Stata to fit such models.
The main topic of the talk will be partial linear regression models, with some brief discussion of so-called single index and generalized additive models. Though several semiparametric regression methods have been proposed and developed in the literature, these are probably the most popular ones. The general idea of partial linear regression models is that a dependent variable is regressed on i) a set of explanatory variables entering the model linearly and ii) a set of variables entering the model nonlinearly but without assuming any specific functional form. Several estimators have been proposed in the literature and are available in Stata. For example, the semipar command implements the double residuals estimator introduced by Robinson (1988), which is consistent and efficient. Similarly, the plreg command fits an alternative difference-based estimator proposed by Yatchew (1998) that has statistical properties similar to those of Robinson’s estimator. The two estimators will be briefly compared to identify the drawbacks and pitfalls of each.
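To make the two estimators concrete, here is a minimal Python sketch of the ideas behind them, applied to simulated data from y = zb + f(x) + e. It is not the code of the semipar or plreg commands. The function names, the Gaussian kernel, the fixed bandwidth h, the use of first-order differencing only, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def kernel_smooth(x, v, h):
    """Nadaraya-Watson estimate of E[v | x] at each sample point (Gaussian kernel)."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    w /= w.sum(axis=1, keepdims=True)   # each row of weights sums to one
    return w @ v

def double_residual(y, z, x, h):
    """Robinson-style estimate of b: OLS of y - E[y|x] on z - E[z|x]."""
    z = z.reshape(len(y), -1)
    ry = y - kernel_smooth(x, y, h)
    rz = z - kernel_smooth(x, z, h)
    b, *_ = np.linalg.lstsq(rz, ry, rcond=None)
    return b

def first_difference(y, z, x):
    """Yatchew-style estimate of b: sort by x, difference, then OLS.

    After sorting, neighbouring values of x are close, so f(x) nearly
    cancels in the differences. Only first-order differencing is shown.
    """
    z = z.reshape(len(y), -1)
    order = np.argsort(x)
    dy = np.diff(y[order])
    dz = np.diff(z[order], axis=0)
    b, *_ = np.linalg.lstsq(dz, dy, rcond=None)
    return b

# Simulated data (hypothetical): true b = 2, f(x) = sin(2*pi*x).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, n)
z = x + rng.normal(size=n)              # z is correlated with x
y = 2.0 * z + np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=n)

b_robinson = double_residual(y, z, x, h=0.05)
b_yatchew = first_difference(y, z, x)

# Naive OLS of y on z and a constant, ignoring f(x), for comparison.
design = np.column_stack([np.ones(n), z])
b_ols = np.linalg.lstsq(design, y, rcond=None)[0][1]

print("double residual:", b_robinson[0])
print("first difference:", b_yatchew[0])
print("naive OLS:", b_ols)
```

In this setup both semiparametric estimates should land close to the true coefficient, whereas the naive regression is biased because z is correlated with the omitted f(x). The differencing estimator needs no bandwidth but discards some information; the double residuals estimator depends on the choice of h.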
A natural concern of researchers is how these estimators could be modified to deal with heteroskedasticity, serial correlation, and endogeneity in cross-sectional data or how they could be adapted in the context of panel data to control for unobserved heterogeneity. As a consequence, a substantial part of the talk will be devoted to explaining i) how the plreg and semipar commands can be used to tackle these very common violations of the Gauss–Markov assumptions in cross-sectional data and ii) how the user-written xtsemipar command makes a semiparametric regression easy to fit in the context of panel data. Because it is sometimes possible to move toward purely parametric models, a test proposed by Härdle and Mammen (1993), built to check whether the nonparametric fit can be satisfactorily approximated by a parametric polynomial adjustment of order p, will also be described.
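As an illustration of the heteroskedasticity point, one common device is to keep the double residuals point estimate and replace the classical standard error with a White-type (HC0) sandwich standard error in the second-stage regression. The Python sketch below shows that idea on simulated data. It is an assumption about one possible approach, not a description of what plreg or semipar compute; the helper name, kernel, bandwidth, and data are hypothetical, and the first-stage smoothing error is ignored.

```python
import numpy as np

def nw_fit(x, v, h):
    """Nadaraya-Watson estimate of E[v | x] at the sample points (Gaussian kernel)."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return (w @ v) / w.sum(axis=1)

# Simulated data (hypothetical): true b = 1, error variance grows with |z|.
rng = np.random.default_rng(1)
n = 1500
x = rng.uniform(0.0, 1.0, n)
z = x + rng.normal(size=n)
e = rng.normal(size=n) * (0.2 + np.abs(z))
y = 1.0 * z + np.cos(2 * np.pi * x) + e

# Double residuals step: partial x out of both y and z.
h = 0.05
ry = y - nw_fit(x, y, h)
rz = z - nw_fit(x, z, h)

# Second-stage OLS without intercept (single regressor, so scalar algebra).
b = (rz @ ry) / (rz @ rz)
resid = ry - b * rz

# Classical standard error assumes a constant error variance.
se_classical = np.sqrt((resid @ resid) / (n - 1) / (rz @ rz))
# HC0 sandwich standard error allows the variance to differ across observations.
se_robust = np.sqrt(np.sum(rz**2 * resid**2)) / (rz @ rz)

print("b:", b)
print("classical se:", se_classical)
print("robust se:", se_robust)
```

Because the error variance here rises with |z|, the classical formula understates the sampling variability of b, and the sandwich standard error comes out larger.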
Year of publication: | 2013-09-16 |
---|---|
Authors: | Verardi, Vincenzo |
Institutions: | Stata User Group |
Similar items by person
- Verardi, Vincenzo, (2012)
- Semiparametric regression in Stata. Verardi, Vincenzo, (2014)
- Robust principal component analysis in Stata. Verardi, Vincenzo, (2009)