Extreme values and “robust” analysis of distributions
Distributive analysis typically consists of estimating summary measures that capture aspects of the distribution of sample points beyond central tendency. Stochastic dominance analysis is also central to comparisons of distributions. Unfortunately, data contamination, and extreme data more generally, are known to be highly influential in both types of analysis (much more so than in central tendency analysis) and can jeopardize the validity of one’s conclusions even with relatively large samples. This presentation illustrates the problems raised by extreme data in distributive analysis and describes robust parametric and semiparametric approaches for addressing them. The methods are based on “optimal B-robust” estimators (OBRE), used as an alternative to maximum likelihood. A prototype Stata implementation of these estimators is described, and empirical examples in income distribution analysis show how robust inequality estimates and dominance checks can be derived from these parametric or semiparametric models.
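The following Python snippet is a minimal sketch of the sensitivity problem described above. It is not the OBRE method or the Stata prototype. The sample is hypothetical: 1,000 deterministic quantiles of a lognormal distribution with log-mean 10 and log-standard-deviation 0.5. A single contaminated observation is then introduced by taking the top income and recording it with three extra zeros. The function names `theil_index` and `gini` are illustrative helpers written for this example. The snippet compares how the median, the mean, the Theil index and the Gini coefficient respond to that one contaminated value.

```python
import math
import statistics
from statistics import NormalDist


def theil_index(incomes):
    """Theil T index: (1/n) * sum((x/mu) * log(x/mu)); all incomes must be > 0."""
    mu = statistics.fmean(incomes)
    return statistics.fmean([(x / mu) * math.log(x / mu) for x in incomes])


def gini(incomes):
    """Gini coefficient via the sorted-rank formula sum((2i - n - 1) * x_(i)) / (n^2 * mu)."""
    xs = sorted(incomes)
    n = len(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * sum(xs))


# Hypothetical "clean" sample: 1,000 quantiles of a lognormal(10, 0.5^2) distribution.
# Deterministic, so the example is reproducible; the population Theil index is 0.5**2 / 2 = 0.125.
n = 1000
std_normal = NormalDist()
clean = [math.exp(10 + 0.5 * std_normal.inv_cdf((i + 0.5) / n)) for i in range(n)]

# Contaminate a single observation: the top income recorded with three extra zeros.
contaminated = clean[:-1] + [clean[-1] * 1000]

for name, stat in [("median", statistics.median), ("mean", statistics.fmean),
                   ("Theil", theil_index), ("Gini", gini)]:
    before, after = stat(clean), stat(contaminated)
    print(f"{name:>6}: {before:14.4f} -> {after:14.4f}")
```

In this sketch, the median is unchanged by the single bad value. Both inequality measures, by contrast, jump by a large factor, even though only 0.1% of the sample is affected. The mean also moves, which is why robust alternatives to maximum likelihood are of interest for any estimator that depends on the upper tail.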