Consistent selection of the actual model in regression analysis
In regression analysis, a best subset of regressors is usually selected by minimizing Mallows's Cp statistic or an equivalent criterion, such as the Akaike information criterion or cross-validation. The resulting procedure is known to be inconsistent, which can lead to a model with too many variables. For this reason, corrections have been proposed that yield consistent procedures. The aim of this paper is to show that these corrected criteria, although asymptotically consistent, are usually too conservative for finite sample sizes. The paper also proposes a new correction of Mallows's statistic that yields better results. A simulation study shows that the proposed criterion performs well in a variety of situations.
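As a rough illustration of the uncorrected procedure discussed above, the following sketch selects a subset of regressors by minimizing Mallows's statistic in its standard textbook form, RSS_p / s^2 - n + 2p, where s^2 is the residual variance estimate from the full model and p counts the intercept. It does not implement the correction proposed in the paper, which the abstract does not specify. The function names `mallows_cp` and `best_subset_by_cp` are hypothetical, and the exhaustive search is an assumption suited only to a small number of candidate regressors.

```python
import itertools

import numpy as np


def mallows_cp(X, y, subset, sigma2_full):
    """Standard Mallows statistic: RSS_p / sigma2_full - n + 2p.

    p counts the intercept plus the regressors in `subset`.
    (Hypothetical helper, for illustration only.)
    """
    n = len(y)
    # Design matrix: an intercept column plus the chosen regressors.
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    p = design.shape[1]
    return rss / sigma2_full - n + 2 * p


def best_subset_by_cp(X, y):
    """Exhaustive search over all subsets of the columns of X.

    Returns the subset with the smallest statistic and all scores.
    """
    n, k = X.shape
    # Estimate the error variance from the full model (all k regressors).
    full = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(full, y, rcond=None)
    rss_full = float(np.sum((y - full @ beta) ** 2))
    sigma2_full = rss_full / (n - k - 1)

    scores = {}
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            scores[subset] = mallows_cp(X, y, subset, sigma2_full)
    best = min(scores, key=scores.get)
    return best, scores
```

By construction, the full model always scores exactly its own parameter count (k + 1 here), so the criterion only discriminates among the proper subsets. Because the penalty 2p does not grow with the sample size, superfluous regressors keep a positive probability of being selected even in large samples, which is the inconsistency the abstract refers to.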