Data-based interval estimation of classification error rates
The leave-one-out and 632 bootstrap methods are popular data-based estimators of the true error rate of a classification rule, but practical applications almost exclusively quote only point estimates. Interval estimation would provide a better assessment of the future performance of the rule, but little has been published on this topic. We first review general-purpose jackknife and bootstrap methodology that can be used in conjunction with leave-one-out estimates to provide prediction intervals for true error rates of classification rules. Monte Carlo simulation is then used to investigate coverage rates of the resulting intervals for normal data, but the results are disappointing: standard intervals show considerable overinclusion; intervals based on Edgeworth approximations or random weighting do not perform well; and while a bootstrap approach provides intervals with coverage rates closer to the nominal ones, there is still marked underinclusion. We then turn to intervals constructed from 632 bootstrap estimates and show that much better results are obtained. Although there is now some overinclusion, particularly for large training samples, the actual coverage rates are sufficiently close to the nominal rates for the method to be recommended. An application to real data illustrates the considerable variability that can arise in practical estimation of error rates.
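As a concrete illustration of the point estimators the abstract refers to, the following minimal Python sketch computes a leave-one-out error rate and a basic .632 bootstrap error estimate (0.368 x apparent error + 0.632 x out-of-bag bootstrap error). The choice of scikit-learn's LinearDiscriminantAnalysis as the classification rule, the synthetic two-class normal data, and the number of bootstrap replicates are illustrative assumptions, not taken from the paper, and the interval constructions the paper studies are not reproduced here.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def loo_error(X, y):
    # Leave-one-out estimate of the true error rate:
    # refit the rule n times, each time predicting the held-out case.
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        clf = LinearDiscriminantAnalysis().fit(X[mask], y[mask])
        errors += clf.predict(X[i:i + 1])[0] != y[i]
    return errors / n

def err_632(X, y, n_boot=200):
    # .632 bootstrap estimate: weighted combination of the apparent
    # (resubstitution) error and the out-of-bag bootstrap error.
    n = len(y)
    apparent = np.mean(LinearDiscriminantAnalysis().fit(X, y).predict(X) != y)
    oob_errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # bootstrap resample of the training data
        oob = np.setdiff1d(np.arange(n), idx)  # cases not drawn into the resample
        if len(oob) == 0:
            continue
        clf = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
        oob_errs.append(np.mean(clf.predict(X[oob]) != y[oob]))
    return 0.368 * apparent + 0.632 * np.mean(oob_errs)

# Synthetic two-class normal training data (illustrative only).
n_per_class = 30
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 2)),
               rng.normal(1.0, 1.0, (n_per_class, 2))])
y = np.repeat([0, 1], n_per_class)

print("leave-one-out error:", loo_error(X, y))
print(".632 bootstrap error:", err_632(X, y))

Repeating such a calculation over resampled training sets is one hedged way to see the estimator variability that the paper's prediction intervals are meant to quantify.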
Year of publication: 2001
Authors: Krzanowski, W. J.
Published in: Journal of Applied Statistics. - Taylor & Francis Journals, ISSN 0266-4763. - Vol. 28.2001, 5, p. 585-595
Publisher: Taylor & Francis Journals
Similar items by person
- Kim, Sung-Soo, (2008)
- Sensitivity in Metric Scaling and Analysis of Distance - Krzanowski, W. J., (2006)
- Biplots for Multifactorial Analysis of Distance - Krzanowski, W. J., (2004)
- More ...