Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap
We consider estimating the accuracy of a classifier constructed on a given training sample. The naive resubstitution estimate is known to be biased downward. The traditional approach to correcting this bias is cross-validation; the bootstrap is an alternative that also reduces the high variability of cross-validation. A direct comparison of the two estimators is not fair, however, because the bootstrap requires much heavier computation. We performed an empirical study comparing the .632+ bootstrap estimator with the repeated 10-fold cross-validation and the repeated one-third holdout estimator, with all estimators set to require about the same amount of computation. In the simulation study, the repeated 10-fold cross-validation estimator performed better than the .632+ bootstrap estimator when the classifier is highly adaptive to the training sample. We also found that the .632+ bootstrap estimator suffers from a bias problem for large samples as well as for small samples.
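As a rough illustration of the comparison described in the abstract, the sketch below (not the paper's code) pits repeated 10-fold cross-validation against the .632+ bootstrap on a matched computational budget: 5 repetitions of 10-fold CV and 50 bootstrap resamples both cost about 50 model fits. The dataset, classifier, and budget are illustrative assumptions, and the .632+ formula follows the standard Efron-Tibshirani definition rather than anything specific to this paper.

```python
# Minimal sketch: repeated 10-fold CV vs. .632+ bootstrap error estimates
# on the same computational budget (~50 model fits each). Illustrative
# assumptions: synthetic data, a decision tree as the "adaptive" classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=100, n_features=10, random_state=0)
n = len(y)

def fit_error(clf, Xtr, ytr, Xte, yte):
    clf.fit(Xtr, ytr)
    return np.mean(clf.predict(Xte) != yte)

# --- Repeated 10-fold cross-validation: 5 repetitions = 50 fits ---
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
cv_err = np.mean([
    fit_error(DecisionTreeClassifier(random_state=0),
              X[tr], y[tr], X[te], y[te])
    for tr, te in cv.split(X, y)
])

# --- .632+ bootstrap: 50 resamples, so roughly the same cost ---
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
err_bar = np.mean(clf.predict(X) != y)            # resubstitution error

oob_errs, oob_cnts = np.zeros(n), np.zeros(n)
for _ in range(50):
    idx = rng.integers(0, n, n)                   # bootstrap resample
    oob = np.setdiff1d(np.arange(n), idx)         # out-of-bag points
    if oob.size == 0 or len(np.unique(y[idx])) < 2:
        continue                                  # skip degenerate resamples
    b = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    oob_errs[oob] += (b.predict(X[oob]) != y[oob])
    oob_cnts[oob] += 1
mask = oob_cnts > 0
err1 = np.mean(oob_errs[mask] / oob_cnts[mask])   # leave-one-out bootstrap error

# no-information error rate gamma and relative overfitting rate R
p = np.mean(y == 1)                               # observed class-1 proportion
q = np.mean(clf.predict(X) == 1)                  # predicted class-1 proportion
gamma = p * (1 - q) + (1 - p) * q
err1c = min(err1, gamma)
R = (err1c - err_bar) / (gamma - err_bar) if gamma > err_bar else 0.0
R = float(np.clip(R, 0.0, 1.0))
w = 0.632 / (1 - 0.368 * R)                       # weight grows with overfitting
err_632p = (1 - w) * err_bar + w * err1c

print(f"repeated 10-fold CV: {cv_err:.3f}")
print(f".632+ bootstrap:     {err_632p:.3f}")
```

With a highly adaptive classifier such as an unpruned tree, the resubstitution error is near zero, so the .632+ weight w is pushed toward 1 and the estimate leans on the out-of-bag error, which is the regime the paper's simulations probe.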
Year of publication: 2009
Authors: Kim, Ji-Hyun
Published in: Computational Statistics & Data Analysis. - Elsevier, ISSN 0167-9473. - Vol. 53 (2009), No. 11, pp. 3735-3745
Publisher: Elsevier
Similar items by person
- Conditional bootstrap methods for censored data (Kim, Ji-Hyun)
- Financial productivity issues of offshore and “Made-in-USA” through reshoring (Yu, Ui-Jeen, 2018)
- Distributed knowledge in an online patient support community: Authority and discovery (Kazmer, Michelle M., 2014)