A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers
This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) p scales with the sample size n, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting, when the algorithm interpolates the training data and maximizes the empirical L1-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in [15, 81] surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Furthermore, we discover that the larger the overparametrization ratio p/n, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general Lq geometry, for q > 1, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms
Year of publication: |
2020
|
---|---|
Authors: | Liang, Tengyuan ; Sur, Pragya |
Publisher: |
[S.l.] : SSRN |
Saved in:
freely available
Saved in favorites
Similar items by person
-
Modeling bimodal discrete data using Conway-Maxwell-Poisson mixture models
Sur, Pragya, (2015)
-
Deep learning for individual heterogeneity: An automatic inference framework
Farrell, Max H., (2021)
-
Statistical inference for the population landscape via moment‐adjusted stochastic gradients
Liang, Tengyuan, (2019)
- More ...