Co-Training and Expansion: Towards Bridging Theory and Practice
Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker “expansion” assumption on the underlying data distribution, which we prove is sufficient for iterative co-training to succeed given appropriately strong PAC-learning algorithms on each feature set, and which to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect of noise in the data on performance. Predicted behavior is qualitatively matched in synthetic experiments on expander graphs.
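For readers unfamiliar with the iterative algorithm the abstract refers to, below is a minimal sketch of a co-training loop in the style of Blum and Mitchell's original algorithm, which this paper analyzes. It assumes scikit-learn and NumPy are available; the classifier choice (logistic regression), the confidence threshold, and the round count are illustrative assumptions, not taken from the paper.

```python
# A hedged sketch of iterative co-training over two feature views.
# Inputs are assumed to be NumPy arrays: (X1_lab, X2_lab) are the two views
# of the labeled examples, y_lab their labels, and (X1_unlab, X2_unlab) the
# two views of the unlabeled pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
             n_rounds=10, conf_threshold=0.95):
    """Grow the labeled set iteratively: each round, the two view-specific
    classifiers pseudo-label the unlabeled examples they are most confident
    about, and those examples join the training pool."""
    X1, X2 = X1_lab.copy(), X2_lab.copy()
    y = np.asarray(y_lab).copy()
    U1, U2 = X1_unlab.copy(), X2_unlab.copy()
    h1, h2 = LogisticRegression(), LogisticRegression()
    for _ in range(n_rounds):
        if len(U1) == 0:
            break
        h1.fit(X1, y)  # classifier trained on view 1 only
        h2.fit(X2, y)  # classifier trained on view 2 only
        p1, p2 = h1.predict_proba(U1), h2.predict_proba(U2)
        # An example is picked if either view is confident about it; this
        # cross-view transfer of confidence is what the paper's expansion
        # assumption is about.
        conf1, conf2 = p1.max(axis=1), p2.max(axis=1)
        pick = np.maximum(conf1, conf2) >= conf_threshold
        if not pick.any():
            break
        # Pseudo-label each picked example using the more confident view.
        lab1 = h1.classes_[p1.argmax(axis=1)]
        lab2 = h2.classes_[p2.argmax(axis=1)]
        labels = np.where(conf1 >= conf2, lab1, lab2)[pick]
        X1 = np.vstack([X1, U1[pick]])
        X2 = np.vstack([X2, U2[pick]])
        y = np.concatenate([y, labels])
        U1, U2 = U1[~pick], U2[~pick]
    return h1, h2
```

The loop only makes progress if examples confidently labeled via one view help the other view's classifier label new examples; the paper's contribution is showing that an expansion property of the underlying distribution, far weaker than conditional independence of the views, suffices for this region of confidence to keep growing.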
Year of publication: 2004-12-01
Authors: Balcan, Maria-Florina; Blum, Avrim; Yang, Ke
Publisher: Research Showcase
Similar items by person
- A theory of loss-leaders : making money by pricing below cost / Balcan, Maria-Florina, (2007)
- Ignorance is almost bliss : near-optimal stochastic matching with few queries / Blum, Avrim, (2020)
- Shao, Wei, (2021)