Co-Training and Expansion: Towards Bridging Theory and Practice
Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker “expansion” assumption on the underlying data distribution, which we prove is sufficient for iterative co-training to succeed given appropriately strong PAC-learning algorithms on each feature set, and which to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect of noise in the data on performance. Predicted behavior is qualitatively matched in synthetic experiments on expander graphs.
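For readers unfamiliar with the iterative algorithm the abstract refers to, below is a minimal sketch of a co-training loop in the style of Blum and Mitchell's original algorithm, which this paper analyzes. It assumes scikit-learn and NumPy are available; the classifier choice (logistic regression), the confidence threshold, and the round count are illustrative assumptions, not taken from the paper.

```python
# A hedged sketch of iterative co-training over two feature views.
# Inputs are assumed to be NumPy arrays: (X1_lab, X2_lab) are the two views
# of the labeled examples, y_lab their labels, and (X1_unlab, X2_unlab) the
# two views of the unlabeled pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
             n_rounds=10, conf_threshold=0.95):
    """Grow the labeled set iteratively: each round, the two view-specific
    classifiers pseudo-label the unlabeled examples they are most confident
    about, and those examples join the training pool."""
    X1, X2 = X1_lab.copy(), X2_lab.copy()
    y = np.asarray(y_lab).copy()
    U1, U2 = X1_unlab.copy(), X2_unlab.copy()
    h1, h2 = LogisticRegression(), LogisticRegression()
    for _ in range(n_rounds):
        if len(U1) == 0:
            break
        h1.fit(X1, y)  # classifier trained on view 1 only
        h2.fit(X2, y)  # classifier trained on view 2 only
        p1, p2 = h1.predict_proba(U1), h2.predict_proba(U2)
        # An example is picked if either view is confident about it; this
        # cross-view transfer of confidence is what the paper's expansion
        # assumption is about.
        conf1, conf2 = p1.max(axis=1), p2.max(axis=1)
        pick = np.maximum(conf1, conf2) >= conf_threshold
        if not pick.any():
            break
        # Pseudo-label each picked example using the more confident view.
        lab1 = h1.classes_[p1.argmax(axis=1)]
        lab2 = h2.classes_[p2.argmax(axis=1)]
        labels = np.where(conf1 >= conf2, lab1, lab2)[pick]
        X1 = np.vstack([X1, U1[pick]])
        X2 = np.vstack([X2, U2[pick]])
        y = np.concatenate([y, labels])
        U1, U2 = U1[~pick], U2[~pick]
    return h1, h2
```

The loop only makes progress if examples confidently labeled via one view help the other view's classifier label new examples; the paper's contribution is showing that an expansion property of the underlying distribution, far weaker than conditional independence of the views, suffices for this region of confidence to keep growing.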
Year of publication: 2004-12-01
Authors: Balcan, Maria-Florina; Blum, Avrim; Yang, Ke
Publisher: Research Showcase
Similar items by person
- A theory of loss-leaders : making money by pricing below cost / Balcan, Maria-Florina, (2007)
- Ignorance is almost bliss : near-optimal stochastic matching with few queries / Blum, Avrim, (2020)
- Shao, Wei, (2021)