In the canonical learning model, the multi-armed bandit with independent arms, a decision maker learns about the different alternatives only through his private experience. It is well known that any optimal experimentation strategy for this problem is ex-post inefficient: it sometimes results in the superior alternative being dropped altogether. Many situations of interest, however, involve learning both from individual experience and from the experience of others. This paper shows how learning in society can overcome this inefficiency. We consider an economy populated by a continuum of infinitely lived agents, each of whom faces a multi-armed bandit. The unknown payoff distribution of each arm is the same for all agents. In each period, agents are randomly and anonymously matched in pairs, and in each match they observe their partner's current action choice and its outcome. We establish that if initial beliefs are sufficiently heterogeneous, then the fraction of the population choosing the superior arm converges to one in any perfect Bayesian equilibrium of this game. We also show that the same conclusion holds when only action choices are observable within a match and the number of arms is two.