Researchers and practitioners are increasingly gaining access to data on explicit social networks. For example, telecommunications and technology firms record data on consumer networks (via phone calls, emails, voice-over-IP, instant messaging), and social-network portal sites such as MySpace, Friendster and Facebook record consumer-generated data on social networks. Inference for fraud detection [5, 3, 8], marketing [9], and other tasks can be improved with learned models that take social networks into account and with collective inference [12], which allows inferences about nodes in the network to affect each other. However, these social-network graphs can be huge, comprising millions to billions of nodes and one or two orders of magnitude more links.

This paper studies the application of collective inference to improve prediction over a massive graph. Faced initially with a social network comprising hundreds of millions of nodes and a few billion edges, our goal is to produce an approximate consumer network that is orders of magnitude smaller, but still facilitates improved performance via collective inference.

We introduce a sampling technique designed to reduce the size of the network by many orders of magnitude, but to keep linkages that facilitate improved prediction via collective inference. In short, the sampling scheme operates as follows: (1) choose a set of nodes of interest; (2) then, in analogy to snowball sampling [14], grow local graphs around these nodes, adding their social networks, their neighbors’ social networks, and so on; (3) next, prune these local graphs of edges that are expected to contribute little to the collective inference; (4) finally, connect the local graphs together to form a graph with (hopefully) useful inference connectivity. We apply this sampling method to assess whether collective inference can improve learned targeted-marketing models for a social network of consumers of telecommunication services.
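The four sampling steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names (`grow_local_graph`, `prune_edges`, `sample_network`) and the parameters `max_hops` and `min_weight` are hypothetical, the pruning rule (drop edges below a weight threshold) is an assumed stand-in for "edges expected to contribute little", and the local graphs are connected simply by taking their union, so that they join wherever they share nodes or edges.

```python
from collections import deque


def grow_local_graph(adj, seed, max_hops):
    """Step 2: snowball-style growth around one seed node.

    `adj` maps node -> {neighbor: edge_weight}. Returns the edges reached
    within `max_hops` hops as {frozenset({u, v}): weight}.
    """
    depth = {seed: 0}
    queue = deque([seed])
    edges = {}
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr, weight in adj.get(node, {}).items():
            edges[frozenset((node, nbr))] = weight
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return edges


def prune_edges(edges, min_weight):
    """Step 3 (assumed rule): keep only edges with weight >= min_weight."""
    return {e: w for e, w in edges.items() if w >= min_weight}


def sample_network(adj, seeds, max_hops=2, min_weight=2.0):
    """Steps 1-4: seeds in, pruned and merged approximate network out."""
    merged = {}
    for seed in seeds:  # step 1: the nodes of interest are given as `seeds`
        local = grow_local_graph(adj, seed, max_hops)
        # Step 4 (assumed rule): the union of local graphs links them
        # together wherever they overlap.
        merged.update(prune_edges(local, min_weight))
    return merged


# Toy weighted call graph (hypothetical data).
calls = {
    "a": {"b": 5, "c": 1},
    "b": {"a": 5, "d": 3},
    "c": {"a": 1},
    "d": {"b": 3, "e": 4},
    "e": {"d": 4, "f": 1},
    "f": {"e": 1},
}
sampled = sample_network(calls, seeds=["a", "e"], max_hops=2, min_weight=2)
```

In this toy example the two seeds "a" and "e" end up in one connected sampled graph because their two-hop neighborhoods share the edge between "b" and "d", while the weak edges to "c" and "f" are pruned. With `max_hops=1` the two local graphs would stay disconnected, which suggests why the growth radius matters for inference connectivity.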
Prior work [9] has shown improvement to the learning of targeting models by including social-neighborhood information, in particular, information on existing customers in the immediate social network of a potential target. However, the improvement was restricted to the “network neighbors”, those targets linked to a prior customer and thought to be good candidates for the new service. Collective inference techniques may extend the predictive influence of existing customers beyond their immediate neighborhoods. For the present work, our motivating conjecture has been that this influence can improve prediction for consumers who are not strongly connected to existing customers. Our results show that this is indeed the case: collective inference on the approximate network enables significantly improved predictive performance for non-network-neighbor consumers, and for consumers who have few links to existing customers. In the rest of this extended abstract we motivate our approach, describe our sampling method, present results on applying our approach to a large real-world target marketing campaign in the telecommunications industry, and finally discuss our findings.