ACORA: Distribution-Based Aggregation for Relational Learning from Identifier Attributes
Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method, part of the relational learning system ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach by deriving a "relational fixed-effect" model within a Bayesian framework, and we discuss the implications of identifier aggregation for the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and we also explore the limits of the methods' applicability.
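To make the idea of combining vector distance with class-conditional distributions concrete, the following is a minimal sketch, not ACORA's actual implementation. It assumes that each bag of identifiers is represented as a count vector, that one reference vector per class is obtained by summing the count vectors of the training bags with that label, and that cosine similarity is the chosen vector distance. The helper names (`class_reference_vectors`, `aggregate`) and the toy data are hypothetical.

```python
from collections import Counter
import math

def class_reference_vectors(bags, labels):
    """Hypothetical helper: sum identifier counts of training bags, separately per class."""
    refs = {}
    for bag, y in zip(bags, labels):
        refs.setdefault(y, Counter()).update(bag)
    return refs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def aggregate(bag, refs):
    """Map one bag of identifiers to one numeric feature per class label:
    its cosine similarity to that class's reference vector (an assumed choice of distance)."""
    v = Counter(bag)
    return {y: cosine(v, ref) for y, ref in sorted(refs.items())}

# Toy example: each training case is linked to a bag of related object identifiers.
train_bags = [["p1", "p2"], ["p2", "p3"], ["p4"], ["p4", "p5"]]
labels = [1, 1, 0, 0]
refs = class_reference_vectors(train_bags, labels)
feats = aggregate(["p2"], refs)
print(feats)
```

The resulting per-class similarities are ordinary numeric features, so a conventional propositional learner can consume them even though the raw identifier attribute has far too many values to use directly. Other vector distances could be substituted for cosine under the same scheme.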