Association pattern discovery via theme dictionary models
Summary
Discovering patterns from a set of text or, more generally, categorical data is an important problem in many disciplines such as biomedical research, linguistics, artificial intelligence and sociology. We consider here the well-known ‘market basket’ problem that is often discussed in the data mining community, and is also quite ubiquitous in biomedical research. The data under consideration are a set of ‘baskets’, where each basket contains a list of ‘items’. Our goal is to discover ‘themes’, which are defined as subsets of items that tend to co-occur in a basket. We describe a generative model, i.e. the theme dictionary model, for such data structures and describe two likelihood-based methods to infer themes that are hidden in a collection of baskets. We also propose a novel sequential Monte Carlo method to overcome computational challenges. Using both simulation studies and real applications, we demonstrate that the new approach proposed is significantly more powerful than existing methods, such as association rule mining and topic modelling, in detecting weak and subtle interactions in the data.
Year of publication: | 2014 |
---|---|
Authors: | Deng, Ke ; Geng, Zhi ; Liu, Jun S. |
Published in: | Journal of the Royal Statistical Society Series B. - Royal Statistical Society - RSS, ISSN 1369-7412. - Vol. 76.2014, 2, p. 319-347 |
Publisher: | Royal Statistical Society - RSS |
Similar items by person
- Bayesian Aggregation of Order-Based Rank Data
  Deng, Ke, (2014)
- Research and practice in data quality
  Sadiq, Shazia, (2008)
- Effect of node deleting on network structure
  Deng, Ke, (2007)