Identifying Emerging Topics and Content Change from Evolving Document Sets
Document sets whose content evolves over time are common in organizations. Organizations update policy documents periodically, and a news story typically develops over a period of time. When a document set evolves, some old content may remain unchanged while new content is added. Depending on the extent of the changes, users may need to read or analyze the new content again. Evolving content can make it hard for users to track changes and to form a global view of what has changed. In this paper, we consider document sets consisting of documents published at two different points in time and develop a measure that captures the change in content between them. We divide a document set into two subsets: one containing the documents published at the earlier date and one containing those published at the later date. We use Latent Dirichlet Allocation to extract topic and word distributions for each of the two subsets. We then compute the similarity between the sets of topics extracted for the two subsets to measure the amount of change in content. We study the effectiveness of the method on two data sets, a set of privacy policy documents and a set of Reuters news articles extracted from the TDT-Pilot Corpus, and present the experimental results.
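The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes the topic-word distributions for the earlier and later subsets have already been extracted (for example with an LDA implementation) and are given as word-to-probability dictionaries. The abstract does not specify the similarity measure, so the cosine similarity, the best-match averaging, and the names `cosine` and `content_change` are assumptions made for illustration, not the paper's definition. The toy topics are invented.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-probability vectors (dicts)."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an empty topic is treated as sharing nothing
    return dot / (norm_p * norm_q)


def content_change(earlier_topics, later_topics):
    """Hypothetical change score: 1 minus the mean best-match similarity.

    Each later topic is matched to its most similar earlier topic.
    A score near 0 means the later topics were already present earlier;
    a score near 1 means the later topics are mostly new.
    """
    if not earlier_topics or not later_topics:
        raise ValueError("both topic sets must be non-empty")
    best = [max(cosine(t, e) for e in earlier_topics) for t in later_topics]
    return 1.0 - sum(best) / len(best)


# Invented toy topics: one topic persists, one is replaced by a new one.
earlier = [{"privacy": 0.6, "data": 0.4}, {"cookie": 0.7, "browser": 0.3}]
later = [{"privacy": 0.6, "data": 0.4}, {"location": 0.8, "device": 0.2}]

score = content_change(earlier, later)
print(f"change score: {score:.2f}")
```

In this toy input, one of the two later topics is unchanged and the other shares no words with any earlier topic, so the score lands at about 0.5. Matching each later topic to its closest earlier topic is only one possible design; a symmetric measure or an optimal one-to-one assignment would be equally reasonable alternatives.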
Year of publication: | 2017 |
---|---|
Authors: | Chundi, Parvathi |
Published in: | International Journal of Knowledge-Based Organizations (IJKBO). - IGI Global, ISSN 2155-6407, ZDB-ID 2703517-7. - Vol. 7.2017, 4 (01.10.), p. 1-18 |
Publisher: | IGI Global |
Subject: | Context Similarity; Evolutionary Change; Latent Dirichlet Allocation; Topic Models; Vector Similarity |
Online Resource
Similar items by subject
- Model-based Purchase Predictions for Large Assortments
  Donkers, Bas (2015)
- Economic history goes digital: Topic modeling the Journal of Economic History
  Wehrheim, Lino (2017)
- Cross-corpora comparisons of topics and topic trends
  Bystrov, Victor (2022)