sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics
Topic modeling methods such as Latent Dirichlet Allocation (LDA) are powerful tools for analyzing massive amounts of textual data. They have been used extensively in information systems (IS) and management research, both to identify latent topics for data exploration and as a feature-engineering mechanism for deriving new variables for further analysis. However, existing topic modeling approaches are mostly unsupervised and use only the text itself, ignoring useful information that often accompanies it, such as star ratings in customer reviews or comment categories in online discussion forums. As a result, the extracted topics, and the new variables derived from the learned topic vectors, may be inaccurate, which can lead to biased or incorrect estimates in subsequent econometric analyses and to inferior performance on predictive tasks. In response, we propose sDTM, a novel supervised topic modeling approach designed within a Bayesian deep learning framework that incorporates this additional information. sDTM offers three key advantages over traditional topic modeling approaches. First, it learns high-quality topics, as measured both quantitatively and qualitatively, which can help alleviate concerns about measurement error in econometric analysis. Second, as a supervised learning model, it achieves significantly better predictive performance than cutting-edge baselines. Finally, sDTM highlights the words that have a stronger impact on the outcome, thereby facilitating transparent model investigation. Experimental results on three datasets show that sDTM not only improves supervised learning tasks, including classification and regression, but also exhibits better model fit (e.g., lower perplexity) for document understanding. sDTM makes methodological contributions to the IS and management literature and is directly relevant to research that uses big data analytics.
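The abstract reports model fit as perplexity. For reference, perplexity for topic models is conventionally computed as the exponential of the negative average per-token log-likelihood over a (typically held-out) corpus. The minimal sketch below assumes per-document log-likelihoods have already been obtained from some fitted model; the function name `perplexity` is hypothetical and is not taken from the paper.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Corpus perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each document's words
    doc_lengths: number of tokens N_d in each document
    (Hypothetical helper for illustration; not from the sDTM paper.)
    """
    total_ll = sum(doc_log_likelihoods)
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("corpus must contain at least one token")
    # Average negative log-likelihood per token, then exponentiate.
    return math.exp(-total_ll / total_tokens)


# A uniform model over a 1,000-word vocabulary assigns each token
# probability 1/1000, so its perplexity equals the vocabulary size.
lengths = [10, 20]
lls = [n * math.log(1 / 1000) for n in lengths]
print(round(perplexity(lls, lengths), 6))
```

Lower perplexity means the model assigns higher probability to the evaluated text; a model that guesses uniformly over a vocabulary of V words has perplexity V.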
Year of publication: | 2020 |
---|---|
Authors: | Yang, Yi |
Other Persons: | Zhang, Kunpeng (contributor) |
Publisher: | [2020]: [S.l.] : SSRN |
Subject: | Bayesian statistics | Bayesian inference | Theory | Text |
freely available
Extent: | 1 online resource (36 p.) |
---|---|
Type of publication: | Book / Working Paper |
Language: | English |
Notes: | According to SSRN, the original version of this document was created on May 20, 2020 |
Other identifiers: | 10.2139/ssrn.3612168 [DOI] |
Source: | ECONIS - Online Catalogue of the ZBW |
Persistent link: https://www.econbiz.de/10012832848
Similar items by subject
- Thöni, Andreas, (2018)
- Eignung von Algorithmen zur Bereitstellung unstrukturierter Daten im Rahmen der Textklassifikation. Klapdor, Marius, (2005)
- sDTM: a supervised Bayesian deep topic model for text analytics. Yang, Yi, (2023)