Authorship Attribution of Noisy Text Data With a Comparative Study of Clustering Methods
With the rapid growth in the volume of data available on the internet, visual analytics (VA) has emerged as a way of visualizing multidimensional data in different forms, revealing interesting information about the data and making them clearer and more intelligible. In this investigation, the authors focus on the VA-based Authorship Attribution (AA) task, applied to noisy text data. Furthermore, this article proposes a 3D visual analytics technique based on a sphere implementation. The dataset contains several text documents written by 5 American philosophers, with an average length of 850 words per text, which were scanned and then corrupted with different noise levels. The results show that the hierarchical clustering technique, using a fully automated threshold, performs well in terms of authorship attribution accuracy, especially with character trigrams and ending bigrams, where the clustering recognition rate (CRR) reaches 100% at noise levels from 0% to 7%. In addition, the proposed 3D sphere technique shows high clustering performance, mainly with words.
Year of publication: | 2018 |
---|---|
Authors: | Hamadache, Zohra ; Sayoud, Halim |
Published in: | International Journal of Knowledge and Systems Science (IJKSS). - IGI Global, ISSN 1947-8216, ZDB-ID 2703502-5. - Vol. 9.2018, 2 (01.04.), p. 45-69 |
Publisher: | IGI Global |
Subject: | 3D Sphere Visualisation | Artificial Intelligence | Authorship Attribution | Clustering | GMM | Noisy Text | Visual Analytics |
Similar items by subject
- Using deep learning and visual analytics to explore hotel reviews and responses
  Chang, Yung-Chun, (2020)
- Visual analytics for innovation and R&D intelligence
  Basole, Rahul C., (2023)
- Clustering for multi-dimensional heterogeneity with an application to production function estimation
  Cheng, Xu, (2023)
Similar items by person