From entity extraction to network analysis: a method and an application to a Portuguese textual source
This paper reports advances in the entity extraction task (named entity identification) of a text mining process that aims to unveil non-trivial semantic structures, such as relationships and interactions between entities or communities. We propose a three-phase method that is applicable to the Portuguese language and potentially to other languages as well. The method relies on flexible pattern matching, part-of-speech tagging, lexical-based rules and distance-based entity name merging. All steps are implemented using free software and take advantage of various existing packages. An evaluation of the entity extraction method on part of a book written in Portuguese indicates improved F1 scores. To further evaluate and illustrate the usefulness of the proposed method, we apply it to a book on Freemasonry and observe the differences in the entity word clouds produced. We also define a social network of named entities solely from information contained in the book and extract structural insights that reveal connections, relationships and communities among entities.
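As an illustration only, the last two ideas in the abstract (distance-based entity name merging and a network built from named entities) can be sketched as follows. The abstract does not say which distance measure, threshold, or linking rule the authors use, so everything here is an assumption: `difflib.SequenceMatcher` stands in for the distance, the `0.85` threshold is arbitrary, and entities are linked when they co-occur in the same sentence. The function names and sample data are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for the paper's distance."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_names(names, threshold=0.85):
    """Greedily map each name to the first earlier name that is similar enough."""
    canonical = []  # names kept as representatives
    mapping = {}    # original name -> representative
    for name in names:
        for rep in canonical:
            if similarity(name, rep) >= threshold:
                mapping[name] = rep
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping


def cooccurrence_edges(sentences, mapping):
    """Count how often two merged entities appear in the same sentence."""
    edges = Counter()
    for entities in sentences:
        merged = sorted({mapping[e] for e in entities})
        for a, b in combinations(merged, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical extracted entities, one list per sentence.
sentences = [
    ["Fernando Pessoa", "Lisboa"],
    ["Fernando Pesoa", "Lisboa"],  # spelling variant of the same person
]
names = [e for sentence in sentences for e in sentence]
mapping = merge_names(names)
edges = cooccurrence_edges(sentences, mapping)
```

The resulting weighted edge list could then be loaded into any graph library to look for communities; which structural measures the authors actually compute is not stated in the abstract.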
Year of publication: | 2014-11 |
---|---|
Authors: | Rocha, Conceição ; Jorge, Alípio Mário ; Oliveira, Márcia ; Brito, Paula ; Gama, João ; Pimenta, Carlos |
Institutions: | Faculdade de Economia, Universidade do Porto ; OBEGEF - Observatório de Economia e Gestão de Fraude |
Similar items by person
- Publicidade exterior (outdoor) ilegal. Pimenta, Carlos (2013)
- Esboço da quantificação da fraude em Portugal. Pimenta, Carlos (2009)
- Notes on the epistemology of fraud. Pimenta, Carlos (2012)