Matching Bibliographic Data from Publication Lists with Large Databases Using N-Grams
This paper presents a text-matching process for identifying scholarly publications, extracted from publication lists provided by authors or research institutes, and correctly assigning them to records in large bibliographic databases such as Thomson Reuters' Web of Science (WoS). Identification is based on overlapping common 3-grams, and the match between the two sources is the candidate with the highest score under the applied cosine measure. Levenshtein similarities based on N-grams are used as a complementary and confirmatory measure of the closeness between a given CV publication and the best WoS match retrieved. The suggested method is shown to have considerable potential to reduce the manual effort needed to find out whether a desired publication is indexed in WoS. The similarity scores derived from the Levenshtein measure are consistent with those derived from Salton's similarity measure. Incorrect matches are examined in depth, and possible thresholds are suggested to decrease the effort for manual cleaning.
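The matching idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace and case normalization, and the choice to compute the Levenshtein distance over sequences of 3-grams and normalize it by the longer sequence are all assumptions made for illustration, since the record does not give the paper's exact definitions.

```python
from collections import Counter
from math import sqrt


def ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def cosine_similarity(a, b, n=3):
    """Cosine (Salton) measure between the n-gram frequency vectors of two strings."""
    va, vb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def levenshtein(xs, ys):
    """Edit distance between two sequences (strings or lists of n-grams)."""
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b, n=3):
    """Assumed normalization: 1 - distance / length of the longer n-gram sequence."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    longest = max(len(ga), len(gb))
    return 1.0 - levenshtein(ga, gb) / longest if longest else 1.0


def best_match(query, candidates, n=3):
    """Pick the candidate with the highest cosine score, then confirm with Levenshtein."""
    match = max(candidates, key=lambda c: cosine_similarity(query, c, n))
    return match, cosine_similarity(query, match, n), levenshtein_similarity(query, match, n)
```

Calling `best_match(cv_entry, database_titles)` with a CV entry and a list of candidate titles (both hypothetical inputs) returns the top-scoring candidate together with both similarity values. A threshold on either score, as the abstract suggests, could then decide which matches still need manual checking.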
Year of publication: | 2014 |
---|---|
Authors: | Abdulhayoglu, Mehmet Ali |
Other Persons: | Thijs, Bart (contributor) ; Jeuris, Wouter (contributor) |
Publisher: | [2014]: [S.l.] : SSRN |
freely available
Extent: | 1 online resource (29 p.) |
---|---|
Type of publication: | Book / Working Paper |
Language: | English |
Notes: | According to information from SSRN, the original version of the document was created in 2014 |
Other identifiers: | 10.2139/ssrn.2464065 [DOI] |
Source: | ECONIS - Online Catalogue of the ZBW |
Persistent link: https://www.econbiz.de/10013051195
Similar items by person
- Matching bibliographic data from publication lists with large databases using N-grams
  Abdulhayoglu, Mehmet Ali, (2014)
- Enrichment of Bibliometric Databases by Assigning Region Information by Means of the Web
  Abdulhayoglu, Mehmet Ali, (2014)
- Enrichment of bibliometric databases by assigning region information by means of the web
  Abdulhayoglu, Mehmet Ali, (2014)