An integer linear programming approach for approximate string comparison
We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.
Year of publication: |
2009
|
---|---|
Authors: | Ritt, Marcus ; Costa, Alysson M. ; Mergen, Sergio ; Orengo, Viviane M. |
Published in: |
European Journal of Operational Research. - Elsevier, ISSN 0377-2217. - Vol. 198.2009, 3, p. 706-714
|
Publisher: |
Elsevier |
Keywords: | Approximate string matching Distance metric Block edits Block moves Integer programming Hop constraints |
Saved in:
Online Resource
Saved in favorites
Similar items by person
-
An integer linear programming approach for approximate string comparison
Ritt, Marcus, (2009)
-
An integer linear programming approach for approximate string comparison
Ritt, Marcus, (2009)
-
Simple heuristics for the assembly line worker assignment and balancing problem
Moreira, Mayron César O., (2012)
- More ...