A STEMMING ALGORITHM FOR LATIN TEXT DATABASES
This paper describes the design of a stemming algorithm for searching databases of Latin text. The algorithm uses a simple longest‐match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries (one for nouns and adjectives and one for verbs) for processing query and database words. These dictionaries and the associated stemming rules are arranged in such a way that the stemmer does not need to know the grammatical category of the word that is being stemmed. It is very easy to overstem in Latin: the stemmer developed here tends, rather, towards understemming, leaving sufficient grammatical information attached to the stems resulting from its use to enable users to pursue very specific searches for single grammatical forms of individual words.
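The longest-match idea with two suffix dictionaries can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the short suffix lists, the minimum stem length of 2, and the names `strip_longest` and `stem` are hypothetical and are not taken from the paper, and the recoding step mentioned in the abstract is omitted. The sketch returns both a noun-type and a verb-type stem for every word, which is one plausible way a stemmer could avoid needing to know the grammatical category.

```python
# Illustrative suffix lists only (assumed); the paper's actual dictionaries
# are not reproduced in this record.
NOUN_SUFFIXES = ["ibus", "ius", "ae", "am", "as", "em", "es", "ia", "is",
                 "nt", "os", "ud", "um", "us", "a", "e", "i", "o", "u"]
VERB_SUFFIXES = ["iuntur", "beris", "erunt", "untur", "iunt", "mini", "ntur",
                 "stis", "bor", "ero", "mur", "mus", "ris", "sti", "tis",
                 "tur", "unt", "bo", "ns", "nt", "ri", "m", "r", "s", "t"]

MIN_STEM = 2  # assumed lower bound on the length of a remaining stem


def strip_longest(word, suffixes, min_stem=MIN_STEM):
    """Remove the longest matching suffix, if enough of the word remains."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word  # no suffix applies: leave the word unchanged


def stem(word):
    """Return (noun_stem, verb_stem) without deciding the word's category."""
    w = word.lower()
    return strip_longest(w, NOUN_SUFFIXES), strip_longest(w, VERB_SUFFIXES)


print(stem("rosam"))    # ('ros', 'rosa')
print(stem("amamus"))   # ('amam', 'ama')
print(stem("puellae"))  # ('puell', 'puellae')
```

In a design like this, query words and database words would both be stemmed against both lists, so a match on either stem retrieves the word. The short suffix lists strip little, which is in keeping with the understemming tendency the abstract describes.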
Year of publication: | 1996 |
---|---|
Authors: | SCHINKE, ROBYN ; GREENGRASS, MARK ; ROBERTSON, ALEXANDER M. ; WILLETT, PETER |
Published in: | Journal of Documentation. - MCB UP Ltd, ISSN 1758-7379, ZDB-ID 1479864-5. - Vol. 52.1996, 2, p. 172-187 |
Publisher: | MCB UP Ltd |
Similar items by person
- GENERATION OF EQUIFREQUENT GROUPS OF WORDS USING A GENETIC ALGORITHM
  ROBERTSON, ALEXANDER M., (1994)
- ROBERTSON, ALEXANDER M., (1996)
- Applications of n‐grams in textual information systems
  Robertson, Alexander M., (1998)