RE: [Snowball-discuss] problems using the English stemmer in java

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Sun Aug 29 2004 - 16:23:35 BST


Shai,

>is there a way that u know of to get the proper english word that results
>from the generated stem ?

You need a complete English vocabulary (assuming the language of the
application is English). For each word in the vocabulary, find the stem:

    horses->hors

This gives a file that can be inverted,

    hors->horses

There will be one or more original (unstemmed) words for a given stem:

    hors->horse
    hors->horses
    hors->horsed
    hors->horsing

('horse' can be a verb, as in 'to horse around'.) Choose the shortest:

    hors->horse

This gives a mapping of stemmed form to real word, which can be used to
reconstruct a proper English word from a stemmed form.
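
The steps above can be sketched in a few lines. This is only an
illustration, not Snowball code: build_stem_to_word and toy_stem are
hypothetical names, and toy_stem is a crude stand-in that strips a handful
of suffixes so the example runs on its own. Ties between equally short
words are broken alphabetically, which is an assumption added here to keep
the result deterministic.

```python
from typing import Callable, Dict, Iterable


def build_stem_to_word(vocab: Iterable[str],
                       stem: Callable[[str], str]) -> Dict[str, str]:
    """Map each stem to the shortest vocabulary word that produces it."""
    best: Dict[str, str] = {}
    for word in vocab:
        s = stem(word)
        current = best.get(s)
        # Keep the shorter word; on equal length, keep the alphabetically
        # earlier one (an assumption, so the output does not depend on order).
        if current is None or (len(word), word) < (len(current), current):
            best[s] = word
    return best


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["horses", "horse", "horsed", "horsing"]
mapping = build_stem_to_word(words, toy_stem)
print(mapping)
```

In real use, the stem argument would be a call into the actual English
stemmer, and the vocabulary would be read from a word list file.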

There are several word lists of English available on the Internet. See for
example,

http://www.gtoal.com/wordgames/yawl/word.list

-- Martin


