RE: [Snowball-discuss] problems using the English stemmer in java

From: Martin Porter (
Date: Sun Aug 29 2004 - 16:23:35 BST


>Is there a way that you know of to get the proper English word that results
>from the generated stem?

You need to use a complete English vocabulary (assuming the language of
application is English). For each word in the vocab, find the stem,


This gives a file that can be inverted,


There will be >=1 unstemmed forms (real words) for a given stem:


('horse' can be a verb: to horse around etc). Choose the shortest:


This gives a mapping of stemmed form to real word, which can be used to
reconstruct a proper English word from a stemmed form.
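
The steps above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: toy_stem is a made-up stand-in that strips a few suffixes and is NOT the Snowball English stemmer, and the seven-word vocabulary is invented for the example. In practice a real stemmer function and a full English word list would be passed in instead.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer (NOT the Snowball algorithm)."""
    for suffix in ("ing", "es", "ed", "s", "e"):
        # Strip the first matching suffix, keeping at least three letters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_to_word(vocabulary, stem=toy_stem):
    """Map each stem to the shortest vocabulary word that produces it."""
    # Steps 1 and 2: stem every word, then invert to stem -> set of words.
    words_by_stem = defaultdict(set)
    for word in vocabulary:
        words_by_stem[stem(word)].add(word)
    # Step 3: choose the shortest word per stem; ties are broken
    # alphabetically so that the result is deterministic.
    return {
        s: min(words, key=lambda w: (len(w), w))
        for s, words in words_by_stem.items()
    }


# Invented sample vocabulary, for illustration only.
vocabulary = [
    "horse", "horses", "horsing", "horsed",
    "connect", "connected", "connecting",
]
stem_to_word = build_stem_to_word(vocabulary)
```

With this toy stemmer all four "horse" forms share one stem, and the lookup for that stem returns "horse" because it is the shortest of them. Choosing the shortest form is a heuristic, so the word returned is a plausible representative of the stem rather than necessarily the word that was originally stemmed.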

There are several word lists of English available on the Internet. See for

-- Martin
