[Snowball-discuss] Re: stemming addition

From: Martin Porter (martin_porter@SoftHome.net)
Date: Fri May 30 2003 - 10:39:01 BST


Michael,

I think -ist was omitted from the original algorithm mainly because few
words in the test vocabulary had that ending. Looking at the evidence again,
it could I think be usefully added in the way you recommend. I regard the
original stemmer as "frozen", but as you probably know, there is a more
developed one at http://snowball.tartarus.org/english/stemmer.html, which
also lacks -ist, and which I may add in. (I am planning some revisions to
the stemmer - perhaps later this year.)

Thanks for your interest and help,

Martin

 
At 16:51 21/05/2003 -0400, Michael Holmes wrote:
>Mr. Porter,
>
>The addition of
>
>case 't': if (ends("\03" "ist")) { r("\00" ""); break; }
>
>in step 3 of your algorithm (C version) allows it to handle 'economist',
>'archaeologist', etc. without messing up 'gist', 'fist', and the like.
>Do you see any problems with that inclusion?
>
>Michael Holmes
>Georgia Institute of Technology
>mph@cc.gatech.edu
>
>
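
The proposed rule appears to work because, in the C version, r() replaces
a suffix only when the measure m of the remaining stem is greater than 0.
The stems 'g' and 'f' have m = 0, so 'gist' and 'fist' are left alone,
while 'econom' has m > 0. Below is a minimal standalone sketch of that
check. It is not the stemmer's own code: the names is_consonant, measure
and strip_ist are hypothetical, and it assumes lowercase ASCII input and
ignores the stemmer's other steps.

```c
#include <string.h>

/* Porter's consonant test: a, e, i, o, u are vowels; 'y' is a consonant
   only at the start of the word or when it follows a vowel. */
static int is_consonant(const char *w, int i)
{
    switch (w[i]) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 0;
    case 'y':
        return (i == 0) ? 1 : !is_consonant(w, i - 1);
    default:
        return 1;
    }
}

/* Measure m of w[0..len-1]: the number of vowel-run/consonant-run pairs
   that follow any leading consonants. */
static int measure(const char *w, int len)
{
    int m = 0, i = 0;

    while (i < len && is_consonant(w, i)) i++;       /* leading consonants */
    while (i < len) {
        while (i < len && !is_consonant(w, i)) i++;  /* vowel run */
        if (i >= len) break;                         /* trailing vowels */
        m++;                                         /* one VC pair found */
        while (i < len && is_consonant(w, i)) i++;   /* consonant run */
    }
    return m;
}

/* Hypothetical helper: remove a trailing "ist" in place when the stem
   before it has measure > 0. Returns 1 if the word was changed. */
static int strip_ist(char *word)
{
    int len = (int)strlen(word);

    if (len < 3 || strcmp(word + len - 3, "ist") != 0)
        return 0;
    if (measure(word, len - 3) <= 0)
        return 0;
    word[len - 3] = '\0';
    return 1;
}
```

Calling strip_ist on a writable copy of each word is enough to try the
examples from the message above in isolation. In the full stemmer, earlier
steps may already have altered a word before this rule is reached.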


