[Snowball-discuss] Some possible improvements in English

From: Tolkin, Steve (Steve.Tolkin@FMR.COM)
Date: Tue Nov 20 2001 - 13:57:18 GMT

Dear Martin,

Thanks so much for creating snowball and having it be open source!

On http://snowball.sourceforge.net/english/stemmer.html
you said: "Incidentally, this illustrates how much feedback to expect from
real users of a stemming algorithm: five words in twenty years!"

I never knew you were soliciting feedback.

Here are a few quick suggestions. (More later if I get the time.)

1. Need to specially handle certain words that end in "s" but which are
not plurals:
atlas -> atla # But want it to be atlas, to conflate with atlases
cosmos -> cosmo # bad; cosmo probably a search for Cosmopolitan magazine.
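
One way such special handling might look is an exception table consulted
before the stemmer runs. This is only a sketch: S_EXCEPTIONS,
naive_strip_s and stem_with_exceptions are hypothetical names, and
naive_strip_s is a toy stand-in for the plural step, not the actual
Porter or Snowball rules.

```python
# Hypothetical exception table: words ending in "s" that are not plurals.
# The two entries are the examples given above; a real list would be longer.
S_EXCEPTIONS = {"atlas": "atlas", "cosmos": "cosmos"}


def naive_strip_s(word):
    """Toy stand-in for a stemmer's plural step (NOT the real Porter rules)."""
    if word.endswith("ss"):
        return word          # e.g. "caress" is left alone
    if word.endswith("s"):
        return word[:-1]     # e.g. "cats" -> "cat", but also "atlas" -> "atla"
    return word


def stem_with_exceptions(word, stem=naive_strip_s, exceptions=S_EXCEPTIONS):
    """Look the word up in the exception table first, then fall back to stem()."""
    lowered = word.lower()
    if lowered in exceptions:
        return exceptions[lowered]
    return stem(lowered)


for w in ("atlas", "cosmos", "cats"):
    print(w, "->", stem_with_exceptions(w))
```

Because the table is checked before any suffix stripping, ordinary
plurals still go through the fallback stemmer unchanged.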

2. Certain words that end in -ive but whose stem is a common word.
Stemming these all the way is likely to decrease precision:
respective -> respect
productive -> product
conductive -> conduct
possessive -> possess

I think it would be better to have the -ivity form conflate with -ive
for these, but not reduce all the way.
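
The -ivity/-ive conflation suggested above could be sketched roughly as
follows. Everything here is an assumption for illustration: PROTECTED_IVE
holds only the four examples listed, toy_stem is a toy stand-in rather
than the real Porter rules, and stem_keep_ive is a hypothetical wrapper.

```python
# Hypothetical list of -ive words that should not be reduced to their root.
PROTECTED_IVE = {"respective", "productive", "conductive", "possessive"}


def toy_stem(word):
    """Toy stand-in for the stemmer: -ivity -> -ive, then drop -ive.

    This is NOT the real Porter algorithm; it only mimics the behaviour
    being discussed (productive -> product).
    """
    if word.endswith("ivity"):
        word = word[:-5] + "ive"
    if word.endswith("ive"):
        word = word[:-3]
    return word


def stem_keep_ive(word, stem=toy_stem):
    """Conflate -ivity with -ive for protected words, but stop at -ive."""
    lowered = word.lower()
    if lowered.endswith("ivity"):
        candidate = lowered[:-5] + "ive"
        if candidate in PROTECTED_IVE:
            return candidate     # productivity -> productive
    if lowered in PROTECTED_IVE:
        return lowered           # productive stays productive
    return stem(lowered)         # everything else is stemmed as before


for w in ("productive", "productivity", "conductivity", "active"):
    print(w, "->", stem_keep_ive(w))
```

With this arrangement "productivity" and "productive" share a stem,
while "product" keeps its own.
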
Hopefully helpfully yours,

Steven Tolkin          steve.tolkin@fmr.com      617-563-0516 
Fidelity Investments   82 Devonshire St. V1D     Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.

