Dear Martin,
Thanks so much for creating snowball and having it be open source!
On http://snowball.sourceforge.net/english/stemmer.html
you said: "Incidentally, this illustrates how much feedback to expect from
the real users of a stemming algorithm: five words in twenty years!"
I never knew you were soliciting feedback.
Here are a few quick suggestions. (More later if I get the time.)
1. Certain words that end in "s" but are singular need special handling.
Example:
atlas -> atla # But want it to be atlas, to conflate with atlases
cosmos -> cosmo # bad; cosmo probably a search for Cosmopolitan magazine.
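A rough sketch of how such an exception list might work, in Python. The table name, the wrapper function, and the stand-in stemmer are all hypothetical and are not part of Snowball; the only real data are the two words from the examples above.

```python
# Hypothetical exception table: singular words ending in "s", mapped to
# the stem they should keep. Entries are taken from the examples above.
SINGULAR_S_EXCEPTIONS = {
    "atlas": "atlas",
    "cosmos": "cosmos",
}


def naive_strip_s(word):
    """Stand-in for a real stemmer so the sketch runs on its own.

    It only strips one trailing "s"; it is NOT the Snowball algorithm.
    """
    return word[:-1] if word.endswith("s") else word


def stem_with_exceptions(word, base_stem, exceptions=SINGULAR_S_EXCEPTIONS):
    """Return the listed stem for an exception word, else defer to base_stem."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return base_stem(key)


if __name__ == "__main__":
    for w in ["atlas", "cosmos", "cats"]:
        print(w, "->", stem_with_exceptions(w, naive_strip_s))
```

The lookup happens before any suffix stripping, so listed words pass through unchanged and everything else is stemmed as usual.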
2. Certain words end in -ive but have a stem that is itself a common word.
Reducing these to the bare stem is likely to decrease precision.
Example:
respective -> respect
productive -> product
conductive -> conduct
possessive -> possess
I think it would be better to have the -ivity form conflate with -ive
for these, but not reduce all the way.
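One way this could be sketched in Python, assuming a hand-maintained list of the affected words. The list name, the function names, and the stand-in stemmer are hypothetical and are not part of Snowball; the four listed words are the examples above.

```python
# Hypothetical list of -ive words whose bare stem is itself a common word.
PROTECTED_IVE = {"respective", "productive", "conductive", "possessive"}


def naive_strip_ive(word):
    """Stand-in for a real stemmer so the sketch runs on its own.

    It only drops a trailing "ive"; it is NOT the Snowball algorithm.
    """
    return word[:-3] if word.endswith("ive") else word


def stem_ive(word, base_stem):
    """Keep protected -ive words intact and fold their -ivity form onto them."""
    key = word.lower()
    if key in PROTECTED_IVE:
        return key  # do not reduce all the way to the bare stem
    if key.endswith("ivity"):
        ive_form = key[:-3] + "e"  # e.g. "productivity" -> "productive"
        if ive_form in PROTECTED_IVE:
            return ive_form
    return base_stem(key)


if __name__ == "__main__":
    for w in ["productive", "productivity", "active"]:
        print(w, "->", stem_ive(w, naive_strip_ive))
```

Words not on the list still go through the ordinary stemmer, so the change only affects the cases where the shorter stem would collide with an unrelated common word.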
Hopefully helpfully yours,
Steve
--
Steven Tolkin    steve.tolkin@fmr.com    617-563-0516
Fidelity Investments    82 Devonshire St. V1D    Boston MA 02109
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss
This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:40 BST