Thanks so much for creating Snowball and making it open source!
You said: "Incidentally, this illustrates how much feedback to expect from
real users of a stemming algorithm: five words in twenty years!"
I never knew you were soliciting feedback.
Here are a few quick suggestions. (More later if I get the time.)
1. Need to specially handle certain words that end in "s" but which are not plurals:
atlas -> atla # But want it to be atlas, to conflate with atlases
cosmos -> cosmo # bad; cosmo probably a search for Cosmopolitan magazine.
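
One way to picture suggestion 1 is an exception table that is consulted before the stemmer runs. This is only a sketch: the table contents, `toy_stem`, and `stem_with_exceptions` are hypothetical names made up for illustration, and `toy_stem` is a stand-in that strips a trailing "s", not the actual Snowball English stemmer.

```python
# Hypothetical exception table: words ending in "s" that should keep it.
# The plural is listed too, so that atlas and atlases still conflate.
S_EXCEPTIONS = {
    "atlas": "atlas",
    "atlases": "atlas",
    "cosmos": "cosmos",
}


def toy_stem(word):
    """Stand-in for a real stemmer: strips a single trailing "s" only."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_with_exceptions(word, stem=toy_stem):
    """Look the word up in the exception table, else fall back to `stem`."""
    word = word.lower()
    if word in S_EXCEPTIONS:
        return S_EXCEPTIONS[word]
    return stem(word)
```

With the table in place, "atlas" and "atlases" both come out as "atlas" and "cosmos" is left alone, while words not in the table go through the stemmer unchanged.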
2. Certain words that end in -ive but whose stem is a common word.
These are likely to decrease precision.
respective -> respect
productive -> product
conductive -> conduct
possessive -> possess
I think it would be better to have the -ivity form conflate with -ive
for these, but not reduce all the way.
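
A rough sketch of suggestion 2, under stated assumptions: the word list `KEEP_IVE` and the function `protect_ive` are hypothetical, and using the full -ive form as the stem is just one possible choice for "not reducing all the way". A real stemmer would call something like this first and only apply its normal rules when it returns None.

```python
# Hypothetical list of -ive words whose fully reduced stem is a common word.
KEEP_IVE = {"respective", "productive", "conductive", "possessive"}


def protect_ive(word):
    """Return the -ive form for listed -ive / -ivity words, else None."""
    word = word.lower()
    if word.endswith("ivity"):
        # Map the -ivity form onto its -ive form, e.g. productivity -> productive.
        candidate = word[: -len("ivity")] + "ive"
    else:
        candidate = word
    return candidate if candidate in KEEP_IVE else None
```

Here "productive" and "productivity" both map to "productive", so they conflate with each other but not with "product"; unlisted words such as "activity" return None and would be stemmed as usual.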
Hopefully helpfully yours,
-- Steven Tolkin email@example.com 617-563-0516
Fidelity Investments 82 Devonshire St. V1D Boston MA 02109
There is nothing so practical as a good theory.
Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.