[Snowball-discuss] Questions about english stemmer & the apostrophe

From: Neal Richter (nealr@rightnow.com)
Date: Thu Feb 26 2004 - 22:50:02 GMT


I'm sure this has been discussed before... I tried a Google search on the
snowball-discuss archive with no luck.

Is there a rationale for the behavior below on words with an apostrophe?

bagpipe -> bagpip
bagpipe's -> bagpipe'
bagpipes -> bagpip

bakeries -> bakeri
bakeries' -> bakeries'
bakery -> bakeri
bakery's -> bakery'
bakerys -> bakeri //This isn't a word - but the form is OK sometimes.

I looked at several older versions of various Porter-derived English
stemmers, and all of them have this behavior.

One could argue that when an apostrophe is used, an IR application would
want to preserve the original noun. The apostrophe denotes possession by
an entity, so generalizing by stemming 'bakery's -> bakeri' would be
inappropriate.

On the other hand, since stemming is used to generalize word forms, you
could also argue that the possessive form should be generalized as well.
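
If one wanted the possessive form generalized, one option would be to
strip the possessive marker before the word reaches the stemmer. Below is
a minimal Python sketch of that idea. The names strip_possessive and
POSSESSIVE_SUFFIXES are hypothetical and not part of any Snowball stemmer;
the suffix list is an assumption about which forms matter.

```python
# Hypothetical pre-processing step; NOT part of the Snowball stemmers.
# Assumption: only the trailing markers 's' , 's and ' count as possessive.
POSSESSIVE_SUFFIXES = ("'s'", "'s", "'")  # ordered longest first


def strip_possessive(word: str) -> str:
    """Remove at most one trailing possessive marker from word."""
    for suffix in POSSESSIVE_SUFFIXES:
        # Require something to be left over, so a bare "'" is not emptied.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["bagpipe's", "bakeries'", "bakery's", "bakery"]:
        print(w, "->", strip_possessive(w))
```

The stripped word would then be passed to the stemmer as usual, so that
bakery's and bakery would presumably end up with the same stem.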



Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
