[Snowball-discuss] Personal pronoun "his" in Snowball EnglishStemmer

From: Steve Legrand (steveleg@hotmail.com)
Date: Sat May 21 2005 - 14:59:12 BST


Is there a module in the Snowball stemmer that would let me exclude certain
words from the stemming process? I am using the Java version, and the word
"his" gets indexed as "hi", whereas "him" and "he" are indexed unchanged.
I know the algorithm has to balance various trade-offs and that stemmed
words do not always make sense outside the indexing process. This, however,
prevents me from retrieving phrases such as "his palm"; I have to use the
phrase "hi palm" for retrieval instead. In future, I will probably have a
larger group of ordinary English words that I need to keep in their original
form in the index. For this reason I would like to know whether there is a
module in Snowball I could tweak to exclude certain words from the stemming
process, or whether it would be better to filter the words before passing
them to the stemmer. I need to code this in.
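
To make the second option concrete (filtering words before they reach the
stemmer), here is a minimal sketch in Java. It is only an illustration, not
part of Snowball: the class name ProtectedStemmer, the toyStemmer() helper
and the protected-word set are all hypothetical. The stemmer itself is
passed in as a plain function, so the toy one below merely stands in for the
real EnglishStemmer, which I assume would be wrapped via its setCurrent(),
stem() and getCurrent() methods.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical wrapper: words in the protected set bypass the stemmer.
class ProtectedStemmer {
    private final Set<String> protectedWords = new HashSet<>();
    private final UnaryOperator<String> delegate;

    ProtectedStemmer(Set<String> words, UnaryOperator<String> delegate) {
        // Store the protected words lowercased so lookups ignore case.
        for (String w : words) {
            protectedWords.add(w.toLowerCase(Locale.ROOT));
        }
        this.delegate = delegate;
    }

    // Returns the lowercased word unchanged if it is protected,
    // otherwise whatever the wrapped stemmer produces for it.
    String stem(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (protectedWords.contains(lower)) {
            return lower;
        }
        return delegate.apply(lower);
    }

    // Toy stand-in for a real stemmer (an assumption for this sketch):
    // it only strips one trailing "s" from words longer than two letters.
    static UnaryOperator<String> toyStemmer() {
        return w -> (w.length() > 2 && w.endsWith("s"))
                ? w.substring(0, w.length() - 1)
                : w;
    }
}
```

With "his" in the protected set, stem("his") returns "his" while
stem("palms") is still passed to the wrapped stemmer. The same wrapper would
have to be applied both when indexing and when processing queries, otherwise
the index terms and the query terms would no longer match.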

Cheerio,
Steve Legrand



