Re: [Snowball-discuss] Personal pronoun "his" in Snowball EnglishStemmer

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Sun May 22 2005 - 14:13:00 BST


Steve,

You should switch from the Porter to the Porter2 stemmer. The page describing
Porter2 gives clear guidelines for extending its exclusion list of words. You
will also find that Porter2 does not remove the final "s" from "his".
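
As a rough illustration only (not the Snowball source), the Python sketch
below covers just the Porter2 rule for a bare final "s": the "s" is deleted
only if the preceding part of the word contains a vowel that is not
immediately before the "s". The names strip_final_s and stem, and the
exclusions parameter, are hypothetical; the real stemmer has many more steps
and handles endings such as "sses" and "ies" before reaching this rule.

```python
# Porter2 treats a, e, i, o, u and y as vowels.
VOWELS = set("aeiouy")


def strip_final_s(word):
    """Apply only the Porter2-style rule for a final 's' (a simplification)."""
    # Words ending in "ss" or "us" are left alone by this rule.
    if not word.endswith("s") or word.endswith(("ss", "us")):
        return word
    # word[:-2] is everything before the letter that precedes the "s".
    # A vowel there means the "s" may be removed.
    if any(ch in VOWELS for ch in word[:-2]):
        return word[:-1]
    return word


def stem(word, exclusions=frozenset()):
    """Hypothetical wrapper: words on the exclusion list pass through unchanged."""
    if word in exclusions:
        return word
    return strip_final_s(word)


for w in ("his", "this", "gas", "palms", "gaps"):
    print(w, "->", stem(w))
```

Here "his", "this" and "gas" keep their "s", while "palms" and "gaps" lose
it. A word that the rule would otherwise alter can be protected by passing it
in the exclusions set, for example stem("news", exclusions={"news"}).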

(Your "EnglishStemmer" should equate with Porter2, if the naming conventions
I set up are being followed.)

The way phrase retrieval combines with stemming obviously depends on the
underlying IR model, and I'm not quite sure what your assumptions are here.
In Xapian, for example, a phrase can generate a structure of terms, each of
which might be stemmed or unstemmed:

    "his palms" --> PHRASE ---+--- 'hi'
                              |
                              +--- 'palm'

etc.
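
A minimal Python sketch of that kind of structure, for illustration only: it
does not use the Xapian API, and Phrase, build_phrase and toy_stem are
hypothetical names. toy_stem merely drops a trailing "s", which happens to
reproduce the terms in the diagram above; it is not a real stemmer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """A phrase node holding its terms in order."""
    terms: tuple


def toy_stem(word):
    # Stand-in stemmer (an assumption): strip one trailing "s" and nothing else.
    return word[:-1] if word.endswith("s") else word


def build_phrase(text, stem=None):
    """Split text into words; stem each term only if a stemmer is supplied."""
    words = text.lower().split()
    return Phrase(tuple(stem(w) if stem else w for w in words))


print(build_phrase("his palms", toy_stem))  # stemmed terms
print(build_phrase("his palms"))            # unstemmed terms
```

Whether the stemmed or the unstemmed form of each term is indexed and
searched is a decision for the IR model, which is why the stemmer is passed
in as an optional argument here.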

Martin


