[Snowball-discuss] Question about Porter2 Step 4

From: Håvard Lindset (lindset@webpixels.net)
Date: Sun Oct 19 2003 - 22:48:01 BST


Hi folks,

This is what the Porter2 definition
(http://snowball.tartarus.org/english/stemmer.html) has to say about part
of Step 4:

> Search for the longest among the following suffixes, and,
> if found and in R2, perform the action indicated.
>
> ... (removed the non-relevant part of step 4)
>
> ion
> delete if preceded by s or t

When I feed the word "unquestionably" to my stemmer, it returns "unquest",
while the provided sample list of stemmed words shows the word being stemmed
to "unquestion" (and so does
http://snowball.tartarus.org/demo.php?words=unquestionably).

When step 4 kicks in, this is what the word looks like:

  u n q u e s t i o n
     |       |
     |       R2------
     R1--------------

According to the Porter2 definition described on the site, "ion" should be
removed because it is preceded by a "t" and is located in R2.
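
To make that reading concrete, here is a minimal Python sketch of the region
computation and the "ion" check as I understand them. It is not the stemmer
itself: the helper names (region_start, r1_r2, ion_rule_applies) are made up
for illustration, and it assumes the simple case only, ignoring the
definition's special treatment of "y" and of exceptional prefixes.

```python
VOWELS = set("aeiouy")  # assumption: every y counts as a vowel (no y/Y marking)


def region_start(word, start=0):
    """Index just after the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Start offsets of R1 and R2 (R2 is found by the same rule inside R1)."""
    r1 = region_start(word)
    r2 = region_start(word, r1)
    return r1, r2


def ion_rule_applies(word):
    """True if the word ends in 'ion', that suffix lies inside R2,
    and it is preceded by 's' or 't'."""
    if not word.endswith("ion"):
        return False
    _, r2 = r1_r2(word)
    pos = len(word) - len("ion")
    return pos >= r2 and pos > 0 and word[pos - 1] in "st"


word = "unquestion"
r1, r2 = r1_r2(word)
print(word[r1:], word[r2:], ion_rule_applies(word))
```

For "unquestion" this gives R1 = "question" and R2 = "tion", and the check
returns True, which is consistent with the "unquest" result described above.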

Have the Step 4 rules been changed, or has the provided dictionary/stemmed
list (and demo) not been updated for the Porter2 method? What should I do?

Thanks

Best regards,
Håvard Lindset



This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:45 BST