[Snowball-discuss] Problems with step 5 in the Porter2 algorithm

From: Håvard Lindset (lindset@webpixels.net)
Date: Sat Oct 18 2003 - 13:23:02 BST


Hi all,

This isn't really a question about Snowball itself, but about Step 5 of
the Porter2 algorithm. (I'm writing a stemmer in PHP.)

"e
     delete if in R2, or in R1 and not preceded by a short syllable"

Should I check for a short syllable only DIRECTLY in front of the ending e,
or must there be NO short syllables ANYWHERE in the word before the ending e?

If anyone could clarify when to remove the e, it would be much appreciated
:) Right now I'm finding that I'm either removing too many e's or too few.

I'm using Perl Compatible Regular Expressions for most of the stemmer,
so if any of you have a PCRE pattern that does what I want, I'd love to
see it :)

Thanks!

Best regards,
Håvard Lindset
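
For reference, a minimal sketch of the rule quoted above, under the first
reading: the short-syllable test looks only at the letters immediately
before the final e, which is how the Snowball English stemmer source applies
it. The sketch is in Python rather than PHP, but the patterns use only syntax
that PCRE also accepts. The names region_start and strip_final_e are
hypothetical. It assumes the earlier steps have already run and that any
consonant y has been marked as Y, and it leaves out the special R1 prefixes
(gener, commun, arsen) and the "l" half of Step 5.

```python
import re

# A vowel followed by a non-vowel; used to locate R1 and R2.
VOWEL_THEN_NON_VOWEL = re.compile(r"[aeiouy][^aeiouy]")

# A short syllable anchored at the END of the string:
#   (a) non-vowel, vowel, non-vowel other than w, x or Y, or
#   (b) a vowel at the very start of the word followed by a non-vowel.
SHORT_SYLLABLE_AT_END = re.compile(
    r"(?:[^aeiouy][aeiouy][^aeiouywxY]|^[aeiouy][^aeiouy])$"
)


def region_start(word, start=0):
    """Index just after the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is none."""
    match = VOWEL_THEN_NON_VOWEL.search(word, start)
    return match.end() if match else len(word)


def strip_final_e(word):
    """Apply only the "e" rule of Step 5 (simplified, see assumptions)."""
    if not word.endswith("e"):
        return word
    r1 = region_start(word)      # start of R1
    r2 = region_start(word, r1)  # start of R2, found inside R1
    e_pos = len(word) - 1
    stem = word[:-1]
    if e_pos >= r2:
        return stem              # the e is in R2: always delete
    if e_pos >= r1 and not SHORT_SYLLABLE_AT_END.search(stem):
        return stem              # in R1, no short syllable right before it
    return word
```

With this reading, strip_final_e("bottle") gives "bottl" even though "bot"
is a short syllable earlier in the word, strip_final_e("move") stays "move"
because "mov" sits directly before the e, and strip_final_e("relate") gives
"relat" because the e is in R2.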


