[Snowball-discuss] Problems with the Italian stemmer algorithm

From: pprett@sbox.tugraz.at
Date: Wed Sep 08 2004 - 15:13:55 BST


Hello, while trying to implement the Italian stemmer algorithm
described in http://www.snowball.tartarus.org/italian/stemmer.html, I found some
minor differences between the output of the Snowball implementation and mine.
All of these mismatches are related to the "ici", "ico" and "ice" suffixes.
For example, the word "mediatrice" has the R2 region "rice".
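To check the region claim, here is a minimal Python sketch of how R1 and R2 can be computed. It assumes the usual Snowball definitions: R1 starts after the first non-vowel that follows a vowel, and R2 applies the same rule inside R1. It also assumes the vowel set a e i o u à è ì ò ù. The helper names are hypothetical, and the sketch skips any preprocessing that the published algorithm may apply to the word before marking regions.

```python
# Vowels assumed for the Italian stemmer (plain and grave-accented).
VOWELS = set("aeiouàèìòù")


def region_start(word: str, start: int = 0) -> int:
    """Return the index just after the first non-vowel that follows a
    vowel, looking only at letters from `start` on; len(word) if none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> tuple:
    """Start indices of R1 and R2 (hypothetical helper)."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # same rule, applied inside R1
    return r1, r2


word = "mediatrice"
r1, r2 = r1_r2(word)
print(word[r1:], word[r2:])  # iatrice rice
```

Under these assumptions the R2 quoted in the message ("rice") checks out.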

So during step 1 there is a match for the "ice" suffix and it is
deleted. The stem of "mediatrice" in my implementation is therefore "mediatr".
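Step 1 can be sketched as a suffix lookup followed by a region test. This sketch assumes Snowball's usual rule: among all suffixes listed for a step, the longest one that ends the word is selected first. Only then is its condition checked, and shorter suffixes are not retried if that check fails. Only the "delete if in R2" group of step 1 is included (it contains "atrice" as well as "ice"), so this is not a complete step 1. `step1_longest` and `step1_first_in_r2` are hypothetical names. The second function mimics the behaviour described in the message, for comparison.

```python
# Only the "delete if in R2" suffix group of step 1 (other groups omitted).
STEP1_R2_DELETE = [
    "anza", "anze", "ico", "ici", "ica", "ice", "iche", "ichi",
    "ismo", "ismi", "abile", "abili", "ibile", "ibili",
    "ista", "iste", "isti", "istà", "istè", "istì",
    "oso", "osi", "osa", "ose", "mente", "atrice", "atrici",
    "ante", "anti",
]


def step1_longest(word: str, r2: int) -> str:
    """Pick the longest matching suffix, then delete it only if it lies
    entirely in R2 (i.e. starts at or after index r2)."""
    matches = [s for s in STEP1_R2_DELETE if word.endswith(s)]
    if not matches:
        return word
    suffix = max(matches, key=len)
    if len(word) - len(suffix) >= r2:
        return word[: -len(suffix)]
    return word  # longest suffix not in R2: no fallback to shorter ones


def step1_first_in_r2(word: str, r2: int) -> str:
    """Delete the first listed suffix that ends the word and lies in R2."""
    for s in STEP1_R2_DELETE:
        if word.endswith(s) and len(word) - len(s) >= r2:
            return word[: -len(s)]
    return word


# R2 of "mediatrice" starts at index 6 ("rice"), as stated in the message.
print(step1_longest("mediatrice", 6))      # mediatrice
print(step1_first_in_r2("mediatrice", 6))  # mediatr
```

Under the longest-match assumption, "atrice" wins over "ice". Because "atrice" starts before R2, step 1 leaves the word alone. That would be consistent with the Snowball output keeping the "tric" part of the word.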

But in the output file of the Snowball implementation, "mediatrice" is
conflated to "mediatric", so I'm currently a little confused. Am I computing
the region R2 wrongly, or must the suffix match the whole region R2 in order
to be deleted?
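The Snowball output "mediatric" differs from the input only by the final "e". Below is a minimal sketch of how that could happen. It assumes that steps 1 and 2 remove nothing from "mediatrice". It also assumes that the final step deletes a final a, e, i, o, à, è, ì or ò (and then a preceding i) when it lies in RV, and reduces a final "ch"/"gh" in RV to "c"/"g". RV is taken to start as follows:

- after the next vowel when the second letter is a consonant;
- after the next consonant when the first two letters are vowels;
- after the third letter otherwise.

`rv_start` and `vowel_suffix` are hypothetical names.

```python
VOWELS = "aeiouàèìòù"


def rv_start(word: str) -> int:
    """Start index of region RV (len(word) if it cannot be found)."""
    n = len(word)
    if n < 2:
        return n
    if word[1] not in VOWELS:
        # Second letter is a consonant: RV starts after the next vowel.
        for i in range(2, n):
            if word[i] in VOWELS:
                return i + 1
        return n
    if word[0] in VOWELS:
        # First two letters are vowels: RV starts after the next consonant.
        for i in range(2, n):
            if word[i] not in VOWELS:
                return i + 1
        return n
    # Consonant followed by a vowel: RV starts after the third letter.
    return min(3, n)


def vowel_suffix(word: str, rv: int) -> str:
    """Final-vowel step: drop a final vowel in RV (and then a preceding
    'i' in RV), then reduce a final 'ch'/'gh' in RV to 'c'/'g'."""
    if word and word[-1] in "aeioàèìò" and len(word) - 1 >= rv:
        word = word[:-1]
        if word.endswith("i") and len(word) - 1 >= rv:
            word = word[:-1]
    if word.endswith(("ch", "gh")) and len(word) - 2 >= rv:
        word = word[:-1]  # drop the 'h'
    return word


word = "mediatrice"
print(rv_start(word))                      # 3
print(vowel_suffix(word, rv_start(word)))  # mediatric
```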

Thanks & regards,

Peter



This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST