[Snowball-discuss] Re: Italian Stemmer with C#

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Thu Sep 01 2005 - 17:15:21 BST


Federico,

The differences you note are because alzare is a verb with a very short stem
- alz - and my Italian stemmer demands a longer stem length before it takes
anything off. So the difference must be in determining R1 and R2.
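
As a rough illustration of where those boundaries fall, here is a minimal
Python sketch. It assumes the region definitions given in the published
description of the Italian stemmer (R1, R2, and the third region RV), and it
skips the prelude (accent normalisation, marking i/u between vowels), which
does not affect these words. The function names are hypothetical, and the
assumption that the attached-pronoun step tests the verb ending (ando, endo,
ar, er, ir) against RV is taken from that description, not from this thread.

```python
VOWELS = set("aeiouàèìòù")

def _after_vowel_then_consonant(word, start):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)

def rv_start(word):
    """Start of RV, per the assumed published definition."""
    n = len(word)
    if n < 2:
        return n
    if word[1] not in VOWELS:
        # Second letter is a consonant: RV follows the next vowel.
        for i in range(2, n):
            if word[i] in VOWELS:
                return i + 1
        return n
    if word[0] in VOWELS:
        # First two letters are vowels: RV follows the next consonant.
        for i in range(2, n):
            if word[i] not in VOWELS:
                return i + 1
        return n
    # Consonant-vowel start: RV follows the third letter.
    return min(3, n)

def regions(word):
    """Return the start indices (r1, r2, rv) for `word`."""
    r1 = _after_vowel_then_consonant(word, 0)
    r2 = _after_vowel_then_consonant(word, r1)
    return r1, r2, rv_start(word)

def verb_ending_in_rv(word, pronoun, ending):
    """True if `word` ends in ending+pronoun and the ending lies wholly in RV."""
    if not word.endswith(ending + pronoun):
        return False
    ending_start = len(word) - len(pronoun) - len(ending)
    return ending_start >= rv_start(word)

print(regions("alzarla"))                        # (2, 5, 4)
print(verb_ending_in_rv("alzarla", "la", "ar"))  # False
print(verb_ending_in_rv("portarla", "la", "ar")) # True
```

For alzarla, RV starts at index 4 ("rla"), but the "ar" before the pronoun
starts at index 3, so under these assumptions the pronoun is left in place
and only the final vowel is later removed. With a longer stem, as in
portarla, the ending falls inside RV and the pronoun can come off.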

Short verb stems are a problem for the stemmers in the Romance languages:
rier in French, orare in Italian etc.

If you believe you are getting better overall results with a different
measure of R1 and R2, let me know the rules you are using!

Martin

------------------

>Dear Mr. Porter,
> I found the "Snowball" page while searching the
>internet for available stemming software.
>Starting from a German version of a program
>(http://www.codeproject.com/csharp/destemming.asp),
>I have created an Italian version using the rules described
>on your page on the Italian language.
>
>After some tests comparing my code with your Snowball
>results for Italian, I noticed some
>differences. Could there be a small mismatch in the
>code (in my program or in Snowball)?
>
>For example:
> alzandogli: Snowball gives alzandogl, mine
>gives alzand
> alzarla: Snowball gives alzarl, mine
>gives alzar
> alzarsi: Snowball gives alzars, mine
>gives alzar
> alzargli: Snowball gives alzargl, mine
>gives alzar
>
>And many others...
>Which version is the correct one?
>
>Thanks for your reply, and best regards,
> Federico Pieri


