Re: [Snowball-discuss] Polish stemmer?

From: Dawid Weiss (dawid.weiss@cs.put.poznan.pl)
Date: Wed Aug 29 2007 - 16:50:30 BST


Ok, maybe that was a bit of an overstatement -- I don't think Polish is _much_
more complex than Russian (I don't know about Finnish). It's just my gut
feeling that rule-based stemmers don't work too well for Polish (there are quite
a few combinations at the morphology level). Having said that, the Morfologik
stemmer I mentioned is built using inflected-form-generation rules (from base
forms), so it should be possible to reuse this knowledge somehow if one wanted
to create a Snowball stemmer. If you're willing to undertake such an effort,
Agnieszka, don't let anyone discourage you (and in particular don't let me
discourage you).
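
To make the "reuse this knowledge" idea a bit more concrete, here is a rough
sketch in Python. The (strip, add) rule format and all function names are made
up for illustration; this is not Morfologik's actual data format or API, and
real Polish would need far more paradigms plus some handling of ambiguity. It
only shows that rules which generate inflected forms from a base form can be
turned around, either into a lookup table or into suffix-replacement rules of
the kind a Snowball stemmer would encode.

```python
# Hypothetical generation rules for one paradigm: each (strip, add) pair
# removes `strip` from the end of the base form and appends `add`.
# The forms below are the declension of "kot" (cat).
KOT_PARADIGM = [
    ("", ""), ("", "a"), ("", "u"), ("", "em"), ("t", "cie"),
    ("", "y"), ("", "ów"), ("", "om"), ("", "ami"), ("", "ach"),
]


def generate(base, paradigm):
    """Produce the inflected forms of `base` under the given paradigm."""
    forms = []
    for strip, add in paradigm:
        if base.endswith(strip):
            forms.append(base[:len(base) - len(strip)] + add)
    return forms


def build_lookup(lexicon):
    """Dictionary approach: map every generated form back to its base form(s)."""
    lookup = {}
    for base, paradigm in lexicon.items():
        for form in generate(base, paradigm):
            lookup.setdefault(form, set()).add(base)
    return lookup


def invert_rules(paradigm):
    """Rule approach: turn (strip, add) into (suffix, replacement) pairs,
    longest suffix first, so they can be tried in order on an unseen word."""
    rules = {(add, strip) for strip, add in paradigm if add and add != strip}
    return sorted(rules, key=lambda r: (-len(r[0]), r[0]))


def apply_inverted(word, rules):
    """Apply the first matching inverted rule; return the word unchanged if none match."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word


lookup = build_lookup({"kot": KOT_PARADIGM})
rules = invert_rules(KOT_PARADIGM)
```

The lookup table only covers words that are in the lexicon, while the inverted
rules also fire on unknown words but will over-stem some of them. How well the
second approach holds up once many paradigms are merged is exactly the open
question.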

I would actually be very curious about the level of quality that such a stemmer
(with manually constructed rules) can achieve. I know for a fact that a number
of people would benefit from it.

Dawid

Martin Porter wrote:
> On Wed, 2007-08-29 at 08:16 +0200, Dawid Weiss wrote:
>> Hi Agnieszka,
>>
>> (I am not a snowball developer, but...) It won't be easy to handle the
>> complexity of Polish in a set of Snowball rules.
>
> Dawid,
>
> Do you have any strong evidence for that? I would not have thought
> Polish was more complex than Finnish, or Russian for example.
>
> Martin
>


