Re: Re[4]: [Snowball-discuss] an inconsistency with Russian stemmer

From: Martin Porter (martin_porter@softhome.net)
Date: Sun Nov 18 2001 - 12:35:19 GMT


Andrew Aksyonoff has spotted something wrong with the definition of the
Russian stemmer, so I am putting in a new definition of a slightly modified
algorithm.

The new definition is shorter and simpler. The Snowball script is slightly
shorter and, I think, more natural, and the small number of words in the
vocabulary that are affected by the change stem better than they did before.

The essential change is that the adjective ending test now always precedes
the verb ending test. This comes from removing the 'verbal' test, where the
two were done the other way round.
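
The ordering described above can be sketched roughly as follows. This is a
minimal illustration, not the Snowball definition: the ending lists are tiny
hypothetical subsets written in Cyrillic, and the function names
(strip_first_match, strip_ending) are invented for the example.

```python
# Illustrative subsets only; the real stemmer uses much longer ending lists.
ADJECTIVE_ENDINGS = ("ыми", "ого", "ая", "ый", "ые")
VERB_ENDINGS = ("ить", "ила", "ет", "ют", "ла")


def strip_first_match(word, endings):
    """Remove the longest ending in `endings` that `word` ends with.

    Returns (new_word, True) on a match, otherwise (word, False).
    """
    for ending in sorted(endings, key=len, reverse=True):
        # Require at least one character of stem to remain.
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], True
    return word, False


def strip_ending(word):
    """Try the adjective test first; only if it fails, try the verb test."""
    stem, matched = strip_first_match(word, ADJECTIVE_ENDINGS)
    if matched:
        return stem
    stem, _ = strip_first_match(word, VERB_ENDINGS)
    return stem
```

With these toy lists, strip_ending("красная") removes the adjective ending,
and strip_ending("говорила") falls through to the verb test. The point is
only that the order is fixed, with no separate branch that reverses it.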

I got a bit concerned that the reflexive endings are removed without
checking what precedes them (si^a is supposed to follow a consonant and s'
a vowel), but careful study of the vocabulary suggests that it does not
matter, or at least does not matter very much, so I am leaving that alone.
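
To make the distinction concrete, here is a small sketch contrasting
unconditional removal of the reflexive endings with a context-checked
variant. It assumes si^a and s' correspond to Cyrillic "ся" and "сь", treats
any non-vowel (including the soft sign) as a "consonant" context, and uses
invented function names; it is not taken from the Snowball script.

```python
VOWELS = "аеиоуыэюя"


def strip_reflexive(word):
    """Remove a trailing reflexive ending regardless of what precedes it."""
    for ending in ("ся", "сь"):
        if word.endswith(ending):
            return word[:-2]
    return word


def strip_reflexive_in_context(word):
    """Remove "ся" only after a non-vowel and "сь" only after a vowel."""
    if len(word) > 2:
        previous = word[-3]  # the letter just before the two-letter ending
        if word.endswith("ся") and previous not in VOWELS:
            return word[:-2]
        if word.endswith("сь") and previous in VOWELS:
            return word[:-2]
    return word
```

For ordinary reflexive verb forms such as "учиться" and "училась" the two
functions agree; they differ only on words like "вася", where the ending
appears in the "wrong" context.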

I will update the website with this change shortly. There are a number of
other changes to go in so I'm not sure it will be today.

Martin

Andrew, if you are extending your stemmer to include diminutives ('ik',
'onok' etc.), our stemmer definitions will probably diverge anyway, but it
would be interesting to hear how you get on. I have tended to avoid endings
of this type since in Dutch, for example, diminutives can radically affect
meaning, in which case one does not want to remove them as part of an IR
process. I don't know their significance in Russian, although I realise
diminutives are used a lot with personal names.




