Re[6]: [Snowball-discuss] an inconsistency with Russian stemmer

From: Andrew Aksyonoff
Date: Sun Nov 18 2001 - 13:28:14 GMT

Hello Martin!

Sunday, November 18, 2001, 3:35:19 PM, you wrote:
MP> Andrew, if you are extending your stemmer to include diminutives ('ik',
MP> 'onok' etc) our stemmer definitions will probably diverge anyway, but it
MP> would be interesting to hear how you get on. I have tended to avoid endings
MP> of this type since in Dutch for example diminutives can radically affect
MP> meaning, in which case one does not want to remove them as part of an IR
MP> process. I don't know their significance in Russian, although I realise
MP> diminutives are used a lot with personal names.
As far as I can tell, longer diminutives (hooray, now I know the correct
English word!) such as "onok", "chok", "chek" and so on do not affect meaning
in Russian that much (or at all). In my experience, removing some of the
longer and safer of these suffixes along with their respective case forms
(e.g. "onok", "onku", "onkom") simply accounts for the case, so that,
say, "zaichonok" and "zaichonku" reduce to the same stem without
affecting much else. So it seems pretty safe to me.
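
To make the idea concrete, here is a rough sketch in Python (not Snowball, and
not the actual stemmer code). It assumes transliterated input, uses only the
three suffix forms mentioned above, and adds a hypothetical minimum stem length
of 3 so that very short words are left alone. A real rule would work on
Cyrillic and would need a carefully chosen suffix list.

```python
# Illustrative only: the three transliterated case forms named in this message.
DIMINUTIVE_SUFFIXES = ("onkom", "onok", "onku")


def strip_diminutive(word, min_stem_len=3):
    """Remove one listed diminutive suffix if enough of the stem remains.

    min_stem_len is an assumed safety threshold, not a value from the stemmer.
    """
    # Try longer suffixes first so a longer case form wins over a shorter one.
    for suffix in sorted(DIMINUTIVE_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No listed suffix matched (or the stem would be too short): keep the word.
    return word


if __name__ == "__main__":
    for w in ("zaichonok", "zaichonku", "zaichonkom"):
        print(w, "->", strip_diminutive(w))
```

With these assumptions, all three forms of "zaichonok" come out as the same
stem, which is the whole point of removing the suffix together with its cases.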

- Andrew

