Re[4]: [Snowball-discuss] an inconsistency with Russian stemmer

From: Andrew Aksyonoff (
Date: Sat Nov 17 2001 - 10:05:40 GMT

Hello Martin!

Friday, November 16, 2001, 6:59:49 PM, you wrote:
MP> You are right, there is a problem with the algorithm definition: if a
MP> reflexive ending is found in the 'verbal' test it is removed anyway,

MP> Does that issue solve all outstanding problems?
Most of them. With Step 2 changed to remove both a trailing "i" and a
trailing "i'", only a few differences remain between the output of my
program and output.txt.
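The changed Step 2 can be sketched as follows. This is a minimal illustration that assumes the message's Latin transliteration, where "i'" stands for й and "i" for и. The function name is hypothetical, and the sketch ignores the RV region restriction that the full algorithm applies.

```python
def step2_remove_final_i(word: str) -> str:
    """Hypothetical modified Step 2: drop one trailing "i'" or "i".

    Assumes transliterated input ("i'" for й, "i" for и) and, unlike the
    real algorithm, does not restrict the match to the RV region.
    """
    for ending in ("i'", "i"):
        # A word ending in "i'" ends with an apostrophe, so at most one
        # of these endings can match and at most one is removed.
        if word.endswith(ending):
            return word[: -len(ending)]
    return word
```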

MP> It is nice of you to do the work to find this error, but the
MP> program you have written will I imagine run slower than the
MP> Snowball one, since you do sequential testing for the
MP> endings, not binary chop.
In its current form it is expected to be slow, since this version
was written as a clear reference implementation. I will optimize
it later.
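The difference between sequential testing and binary chop can be illustrated with a small sketch. The ending list here is a hypothetical, illustrative subset in transliteration, not the stemmer's real tables. Snowball's generated code searches its own tables differently, so this is not that code; it only contrasts the two lookup strategies for finding the longest matching ending.

```python
from bisect import bisect_left

# Hypothetical, illustrative endings (Latin transliteration);
# not the Russian stemmer's actual ending tables.
ENDINGS = ["i", "ik", "onok", "ochek", "ami", "akh"]

# Sequential lookup: endings kept longest first, tested one at a time.
BY_LENGTH = sorted(ENDINGS, key=len, reverse=True)

# Binary-chop lookup: endings kept in lexicographic order for bisect.
SORTED = sorted(ENDINGS)
MAX_LEN = max(len(e) for e in ENDINGS)


def longest_ending_sequential(word: str) -> str:
    """Return the longest ending in ENDINGS that word ends with, or ""."""
    for ending in BY_LENGTH:  # up to len(ENDINGS) string comparisons
        if word.endswith(ending):
            return ending
    return ""


def longest_ending_binary(word: str) -> str:
    """Same result, but each candidate suffix is located by binary search."""
    for n in range(min(MAX_LEN, len(word)), 0, -1):  # longest candidate first
        candidate = word[-n:]
        i = bisect_left(SORTED, candidate)  # O(log len(ENDINGS)) comparisons
        if i < len(SORTED) and SORTED[i] == candidate:
            return candidate
    return ""
```

The sequential version does work proportional to the number of endings, while the binary version does at most `MAX_LEN` logarithmic lookups, which matters once ending tables grow large.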

Also, I can name at least some endearment suffixes (e.g. "ik", "onok",
"ochek" as in "zaichik", "zaichonok", "utenok", "utenochek", etc.) which
are sometimes used in Russian but are not supported by the current
stemmer. I am going to add and test support for them, and these
experiments are simpler for me to conduct with my own implementation.
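An experiment of this kind might start from something like the following. This is a naive sketch, not a proposed stemmer rule: the suffix list is taken only from the examples above, the `min_stem` guard against over-stemming short words is a hypothetical assumption, and no region checks are made.

```python
# Hypothetical suffix list built only from the examples in the message,
# in the same Latin transliteration, tried longest first.
DIMINUTIVE_SUFFIXES = ("ochek", "onok", "ik")


def strip_diminutive(word: str, min_stem: int = 2) -> str:
    """Remove one endearment suffix if at least min_stem letters remain.

    min_stem is an assumed guard, not part of any published algorithm.
    """
    for suffix in DIMINUTIVE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

With this list, "zaichik" and "zaichonok" both reduce to "zaich", and "utenochek" becomes "uten", but "utenok" is left unchanged because it ends in "enok" rather than "onok"; a real rule set would need such variants.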

- Andrew

Snowball-discuss mailing list
