Re[4]: [Snowball-discuss] an inconsistency with Russian stemmer

From: Andrew Aksyonoff (shodan@chat.ru)
Date: Sat Nov 17 2001 - 10:05:40 GMT


Hello Martin!

Friday, November 16, 2001, 6:59:49 PM, you wrote:
MP> You are right, there is a problem with the algorithm definition: if a
MP> reflexive ending is found in the 'verbal' test it is removed anyway,
Understood.

MP> Does that issue solve all outstanding problems?
Most of them. With Step 2 changed to eliminate both a trailing "i"
and a trailing "i'", only a few differences remain between my
program's output and output.txt.
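
A minimal sketch of the Step 2 change described above. It assumes the
Latin transliteration used in this thread, where "i'" stands for the
short i. It ignores the RV region restriction that the real stemmer
applies, and the function name `step2` is hypothetical.

```python
def step2(word: str) -> str:
    """Remove a trailing "i'" or "i" (hypothetical sketch of the modified Step 2).

    Assumes a Latin transliteration in which "i'" is a single letter.
    The RV region constraint of the real stemmer is not modelled here.
    """
    # A word ending in "i'" does not end in "i", so at most one ending matches
    # and the order of the checks does not affect the result.
    for ending in ("i'", "i"):
        if word.endswith(ending):
            return word[: -len(ending)]
    return word
```

For example, `step2("zaichiki")` gives `"zaichik"`, and a word with
neither ending is returned unchanged.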

MP> It is nice of you to do the work to find this error, but the
MP> program you have written will I imagine run slower than the
MP> Snowball one, since you do sequential testing for the
MP> endings, not binary chop.
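
A minimal sketch of the difference between the two lookup strategies
mentioned above. It is not Snowball's generated code: it assumes a plain
toy list of endings, and the names `find_sequential` and
`find_binary_chop` are hypothetical. The binary-chop variant tries each
possible suffix of the word, longest first, and looks it up in a sorted
list with `bisect`. This costs at most max_len binary searches per word,
instead of one test per ending.

```python
from bisect import bisect_left
from typing import List, Optional


def find_sequential(word: str, endings: List[str]) -> Optional[str]:
    """Test every ending in turn and return the longest one that matches."""
    best = None
    for ending in endings:
        if word.endswith(ending) and (best is None or len(ending) > len(best)):
            best = ending
    return best


def find_binary_chop(word: str, sorted_endings: List[str], max_len: int) -> Optional[str]:
    """Look up suffixes of `word`, longest first, by binary search.

    `sorted_endings` must be sorted; `max_len` is the length of the longest ending.
    """
    for length in range(min(max_len, len(word)), 0, -1):
        candidate = word[-length:]
        i = bisect_left(sorted_endings, candidate)
        if i < len(sorted_endings) and sorted_endings[i] == candidate:
            return candidate
    return None
```

Usage, with a toy ending list: `endings = ["i", "i'", "ik", "onok", "ochek"]`,
then `find_binary_chop("zaichonok", sorted(endings), max(map(len, endings)))`
returns `"onok"`, the same result as `find_sequential("zaichonok", endings)`.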
In its current form it is expected to be slow, since this
version was written to be a clear reference implementation.
I'll be optimizing it.

Also, I can name at least a few endearment (diminutive) suffixes, e.g.
"ik", "onok" and "ochek" as in "zaichik", "zaichonok", "utenok",
"utenochek", etc., which are sometimes used in Russian but are not
handled by the current stemmer. I'm going to add and test support for
them, and these experiments are simpler for me to conduct with my own
implementation.

- Andrew

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss




This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:40 BST