[Snowball-discuss] an inconsistency with Russian stemmer

From: Andrew Aksyonoff (shodan@chat.ru)
Date: Fri Nov 16 2001 - 05:10:48 GMT


Hello.

First of all, I'd like to tell you that I was simply happy
to find such an astonishing set of stemmers and am very grateful.
Your work is priceless and brilliant.

There, however, seems to be a certain inconsistency on the
pages that I'd like to report.

I've implemented the Russian stemmer in C using the algorithm
described at http://snowball.sourceforge.net/russian/stemmer.html.
Then I processed voc.txt and compared the result with output.txt
to test my implementation. To my confusion, there were some
differences which I can't explain. I carefully checked my code
for bugs and also executed the algorithm by hand, with a sheet
of paper and a pencil, quite a few times, but didn't manage to
work out which rules I was not applying or was applying
incorrectly.
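
For illustration, the comparison described above can be sketched in
Python roughly as follows. This is only a sketch under stated
assumptions: the function name find_mismatches is hypothetical, and
it assumes voc.txt holds one word per line and output.txt holds the
expected stem on the corresponding line.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    stem: Callable[[str], str],
    words: Iterable[str],
    expected: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, got, want) for every word whose stem differs.

    Assumption: `words` and `expected` are aligned line by line, as
    voc.txt and output.txt are assumed to be.
    """
    mismatches = []
    for word, want in zip(words, expected):
        got = stem(word)
        if got != want:
            mismatches.append((word, got, want))
    return mismatches


if __name__ == "__main__":
    # Toy data and a stand-in stemmer, purely for demonstration.
    toy_words = ["Abc", "Def"]
    toy_expected = ["abc", "xyz"]
    for word, got, want in find_mismatches(str.lower, toy_words, toy_expected):
        print(f"{word}: got {got!r}, expected {want!r}")
```

With the real files, the two iterables would be the stripped lines
of voc.txt and output.txt, and `stem` the implementation under test.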

With some tweaking and luck, I've been able to make two patches
after which my implementation produces exactly the same results
as output.txt. But the patched version differs from the algorithm
description and, as far as I can tell, from the Snowball code.

The first patch was to add a (NOUN, REFLEXIVE) sequence
to VERBAL. I guess this could be written in Snowball as

define verbal as (
    ( reflexive
      verb or adjectival or noun )
    or verb
)

The second patch was to strip not only a trailing "i" but
also a trailing "i short" in step 2. I guess this would be
something like

try([ '{i}' or '{i`}' ] delete)
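
For illustration, the effect of this second patch can be sketched in
Python. This is a minimal sketch, not the C implementation discussed
here: the function name strip_trailing_i is hypothetical, and it
assumes the whole word may be edited, whereas the published algorithm
only removes endings inside the RV region.

```python
def strip_trailing_i(word: str) -> str:
    """Remove one trailing Cyrillic "i" or "i short", if present.

    Hypothetical helper; assumption: the RV-region restriction of the
    published algorithm is deliberately left out of this sketch.
    """
    cyrillic_i = "\u0438"        # "i"
    cyrillic_i_short = "\u0439"  # "i short"
    if word.endswith((cyrillic_i, cyrillic_i_short)):
        return word[:-1]
    return word
```

Without the patch, only the first of the two letters would be
checked, so a word ending in "i short" would pass through step 2
unchanged.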

Could you please tell me what causes these differences?
Is it a bug in my code, in my head, on your pages or
something else?

Please reply directly to my e-mail, as I'm not subscribed
to snowball-discuss list.

Thanks a lot in advance.

- Andrew
