[Snowball-discuss] RE: Snowball-discuss digest, Vol 1 #5 - 1 msg

From: Svetlana Pereyaslavets (smp29@cs.waikato.ac.nz)
Date: Mon Sep 09 2002 - 09:49:01 BST


Dear Martin,
I am not a linguist, but I am a native Russian speaker. May I try to give some
explanation of this suffix?
It is free, but hopefully helpful :-)
The suffix is very common in Russian adjectives and adverbs, in the following
construction:

*****basic*construction********
  prefix-root - "other optional" suffix - n (Oleg's question) - <adjective
ending> ( = yi/iy/oy.... through all genders and declensions)
*******************************
The rule has the following options:
1. prefix-root - n - <adjective ending >

        1.1. The root itself ends in -n-
        In this case we will encounter -nn- after stripping the adjective ending,
and we SHOULD REMOVE one -n- (that is the suffix).
        Such words usually don't have prefixes (so they can easily be checked
against a dictionary).
        Example : kon-n-yi (adjective from "kon'"=horse)

        1.2. The root ends in any other letter

        we SHOULD REMOVE the -n- (that is the suffix).
        Example: ruch-n-oy (adjective from "ruka"= hand).
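
A minimal Python sketch of rule 1, for illustration only. It works on the Latin
transliteration used in this message. The ending list ADJ_ENDINGS and the
function names are hypothetical and far from complete, and this is not how the
Snowball Russian stemmer is implemented.

```python
# Hypothetical, incomplete list of transliterated adjective endings.
ADJ_ENDINGS = ("ogo", "omu", "aya", "oy", "yi", "iy", "oe", "ym")


def strip_adjective_ending(word):
    """Return (stem, ending); ending is "" if no known ending matches."""
    # Try longer endings first so "ogo" is preferred over a shorter overlap.
    for ending in sorted(ADJ_ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)], ending
    return word, ""


def remove_n_suffix(word):
    """Rule 1: strip the adjective ending, then remove one trailing -n-."""
    stem, ending = strip_adjective_ending(word)
    if ending and stem.endswith("n"):
        # Case 1.1 (stem ends in -nn-) and case 1.2 (stem ends in -n-)
        # both remove exactly one -n-.
        return stem[:-1]
    return stem


print(remove_n_suffix("konnyi"))   # case 1.1
print(remove_n_suffix("ruchnoy"))  # case 1.2
```

Both cases of rule 1 reduce to the same operation here, because only one -n-
is removed whether or not the root itself ends in -n-.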

2. prefix-root - "other optional" suffix - n - <adjective ending>

        2.1. other optional suffix = - an- or - yan -
         - a- or -ya- SHOULD BE REMOVED TOGETHER with the suffix -n-.

        Example: "sherst-yan-oy" (=woolen).

        THREE exceptions to this rule fall under case 2.2:
"stekl-yan -n - <adjective ending>" (adj from glass)
"olov-yan -n - <adjective ending>" (adj from tin)
"derev-yan -n - <adjective ending>" (adj from wood)

        2.2. other optional suffix = -on - or -en-

        REMOVE -n- together with the -en- or -on- suffix.

        Example: "osob - en- n- <adjective ending>" (=special)

        ONE exception to this rule falls under case 2.1:
"ran -en- <adjective ending>" (= injured)

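Cases 2.1 and 2.2, together with the listed exceptions, can be sketched in
Python as an ordered suffix table applied after the adjective ending is
stripped. This is only an illustration on transliterated input. ADJ_ENDINGS,
N_SUFFIXES, MIN_STEM and the function names are assumptions of this sketch;
the minimum stem length in particular is a crude guard added here so that
"konn-" (case 1.1) is not read as "k" + "-on-" + "-n-".

```python
# Hypothetical, incomplete list of transliterated adjective endings.
ADJ_ENDINGS = ("ogo", "omu", "aya", "oy", "yi", "iy", "oe", "ym")

# Checked in order, most specific first:
#   "yann"       the three exceptions (stekl-, olov-, derev-)
#   "enn", "onn" case 2.2
#   "yan", "an"  case 2.1
#   "en"         the "ran-en-" exception
#   "n"          plain rule 1
N_SUFFIXES = ("yann", "enn", "onn", "yan", "an", "en", "n")

MIN_STEM = 3  # assumption of this sketch, not part of the rules above


def strip_ending(word):
    """Return the word without its adjective ending, or None if none matches."""
    for ending in sorted(ADJ_ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)]
    return None


def stem_n_adjective(word):
    stem = strip_ending(word)
    if stem is None:
        return word
    for suffix in N_SUFFIXES:
        rest = stem[:-len(suffix)]
        # Skip a suffix if removing it would leave an implausibly short stem.
        if stem.endswith(suffix) and len(rest) >= MIN_STEM:
            return rest
    return stem


for w in ("sherstyanoy", "steklyannyi", "osobennyi", "ranenyi", "konnyi"):
    print(w, "->", stem_n_adjective(w))
```

A purely suffix-based table like this will over-strip some words, which is why
a dictionary comparison is suggested for case 1.1 above.
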
        2.3. HARD CASE (RUSSIAN LEXICAL DIVERSITY IS INVOLVED) - I can't suggest a
solution right now, as I need time to think how to detect that without
knowledge of the natural language:

         other optional suffix = -in

                2.3.1. If the following substitution is valid, refer to 1.1. or 2.1
(i.e. -n- SHOULD BE REMOVED; the -in- suffix MAY and probably SHOULD
be removed as well, depending on the required level of detail)

                              - a (noun)
                            /
                root - in -|
                            \
                                  -n- <adjective ending>

                Example: "star-in-a"-"star-in-n -yi" (= old)

                2.3.2 If the substitution above is not valid refer to 1.2. or 2.2. with
the same reservation.

                Example: "mysh-in - <adjective ending> " (adjective from "mysh'"= mouse)
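
The substitution test in 2.3 needs knowledge of the language, so it cannot be
done by suffix stripping alone. The Python sketch below only shows what the
check would look like if a noun lexicon were available. The lexicon `nouns`,
the function name and the returned labels are all hypothetical, and the input
is assumed to have its adjective ending already stripped.

```python
def classify_in_suffix(stem, nouns):
    """Decide between cases 2.3.1 and 2.3.2 for a stem such as 'starinn'.

    stem  -- transliterated word with the adjective ending already removed
    nouns -- a set of known nouns (hypothetical lexicon)
    """
    if stem.endswith("inn"):
        base = stem[:-1]  # drop the -n- suffix, keep root + "in"
    elif stem.endswith("in"):
        base = stem       # root + "in" with no extra -n-
    else:
        return "not-applicable"
    # Substitution test: does root + "in" + "a" exist as a noun?
    return "2.3.1" if base + "a" in nouns else "2.3.2"


nouns = {"starina"}  # toy lexicon for the example
print(classify_in_suffix("starinn", nouns))
print(classify_in_suffix("myshin", nouns))
```

Without such a lexicon the two cases cannot be told apart, which is the hard
part described above.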

3. PARTICIPLE II may look the same as an adjective to an end-stripping
stemmer.
In Participles II, the scheme is:

 word - {-on, -en, -an, -yan} - n - <adjective ending>

Where "word" is VERY LIKELY to consist of "prefix-root" (i.e. there is a
high probability that participle II would have a prefix).
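
The participle scheme can be written as a regular expression, sketched here in
Python on transliterated input. ADJ_ENDINGS is a hypothetical, incomplete
ending list, and the word "sdelannyi" ("done") is an example added for this
sketch rather than one taken from the message. The pattern deliberately also
matches adjectives such as "osobennyi", which is exactly the ambiguity
described above.

```python
import re

# Hypothetical, incomplete list of transliterated adjective endings.
ADJ_ENDINGS = ("ogo", "omu", "aya", "oy", "yi", "iy", "oe", "ym")

# word - {on, en, an, yan} - n - <adjective ending>
PARTICIPLE_RE = re.compile(
    r"(?P<word>.+?)(?P<suffix>yan|on|en|an)n(?P<ending>%s)" % "|".join(ADJ_ENDINGS)
)


def match_participle(token):
    """Return (word, suffix, ending) if the token fits the scheme, else None."""
    m = PARTICIPLE_RE.fullmatch(token)
    if m is None:
        return None
    return m.group("word"), m.group("suffix"), m.group("ending")


print(match_participle("sdelannyi"))  # example added for this sketch
print(match_participle("osobennyi"))  # an adjective that fits the same scheme
print(match_participle("ruchnoy"))    # does not fit
```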

It may look too complicated; please email me if you need anything clarified.
Or, please allow me some time to return to this topic and come up with a
digestible algorithm. Actually, I was planning to test the Russian stemmer
as part of my student research in December this year.

Kind regards

Svetlana

-----Original Message-----
From: snowball-discuss-admin@lists.tartarus.org
[mailto:snowball-discuss-admin@lists.tartarus.org]On Behalf Of
snowball-discuss-request@lists.tartarus.org
Sent: Monday, September 09, 2002 5:45 PM
To: snowball-discuss@lists.tartarus.org
Subject: Snowball-discuss digest, Vol 1 #5 - 1 msg

Today's Topics:

   1. Re: russian stemmer (Martin Porter)

--__--__--

Message: 1
To: Oleg Bartunov <oleg@sai.msu.su>
From: martin_porter@softhome.net (Martin Porter)
Cc: snowball-discuss@lists.tartarus.org
Date: Sun, 08 Sep 2002 23:12:26 -0600
Subject: [Snowball-discuss] Re: russian stemmer

Oleg,

I've had a look at -n-ogo, -n-yi etc endings through the Russian vocabulary,
and feel that I would need to take linguistic advice before I could make any
progress with -n- removal.

As you may recall, I did the Russian stemmer with a linguist, Pat Miles, who
lives some 60 miles away, and is not really a computer user. Also, Pat
charges for his work, which is a further inconvenience to me! I'd rather try
to get free linguistic help now through the open source community. Is there
anyone you know in Russia who might experiment a bit further with the
Snowball stemmer to see if they could make improvements here?

Martin

>The current Russian stemmer doesn't seem to treat adjective endings like
>'nogo', 'nomu', 'nyi' ...., so
>velosipednogo (bicycle) -> velosipedn~ogo
>velosipednyi -> velosipedn~yi
>while it would be better to have
>velosipednogo -> velosiped~nogo
>velosipednyi -> velosiped~nyi
>
>I'm not a linguist, so I don't know how to properly distinguish
>'nogo' from 'ogo' etc. Probably there are some grammar rules.

--__--__--

End of Snowball-discuss Digest