Re: [Snowball-discuss] Question about adding rules to Snowball Stemmer

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Thu Jul 15 2004 - 10:40:59 BST


Olga,

Thank you for your interest.

To take your various questions in turn,

1. How we can add a list of "exceptional cases" to the Stemmer -

See the section beginning with the heading "Exceptional forms in general" in
the page

http://snowball.tartarus.org/english/stemmer.html

(Various approaches are possible, but this would be my approach.)
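
As an illustration of one of those other approaches, the sketch below keeps
the exception list outside the stemmer and consults it before stemming. It is
only a sketch under stated assumptions: the class name ExceptionAwareStemmer,
the method toyStem, and the sample word pairs are hypothetical and chosen for
illustration, and toyStem is a stand-in for a real stemmer, not the Snowball
English algorithm.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical wrapper: looks a word up in an exception table first and
// only falls back to the ordinary stemmer when no exception is listed.
class ExceptionAwareStemmer {
    private final Map<String, String> exceptions = new HashMap<>();
    private final UnaryOperator<String> fallback;

    ExceptionAwareStemmer(UnaryOperator<String> fallback) {
        this.fallback = fallback;
    }

    // Register an exceptional form and the stem it should map to.
    ExceptionAwareStemmer addException(String word, String stem) {
        exceptions.put(word, stem);
        return this;
    }

    String stem(String word) {
        String special = exceptions.get(word);
        return special != null ? special : fallback.apply(word);
    }

    // Toy stand-in for a real stemmer (an assumption for this sketch):
    // strips one trailing "s" from words longer than three letters.
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        ExceptionAwareStemmer stemmer =
            new ExceptionAwareStemmer(ExceptionAwareStemmer::toyStem)
                .addException("news", "news")   // left unchanged
                .addException("skis", "ski")
                .addException("dying", "die");

        for (String w : new String[] {"news", "skis", "dying", "cats"}) {
            System.out.println(w + " -> " + stemmer.stem(w));
        }
    }
}
```

In practice the fallback would be a call into a generated Snowball stemmer
rather than toyStem. Keeping the list in a separate table means it can be
edited without regenerating the stemmer, at the cost of one extra lookup per
word.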

4. Is there a Java API available for Snowball?

Yes. See the section "Java generation" in

http://snowball.tartarus.org/q/use.html

5. Could you perhaps point me to some other publicly available stemmers I
could look at and play with?

There is a great deal of work around on language processing (and stemming),
but unfortunately most of it is proprietary, and therefore difficult to
review or assess. An example is

http://www.teragram.com/oem/euro_lang.htm#stemming

For stemming freeware, there is not much available. For English, there is
the Lovins stemmer, see

http://www.cs.waikato.ac.nz/~eibe/stemmers/

and the Paice stemmer, see

http://www.comp.lancs.ac.uk/computing/research/stemming/

The Lovins stemmer is also available on the Snowball site in Snowball form:

http://snowball.tartarus.org/lovins/stemmer.html

The Paice stemmer does not easily translate into Snowball, otherwise it
would be there too.

For foreign language stemmers, there are often references, but almost never
proper algorithmic descriptions. For example,

http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en#id2600260

describes work done in Polish.

There does not appear to be a question 3. For question 2, I am not sure what
you have in mind, and perhaps you could explain a little more fully.

Martin


