Re: [Snowball-discuss] The Norwegian stemmer algorithm

From: Ask Solem Hoel (ask@gan.no)
Date: Wed Nov 28 2001 - 21:29:44 GMT


Hi Martin,
Thank you for the quick response.
Because your message came from a @softhome.net address, the spam filter I use
(http://junkfilter.zer0.org) moved it straight to my junk-mail folder,
so I didn't see it until now :(

On Tue, Nov 27, 2001 at 08:14:14AM -0700, Martin Porter sent:

> Thanks, Ask, that is most interesting. I think it would be useful eventually
> to have a collection of links to resources from the Snowball site. Could we
> put your version in?

Sure!
So far, the Norwegian version is here:
http://www.unixmonks.net/~ask/Stemmer-Norwegian-0.5.tar.gz
It gives exactly the expected results for the Norwegian diffs.txt from
snowball.sourceforge.net.

But as Oleg said, we need to agree on a namespace and an interface
for Perl ports of the Snowball stemmers.

My co-worker here is also porting it to Java.

> I'm sorry about that. 3.1 is part of an old numbering scheme which I thought
> I'd eliminated. I'll fix it. Go to the Porter stemmer for the definition of
> R1 and R2, although I guess you must know what the definition is.

Thanks!

>
> Mmmm - I think no-one is reading the Snowball manual :-) . It sets p1 to 3
> if it is less than 3. So p1 is (a) after the first non-vowel following a vowel,
> or (b) after the 3rd letter, whichever position is further right. Basically,
> 2 letters is too little for a residual stem in German, and I think Norwegian.
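Martin's rule for p1 can be illustrated with a short Python sketch. It is not the Snowball source: the function name `r1_start` is hypothetical, and the vowel set `aeiouyæåø` is an assumption based on the Norwegian stemmer's vowel grouping.

```python
# Minimal sketch of how p1 (the start of region R1) is placed, as described
# above. Assumption: the vowel set below matches the Norwegian stemmer's.
VOWELS = set("aeiouyæåø")


def r1_start(word: str, minimum: int = 3) -> int:
    """Return p1, the index at which region R1 begins in `word`."""
    p1 = len(word)  # if no non-vowel follows a vowel, R1 is empty
    for i in range(1, len(word)):
        # (a) just after the first non-vowel that follows a vowel
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            p1 = i + 1
            break
    # (b) never before the 3rd letter, but never past the end of the word
    return min(max(p1, minimum), len(word))


for w in ["arbeid", "bilen", "hestene"]:
    print(w, "->", w[r1_start(w):])
# arbeid -> eid
# bilen -> en
# hestene -> tene
```

For "arbeid" the first non-vowel after a vowel is the "r" at index 1, which would put p1 at 2, so rule (b) pushes it to 3 and R1 is "eid".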

Ok, that sorted it out.
Now I've also printed and studied the Snowball manual :)

> Any observations on the stemmer would be useful - I know little about
> Norwegian. Is a stemmer for Nynorsk of any importance?

We need Nynorsk for the project we're working on right now, so one way
or another we have to come up with an algorithm for it.

But as far as I can tell, this algorithm already covers a lot of Nynorsk,
because the endings -ar, -ande, -ast, -ane, -eleg, -eig and -leg are
Nynorsk rather than Bokmål forms.
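To make the point about these endings concrete, here is a hypothetical Python sketch that looks for the longest of the seven listed endings lying wholly inside R1. The names `NYNORSK_ENDINGS` and `longest_ending_in_r1` are made up for illustration, the vowel set is assumed from the Norwegian stemmer, and the ending list is only the one given above, not the stemmer's full suffix table.

```python
# Sketch: find the longest of the endings listed above that lies wholly in R1.
# Assumptions: vowel set and R1 rule as in the Norwegian stemmer; the ending
# list is only the seven endings named in this message, not the full table.
VOWELS = set("aeiouyæåø")
NYNORSK_ENDINGS = ["ar", "ande", "ast", "ane", "eleg", "eig", "leg"]


def r1_start(word: str, minimum: int = 3) -> int:
    """Return p1: after the first non-vowel following a vowel, but at least 3."""
    p1 = len(word)
    for i in range(1, len(word)):
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            p1 = i + 1
            break
    return min(max(p1, minimum), len(word))


def longest_ending_in_r1(word: str, endings=NYNORSK_ENDINGS):
    """Return the longest ending that `word` ends with and that starts at or
    after p1, or None if no listed ending fits inside R1."""
    p1 = r1_start(word)
    fits = [e for e in endings if word.endswith(e) and len(word) - len(e) >= p1]
    return max(fits, key=len, default=None)


for w in ["gutane", "kastande", "vanskeleg", "bar"]:
    print(w, "->", longest_ending_in_r1(w))
# gutane -> ane
# kastande -> ande
# vanskeleg -> eleg
# bar -> None
```

Preferring the longest fitting ending is meant to resemble how Snowball chooses between overlapping endings such as -eleg and -leg; the real stemmer has many more endings and extra conditions, so this is an illustration only.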

> Incidentally, how did you come across Snowball? It is not widely known as yet.

Well, I'm working on an XML content indexer
(http://www.unixmonks.net/xanonton+xiri) and needed a
stemmer; my co-worker pointed me to snowball.sourceforge.net.

-- 
/ Ask Solem Hoel        | GAN Media             \
: +47 48054613          | +47 22707439          :
\ www.unixmonks.net     | www.gan.no/media      /

_______________________________________________ Snowball-discuss mailing list Snowball-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/snowball-discuss


This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:40 BST