Hi Martin,
thank you for the quick response.
Because of your @softhome.net e-mail address, the spam filter I use
(http://junkfilter.zer0.org) moved your message straight to my junk mail folder,
so I didn't see it until now :(
On Tue, Nov 27, 2001 at 08:14:14AM -0700, Martin Porter sent:
> Thanks, Ask, that is most interesting. I think it would be useful eventually
> to have a collection of links to resources from the Snowball site. Could we
> put your version in?
Sure!
So far, the Norwegian version is available here:
http://www.unixmonks.net/~ask/Stemmer-Norwegian-0.5.tar.gz
It works perfectly with the Norwegian diffs.txt from
snowball.sourceforge.net.
But as Oleg said, we need to agree on a namespace and an interface
for Perl ports of the Snowball stemmers.
My co-worker here is also porting it to Java.
> I'm sorry about that. 3.1 is part of an old numbering scheme which I thought
> I'd eliminated. I'll fix it. Go to the Porter stemmer for the definition of
> R1 and R2, although I guess you must know what the definition is.
Thanks!
>
> Mmmm - I think no-one is reading the Snowball manual :-) . It sets p1 to 3
> if it is less than 3. So p1 is (a) after the first non-vowel following a vowel,
> or (b) after the 3rd letter, whichever position is further right. Basically,
> 2 letters is too little for a residual stem in German, and I think Norwegian.
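To make the p1 rule above concrete, here is a minimal Python sketch of how p1 (the start of region R1) could be computed for Norwegian. It assumes lowercase input and the vowel set aeiouyæåø; the function name mark_p1 is hypothetical and not part of Snowball or the Perl port.

```python
# Assumed Norwegian vowel set (lowercase input only); treat it as an assumption.
NORWEGIAN_VOWELS = set("aeiouyæåø")


def mark_p1(word: str, vowels=NORWEGIAN_VOWELS) -> int:
    """Hypothetical helper: return p1, the index where region R1 starts."""
    n = len(word)
    if n < 3:
        # Word shorter than 3 letters: R1 is empty.
        return n
    p1 = n  # default: no vowel followed by a non-vowel, so R1 is empty
    for i in range(1, n):
        # (a) position just after the first non-vowel that follows a vowel
        if word[i - 1] in vowels and word[i] not in vowels:
            p1 = i + 1
            break
    # (b) never leave fewer than 3 letters before R1: take the rightmost position
    return max(p1, 3)


for w in ("arbeid", "stolene", "hestene"):
    print(w, "->", w[mark_p1(w):])
```

With this sketch, "arbeid" would get p1 = 2 from rule (a), which the adjustment raises to 3, so R1 is "eid"; for "stolene", rule (a) already gives 4, leaving R1 = "ene".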
Ok, that sorted it out.
Now I've also printed and studied the Snowball manual :)
> Any observations on the stemmer would be useful - I know little about
> Norwegian. Is a stemmer for Nynorsk of any importance?
We need Nynorsk for the project we're working on right now, so somehow
we must come up with the algorithm.
But as far as I can tell, this algorithm already covers a lot of Nynorsk,
because -ar, -ande, -ast, -ane, -eleg, -eig and -leg are not Bokmål but
Nynorsk endings.
> Incidentally, how did you come across Snowball? It isn't widely known as yet.
Well, I'm working on an XML content indexer
(http://www.unixmonks.net/xanonton+xiri) and needed a
stemmer, and my co-worker pointed me to snowball.sourceforge.net.
--
Ask Solem Hoel | GAN Media
+47 48054613 | +47 22707439
www.unixmonks.net | www.gan.no/media
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss
This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:40 BST