Re: [Snowball-discuss] Slovene stemmer

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Mon Mar 21 2005 - 13:37:48 GMT


Bostjan,

Yes, I have always kept in mind the possibility of putting your Slovene
stemmer among the various snowball stemmers at snowball.tartarus.org.

Various things arose:

I recall that when I asked you for a sample vocab, you sent back a small
text in Slovene (the beginning of a translation of Orwell's 1984, if I
remember correctly), and what I wanted to do was to put together a larger
wordlist, in alphabetical order, derived from a more substantial set of
texts, and then try your stemmer out.
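
A minimal sketch of how such a wordlist could be put together, assuming the
source texts are already available as Python strings. The function name
`build_wordlist` and the sample sentences are illustrative only and are not
part of any Snowball tooling.

```python
import re

# Runs of letters only (Unicode-aware), so Slovene letters such as
# č, š and ž are kept inside words; digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")

def build_wordlist(texts):
    """Return the distinct lowercased words found in `texts`, sorted.

    Note: sorted() orders by code point, so č, š and ž sort after z.
    A locale-aware collation would be needed for true Slovene order.
    """
    words = set()
    for text in texts:
        words.update(w.lower() for w in WORD_RE.findall(text))
    return sorted(words)

# Hypothetical sample input, standing in for a larger set of texts.
sample = ["Bil je jasen dan.", "Dan je bil mrzel."]
wordlist = build_wordlist(sample)
```

Each word in the resulting list could then be passed through the stemmer and
the word/stem pairs inspected side by side.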

I also wanted to rework your program to use 'among' statements. As I said at
the time, this would make it run really fast.
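
As a rough illustration of the idea behind 'among' (pick the longest of a
fixed set of endings by table lookup rather than testing each rule in turn),
here is a sketch in Python rather than Snowball. The helper
`make_suffix_matcher` and the endings used are placeholders, not rules from
the Slovene stemmer, and the lookup strategy is an assumption about how one
might do this, not a description of what the Snowball compiler generates.

```python
def make_suffix_matcher(suffixes):
    """Build a function returning the longest listed suffix of a word."""
    # Precompute the lookup table once; empty strings are ignored.
    table = frozenset(s for s in suffixes if s)
    # Candidate lengths, longest first, so the first hit is the longest match.
    lengths = sorted({len(s) for s in table}, reverse=True)

    def match(word):
        for n in lengths:
            if n <= len(word) and word[-n:] in table:
                return word[-n:]
        return None  # no listed suffix ends this word

    return match

# Placeholder endings, for illustration only.
match = make_suffix_matcher(["s", "es", "ing"])
longest = match("boxes")
```

The cost per word depends on the number of distinct suffix lengths, not on
the number of suffixes in the table.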

One thing that struck me about your stemmer is that (again, if I remember
right) the rules were not based on any measure of syllable length. For the
Snowball stemmers syllable length has proved quite useful -- although for
the Russian one it was less important. I wanted to see how far that mattered.
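
In the Snowball stemmers this kind of measure is usually expressed through
regions such as R1: the part of the word after the first non-vowel that
follows a vowel, or the empty region at the end of the word if there is no
such non-vowel. A minimal sketch follows, assuming a plain `aeiou` vowel set;
the right vowel set for Slovene is not given here, and the function names
are illustrative.

```python
def r1_start(word, vowels="aeiou"):
    """Return the index at which R1 begins in `word`.

    R1 starts just after the first non-vowel that follows a vowel.
    If no such position exists, R1 is empty and len(word) is returned.
    """
    for i in range(1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    return len(word)

def r1(word, vowels="aeiou"):
    """Return the R1 region of `word` as a string (possibly empty)."""
    return word[r1_start(word, vowels):]

region = r1("beautiful")
```

A suffix rule would then be applied only when the suffix lies entirely inside
the region, which stops very short words from being stripped down to nothing.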

I also wanted to look at the Popovic and Willett paper again.

Unfortunately, I have not had too much time to devote to Snowball over the
past six months, so none of this was done. But I would still like to tackle
it. Could you perhaps give me a little more time? Just another month ...

Martin


