[Snowball-discuss] Re: snowball

From: Martin Porter (martin_porter@softhome.net)
Date: Sun Oct 06 2002 - 17:21:01 BST


>.... what is the Snowball project's scope? Do you intend to
>supply other related stuff? Such as stopword lists, accent normalisation code
>(e.g. conflating ä and ae in German), language recognition, etc.
>

That is a very good question.

I was not thinking of adding language recognition or accent normalisation
work, although it seemed to me that lists of stopwords would be useful. But
the lists are available in Xapian, and interestingly no-one has requested
them via Snowball. My lists are also incomplete: I don't have a Finnish
stopword list, for example (the stemmer was developed from a Finnish vocab
list, not a sample of text). And they need some reworking and annotation.

What I hoped initially was that the Snowball site would attract other
contributions. Not necessarily using Snowball itself, but covering stemming.
For example, I offered to put up, in HTML form, a PhD thesis on stemming
evaluation. The offer was turned down, with the result that the work is
still relatively inaccessible.

Of course the main intention was to see stemmers for other languages
developed by other people along similar lines, but that has not happened.

M.


