Re: [Snowball-discuss] evaluation of Snowball stemmers

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Fri Dec 10 2004 - 08:04:58 GMT


Diana,

I have not carefully monitored the use of the stemmers in evaluation work,
although I think it is fairly extensive. (Of course the stemmers are often
used in IR experiments even when stemming itself is not the subject of
evaluation.) But see this paper:

Stephen Tomlinson (2003). Lexical and algorithmic stemming compared for 9
European languages with Hummingbird SearchServer(TM) at CLEF 2003. In Carol
Peters, editor, Working Notes for the CLEF 2003 Workshop, 21-22 August,
Trondheim, Norway.

http://www.stephent.com/ir/papers/clef03.html

Tomlinson (2003) compares the Snowball stemmers with a commercial lexical
stemming (lemmatization) system. Of the nine languages tested, six showed
differences that were not statistically significant, two did better under
the lemmatization system, and one did better under Snowball. I think I have
that right, but you can verify it by looking at the paper.

Given the simplicity and low cost of the Snowball stemmers compared with a
full lemmatization system, I think this is a good result for Snowball.

Unfortunately I have not been able to find out much about the Hummingbird
system, either from Tomlinson's paper or elsewhere.

Martin


