RE: [Snowball-discuss] Undoubling in Dutch stemmer

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Sun Dec 12 2004 - 13:03:53 GMT


Edwin, Blake,

Sorry to have been a while in replying.

If you get into this, you really should look at the Kraaij-Pohlmann (K-P) stemmer,
which attempts the vowel lengthening you mention. I have translated it into
Snowball, and you can find the result at

http://www.snowball.tartarus.org/kp/stemmer.html

(This is not linked to from anywhere else on the Snowball site, I believe.)

There is also a link from this page to the UPLIFT project page, where their
program can be downloaded.

The difficulty with the K-P stemmer is understanding the linguistic
intentions behind the rules.

If you develop the rules you mention, you must of course check their
behaviour against a sample Dutch vocabulary, and assess, rule by rule,
whether it is improving or degrading the stemming process. More exactly: any
rule has its hits and misses. You compare the ratio of misses to hits and
reject the rule if the ratio is uncomfortably large. When developing the
Snowball stemmer and comparing it with the K-P stemmer, I recall trying to
avoid these less successful rules.
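The rule-by-rule check described above can be sketched roughly as follows. This is only an illustration under stated assumptions. Each rule is a function that returns a rewritten word or None when it does not fire. The sample is a hand-judged list of (word, expected form) pairs. A firing counts as a hit when the output matches the expected form and as a miss otherwise. The names `undouble`, `score_rule` and `accept_rule`, the toy sample, and the 0.25 threshold are all hypothetical. They are not taken from Snowball or from the K-P code.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical rule type: returns the rewritten word, or None if the rule does not fire.
Rule = Callable[[str], Optional[str]]


def undouble(word: str) -> Optional[str]:
    """Toy undoubling rule (illustrative): drop the last letter of a final kk, dd or tt."""
    if word[-2:] in ("kk", "dd", "tt"):
        return word[:-1]
    return None


def score_rule(rule: Rule, sample: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Count (hits, misses) of `rule` over hand-judged (word, expected) pairs.

    Only words on which the rule fires are counted: a hit if the output
    equals the expected form, a miss otherwise.
    """
    hits = misses = 0
    for word, expected in sample:
        result = rule(word)
        if result is None:
            continue  # rule does not apply to this word
        if result == expected:
            hits += 1
        else:
            misses += 1
    return hits, misses


def accept_rule(rule: Rule, sample: Iterable[Tuple[str, str]], max_ratio: float = 0.25) -> bool:
    """Keep the rule only if the ratio misses/hits is at most max_ratio (assumed threshold)."""
    hits, misses = score_rule(rule, sample)
    if hits == 0:
        return False  # the rule never helps on this sample, so reject it
    return misses / hits <= max_ratio


# Toy, hand-labelled sample (hypothetical): (form before the rule, form judged correct).
sample = [("bekk", "bek"), ("hadd", "had"), ("watt", "watt"), ("boek", "boek")]
print(score_rule(undouble, sample))   # (2, 1)
print(accept_rule(undouble, sample))  # False, since 1/2 = 0.5 > 0.25
```

What counts as "uncomfortably large" is a judgment call. Here it is reduced to a single `max_ratio` parameter only so that the comparison is explicit.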

I would like to look into this myself again, but don't quite have the time
at present.

Tell us how you get on,

Martin



This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST