RE: [Snowball-discuss] Undoubling in Dutch stemmer

From: Edwin de Jonge
Date: Thu Dec 09 2004 - 14:08:43 GMT

Hi Martin/Blake,

Quite some time ago I raised the same point Blake made. I didn't have
the time then to make a proposal for improving the snowball script, but
I'll try to do this next week. As a native speaker of Dutch, I find
the "kk", "dd", "tt" undoubling rather arbitrary.
Double consonants in Dutch are mainly used to make the previous vowel
short (with the exception that at the end of a word no double consonants
are used).

The idea (not in snowball syntax):
One of the problems with the current Dutch stemmer is that short-vowel
and long-vowel words are stemmed to the same stem.
E.g.: "makken" becomes "mak", and "maken" also becomes "mak" ("maak"
would be more natural).
If the undoubling of consonants is generalized to all consonants, then
the stemmer should compensate for this effect by doubling vowels.

So the following should be done:
1) modify the undouble procedure to do the following:
        if ending in a double consonant
                remove one of the consonants  // generalisation of undouble rule
        otherwise, if ending CVC (consonant, among('a' 'e' 'o' 'u'), consonant)
                double vowel  // make vowel long by doubling it
                if ending among('v', 'z')  // transform 'v' and 'z' into 'f' and 's'
                                           // ('huizen' -> 'huis', 'leven' -> 'leef')
                        'v' <- 'f'
                        'z' <- 's'
2) remove the vowel undoubling (step 4)
I think this change would be an improvement of your Dutch Stemmer.
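
As a rough illustration of the outline above, here is a minimal Python
sketch (not snowball, and not the actual Dutch stemmer). It assumes the
suffix such as "en" has already been removed, so it only sees the
remaining stem. The name adjust_stem, the vowel set (including 'y') and
the helper is_consonant are hypothetical choices made for this example.
One further assumption: the outline nests the 'v'/'z' step under the CVC
test, whereas the sketch applies it to any final 'v' or 'z' so that the
"huizen" example comes out as "huis".

```python
# Hypothetical sketch of the proposed undouble / vowel-doubling step.
VOWELS = set("aeiouy")            # assumption: 'y' counted as a vowel
LONGABLE = set("aeou")            # vowels the proposal doubles
FINAL_DEVOICE = {"v": "f", "z": "s"}


def is_consonant(ch: str) -> bool:
    """True for a letter that is not in the assumed vowel set."""
    return ch.isalpha() and ch not in VOWELS


def adjust_stem(stem: str) -> str:
    """Adjust a stem whose suffix (e.g. 'en') was already removed."""
    # Generalised undoubling: any doubled final consonant loses one copy,
    # and the short vowel before it is left alone.
    if len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem[-1]):
        return stem[:-1]
    # Otherwise a final consonant-vowel-consonant gets its vowel doubled.
    if (len(stem) >= 3
            and is_consonant(stem[-3])
            and stem[-2] in LONGABLE
            and is_consonant(stem[-1])):
        stem = stem[:-2] + stem[-2] * 2 + stem[-1]
    # Assumed reading: a final 'v' or 'z' becomes 'f' or 's'.
    if stem and stem[-1] in FINAL_DEVOICE:
        stem = stem[:-1] + FINAL_DEVOICE[stem[-1]]
    return stem


if __name__ == "__main__":
    for s in ("makk", "mak", "lev", "huiz"):
        print(s, "->", adjust_stem(s))
```

With this sketch "makk" (from "makken") and "mak" (from "maken") no
longer end up as the same stem, which is the point of the proposal.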

It should not be very difficult to translate this into snowball, but I
don't speak snowball fluently (yet). I'll give it a try next week.



> -----Original Message-----
> From: Martin Porter
> Sent: Thursday, 9 December 2004 9:32
> To: Blake Madden;
> Subject: Re: [Snowball-discuss] Undoubling in Dutch stemmer
> I don't recall the details now, but I think I went through
> consonant by consonant trying the effect of undoubling. That
> was my approach in developing the stemmers generally. So if a
> plausible rule is not included it is because, on balance, it
> did not seem to lead to an improvement. Of course any rule
> might be reassessed, and I will store this one for the next
> time I look at the Dutch stemmer.
> You may have noticed that there is not much work being done
> on the stemmers at the moment ...
> Martin


This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST