I raised the same point Blake made quite some time ago, but I haven't yet had
the time to propose an improvement to the Snowball script; I'll try to do this
next week. As a native speaker of Dutch, I find the "kk", "dd", "tt" undoubling
rather arbitrary.
Double consonants in Dutch are mainly used to make the preceding vowel
short (with the exception that double consonants are not written at the end
of a word).
The idea (not in Snowball syntax):
One of the problems with the current Dutch stemmer is that words with short
and long vowels are reduced to the same stem.
E.g.: "makken" becomes "mak", and "maken" also becomes "mak" ("maak" would
be more natural).
If the undoubling of consonants is generalized to all consonants, the stemmer
should compensate by doubling the vowel in stems such as "mak" (from "maken"),
so that they stay distinct from undoubled stems such as "mak" (from "makken").
So the following should be done:
1) Modify the undouble procedure as follows:
   if ending in a double consonant
       remove one of the consonants        // generalisation of the kk/dd/tt undoubling
   otherwise, if ending CVC (consonant among('a' 'e' 'o' 'u') consonant)
       double the vowel                    // make the vowel long by doubling it
   if ending among('v' 'z')                // transform 'v' and 'z' into 'f' and 's'
                                           // ('huizen' -> 'huis', 'leven' -> 'leef')
       'v' <- 'f'
       'z' <- 's'
2) Remove the vowel undoubling (step 4).
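To check the proposed procedure against the examples above, here is a minimal
Python sketch (not Snowball). It assumes the procedure runs on a stem whose
suffix has already been removed; the names `undouble` and `strip_en` and the
vowel set are illustrative assumptions, not part of the existing Dutch stemmer.

```python
# Sketch of the proposed undoubling for Dutch stems (illustrative only).
# Assumption: the input is a stem from which a suffix such as "-en" has
# already been removed by earlier steps of the stemmer.

VOWELS = set("aeiouyè")     # assumed vowel set
LENGTHENABLE = set("aeou")  # vowels the proposal doubles: among('a' 'e' 'o' 'u')


def is_consonant(ch: str) -> bool:
    """A letter that is not in the assumed vowel set."""
    return ch.isalpha() and ch not in VOWELS


def undouble(stem: str) -> str:
    """Apply the proposed step 1 to a stem whose suffix was already removed."""
    # A final double consonant is reduced to a single one
    # (generalisation of the current kk/dd/tt undoubling).
    if len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem[-1]):
        stem = stem[:-1]
    # Otherwise, a final consonant-vowel-consonant with the vowel among
    # a, e, o, u gets its vowel doubled to mark it as long.
    elif (len(stem) >= 3 and is_consonant(stem[-3])
          and stem[-2] in LENGTHENABLE and is_consonant(stem[-1])):
        stem = stem[:-1] + stem[-2] + stem[-1]
    # A final 'v' or 'z' becomes 'f' or 's'.
    if stem.endswith("v"):
        stem = stem[:-1] + "f"
    elif stem.endswith("z"):
        stem = stem[:-1] + "s"
    return stem


def strip_en(word: str) -> str:
    """Hypothetical stand-in for the stemmer's suffix removal: drop a final -en."""
    return word[:-2] if word.endswith("en") else word


for word in ["makken", "maken", "huizen", "leven"]:
    print(word, "->", undouble(strip_en(word)))
# makken -> mak
# maken -> maak
# huizen -> huis
# leven -> leef
```

The "otherwise" between the two branches matters: without it, "makken" would
be undoubled to "mak" and then lengthened to "maak", merging it with "maken"
again. Step 2 is needed for the same reason, since the existing vowel
undoubling (step 4) would turn "maak" back into "mak".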
I think this change would be an improvement to your Dutch stemmer.
It should not be very difficult to translate this into Snowball, but I
don't speak Snowball fluently (yet). I'll give it a try next week.
> -----Original Message-----
> From: Martin Porter [mailto:firstname.lastname@example.org]
> Sent: Thursday 9 December 2004 9:32
> To: Blake Madden; email@example.com
> Subject: Re: [Snowball-discuss] Undoubling in Dutch stemmer
> I don't recall the details now, but I think I went through
> consonant by consonant trying the effect of undoubling. That
> was my approach in developing the stemmers generally. So if a
> plausible rule is not included it is because, on balance, it
> did not seem to lead to an improvement. Of course any rule
> might be reassessed, and I will store this one for the next
> time I look at the Dutch stemmer.
> You may have noticed that there is not much work being done
> on the stemmers at the moment ...