Re: [Snowball-discuss] Dutch stemmer: undouble "nn", "mm", "ff"?

From: Arjen van der Meijden (arjen@glas.its.tudelft.nl)
Date: Thu Jan 01 2004 - 19:39:02 GMT


Martin Porter wrote:

> Edwin,
>
> Thanks for that idea, which I'll try out. There are a number of outstanding
> suggestions to work through, and I must set some time aside to look at them
> early this year.
>
> A new idea of mine: I think apostrophe ought to form part of the alphabet of
> Dutch, and indeed of English. I haven't really had time to put that in though.

Would that allow stemming Dutch words like these:
cd'tje -> cd
tv'tje -> tv
a4'tje -> a4
baby'tje -> baby
(diminutives of abbreviations are formed with 'tje, as are those of
words ending in a consonant followed by a 'y')
pcb's -> pcb
foto's -> foto
taxi's -> taxi
(plurals of abbreviations and of words ending in a, i, o, u or y take
an 's ending)
Wanda's vis -> Wanda, vis
Kees' auto -> Kees, auto
Henks fiets -> Henk, fiets
(possessives take 's, unless the word already ends in an s (or an
s-sound), in which case only the apostrophe is added. After a consonant,
just an s is added)
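For illustration, here is a minimal sketch of what such a rule could look like if the apostrophe were treated as a letter. It is a hypothetical pre-stemming step in Python, not part of the Snowball Dutch stemmer; the function name and the suffix list are my own assumptions, and it only handles the plain ASCII apostrophe.

```python
# Hypothetical pre-stemming step, ASSUMING the apostrophe is part of the
# Dutch alphabet as proposed above. Not part of the Snowball stemmer.
# Suffixes are checked longest first, so 'tje wins over a bare '.
APOSTROPHE_SUFFIXES = ("'tje", "'s", "'")


def strip_apostrophe_suffix(word: str) -> str:
    """Remove a diminutive ('tje), plural/possessive ('s) or bare
    possessive (') ending; return the word unchanged otherwise."""
    for suffix in APOSTROPHE_SUFFIXES:
        # Require something to remain, so a lone "'s" is left alone.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["cd'tje", "baby'tje", "foto's", "Wanda's", "Kees'", "Henks"]:
        print(w, "->", strip_apostrophe_suffix(w))
```

Note that "Henks" carries no apostrophe, so this sketch leaves it untouched; that plain-s possessive would still have to be handled by the stemmer's ordinary suffix rules.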

Or are these already handled?
Anyway, Dutch seems very hard to stem well, at least to me. There are a
lot of rules, and almost every rule has a few exceptions. :)

Best regards,

Arjen van der Meijden



This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST