The Kraaij-Pohlmann stemming algorithm is an ANSI C program for stemming in Dutch. Although
advertised as an algorithm, it is in fact a program without an accompanying
algorithmic description. It is possible to produce a fairly clean Snowball
version, but only by sacrificing exact functional equivalence. But that does not
matter too much, since in the demonstration vocabulary only 32 words out of over
45,000 stem differently. Here they are:
| source | ANSI C stemmer | Snowball stemmer |
| airways | airways | airway |
| algerije | algerije | alrije |
| assays | assays | assay |
| bruys | bruys | bruy |
| cleanaways | cleanaways | cleanaway |
| creys | creys | crey |
| croyden | croyd | croy |
| edele | edel | edeel |
| essays | essays | essay |
| gedijen | gedij | dij |
| geoff | of | off |
| gevrey | gevrey | vrey |
| geysels | ysel | gey |
| grootmeesteres | grootmee | grootmeest |
| gr•otmeesteres | gr•otmee | gr•otmeest |
| hectares | hectaar | hect |
| huys | huys | huy |
| kayen | kayen | kaay |
| lagerwey | lagerwey | larwey |
| mayen | mayen | maay |
| meesteres | meester | meest |
| oppasseres | oppasser | oppas |
| pays | pays | pay |
| royale | royale | royaal |
| schilderes | schilder | schild |
| summerhayes | summerhayes | summerhaye |
| tyumen | tyuum | tyum |
| verheyen | verheyen | verheey |
| verleideres | verleider | verleid |
| ytsen | yts | ytsen |
| yves | yve | yves |
| zangeres | zanger | zang |
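A list of differences like the one above can be produced by running both stemmers over the same vocabulary and keeping only the words whose stems disagree. The following is a minimal Python sketch of that comparison, not the procedure actually used for this page. The function name `stem_differences` is hypothetical, and the two stand-in stemmers are plain lookup tables filled with three rows from the table; the token `placeholder` and its identity mapping are invented purely to show a word on which both stemmers agree.

```python
from typing import Callable, Iterable, List, Tuple


def stem_differences(
    words: Iterable[str],
    stem_a: Callable[[str], str],
    stem_b: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (word, stem_a(word), stem_b(word)) for each word whose stems differ."""
    rows = []
    for word in words:
        a, b = stem_a(word), stem_b(word)
        if a != b:
            rows.append((word, a, b))
    return rows


# Stand-in stemmers (assumption): lookup tables built from three table rows.
# Words missing from a table are returned unchanged, which is a placeholder
# behaviour and says nothing about what the real stemmers would do.
ANSI_C_ROWS = {"airways": "airways", "edele": "edel", "zangeres": "zanger"}
SNOWBALL_ROWS = {"airways": "airway", "edele": "edeel", "zangeres": "zang"}


def ansi_c_stem(word: str) -> str:
    return ANSI_C_ROWS.get(word, word)


def snowball_stem(word: str) -> str:
    return SNOWBALL_ROWS.get(word, word)


vocabulary = ["airways", "placeholder", "zangeres"]
for word, a, b in stem_differences(vocabulary, ansi_c_stem, snowball_stem):
    print(f"| {word} | {a} | {b} |")
```

In a real comparison the two callables would wrap the ANSI C program and the compiled Snowball stemmer, and the vocabulary would be the full demonstration word list.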
The Kraaij-Pohlmann stemmer can make fairly drastic reductions to a word. For
example, infixed ge is removed, so geluidgevoelige stems to
luidvoel. Often, therefore, the original word cannot be easily guessed from
the stemmed form.
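To make the effect concrete, here is a deliberately naive Python toy that reproduces the single example above. It is not the Kraaij-Pohlmann rule set: the three steps, their order, and the function name `toy_reduce` are all assumptions chosen only so that geluidgevoelige comes out as luidvoel. The real stemmer applies its removals under conditions that this toy ignores, so the toy will mangle most other words.

```python
def toy_reduce(word: str) -> str:
    """Toy reduction for illustration only; NOT the Kraaij-Pohlmann rules."""
    # Assumed step 1: drop a final "ige" if present.
    if word.endswith("ige"):
        word = word[:-3]
    # Assumed step 2: drop a leading "ge".
    if word.startswith("ge"):
        word = word[2:]
    # Assumed step 3: drop the first remaining internal "ge" (the "infix"),
    # provided it is not at the very end of the word.
    idx = word.find("ge", 1)
    if idx != -1 and idx + 2 < len(word):
        word = word[:idx] + word[idx + 2:]
    return word


print(toy_reduce("geluidgevoelige"))
```

Because three separate pieces are deleted, the output no longer shows where they stood, which is why the original word is hard to recover from the stem.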
Here then is the Snowball equivalent of the Kraaij-Pohlmann algorithm.