Re: [Snowball-discuss] German suffix stripping not complete

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Mon Jan 30 2006 - 10:05:46 GMT


Karl,

I'm sure the reason it was not done is that the group of such words is so
small. What you would certainly need to do is check for the ending -los, and
not remove the -s in that case. If you take the sample vocabulary provided
with the German stemmer, you then get the following residual list:

        ambros amos autos bartholomaios büros chaos credos fotos
        haemorrheos heros hos infos jethros jos lebensmittelembargos
        migros moos mythos pharaos platos salomos studios theophrastos
        wahlbüros wos

25 words in all. The -s could be removed with benefit, or at least without
harm, from all or almost all of these words. There is some overlap here with
your own word list.
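The rule discussed above (strip the final -s from words ending in -os, but
leave words ending in -los alone) can be sketched as below. This is a
hypothetical illustration only: the function name strip_os_plural is made up,
and it works on the whole lower-case word, whereas the real Snowball German
stemmer only removes suffixes inside its R1/R2 regions and applies many
other steps that are ignored here.

```python
def strip_os_plural(word: str) -> str:
    """Hypothetical rule: drop the final -s of a word ending in -os,
    except when it ends in -los (e.g. 'los', 'haarlos')."""
    # Assumption: the whole word is examined; Snowball's R1/R2 region
    # restrictions and the rest of the German algorithm are not modelled.
    if word.endswith("os") and not word.endswith("los"):
        return word[:-1]
    return word


# The residual list from the German sample vocabulary quoted above.
residual = """ambros amos autos bartholomaios büros chaos credos fotos
haemorrheos heros hos infos jethros jos lebensmittelembargos
migros moos mythos pharaos platos salomos studios theophrastos
wahlbüros wos""".split()

stripped = [strip_os_plural(w) for w in residual]
```

With the -los check in place, 'haarlos' and 'los' are kept intact, while
words such as 'autos' and 'büros' become 'auto' and 'büro'.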

Thank you for pointing this out. I will review the German algorithm at some
point in the future, and possibly incorporate your suggestion.

Martin

>Hello list,
>
>I'm wondering if there is a good reason for the German stemmer not to
>strip the suffix s from words ending in 'os':
>autos, kinos, echos, büros, silos, pianos, etc.
>
>Here are some words you can consider.
>
>Albatros, apropos, chaos, epos, kosmos, gros, rigoros, grandios, los, haarlos.
>
>All the ones I can think of would be pretty much fine with the suffix stripped.
>



This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:47 BST