Re: [Snowball-discuss] Norwegian stemmer

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Wed Apr 20 2005 - 10:51:36 BST


Jan,

I apologise for taking so long to get back to the stemming work, but I have
been busy with other things. You suggested (13 Jan) removing k from
define_s_ending in the Norwegian stemmer. In the sample vocabulary, about one
word in a thousand is affected by this. Here is the list:

-ks word          current action     Bruusgaard adjustment
--------          --------------     ---------------------
boks              bok            +   boks
brokks            brokk          0   brokks
danmarks          danmark        0   danmarks
fisks             fisk           ?   fisks
foretaks          foretak        0   foretaks
heks              hek            +   heks
innenriks         innenrik       ?   innenriks
inneriks          innerik        ?   inneriks
instruks          instruk        +   instruks
juks              juk            +   juks
laks              lak            +   laks
markedsinndeks    markedsinndek  ?   markedsinndeks
paradoks          paradok        +   paradoks
seks              sek            ?   seks
straks            strak          ?   straks
styreinstruks     styreinstruk   ?   styreinstruks
utenriks          utenrik        +   utenriks
verks             verk           ?   verks
voks              vok            +   voks

I think I will adopt your suggestion, since the eight forms marked + are (I
believe) improved by the rule, while only the three forms marked 0 are made
worse. I cannot judge the forms marked ?.
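To make the effect of the change concrete, here is a minimal Python sketch of the final-s test, under a deliberately simplified model: a final s is removed when the letter before it belongs to the s_ending grouping. The letter set S_ENDING_WITH_K is an assumption about the grouping, and strip_final_s is a hypothetical helper, not the actual Snowball source. The real stemmer also requires the suffix to lie in region R1 and handles many other endings, which this sketch ignores.

```python
# Hypothetical, simplified model of the final-s rule discussed above.
# Assumption: S_ENDING_WITH_K approximates the s_ending grouping before
# the change; the real stemmer also checks region R1 and other suffixes.
S_ENDING_WITH_K = frozenset("bcdfghjklmnoprtvyz")
S_ENDING_WITHOUT_K = S_ENDING_WITH_K - {"k"}  # the proposed change


def strip_final_s(word: str, s_ending: frozenset) -> str:
    """Remove a final 's' if the letter before it is in s_ending."""
    if len(word) >= 2 and word.endswith("s") and word[-2] in s_ending:
        return word[:-1]
    return word


# Compare the two groupings on a few words: -ks words are now left alone,
# while other s-endings (e.g. bils) are still stripped.
for w in ["boks", "danmarks", "verks", "bils"]:
    print(w, strip_final_s(w, S_ENDING_WITH_K), strip_final_s(w, S_ENDING_WITHOUT_K))
```

Under this model, dropping k leaves every -ks word unstemmed, which is why the + forms (boks, laks) improve and the 0 forms (danmarks, foretaks) lose their genitive stripping.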

I wonder if you could comment on this list as a native speaker of Norwegian?

Martin


