Jan,
I apologise for taking so long to get back to the stemming work, but I have
been busy with other things. You suggested (13 Jan) removing k from
define_s_ending in the Norwegian stemmer. In the sample vocabulary about one
word per thousand is affected by this. Here is the list:
-ks word        current action  mark  Bruusgaard adjustment
--------        --------------  ----  ---------------------
boks            bok             +     boks
brokks          brokk           0     brokks
danmarks        danmark         0     danmarks
fisks           fisk            ?     fisks
foretaks        foretak         0     foretaks
heks            hek             +     heks
innenriks       innenrik        ?     innenriks
inneriks        innerik         ?     inneriks
instruks        instruk         +     instruks
juks            juk             +     juks
laks            lak             +     laks
markedsinndeks  markedsinndek   ?     markedsinndeks
paradoks        paradok         +     paradoks
seks            sek             ?     seks
straks          strak           ?     straks
styreinstruks   styreinstruk    ?     styreinstruks
utenriks        utenrik         +     utenriks
verks           verk            ?     verks
voks            vok             +     voks
I think I will adopt your suggestion, since the forms marked + are (I believe)
improved by the rule, although the forms marked 0 are made worse. The forms
marked ? I cannot interpret.
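To make the effect of the change concrete, here is a rough sketch in Python
(not Snowball) of the s-removal step with and without k. It is a
simplification under stated assumptions: the letter set below is assumed for
illustration and may not match define_s_ending exactly, the helper name
strip_s is hypothetical, and the real stemmer applies further conditions
(such as the R1 region) that are ignored here.

```python
# Assumed letter set for illustration; the real define_s_ending may differ.
S_ENDING_WITH_K = set("bcdfghjklmnoprtvyz")
# The suggested adjustment: the same set with 'k' removed.
S_ENDING_WITHOUT_K = S_ENDING_WITH_K - {"k"}


def strip_s(word, s_ending):
    """Drop a final 's' when the letter before it is in s_ending.

    Hypothetical helper: it ignores the region checks and the other
    suffix rules that the full stemmer applies.
    """
    if len(word) >= 2 and word.endswith("s") and word[-2] in s_ending:
        return word[:-1]
    return word


if __name__ == "__main__":
    # A few words taken from the list above.
    for word in ["boks", "laks", "danmarks", "foretaks"]:
        current = strip_s(word, S_ENDING_WITH_K)
        adjusted = strip_s(word, S_ENDING_WITHOUT_K)
        print(f"{word:<10} {current:<10} {adjusted}")
```

With k in the set the final s is dropped from every -ks word; without it the
words are left alone, which is the difference between the two stem columns in
the list.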
I wonder if you could comment on this list as a native speaker of Norwegian?
Martin