[Snowball-discuss] Finnish stemmer diff file

From: Blake Madden (madden_blake@hotmail.com)
Date: Wed May 12 2004 - 19:00:02 BST

In the Finnish stemmer's diff file (the text file that lists Finnish words
alongside their stemmed equivalents), a few entries contain an uppercase
'Ä'. This can be confusing, given that the stemmers are meant to work only
with lowercased text. Here is one example:

edelliseltÄ edelliseltÄ
edelliseltä edellis

This gives the impression that there is something special about 'Ä', as if
it were a special consonant, and it makes "edelliseltÄ" and "edelliseltä"
look like entirely different words. That is not the case. In reality,
"edelliseltÄ" was not stemmed correctly because it was not lowercased
first. As I said, this is just a little confusing.
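
To illustrate the point, here is a minimal Python sketch of how such entries
could be spotted and lowercased before they reach a stemmer. The function
names are hypothetical and are not part of Snowball; the sketch only assumes
that the word list is available as Unicode strings.

```python
import unicodedata


def normalize_for_stemmer(word: str) -> str:
    """Compose the word to NFC, then lowercase it, so 'Ä' becomes 'ä'."""
    return unicodedata.normalize("NFC", word).lower()


def entries_needing_normalization(words):
    """Return the words that change under normalization, i.e. those that
    contain uppercase letters or decomposed characters."""
    return [w for w in words if normalize_for_stemmer(w) != w]


# The two entries quoted from the diff file.
sample = ["edelliseltÄ", "edelliseltä"]

# Lists the entries that were not in lowercased form.
print(entries_needing_normalization(sample))

# After normalization both entries are the same word, so a stemmer
# would see identical input for them.
print(normalize_for_stemmer(sample[0]) == normalize_for_stemmer(sample[1]))
```

Running the word list through a check like this before generating the diff
file would keep uppercase entries out of it.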


