Re: [Snowball-discuss] More patches

From: Olly Betts
Date: Thu Feb 15 2007 - 10:58:06 GMT

On Mon, Feb 12, 2007 at 08:30:28AM +0000, Olly Betts wrote:
> This adds a "make check" rule which verifies that the UTF-8 and
> ISO-8859-1 versions of the stemmers actually produce the expected
> output on the test vocabulary. To simplify the implementation
> of this, the patch also converts all the voc.txt and output.txt
> files to UTF-8 (the romanian ones were already) - I just ran them
> through iconv with suitable options to do this:

I've just discovered that this patch incorrectly converted the romanian1
files to UTF-8 even though they were already in UTF-8 (the "make check"
rule didn't catch this because romanian1 isn't built into libstemmer by
default). Sorry about that.
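One way to avoid this kind of double conversion is to convert a file only if it isn't already valid UTF-8. The sketch below assumes the non-UTF-8 files are ISO-8859-1, and the function names to_utf8 and convert_file are hypothetical, not part of the Snowball tree. It is a heuristic: ISO-8859-1 text that happens to form valid UTF-8 byte sequences would be left alone, though that is unlikely for real vocabulary files.

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return data as UTF-8, converting only if it is not already valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume it is in source_encoding and convert it.
        return data.decode(source_encoding).encode("utf-8")
    # Already valid UTF-8 (pure ASCII included): leave the bytes untouched.
    return data


def convert_file(path: Path, source_encoding: str = "iso-8859-1") -> bool:
    """Rewrite path as UTF-8 in place; return True if the file was changed."""
    original = path.read_bytes()
    converted = to_utf8(original, source_encoding)
    if converted != original:
        path.write_bytes(converted)
        return True
    return False
```

Running an unconditional ISO-8859-1 to UTF-8 conversion over a file that is already UTF-8 re-encodes each multi-byte sequence, so "î" (bytes C3 AE) would become "Ã®" (bytes C3 83 C2 AE). The guard in to_utf8 skips such files instead.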

This patch reverts those files to their original state:

A related issue: a small number of examples in the hungarian vocabulary
contain upper case ASCII letters. Would it make sense to just change
these to lower case for consistency with the other test

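If those entries were changed, a minimal sketch of lower-casing only the ASCII letters A-Z (leaving accented letters such as "É" or "ő" untouched) could look like this. The helper name lower_ascii is hypothetical, and restricting the change to ASCII is an assumption based on the description above.

```python
import string

# Translation table mapping only ASCII A-Z to a-z; every other character,
# including accented Hungarian letters, passes through unchanged.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lower_ascii(word: str) -> str:
    """Lower-case the ASCII letters in word and nothing else."""
    return word.translate(_ASCII_LOWER)
```

For example, lower_ascii("ÉVA") gives "Éva", whereas str.lower() would give "éva". If output.txt is line-aligned with voc.txt, as the "make check" comparison implies, the corresponding expected stems would presumably need regenerating after such a change.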

This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:49 BST