Re: [Snowball-discuss] More patches

From: Olly Betts
Date: Thu Feb 15 2007 - 11:24:40 GMT

On Mon, Feb 12, 2007 at 08:30:28AM +0000, Olly Betts wrote:
> This adds a "make check" rule which verifies that the UTF-8 and
> ISO-8859-1 versions of the stemmers actually produce the expected
> output on the test vocabulary.

This patch extends the rules so that "make check" prints a warning for
any algorithm/encoding combination that has no test data. The warning
isn't triggered by the sources as shipped, but it's useful if you
enable other algorithms:
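
The logic of such a check can be sketched in Python. This is an
illustration only, not the patch itself: the function name
check_combinations, the directory layout (<data_dir>/<algorithm>/voc.txt
and output.txt), and the encoding labels are all assumptions made for
the example.

```python
import os
import sys


def check_combinations(combos, data_dir, warn=sys.stderr):
    """Warn about (algorithm, encoding) pairs that have no test data.

    Assumed layout (hypothetical): each algorithm has a directory under
    data_dir containing voc.txt (input words) and output.txt (expected
    stems). Returns the list of pairs for which data is missing.
    """
    missing = []
    for algorithm, encoding in combos:
        voc = os.path.join(data_dir, algorithm, "voc.txt")
        out = os.path.join(data_dir, algorithm, "output.txt")
        if not (os.path.isfile(voc) and os.path.isfile(out)):
            # Warn rather than fail, so the remaining checks still run.
            print(f"warning: no test data for {algorithm} ({encoding})",
                  file=warn)
            missing.append((algorithm, encoding))
    return missing
```

Warning instead of failing means that enabling an extra algorithm
doesn't break the check run for everything else.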

Alternatively, perhaps we should just generate test data by running a
suitable vocabulary through the stemming algorithm - that would at least
allow checking that changes to the snowball compiler and runtime don't
introduce regressions. The missing data is for lovins, german2, and
romanian2, and we already have english, german, and romanian
vocabularies for other stemmers. If that seems a better approach, I'm
happy to provide a patch to do that instead.
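
As a rough sketch of that alternative, assuming the stemmer can be
called as a plain function from word to stem: record the current output
once as a baseline, then compare later runs against it. The names
generate_expected, find_regressions, and toy_stem are hypothetical, and
toy_stem is only a stand-in, not a real Snowball stemmer.

```python
def generate_expected(vocabulary, stem):
    """Record the current stemmer output as the baseline for each word."""
    return [stem(word) for word in vocabulary]


def find_regressions(vocabulary, expected, stem):
    """Return (word, expected, actual) for each word whose stem changed."""
    regressions = []
    for word, want in zip(vocabulary, expected):
        got = stem(word)
        if got != want:
            regressions.append((word, want, got))
    return regressions


def toy_stem(word):
    # Stand-in stemmer for illustration: strips one trailing "s".
    return word[:-1] if word.endswith("s") else word


vocabulary = ["cats", "dog", "houses"]
baseline = generate_expected(vocabulary, toy_stem)

# An unchanged stemmer reproduces the baseline exactly.
unchanged = find_regressions(vocabulary, baseline, toy_stem)

# A changed stemmer (here: one that does nothing) shows up as a diff.
changed = find_regressions(vocabulary, baseline, lambda word: word)
```

Data generated this way only shows that the output hasn't changed, not
that it was correct to begin with, which is the limitation noted above.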


