Re: [Snowball-discuss] Download tarball inconsistencies

From: richard@lemurconsulting.com
Date: Sun Sep 10 2006 - 10:26:36 BST


On Sun, Sep 10, 2006 at 04:59:24AM +0100, Olly Betts wrote:
> There are inconsistencies in which .sbl files are included in the
> different downloads available. Here's a list (the first number is the
> file size):

> I find it somewhat surprising that they don't contain exactly the same
> set of .sbl files!

They should do now. (And the timestamps should be the same, too, not that
they're particularly meaningful.)

The stem.sbl files assume Latin-1 encoding - but since, for the
characters they accept, the Latin-1 codes are the same as the Unicode
code points, they can be compiled as Unicode algorithms using the
appropriate switch to the snowball compiler (IIRC, -u). The encodings
expected by the other stem-*.sbl files should be obvious.
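
To illustrate why the Latin-1 sources carry over to Unicode, here is a
minimal Python sketch (not part of Snowball; the function and variable
names are made up for this example). It only checks that each Latin-1
byte value decodes to the Unicode code point with the same number, and
shows that the byte representation differs once text is stored as UTF-8.

```python
# Latin-1 maps each byte value 0x00-0xFF to the Unicode code point with
# the same number, so character codes written for Latin-1 input also
# identify the corresponding Unicode characters.

def latin1_matches_unicode() -> bool:
    """Return True if every Latin-1 byte decodes to the same-numbered code point."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


# Example character: e-acute is byte 0xE9 in Latin-1 and U+00E9 in Unicode,
# but it occupies two bytes when the text is encoded as UTF-8.
E_ACUTE = "\u00e9"
latin1_bytes = E_ACUTE.encode("latin-1")
utf8_bytes = E_ACUTE.encode("utf-8")

if __name__ == "__main__":
    print(latin1_matches_unicode())
    print(latin1_bytes, utf8_bytes)
```

So the character codes agree, but the bytes on disk do not, which is
presumably why a compiler switch is still needed to get a stemmer that
reads Unicode input.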

> I'm also somewhat confused since when I look at CVS, I only see stem.sbl
> in any language directory (and there are no directories for the romanian
> stemmers). So where are these other versions of the .sbl files coming
> from? And how are the new Romanian stemmers getting in there?

I've fixed the link to the CVS repository. We changed to using this ages
ago, and I should have noticed the stale link long before now, sorry.

-- 
Richard


