Re: [Snowball-discuss] Snowball API versioning

From: Richard Boulton (richard@lemurconsulting.com)
Date: Fri May 04 2007 - 14:42:47 BST


Oleg Bartunov wrote:
> Hi there,
>
> I'm asking about API versioning to let third-party products track
> Snowball changes. We use the Snowball stemmer in our full-text search
> engine for PostgreSQL. Currently it's an extension module, but we expect
> it will become the built-in core FTS in the next release, which should
> happen in two months.
> There were several (we noticed 2) API changes this year. It's a headache!
> I suggest following a standard major.minor versioning scheme.

Some kind of versioning would indeed be a good idea, but it's not clear
to me which API changes you're referring to. As far as I can see, these
are the ways in which the code available from Snowball can change:

1. Internal changes to the compiler, resulting in the generated stemmer
code being different, but behaving the same.
2. New features added to the Snowball language, with old .sbl files
still producing equivalent output.
3. Changes to the definition of the Snowball language, resulting in .sbl
files no longer producing equivalent output.
4. Changes to a Snowball script, such that it produces different output.
5. Changes to the libstemmer interface (i.e. the libstemmer.h file, for C).

IIRC, there have been several changes of type 1, but none of 2 or 3 in
recent months/years. There have been no changes to libstemmer.h since
August 2005.

Therefore, I suspect you're talking about changes of type 4. I would
like to add versioning to the stemming algorithms at some point, such
that each change to an algorithm increments its version number, but I
haven't had time to do this yet.

Also, I would like to modify the libstemmer interface such that the
current version of a stemming algorithm can be obtained, and also such
that a particular version of a stemming algorithm can be requested. It
would also be possible to compile a version of libstemmer such that
several old versions of a particular stemmer were available. This would
allow a database to store the version number of the stemmer used for
indexing, so that searches can use the same stemmer version. A newly
created database, however, would simply use the latest stemmer version.
Again, I simply haven't had time to do this yet.
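
To make the idea above concrete, here is a rough sketch in Python of what
such an interface might look like. It is only an illustration: libstemmer
has no such API today, and every name in it (StemmerRegistry, register,
latest_version, get, and the "toy" algorithm with its two revisions) is
hypothetical. Versions are plain integers, one counter per algorithm.

```python
# Hypothetical sketch only: none of these names exist in libstemmer.
from typing import Callable, Dict, Optional

Stemmer = Callable[[str], str]


class StemmerRegistry:
    """Holds several versions of each stemming algorithm side by side."""

    def __init__(self) -> None:
        # algorithm name -> {version number -> stemming function}
        self._versions: Dict[str, Dict[int, Stemmer]] = {}

    def register(self, algorithm: str, version: int, stemmer: Stemmer) -> None:
        self._versions.setdefault(algorithm, {})[version] = stemmer

    def latest_version(self, algorithm: str) -> int:
        """Return the highest version number registered for an algorithm."""
        return max(self._versions[algorithm])

    def get(self, algorithm: str, version: Optional[int] = None) -> Stemmer:
        """Return the requested version, or the latest if none is given."""
        available = self._versions[algorithm]
        if version is None:
            version = max(available)
        if version not in available:
            raise LookupError(
                f"{algorithm} version {version} is not compiled in"
            )
        return available[version]


# Toy stand-ins for two revisions of one algorithm (not real Snowball output).
def toy_v1(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


def toy_v2(word: str) -> str:
    # Revision 2 changes the output for words ending in "ies".
    if word.endswith("ies"):
        return word[:-3] + "y"
    return toy_v1(word)


registry = StemmerRegistry()
registry.register("toy", 1, toy_v1)
registry.register("toy", 2, toy_v2)

# A newly created database records the latest version at index time.
new_db_version = registry.latest_version("toy")

# An existing database that stored version 1 keeps asking for version 1,
# so query terms are stemmed the same way as the indexed terms.
old_db_stemmer = registry.get("toy", 1)
```

The point of the design is that a type 4 change never silently alters the
stems an existing index was built with; the application decides when to
reindex and move to a newer version.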

For now, I recommend that you simply take a copy of libstemmer into your
distribution, and update that static version of libstemmer as
appropriate when you make new releases of your distribution.

I don't think that a major.minor versioning scheme would be appropriate
here, but maybe you are thinking of something different to me (in which
case, please enlighten me).

-- 
Richard


