Re: [Snowball-discuss] More patches

From: Olly Betts (olly@survex.com)
Date: Mon Feb 12 2007 - 15:41:40 GMT


On Mon, Feb 12, 2007 at 01:31:25PM +0000, Richard Boulton wrote:
> Olly Betts wrote:
> >I'm currently updating Xapian to use UTF-8 stemmers generated by the
> >latest version of snowball. I've patched the snowball compiler to
> >generate the stemmers as C++ classes, and I'm embedding the patched
> >compiler in the Xapian build system, so Xapian users can easily drop
> >in new stemmers.
>
> I'd be interested in adding a "C++" output mode to snowball, so patches
> to do this would probably be accepted.

They're probably not generic enough currently, but I'm working on that.

> Ideally, I'd like to make a C++ version of the libstemmer library, and
> maintain it in Snowball rather than Xapian. In particular, it would
> seem useful to me for developers to be able to link against a
> system-wide snowball dynamic library, rather than the specific version
> compiled into Xapian. However, that discussion possibly belongs on the
> Xapian mailing lists rather than here, and for now whatever works is
> fine by me. :)

The problem with this approach is that a change to a stemming algorithm
makes an index containing stemmed terms somewhat incompatible, so you
don't want the stemming algorithms to update underneath you. You want
to know they're going to update and rebuild the index, then switch the
new index in with an updated search frontend in one go. And it makes
sense to update the stemmers when there are incompatible changes to
the index for other reasons, so you'd have to rebuild anyway.

Next patch - this renames uses of variables called "c" in the
generated code. In the generated C++ code, I move "struct SN_env"
to be the base class, so these would shadow the base class member
"c" (i.e. "z->c" in the C code), but I believe it also improves
the clarity of the generated code. I've also combined "int ret;"
followed by "ret =" into a single statement, and eliminated an
inner block in one case:

http://oligarchy.co.uk/xapian/patches/snowball-dont-use-c.patch

Cheers,
    Olly


