[Snowball-discuss] Local variables for snowball

From: Olly Betts (olly@survex.com)
Date: Mon Sep 11 2006 - 12:11:50 BST

I was looking at the generated C code and thinking it would be nice to
be able to make some variables local rather than putting them all in
SN_env. Dereferencing through SN_env must add some overhead - not much
per invocation, but for a lot of text it will add up. There's also
overhead to allocate and deallocate, but that matters less as people
tend to create a stemmer once and stem a lot of words with it. It may
also be useful to be able to write recursive routines where each nested
invocation has its own copy of a local variable.
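To illustrate the recursion point, here is a minimal C sketch (not taken
from the patch or from real generated code). The struct toy_env and both
function names are hypothetical stand-ins: toy_env plays the role of a
single shared slot in SN_env, and each routine records its own level,
recurses, and then reports what it recorded.

```c
/* Hypothetical stand-in for SN_env: one shared slot per variable. */
struct toy_env {
    int level;
};

/* Shared-slot version: the nested call overwrites env->level, so the
 * value read back after recursing belongs to the innermost call. */
int own_level_shared(struct toy_env *env, int level) {
    env->level = level;
    if (level > 0)
        own_level_shared(env, level - 1);
    return env->level;
}

/* Local-variable version: each invocation has its own copy on the
 * stack, so the nested call cannot disturb it. */
int own_level_local(int level) {
    int my_level = level;
    if (level > 0)
        own_level_local(level - 1);
    return my_level;
}
```

Called with level 3, the shared-slot version reports 0 (the innermost
call's value) while the local version reports 3.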

The first cut of a patch to implement this is here (including an update
for the Snowball manual):


So far I've done integers and booleans, but not strings as they're a
little more work.

And here's an example of how it can be used in the English stemmer
(also included in the patch). By hand-inlining "prelude" and "postlude"
into "stem", Y_found can be made a local variable:

    define stem as (
        booleans ( Y_found )

        exception1 or
        not hop 3 or (
            ( // prelude
                do ( ['{'}'] delete)
                do ( ['y'] <-'Y' set Y_found)
                do repeat(goto (v ['y']) <-'Y' set Y_found)
            )
            do mark_regions
            backwards (

                do Step_1a

                exception2 or (

                    do Step_1b
                    do Step_1c

                    do Step_2
                    do Step_3
                    do Step_4

                    do Step_5
                )
            )
            ( // postlude
                Y_found repeat(goto (['Y']) <-'y')
            )
        )
    )

And in the generated code, we now have:

    extern int english_UTF_8_stem(struct SN_env * z) {
            int v_Y_found = 0;
            { int c = z->c; /* or, line 196 */
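As a rough picture of the difference this makes, here is a small C
sketch. It is not the real generated code: struct sketch_env, its
fields, and both function names are assumptions made up for
illustration, loosely modelled on keeping booleans in an array reached
through the environment pointer.

```c
/* Hypothetical, simplified environment; not the real SN_env layout. */
struct sketch_env {
    const char *p;       /* text being processed */
    int l;               /* length of the text */
    unsigned char *B;    /* boolean variables, one slot each */
};

/* Before: the flag lives in the environment, so every read and write
 * goes through the z pointer and then the B array. */
int has_y_env(struct sketch_env *z) {
    int i;
    z->B[0] = 0;
    for (i = 0; i < z->l; i++)
        if (z->p[i] == 'y') z->B[0] = 1;
    return z->B[0];
}

/* After: the flag is an automatic variable, which the compiler is free
 * to keep in a register for the whole routine. */
int has_y_local(struct sketch_env *z) {
    int v_Y_found = 0;
    int i;
    for (i = 0; i < z->l; i++)
        if (z->p[i] == 'y') v_Y_found = 1;
    return v_Y_found;
}
```

Both versions return the same answer for the same input; the only
difference is where the flag is stored while the routine runs.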

I've verified this modified English stemmer still gives the same results
on the sample vocabulary.

Does this language extension seem suitable for inclusion? If so, I'll
add support for strings and see if I can get the Java code generator to
implement it too.

