I was looking at the generated C code and thinking it would be nice to
be able to make some variables local, rather than putting them all in
SN_env. Accessing them through the z pointer must add some overhead: not
much per invocation, but over a lot of text it will add up. There's also
overhead in allocating and freeing them along with SN_env, but that
matters less, as people tend to create a stemmer once and stem a lot of
words with it. Local variables may also be useful for writing recursive
routines where each nested invocation gets its own copy of the variable.
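To illustrate the recursion point, here's a small hand-written C sketch (not generator output). `struct env` and its `marker` field are hypothetical stand-ins for SN_env and an integer stored in it. When the value lives in the struct, every nested call shares one slot. A C local gives each call its own.

```c
#include <assert.h>

/* Hypothetical stand-in for struct SN_env: the real struct has other
   fields; only one integer slot is modelled here. */
struct env {
    int marker; /* a Snowball integer declared at the top level lives here */
};

/* Recursive routine keeping its integer in the struct: every nested call
   reads and writes the same slot through the pointer z. */
int depth_via_struct(struct env * z, int n) {
    assert(n >= 0);
    z->marker = n;                        /* write through the pointer */
    if (n > 0) depth_via_struct(z, n - 1);
    return z->marker;                     /* overwritten by the inner calls */
}

/* The same routine using a C local: each invocation has its own copy. */
int depth_via_local(int n) {
    assert(n >= 0);
    int marker = n;                       /* lives in this call's stack frame */
    if (n > 0) depth_via_local(n - 1);
    return marker;                        /* untouched by the inner calls */
}
```

With n = 3 the struct version returns 0, because the innermost call overwrote the shared slot, while the local version returns 3.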
A first cut of a patch implementing this (including an update to the
Snowball manual) is here:
http://oligarchy.co.uk/xapian/patches/snowball-local-variables.patch
So far I've done integers and booleans, but not strings, as they're a
little more work.
And here's an example of how it can be used in the English stemmer
(also included in the patch). By hand-inlining "prelude" and "postlude"
into "stem", Y_found can be made a local variable:
    define stem as (
        booleans ( Y_found )
        exception1 or
        not hop 3 or (
            ( // prelude
                do ( ['{'}'] delete)
                do ( ['y'] <-'Y' set Y_found)
                do repeat(goto (v ['y']) <-'Y' set Y_found)
            )
            do mark_regions
            backwards (
                do Step_1a
                exception2 or (
                    do Step_1b
                    do Step_1c
                    do Step_2
                    do Step_3
                    do Step_4
                    do Step_5
                )
            )
            ( // postlude
                Y_found repeat(goto (['Y']) <-'y')
            )
        )
    )
And in the generated code, we now have:
    extern int english_UTF_8_stem(struct SN_env * z) {
        {
            int v_Y_found = 0;
            { int c = z->c; /* or, line 196 */
            [...]
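For comparison, here's a hand-written C sketch of what the prelude and postlude do with Y_found held as a plain local flag. It's an approximation for lowercase ASCII input only, not what the generator emits: the function name prelude_postlude is hypothetical, it returns the flag purely so it can be inspected, and the stemming steps in between are left out. The point is that Y_found is only needed between the prelude and the postlude of a single call, so nothing has to outlive the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The English stemmer's grouping v: a e i o u y (lowercase only). */
static bool is_v(char ch) {
    return ch != '\0' && strchr("aeiouy", ch) != NULL;
}

/* Hypothetical hand-written analogue of the prelude and postlude in "stem",
   working in place on a NUL-terminated lowercase ASCII word. The stemming
   steps in between are omitted. Returns Y_found only so callers can see it. */
bool prelude_postlude(char * w) {
    assert(w != NULL);
    bool Y_found = false; /* plays the role of v_Y_found: local to this call */

    /* prelude: delete a leading apostrophe (memmove also shifts the NUL) */
    if (w[0] == '\'') memmove(w, w + 1, strlen(w));
    /* prelude: an initial y becomes Y */
    if (w[0] == 'y') { w[0] = 'Y'; Y_found = true; }
    /* prelude: each y after a vowel becomes Y, scanning left to right */
    size_t n = strlen(w);
    for (size_t i = 1; i < n; i++) {
        if (w[i] == 'y' && is_v(w[i - 1])) { w[i] = 'Y'; Y_found = true; }
    }

    /* ... mark_regions and Step_1a to Step_5 would run here ... */

    /* postlude: turn Y back into y, but only if the prelude made any */
    if (Y_found) {
        for (size_t i = 0; w[i] != '\0'; i++) {
            if (w[i] == 'Y') w[i] = 'y';
        }
    }
    return Y_found;
}
```

With the middle steps omitted the word just round-trips (apart from the dropped apostrophe), but it shows the flag being set and consumed within one call.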
I've verified this modified English stemmer still gives the same results
on the sample vocabulary.
Does this language extension seem suitable for inclusion? If so, I'll
add support for strings and see if I can get the Java code generator to
implement it too.
Cheers,
Olly
This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:48 BST