Re: [Snowball-discuss] Update on regex approach

From: Oleg Bartunov (oleg@sai.msu.su)
Date: Wed May 08 2002 - 18:33:38 BST


Allan,

I don't understand what the problem is with using our Perl interface to
Snowball. You'll never get better performance than the original C program.

        Oleg
On Wed, 8 May 2002, Allan Fields wrote:

> Hi,
>
> Sorry I haven't dropped by for a while, but I'm quite busy. I'll try to get
> my updated Perl stemmer out within the next month. More benchmarking to
> come. =) Biggest issue is the overhead with multiple words -- Perl can be a
> real beastie performance-wise, I've found.
>
> My other attempt to speed up the Perl stemmer, which I've also been working
> on, is stuck on a few technical details of the measure of words. One idea
> I've had is to separate finding the measure from the main transform stage by
> using a reduced set representation in deriving the measure, while using a
> single regular expression in substitution with supporting inline logic
> (s///e). The biggest issue with this approach is that at different points it
> is necessary to look behind to see if the new measure has changed or is past
> a minimal boundary. If there was a way to use integers to represent the logic
> of the {c, v, C, V} sequences, it might significantly speed up that stage by
> making the operations integer operations instead. I would consider this more
> optimal in that, by forcing larger memory usage (still paltry on today's
> computers), it would be possible to conserve processor time.
>
> Also, by inlining all the logic into a single substitution, Perl's larger
> overhead should be reduced somewhat. Now I'm not sure it would compare to
> the C version, but I'm postulating it will be significantly faster than most
> other approaches in Perl. (Although it won't be as algorithmic, since lots of
> the procedural elements move into the regex itself.)
>
> This has led me to believe that it may be possible to create a Snowball
> compiler that creates stemmers using Perl regexes at most, and at the least
> using sed, for instance. There are lots of options for Snowball compilation
> currently, but it would have a special geek appeal to make this in sed.
> Someone, please do beat me to it! ;)
>
> Allan
>
>
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/snowball-discuss
>
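
For what it's worth, the integer representation of the {c, v, C, V}
sequences asked about above could be sketched roughly as follows. This is
only an illustration, in Python rather than Perl for brevity, and it assumes
the measure in question is Porter's m, where a word has the form
[C](VC)^m[V]. The names vowel_mask and measure are made up for this sketch
and are not part of Snowball or of any existing Perl module. Each letter
becomes one bit (1 for vowel, 0 for consonant), and m is the number of
vowel-to-consonant transitions, counted with shifts and masks only.

```python
VOWELS = frozenset("aeiou")


def vowel_mask(word):
    """Return an int whose bit i is 1 when letter i of `word` is a vowel.

    Assumes Porter's rule for 'y': it is a vowel only when the previous
    letter is a consonant, so a leading 'y' counts as a consonant.
    """
    mask = 0
    prev_is_vowel = True  # makes a leading 'y' come out as a consonant
    for i, ch in enumerate(word.lower()):
        is_vowel = ch in VOWELS or (ch == "y" and not prev_is_vowel)
        if is_vowel:
            mask |= 1 << i
        prev_is_vowel = is_vowel
    return mask


def measure(word):
    """Count VC transitions (Porter's m) using integer operations only."""
    n = len(word)
    if n < 2:
        return 0
    mask = vowel_mask(word)
    # Bit i survives when letter i is a vowel and letter i+1 is a consonant.
    # The last letter has no successor, so its bit is masked off.
    vc = mask & ~(mask >> 1) & ((1 << (n - 1)) - 1)
    return bin(vc).count("1")


# measure("tree") == 0, measure("trouble") == 1, measure("private") == 2
```

Once the mask is computed, the measure of a shortened stem can be re-derived
by masking off the high bits instead of rescanning the string, which is one
way the look-behind step might be avoided. Whether this beats a regex in
Perl would need benchmarking.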

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:42 BST