Re: [Snowball-discuss] Stemming 'communing' and 'communed'

From: Michael Edwards (mbedwards@gmail.com)
Date: Tue Apr 03 2007 - 11:06:16 BST


Thanks. My implementation has been working for about a week, and I
should be ready to upload it soon. One thing I just noticed at the
bottom of the spec, where it lists the exceptional prefixes ('gener',
'commun', 'arsen'): 'arsen' is not in bold, and its 'a' appears as a '(':

"If the words begins gener, commun or (rsen, set R1 to be the
remainder of the word."
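
For readers following along, here is a minimal Python sketch of the
rule quoted above. It is not taken from the spec's reference code or
from either PHP implementation. The names VOWELS,
EXCEPTIONAL_PREFIXES and r1_start are made up for illustration, and
the sketch assumes lowercase input and treats every 'y' as a vowel,
ignoring the English stemmer's special handling of consonantal 'y'.

```python
# Hypothetical names; a sketch of the R1 rule, not the reference stemmer.
VOWELS = set("aeiouy")
EXCEPTIONAL_PREFIXES = ("gener", "commun", "arsen")


def r1_start(word: str) -> int:
    """Return the index where R1 begins, or len(word) if R1 is empty."""
    # Exceptional prefixes: R1 is simply the remainder of the word.
    for prefix in EXCEPTIONAL_PREFIXES:
        if word.startswith(prefix):
            return len(prefix)
    # General rule: R1 starts after the first non-vowel that follows a vowel.
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    # No such non-vowel: R1 is the empty region at the end of the word.
    return len(word)


if __name__ == "__main__":
    for w in ("communing", "communed", "generous", "beautiful"):
        print(w, "->", repr(w[r1_start(w):]))
```

With the exception in place, R1 of 'communing' is 'ing'. Under the
general rule alone it would be 'muning', which is why the prefix list
affects how 'communing' and 'communed' are stemmed.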

Incidentally, a colleague wrote another PHP implementation that relies
heavily on the PCRE (Perl Compatible Regular Expressions) library, and
it was twice as fast as mine. Even though my implementation has room
to be optimized, it seems, at least at first glance, that regular
expressions may be the way to go for many scripting languages, both
for speed and for brevity of implementation. They also provide an easy
porting path, because many languages implement PCRE. Just some thoughts
that might be interesting for anyone thinking about implementing this
or similar algorithms.
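
To illustrate the brevity point only, here is a hedged sketch of how
the R1 region might be found with a single regular expression. It
uses Python's re module rather than PHP's PCRE functions, it is not
the colleague's code, and it says nothing about relative speed. The
names R1_PATTERN and r1_regex are hypothetical, and as a
simplification every 'y' is treated as a vowel.

```python
import re

# Hypothetical pattern: either an exceptional prefix, or everything up to
# and including the first non-vowel that follows a vowel.
R1_PATTERN = re.compile(
    r"^(?:gener|commun|arsen|[^aeiouy]*[aeiouy]+[^aeiouy])"
)


def r1_regex(word: str) -> str:
    """Return the R1 region of a lowercase word ('' if there is none)."""
    match = R1_PATTERN.match(word)
    # R1 is whatever remains after the matched prefix.
    return word[match.end():] if match else ""


if __name__ == "__main__":
    for w in ("communing", "communed", "beautiful", "tree"):
        print(w, "->", repr(r1_regex(w)))
```

The alternation lists the exceptional prefixes first, so they take
priority over the general vowel/non-vowel rule.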

Best regards,
Michael

On 4/3/07, Martin Porter <martin.porter@grapeshot.co.uk> wrote:
>
> Michael,
>
> I've corrected the definition of the English stemmer in line with your comments,
>
> Martin
>