Re: [Snowball-discuss] Benchmarking (... On a lovely Sunday morning)

From: Allan Fields (afieldsml@idirect.ca)
Date: Tue Apr 23 2002 - 17:47:31 BST


On April 23, 2002 12:23 pm, Teodor Sigaev wrote:
> Hello.

Hi

> > Perl's features. I'll also try to run it against the
> > Lingua::Stem::Snowball module recently submitted to this list, to see if
> > that isn't the best solution for speed in Perl. I think it makes more
> > sense to interface, but it's not uncommon to have both an interface and
> > native implementation(s).

I haven't got to this yet, but I plan to do it soon, before I submit the
corrections to my script. Unfortunately, there is still one bug I'm working
out.

> [...]
>
> > -- Lingua::Stem --
> > 1 : candidatus -> candidatu 24 wallclock secs (22.45 usr +
> > 0.00 sys = 22.45 CPU) @ 2227.64/s (n=50000) -- Lingua::Stem --
> > ...
> > -- bench-lingua-stem.pl --
> > #!/usr/bin/perl
> > use Lingua::Stem qw(:all);
> > use Benchmark;
> >
> > my @word = grep chomp, <>;
> > my ($n,$pu,$ps) = (0,0,0); my $s = 10;
> > for (1..$s) {
> > my $result;
> > my $w = $word[rand @word];
> > my $t = timeit(100000, sub { ($result) = @{stem($w)} } );
> > print "$_\t: $w -> $result\t",timestr($t),"\n";
> > $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
> > }
> > printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n",
> >     $n/($pu+$ps), $n;
>
> Is this a script for Lingua::Stem::Snowball? If so, what is the function
> stem()? Lingua::Stem::Snowball doesn't provide a stem() function; stem is a
> method of the Lingua::Stem::Snowball object. The function is named
> snowball(). BTW, with the snowball() function you will get a significant
> performance degradation, because it constructs a Lingua::Stem::Snowball
> object internally :). Use the stem() method instead.

No, this is calling the 'Lingua::Stem' module, which is different from
Lingua::Stem::Snowball. It doesn't interface with Snowball at all, although
it uses the Porter 1 algorithm. They share a similar branch in the CPAN
tree, but are two different modules: Lingua::Stem comes from the Stem.pm and
En.pm files, and the Snowball module from other files entirely.

The problem with Lingua::Stem compared to other stemmers is that it uses some
rather exotic sub calls to implement the stemming rules, which bogs it down
considerably. This shouldn't be an issue with Lingua::Stem::Snowball, as it
interfaces directly with the Snowball-generated C code.
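
As a rough illustration of the sub-call overhead mentioned above, here is a
minimal sketch that applies the same toy rule (strip a trailing "s") once
through small helper subs and once inline. The rule and the names ends_with,
strip_chars, rule_via_subs and rule_inline are hypothetical; this is not the
Lingua::Stem code, and the measured gap will vary by machine and Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helpers: each rule step costs one extra sub call.
sub ends_with {
    my ($w, $suf) = @_;
    return 0 if length($w) < length($suf);
    return substr($w, -length($suf)) eq $suf;
}

sub strip_chars {
    my ($w, $n) = @_;
    return substr($w, 0, length($w) - $n);
}

# Toy rule built from helper subs (not a real Porter rule).
sub rule_via_subs {
    my ($w) = @_;
    return ends_with($w, 's') ? strip_chars($w, 1) : $w;
}

# The same toy rule written inline as a single substitution.
sub rule_inline {
    my ($w) = @_;
    $w =~ s/s\z//;
    return $w;
}

my $word = 'stemmers';
cmpthese(200_000, {
    'helper subs' => sub { rule_via_subs($word) },
    'inline'      => sub { rule_inline($word) },
});
```

Both variants return the same result for a given word, so any difference
cmpthese reports comes from how the rule is dispatched rather than from what
it computes.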

I'll make sure to use the stem() method of Lingua::Stem::Snowball when I do
that benchmark.
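
To show why reusing one object matters for such a benchmark, here is a
minimal sketch comparing a function-style wrapper that constructs a new
object on every call against a method call on an object built once.
ToyStemmer and toy_stem_function are hypothetical stand-ins written for this
example; they are not the Lingua::Stem::Snowball API, and the suffix list is
invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# ToyStemmer is a hypothetical stand-in, NOT Lingua::Stem::Snowball.
package ToyStemmer;

sub new {
    my ($class, %args) = @_;
    # Simulated setup cost: build a suffix list, longest suffix first.
    my @suffixes = sort { length($b) <=> length($a) } qw(ing ed es s);
    return bless { lang => $args{lang} || 'en', suffixes => \@suffixes }, $class;
}

sub stem {
    my ($self, $word) = @_;
    for my $suf (@{ $self->{suffixes} }) {
        # Strip the first matching suffix, keeping at least two characters.
        return $1 if $word =~ /^(.{2,})\Q$suf\E$/;
    }
    return $word;
}

package main;

# Function-style wrapper: pays the constructor cost on every call.
sub toy_stem_function {
    my ($word) = @_;
    return ToyStemmer->new(lang => 'en')->stem($word);
}

my $stemmer = ToyStemmer->new(lang => 'en');    # built once, then reused
my $word    = 'benchmarking';

cmpthese(200_000, {
    'function (new object per call)' => sub { toy_stem_function($word) },
    'method (object reused)'         => sub { $stemmer->stem($word) },
});
```

The two entries do identical stemming work, so the gap between them is the
per-call construction cost that a benchmark of the function form would
wrongly charge to the stemmer itself.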

Thanks for the info.

-- Allan Fields

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss


