Hi,
Here are still more details about the various Perl solutions. Surprisingly,
I didn't find Daniel van Balen's algorithm in porter.pm any faster than the
perl.txt algorithm you've implemented, Martin. My benchmarks were quick
tests, so I'm not 100% confident these numbers are authoritative. Any
suggestions on benchmarking would be welcome. However, Martin, from what I
can tell your implementation comes out on top for speed.
I thought I would do this at least once as an experiment (more for my own
curiosity). =) I've also fixed some problems in the recently submitted
script, and I'll resubmit it along with its new benchmark data. There were a
few programming gaffes on my part, as well as some performance issues.
I would be curious to see how the overall Perl performance compares with
the C versions of the Snowball output. I think Perl's strength in this area is
its full-featured regular expression engine, and as I'll try to demonstrate in
my next submission, things can be optimized somewhat by fully exploiting
Perl's features. I'll also try to run it against the Lingua::Stem::Snowball
module recently submitted to this list, to see whether that is the fastest
solution in Perl. I think it makes more sense to interface with the C code, but
it's not uncommon to have both an interface and native implementation(s).
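A side-by-side comparison of this kind could be sketched with the cmpthese
and timethese functions from the Benchmark module. The two stemmers below
(stem_regex and stem_substr) are hypothetical stand-ins that only strip a
trailing "s"; they are not perl.txt, porter.pm or the Snowball module, and the
word list and iteration count are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical stand-ins for real stemmers: both strip one trailing "s",
# one with a regex substitution and one with substr.
sub stem_regex {
    my ($w) = @_;
    $w =~ s/s\z//;
    return $w;
}

sub stem_substr {
    my ($w) = @_;
    return substr($w, -1) eq 's' ? substr($w, 0, -1) : $w;
}

my @words = qw(psalms udders gallop track wands);

# Time each candidate over the same word list. The 'none' style suppresses
# the per-candidate report so that only the comparison chart is printed.
my $results = timethese(20_000, {
    regex  => sub { stem_regex($_)  for @words },
    substr => sub { stem_substr($_) for @words },
}, 'none');

# Print a chart of rates and relative percentage differences.
cmpthese($results);
```

Because both candidates run over the same words in the same process, the
chart is less sensitive to which words happen to be drawn than separate runs
on separate random samples would be.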
These tests were performed on a modest (ancient) system: a PII/233 (66 MHz
FSB) with 256MB SDRAM, running Perl 5.6 under FreeBSD 4-STABLE in a multiuser
environment. The system status around the time of the tests is shown below;
it suggests a typical multiuser system (if anything, a little loaded down):
last pid: 15293; load averages: 0.11, 0.30, 0.41 up 31+11:31:54 05:19:12
180 processes: 2 running, 177 sleeping, 1 stopped
CPU states: 0.5% user, 0.0% nice, 0.5% system, 0.0% interrupt, 99.0% idle
Mem: 117M Active, 96M Inact, 26M Wired, 7116K Cache, 35M Buf, 2800K Free
Swap: 1152M Total, 225M Used, 926M Free, 19% Inuse
Since FreeBSD is a very efficient platform, there isn't much chance that the
results as recorded were skewed by other processes. Processor utilization
for the perl process housing the stemmer was close to 97% for the full test
series. (Rates are in Hz, i.e. stems per second; larger numbers are better.
Scroll to the bottom of each section for a summary. All of the runs stem
random cross-sections of voc.txt, available on the website.)
Allan
-- perl-bench.txt (unmodified perl.txt + benchmarking code) --
1 : cade -> cade 2 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 4812.03/s (n=5000)
2 : psalms -> psalm 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5289.26/s (n=5000)
3 : devising -> devis 1 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 4383.56/s (n=5000)
4 : residue -> residu 2 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 4155.84/s (n=5000)
5 : yoked -> yoke 1 wallclock secs ( 1.45 usr + 0.00 sys = 1.45 CPU) @ 3459.46/s (n=5000)
6 : blessing -> bless 2 wallclock secs ( 1.13 usr + 0.00 sys = 1.13 CPU) @ 4413.79/s (n=5000)
7 : gallop -> gallop 1 wallclock secs ( 0.90 usr + 0.00 sys = 0.90 CPU) @ 5565.22/s (n=5000)
8 : holborn -> holborn 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5289.26/s (n=5000)
9 : edg -> edg 1 wallclock secs ( 0.65 usr + 0.00 sys = 0.65 CPU) @ 7710.84/s (n=5000)
10 : mobled -> mobl 2 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3555.56/s (n=5000)
11 : incertain -> incertain 2 wallclock secs ( 1.16 usr + 0.01 sys = 1.16 CPU) @ 4295.30/s (n=5000)
12 : collect -> collect 1 wallclock secs ( 0.96 usr + 0.00 sys = 0.96 CPU) @ 5203.25/s (n=5000)
13 : meditating -> medit 2 wallclock secs ( 1.44 usr + 0.00 sys = 1.44 CPU) @ 3478.26/s (n=5000)
14 : udders -> udder 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5245.90/s (n=5000)
15 : latest -> latest 1 wallclock secs ( 0.91 usr + 0.00 sys = 0.91 CPU) @ 5470.09/s (n=5000)
16 : ephesians -> ephesian 2 wallclock secs ( 1.22 usr + 0.00 sys = 1.22 CPU) @ 4102.56/s (n=5000)
17 : misinterpret -> misinterpret 1 wallclock secs ( 1.40 usr + 0.01 sys = 1.41 CPU) @ 3555.56/s (n=5000)
18 : reckoned -> reckon 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4050.63/s (n=5000)
19 : beggared -> beggar 1 wallclock secs ( 1.25 usr + 0.00 sys = 1.25 CPU) @ 4000.00/s (n=5000)
20 : dip -> dip 1 wallclock secs ( 0.63 usr + 0.00 sys = 0.63 CPU) @ 7901.23/s (n=5000)
21 : dies -> di 1 wallclock secs ( 0.62 usr + 0.00 sys = 0.62 CPU) @ 8101.27/s (n=5000)
22 : track -> track 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @ 6274.51/s (n=5000)
23 : somewhat -> somewhat 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 4776.12/s (n=5000)
24 : havings -> have 3 wallclock secs ( 1.47 usr + 0.00 sys = 1.47 CPU) @ 3404.26/s (n=5000)
25 : bustle -> bustl 2 wallclock secs ( 1.24 usr + 0.00 sys = 1.24 CPU) @ 4025.16/s (n=5000)
26 : princess -> princess 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 4705.88/s (n=5000)
27 : vaux -> vaux 1 wallclock secs ( 0.72 usr + 0.00 sys = 0.72 CPU) @ 6956.52/s (n=5000)
28 : beating -> beat 1 wallclock secs ( 1.40 usr + 0.00 sys = 1.40 CPU) @ 3575.42/s (n=5000)
29 : eats -> eat -1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @ 6274.51/s (n=5000)
30 : blanket -> blanket 0 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5079.37/s (n=5000)
31 : mortis -> morti 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5289.26/s (n=5000)
32 : accites -> accit 1 wallclock secs ( 1.30 usr + 0.00 sys = 1.30 CPU) @ 3855.42/s (n=5000)
33 : bedchamber -> bedchamb 2 wallclock secs ( 1.38 usr + 0.00 sys = 1.38 CPU) @ 3636.36/s (n=5000)
34 : belt -> belt 1 wallclock secs ( 0.73 usr + 0.00 sys = 0.73 CPU) @ 6881.72/s (n=5000)
35 : enfeebles -> enfeebl 2 wallclock secs ( 1.46 usr + 0.00 sys = 1.46 CPU) @ 3422.46/s (n=5000)
36 : caesarion -> caesarion 1 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 4129.03/s (n=5000)
37 : strangle -> strangl 2 wallclock secs ( 1.44 usr + 0.00 sys = 1.44 CPU) @ 3478.26/s (n=5000)
38 : keiser -> keiser 1 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 4885.50/s (n=5000)
39 : wands -> wand 1 wallclock secs ( 0.87 usr + 0.00 sys = 0.87 CPU) @ 5765.77/s (n=5000)
40 : strikers -> striker 2 wallclock secs ( 1.23 usr + 0.01 sys = 1.23 CPU) @ 4050.63/s (n=5000)
41 : birthday -> birthdai 1 wallclock secs ( 1.29 usr + 0.00 sys = 1.29 CPU) @ 3878.79/s (n=5000)
42 : potting -> pot 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5245.90/s (n=5000)
43 : successively -> success 2 wallclock secs ( 1.77 usr + 0.00 sys = 1.77 CPU) @ 2819.38/s (n=5000)
44 : awhile -> awhil 1 wallclock secs ( 1.13 usr + 0.00 sys = 1.13 CPU) @ 4413.79/s (n=5000)
45 : esteemed -> esteem 2 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 4129.03/s (n=5000)
46 : nephews -> nephew 1 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 4885.50/s (n=5000)
47 : weather -> weather 1 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4475.52/s (n=5000)
48 : errate -> errat 2 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4444.44/s (n=5000)
49 : unbridled -> unbridl 2 wallclock secs ( 1.28 usr + 0.00 sys = 1.28 CPU) @ 3902.44/s (n=5000)
50 : chins -> chin 2 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU) @ 5714.29/s (n=5000)
51 : heavily -> heavili 5 wallclock secs ( 1.25 usr + 0.00 sys = 1.25 CPU) @ 4000.00/s (n=5000)
52 : horn -> horn 2 wallclock secs ( 0.72 usr + 0.00 sys = 0.72 CPU) @ 6956.52/s (n=5000)
53 : justices -> justic 4 wallclock secs ( 1.38 usr + 0.00 sys = 1.38 CPU) @ 3636.36/s (n=5000)
54 : obstruct -> obstruct 2 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @ 4848.48/s (n=5000)
55 : afore -> afor 2 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 4705.88/s (n=5000)
56 : befriended -> befriend 4 wallclock secs ( 1.44 usr + 0.00 sys = 1.44 CPU) @ 3478.26/s (n=5000)
57 : slops -> slop 1 wallclock secs ( 0.87 usr + 0.00 sys = 0.87 CPU) @ 5765.77/s (n=5000)
58 : walks -> walk 2 wallclock secs ( 0.89 usr + 0.00 sys = 0.89 CPU) @ 5614.04/s (n=5000)
59 : samson -> samson 2 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU) @ 5714.29/s (n=5000)
60 : dries -> dri 1 wallclock secs ( 0.78 usr + 0.00 sys = 0.78 CPU) @ 6400.00/s (n=5000)
61 : seeming -> seem 3 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 4705.88/s (n=5000)
62 : these -> these 2 wallclock secs ( 1.17 usr + 0.00 sys = 1.17 CPU) @ 4266.67/s (n=5000)
63 : answer -> answer 1 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 4923.08/s (n=5000)
64 : corruptibly -> corrupt 2 wallclock secs ( 1.71 usr + 0.01 sys = 1.72 CPU) @ 2909.09/s (n=5000)
65 : abysm -> abysm 1 wallclock secs ( 0.81 usr + 0.00 sys = 0.81 CPU) @ 6153.85/s (n=5000)
66 : inclips -> inclip 0 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 4740.74/s (n=5000)
67 : whirling -> whirl 2 wallclock secs ( 1.15 usr + 0.00 sys = 1.15 CPU) @ 4353.74/s (n=5000)
68 : compile -> compil 3 wallclock secs ( 1.22 usr + 0.00 sys = 1.22 CPU) @ 4102.56/s (n=5000)
69 : whom -> whom 1 wallclock secs ( 0.71 usr + 0.00 sys = 0.71 CPU) @ 7032.97/s (n=5000)
70 : offert -> offert 3 wallclock secs ( 0.90 usr + 0.00 sys = 0.90 CPU) @ 5565.22/s (n=5000)
71 : bottomless -> bottomless 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4076.43/s (n=5000)
72 : pudder -> pudder 1 wallclock secs ( 0.98 usr + 0.01 sys = 0.99 CPU) @ 5039.37/s (n=5000)
73 : summers -> summer 2 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4444.44/s (n=5000)
74 : footboys -> footboi 2 wallclock secs ( 1.32 usr + 0.00 sys = 1.32 CPU) @ 3786.98/s (n=5000)
75 : mellowing -> mellow 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4076.43/s (n=5000)
76 : spinners -> spinner 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4050.63/s (n=5000)
77 : trinculo -> trinculo 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 4812.03/s (n=5000)
78 : scissors -> scissor 1 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4475.52/s (n=5000)
79 : broking -> broke 0 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU) @ 3350.79/s (n=5000)
80 : erfraught -> erfraught 2 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 4383.56/s (n=5000)
81 : quire -> quir 1 wallclock secs ( 1.16 usr + 0.00 sys = 1.16 CPU) @ 4295.30/s (n=5000)
82 : massacres -> massacr 2 wallclock secs ( 1.46 usr + 0.01 sys = 1.47 CPU) @ 3404.26/s (n=5000)
83 : declin -> declin 1 wallclock secs ( 0.89 usr + 0.00 sys = 0.89 CPU) @ 5614.04/s (n=5000)
84 : mowing -> mow 1 wallclock secs ( 0.96 usr + 0.00 sys = 0.96 CPU) @ 5203.25/s (n=5000)
85 : thrower -> thrower 1 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @ 4604.32/s (n=5000)
86 : doubled -> doubl 2 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU) @ 3350.79/s (n=5000)
87 : tertio -> tertio 3 wallclock secs ( 0.90 usr + 0.01 sys = 0.91 CPU) @ 5517.24/s (n=5000)
88 : deliv -> deliv 1 wallclock secs ( 0.82 usr + 0.00 sys = 0.82 CPU) @ 6095.24/s (n=5000)
89 : misery -> miseri 1 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 4383.56/s (n=5000)
90 : ns -> ns 1 wallclock secs ( 0.12 usr + 0.00 sys = 0.12 CPU) @ 42666.67/s (n=5000)
91 : peopled -> peopl 1 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 4383.56/s (n=5000)
92 : codpiece -> codpiec 2 wallclock secs ( 1.30 usr + 0.00 sys = 1.30 CPU) @ 3855.42/s (n=5000)
93 : palating -> palat 2 wallclock secs ( 1.38 usr + 0.00 sys = 1.38 CPU) @ 3636.36/s (n=5000)
94 : naples -> napl 2 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 3764.71/s (n=5000)
95 : liege -> lieg 1 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 4210.53/s (n=5000)
96 : everything -> everyth 2 wallclock secs ( 1.29 usr + 0.00 sys = 1.29 CPU) @ 3878.79/s (n=5000)
97 : goot -> goot 0 wallclock secs ( 0.71 usr + 0.00 sys = 0.71 CPU) @ 7032.97/s (n=5000)
98 : redeem -> redeem 1 wallclock secs ( 0.91 usr + 0.00 sys = 0.91 CPU) @ 5517.24/s (n=5000)
99 : restraint -> restraint 1 wallclock secs ( 1.15 usr + 0.00 sys = 1.15 CPU) @ 4353.74/s (n=5000)
100 : dolphin -> dolphin 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5245.90/s (n=5000)
Average random cross-sectional stem rate for 100 words: 4550.95 Hz (n=500000).
-- perl-bench.txt --
1 : wiser -> wiser 14 wallclock secs ( 8.47 usr + 0.01 sys = 8.48 CPU) @ 5898.62/s (n=50000)
2 : colors -> color 11 wallclock secs ( 9.35 usr + 0.01 sys = 9.36 CPU) @ 5342.24/s (n=50000)
3 : doating -> doat 22 wallclock secs (13.92 usr + 0.00 sys = 13.92 CPU) @ 3591.47/s (n=50000)
4 : sweets -> sweet 10 wallclock secs ( 9.61 usr + -0.01 sys = 9.60 CPU) @ 5207.49/s (n=50000)
5 : annals -> annal 11 wallclock secs ( 9.88 usr + 0.00 sys = 9.88 CPU) @ 5063.29/s (n=50000)
6 : rushes -> rush 14 wallclock secs (12.98 usr + 0.01 sys = 12.99 CPU) @ 3848.47/s (n=50000)
7 : alarums -> alarum 12 wallclock secs (10.66 usr + 0.00 sys = 10.66 CPU) @ 4692.08/s (n=50000)
8 : humbler -> humbler 11 wallclock secs (10.79 usr + 0.00 sys = 10.79 CPU) @ 4634.32/s (n=50000)
9 : clepeth -> clepeth 12 wallclock secs ( 9.61 usr + 0.00 sys = 9.61 CPU) @ 5203.25/s (n=50000)
10 : tumbled -> tumbl 18 wallclock secs (14.77 usr + 0.00 sys = 14.77 CPU) @ 3386.24/s (n=50000)
Average random cross-sectional stem rate for 10 words: 4543.52 Hz (n=500000).
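The summary figure is the total number of iterations divided by the total
CPU time (usr + sys). As a rough cross-check, the same quantity can be
recomputed from the printed report lines. This is only a sketch:
aggregate_rate is a hypothetical helper, and because it works from CPU times
rounded to two decimals it lands close to, but not exactly on, the reported
4543.52 Hz.

```perl
use strict;
use warnings;

# Recompute a section's summary rate from Benchmark report lines:
# total iterations divided by total CPU seconds.
sub aggregate_rate {
    my @lines = @_;
    my ($iters, $cpu) = (0, 0);
    for my $line (@lines) {
        # Matches the tail of a line, e.g. "= 8.48 CPU) @ 5898.62/s (n=50000)"
        next unless $line =~ m{=\s*([\d.]+)\s+CPU\)\s+\@\s+[\d.]+/s\s+\(n=(\d+)\)};
        $cpu   += $1;
        $iters += $2;
    }
    return $cpu > 0 ? $iters / $cpu : 0;
}

# The ten report lines of the section above, copied verbatim.
my @report = split /\n/, <<'END';
1 : wiser -> wiser 14 wallclock secs ( 8.47 usr + 0.01 sys = 8.48 CPU) @ 5898.62/s (n=50000)
2 : colors -> color 11 wallclock secs ( 9.35 usr + 0.01 sys = 9.36 CPU) @ 5342.24/s (n=50000)
3 : doating -> doat 22 wallclock secs (13.92 usr + 0.00 sys = 13.92 CPU) @ 3591.47/s (n=50000)
4 : sweets -> sweet 10 wallclock secs ( 9.61 usr + -0.01 sys = 9.60 CPU) @ 5207.49/s (n=50000)
5 : annals -> annal 11 wallclock secs ( 9.88 usr + 0.00 sys = 9.88 CPU) @ 5063.29/s (n=50000)
6 : rushes -> rush 14 wallclock secs (12.98 usr + 0.01 sys = 12.99 CPU) @ 3848.47/s (n=50000)
7 : alarums -> alarum 12 wallclock secs (10.66 usr + 0.00 sys = 10.66 CPU) @ 4692.08/s (n=50000)
8 : humbler -> humbler 11 wallclock secs (10.79 usr + 0.00 sys = 10.79 CPU) @ 4634.32/s (n=50000)
9 : clepeth -> clepeth 12 wallclock secs ( 9.61 usr + 0.00 sys = 9.61 CPU) @ 5203.25/s (n=50000)
10 : tumbled -> tumbl 18 wallclock secs (14.77 usr + 0.00 sys = 14.77 CPU) @ 3386.24/s (n=50000)
END

printf "%.2f Hz\n", aggregate_rate(@report);
```

Note that this is a CPU-time-weighted rate, so slow words count for more than
they would in a plain mean of the per-word rates.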
-- perl-bench.txt --
1 : desires -> desir 29 wallclock secs (24.45 usr + 0.00 sys = 24.45 CPU) @ 4090.76/s (n=100000)
2 : fretten -> fretten 20 wallclock secs (16.92 usr + 0.00 sys = 16.92 CPU) @ 5909.51/s (n=100000)
3 : call -> call 19 wallclock secs (17.03 usr + 0.00 sys = 17.03 CPU) @ 5871.56/s (n=100000)
4 : reasonless -> reasonless 24 wallclock secs (19.45 usr + 0.00 sys = 19.45 CPU) @ 5140.56/s (n=100000)
5 : shaping -> shape 32 wallclock secs (29.19 usr + 0.02 sys = 29.20 CPU) @ 3424.29/s (n=100000)
6 : monsieur -> monsieur 19 wallclock secs (17.70 usr + 0.00 sys = 17.70 CPU) @ 5651.21/s (n=100000)
7 : clouded -> cloud 27 wallclock secs (22.16 usr + 0.00 sys = 22.16 CPU) @ 4513.40/s (n=100000)
8 : gun -> gun 14 wallclock secs (11.68 usr + 0.00 sys = 11.68 CPU) @ 8561.87/s (n=100000)
9 : gloriously -> glorious 31 wallclock secs (28.62 usr + 0.01 sys = 28.63 CPU) @ 3492.50/s (n=100000)
10 : lustily -> lustili 26 wallclock secs (21.65 usr + 0.00 sys = 21.65 CPU) @ 4619.27/s (n=100000)
Average random cross-sectional stem rate for 10 words: 4787.73 Hz (n=1000000).
-- perl-bench.txt --
1 : barnacles -> barnacl 1 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 4063.49/s (n=2000)
2 : marchpane -> marchpan 0 wallclock secs ( 0.41 usr + 0.00 sys = 0.41 CPU) @ 4830.19/s (n=2000)
3 : between -> between 1 wallclock secs ( 0.33 usr + 0.00 sys = 0.33 CPU) @ 6095.24/s (n=2000)
4 : embassies -> embassi 0 wallclock secs ( 0.38 usr + 0.00 sys = 0.38 CPU) @ 5224.49/s (n=2000)
5 : admonition -> admonit 1 wallclock secs ( 0.45 usr + 0.00 sys = 0.45 CPU) @ 4413.79/s (n=2000)
6 : bangor -> bangor 0 wallclock secs ( 0.31 usr + 0.00 sys = 0.31 CPU) @ 6400.00/s (n=2000)
7 : couched -> couch 1 wallclock secs ( 0.45 usr + 0.00 sys = 0.45 CPU) @ 4491.23/s (n=2000)
8 : stare -> stare 0 wallclock secs ( 0.45 usr + 0.00 sys = 0.45 CPU) @ 4491.23/s (n=2000)
9 : voutsafe -> voutsaf 1 wallclock secs ( 0.45 usr + 0.00 sys = 0.45 CPU) @ 4491.23/s (n=2000)
10 : disease -> diseas 1 wallclock secs ( 0.44 usr + 0.00 sys = 0.44 CPU) @ 4571.43/s (n=2000)
...
997 : names -> name 0 wallclock secs ( 0.46 usr + 0.00 sys = 0.46 CPU) @ 4338.98/s (n=2000)
998 : prescriptions -> prescript 1 wallclock secs ( 0.57 usr + 0.00 sys = 0.57 CPU) @ 3506.85/s (n=2000)
999 : eyestrings -> eyestr 1 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 4063.49/s (n=2000)
1000 : separates -> separ 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 4063.49/s (n=2000)
Average random cross-sectional stem rate for 1000 words: 4985.59 Hz (n=2000000).
-- perl-bench.txt --
$ diff perl.txt perl-bench.txt
12a13,14
> use Benchmark;
>
108,120c110,119
< while (<>)
< {
< { /^([^a-zA-Z]*)(.*)/ ;
< print $1;
< $_ = $2;
< unless ( /^([a-zA-Z]+)(.*)/ ) { last; }
< $word = lc $1; # turn to lower case before calling:
< $_ = $2;
< $word = stem($word);
< print $word;
< redo;
< }
< print "\n";
---
>
> my @word = grep chomp, <>;
> @word = grep lc, @word;
> my ($n,$pu,$ps) = (0,0,0); my $s = 1000;
> for (1..$s) {
> my $result;
> my $w = @word[rand(scalar(@word))];
> my $t = timeit(2000, sub { $result = stem($w) } );
> print "$_\t: $w -> $result\t",timestr($t),"\n";
> $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
121a121,122
> printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;
>
-----------
-- porter.pm --
1 : whistles -> whistl 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 3137.25/s (n=5000)
2 : sear -> sear 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @ 6213.59/s (n=5000)
3 : riseth -> riseth 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 4812.03/s (n=5000)
4 : equal -> equal 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 4776.12/s (n=5000)
5 : venue -> venu 2 wallclock secs ( 1.45 usr + 0.00 sys = 1.45 CPU) @ 3459.46/s (n=5000)
6 : cracked -> crack 2 wallclock secs ( 1.61 usr + 0.00 sys = 1.61 CPU) @ 3106.80/s (n=5000)
7 : marigold -> marigold 1 wallclock secs ( 1.10 usr + 0.00 sys = 1.10 CPU) @ 4539.01/s (n=5000)
8 : pimpernell -> pimpernel 2 wallclock secs ( 1.62 usr + 0.00 sys = 1.62 CPU) @ 3076.92/s (n=5000)
9 : respite -> respit 2 wallclock secs ( 1.40 usr + 0.00 sys = 1.40 CPU) @ 3575.42/s (n=5000)
10 : dispos -> dispo 1 wallclock secs ( 1.00 usr + 0.00 sys = 1.00 CPU) @ 5000.00/s (n=5000)
11 : nak -> nak 1 wallclock secs ( 0.76 usr + 0.00 sys = 0.76 CPU) @ 6597.94/s (n=5000)
12 : file -> file 1 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4050.63/s (n=5000)
13 : pageants -> pageant 2 wallclock secs ( 1.35 usr + 0.00 sys = 1.35 CPU) @ 3699.42/s (n=5000)
14 : regards -> regard 1 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 4885.50/s (n=5000)
15 : exceed -> exce 2 wallclock secs ( 1.68 usr + 0.00 sys = 1.68 CPU) @ 2976.74/s (n=5000)
16 : spiritual -> spiritu 2 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU) @ 3350.79/s (n=5000)
17 : nothing -> noth 2 wallclock secs ( 1.66 usr + 0.00 sys = 1.66 CPU) @ 3018.87/s (n=5000)
18 : wake -> wake 1 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4076.43/s (n=5000)
19 : shrewishly -> shrewishli 2 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU) @ 3350.79/s (n=5000)
20 : neglected -> neglect 2 wallclock secs ( 1.72 usr + 0.00 sys = 1.72 CPU) @ 2909.09/s (n=5000)
21 : untun -> untun 1 wallclock secs ( 0.94 usr + 0.00 sys = 0.94 CPU) @ 5333.33/s (n=5000)
22 : jaundice -> jaundic 2 wallclock secs ( 1.31 usr + 0.00 sys = 1.31 CPU) @ 3809.52/s (n=5000)
23 : pilfering -> pilfer 2 wallclock secs ( 1.98 usr + 0.00 sys = 1.98 CPU) @ 2519.69/s (n=5000)
24 : remark -> remark 1 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5120.00/s (n=5000)
25 : palsies -> palsi 2 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 4776.12/s (n=5000)
26 : tributary -> tributari 1 wallclock secs ( 1.38 usr + 0.00 sys = 1.38 CPU) @ 3615.82/s (n=5000)
27 : spare -> spare 2 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3535.91/s (n=5000)
28 : prologue -> prologu 2 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3555.56/s (n=5000)
29 : inheritance -> inherit 1 wallclock secs ( 1.58 usr + 0.00 sys = 1.58 CPU) @ 3168.32/s (n=5000)
30 : permit -> permit 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5289.26/s (n=5000)
31 : exorciser -> exorcis 1 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU) @ 3350.79/s (n=5000)
32 : spitting -> spit 0 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3555.56/s (n=5000)
33 : lofty -> lofti 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 4210.53/s (n=5000)
34 : name -> name 1 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4076.43/s (n=5000)
35 : lavender -> lavend 2 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3535.91/s (n=5000)
36 : juliet -> juliet 1 wallclock secs ( 0.93 usr + 0.00 sys = 0.93 CPU) @ 5378.15/s (n=5000)
37 : allied -> alli 2 wallclock secs ( 1.57 usr + 0.00 sys = 1.57 CPU) @ 3184.08/s (n=5000)
38 : suppose -> suppos 1 wallclock secs ( 1.35 usr + 0.00 sys = 1.35 CPU) @ 3699.42/s (n=5000)
39 : variations -> variat 3 wallclock secs ( 1.97 usr + 0.00 sys = 1.97 CPU) @ 2539.68/s (n=5000)
40 : carelessness -> careless 2 wallclock secs ( 1.73 usr + 0.00 sys = 1.73 CPU) @ 2882.88/s (n=5000)
41 : mockery -> mockeri 1 wallclock secs ( 1.27 usr + 0.00 sys = 1.27 CPU) @ 3950.62/s (n=5000)
42 : actual -> actual 2 wallclock secs ( 1.29 usr + 0.00 sys = 1.29 CPU) @ 3878.79/s (n=5000)
43 : beldams -> beldam 1 wallclock secs ( 0.94 usr + 0.00 sys = 0.94 CPU) @ 5333.33/s (n=5000)
44 : tired -> tire 2 wallclock secs ( 1.75 usr + 0.00 sys = 1.75 CPU) @ 2857.14/s (n=5000)
45 : lym -> lym 1 wallclock secs ( 0.84 usr + 0.00 sys = 0.84 CPU) @ 5981.31/s (n=5000)
46 : bravely -> brave 2 wallclock secs ( 2.02 usr + 0.00 sys = 2.02 CPU) @ 2471.04/s (n=5000)
47 : unwish -> unwish 2 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @ 4961.24/s (n=5000)
48 : prizes -> prize 1 wallclock secs ( 1.55 usr + 0.00 sys = 1.55 CPU) @ 3232.32/s (n=5000)
49 : tackled -> tackl 0 wallclock secs ( 1.69 usr + 0.00 sys = 1.69 CPU) @ 2962.96/s (n=5000)
50 : antidote -> antidot 1 wallclock secs ( 1.45 usr + 0.00 sys = 1.45 CPU) @ 3440.86/s (n=5000)
51 : coarse -> coars 2 wallclock secs ( 1.53 usr + 0.00 sys = 1.53 CPU) @ 3265.31/s (n=5000)
52 : celebrates -> celebr 2 wallclock secs ( 1.48 usr + 0.00 sys = 1.48 CPU) @ 3368.42/s (n=5000)
53 : archbishop -> archbishop 1 wallclock secs ( 1.39 usr + 0.00 sys = 1.39 CPU) @ 3595.51/s (n=5000)
54 : oaten -> oaten 1 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU) @ 5663.72/s (n=5000)
55 : straiter -> straiter 1 wallclock secs ( 1.47 usr + 0.00 sys = 1.47 CPU) @ 3404.26/s (n=5000)
56 : unconfirmed -> unconfirm 2 wallclock secs ( 1.87 usr + 0.00 sys = 1.87 CPU) @ 2677.82/s (n=5000)
57 : banditto -> banditto 0 wallclock secs ( 1.13 usr + 0.00 sys = 1.13 CPU) @ 4413.79/s (n=5000)
58 : visited -> visit 2 wallclock secs ( 1.57 usr + 0.00 sys = 1.57 CPU) @ 3184.08/s (n=5000)
59 : imperfections -> imperfect 2 wallclock secs ( 1.81 usr + 0.00 sys = 1.81 CPU) @ 2758.62/s (n=5000)
60 : lieutenants -> lieuten 1 wallclock secs ( 1.40 usr + 0.00 sys = 1.40 CPU) @ 3575.42/s (n=5000)
61 : ponton -> ponton 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 4776.12/s (n=5000)
62 : express -> express 1 wallclock secs ( 1.16 usr + 0.00 sys = 1.16 CPU) @ 4324.32/s (n=5000)
63 : intruding -> intrud 2 wallclock secs ( 1.73 usr + 0.00 sys = 1.73 CPU) @ 2882.88/s (n=5000)
64 : cures -> cure 2 wallclock secs ( 1.27 usr + 0.00 sys = 1.27 CPU) @ 3926.38/s (n=5000)
65 : oxford -> oxford 1 wallclock secs ( 0.97 usr + 0.00 sys = 0.97 CPU) @ 5161.29/s (n=5000)
66 : disclosed -> disclos 2 wallclock secs ( 1.98 usr + 0.00 sys = 1.98 CPU) @ 2529.64/s (n=5000)
67 : noise -> nois 2 wallclock secs ( 1.45 usr + 0.00 sys = 1.45 CPU) @ 3459.46/s (n=5000)
68 : ushering -> usher 2 wallclock secs ( 1.70 usr + 0.00 sys = 1.70 CPU) @ 2935.78/s (n=5000)
69 : cudgeled -> cudgel 2 wallclock secs ( 1.66 usr + 0.00 sys = 1.66 CPU) @ 3004.69/s (n=5000)
70 : medal -> medal 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 4812.03/s (n=5000)
71 : enacts -> enact 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5245.90/s (n=5000)
72 : discoursed -> discours 3 wallclock secs ( 1.97 usr + 0.00 sys = 1.97 CPU) @ 2539.68/s (n=5000)
73 : barely -> bare 2 wallclock secs ( 1.80 usr + 0.00 sys = 1.80 CPU) @ 2782.61/s (n=5000)
74 : warning -> warn 2 wallclock secs ( 1.61 usr + 0.00 sys = 1.61 CPU) @ 3106.80/s (n=5000)
75 : yorick -> yorick 1 wallclock secs ( 0.93 usr + 0.00 sys = 0.93 CPU) @ 5378.15/s (n=5000)
76 : pregnancy -> pregnanc 2 wallclock secs ( 2.16 usr + 0.00 sys = 2.16 CPU) @ 2310.47/s (n=5000)
77 : gleams -> gleam 1 wallclock secs ( 0.93 usr + 0.00 sys = 0.93 CPU) @ 5378.15/s (n=5000)
78 : unkindly -> unkindli 0 wallclock secs ( 1.30 usr + 0.00 sys = 1.30 CPU) @ 3832.34/s (n=5000)
79 : capels -> capel 1 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5079.37/s (n=5000)
80 : broken -> broken 1 wallclock secs ( 0.93 usr + 0.00 sys = 0.93 CPU) @ 5378.15/s (n=5000)
81 : tenour -> tenour 2 wallclock secs ( 0.99 usr + 0.00 sys = 0.99 CPU) @ 5039.37/s (n=5000)
82 : untimely -> untim 2 wallclock secs ( 1.90 usr + 0.00 sys = 1.90 CPU) @ 2633.74/s (n=5000)
83 : endurance -> endur 2 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3535.91/s (n=5000)
84 : furr -> furr 1 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU) @ 5663.72/s (n=5000)
85 : few -> few 1 wallclock secs ( 0.78 usr + 0.00 sys = 0.78 CPU) @ 6400.00/s (n=5000)
86 : flying -> fly 2 wallclock secs ( 1.66 usr + 0.00 sys = 1.66 CPU) @ 3018.87/s (n=5000)
87 : violent -> violent 1 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU) @ 3350.79/s (n=5000)
88 : somewhither -> somewhith 2 wallclock secs ( 1.64 usr + 0.00 sys = 1.64 CPU) @ 3047.62/s (n=5000)
89 : condemned -> condemn 2 wallclock secs ( 1.77 usr + 0.00 sys = 1.77 CPU) @ 2831.86/s (n=5000)
90 : enjoying -> enjoi 3 wallclock secs ( 1.80 usr + 0.00 sys = 1.80 CPU) @ 2770.56/s (n=5000)
91 : patches -> patch 1 wallclock secs ( 1.53 usr + 0.00 sys = 1.53 CPU) @ 3265.31/s (n=5000)
92 : vestal -> vestal 0 wallclock secs ( 1.31 usr + 0.00 sys = 1.31 CPU) @ 3809.52/s (n=5000)
93 : weeds -> weed 1 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @ 4848.48/s (n=5000)
94 : sensibly -> sensibl 3 wallclock secs ( 2.14 usr + 0.02 sys = 2.16 CPU) @ 2318.84/s (n=5000)
95 : incertainties -> incertainti 2 wallclock secs ( 1.26 usr + 0.00 sys = 1.26 CPU) @ 3975.16/s (n=5000)
96 : strangle -> strangl 2 wallclock secs ( 1.69 usr + 0.00 sys = 1.69 CPU) @ 2962.96/s (n=5000)
97 : whirlwinds -> whirlwind 2 wallclock secs ( 1.39 usr + 0.00 sys = 1.39 CPU) @ 3595.51/s (n=5000)
98 : surpassing -> surpass 2 wallclock secs ( 1.96 usr + 0.00 sys = 1.96 CPU) @ 2549.80/s (n=5000)
99 : picklock -> picklock 1 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @ 4848.48/s (n=5000)
100 : pretext -> pretext 1 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 CPU) @ 4848.48/s (n=5000)
Average random cross-sectional stem rate for 100 words: 3618.07 Hz (n=500000).
-- porter.pm --
1 : baggage -> baggag 12 wallclock secs (11.95 usr + 0.00 sys = 11.95 CPU) @ 4183.01/s (n=50000)
2 : remarkable -> remark 15 wallclock secs (13.28 usr + 0.00 sys = 13.28 CPU) @ 3764.71/s (n=50000)
3 : boson -> boson 10 wallclock secs ( 8.77 usr + 0.00 sys = 8.77 CPU) @ 5699.02/s (n=50000)
4 : reynaldo -> reynaldo 10 wallclock secs ( 9.34 usr + 0.00 sys = 9.34 CPU) @ 5355.65/s (n=50000)
5 : warlike -> warlik 14 wallclock secs (12.09 usr + 0.01 sys = 12.10 CPU) @ 4131.70/s (n=50000)
6 : ambuscadoes -> ambuscado 17 wallclock secs (14.21 usr + 0.00 sys = 14.21 CPU) @ 3518.42/s (n=50000)
7 : title -> titl 16 wallclock secs (14.10 usr + 0.00 sys = 14.10 CPU) @ 3545.71/s (n=50000)
8 : doctors -> doctor 13 wallclock secs (10.71 usr + 0.00 sys = 10.71 CPU) @ 4668.13/s (n=50000)
9 : witches -> witch 18 wallclock secs (15.42 usr + 0.00 sys = 15.42 CPU) @ 3242.15/s (n=50000)
10 : weighed -> weigh 17 wallclock secs (15.34 usr + 0.00 sys = 15.34 CPU) @ 3258.66/s (n=50000)
Average random cross-sectional stem rate for 10 words: 3992.51 Hz (n=500000)
-- porter.pm --
1 : inlaid -> inlaid 16 wallclock secs (13.85 usr + 0.00 sys = 13.85 CPU) @ 7219.40/s (n=100000)
2 : contrite -> contrit 28 wallclock secs (25.52 usr + 0.01 sys = 25.53 CPU) @ 3916.77/s (n=100000)
3 : embrac -> embrac 14 wallclock secs (12.86 usr + 0.00 sys = 12.86 CPU) @ 7776.43/s (n=100000)
4 : emhracing -> emhrac 32 wallclock secs (27.19 usr + 0.00 sys = 27.19 CPU) @ 3678.16/s (n=100000)
5 : servanted -> servant 40 wallclock secs (34.54 usr + 0.00 sys = 34.54 CPU) @ 2895.27/s (n=100000)
6 : cataplasm -> cataplasm 20 wallclock secs (15.44 usr + 0.02 sys = 15.46 CPU) @ 6467.91/s (n=100000)
7 : uncoined -> uncoin 34 wallclock secs (27.62 usr + 0.03 sys = 27.66 CPU) @ 3615.82/s (n=100000)
8 : crowkeeper -> crowkeep 27 wallclock secs (22.81 usr + 0.02 sys = 22.83 CPU) @ 4380.56/s (n=100000)
9 : leaven -> leaven 19 wallclock secs (16.48 usr + 0.00 sys = 16.48 CPU) @ 6066.35/s (n=100000)
10 : speech -> speech 15 wallclock secs (13.70 usr + 0.00 sys = 13.70 CPU) @ 7297.61/s (n=100000)
Average random cross-sectional stem rate for 10 words: 4759.60 Hz (n=1000000).
-- porter.pm --
1 : appeared -> appear 0 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU) @ 3820.90/s (n=2000)
2 : andirons -> andiron 1 wallclock secs ( 0.43 usr + 0.00 sys = 0.43 CPU) @ 4654.55/s (n=2000)
3 : art -> art 0 wallclock secs ( 0.31 usr + 0.00 sys = 0.31 CPU) @ 6400.00/s (n=2000)
4 : greeks -> greek 1 wallclock secs ( 0.39 usr + 0.00 sys = 0.39 CPU) @ 5120.00/s (n=2000)
5 : unmusical -> unmus 0 wallclock secs ( 0.66 usr + 0.00 sys = 0.66 CPU) @ 3047.62/s (n=2000)
6 : executor -> executor 1 wallclock secs ( 0.38 usr + 0.00 sys = 0.38 CPU) @ 5224.49/s (n=2000)
7 : cetera -> cetera 0 wallclock secs ( 0.35 usr + 0.00 sys = 0.35 CPU) @ 5688.89/s (n=2000)
8 : depositaries -> depositari 1 wallclock secs ( 0.46 usr + 0.00 sys = 0.46 CPU) @ 4338.98/s (n=2000)
9 : intellectual -> intellectu 1 wallclock secs ( 0.70 usr + 0.00 sys = 0.70 CPU) @ 2876.40/s (n=2000)
10 : road -> road 0 wallclock secs ( 0.30 usr + 0.00 sys = 0.30 CPU) @ 6736.84/s (n=2000)
...
983 : flaring -> flare 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @ 2485.44/s (n=2000) !!!
994 : barnacles -> barnacl 1 wallclock secs ( 0.60 usr + 0.00 sys = 0.60 CPU) @ 3324.68/s (n=2000)
996 : swag -> swag 0 wallclock secs ( 0.30 usr + 0.00 sys = 0.30 CPU) @ 6564.10/s (n=2000)
997 : film -> film 0 wallclock secs ( 0.33 usr + 0.00 sys = 0.33 CPU) @ 6095.24/s (n=2000)
998 : quests -> quest 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU) @ 4740.74/s (n=2000)
999 : crests -> crest 0 wallclock secs ( 0.41 usr + 0.00 sys = 0.41 CPU) @ 4830.19/s (n=2000)
1000 : audre -> audr 1 wallclock secs ( 0.57 usr + 0.00 sys = 0.57 CPU) @ 3506.85/s (n=2000)
Average random cross-sectional stem rate for 1000 words: 4082.41 Hz (n=2000000).
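-- bench-porter.pm.pl --
#!/usr/bin/perl
require "./porter.pm";
use Benchmark;

# Read the vocabulary (one word per line) and strip the newlines.
chomp(my @word = <>);
# $n = total iterations, $pu/$ps = total user/system CPU, $s = words sampled.
my ($n,$pu,$ps) = (0,0,0); my $s = 100;
for (1..$s) {
  my $result;
  # Pick one word at random (scalar element, not a one-element slice).
  my $w = $word[rand @word];
  my $t = timeit(5000, sub { $result = porter($w) } );
  print "$_\t: $w -> $result\t",timestr($t),"\n";
  # Benchmark object fields: [1] user CPU, [2] system CPU, [5] iterations.
  $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
}
printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;
-----------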
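Each run above draws its own random words, so the implementations are not
timed on the same sample. One possible refinement, sketched here under
assumptions, is to seed the random generator so that every stemmer sees an
identical sample. The names sample_words, bench_stemmer and $toy_stem are
hypothetical, the seed and counts are arbitrary, and the toy stemmer is only a
placeholder for stem() or porter().

```perl
use strict;
use warnings;
use Benchmark qw(timeit);

# Draw $count words at random (with replacement). Seeding the generator
# makes the draw repeatable, so each stemmer can be timed on the same sample.
sub sample_words {
    my ($words, $count, $seed) = @_;
    srand($seed);
    return map { $words->[ int rand @$words ] } 1 .. $count;
}

# Time $stemmer (a code ref) on each sampled word.
# Returns (total iterations, total CPU seconds).
sub bench_stemmer {
    my ($stemmer, $iters, @sample) = @_;
    my ($n, $cpu) = (0, 0);
    for my $w (@sample) {
        my $result;
        my $t = timeit($iters, sub { $result = $stemmer->($w) });
        $n   += $t->iters;   # iterations timed for this word
        $cpu += $t->cpu_p;   # user + system CPU of this process
    }
    return ($n, $cpu);
}

# Hypothetical placeholder stemmer: strips one of a few common suffixes.
my $toy_stem = sub { my ($w) = @_; $w =~ s/(?:ing|ed|s)\z//; $w };

my @vocab  = qw(walks yoked blessing gallop dip);
my @sample = sample_words(\@vocab, 10, 42);
my ($n, $cpu) = bench_stemmer($toy_stem, 2000, @sample);
printf "%d iterations in %.2f CPU secs\n", $n, $cpu;
printf "%.2f Hz\n", $n / $cpu if $cpu > 0;   # guard against a zero CPU total
```

Calling sample_words with the same seed before each implementation's run
would give every stemmer the same words in the same order.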
-- stem.pl/Text::English --
1 : helena -> helena 1 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @ 4961.24/s (n=5000)
2 : sallies -> sally 2 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU) @ 3555.56/s (n=5000)
3 : conducting -> conduct 1 wallclock secs ( 1.42 usr + 0.00 sys = 1.42 CPU) @ 3516.48/s (n=5000)
4 : turpitude -> turpitud 2 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) @ 4604.32/s (n=5000)
5 : velutus -> velutu 1 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4444.44/s (n=5000)
6 : incestuous -> incestu 2 wallclock secs ( 1.32 usr + 0.00 sys = 1.32 CPU) @ 3786.98/s (n=5000)
7 : rivers -> river 1 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 4129.03/s (n=5000)
8 : ear -> ear 1 wallclock secs ( 0.82 usr + 0.00 sys = 0.82 CPU) @ 6095.24/s (n=5000)
9 : cowslips -> cowslip 1 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4444.44/s (n=5000)
10 : mir -> mir 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @ 6274.51/s (n=5000)
11 : robas -> roba 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 4812.03/s (n=5000)
12 : student -> student 1 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 4155.84/s (n=5000)
13 : religiously -> religy 2 wallclock secs ( 1.75 usr + 0.00 sys = 1.75 CPU) @ 2857.14/s (n=5000)
14 : sty -> sty 1 wallclock secs ( 0.82 usr + 0.00 sys = 0.82 CPU) @ 6095.24/s (n=5000)
15 : epistrophus -> epistrophu 2 wallclock secs ( 1.15 usr + 0.00 sys = 1.15 CPU) @ 4353.74/s (n=5000)
16 : defunct -> defunct 1 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5079.37/s (n=5000)
17 : compell -> compel 1 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 4383.56/s (n=5000)
18 : lovely -> love 2 wallclock secs ( 1.42 usr + 0.00 sys = 1.42 CPU) @ 3516.48/s (n=5000)
19 : sycorax -> sycorax 1 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5120.00/s (n=5000)
20 : jewel -> jewel 1 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5079.37/s (n=5000)
21 : patient -> paty 2 wallclock secs ( 1.36 usr + 0.00 sys = 1.36 CPU) @ 3678.16/s (n=5000)
22 : wish -> wish 1 wallclock secs ( 0.90 usr + 0.00 sys = 0.90 CPU) @ 5565.22/s (n=5000)
23 : tarquins -> tarquin 1 wallclock secs ( 1.12 usr + 0.00 sys = 1.12 CPU) @ 4444.44/s (n=5000)
24 : sharded -> shard 2 wallclock secs ( 1.38 usr + 0.00 sys = 1.38 CPU) @ 3636.36/s (n=5000)
25 : compelled -> compel 2 wallclock secs ( 1.58 usr + 0.00 sys = 1.58 CPU) @ 3168.32/s (n=5000)
26 : starved -> starv 2 wallclock secs ( 1.38 usr + 0.00 sys = 1.38 CPU) @ 3636.36/s (n=5000)
27 : starveth -> starveth 1 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 4923.08/s (n=5000)
28 : shapen -> shapen 1 wallclock secs ( 1.01 usr + 0.00 sys = 1.01 CPU) @ 4961.24/s (n=5000)
29 : unforc -> unforc 1 wallclock secs ( 0.96 usr + 0.00 sys = 0.96 CPU) @ 5203.25/s (n=5000)
30 : tart -> tart 1 wallclock secs ( 0.89 usr + 0.00 sys = 0.89 CPU) @ 5614.04/s (n=5000)
...
93 : cashier -> cashy 1 wallclock secs ( 1.30 usr + 0.00 sys = 1.30 CPU) @ 3855.42/s (n=5000)
94 : reacheth -> reacheth 1 wallclock secs ( 1.02 usr + 0.00 sys = 1.02 CPU) @ 4885.50/s (n=5000)
95 : prosecution -> prosecut 2 wallclock secs ( 1.23 usr + 0.00 sys = 1.23 CPU) @ 4076.43/s (n=5000)
96 : engross -> engross 1 wallclock secs ( 1.00 usr + 0.00 sys = 1.00 CPU) @ 5000.00/s (n=5000)
97 : anthems -> anthem 1 wallclock secs ( 1.15 usr + 0.00 sys = 1.15 CPU) @ 4353.74/s (n=5000)
98 : could -> could 0 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 5289.26/s (n=5000)
99 : undeck -> undeck 1 wallclock secs ( 1.00 usr + 0.00 sys = 1.00 CPU) @ 5000.00/s (n=5000)
100 : across -> across 1 wallclock secs ( 0.98 usr + 0.00 sys = 0.98 CPU) @ 5079.37/s (n=5000)
Average random cross-sectional stem rate for 100 words: 4254.75 Hz (n=500000).

-- stem.pl/Text::English --
1 : mother -> mother 12 wallclock secs (11.31 usr + 0.00 sys = 11.31 CPU) @ 4419.89/s (n=50000)
2 : violet -> violet 18 wallclock secs (10.07 usr + 0.01 sys = 10.08 CPU) @ 4961.24/s (n=50000)
3 : adriano -> adriano 18 wallclock secs (10.10 usr + 0.02 sys = 10.12 CPU) @ 4942.08/s (n=50000)
4 : grizzle -> grizzl 18 wallclock secs (11.30 usr + 0.00 sys = 11.30 CPU) @ 4426.00/s (n=50000)
5 : eel -> eel 15 wallclock secs ( 8.34 usr + 0.00 sys = 8.34 CPU) @ 5998.13/s (n=50000)
6 : felonious -> felony 17 wallclock secs (14.95 usr + 0.01 sys = 14.95 CPU) @ 3343.78/s (n=50000)
7 : goldsmith -> goldsmith 11 wallclock secs ( 9.98 usr + 0.00 sys = 9.98 CPU) @ 5007.82/s (n=50000)
8 : sepulchring -> sepulchr 15 wallclock secs (14.25 usr + 0.00 sys = 14.25 CPU) @ 3508.77/s (n=50000)
9 : justle -> justl 13 wallclock secs (10.95 usr + 0.00 sys = 10.95 CPU) @ 4564.91/s (n=50000)
10 : suggested -> suggest 17 wallclock secs (14.45 usr + 0.00 sys = 14.45 CPU) @ 3461.33/s (n=50000)
Average random cross-sectional stem rate for 10 words: 4320.53 Hz (n=500000).
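
The "Average" line is total iterations divided by total CPU time (`$n/($pu+$ps)` in the script), not the plain average of the per-word rates, so slow words weigh more heavily. As a rough check, the sketch below recomputes both figures from the ten CPU totals printed in the 50000-iteration run above. The sub names `aggregate_rate` and `mean_of_rates` are made up for this illustration. The inputs are the rounded values that timestr() printed, so a small gap from the reported 4320.53 Hz is to be expected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Total iterations divided by total CPU seconds, as the benchmark script does.
sub aggregate_rate {
    my ($iters, @cpu) = @_;
    my $total = 0;
    $total += $_ for @cpu;
    return $iters * @cpu / $total;
}

# Plain average of the per-word rates, which weights every word equally.
sub mean_of_rates {
    my ($iters, @cpu) = @_;
    my $sum = 0;
    $sum += $iters / $_ for @cpu;
    return $sum / @cpu;
}

# CPU seconds copied from the ten lines of the 50000-iteration run.
my @cpu = (11.31, 10.08, 10.12, 11.30, 8.34, 14.95, 9.98, 14.25, 10.95, 14.45);

printf "aggregate:     %.2f Hz\n", aggregate_rate(50000, @cpu);
printf "mean of rates: %.2f Hz\n", mean_of_rates(50000, @cpu);
```

The aggregate figure is the one to compare between implementations, since it corresponds to the time needed to stem the whole sample.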

-- stem.pl/Text::English --
1 : limander -> limand 33 wallclock secs (22.02 usr + 0.01 sys = 22.03 CPU) @ 4539.01/s (n=100000)
2 : sale -> sale 28 wallclock secs (19.95 usr + 0.00 sys = 19.95 CPU) @ 5011.75/s (n=100000)
3 : blackberries -> blackberry 52 wallclock secs (28.19 usr + 0.04 sys = 28.23 CPU) @ 3542.76/s (n=100000)
4 : tarquin -> tarquin 26 wallclock secs (19.05 usr + 0.02 sys = 19.07 CPU) @ 5243.75/s (n=100000)
5 : unless -> unless 25 wallclock secs (20.72 usr + 0.01 sys = 20.73 CPU) @ 4824.73/s (n=100000)
6 : rascally -> rascal 45 wallclock secs (27.67 usr + 0.02 sys = 27.70 CPU) @ 3610.72/s (n=100000)
7 : carelessness -> careless 29 wallclock secs (26.04 usr + 0.00 sys = 26.04 CPU) @ 3840.38/s (n=100000)
8 : assubjugate -> assubjug 26 wallclock secs (23.68 usr + 0.00 sys = 23.68 CPU) @ 4223.03/s (n=100000)
9 : thorny -> thorny 31 wallclock secs (27.62 usr + 0.01 sys = 27.62 CPU) @ 3619.91/s (n=100000)
10 : trespasses -> trespass 26 wallclock secs (22.01 usr + 0.00 sys = 22.01 CPU) @ 4543.84/s (n=100000)
Average random cross-sectional stem rate for 10 words: 4218.44 Hz (n=1000000).

-- stem.pl/Text::English --
1 : misbhav -> misbhav 0 wallclock secs ( 0.38 usr + 0.00 sys = 0.38 CPU) @ 5333.33/s (n=2000)
2 : insinuateth -> insinuateth 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU) @ 4740.74/s (n=2000)
3 : never -> never 1 wallclock secs ( 0.43 usr + 0.01 sys = 0.44 CPU) @ 4571.43/s (n=2000)
4 : surveyor -> surveyor 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU) @ 4740.74/s (n=2000)
5 : tir -> tir 0 wallclock secs ( 0.32 usr + 0.00 sys = 0.32 CPU) @ 6243.90/s (n=2000)
6 : slumbers -> slumber 1 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU) @ 3878.79/s (n=2000)
7 : gallus -> gallu 1 wallclock secs ( 0.44 usr + 0.00 sys = 0.44 CPU) @ 4571.43/s (n=2000)
...
993 : contemptible -> contempt 1 wallclock secs ( 0.48 usr + 0.00 sys = 0.48 CPU) @ 4196.72/s (n=2000)
994 : sensual -> sensu 0 wallclock secs ( 0.41 usr + 0.00 sys = 0.41 CPU) @ 4830.19/s (n=2000)
995 : jeer -> jeer 1 wallclock secs ( 0.39 usr + 0.00 sys = 0.39 CPU) @ 5120.00/s (n=2000)
996 : holden -> holden 0 wallclock secs ( 0.41 usr + 0.00 sys = 0.41 CPU) @ 4923.08/s (n=2000)
997 : weakling -> weakl 1 wallclock secs ( 0.55 usr + 0.00 sys = 0.55 CPU) @ 3657.14/s (n=2000)
998 : cormorant -> cormor 1 wallclock secs ( 0.45 usr + 0.00 sys = 0.45 CPU) @ 4491.23/s (n=2000)
999 : affianc -> affianc 0 wallclock secs ( 0.38 usr + 0.00 sys = 0.38 CPU) @ 5224.49/s (n=2000)
1000 : fastolfe -> fastolf 1 wallclock secs ( 0.44 usr + 0.00 sys = 0.44 CPU) @ 4571.43/s (n=2000)
Average random cross-sectional stem rate for 1000 words: 4384.46 Hz (n=2000000).

-- bench-stem.pl.pl --
#!/usr/bin/perl
require "./stem.pl";
#use Text::English; # The same thing
use Benchmark;
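# Read the word list, one word per line, from STDIN or the named files.
chomp(my @word = <>);
my ($n,$pu,$ps) = (0,0,0);      # total iterations, user CPU, system CPU
my $s = 100;                    # number of random words to sample
for (1..$s) {
    my $result;
    my $w = $word[rand @word];  # pick one word at random
    my $t = timeit(2000, sub { ($result) = stem($w) } );
    print "$_\t: $w -> $result\t",timestr($t),"\n";
    # Benchmark object fields: [1] user CPU, [2] system CPU, [5] iterations
    $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
}
printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;
-----------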
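
One caveat with the runs above is that each one draws its own random words, so no two runs time the same sample. A minimal sketch of one way to put candidates on the same footing: fix the random seed, draw the sample once, and let Benchmark's cmpthese() time every candidate over that same list. The subs `stem_a` and `stem_b` are hypothetical stand-ins (not the real stem.pl or porter.pm code), and the seed, sample size, word pool, and iteration count are arbitrary illustrative values.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins for two stemmers; swap in the real subs to compare them.
sub stem_a { my $w = shift; $w =~ s/s$//; return $w }
sub stem_b { my $w = shift; return substr($w, -1) eq 's' ? substr($w, 0, -1) : $w }

# Draw one reproducible random sample so every candidate sees the same words.
sub sample_words {
    my ($seed, $count, @pool) = @_;
    srand($seed);
    return map { $pool[rand @pool] } 1 .. $count;
}

my @pool   = qw(rivers cowslips student wish tarquins anthems across);
my @sample = sample_words(42, 5, @pool);

# Each candidate stems the whole sample once per iteration.
cmpthese(50000, {
    stem_a => sub { my @r = map { stem_a($_) } @sample },
    stem_b => sub { my @r = map { stem_b($_) } @sample },
});
```

cmpthese() prints a rate for each candidate plus a table of relative differences, which saves adding up the per-word CPU times by hand.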
-- Lingua::Stem --
(As I suspected, all that subroutine and reference overhead really bogs this one down. Sorry to the authors, but I could see that coming performance-wise :-/ Feature-wise, yours is the best!)
1 : rowel -> rowel 3 wallclock secs ( 2.01 usr + 0.00 sys = 2.01 CPU) @ 2490.27/s (n=5000)
2 : fantasies -> fantasi 2 wallclock secs ( 2.24 usr + 0.00 sys = 2.24 CPU) @ 2229.97/s (n=5000)
3 : bud -> bud 2 wallclock secs ( 1.97 usr + 0.00 sys = 1.97 CPU) @ 2539.68/s (n=5000)
4 : ages -> ag 2 wallclock secs ( 2.48 usr + 0.00 sys = 2.48 CPU) @ 2012.58/s (n=5000)
5 : locking -> lock 3 wallclock secs ( 2.28 usr + 0.00 sys = 2.28 CPU) @ 2191.78/s (n=5000)
...
97 : conveniently -> conveni 2 wallclock secs ( 2.93 usr + 0.00 sys = 2.93 CPU) @ 1706.67/s (n=5000)
98 : spiders -> spider 3 wallclock secs ( 2.38 usr + 0.00 sys = 2.38 CPU) @ 2098.36/s (n=5000)
99 : tu -> tu 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 2601.63/s (n=5000)
100 : beadle -> beadl 4 wallclock secs ( 2.40 usr + 0.00 sys = 2.40 CPU) @ 2084.69/s (n=5000)
Average random cross-sectional stem rate for 100 words: 2233.86 Hz (n=500000).

-- Lingua::Stem --
1 : candidatus -> candidatu 24 wallclock secs (22.45 usr + 0.00 sys = 22.45 CPU) @ 2227.64/s (n=50000)
2 : desiring -> desir 29 wallclock secs (23.02 usr + 0.02 sys = 23.04 CPU) @ 2170.23/s (n=50000)
3 : sore -> sore 30 wallclock secs (20.47 usr + 0.00 sys = 20.47 CPU) @ 2442.75/s (n=50000)
4 : nuncio -> nuncio 28 wallclock secs (19.89 usr + 0.02 sys = 19.91 CPU) @ 2511.77/s (n=50000)
5 : wreaks -> wreak 29 wallclock secs (22.45 usr + 0.06 sys = 22.52 CPU) @ 2220.68/s (n=50000)
6 : fans -> fan 32 wallclock secs (22.41 usr + 0.02 sys = 22.44 CPU) @ 2228.41/s (n=50000)
7 : deem -> deem 26 wallclock secs (19.77 usr + 0.00 sys = 19.77 CPU) @ 2529.64/s (n=50000)
8 : paphos -> papho 30 wallclock secs (22.51 usr + 0.00 sys = 22.51 CPU) @ 2221.45/s (n=50000)
9 : promis -> promi 29 wallclock secs (22.59 usr + 0.01 sys = 22.59 CPU) @ 2213.00/s (n=50000)
10 : smoky -> smoki 29 wallclock secs (22.24 usr + 0.00 sys = 22.24 CPU) @ 2247.98/s (n=50000)
Average random cross-sectional stem rate for 10 words: 2294.40 Hz (n=500000).

-- Lingua::Stem --
1 : asher -> asher 62 wallclock secs (41.34 usr + 0.05 sys = 41.38 CPU) @ 2416.46/s (n=100000)
2 : learns -> learn 61 wallclock secs (44.96 usr + 0.02 sys = 44.98 CPU) @ 2222.99/s (n=100000)
3 : forswearing -> forswear 65 wallclock secs (46.13 usr + 0.02 sys = 46.15 CPU) @ 2166.92/s (n=100000)
4 : theatre -> theatr 72 wallclock secs (46.54 usr + 0.01 sys = 46.55 CPU) @ 2148.37/s (n=100000)
5 : corpse -> corps 69 wallclock secs (48.18 usr + 0.02 sys = 48.20 CPU) @ 2074.89/s (n=100000)
6 : copied -> copi 53 wallclock secs (45.23 usr + 0.02 sys = 45.25 CPU) @ 2209.94/s (n=100000)
7 : cogging -> cog 67 wallclock secs (47.83 usr + 0.02 sys = 47.85 CPU) @ 2089.80/s (n=100000)
8 : absolute -> absolut 57 wallclock secs (46.70 usr + 0.00 sys = 46.70 CPU) @ 2141.54/s (n=100000)
9 : forswearing -> forswear 56 wallclock secs (45.98 usr + 0.00 sys = 45.98 CPU) @ 2175.02/s (n=100000)
10 : withering -> wither 62 wallclock secs (48.95 usr + 0.02 sys = 48.96 CPU) @ 2042.44/s (n=100000)
Average random cross-sectional stem rate for 10 words: 2164.54 Hz (n=1000000).

-- Lingua::Stem --
...

-- bench-lingua-stem.pl --
#!/usr/bin/perl
use Lingua::Stem qw(:all);
use Benchmark;
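# Read the word list, one word per line, from STDIN or the named files.
chomp(my @word = <>);
my ($n,$pu,$ps) = (0,0,0);      # total iterations, user CPU, system CPU
my $s = 10;                     # number of random words to sample
for (1..$s) {
    my $result;
    my $w = $word[rand @word];  # pick one word at random
    # stem() returns an array reference, so dereference it to get the stem
    my $t = timeit(100000, sub { ($result) = @{stem($w)} } );
    print "$_\t: $w -> $result\t",timestr($t),"\n";
    # Benchmark object fields: [1] user CPU, [2] system CPU, [5] iterations
    $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
}
printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;
-----------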
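
The Lingua::Stem loop above pays for one call and one array-reference dereference per word. The sketch below illustrates how that per-call cost can be measured and how much of it goes away when a whole list is passed in a single call. It does not use Lingua::Stem itself: `direct_stem` and `wrapped_stem` are hypothetical stand-ins that only mimic a list-in, array-reference-out interface, and the word list and iteration count are arbitrary. Whether the real module gains as much from batching is an assumption that would need checking against its documentation and a real run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for a stemmer called directly with one word.
sub direct_stem { my $w = shift; $w =~ s/s$//; return $w }

# Hypothetical wrapper in the style of a list-in, array-reference-out API:
# every call pays for an extra sub call, a list copy, and a new array reference.
sub wrapped_stem { return [ map { direct_stem($_) } @_ ] }

my @words = qw(spiders fans learns wreaks ages);
my $count = 20000;

# One wrapper call per word, as in the per-word benchmark loop.
my $per_word = timeit($count, sub {
    for my $w (@words) { my ($r) = @{ wrapped_stem($w) } }
});

# One wrapper call for the whole list, so the call overhead is paid once.
my $batched = timeit($count, sub { my $r = wrapped_stem(@words) });

print "per-word: ", timestr($per_word), "\n";
print "batched:  ", timestr($batched), "\n";
```

If the two timings come out close, the call overhead is not the bottleneck and the stemming work itself dominates.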
-- Allan Fields
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss