Re: [Snowball-discuss] Java stemmers

From: Richard Boulton (richard@tartarus.org)
Date: Sun Jan 27 2002 - 16:51:18 GMT


Just to note: I had a quick fiddle with the Java stemmer support
application, and seem to have solved (most of) the slowness problems.

I modified the application so that it explicitly reads in blocks of 8k
(rather than using a BufferedInputStream, which I would have thought
would buffer internally, but didn't seem to). This improves the run
time to 0.8 seconds for stemming english/voc.txt, and 0.2 seconds for
each repetition of the stem step. This is much closer to what would be
hoped for; a great deal of the remaining time could be setup time, but
this performance should be good enough for now.
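
For anyone who doesn't want to dig through the diff, here is a rough
sketch of what reading in explicit 8k blocks looks like. This is not the
patch itself: the class name BlockReadSketch, the readAll helper and the
use of a plain InputStream are assumptions made for illustration, and
the real TestApp may be organised quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: names and structure are hypothetical, not taken from TestApp.java.
public class BlockReadSketch {
    // 8k block size, matching the figure mentioned in the message.
    static final int BLOCK_SIZE = 8192;

    // Drain a stream by repeatedly asking for up to BLOCK_SIZE bytes,
    // instead of pulling single bytes through a wrapper stream.
    static byte[] readAll(InputStream in) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        // read() may return fewer bytes than requested; -1 marks end of stream.
        while ((n = in.read(block, 0, block.length)) != -1) {
            out.write(block, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input; a real run would open the vocabulary file instead.
        byte[] sample = "consign\nconsigned\nconsigning\n".getBytes("UTF-8");
        byte[] copy = readAll(new ByteArrayInputStream(sample));
        System.out.println(copy.length + " bytes read");
    }
}
```

The point of the loop is simply that each call hands back a whole block,
so the per-call overhead is paid once per 8k rather than once per
character.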

See:
http://cvs.sf.net/cgi-bin/viewcvs.cgi/snowball/website/net/sf/snowball/TestApp.java.diff?r1=1.2&r2=1.3
for the patch I applied.

Still to be done is to implement flow analysis so that the Java compiler
won't complain about unreachable code: this should be reasonably simple,
but if anyone is desperate for a particular stemmer in the meantime,
they can always edit the generated code to remove unreachable
statements.
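
To illustrate what the compiler objects to, here is a small sketch. The
class UnreachableSketch and the method firstVowel are made up for this
example and are not actual generated Snowball code; the comment marks
the kind of spot where a leftover statement would have to be deleted by
hand.

```java
// Illustrative only: a hypothetical stand-in for a generated routine.
public class UnreachableSketch {
    // Return the index of the first vowel in s, or -1 if there is none.
    static int firstVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if ("aeiouy".indexOf(s.charAt(i)) >= 0) {
                return i;
                // Any statement placed here, directly after the return,
                // can never run, and javac rejects it as an
                // "unreachable statement". Removing it fixes the build.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstVowel("stem"));
    }
}
```

Unlike a C compiler, which would at most warn, javac treats this as a
hard error, which is why the generated code needs either flow analysis
in the generator or a manual clean-up.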

-- 
Richard
