[Snowball-discuss] Re: Possible memory leak in Snowballs Java stemmer

From: Richard Boulton (richard@tartarus.org)
Date: Thu May 27 2004 - 18:36:02 BST


Wolfram,

Sorry for the slow reply - for some reason your email didn't make it to
my machine, so I only found out about it today when Martin pointed it
out to me.

I haven't done any Java work on Snowball for a fairly long time, but your
analysis makes sense to me. One thing I'm not sure about is whether
this is a general problem, or whether it is
just an issue with your version of the JVM (or rather, its associated
class library). I can imagine that other implementations of Java might
handle the StringBuffer allocation differently. (Or maybe the behaviour
is specified by the Java specification?)

I'm fairly happy to include your changes, but slightly worried that, on
a version of Java which doesn't exhibit the resource usage problems
you're seeing when making hollow strings from StringBuffers, your
changes would force an unnecessary string copy.

I wonder if making a new StringBuffer in setCurrent(), rather than
modifying the existing StringBuffer, would fix the problem. I fear that
this would cause lots of temporary objects to be created, which could be
less efficient (by making lots of work for the garbage collector to do).
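
The trade-off in the paragraph above can be sketched roughly as follows. This is only an illustration, not Snowball's actual code: the class SetCurrentSketch and the two setCurrent variants are hypothetical names invented for the example. It shows that a reused StringBuffer keeps the capacity of the longest word it has held, while a fresh one is sized to the current word.

```java
// Hypothetical stand-in for the stemmer's buffer handling; NOT Snowball's code.
final class SetCurrentSketch {
    private StringBuffer current = new StringBuffer();

    // Variant A: overwrite the contents of the existing buffer.
    // The capacity never shrinks, so it stays as large as the
    // longest word seen so far.
    void setCurrentReusing(String value) {
        current.replace(0, current.length(), value);
    }

    // Variant B: allocate a fresh buffer for every word.
    // The capacity is sized to the word, at the cost of one extra
    // short-lived object per call.
    void setCurrentFresh(String value) {
        current = new StringBuffer(value);
    }

    String getCurrent() {
        return current.toString();
    }

    int capacity() {
        return current.capacity();
    }

    public static void main(String[] args) {
        StringBuffer longWord = new StringBuffer();
        for (int i = 0; i < 1000; i++) {
            longWord.append('a');
        }

        SetCurrentSketch reusing = new SetCurrentSketch();
        reusing.setCurrentReusing(longWord.toString());
        reusing.setCurrentReusing("cat");
        System.out.println("reused buffer capacity: " + reusing.capacity());

        SetCurrentSketch fresh = new SetCurrentSketch();
        fresh.setCurrentFresh(longWord.toString());
        fresh.setCurrentFresh("cat");
        System.out.println("fresh buffer capacity:  " + fresh.capacity());
    }
}
```

Whether the larger capacity actually ends up pinned by the returned String depends on how the runtime implements StringBuffer.toString(), which is exactly the open question here.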

>>Either the user or your library can do something like this:
>> String myStem = new String( germanStemmer.getCurrent());

This has the advantage of not forcing a string copy for applications
where only a few stems are being calculated. However, it's an ugly
workaround for a problem in Java, IMHO.
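
For applications that do keep many stems around, the caller-side copy quoted above might be wrapped in a small helper, sketched below. The names StemCopySketch and compactCopy are hypothetical. The helper copies via toCharArray() instead of the String(String) constructor, on the assumption that this forces fresh storage no matter how the runtime implements that constructor; on a runtime where toString() already copies, the helper is simply redundant.

```java
// Hypothetical caller-side helper; not part of the Snowball library.
final class StemCopySketch {
    // Returns a String built from its own copy of the characters,
    // so it cannot share storage with an oversized buffer.
    static String compactCopy(String stem) {
        return new String(stem.toCharArray());
    }

    public static void main(String[] args) {
        // Oversized on purpose, standing in for a stemmer's internal buffer.
        StringBuffer buffer = new StringBuffer(4096);
        buffer.append("lauf");

        String fromBuffer = buffer.toString();
        String compact = compactCopy(fromBuffer);

        System.out.println("same contents:    " + compact.equals(fromBuffer));
        System.out.println("distinct objects: " + (compact != fromBuffer));
    }
}
```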

>>I really would like to hear from your team, if you could reproduce my
>>problem and find the solution helpful.

I haven't got time to try and reproduce the problem right now. What
would be very helpful would be if you could send a minimal Java program
which exhibited the problem for you. (I imagine something like calling
stem multiple times on a given word, and storing the result in a vector
would be an appropriate approach.) I could then verify that it's not
just your Java setup which exhibits the problem.
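
A test program along those lines might look roughly like the sketch below. Everything in it is an assumption made for illustration: FakeStemmer is a toy stand-in that reuses one StringBuffer and drops a trailing "s" (a real test would use the generated Snowball stemmer class instead), an ArrayList is used in place of a Vector, and the memory figure printed is only a rough estimate from Runtime, not a precise measurement.

```java
import java.util.ArrayList;
import java.util.List;

final class StemLeakRepro {
    // Toy stand-in for a stemmer that reuses one StringBuffer (hypothetical).
    static final class FakeStemmer {
        private final StringBuffer current = new StringBuffer();

        void setCurrent(String word) {
            current.replace(0, current.length(), word);
        }

        // Toy rule: drop one trailing 's'. Not a real stemming algorithm.
        void stem() {
            int n = current.length();
            if (n > 1 && current.charAt(n - 1) == 's') {
                current.setLength(n - 1);
            }
        }

        String getCurrent() {
            return current.toString();
        }
    }

    // Stems the same word repeatedly and keeps every result alive in a list.
    static List<String> collectStems(String word, int count) {
        FakeStemmer stemmer = new FakeStemmer();
        List<String> stems = new ArrayList<String>(count);
        for (int i = 0; i < count; i++) {
            stemmer.setCurrent(word);
            stemmer.stem();
            stems.add(stemmer.getCurrent());
        }
        return stems;
    }

    // Rough estimate of heap in use; gc() is only a hint to the JVM.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int count = 100000;
        long before = usedMemory();
        List<String> stems = collectStems("stems", count);
        long after = usedMemory();
        System.out.println("stems kept: " + stems.size());
        System.out.println("approx. bytes per stem: " + (after - before) / count);
    }
}
```

Running the same program on different JVMs and comparing the approximate bytes per stem would give a first indication of whether the behaviour is specific to one setup.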

>>Or did I overlook some other (memory saving) means of getting the
>>desired stem?

Not that I can think of.

Comments would be welcome from any Java experts on the list.

-- 
Richard


