Re: [Snowball-discuss] Mismatch between vocab.txt and output.txt

From: Oleg Bartunov (
Date: Mon Oct 14 2002 - 15:09:01 BST

On Mon, 14 Oct 2002, Olly Betts wrote:

> On Mon, Oct 14, 2002 at 02:26:09PM +0100, Olly Betts wrote:
> > I generated my stemmers from the ".sbl" sources, but the differences from
> > the finnish stem.c on the website are just in the function names. Most
> > odd - I'll see if I can work out what's going on.
> I've found the problem, and it wasn't Snowball at fault at all.
> There was a mismatch in the order of the stemmers in a table in my own
> code - I had "french" and "finnish" switched, so I was stemming finnish
> with the french stemmer (and vice versa).
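
A minimal sketch of how this kind of table-order mismatch can arise and how to avoid it. Olly's actual code is not shown in this thread, so every name below (stem_finnish, stem_french, LANGUAGES, STEMMERS_BY_POSITION, STEMMERS) is a hypothetical stand-in. The idea is simply that keying stemmers by language name removes the dependence on two tables staying in the same order.

```python
# Hypothetical stand-ins for real stemmers: each only tags its input
# so that it is obvious which one was called.
def stem_finnish(word):
    return "fi:" + word


def stem_french(word):
    return "fr:" + word


# Fragile layout: two parallel tables that must be kept in the same order.
LANGUAGES = ["finnish", "french"]
STEMMERS_BY_POSITION = [stem_french, stem_finnish]  # entries swapped by mistake

# Sturdier layout: a single mapping from language name to stemmer.
STEMMERS = {
    "finnish": stem_finnish,
    "french": stem_french,
}


def stem(language, word):
    """Look the stemmer up by name, so table order cannot matter."""
    return STEMMERS[language](word)
```

With the parallel tables, STEMMERS_BY_POSITION[LANGUAGES.index("finnish")] silently returns the French stemmer; the name-keyed lookup cannot go wrong in that way.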

Olly, I'm curious how you decide which stemmer to use.
As I understand it, a stemmer will by definition accept any word!
So I don't see any reliable way to stem bilingual documents. Luckily,
we can distinguish Russian from English by character code, but
in the French-English case that's impossible. As a workaround, one could
use a simple combined stemmer which has all the endings (French+English, for
example) and cuts the longest matching ending. This should work as long as
the same stemmer is applied to both the query and the index.
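
A rough sketch of the two ideas above: telling Russian apart from Latin-script text by character code, and a combined stemmer that cuts the longest matching ending. The ending lists, the minimum stem length, and the function names are illustrative assumptions only. Real Snowball stemmers apply many more rules than plain suffix stripping.

```python
# Illustrative ending lists only (assumed for this sketch, not taken
# from the Snowball sources).
ENGLISH_ENDINGS = ["ing", "ed", "es", "s"]
FRENCH_ENDINGS = ["ements", "ement", "euse", "eux", "es", "e", "s"]

# Merge both lists and try the longest endings first.
COMBINED_ENDINGS = sorted(
    set(ENGLISH_ENDINGS + FRENCH_ENDINGS), key=len, reverse=True
)

# Assumed safeguard: never cut a word down to fewer than 3 characters.
MIN_STEM_LENGTH = 3


def is_cyrillic(word):
    """True if any character lies in the basic Cyrillic Unicode block."""
    return any("\u0400" <= ch <= "\u04ff" for ch in word)


def combined_stem(word):
    """Lowercase the word and strip the longest matching combined ending."""
    word = word.lower()
    for ending in COMBINED_ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM_LENGTH:
            return word[: -len(ending)]
    return word
```

For example, combined_stem("walking") gives "walk" and combined_stem("changements") gives "chang", while is_cyrillic("слово") is True. The stems are cruder than those of a language-specific stemmer, but since the same function runs over both query and index, matching terms still map to the same stem.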

> Sorry about the false alarm.
> Cheers,
> Olly

Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
phone: +007(095)939-16-83, +007(095)939-23-83
