Re: [Snowball-discuss] Mismatch between vocab.txt and output.txt

From: Oleg Bartunov (oleg@sai.msu.su)
Date: Mon Oct 14 2002 - 15:09:01 BST


On Mon, 14 Oct 2002, Olly Betts wrote:

> On Mon, Oct 14, 2002 at 02:26:09PM +0100, Olly Betts wrote:
> > I generated my stemmers from the ".sbl" sources, but the differences from
> > the finnish stem.c on the website are just in the function names. Most
> > odd - I'll see if I can work out what's going on.
>
> I've found the problem, and it wasn't Snowball at fault at all.
>
> There was a mismatch in the order of the stemmers in a table in my own
> code - I had "french" and "finnish" switched, so I was stemming finnish
> with the french stemmer (and vice versa).
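
A table mismatch like the one described above tends to come from matching stemmers to languages by position. One way to make it harder to hit is to look each stemmer up by language name. This is a minimal Python sketch, not Olly's actual code: `stem_finnish`, `stem_french`, `STEMMERS` and `stem` are hypothetical stand-ins, and the two stemmer functions only tag their input so the dispatch is visible.

```python
# Hypothetical stand-ins for real generated stemmers: each one only tags
# its input so that it is obvious which function was dispatched to.
def stem_finnish(word):
    return "fi:" + word


def stem_french(word):
    return "fr:" + word


# One table keyed by language name, so a name and its stemmer cannot
# drift apart the way two parallel, position-matched tables can.
STEMMERS = {
    "finnish": stem_finnish,
    "french": stem_french,
}


def stem(language, word):
    """Stem `word` with the stemmer registered under `language`."""
    try:
        stemmer = STEMMERS[language]
    except KeyError:
        raise ValueError(f"no stemmer registered for {language!r}") from None
    return stemmer(word)
```

With this layout, adding or reordering entries cannot silently swap two languages, and an unknown language name fails loudly instead of picking a neighbouring entry.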

Olly, I'm curious how you decide which stemmer to use.
As I understand it, a stemmer by definition will accept any word!
So I don't see any way to stem bilingual documents. Luckily,
we can distinguish Russian from English by character code, but
in the French-English case that's impossible. As a workaround, one could
use a simple combined stemmer that has all the endings (French+English, for
example) and cuts off the longest ending. This should work if the same
stemmer is applied to both the query and the index.
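
To make the combined-stemmer idea concrete, here is a rough Python sketch under stated assumptions. The ending lists are tiny illustrative samples, not the rules of the Snowball French or English stemmers, and `combined_stem` and `MIN_STEM_LENGTH` are names invented for this example.

```python
# Illustrative ending lists only: these are NOT the Snowball rules,
# just a handful of common suffixes chosen for the example.
ENGLISH_ENDINGS = ("ing", "ed", "es", "s")
FRENCH_ENDINGS = ("ements", "ement", "ions", "ent", "er", "es", "e", "s")

# Merge both lists and try the longest endings first, so that the
# longest matching ending is the one that gets cut.
COMBINED_ENDINGS = sorted(
    set(ENGLISH_ENDINGS) | set(FRENCH_ENDINGS), key=len, reverse=True
)

# Assumed guard: never leave a stem shorter than this many characters.
MIN_STEM_LENGTH = 3


def combined_stem(word):
    """Strip the longest known French or English ending from `word`."""
    word = word.lower()
    for ending in COMBINED_ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM_LENGTH:
            return word[: -len(ending)]
    return word
```

The stems this produces are cruder than what a single-language stemmer would give, but since query terms and indexed terms pass through the same function, they still reduce to the same string and so still match.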

>
> Sorry about the false alarm.
>
> Cheers,
> Olly
>

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


