Re: [Snowball-discuss] two results

From: Richard Boulton (richard@tartarus.org)
Date: Mon Oct 06 2003 - 12:50:02 BST


Boštjan Jerko wrote:
> Is it possible to get two stems for one word?
> In Slovene it is possible to stem a word (e.g. "zelodec) in two ways ("zelodc, "zelod"c).

No, all current stemming algorithms return one stemmed version of each
input word.

I'm not sure what it would mean to have two possible stemmed forms -
stemming is a normalising process, used to determine if differing
versions of a word share a common root.

Why do you think having multiple stemmed forms might be necessary?

One situation where I think this might be useful is as follows:

We have two stemmed words, "A" and "B", with quite distinct meanings.
There exists at least one word "A_" which should stem to "A".
There exists at least one word "B_" which should stem to "B".

However, there is a word "X" which can have two different meanings. One
of those meanings is a form of "A", the other is a form of "B". In
order to reflect this, without tying "A" and "B" together, "X" should
stem to both "A" and "B".

Is this the situation you have?

I don't know of any concrete examples of this situation (Martin may be
able to give one), but the way I would expect it to be solved is to
choose which stemmed form is more frequently the correct stemmed form of
"X", and to use that always. Alternatively, if neither form is
significantly more frequent, "X" could be left in an unstemmed form.
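
As a rough sketch of the rule just described (not part of any existing
stemmer), the choice could look like the following. The function name
choose_stem, the candidate_counts data and the min_ratio threshold for
"significantly more frequent" are all assumptions made for illustration;
the counts would have to come from some hand-checked sample of text.

```python
def choose_stem(word, candidate_counts, min_ratio=2.0):
    """Return a single stem for an ambiguous word, or the word itself.

    candidate_counts: hypothetical mapping from each candidate stem to the
        number of times it was the correct stem for `word` in a sample.
    min_ratio: assumed threshold; the best stem must be at least this many
        times more frequent than the runner-up to be used.
    """
    # Rank candidate stems from most to least frequent.
    ranked = sorted(candidate_counts.items(), key=lambda kv: kv[1], reverse=True)

    # No evidence at all: leave the word unstemmed.
    if not ranked or ranked[0][1] == 0:
        return word

    best_stem, best_count = ranked[0]

    # Only one candidate, so there is no ambiguity to resolve.
    if len(ranked) == 1:
        return best_stem

    runner_up_count = ranked[1][1]

    # Use the dominant stem always; otherwise leave the word unstemmed.
    if best_count >= min_ratio * runner_up_count:
        return best_stem
    return word
```

With these assumptions, choose_stem("X", {"A": 90, "B": 10}) gives "A",
while choose_stem("X", {"A": 55, "B": 45}) leaves "X" unstemmed.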

It would require a good deal of work to allow most search engines to
deal with a stemming algorithm that returned multiple possibilities.
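
To give an idea of what that work might involve, here is a minimal sketch
of a toy inverted index that accepts several stems per word. It is not
how any particular search engine works; the STEMS table, stems_of,
build_index and search are hypothetical names, and the table simply
encodes the "A_", "B_", "X" scenario above.

```python
from collections import defaultdict

# Hypothetical stem table: "X" is ambiguous and stems to both "A" and "B".
STEMS = {"A_": {"A"}, "B_": {"B"}, "X": {"A", "B"}}


def stems_of(word):
    """Return the set of stems for a word (the word itself if unknown)."""
    return STEMS.get(word, {word})


def build_index(docs):
    """Index each document under every stem of every word it contains."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            for stem in stems_of(word):
                index[stem].add(doc_id)
    return index


def search(index, query_word):
    """Return documents sharing at least one stem with the query word."""
    hits = set()
    for stem in stems_of(query_word):
        hits |= index.get(stem, set())
    return hits


docs = {1: ["A_"], 2: ["B_"], 3: ["X"]}
index = build_index(docs)
```

Here a search for "A_" finds documents 1 and 3, and a search for "B_"
finds documents 2 and 3, so "A" and "B" are not tied together. A search
for "X" itself, though, finds all three documents, and both indexing and
querying now have to handle sets of terms rather than a single term per
word.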

-- 
Richard


