Re: Re[2]: [Snowball-discuss] Porter strem question

From: Martin Porter (martin_porter@softhome.net)
Date: Wed Jan 29 2003 - 09:37:01 GMT


>Dear Martin!
>
>Thank you for your reply.
>I'm very sorry for my English. I mean that I need an "Anti English Porter
>stemming" algorithm
>for the following translation:
>consign consign
>consign consigned
>consign consigning
>consign consignment
>consist consist
>consist consisted
>consist consistency
>consist consistent
>consist consistently
>consist consisting
>consist consists
>consol consolation
>consol consolations
>
>Purpose: for my own search engine
>
>Thank you
>--
>with best regard, RedStar

Now I see what you mean.

It is possible to do what you want, but only with the aid of a dictionary.
This is because you cannot deduce the part of speech, and therefore the
class of possible endings, from the stem of the word.

Setting such a dictionary up could be done as follows:

A) from a large sample vocabulary get the set of endings corresponding to
each reduced stem, and give the set an identifiable code: e.g.

    V = -ed, -s, -ing, -ings, -able, -ability, -abilities, -ment

V would be a basic verb form, and cover words like govern, arrange, induce,
consign ...

B) (the tricky part) Collapse all these different sets to a small number of
forms. So there would be codes V, V2, N, X ... for different classes of
ending. If a word's endings are nearly the same as X, put it into class X,
and so on.

C) Make a dictionary of stems where you look up the word by its stem, get
the ending class, and from that generate all the forms.
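
A minimal Python sketch of steps A to C, under several assumptions. The
sample pairs are the ones quoted above; class V is the set given in step A
plus the empty ending for the bare word; the contents of V2 and N are
hypothetical; and Jaccard similarity is only one possible way to decide
that a word's endings are "nearly the same" as a class. It also assumes
each stem is a literal prefix of the word, which is not always true of
Porter stems.

```python
from collections import defaultdict

# (stem, word) pairs, as a stemmer run over a sample vocabulary might give.
PAIRS = [
    ("consign", "consign"), ("consign", "consigned"),
    ("consign", "consigning"), ("consign", "consignment"),
    ("consist", "consist"), ("consist", "consisted"),
    ("consist", "consistency"), ("consist", "consistent"),
    ("consist", "consistently"), ("consist", "consisting"),
    ("consist", "consists"),
    ("consol", "consolation"), ("consol", "consolations"),
]

# Ending classes. V follows step A (with "" added for the bare word);
# V2 and N are hypothetical, made up for this illustration.
CLASSES = {
    "V": {"", "ed", "s", "ing", "ings", "able", "ability", "abilities", "ment"},
    "V2": {"", "ed", "s", "ing", "ent", "ently", "ency"},
    "N": {"ation", "ations"},
}


def ending_sets(pairs):
    """Step A: collect the set of endings observed for each stem."""
    endings = defaultdict(set)
    for stem, word in pairs:
        if word.startswith(stem):  # assumption: stem is a prefix of the word
            endings[stem].add(word[len(stem):])
    return dict(endings)


def jaccard(a, b):
    """Overlap between two sets of endings, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(observed):
    """Step B: pick the class whose endings are nearest to the observed set."""
    return max(CLASSES, key=lambda code: jaccard(observed, CLASSES[code]))


# Step C: the dictionary maps each stem to its ending class code.
STEM_DICT = {stem: classify(ends) for stem, ends in ending_sets(PAIRS).items()}


def generate(stem):
    """Look up the stem's class and generate every form in that class."""
    return sorted(stem + ending for ending in CLASSES[STEM_DICT[stem]])


if __name__ == "__main__":
    for stem in STEM_DICT:
        print(stem, STEM_DICT[stem], generate(stem))
```

Because a stem is assigned to the nearest class rather than to its exact
set of endings, some generated forms will not be real words (for example
"consignability"). For expanding search queries this is usually harmless,
since such forms simply match nothing.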

The issue of ending generation has come up before in snowball-discuss. Type

    backwards stemmer generation

in the search box, and look at the top four emails.


