Steve,
Well, I am still reeling after the suggestion that -ive should not be
removed :-), which is indeed not a bad suggestion.
I'm less sure about the -sis -xis idea, although it is very interesting. It
is really a rule for handling Greek plurals, and English is unusual in
accepting many plural forms from other languages: beaux (French),
cognoscenti (Italian), cacti (Latin), hypotheses (Greek), seraphim (Hebrew).
(It is a phenomenon quite easy to explain historically, however.) It has
occurred to me that one might be able to work out the language of a word by
digram analysis - or something similar - and stem accordingly. So hypnotic
is "obviously" Greek, and stems to hypnos, chateaux is "obviously" French
and stems to chateau. Greek -sis endings are therefore part of a general
problem.
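As a rough illustration of the digram idea, here is a minimal Python sketch.
The tiny word lists, the function names and the overlap scoring are all
assumptions made up for illustration; a real classifier would need far larger
word lists and a proper statistical model.

```python
# Toy language guesser based on character digrams (adjacent letter pairs).
# The word lists below are illustrative assumptions, not real training data.
SAMPLES = {
    "greek": ["hypothesis", "analysis", "hypnosis", "synthesis"],
    "french": ["bureau", "plateau", "tableau", "gateaux"],
}


def digrams(word):
    """Return the set of adjacent letter pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


# One digram profile per language: the union of digrams seen in its samples.
PROFILES = {
    lang: set().union(*(digrams(w) for w in words))
    for lang, words in SAMPLES.items()
}


def guess_language(word):
    """Pick the language whose profile shares the most digrams with the word."""
    return max(PROFILES, key=lambda lang: len(digrams(word) & PROFILES[lang]))
```

With these lists, guess_language("chateaux") picks "french" and
guess_language("hypnotic") picks "greek"; a language-specific stemming rule
could then be chosen on that basis.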
Remember in any case that an English stemmer is going to regard -ses endings
as normal plural forms, and remove -s: abuses, bookcases, houses etc. The
Porter stemmer removes -es from longer words, so the problem reduces to
removing -is from analysis etc. The Lovins stemmer (which is more concerned
with "scientific" vocabulary) does that, but also respells final -yt as -ys
so that analysis, analyses, analytic conflate.
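To make that reduction concrete, here is a minimal Python sketch. It is a toy,
not the Porter or Lovins algorithm: the function name, the length threshold
and the exact rule order are assumptions chosen only to show how stripping
-es, -is and -ic and then respelling -yt as -ys brings the three forms
together.

```python
def toy_stem(word):
    """Toy suffix stripper (NOT the real Porter or Lovins rules)."""
    # Strip one of the endings -es, -is, -ic from longer words only.
    # The length threshold of 4 is an arbitrary assumption.
    if len(word) > 4:
        for suffix in ("es", "is", "ic"):
            if word.endswith(suffix):
                word = word[:-len(suffix)]
                break
    # Respell a final -yt as -ys, in the spirit of the Lovins respelling.
    if word.endswith("yt"):
        word = word[:-2] + "ys"
    return word
```

Here analysis, analyses and analytic all come out as "analys", and
hypothesis and hypotheses both come out as "hypothes".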
The question is, how important is it in practice? One should not be too
influenced by something like YAWL, which is more an aid to scrabble players
than a practical list of words for contemporary English. (Although if someone
put down "chaprassis" and claimed a triple word score I'd be most upset!) My
sample vocabularies only instance two successful conflations with a rule
like this: hypothesis/hypotheses and parenthesis/parentheses. I realise
there are more in the language as a whole (oasis/oases for example), but the
point is that a rule is hardly worth adding if it only affects one word in
20,000.
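A check of this kind is easy to run against any word list. The Python sketch
below counts the singular/plural pairs that a -sis/-xis rule would actually
conflate; the function name is hypothetical, and it assumes the vocabulary is
simply an iterable of lower-case words.

```python
def sis_ses_pairs(vocabulary):
    """List (singular, plural) pairs such as hypothesis/hypotheses
    where both forms occur in the vocabulary."""
    words = set(vocabulary)
    pairs = []
    for word in sorted(words):
        if word.endswith(("sis", "xis")):
            plural = word[:-2] + "es"  # hypothesis -> hypotheses, axis -> axes
            if plural in words:
                pairs.append((word, plural))
    return pairs
```

Dividing len(sis_ses_pairs(vocabulary)) by the size of the vocabulary gives
the proportion of words the rule would affect.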
The truth is that words like bases (as a plural of basis), hypnoses and
ellipses are not used very much. We tend to avoid forming plurals when the
plural is dubious, and use a different construction. Everyone says "CVs"
because they don't know the plural of "curriculum vitae", and people avoid
trying to pluralise words like chassis, chablis, cyclops, Mrs ...
(As a general feature of English, exotic plurals are declining. Hippos for
hippopotami, cactuses for cacti, eskimos for esquimaux etc. Americans say
syllabi, but that sounds strange in England. Dice has become the singular
form of what was once die. Perhaps one day the plural will be dices.
Ignorance of foreign languages must help here. News broadcasters use
paparazzi in the singular without thinking it strange.)
Martin