[Snowball-discuss] (no subject)

From: Martin Porter (martin_porter@softhome.net)
Date: Sat Aug 17 2002 - 10:51:50 BST


Enea,

I think there are two issues. First, -ito may be taken as a past participle
ending of a verb when it is simply part of the stem. abito and soprabito are
examples here, and these are what the introductory paper at

    http://snowball.sourceforge.net/texts/introduction.html

calls mis-stemming errors. Second, -ito really is a past participle ending,
but the past participle form has come to acquire a meaning distinct from the
usual verb form. These are called over-stemming errors.

When developing the Italian stemmer I was very conscious of the problem of
over-stemming with past participles. For example 'bandire' is 'to banish',
and the p.p. form 'bandito' can mean 'banished'. But just as often it has the
meaning of 'outlaw' - someone who has been banished. We have the same word
in English of course: 'bandit'. The p.p. has come to acquire a precise
meaning of its own. This happens in all the Romance languages, but is
especially noteworthy in Italian.

One of your examples is of this type. 'marito' is from 'marire', although I
suppose 'sposare/sposarsi' is the usual verb nowadays.

'prestito' is not the p.p. of 'prestare', but the two forms are connected in
an obvious way. Less obviously, 'spirare' and 'spirito' are connected
(because 'spirit' used to be thought of as 'breath'). Your other examples,
'solito' etc., are simple mis-stemmings.
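
To make the two kinds of error concrete, here is a minimal Python sketch.
It is not the Snowball Italian stemmer: toy_stem and its suffix list are
assumptions made up for illustration, and the rule is deliberately naive.

```python
# Toy suffix stripper, NOT the real Snowball Italian algorithm.
# TOY_SUFFIXES is an assumed, simplified list for illustration only.
TOY_SUFFIXES = ("ire", "ito", "ita", "iti", "ite")


def toy_stem(word, suffixes=TOY_SUFFIXES, min_stem=2):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Over-stemming: a genuine p.p. with its own meaning is conflated with the verb.
over = {w: toy_stem(w) for w in ("bandire", "bandito")}

# Mis-stemming: -ito is part of the stem, yet it is stripped anyway.
mis = {w: toy_stem(w) for w in ("soprabito", "solito", "subito")}

print(over)
print(mis)
```

In this toy, 'bandire' and 'bandito' end up with the same stem, while the
nouns and adverbs lose an -ito that was never a verb ending.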

Does it matter? Not necessarily. See the discussion in the introductory
paper. The test must be the performance of an IR system in which the
stemming is utilised.

On the other hand, removal of -itV endings, where V is o, i, e, or a, is not
difficult to effect in the Snowball script.
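
One possible reading of this is that the -itV endings are dropped from the
list of suffixes the stemmer strips. The sketch below illustrates that
reading in Python rather than in Snowball. The suffix lists, strip_suffix,
and the final-vowel fallback are all hypothetical and far simpler than the
real script.

```python
# Hypothetical suffix list; the real Snowball script has many more endings
# and restricts removal to particular regions of the word.
VERB_SUFFIXES = ("ire", "ato", "ito", "iti", "ite", "ita")
ITV_ENDINGS = tuple("it" + v for v in "oiea")  # ito, iti, ite, ita
WITHOUT_ITV = tuple(s for s in VERB_SUFFIXES if s not in ITV_ENDINGS)


def strip_suffix(word, suffixes, min_stem=2):
    """Strip one listed suffix; otherwise fall back to one final vowel."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    # Assumed fallback: drop a single final vowel when no suffix matched.
    if word.endswith(("a", "e", "i", "o")) and len(word) - 1 >= min_stem:
        return word[:-1]
    return word


for word in ("marito", "spirito", "prestito", "bandito", "bandire"):
    with_itv = strip_suffix(word, VERB_SUFFIXES)
    without_itv = strip_suffix(word, WITHOUT_ITV)
    print(f"{word:10} {with_itv:8} {without_itv}")
```

In this toy, 'marito' becomes 'marit' once the -itV endings are gone, but
'bandito' and 'bandire' are no longer conflated. Whether that trade is
worthwhile is exactly the IR performance question.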

I will bear your observations in mind for future developments. (I wonder what
the actual context is in which you have been looking at the Italian stemmer?)

Martin

>On Tue, 2002-08-06 at 05:08, Enea Mansutti wrote:
>> Sorry if I bother. I've been trying the italian stemmer of
>> snowball and I think that for some words the stemming is
>> wrong.
>> The word marito means husband but is treated as a verb and
>> it is stemmed incorrectly to mar (it should be marit).
>> The same applies to the following (all ending in ito):
>> insolito means unusual
>> vomito means vomitus
>> spirito means spirit
>> subito means now
>> soprabito means coat
>> solito means same
>> prestito means loan
>> and so on...
>> Do you have any suggestions?
>> Thank you for your patience,
>



