The stemming algorithm
Italian can include the following accented forms:
á é í ó ú à è ì ò ù
First, replace all acute accents by grave accents. Then, as in French, put u after
q into upper case, and likewise u and i when they stand between two vowels.
(See note on vowel marking.)
The vowels are then
a e i o u à è ì ò ù
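The preparation described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the input is a single word in precomposed (NFC) Unicode, it lower-cases the word so that upper case is free to act as the marker, and the names `prelude`, `ACUTE_TO_GRAVE` and `VOWELS` are chosen for this sketch only.

```python
# Acute vowels map to their grave counterparts (á -> à, and so on).
ACUTE_TO_GRAVE = str.maketrans("áéíóú", "àèìòù")
# Only lower-case letters are vowels, so a marked U or I counts as a non-vowel.
VOWELS = set("aeiouàèìòù")


def prelude(word: str) -> str:
    """Normalise accents and mark u/i that behave as consonants (sketch)."""
    # Assumption: the stemmer works on lower-case input.
    chars = list(word.lower().translate(ACUTE_TO_GRAVE))
    # u after q becomes U.
    for i in range(1, len(chars)):
        if chars[i] == "u" and chars[i - 1] == "q":
            chars[i] = "U"
    # u or i between two vowels becomes U or I. Scanning left to right means a
    # letter that was just marked no longer counts as the left-hand vowel.
    for i in range(1, len(chars) - 1):
        if chars[i] in "ui" and chars[i - 1] in VOWELS and chars[i + 1] in VOWELS:
            chars[i] = chars[i].upper()
    return "".join(chars)
```

For example, `prelude("quando")` gives `"qUando"` and `prelude("aiuola")` gives `"aIuola"`: once the i is marked it is no longer a vowel, so the following u is not between two vowels.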
R1, R2 (see the note on R1 and R2) and RV have the same definitions as in the
Spanish stemmer.
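Under the Spanish-stemmer definitions, each region can be represented by the index at which it starts. The following Python sketch is an illustration under stated assumptions: the helper names `rv_start`, `r_start` and `regions` are hypothetical, a region that cannot be found is represented by `len(w)` (an empty region), and upper-case I and U count as non-vowels because they are not in the vowel set.

```python
VOWELS = set("aeiouàèìòù")


def rv_start(w: str) -> int:
    """Start index of RV (Spanish-stemmer definition); len(w) if RV is empty."""
    n = len(w)
    if n < 2:
        return n
    if w[1] not in VOWELS:  # second letter consonant: after the next vowel
        return next((i + 1 for i in range(2, n) if w[i] in VOWELS), n)
    if w[0] in VOWELS:  # first two letters vowels: after the next consonant
        return next((i + 1 for i in range(2, n) if w[i] not in VOWELS), n)
    return min(3, n)  # consonant then vowel: after the third letter


def r_start(w: str, start: int = 0) -> int:
    """Index just after the first non-vowel that follows a vowel at index >= start."""
    return next((i + 1 for i in range(start + 1, len(w))
                 if w[i] not in VOWELS and w[i - 1] in VOWELS), len(w))


def regions(w: str) -> tuple:
    """Return the start indices of RV, R1 and R2 (R2 is searched for inside R1)."""
    r1 = r_start(w)
    return rv_start(w), r1, r_start(w, r1)
```

With these definitions RV is "ho" in macho, "va" in oliva and "bajo" in trabajo. For guardandogli the sketch returns (3, 4, 7), that is RV = "rdandogli", R1 = "dandogli" and R2 = "dogli".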
Always do steps 0 and 1.
Step 0: Attached pronoun
Search for the longest among the following suffixes
ci gli la le li lo mi ne si ti vi
sene gliela gliele glieli glielo gliene
mela mele meli melo mene
tela tele teli telo tene
cela cele celi celo cene
vela vele veli velo vene
following one of
(a) ando endo
(b) ar er ir
in RV (that is, the ando, endo, ar, er or ir must itself lie in RV). In case (a)
the pronoun suffix is deleted; in case (b) it is replaced by e
(guardandogli -> guardando, accomodarci -> accomodare).
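A Python sketch of step 0, assuming the word has already been through the accent and vowel-marking preparation. The tables copy the suffix lists of this step; `step0`, `PRONOUNS`, `ENDINGS` and the `rv_start` helper (the Spanish-stemmer RV definition) are names invented for this sketch.

```python
VOWELS = set("aeiouàèìòù")

# Attached pronoun suffixes, longest first so the first match is the longest.
PRONOUNS = sorted(
    ("ci gli la le li lo mi ne si ti vi "
     "sene gliela gliele glieli glielo gliene "
     "mela mele meli melo mene tela tele teli telo tene "
     "cela cele celi celo cene vela vele veli velo vene").split(),
    key=len,
    reverse=True,
)
# Ending that must precede the pronoun -> text that replaces the pronoun.
# (a) ando/endo: the pronoun is deleted; (b) ar/er/ir: it becomes "e".
ENDINGS = {"ando": "", "endo": "", "ar": "e", "er": "e", "ir": "e"}


def rv_start(w: str) -> int:
    """Start index of RV (Spanish-stemmer definition); len(w) if RV is empty."""
    n = len(w)
    if n < 2:
        return n
    if w[1] not in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] in VOWELS), n)
    if w[0] in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] not in VOWELS), n)
    return min(3, n)


def step0(w: str) -> str:
    """Remove an attached pronoun (or replace it with "e") when the rule allows."""
    suffix = next((p for p in PRONOUNS if w.endswith(p)), None)
    if suffix is None:
        return w
    rv, stem = rv_start(w), w[: -len(suffix)]
    for ending, replacement in ENDINGS.items():
        # The preceding ending must itself lie inside RV.
        if stem.endswith(ending) and len(stem) - len(ending) >= rv:
            return stem + replacement
    return w  # shorter pronouns are not retried when the longest one fails
```

It reproduces both examples of this step. Because the longest pronoun is taken first, portandoglielo loses glielo rather than just lo, giving portando; in dirlo the ir starts before RV, so the word is left unchanged.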
Step 1: Standard suffix removal
Search for the longest among the following suffixes, and perform the
action indicated.
- anza anze ico ici ica ice iche ichi ismo ismi abile abili ibile ibili
  ista iste isti istà istè istì oso osi osa ose mente
  atrice atrici ante anti
    - delete if in R2
- azione azioni atore atori
    - delete if in R2
    - if preceded by ic, delete if in R2
- logia logie
    - replace with log if in R2
- uzione uzioni usione usioni
    - replace with u if in R2
- enza enze
    - replace with ente if in R2
- amento amenti imento imenti
    - delete if in RV
- amente
    - delete if in R1
    - if preceded by iv, delete if in R2 (and if further preceded by at,
      delete if in R2), otherwise,
    - if preceded by os, ic or abil, delete if in R2
- ità
    - delete if in R2
    - if preceded by abil, ic or iv, delete if in R2
- ivo ivi iva ive
    - delete if in R2
    - if preceded by at, delete if in R2 (and if further preceded by ic,
      delete if in R2)
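A Python sketch of step 1 under several assumptions. The group labels (`plain`, `azione`, ...) and the helpers `cut`, `cut_first`, `rv_start` and `r_start` are invented for illustration. The regions are computed from the word passed in, whereas a Snowball implementation marks them once before any suffix is removed; the sketch assumes both give the same positions. The function returns a flag as well as the word, so a caller can decide whether step 2 should run.

```python
VOWELS = set("aeiouàèìòù")

# Rule group for every step-1 suffix (group names are illustrative labels).
GROUPS = {
    "plain": "anza anze ico ici ica ice iche ichi ismo ismi abile abili ibile "
             "ibili ista iste isti istà istè istì oso osi osa ose mente "
             "atrice atrici ante anti",
    "azione": "azione azioni atore atori",
    "logia": "logia logie",
    "uzione": "uzione uzioni usione usioni",
    "enza": "enza enze",
    "amento": "amento amenti imento imenti",
    "amente": "amente",
    "ita": "ità",
    "ivo": "ivo ivi iva ive",
}
SUFFIXES = {s: g for g, words in GROUPS.items() for s in words.split()}
REPLACEMENT = {"logia": "log", "uzione": "u", "enza": "ente"}


def rv_start(w: str) -> int:
    """Start index of RV (Spanish-stemmer definition); len(w) if RV is empty."""
    n = len(w)
    if n < 2:
        return n
    if w[1] not in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] in VOWELS), n)
    if w[0] in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] not in VOWELS), n)
    return min(3, n)


def r_start(w: str, start: int = 0) -> int:
    """Index just after the first non-vowel that follows a vowel at index >= start."""
    return next((i + 1 for i in range(start + 1, len(w))
                 if w[i] not in VOWELS and w[i - 1] in VOWELS), len(w))


def cut(w: str, suffix: str, region: int) -> str:
    """Delete suffix if w ends with it and it starts inside the region."""
    if w.endswith(suffix) and len(w) - len(suffix) >= region:
        return w[: -len(suffix)]
    return w


def cut_first(w: str, options: tuple, region: int) -> str:
    """Apply cut() for the one option that w ends with (options never overlap)."""
    for s in options:
        if w.endswith(s):
            return cut(w, s, region)
    return w


def step1(w: str) -> tuple:
    """Return (new_word, removed), where removed tells whether step 1 fired."""
    rv, r1 = rv_start(w), r_start(w)
    r2 = r_start(w, r1)
    found = [s for s in SUFFIXES if w.endswith(s)]
    if not found:
        return w, False
    suffix = max(found, key=len)
    group, start = SUFFIXES[suffix], len(w) - len(suffix)
    # Only the longest suffix is tested; if it is outside its region, stop.
    if start < {"amento": rv, "amente": r1}.get(group, r2):
        return w, False
    w = w[:start] + REPLACEMENT.get(group, "")
    if group == "azione":
        w = cut(w, "ic", r2)
    elif group == "amente" and w.endswith("iv"):
        shorter = cut(w, "iv", r2)
        # "at" is only considered once "iv" has actually been deleted.
        w = cut(shorter, "at", r2) if shorter != w else w
    elif group == "amente":
        w = cut_first(w, ("os", "ic", "abil"), r2)
    elif group == "ita":
        w = cut_first(w, ("abil", "ic", "iv"), r2)
    elif group == "ivo":
        shorter = cut(w, "at", r2)
        # "ic" is only considered once "at" has actually been deleted.
        w = cut(shorter, "ic", r2) if shorter != w else w
    return w, True
```

For example, gentilmente becomes gentil, metodologia becomes metodolog, and relativamente becomes relat (amente is in R1 and iv is in R2, but at is not), while biologia is left alone because logia does not start in R2.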
Do step 2 if no ending was removed by step 1.
Step 2: Verb suffixes
Search for the longest among the following suffixes in RV, and if found,
delete.
ammo ando ano are arono
asse assero assi assimo ata ate
ati ato ava avamo avano avate avi avo emmo
enda ende endi endo erà erai eranno ere
erebbe erebbero erei eremmo eremo ereste
eresti erete erò erono essero ete eva evamo
evano evate evi evo Yamo iamo immo irà
irai iranno ire irebbe irebbero irei iremmo
iremo ireste iresti irete irò irono isca
iscano isce isci isco iscono issero ita ite
iti ito iva ivamo ivano ivate ivi ivo
ono uta ute uti uto ar ir
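A Python sketch of step 2. It reads "in RV" as limiting the search itself: the candidates are the listed suffixes that fit entirely inside RV, and the longest of those is deleted. That reading, the names `step2`, `VERB_SUFFIXES` and `rv_start`, and the use of the Spanish-stemmer RV definition are assumptions of this sketch. The suffix list is copied unchanged from above.

```python
VOWELS = set("aeiouàèìòù")

# Verb suffixes, longest first so the first suffix that fits is the longest.
VERB_SUFFIXES = sorted(
    """ammo ando ano are arono asse assero assi assimo ata ate
    ati ato ava avamo avano avate avi avo emmo
    enda ende endi endo erà erai eranno ere
    erebbe erebbero erei eremmo eremo ereste
    eresti erete erò erono essero ete eva evamo
    evano evate evi evo Yamo iamo immo irà
    irai iranno ire irebbe irebbero irei iremmo
    iremo ireste iresti irete irò irono isca
    iscano isce isci isco iscono issero ita ite
    iti ito iva ivamo ivano ivate ivi ivo
    ono uta ute uti uto ar ir""".split(),
    key=len,
    reverse=True,
)


def rv_start(w: str) -> int:
    """Start index of RV (Spanish-stemmer definition); len(w) if RV is empty."""
    n = len(w)
    if n < 2:
        return n
    if w[1] not in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] in VOWELS), n)
    if w[0] in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] not in VOWELS), n)
    return min(3, n)


def step2(w: str) -> str:
    """Delete the longest listed verb suffix that lies wholly inside RV."""
    rv = rv_start(w)
    for suffix in VERB_SUFFIXES:
        # A longer suffix reaching outside RV does not block a shorter one.
        if w.endswith(suffix) and len(w) - len(suffix) >= rv:
            return w[: -len(suffix)]
    return w
```

With this reading, parlavano becomes parl (avano) and finisco becomes fin (isco), while sono is unchanged because ono would start before RV.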
Always do steps 3a and 3b.
Step 3a
Delete a final a, e, i, o, à, è, ì or ò if it is in RV, and a
preceding i if it is in RV (crocchi -> crocch, crocchio -> crocch)
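A Python sketch of step 3a, with hypothetical `step3a` and `rv_start` helpers (the latter implementing the Spanish-stemmer RV definition). Only a lower-case i is removed in the second part, so an i that was marked as I during preparation survives.

```python
VOWELS = set("aeiouàèìòù")
FINAL_VOWELS = set("aeioàèìò")  # u and ù are not in this step's list


def rv_start(w: str) -> int:
    """Start index of RV (Spanish-stemmer definition); len(w) if RV is empty."""
    n = len(w)
    if n < 2:
        return n
    if w[1] not in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] in VOWELS), n)
    if w[0] in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] not in VOWELS), n)
    return min(3, n)


def step3a(w: str) -> str:
    """Delete a final vowel in RV, then a preceding i if that is in RV too."""
    rv = rv_start(w)
    if w and w[-1] in FINAL_VOWELS and len(w) - 1 >= rv:
        w = w[:-1]
        # The i is only considered after the final vowel has been deleted.
        if w.endswith("i") and len(w) - 1 >= rv:
            w = w[:-1]
    return w
```

This gives crocchi -> crocch and crocchio -> crocch, as in the examples above.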
Step 3b
Replace final ch (or gh) with c (or g) if in RV (crocch -> crocc)
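Step 3b amounts to dropping the h when the whole ch or gh lies in RV. A minimal Python sketch, where `step3b` and `rv_start` (Spanish-stemmer RV definition) are hypothetical names:

```python
VOWELS = set("aeiouàèìòù")


def rv_start(w: str) -> int:
    """Start index of RV (Spanish-stemmer definition); len(w) if RV is empty."""
    n = len(w)
    if n < 2:
        return n
    if w[1] not in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] in VOWELS), n)
    if w[0] in VOWELS:
        return next((i + 1 for i in range(2, n) if w[i] not in VOWELS), n)
    return min(3, n)


def step3b(w: str) -> str:
    """Replace a final ch or gh lying in RV with c or g by dropping the h."""
    rv = rv_start(w)
    if w.endswith(("ch", "gh")) and len(w) - 2 >= rv:
        return w[:-1]
    return w
```

For example, crocch becomes crocc and luogh becomes luog.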
Finally, turn I and U back into lower case.
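To show how the steps fit together, here is a sketch of a driver in Python. The step functions are passed in as parameters (hypothetical callables, one per step, with step 1 returning the new word plus a flag saying whether it removed an ending), so only the control flow and the final lower-casing of I and U are shown. The names `stem` and `postlude` are assumptions of this sketch.

```python
from typing import Callable, Tuple


def postlude(word: str) -> str:
    """Turn the marker letters I and U back into lower case."""
    return word.replace("I", "i").replace("U", "u")


def stem(
    word: str,
    prelude: Callable[[str], str],
    step0: Callable[[str], str],
    step1: Callable[[str], Tuple[str, bool]],
    step2: Callable[[str], str],
    step3a: Callable[[str], str],
    step3b: Callable[[str], str],
) -> str:
    """Run the steps in the order the algorithm prescribes."""
    w = step0(prelude(word))  # step 0 always runs
    w, removed = step1(w)     # step 1 always runs
    if not removed:           # step 2 only if step 1 removed nothing
        w = step2(w)
    w = step3b(step3a(w))     # steps 3a and 3b always run
    return postlude(w)
```

Passing stub functions that append a marker character makes the order visible: with a step 1 that reports no removal, step 2 runs between step 1 and step 3a; otherwise it is skipped.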