[Snowball-discuss] Suggestion of Improvement

From: Andres Hohendahl (andresh@pbox.com.ar)
Date: Fri Jun 23 2006 - 17:45:15 BST


Hi,

As an NLP researcher at the Engineering University of Buenos Aires, I have
been working on the basic ideas behind some popular spelling and stemming
algorithms, used mainly for spell-checking in the free/GNU projects ASPELL
and ISPELL, which are linked to OpenOffice and other free word-processing
software.

From them I obtained a useful plain-text description (*.aff and *.dic
files) of the stemming and morphological transformations for many
languages, written all over the world and maintained by many NLP groups.

From there, I saw a similarity with your project.

I programmed a library for a small company, in C# under .NET, capable of
efficiently applying cumulative stemming in both direct and reverse form.
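
As a rough illustration of what applying the same rule in both directions
could look like, here is a small Python sketch (not the C# library mentioned
above). The names SuffixRule, stem_all and inflect_all are invented for this
example, and treating "direct" as stemming and "reverse" as inflecting is an
assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical rule: replace the ending `strip` of a base form with `add`."""
    strip: str
    add: str

    def inflect(self, base):
        # Reverse direction: base form -> inflected form, or None if not applicable.
        if not base.endswith(self.strip):
            return None
        return base[: len(base) - len(self.strip)] + self.add

    def stem(self, word):
        # Direct direction: inflected form -> base form, or None if not applicable.
        if not word.endswith(self.add):
            return None
        return word[: len(word) - len(self.add)] + self.strip


def stem_all(word, rules):
    # Cumulative stemming: undo each applicable rule, outermost ending first.
    for rule in rules:
        reduced = rule.stem(word)
        if reduced is not None:
            word = reduced
    return word


def inflect_all(base, rules):
    # Cumulative inflection: re-apply the same rules in the opposite order.
    for rule in reversed(rules):
        inflected = rule.inflect(base)
        if inflected is not None:
            base = inflected
    return base


# Toy Spanish rules: plural "-s", then diminutive "-o" -> "-ito".
rules = [SuffixRule("", "s"), SuffixRule("o", "ito")]
print(stem_all("gatitos", rules))   # gato
print(inflect_all("gato", rules))   # gatitos
```

Because each rule stores both the ending it strips and the ending it adds,
one rule table serves both directions; only the order of application is
reversed.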

BTW, stemming deals with the natural grammatical changes in words
(pluralization, gender change, adjectivization, nominalization, superlative
and diminutive forms, etc.).

My algorithm keeps track of these changes while stemming or inflecting, so
a good side effect is that you get the grammatical POS of the stemmed words.
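
To make the side effect concrete, here is a toy Python sketch that records
which grammatical change each stripped ending corresponds to. The rule
table, the tag names and the function stem_with_tags are all assumptions
made for illustration; they are not taken from the library described above
or from any real *.aff file.

```python
# Hypothetical (suffix, replacement, tag) rules for a few Spanish endings.
RULES = [
    ("s", "", "plural"),
    ("ito", "o", "diminutive"),
    ("ísimo", "o", "superlative"),
]


def stem_with_tags(word, rules=RULES):
    """Strip endings repeatedly and return (stem, tags of the changes undone)."""
    tags = []
    changed = True
    while changed:
        changed = False
        for suffix, replacement, tag in rules:
            # Require a remainder of at least two letters (arbitrary toy guard).
            if word.endswith(suffix) and len(word) > len(suffix) + 1:
                word = word[: len(word) - len(suffix)] + replacement
                tags.append(tag)
                changed = True
                break  # restart from the first rule on the shortened word
    return word, tags


print(stem_with_tags("gatitos"))    # ('gato', ['plural', 'diminutive'])
print(stem_with_tags("buenísimo"))  # ('bueno', ['superlative'])
```

The collected tags could then be mapped to part-of-speech hints (for
example, a superlative ending suggests an adjective). A table this small
will of course mis-stem many real words, such as "gas" or "lunes".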

Wouldn’t it be interesting to merge them all together?

My modest algorithm is very efficient and does not need to be compiled: it
builds in-memory structures to handle endings (suffixes) and prefixes using
specialized tries. This is as efficient as compiling, or even more so,
because you save a step (no compilation), so your rules can easily be
updated at runtime (by reloading them, of course).
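
A minimal Python sketch of the idea, assuming the simplest possible design:
a trie keyed on the reversed suffix, rebuilt in memory whenever the rules
are reloaded. The class SuffixTrie and its methods are hypothetical names,
and the real library's structures may well differ.

```python
class SuffixTrie:
    """Toy suffix trie: rules are (suffix, replacement) pairs."""

    def __init__(self, rules=()):
        self.load(rules)

    def load(self, rules):
        # (Re)build the trie from scratch; can be called again at runtime.
        root = {}
        for suffix, replacement in rules:
            node = root
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node[None] = (suffix, replacement)  # None key marks a complete suffix
        self.root = root

    def longest_match(self, word):
        # Walk the word from its last letter, remembering the longest rule seen.
        node, best = self.root, None
        for ch in reversed(word):
            node = node.get(ch)
            if node is None:
                break
            if None in node:
                best = node[None]
        return best

    def stem(self, word):
        match = self.longest_match(word)
        if match is None:
            return word
        suffix, replacement = match
        return word[: len(word) - len(suffix)] + replacement


trie = SuffixTrie([("s", ""), ("es", "")])
print(trie.stem("flores"))   # flor

trie.load([("ito", "o")])    # swap in new rules without recompiling anything
print(trie.stem("gatito"))   # gato
```

A lookup costs at most one dictionary step per letter of the ending, no
matter how many rules are loaded, and a reload is just a rebuild of the
in-memory structure.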

Is there a good description of the Snowball meta-language that I could work
from, so that we might build something new and good that bridges both
worlds?

Greetings from Argentina

Andrés Hohendahl
