My name is Olga Beregovaya, and I have been working in
the fields of multilingual computing/natural language
processing for the past 12 years. I am now starting to
work on a mainly open-source-based non-profit "pet"
educational project for Web-based foreign language
As a part of the project we need to look at the best
implementation for a Stemming component that we could
use as a part of our application. Someone on the MT
list recommended your project, which I found truly
impressive. As a result of playing with your demo, I
generated a short list of questions related to the
1. How can we add a list of "exceptional cases" to the
Stemmer - for example, to ignore suffixes in adverbs
and postfixes in adjectives in Germanic languages
(English, German etc.)? Has this or a similar
configuration been addressed previously? I am certain
there has been a need for this.
2. As the output of the stemming process we ideally
want to get "raw dictionary" data - so "went" should be
stemmed to "go", while "maliciously" should be kept as
"maliciously" as an exception. What approach would
you recommend that we take to address these "irregular"
4. Is there a Java API available for Snowball?
5. Could you perhaps point me to some other publicly
available stemmers I could look at and play with?
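
To make questions 1 and 2 more concrete, here is a minimal sketch of the behaviour we have in mind. It is written in Python purely for illustration and does not use Snowball itself: the function naive_suffix_stem is a hypothetical stand-in for a real stemmer, and the class ExceptionAwareStemmer, the "irregular" table and the "protected" set are names invented for this example only.

```python
from typing import Callable, Dict, Set


def naive_suffix_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ingly", "ly", "ing", "ed", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class ExceptionAwareStemmer:
    """Wraps any stemming function with two kinds of exception lists."""

    def __init__(
        self,
        stem: Callable[[str], str],
        irregular: Dict[str, str],
        protected: Set[str],
    ) -> None:
        self._stem = stem
        self._irregular = irregular  # irregular form -> dictionary form
        self._protected = protected  # words that must be left unchanged

    def stem(self, word: str) -> str:
        key = word.lower()
        if key in self._protected:
            return key  # e.g. keep "maliciously" as it is
        if key in self._irregular:
            return self._irregular[key]  # e.g. map "went" to "go"
        return self._stem(key)  # otherwise fall back to the wrapped stemmer


# Assumed example data, taken from the two cases mentioned in question 2.
stemmer = ExceptionAwareStemmer(
    stem=naive_suffix_stem,
    irregular={"went": "go"},
    protected={"maliciously"},
)

for w in ("went", "maliciously", "walked"):
    print(w, "->", stemmer.stem(w))
```

The exception lists are checked before the stemmer is called, so the wrapped algorithm never sees the listed words. Whether something like this belongs inside the stemmer or in a layer around it is exactly what we are unsure about.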
Thanks in advance for your help with our search, and I
will be more than happy to post the results of my
findings/observations and progress of our project once
it all takes a more definite shape.
This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST