Re: [Snowball-discuss] Japanese stemmer?

From: Micah Bly (micah.j.bly@medtronic.com)
Date: Fri Jan 26 2007 - 16:15:16 GMT


Martin,

I did that search a few months back, and doing it again today, I
think I made a mistake. I found a page like this:
http://lists.tartarus.org/pipermail/xapian-discuss/2005-April/000832.html

That page is on tartarus.org, but it comes from the xapian-discuss
list, so it isn't Snowball-related, at least not directly.

As far as Japanese stemming goes, I can contribute linguistic
knowledge and pseudocode, but I don't have any experience writing
stemmers, and I don't 'speak' Snowball. Would anyone else out there
be interested in collaborating on a stemmer for Japanese?

In other words, I could probably brute-force one myself, but it would
be neither rational nor efficient.

Micah Bly

On Jan 26, 2007, at 4:04 AM, Martin Porter wrote:

>
> Micah,
>
> I don't know of particular work in this area, but am broadly aware of
> the problems, which are (a) segmentation of text into words and (b)
> word normalisation, of which something like stemming forms a part. The
> place to go for solutions is no doubt Japan itself. There are commercial
> solutions in the West though, with proprietary software from companies
> like Inxight and Teragram. Among all the major languages, Japanese
> presents the worst problems.
>
> I don't believe the Snowball site says anywhere that stemming doesn't
> matter for Japanese. Can you point to where you found this?
>
> Martin
>
>> Does anyone know of any work being done on a Japanese stemmer? I
>> searched around this site, found a reference that said stemming
>> didn't matter for Japanese (err, ah...), but that was about it.
>>
>> I'm not even sure where to go to look for rules on stemming Japanese.
>>
>> Micah Bly