Re: [Snowball-discuss] Unicode version of snowball

From: xiao shibin (xiao.shibin@trs.com.cn)
Date: Mon May 10 2004 - 12:52:02 BST


Hi Martin,

With your help, I can now compile the stemmer to process UCS-2 Unicode.
However, my Russian text is encoded in UTF-8, and I would rather not convert the UTF-8 data to UCS-2. Could you tell me how to modify the stemmer?
Or is there a ready-made version of Snowball that supports UTF-8?

Searching from the Snowball front page, I found "cyrillic letters in utf-8" with the definitions below. Do I need to make any other modifications?

stringdef a decimal '45264'
stringdef b decimal '45520'
stringdef v decimal '45776'
stringdef g decimal '46032'
stringdef d decimal '46288'
stringdef e decimal '46544'
stringdef zh decimal '46800'
stringdef z decimal '47056'
stringdef i decimal '47312'
stringdef i` decimal '47568'
stringdef k decimal '47824'
stringdef l decimal '48080'
stringdef m decimal '48336'
stringdef n decimal '48592'
stringdef o decimal '48848'
stringdef p decimal '49104'
stringdef r decimal '32977'
stringdef s decimal '33233'
stringdef t decimal '33489'
stringdef u decimal '33745'
stringdef f decimal '34001'
stringdef kh decimal '34257'
stringdef ts decimal '34513'
stringdef ch decimal '34769'
stringdef sh decimal '35025'
stringdef shch decimal '35281'
stringdef " decimal '36049'
stringdef y decimal '35793'
stringdef ' decimal '35537'
stringdef e` decimal '36305'
stringdef iu decimal '36561'
stringdef ia decimal '36817'
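
As a side note on where these numbers come from: each decimal value above appears to be the two UTF-8 bytes of the Cyrillic letter packed into one 16-bit value, with the first byte in the low half (for example, U+0430 is D0 B0 in UTF-8, and 0xB0D0 = 45264). The following is only a small sketch in Python to check that reading. The helper name utf8_pair_as_decimal is made up for illustration and is not part of Snowball, and whether the stemmer handles such values correctly is a separate question that this does not answer.

```python
def utf8_pair_as_decimal(ch: str) -> int:
    """Pack the two UTF-8 bytes of ch into one 16-bit value, first byte low.

    Hypothetical helper for checking the table; not a Snowball function.
    """
    data = ch.encode("utf-8")
    if len(data) != 2:
        raise ValueError(f"{ch!r} is not a two-byte UTF-8 sequence")
    return int.from_bytes(data, "little")


# A few Snowball names from the table, paired with the Unicode code points
# that their decimal values decode to.
samples = [("a", "\u0430"), ("p", "\u043f"), ("r", "\u0440"), ("ia", "\u044f")]

for name, ch in samples:
    print(f"stringdef {name} decimal '{utf8_pair_as_decimal(ch)}'")
```

The four lines printed should match the corresponding entries in the table, which is why the values jump from the 45264..49104 range (lead byte D0) to the 32977..36817 range (lead byte D1) between p and r.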

Thanks for your help.

xiao shibin

----- Original Message -----
From: "Martin Porter" <martin.porter@grapeshot.co.uk>
To: "xiao shibin" <xiao.shibin@trs.com.cn>; <snowball-discuss@lists.tartarus.org>
Sent: Sunday, May 09, 2004 7:06 PM
Subject: Re: [Snowball-discuss] Unicode version of snowball


> At 13:44 09/05/2004 +0800, xiao shibin wrote:
> >>May 2002 - Unicode support added
> >
> >where can I download the unicode version?
> >
> >thanks,
> >
> >xiao shib
>
> Just download the whole thing and use the -w[idechars] option when
> compiling. If you put "unicode" in the snowball front-page search box you
> can see the emails that were passed around when 16 bit character support was
> being added, which provides useful background.
>
> Martin


