Re: [Snowball-discuss] a simple algorithm problem

From: James Aylett (james@tartarus.org)
Date: Thu Jan 06 2005 - 09:29:07 GMT


On Thu, Jan 06, 2005 at 09:12:34AM +0000, Martin Porter wrote:

> So one idea is to declare 'utf8' in the Snowball script, allowing character
> defs in the range 0-64K, as in the 2-byte character version. Characters
> could be written with their Unicode values.

Presumably this still restricts Snowball to code points in the BMP? Or
does it just restrict it to recognising and doing things with
characters at code points in the BMP, passing through any others?
There's not a huge amount outside it yet, so this may not matter at
all.
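The BMP question comes down to how UTF-8 encodes code points on either side of U+FFFF. Here is a minimal sketch in Python; it is an illustration, not anything taken from Snowball. Code points in the BMP (U+0000 to U+FFFF) need at most three UTF-8 bytes, and anything above needs four. A compiler that only accepts character defs in 0-64K could therefore still recognise a four-byte sequence and copy it through unchanged. The function names are hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (RFC 3629 layout)."""
    # Reject values outside Unicode and the UTF-16 surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:        # 1 byte: ASCII
        return bytes([cp])
    if cp < 0x800:       # 2 bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: rest of the BMP
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: supplementary planes, outside the BMP
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def in_bmp(cp: int) -> bool:
    """True if the code point lies in the Basic Multilingual Plane."""
    return cp <= 0xFFFF


if __name__ == "__main__":
    # 'A', e-acute, euro sign, and a Gothic letter from plane 1
    for cp in (0x41, 0xE9, 0x20AC, 0x10348):
        enc = utf8_encode(cp)
        print(f"U+{cp:04X} in_bmp={in_bmp(cp)} utf8={enc.hex()} len={len(enc)}")
```

Each BMP example prints a length of 3 or less. The plane-1 character is the only one that needs four bytes.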

> and encoded in utf-8 form in strings.

What's the character encoding of Snowball scripts at the moment? It
isn't touched upon in the manual, so I'm guessing at present it's
expected to be ASCII or similar.
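One way to see the distinction is to classify a script file's raw bytes as plain ASCII, valid UTF-8, or neither. The Python sketch below assumes the script has been read in binary mode. The function name is hypothetical, and the sketch says nothing about what the Snowball compiler actually accepts.

```python
def classify_script_encoding(data: bytes) -> str:
    """Classify raw script bytes as 'ascii', 'utf-8' or 'other'."""
    # Pure 7-bit ASCII is also valid UTF-8, so check it first.
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # For example, a lone Latin-1 byte such as 0xE9
        return "other"
    return "utf-8"


if __name__ == "__main__":
    import sys
    # Usage: python classify.py some_script.sbl  (file name is illustrative)
    with open(sys.argv[1], "rb") as f:
        print(classify_script_encoding(f.read()))
```

A script that classifies as "ascii" reads the same whether the compiler assumes ASCII or UTF-8. Only the other two cases depend on the answer to the question.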

Cheers,
James

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james@tartarus.org                               uncertaintydivision.org


