[Snowball-discuss] UTF-8

From: Vineet Gupta (vineet@stratify.com)
Date: Fri Feb 22 2002 - 19:08:49 GMT

Next message: Martin Porter: "[Snowball-discuss] Unicode"
Previous message: Andreas Jung: "Re: [Snowball-discuss] Unicode"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

        3) UTF-8 encoded 8 bit characters. I believe the only change to the
        generated C is that cursor movements of the form z->c++; and z->c--;
        need to be replaced by function calls that move over 1, 2 or 3 bytes
        to get to the next character.
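
For reference, the cursor-movement functions described in the quoted proposal might look roughly like the sketch below. The names utf8_next and utf8_prev, and the convention of passing the buffer, cursor and bound as separate arguments, are assumptions made for illustration; they are not taken from the Snowball runtime. The sketch assumes well-formed UTF-8 and simply skips continuation bytes (those of the form 10xxxxxx).

```c
#include <stddef.h>

/* Hypothetical helper (not Snowball's API): return the cursor position
   just past the UTF-8 character that starts at p[c]. Never moves beyond
   limit. Assumes well-formed input. */
static int utf8_next(const unsigned char *p, int c, int limit) {
    if (c >= limit) return limit;
    c++;                                              /* step over the lead byte */
    while (c < limit && (p[c] & 0xC0) == 0x80) c++;   /* skip continuation bytes */
    return c;
}

/* Hypothetical helper: return the cursor position of the start of the
   UTF-8 character that ends just before p[c]. Never moves before start. */
static int utf8_prev(const unsigned char *p, int c, int start) {
    if (c <= start) return start;
    c--;                                              /* last byte of the previous character */
    while (c > start && (p[c] & 0xC0) == 0x80) c--;   /* back up to its lead byte */
    return c;
}
```

Under these assumptions, z->c++; would become something like z->c = utf8_next(z->p, z->c, z->l); and z->c--; something like z->c = utf8_prev(z->p, z->c, z->lb); where the field names p, l and lb are likewise only assumed here.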

It is much easier to use UCS-2 internally and simply add a converter
to/from UTF-8. This way you need to output only one style of code, with an
option to compile with or without UNICODE. Converters to/from UTF-8 are
trivial; I can send you one if you need it.
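
To give an idea of the size of such a converter, here is a minimal sketch. It is only an illustration under stated assumptions: the names ucs2_t, utf8_to_ucs2 and ucs2_to_utf8 are made up for this example, only the 1, 2 and 3 byte UTF-8 forms are handled (which is all UCS-2 can represent), and overlong encodings and surrogate values are not rejected.

```c
#include <stddef.h>

typedef unsigned short ucs2_t;   /* assumed to hold one 16-bit UCS-2 code unit */

/* Decode n bytes of UTF-8 into at most cap UCS-2 units.
   Returns the number of units written, or -1 on truncated or malformed
   input, a 4-byte sequence (outside UCS-2), or insufficient space. */
static int utf8_to_ucs2(const unsigned char *in, size_t n,
                        ucs2_t *out, size_t cap) {
    size_t i = 0, k = 0;
    while (i < n) {
        unsigned c = in[i], ch;
        if (k >= cap) return -1;
        if (c < 0x80) {                              /* 0xxxxxxx */
            ch = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0) {             /* 110xxxxx 10xxxxxx */
            if (i + 1 >= n || (in[i + 1] & 0xC0) != 0x80) return -1;
            ch = ((c & 0x1F) << 6) | (in[i + 1] & 0x3Fu);
            i += 2;
        } else if ((c & 0xF0) == 0xE0) {             /* 1110xxxx 10xxxxxx 10xxxxxx */
            if (i + 2 >= n || (in[i + 1] & 0xC0) != 0x80
                           || (in[i + 2] & 0xC0) != 0x80) return -1;
            ch = ((c & 0x0F) << 12) | ((in[i + 1] & 0x3Fu) << 6)
                                    | (in[i + 2] & 0x3Fu);
            i += 3;
        } else {
            return -1;                               /* stray continuation or 4-byte form */
        }
        out[k++] = (ucs2_t)ch;
    }
    return (int)k;
}

/* Encode n UCS-2 units as UTF-8 into at most cap bytes.
   Returns the number of bytes written, or -1 if out is too small. */
static int ucs2_to_utf8(const ucs2_t *in, size_t n,
                        unsigned char *out, size_t cap) {
    size_t i, k = 0;
    for (i = 0; i < n; i++) {
        unsigned ch = in[i] & 0xFFFFu;
        if (ch < 0x80) {
            if (k + 1 > cap) return -1;
            out[k++] = (unsigned char)ch;
        } else if (ch < 0x800) {
            if (k + 2 > cap) return -1;
            out[k++] = (unsigned char)(0xC0 | (ch >> 6));
            out[k++] = (unsigned char)(0x80 | (ch & 0x3F));
        } else {
            if (k + 3 > cap) return -1;
            out[k++] = (unsigned char)(0xE0 | (ch >> 12));
            out[k++] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            out[k++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
    }
    return (int)k;
}
```

With something like this at the boundary, the stemmer itself would only ever see fixed-width characters, so z->c++; and z->c--; could stay as they are.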

Vineet

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss


This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:41 BST