[Snowball-discuss] UTF-8

From: Vineet Gupta (vineet@stratify.com)
Date: Fri Feb 22 2002 - 19:08:49 GMT


        3) UTF-8 encoded 8 bit characters. I believe the only change to the
        generated C is that cursor movements of the form z->c++; and z->c--;
        need to be replaced by function calls that move over 1,2 or 3 bytes
        to get to the next character.
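For illustration, here is a minimal C sketch of the kind of cursor-movement helpers the quoted proposal describes. The names `utf8_next` and `utf8_prev` are hypothetical and not part of Snowball's generated code. The cursor is modelled as a plain int offset into a byte buffer, not as Snowball's own structure. The sketch relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so skipping them lands on the next character whether it is 1, 2 or 3 bytes long.

```c
#include <assert.h>

/* Hypothetical helpers: move a byte cursor c over one UTF-8 encoded
   character in buffer p of length l. Characters up to U+FFFF take
   1, 2 or 3 bytes; continuation bytes have the form 10xxxxxx. */

/* Step forward one character (replaces c++); returns the new cursor. */
static int utf8_next(const unsigned char *p, int c, int l) {
    assert(c >= 0 && c <= l);
    if (c >= l) return c;                  /* already at the end */
    c++;                                   /* step over the lead byte */
    while (c < l && (p[c] & 0xC0) == 0x80)
        c++;                               /* skip continuation bytes */
    return c;
}

/* Step back one character (replaces c--); returns the new cursor. */
static int utf8_prev(const unsigned char *p, int c) {
    assert(c >= 0);
    if (c <= 0) return c;                  /* already at the start */
    c--;
    while (c > 0 && (p[c] & 0xC0) == 0x80)
        c--;                               /* back up to the lead byte */
    return c;
}
```

With helpers like these, a generated `c++;` would become something along the lines of `c = utf8_next(p, c, l);`, and `c--;` would become `c = utf8_prev(p, c);`.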

It is much easier to keep text as UCS-2 internally and simply add a converter
to/from UTF-8. That way you need to generate only one style of code, with an
option to compile with or without UNICODE. Converters to/from UTF-8 are
trivial; I can send you one if you need it.
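A minimal sketch of the UCS-2 route suggested above. It assumes 16-bit `uint16_t` code units and covers only the Basic Multilingual Plane, since UCS-2 cannot represent anything beyond U+FFFF. This is not the converter offered in the message. The function names are hypothetical. The decoder rejects truncated, overlong, surrogate and 4-byte sequences by returning -1 rather than attempting recovery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: UTF-8 -> UCS-2. Returns the number of code units
   written, or -1 on malformed input, non-BMP characters, or overflow. */
static long utf8_to_ucs2(const unsigned char *in, size_t inlen,
                         uint16_t *out, size_t outcap) {
    size_t i = 0, n = 0;
    assert(out != NULL || outcap == 0);
    while (i < inlen) {
        unsigned ch, b = in[i];
        if (b < 0x80) {                             /* 0xxxxxxx */
            ch = b; i += 1;
        } else if ((b & 0xE0) == 0xC0) {            /* 110xxxxx 10xxxxxx */
            if (i + 1 >= inlen || (in[i + 1] & 0xC0) != 0x80) return -1;
            ch = ((b & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            if (ch < 0x80) return -1;               /* overlong form */
            i += 2;
        } else if ((b & 0xF0) == 0xE0) {            /* 1110xxxx 10xxxxxx 10xxxxxx */
            if (i + 2 >= inlen || (in[i + 1] & 0xC0) != 0x80
                || (in[i + 2] & 0xC0) != 0x80) return -1;
            ch = ((b & 0x0Fu) << 12) | ((in[i + 1] & 0x3Fu) << 6)
                 | (in[i + 2] & 0x3Fu);
            if (ch < 0x800) return -1;              /* overlong form */
            if (ch >= 0xD800 && ch <= 0xDFFF) return -1; /* surrogates */
            i += 3;
        } else {
            return -1;  /* stray continuation byte or 4-byte (non-BMP) */
        }
        if (n >= outcap) return -1;                 /* output full */
        out[n++] = (uint16_t)ch;
    }
    return (long)n;
}

/* Hypothetical encoder: UCS-2 -> UTF-8 (1, 2 or 3 bytes per unit).
   Returns the number of bytes written, or -1 if out is too small. */
static long ucs2_to_utf8(const uint16_t *in, size_t inlen,
                         unsigned char *out, size_t outcap) {
    size_t n = 0;
    for (size_t i = 0; i < inlen; i++) {
        unsigned ch = in[i];
        if (ch < 0x80) {
            if (n + 1 > outcap) return -1;
            out[n++] = (unsigned char)ch;
        } else if (ch < 0x800) {
            if (n + 2 > outcap) return -1;
            out[n++] = (unsigned char)(0xC0 | (ch >> 6));
            out[n++] = (unsigned char)(0x80 | (ch & 0x3F));
        } else {
            if (n + 3 > outcap) return -1;
            out[n++] = (unsigned char)(0xE0 | (ch >> 12));
            out[n++] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
    }
    return (long)n;
}
```

Once text is decoded this way, every character occupies exactly one array slot, so the generated stemmer can keep plain z->c++; and z->c--; cursor moves and only the input/output boundary deals with UTF-8.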

Vineet

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss


