[Snowball-discuss] Unicode

From: Vineet Gupta (vineet@stratify.com)
Date: Fri Feb 22 2002 - 19:04:22 GMT


Another alternative is to use IBM's International Components for Unicode
(ICU) library, which is available for C and Java:
http://oss.software.ibm.com/icu/
It has a wide variety of converters, along with lots of other functionality
for internationalization and localization (for example, its IsAlpha function
is better than the iswalpha that comes with Microsoft Visual C++).

For hex input, it might be useful to follow the usual convention: read
digits until you reach a non-hex digit, and the digits read so far
constitute one character. So hex '0A0D' is one character, whereas hex
'0A 0D' is two characters. The only legal characters in a hex string would
then be 0-9, A-F and space.
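
As a rough illustration of that convention, here is a minimal sketch in
Python. The function name parse_hex_string is hypothetical, and accepting
lowercase a-f is an assumption that goes beyond the rule stated above; it
is not meant as how Snowball itself should implement this.

```python
# Sketch of the "run of hex digits = one character" convention.
# Assumption: lowercase a-f is accepted in addition to 0-9 and A-F.
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def parse_hex_string(text):
    """Return a list of code points, one per run of hex digits in text."""
    code_points = []
    run = ""
    for ch in text:
        if ch in HEX_DIGITS:
            # Still inside the current character: keep collecting digits.
            run += ch
        elif ch == " ":
            # A space ends the current run, if there is one.
            if run:
                code_points.append(int(run, 16))
                run = ""
        else:
            # Anything other than a hex digit or a space is illegal.
            raise ValueError(f"illegal character {ch!r} in hex string")
    if run:
        # Flush the final run at end of input.
        code_points.append(int(run, 16))
    return code_points


print(parse_hex_string("0A0D"))   # one character: [2573], i.e. 0x0A0D
print(parse_hex_string("0A 0D"))  # two characters: [10, 13]
```

Rejecting anything outside the legal set, rather than silently treating it
as a separator, makes typos in the input easier to spot.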

Vineet



