I was really thinking aloud. I would need to rewrite the Snowball scripts to
use 'among's rather than character groups; 'goto vowel' was just a way of
illustrating the problem.
The way to make it work with UTF-8 encoded data is to unpack the Unicode
Russian characters into 2-byte (16-bit) form before calling Snowball, and then
repack them as UTF-8 afterwards. Tedious, I know.
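A rough sketch of that unpack/stem/repack round trip, in Python for brevity. It assumes the stemmer accepts a list of 16-bit character values. `toy_stemmer` is a hypothetical stand-in, not a real Snowball stemmer: it only drops a final 'ы' (U+044B) so that the round trip has a visible effect.

```python
from typing import Callable, List


def utf8_to_units(data: bytes) -> List[int]:
    """Unpack UTF-8 bytes into a list of 16-bit character values.

    Assumes every character fits in 16 bits, which holds for Cyrillic
    (U+0400 to U+04FF); anything larger is rejected.
    """
    units = [ord(ch) for ch in data.decode("utf-8")]
    if any(u > 0xFFFF for u in units):
        raise ValueError("character outside the 16-bit range")
    return units


def units_to_utf8(units: List[int]) -> bytes:
    """Repack 16-bit character values as UTF-8 bytes."""
    return "".join(chr(u) for u in units).encode("utf-8")


def stem_utf8(data: bytes, stem_units: Callable[[List[int]], List[int]]) -> bytes:
    """Unpack UTF-8, run a stemmer that works on 16-bit characters, repack."""
    return units_to_utf8(stem_units(utf8_to_units(data)))


def toy_stemmer(units: List[int]) -> List[int]:
    """Hypothetical stand-in for a Snowball stemmer on 2-byte characters."""
    if units and units[-1] == 0x044B:  # final 'ы'
        return units[:-1]
    return units
```

For example, `stem_utf8("столы".encode("utf-8"), toy_stemmer)` gives back the UTF-8 bytes of "стол". The stemmer itself never sees the multi-byte encoding, which is the whole point of the approach.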
I said 2 or 3 byte characters because in UTF-8, a character value from 128 up
to 0xFFFF packs into either 2 or 3 bytes. Is that not so?
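For reference, here are the byte counts as I understand the current UTF-8 definition (RFC 3629), sketched in Python. Values 128 to 0x7FF take 2 bytes, 0x800 to 0xFFFF take 3, and only values above 0xFFFF take 4. All Cyrillic (U+0400 to U+04FF) therefore comes out at 2 bytes. The function name `utf8_length` is my own. The loop cross-checks it against Python's encoder.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# 'A', Cyrillic 'д', the euro sign, and a character beyond 0xFFFF.
for cp in (0x41, 0x0434, 0x20AC, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```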
I will look at the http address you sent.
This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:44 BST