[Snowball-discuss] Russian stemmer in Java

From: Антон Потапов (bertolucci@mail.ru)
Date: Tue Nov 19 2002 - 19:40:01 GMT


First of all, I'd like to say that I was simply delighted
to find such an astonishing set of stemmers, and I am very grateful.
Your work is priceless and brilliant.

I have a question about the Russian stemmer in Java. The problem is that I cannot get the Russian stemmer to stem Russian words. The Russian Java stemmer produces a text file with one word per line, but it does nothing to the words: it writes each word to the file unchanged. I think the problem is with the encoding.
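If the encoding is the culprit, the words handed to the stemmer would contain no Cyrillic letters at all, so a stemmer looking for Russian endings would find nothing to remove. A minimal diagnostic sketch, assuming the stemmer matches Unicode Cyrillic characters (not confirmed here); the class `EncodingCheck` and the helper `countCyrillic` are hypothetical names:

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Hypothetical helper: counts chars that fall in the Unicode Cyrillic block.
    public static int countCyrillic(String word) {
        int count = 0;
        for (int i = 0; i < word.length(); i++) {
            if (Character.UnicodeBlock.of(word.charAt(i)) == Character.UnicodeBlock.CYRILLIC) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The word "мама" encoded in KOI8-R: 'м' = 0xCD, 'а' = 0xC1.
        byte[] koi8 = {(byte) 0xCD, (byte) 0xC1, (byte) 0xCD, (byte) 0xC1};
        // Decoded with the right charset: four Cyrillic letters.
        String right = new String(koi8, Charset.forName("KOI8-R"));
        // Decoded with the wrong charset: four Latin-1 letters, none Cyrillic.
        String wrong = new String(koi8, Charset.forName("ISO-8859-1"));
        System.out.println("KOI8-R decode: " + countCyrillic(right));     // prints 4
        System.out.println("ISO-8859-1 decode: " + countCyrillic(wrong)); // prints 0
    }
}
```

Running such a check on a few words just before they reach the stemmer would show whether the input is being decoded as Cyrillic in the first place.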

To open the file I use:

BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(args[1]), "KOI8_R"));

where args[1] is the path of a text file in Russian.
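Since the stemmer's output is itself a text file, the writing side needs an explicit charset as well; otherwise Java encodes it with the platform default. A sketch under stated assumptions: the class `Koi8WordPipe`, the `process` method, the splitting on non-letters and the identity `stemmer` placeholder are all hypothetical, and no Snowball API is used. The charset is passed by its java.nio name "KOI8-R"; "KOI8_R", as in the snippet above, is the java.io alias for the same charset.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.util.function.UnaryOperator;

public class Koi8WordPipe {
    static final Charset KOI8_R = Charset.forName("KOI8-R");

    // Reads KOI8-R text, splits it into runs of letters, applies `stemmer`
    // to each word and writes one result per line, encoded explicitly as KOI8-R.
    public static void process(InputStream in, OutputStream out,
                               UnaryOperator<String> stemmer) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, KOI8_R));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, KOI8_R));
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetter(ch)) {
                word.append(ch);
            } else if (word.length() > 0) {
                writer.write(stemmer.apply(word.toString()));
                writer.write('\n');
                word.setLength(0);
            }
        }
        if (word.length() > 0) { // last word when the file has no trailing separator
            writer.write(stemmer.apply(word.toString()));
            writer.write('\n');
        }
        writer.flush(); // push encoded bytes out; the caller owns and closes the streams
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: args[0] = input file, args[1] = output file.
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            process(in, out, w -> w); // identity placeholder: plug the real stemmer in here
        }
    }
}
```

With the identity placeholder, the output file should simply list the input's words one per line and still read correctly as KOI8-R; once the real stemmer is plugged in, any word that comes out unchanged is then the stemmer's doing rather than the writer's.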

But when I read the file:

 int character;
 while ((character = reader.read()) != -1) {
   char ch = (char) character;
   // ... (rest of the loop not shown)
 }

the output is NOT in KOI8-R :(
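A possible explanation, offered as an assumption rather than a confirmed diagnosis: `InputStreamReader` decodes the KOI8-R bytes into Java `char`s, which are UTF-16 code units, so a correctly decoded character is never "in KOI8-R"; KOI8-R bytes only reappear when the text is encoded again on output. The sketch below follows one character through that round trip; the class `Koi8RoundTrip` and its `decode`/`encode` methods are hypothetical names.

```java
import java.nio.charset.Charset;

public class Koi8RoundTrip {
    static final Charset KOI8_R = Charset.forName("KOI8-R");

    // Decodes KOI8-R bytes into a Java String (UTF-16 chars, not KOI8-R codes).
    public static String decode(byte[] koi8) {
        return new String(koi8, KOI8_R);
    }

    // Encodes a Java String back into KOI8-R bytes.
    public static byte[] encode(String text) {
        return text.getBytes(KOI8_R);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCD}; // 'м' in KOI8-R
        String s = decode(bytes);
        // The char read back holds the Unicode value U+043C, not the byte 0xCD.
        System.out.printf("char value: U+%04X%n", (int) s.charAt(0)); // prints U+043C
        // Encoding with KOI8-R again restores the original byte.
        System.out.printf("re-encoded: 0x%02X%n", encode(s)[0] & 0xFF); // prints 0xCD
    }
}
```

Under this reading, seeing Unicode values inside the program is expected, and the place to ask for KOI8-R is the writer that produces the output file.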

Please advise.


Anton Potapov

This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:43 BST