[Snowball-discuss] Encoding in browser question. StopWords.

From: Praveen Hombaiah (ph_one@hotmail.com)
Date: Sun Jul 04 2004 - 17:34:16 BST


Hello,
     I'm trying to add Kannada language support to the Perl
Lingua::StopWords module. To do this, I want to put together a list of
Kannada stop words by writing a program that goes through a few Kannada
newspapers (www.prajavani.net) and outputs the list of most commonly
used words.
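
A minimal sketch of the counting step, assuming the newspaper text has
already been obtained as Unicode strings. The function name and the sample
text are hypothetical, and treating a "word" as a run of characters from the
Unicode Kannada block (U+0C80 to U+0CFF) is a simplifying assumption:

```python
import re
from collections import Counter

# Assumption: a word is a run of code points from the Unicode Kannada
# block (U+0C80..U+0CFF). Joiner characters are not handled here.
KANNADA_WORD = re.compile(r"[\u0C80-\u0CFF]+")

def most_common_words(text, n=10):
    """Return the n most frequent Kannada words as (word, count) pairs."""
    words = KANNADA_WORD.findall(text)
    return Counter(words).most_common(n)

if __name__ == "__main__":
    # Hypothetical sample text; real input would be the article bodies.
    sample = "ಮತ್ತು ಈ ಮತ್ತು ಒಂದು ಈ ಮತ್ತು"
    for word, count in most_common_words(sample, 3):
        print(word, count)
```

The most frequent words found this way are only candidates; the list would
still need to be reviewed by hand before being used as stop words.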
    I need the list of words in UTF-8, since that is the encoding used by
the Lingua::StopWords module. When I go to the newspaper website using
Internet Explorer, the characters are readable, and the encoding is set to
Western European (Windows). Is there any way to convert this to UTF-8?
Exactly what format is the text currently in?

   I apologize for the cross-posting. Any help anybody could provide is very
much appreciated.

Regards,
Praveen Hombaiah.




This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST