I'm trying to add Kannada language support to the Perl
Lingua::StopWords module. To do this, I want to put together a list of
Kannada stop words by writing a program that goes through a few Kannada
newspapers (www.prajavani.net) and outputs a list of the most commonly
used words.
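
A rough sketch of the counting step described above, written in Python purely for illustration (the real tool would presumably be Perl). It assumes the text has already been decoded to Unicode and that a "word" is simply a run of characters from the Kannada Unicode block (U+0C80 to U+0CFF) plus the zero-width joiners; the names KANNADA_WORD and top_words are made up for this example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of code points from the Kannada block
# (U+0C80..U+0CFF), optionally containing ZWNJ/ZWJ (U+200C/U+200D).
# A plain \w+ is avoided because it can split words at vowel signs.
KANNADA_WORD = re.compile(r"[\u0C80-\u0CFF\u200C\u200D]+")

def top_words(text, n=50):
    """Return the n most frequent Kannada words as (word, count) pairs."""
    return Counter(KANNADA_WORD.findall(text)).most_common(n)
```

Running top_words over the combined text of several articles would give candidate stop words, which would still need manual review before going into a stop-word list.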
I need the list of words in UTF-8 encoding (since that is the encoding
used by the Lingua::StopWords module). When I open the newspaper website in
Internet Explorer, the characters are readable, and the encoding is set to
Western European (Windows). Is there any way to convert this to UTF-8?
Exactly what format is it currently in?
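
For reference, a minimal sketch of what a plain re-encoding would look like, again in Python for illustration only. It assumes the page bytes really are in the encoding Internet Explorer reports ("Western European (Windows)", i.e. windows-1252, called cp1252 in Python); to_utf8 and has_kannada are hypothetical helper names. Transcoding only changes the bytes, not which characters they stand for, so if the site shows Kannada by means of a custom font drawn over Latin code points (an unverified guess), the output would still be Latin letters and a font-specific mapping table would be needed instead.

```python
def to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    # errors="replace" because a few byte values are undefined in cp1252.
    return raw.decode(source_encoding, errors="replace").encode("utf-8")

def has_kannada(text):
    """True if text contains any code point from the Kannada block."""
    return any("\u0C80" <= ch <= "\u0CFF" for ch in text)
```

Checking has_kannada on the decoded page text is a quick way to tell whether the page contains real Unicode Kannada or only Latin characters displayed through a Kannada font.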
I apologize for the cross-posting. Any help anybody could provide is very
much appreciated.
This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:46 BST