Re: [Snowball-discuss] Unicode and python bindings

From: Andreas Jung (lists@andreas-jung.com)
Date: Tue May 16 2006 - 20:50:49 BST


TextIndexNG3 for Zope (sf.net/projects/textindexng) comes with its own
Python bindings against the latest Snowball code base. The complete
implementation is based on Unicode and has been in use for ages.

-aj

--On 16 May 2006 14:39:05 +0200 Patrick Mézard <pmezard@gmail.com> wrote:

> Hello,
>
> Trying to solve the issues I raised in a previous post
> (<http://thread.gmane.org/gmane.comp.search.snowball/772/focus=772>), I
> finally rewrote parts of the original Weongyo Jeong Python bindings to
> fit my needs. The main change is that the module interface now consumes
> Python Unicode strings (UTF-16) instead of native strings. The idea is
> that code dealing with multiple languages usually first converts the
> documents' encodings to Unicode before passing the text to other modules,
> including stemming. With the original bindings, since I failed to use the
> UTF-8 interface, I had to convert back from Unicode to specific encodings,
> which was at best a pain and at worst impossible.
>
> The new version is temporarily available here:
> <http://perso.wanadoo.fr/patrick.mezard/dev/pysnowball-0.0.2.zip> and I
> can provide a copy of the darcs (<http://abridgegame.org/darcs/>)
> repository I used to write my branch.
>
> I think it still needs to be reviewed before any release (I am far from
> being a Python C extension expert), even though it passes the few tests I
> could think of.
>
> What's your opinion about this?
>
> --
> Patrick Mézard
>
>
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss@lists.tartarus.org
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
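
The workflow described in the quoted message (decode every document to
Unicode first, then hand Unicode strings to the stemmer) might look roughly
like the sketch below. It is only an illustration under stated assumptions:
to_unicode and toy_stem are hypothetical names, and toy_stem is a trivial
stand-in, not the Snowball algorithm or the API of either set of bindings.

```python
def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode a document from its source encoding into a Unicode string."""
    return raw.decode(encoding)


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a stemmer that accepts Unicode strings.

    This is NOT Snowball; it only strips two English suffixes so that the
    example is self-contained.
    """
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# The same text arriving in two different byte encodings.
text = "stemming caf\u00e9"
documents = [
    (text.encode("latin-1"), "latin-1"),
    (text.encode("utf-8"), "utf-8"),
]

# Unify the encodings first, then stem. The stemmer never sees raw bytes.
stemmed = [
    [toy_stem(word) for word in to_unicode(raw, encoding).split()]
    for raw, encoding in documents
]
```

Because decoding happens once at the boundary, both documents produce the
same stemmed tokens, and nothing has to be converted back to a
language-specific encoding before stemming.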

 



