[Snowball-discuss] Updated Python interface and new Jython interface to Snowball

From: Olivier Bornet (Olivier.Bornet@idiap.ch)
Date: Fri Feb 25 2005 - 13:56:49 GMT


Hi again,

I'm working on an IR project coded mainly in Python. In this project, I
was doing stemming with the Python class PorterStemmer from "The Porter
Stemming Algorithm" web site[1]. As we want to support languages other
than English, I'm now switching to the Snowball stemming system.

Our project is based on a Python library that is used from both Python
and Java programs. Thanks to PyStemmer[2], the switch from the
PorterStemmer class to Snowball went smoothly for the Python programs.

The main problem was integrating the Snowball stemming system into the
Java programs, because the stemming is not done in the Java code but in
the Python library that Java uses (via Jython[3]). Jython is very useful
here, since it lets the Java code use our Python libraries.
Unfortunately, Python code running under Jython can't load C extensions
such as PyStemmer.

So I have created a Python interface to the Java code generated by
Snowball. This enables our Python library to use the Snowball stemming
system either from native Python code (via PyStemmer) or from Java code
(via Jython).

To summarize, there are two ways of using Snowball in our project:

  a. Python native program -> our Python library -> Snowball as C
     extension to Python
  b. Java program -> our Python library -> Snowball as Java extension to
     Python

In short: I now have a Python interface adapted to the current
Snowball CVS (snowball/snowball directory) and a new Jython interface to
the same Snowball CVS. If there is interest, I will be happy to share
these interfaces with the Snowball project. I can commit these changes
to CVS, send them to this mailing list, or put them on a dedicated web
site.

Thanks in advance for your feedback, and thanks for Snowball.

        Olivier

[1] http://www.tartarus.org/~martin/PorterStemmer/
[2] http://sourceforge.net/projects/pystemmer/
[3] http://www.jython.org/

-- 
   . __    . ___  __.  | Olivier Bornet         Olivier.Bornet@idiap.ch
  / /  `  / /  / /  /  | IDIAP             http://www.idiap.ch/~bornet/
 / /   / / /--/ /--'   | CP 592        http://www.idiap.ch/~bornet/pgp/
/ /__.' / /  / /       | CH-1920 Martigny           PGP-key: 0xC53D9218