[Snowball-discuss] PHP Snowball extension

From: dark_panda@hushmail.com
Date: Mon Mar 04 2002 - 22:17:05 GMT


A couple of weeks ago I decided to write a PHP extension that wraps a few functions around the Snowball stemmers. A few months earlier I had implemented the original Porter stemmer as a PHP extension using POSIX regexes. It performed nicely and matched the output of the stemmer available at www.tartarus.org/~martin exactly, but I decided to rewrite the extension from scratch to take advantage of Snowball's stemmers for various languages.

Besides being noticeably faster on large sets of input, the new PHP extension also has multi-language support and other niceties. Adding a new stemmer to the extension is trivial: it takes about 10 lines of code in the main extension files.

The PHP extension (called, quite simply, "stem") is available for download at http://209.202.82.229/software, along with the older regex-based extension. Instructions are bundled with the tarball, but in short, you call the functions from PHP like this:

string stem(string word [, int language])

where word is the word to be stemmed and language is an optional constant that selects the stemmer; if it is omitted, the original Porter stemming algorithm is used. Alternatively, you can call an individual stemmer directly:

string stem_LANGUAGE(string word)

where LANGUAGE is the, well... language, e.g. stem_french(), stem_russian() and so forth.
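To make the shape of that interface concrete without the extension installed, here is a minimal Python sketch of the same dispatch pattern: a generic stem() taking an optional language constant that falls back to Porter, plus a per-language wrapper. Everything in it is hypothetical. The constant names STEM_PORTER and STEM_FRENCH are made up (the real ones are whatever the extension's bundled instructions define), the "Porter" entry implements only Step 1a of Porter's algorithm, and the French entry is a placeholder that returns its input unchanged. None of it is Snowball code.

```python
from typing import Callable, Dict

# Hypothetical language constants; the real extension defines its own names.
STEM_PORTER = 0
STEM_FRENCH = 1


def _porter_step1a(word: str) -> str:
    # Stand-in for the Porter stemmer: implements only Step 1a (plural suffixes).
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


def _french_placeholder(word: str) -> str:
    # Placeholder only: returns the word unchanged (NOT the Snowball French stemmer).
    return word


# Registry mapping each hypothetical language constant to its stemmer function.
_STEMMERS: Dict[int, Callable[[str], str]] = {
    STEM_PORTER: _porter_step1a,
    STEM_FRENCH: _french_placeholder,
}


def stem(word: str, language: int = STEM_PORTER) -> str:
    """Generic entry point, mirroring stem(word [, language]); defaults to Porter."""
    try:
        stemmer = _STEMMERS[language]
    except KeyError:
        raise ValueError(f"unknown language constant: {language!r}") from None
    return stemmer(word)


def stem_french(word: str) -> str:
    """Per-language entry point, mirroring stem_french(word)."""
    return stem(word, STEM_FRENCH)


if __name__ == "__main__":
    print(stem("caresses"))      # caress
    print(stem("ponies"))        # poni
    print(stem_french("chats"))  # chats (the placeholder leaves it unchanged)
```

With a table-driven design like this, a new language costs one function and one registry entry, which is in the spirit of the small per-stemmer change described above, though the extension's actual C code may be organized differently.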

The extension's own code is licensed under a BSD-like license; the Snowball code is copyright MF Porter and is also covered by a BSD-like license.

Enjoy.

J

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss



This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:41 BST