Contributions in other programming languages



This page is reserved for encodings of the Snowball stemmers in other programming languages.

It must be emphasised that we are not in a position to maintain these submitted programs, or to guarantee their correctness. In the case of the Porter stemmer there have been 17 submissions in a variety of programming languages, and even though the stemmer itself is unchanging, they have created a certain amount of maintenance work. The Snowball site deals with a wide range of stemmers, all subject to occasional change, and the maintenance of other encodings is beyond our resources.

If you want to use one of these stemmers, we suggest you take the sample vocabulary for the corresponding natural language, and check that the stemmer produces the corresponding stemmed output. If it does not, bringing the submitted stemmer up to date should be easier than developing it from scratch.
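The check suggested above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `find_mismatches`, `read_lines` and `toy_stem` are hypothetical names introduced here, and `toy_stem` is a stand-in for whichever submitted stemmer is being tested. It also assumes the sample vocabulary and the expected stems are available as two parallel lists, one entry per line.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    stem: Callable[[str], str],
    words: Iterable[str],
    expected_stems: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every word the stemmer gets wrong.

    The two inputs are paired position by position; if one is shorter,
    the extra entries of the other are silently ignored by zip().
    """
    mismatches = []
    for word, expected in zip(words, expected_stems):
        actual = stem(word)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


def read_lines(path: str) -> List[str]:
    """Read one entry per line (assumes a UTF-8 text file)."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for a submitted stemmer: strips a trailing "s".
    return word[:-1] if word.endswith("s") else word


if __name__ == "__main__":
    # Tiny in-memory example; real data would come from read_lines().
    words = ["cats", "dog", "buses"]
    expected = ["cat", "dog", "bus"]
    for word, want, got in find_mismatches(toy_stem, words, expected):
        print(f"{word}: expected {want!r}, got {got!r}")
```

With real data, the two lists would be loaded with `read_lines` from the vocabulary file and the stemmed-output file for the language in question (the file names depend on how the sample data was obtained). An empty result means the stemmer agrees with the sample output on every word.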

At present we have only these submissions:

stemmer            language    author                 affiliation                      received  notes
-----------------  ----------  ---------------------  -------------------------------  --------  -----------------------------------
Russian            php5        Dennis Kreminsky                                        11/2005   etranger at etranger dot ru
English            ANSI C      Martin Snowball                                         01/2006
German             python      ‘kristall’                                              05/2006   kristall (the ‘at’ sign) c-base.org
English            C#          Kamil ‘Crow’ Bartocha  www.pccentre.pl                  04/2007
Italian            C#          Luca Gentili                                            08/2008   luka.gentili[at]gmail[dot]com
English (porter2)  python      Michael Dirolf                                          09/2008   mike[at]dirolf[dot]com
Portuguese         java        Pedro Oliveira         University of Coimbra, Portugal  11/2008
English            Erlang      Frederick Ross         bbcf.epfl.ch/                    01/2010   madhadron at gmail dot com
German             Javascript  Joder Illi             FormBlitz AG                     07/2010   joderilli at gmail dot com


Python versions of nearly all the stemmers have been made available by Peter Stahl (June 2010) at NLTK’s code repository. Peter also reports inaccuracies in the Python German stemmer listed above.