Contributions in other programming languages
This page is reserved for encodings of the Snowball stemmers in other programming languages.
It must be emphasised that we are not in a position to maintain these submitted programs, or to guarantee their correctness. In the case of the Porter stemmer there have been 17 submissions in a variety of programming languages, and even though the stemmer itself is unchanging, they have created a certain amount of maintenance work. The Snowball site deals with a wide range of stemmers, all subject to occasional change, and the maintenance of other encodings is beyond our resources.
If you want to use one of these stemmers, we suggest you take the sample vocabulary for the corresponding natural language, and check that the stemmer produces the corresponding stemmed output. If it does not, bringing the submitted stemmer up to date should be easier than developing it from scratch.
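The check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of any submission: `find_mismatches` and `toy_stem` are hypothetical names, and `toy_stem` is a deliberately crude stand-in for whichever submitted stemmer is being tested. The sketch assumes the sample vocabulary and the expected stemmed output are available as two parallel sequences, one word per entry.

```python
from itertools import zip_longest
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    stem: Callable[[str], str],
    words: Iterable[str],
    expected: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected_stem, actual_stem) for every disagreement."""
    mismatches = []
    for word, want in zip_longest(words, expected):
        # The two lists must pair up one-to-one; anything else is an input error.
        if word is None or want is None:
            raise ValueError("vocabulary and expected output differ in length")
        got = stem(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for the stemmer under test: strips a final "s" only.
    return word[:-1] if word.endswith("s") else word


if __name__ == "__main__":
    vocabulary = ["cats", "dog", "running"]
    stemmed = ["cat", "dog", "run"]
    for word, want, got in find_mismatches(toy_stem, vocabulary, stemmed):
        print(f"{word}: expected {want!r}, got {got!r}")
```

In practice the two lists would be read from the sample vocabulary and its stemmed output, and `toy_stem` replaced by a call into the submitted stemmer. An empty result means the stemmer agrees with the sample data on every word; each reported triple points at a word where the submission has fallen behind.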
At present we have only these submissions:

| stemmer | language | author | affiliation | received | notes |
|---|---|---|---|---|---|
| Russian | php5 | Dennis Kreminsky | | 11/2005 | etranger at etranger dot ru |
| English | ANSI C | Martin Snowball | | 01/2006 | |
| German | python | ‘kristall’ | | 05/2006 | kristall (the ‘at’ sign) c-base.org |
| English | C# | Kamil Bartocha | www.pccentre.pl | 04/2007 | bug fix 12/2016 by Anna Tyurkina |
| Italian | C# | Luca Gentili | | 08/2008 | luka.gentili[at]gmail[dot]com |
| English (porter2) | python | Michael Dirolf | | 09/2008 | mike[at]dirolf[dot]com |
| Portuguese | java | Pedro Oliveira | University of Coimbra, Portugal | 11/2008 | |
| English | Erlang | Frederick Ross | bbcf.epfl.ch/ | 01/2010 | madhadron at gmail dot com |
| German | Javascript | Joder Illi | FormBlitz AG | 07/2010 | joderilli at gmail dot com |
| French | Javascript | Kasun Gajasinghe | Moratuwa University, Sri Lanka | 08/2010 | kasunbg at gmail dot com |
| English | matlab | Daniel Jablonski | | 09/2012 | (exceptions not encoded) |
| English | C++ | Sean Massung | University of Illinois | 10/2012 | for C++11, use GCC >= 4.7 |
Python versions of nearly all the stemmers have been made available by Peter Stahl (June 2010) at NLTK’s code repository. Peter also reports inaccuracies in the Python German stemmer listed above.
Javascript versions of nearly all the stemmers have appeared in Oleg Mazko’s Urim project. Oleg created the stemmers by hand from the C/Java output of the Snowball compiler. Compare, for example, his Javascript version for Swedish with the original in C.
(As well as Oleg’s Russian stemmer, there is a Javascript Russian stemmer at code.google.com/p/js-lingua-stem-ru, created by Mark A. Prisyazhnyuk.)
C# versions of the stemmers were made available in August 2011 by Iveonik Systems. They were ported from the Java Snowball stemmers.