Snowball


Source on GitHub

This is the old website for the Snowball system. It is preserved for historical reasons (there are many references to it in the published literature) but is no longer maintained. Snowball development continues on GitHub, and the new Snowball website is at http://snowballstem.org/. See below for links to the snowball-discuss archives.


 

Links to resources

Quick Introduction
An account of Snowball
How You Can Help

Snowball
the manual
how to run it

Tar gzipped files of Snowball sources

Snowball-discuss archives
at gmane
the same, in a more modern "blog" style

Stemmers
English (porter)
English (porter2)
A note on early English
Romance stemmers
French
Spanish
Portuguese
Italian
Romanian
Germanic stemmers
German
(German variant)
Dutch
Scandinavian stemmers
Swedish
Norwegian
Danish
Russian
Finnish
Character codes

Contributed stemmers in other programming languages

Wrappers

External Contributions
Irish and Czech
Object Pascal code generator for Snowball
Two stemmers for Romanian
Hungarian
Turkish
Armenian
Basque (Euskera)
Catalan

Other work
The Schinke Latin stemmer
The Lovins English stemmer
The Kraaij/Pohlmann Dutch stemmer


Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it.



(Since it effectively provides a ‘suffix STRIPPER GRAMmar’, I had toyed with the idea of calling it ‘strippergram’, but good sense has prevailed, and so it is ‘Snowball’, named as a tribute to SNOBOL, the excellent string handling language of Messrs Farber, Griswold, Poage and Polonsky from the 1960s.

- Martin Porter)
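
For readers new to stemming, the idea of suffix stripping can be sketched in a few lines of Python. This is only a toy illustration: the rule table, the minimum stem length and the name toy_stem are assumptions made up for this example. It is not the Porter algorithm, and it is not how any of the Snowball stemmers listed on this site are defined.

```python
# Toy suffix rules (assumed for illustration only): (suffix, replacement).
# They are ordered longest-first so the most specific rule is tried first.
TOY_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("ss", "ss"),  # keep a final "ss" intact instead of stripping one "s"
    ("s", ""),
]

# Assumed guard: never leave a stem shorter than this many characters.
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Apply the first matching rule in TOY_RULES that leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in TOY_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= MIN_STEM_LENGTH:
                return stem + replacement
            # Stem too short for this rule: fall through and try the next one.
    return word


if __name__ == "__main__":
    for w in ["connected", "connecting", "caresses", "sing", "glass"]:
        print(w, "->", toy_stem(w))
```

With these rules, "connected" and "connecting" both reduce to "connect", which is the kind of conflation an Information Retrieval system wants from a stemmer. Real stemmers need many more rules and conditions than this, which is the motivation for a dedicated language in which to write them.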


Major events

September 2014 — Closure of this site

May 2012 — Contributed stemmers for Irish and Czech

Jul 2010 — Contributed stemmers for Armenian, Basque, Catalan

Mar 2007 — Romanian stemmer

Jan 2007 — Turkish stemmer, contributed by Evren (Kapusuz) Cilden

Sep 2006 — Hungarian stemmer, contributed by Anna Tordai

Jun 2006 — Supported and updated Python bindings

May 2005 — UTF-8 Unicode support

Sep 2002 — Finnish stemmer

Jul 2002 — ISO Latin I as default
The use of MS-DOS Latin I is now history, but the old versions of the Snowball stemmers are still accessible on the site.

May 2002 — Unicode support

Feb 2002 — Java support
Richard has modified the Snowball code generator to produce Java output as well as ANSI C output. This means that pure Java systems can now use the Snowball stemmers.