Snowball: Quick introduction




 

You can use this site at a number of levels:

- You can look at the stemming algorithm definitions themselves, and use them as templates for writing your own stemmers in the programming language of your choice.

- You can use the various ANSI C and Java stemmers in your own programs without needing the Snowball system that generated them. To do that, download either the C or the Java version of the libstemmer library and follow the instructions in the README files within these tarballs. The tarballs also contain simple example programs that let you run the stemmers from the command line.

- You can get involved in Snowball itself. This is particularly worthwhile if you want to adjust the stemmers or develop new ones. A typical reason for adjusting the stemmers is that you are working with an encoding of accented letters different from the ISO Latin 1 encoding assumed in most of the scripts here. In that case you need to build your own version of the Snowball compiler and work with the Snowball scripts.

  Snowball is a language in which stemming algorithms can be easily represented. The Snowball compiler translates a Snowball script (a .sbl file) into either a thread-safe ANSI C program or a Java program. For ANSI C, each Snowball script produces a program file and a corresponding header file (with .c and .h extensions). The language has a full manual, and the various stemming scripts serve as example programs.

- You can get deeply interested in stemming. If you do, read the introductory paper about Snowball. It is somewhat heavyweight, but provides essential background. Also look at the notes on how you can help.
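
To give a feel for the first option above, hand-coding a stemmer from an algorithm definition, here is a minimal sketch in C of a single rule group: Step 1a of the original Porter algorithm (sses -> ss, ies -> i, ss -> ss, s -> removed). The choice of Porter Step 1a is an assumption made for illustration, and the names `ends_with` and `porter_step_1a` are hypothetical; this is not the code the Snowball compiler generates, and the algorithm definitions remain the authority for the full set of steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first len bytes of word end with suffix, else 0. */
static int ends_with(const char *word, size_t len, const char *suffix)
{
    size_t slen = strlen(suffix);
    return len >= slen && memcmp(word + len - slen, suffix, slen) == 0;
}

/*
 * Illustrative only: applies Step 1a of the original Porter algorithm,
 * in place, to a lowercase NUL-terminated ASCII word held in a writable
 * buffer, and returns the new length.
 *
 * The longest matching suffix wins, so the rules are tried in order of
 * decreasing suffix length and only the first match is applied.
 */
size_t porter_step_1a(char *word)
{
    size_t len;

    assert(word != NULL);
    len = strlen(word);

    if (ends_with(word, len, "sses")) {
        len -= 2;                /* caresses -> caress */
    } else if (ends_with(word, len, "ies")) {
        len -= 2;                /* ponies -> poni */
    } else if (ends_with(word, len, "ss")) {
        /* caress -> caress: the rule matches but changes nothing */
    } else if (ends_with(word, len, "s")) {
        len -= 1;                /* cats -> cat */
    }

    word[len] = '\0';            /* truncate the buffer at the new end */
    return len;
}
```

Called on a writable buffer such as `char w[] = "ponies";`, the function leaves `poni` in `w`. A complete stemmer chains many such steps, usually with extra conditions on the part of the word that remains, which is exactly the kind of bookkeeping the Snowball language is designed to express compactly.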