Re: [Snowball-discuss] Stop word lists

From: Martin Porter
Date: Tue Oct 08 2002 - 20:39:02 BST


The Google stopword list is very interesting. The basic list for English
is, in my experience,

   { the a and of to in an }

which works well on titles of technical papers.
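
To make the effect of such a short list concrete, here is a minimal sketch in Python of filtering a title against it. The function name strip_stopwords and the lowercase, whitespace-based tokenisation are assumptions made for this illustration only; they are not part of Snowball or of Google's handling.

```python
# The seven-word basic English list quoted above.
BASIC_STOPWORDS = frozenset({"the", "a", "and", "of", "to", "in", "an"})


def strip_stopwords(title, stopwords=BASIC_STOPWORDS):
    """Return the lowercased words of `title` that are not stopwords.

    Hypothetical helper: tokenisation is a plain whitespace split,
    which is an assumption and ignores punctuation.
    """
    return [word for word in title.lower().split() if word not in stopwords]


if __name__ == "__main__":
    print(strip_stopwords("An Algorithm for Suffix Stripping"))
    print(strip_stopwords("The Art of Computer Programming"))
```

Note that a word such as 'for' survives, since it is not in the basic list; a longer list would be needed to remove it.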

I rather doubt the 'en' is there because it is a French/Spanish word. It is
not all that common - much less common than 'de' for example. Could it be
connected with the language code for English, do you think?

As you say, 'further' as a stopword is very dubious. Time for another rethink.


