Defining R1 and R2

Most of the stemmers make use of at least one of the region definitions R1 and R2. They are defined as follows:

R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.

R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.

The definition of vowel varies from language to language. In French, for example, é is a vowel, and in Italian an i between two other vowels is not treated as a vowel. Each stemmer states explicitly which letters it counts as vowels.
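The two definitions can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular stemmer: the names region_start, r1_r2 and ENGLISH_VOWELS are hypothetical, and the vowel set is an assumption that treats a, e, i, o, u and y as vowels throughout, which is enough for the English examples on this page. R2 is found by applying the same scan again, starting at the beginning of R1.

```python
# Assumption: a fixed vowel class for the English examples (y always a vowel).
ENGLISH_VOWELS = frozenset("aeiouy")


def region_start(word, start, vowels=ENGLISH_VOWELS):
    """Return the index just past the first non-vowel that follows a vowel,
    scanning from `start`. Return len(word), the null region at the end of
    the word, if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    return len(word)


def r1_r2(word, vowels=ENGLISH_VOWELS):
    """Return the regions (R1, R2) of `word` as substrings."""
    p1 = region_start(word, 0, vowels)   # R1 starts here
    p2 = region_start(word, p1, vowels)  # same rule, applied inside R1
    return word[p1:], word[p2:]


print(r1_r2("beautiful"))  # ('iful', 'ul')
print(r1_r2("beauty"))     # ('y', '')
print(r1_r2("beau"))       # ('', '')
```

An empty string stands for the null region. A stemmer for another language would pass its own vowel set, and context-dependent vowels such as the Italian i mentioned above would need handling beyond this simple membership test.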

Below, R1 and R2 are shown for a number of English words:

    b   e   a   u   t   i   f   u   l
                      |<------------->|    R1
                              |<----->|    R2

Letter t is the first non-vowel following a vowel in beautiful, so R1 is iful. In iful, the letter f is the first non-vowel following a vowel, so R2 is ul.

    b   e   a   u   t   y
                      |<->|    R1
                        ->|<-  R2

In beauty, the last letter y is classed as a vowel. Again, letter t is the first non-vowel following a vowel, so R1 is just the last letter, y. R1 contains no non-vowel, so R2 is the null region at the end of the word.

    b   e   a   u
                ->|<-  R1
                ->|<-  R2

In beau, no non-vowel follows a vowel, so R1 and R2 are both the null region at the end of the word.

Other examples:

    a   n   i   m   a   d   v   e   r   s   i   o   n
          |<----------------------------------------->|    R1
                  |<--------------------------------->|    R2

    s   p   r   i   n   k   l   e   d
                      |<------------->|    R1
                                    ->|<-  R2

    e   u   c   h   a   r   i   s   t
              |<--------------------->|    R1
                          |<--------->|    R2