Snowball Manual

Snowball definition

Snowball is a small string-handling language, and its name was chosen as a tribute to SNOBOL (Farber 1964, Griswold 1968 — see the references at the end of the introduction), with which it shares the concept of string patterns delivering signals that are used to control the flow of the program.

1 Data types

The basic data types handled by Snowball are strings of characters, signed integers, and boolean truth values, or more simply strings, integers and booleans. Snowball's characters are either 8 bits or 16 bits wide, depending on the mode of use. In particular, both 8-bit ASCII and 16-bit Unicode are supported.

2 Names

A name in Snowball is a letter followed by zero or more letters, digits and underlines. A name can be of type string, integer, boolean, routine, external or grouping. All names must be declared. A declaration has the form

    Ts ( ... )

where symbol T is one of string, integer etc, and the region in brackets contains a list of names separated by whitespace. For example,

    integers ( p1 p2 )
    booleans ( Y_found )

    routines (
       shortv
       R1 R2
       Step_1a Step_1b Step_1c Step_2 Step_3 Step_4 Step_5a Step_5b
    )

    externals ( stem )

    groupings ( v v_WXY v_LSZ )

p1 and p2 are integers, Y_found is boolean, and so on. Snowball is quite strict about the declarations, so all the names go in the same name space, no name may be declared twice, all used names must be declared, no two routine definitions can have the same name, etc. Names declared and subsequently not used are merely reported in a warning message. A name may not be one of the reserved words of Snowball.

3 Literals

A literal integer is a digit sequence, and is always interpreted as decimal. A literal string is written between single quotes, for example,

    'aeiouy'

In a stringdef (see below), a string may be preceded by the word hex, or the word decimal, in which case the contents are interpreted as characters written out in hexadecimal, or decimal, notation. The characters should be separated by spaces. For example,

    hex 'DA'        /* is character hex DA */
    hex 'D A'       /* is the two characters, hex D and A (carriage
                       return, and line feed) */
    decimal '10'    /* character 10 (line feed) */
    decimal '13 10' /* characters 13 and 10 (carriage return, and
                       line feed) */

The following forms are equivalent,

    hex 'd a'      /* lower case also allowed */
    hex '0D 000A'  /* leading zeroes ignored */
    hex ' D  A  '  /* extra spacing is harmless */

stringdefs define special string macros, to handle unusual character combinations.

Macro m is defined in the form stringdef m 'S', where 'S' is a string, and m a sequence of one or more printing characters terminating with whitespace.

Two special insert characters are defined by the directive stringescapes AB, where A and B are printing characters, and A is not single quote. (B may equal A, but then A itself can never be escaped.) For example,

    stringescapes []

A subsequent occurrence of the same directive redefines the insert characters.

Thereafter, [m] inside a string causes S to be substituted in place of m.

Immediately after the stringescapes directive, ['] will substitute ' and [[] will substitute [, although macros ' and [ may subsequently be redefined. A further feature is that [W] inside a string, where W is a sequence of whitespace characters including one or more newlines, is ignored. This enables long strings to be written over a number of lines.

For example,

    stringescapes []

    /* special Spanish characters (in MS-DOS Latin I) */

    stringdef a'   hex 'A0'  // a-acute
    stringdef e'   hex '82'  // e-acute
    stringdef i'   hex 'A1'  // i-acute
    stringdef o'   hex 'A2'  // o-acute
    stringdef u'   hex 'A3'  // u-acute
    stringdef u"   hex '81'  // u-diaeresis
    stringdef n~   hex 'A4'  // n-tilde

    /* and in the next string we define all the characters in Spanish
       used to represent vowels
    */

    define v 'aeiou[a'][e'][i'][o'][u'][u"]'
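The substitution rules can be modelled in a few lines. This is a Python sketch under stated assumptions, not the Snowball compiler's code: `expand` is a hypothetical name, the macro table is passed in as a dictionary, and an undefined macro simply raises `KeyError`.

```python
def expand(text: str, macros: dict, open_ch: str = "[", close_ch: str = "]") -> str:
    """Expand [m] inserts in the body of a string literal."""
    # Defaults that hold immediately after `stringescapes []`.
    table = {"'": "'", open_ch: open_ch}
    table.update(macros)
    out, i = [], 0
    while i < len(text):
        if text[i] != open_ch:
            out.append(text[i])
            i += 1
            continue
        # Search from i + 2 so that the macro name may itself be "[".
        j = text.index(close_ch, i + 2)
        name = text[i + 1:j]
        # [W], where W is whitespace containing a newline, is ignored.
        if not (name.isspace() and "\n" in name):
            out.append(table[name])
        i = j + 1
    return "".join(out)


# The Spanish macros from the example above, as character codes.
spanish = {"a'": "\xa0", "e'": "\x82", "i'": "\xa1",
           "o'": "\xa2", "u'": "\xa3", 'u"': "\x81", "n~": "\xa4"}
v = expand("aeiou[a'][e'][i'][o'][u'][u\"]", spanish)
print(len(v))  # 11
```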

4 Routines

A routine definition has the form

    define R as C

where R is the routine name and C is a command, or bracketed group of commands. So a routine is defined as a sequence of zero or more commands. Snowball routines do not (at present) take parameters. For example,

    define Step_5b as (      // this defines Step_5b
        ['l']                // three commands here: [, 'l' and ]
        R2 'l'               // two commands, R2 and 'l'
        delete               // delete is one command
    )

    define R1 as $p1 <= cursor
        /* R1 is defined as the single command "$p1 <= cursor" */

A routine is called simply by using its name, R, as a command.

5 Commands and signals

The flow of control in Snowball is arranged by the implicit use of signals, rather than the explicit use of constructs like the if, then, break of C. The scheme is designed for handling strings, but is perhaps easier to introduce using integers. Suppose x, y, z ... are integers. The command

    $x = 1

sets x to 1. The command

    $x > 0

tests if x is greater than zero. Both commands give a signal t or f, (true or false), but while the second command gives t if x is greater than zero and f otherwise, the first command always gives t. In Snowball, every command gives a t or f signal. A sequence of commands can be turned into a single command by putting them in a list surrounded by round brackets:

    ( C₁ C₂ C₃ ... C_i C_i+1 ... )

When this is obeyed, C_i+1 will be obeyed if each of the preceding C₁ ... C_i give t, but as soon as a C_i gives f, the subsequent C_i+1 C_i+2 ... are ignored, and the whole sequence gives signal f. If all the C_i give t, however, the bracketed command sequence also gives t. So,

    $x > 0  $y = 1

sets y to 1 if x is greater than zero. If x is less than or equal to zero the two commands give f.

If C₁ and C₂ are commands, we can build up the larger commands,

C₁ or C₂: — Do C₁. If it gives t ignore C₂, otherwise do C₂. The resulting signal is t if and only if C₁ or C₂ gave t.
C₁ and C₂: — Do C₁. If it gives f ignore C₂, otherwise do C₂. The resulting signal is t if and only if C₁ and C₂ gave t.
not C: — Do C. The resulting signal is t if C gave f, otherwise f.
try C: — Do C. The resulting signal is t whatever the signal of C.
fail C: — Do C. The resulting signal is f whatever the signal of C.

So for example,

($x > 0 $y = 1) or ($y = 0): — sets y to 1 if x is greater than zero, otherwise to zero.
try( ($x > 0) and ($z > 0) $y = 1): — sets y to 1 if both x and z are greater than 0, and gives t.

This last example is the same as

    try($x > 0  $z > 0  $y = 1)

so that and seems unnecessary here. But we will see that and has a particular significance in string commands.
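For integers, the signal scheme behaves like short-circuit boolean evaluation. The following Python sketch models it; every name in it (`gt`, `assign`, `seq`, `or_` and so on) is hypothetical and chosen for illustration, and a command is represented as a function from the integer variables to a signal.

```python
# A command takes the integer variables (a dict) and returns its signal:
# True for t, False for f.

def gt(name, value):            # $name > value
    return lambda env: env[name] > value

def assign(name, value):        # $name = value, always gives t
    def run(env):
        env[name] = value
        return True
    return run

def seq(*commands):             # ( C1 C2 ... ), stops at the first f
    return lambda env: all(c(env) for c in commands)

def or_(c1, c2):                # C1 or C2
    return lambda env: c1(env) or c2(env)

def and_(c1, c2):               # C1 and C2
    return lambda env: c1(env) and c2(env)

def not_(c):                    # not C
    return lambda env: not c(env)

def try_(c):                    # try C: do C, then give t
    def run(env):
        c(env)
        return True
    return run

def fail(c):                    # fail C: do C, then give f
    def run(env):
        c(env)
        return False
    return run

# ($x > 0 $y = 1) or ($y = 0)
set_y = or_(seq(gt("x", 0), assign("y", 1)), assign("y", 0))

env = {"x": 5, "y": -1, "z": 0}
print(set_y(env), env["y"])     # True 1
env["x"] = -3
print(set_y(env), env["y"])     # True 0
```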

When a ‘monadic’ construct like not, try or fail is not followed by a round bracket, the construct applies to the shortest following valid command. So for example

    try not $x < 1 $z > 0

would mean

    try ( not ( $x < 1 ) ) $z > 0

because $x < 1 is the shortest valid command following not, and then not $x < 1 is the shortest valid command following try.

The ‘dyadic’ constructs like and and or must sit in a bracketed list of commands anyway, for example,

    ( C₁ C₂ and C₃ C₄ or C₅ )

And then in this case C₂ and C₃ are connected by the and; C₄ and C₅ are connected by the or. So

    $x > 0  not $y > 0 or not $z > 0  $t > 0

means

    $x > 0  ((not ($y > 0)) or (not ($z > 0)))  $t > 0

and and or are equally binding, and bind from left to right, so C₁ or C₂ and C₃ means (C₁ or C₂) and C₃ etc.

6 AEs and integer commands

An AE (arithmetic expression) consists of integer names and literal numbers connected by dyadic +, -, * and /, and monadic -, with the same binding powers and semantics as C. An integer command has the form

    $X op AE

where X is an integer name and op is one of the six tests ==, !=, >=, >, <=, <, or five assignments =, +=, -=, *=, /=. Again, the meanings are the same as in C.

As well as integer names and literal numbers, the following may be used in AEs:

`minint`		— the minimum negative number
`maxint`		— the maximum positive number
`sizeof s`		— the number of characters in `s`, where `s` is the name of a string
`cursor`		— the current value of the string cursor
`limit`		— the current value of the string limit
`size`		— the size of the string, in characters

The cursor and limit concepts are explained below.

Examples of integer commands are,

    $p1 <= cursor  // signal is f if the cursor is before position p1
    $p1 = limit    // set p1 to the string limit

7 String commands

If s is a string name, a string command has the form

    $s C

where C is a command that operates on the string. Strings can be processed left-to-right or right-to-left, but we will describe only the left-to-right case for now. The string has a cursor, which we will denote by c, and a limit point, or limit, which we will denote by l. c advances towards l in the course of a string command, but the various constructs and, or, not etc have side-effects which keep moving it backwards. Initially c is at the start and l the end of the string. For example,

        'a|n|i|m|a|d|v|e|r|s|i|o|n'
        |                         |
        c                         l

c, and l, mark the boundaries between characters, and not characters themselves. The characters between c and l will be denoted by c:l.

If C gives t, the cursor c will have a new, well-defined value. But if C gives f, c is undefined. Its later value will in fact be determined by the outer context of commands in which C came to be obeyed, not by C itself.

Here is a list of the commands that can be used to operate on strings.

a) Setting a value

= S

where S is the name of a string or a literal string. c:l is set equal to S, and l is adjusted to point to the end of the copied string. The signal is t. For example,

        $x = 'animadversion'    /* literal string */
        $y = x                  /* string name */

b) Basic tests

S

here and below, S is the name of a string or a literal string. If c:l begins with the substring S, c is repositioned to the end of this substring, and the signal is t. Otherwise the signal is f. For example,

        $x 'anim'   /* gives t, assuming the string is 'animadversion' */
        $x ('anim' 'ad' 'vers')
                    /* ditto */

        $t = 'anim'
        $x t        /* ditto */

true, false

true is a dummy command that generates signal t. false generates signal f. They are sometimes useful for emphasis,

        define start_off as true       // nothing to do
        define exception_list as false // put in among(...) list later

true is equivalent to ()

C₁ or C₂

This is like the case for integers described above, but the extra touch is that if C₁ gives f, c is set back to its old position after C₁ has given f and before C₂ is tried, so that the test takes place on the same point in the string. So we have

        $x ('anim'  /* signal t */
            'ation' /* signal f */
           ) or
           ( 'an'   /* signal t - from the beginning */
           )

C₁ and C₂

And similarly c is set back to its old position after C₁ has given t and before C₂ is tried. So,

        $x 'anim' and 'an'   /* signal t */
        $x ('anim'  'an')    /* signal f, since 'an' and 'ad' mis-match */

not C

try C

These are like the integer tests, with the added feature that c is set back to its old position after an f signal is turned into t. So,

        $x (not 'animation' not 'immersion')
            /* both tests are done at the start of the string */

        $x (try 'animus' try 'an'
            'imad')
            /* - gives t */

try C is equivalent to C or true
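The cursor bookkeeping of or, and, not and try can be made concrete with a small model. The Python sketch below is an illustration only: the class `Str` and the helpers `lit`, `seq`, `or_`, `and_`, `not_`, `try_` and `obey` are hypothetical names, and only left-to-right processing is modelled.

```python
class Str:
    """A string with cursor c and limit l."""
    def __init__(self, text):
        self.text, self.c, self.l = text, 0, len(text)

def lit(s):                     # 'S': match s at c and advance past it
    def run(st):
        if st.text.startswith(s, st.c, st.l):
            st.c += len(s)
            return True
        return False
    return run

def seq(*commands):             # ( C1 C2 ... )
    return lambda st: all(cmd(st) for cmd in commands)

def or_(c1, c2):
    def run(st):
        saved = st.c
        if c1(st):
            return True
        st.c = saved            # C2 is tried from the same point
        return c2(st)
    return run

def and_(c1, c2):
    def run(st):
        saved = st.c
        if not c1(st):
            return False
        st.c = saved            # C2 is tested at the same point as C1
        return c2(st)
    return run

def not_(cmd):
    def run(st):
        saved = st.c
        if cmd(st):
            return False
        st.c = saved            # f turned into t: c is set back
        return True
    return run

def try_(cmd):
    def run(st):
        saved = st.c
        if not cmd(st):
            st.c = saved
        return True
    return run

def obey(text, cmd):
    """Obey cmd on text and return (signal, final cursor)."""
    st = Str(text)
    signal = cmd(st)
    return signal, st.c

x = "animadversion"
print(obey(x, or_(seq(lit("anim"), lit("ation")), lit("an"))))         # (True, 2)
print(obey(x, and_(lit("anim"), lit("an"))))                           # (True, 2)
print(obey(x, seq(try_(lit("animus")), try_(lit("an")), lit("imad")))) # (True, 6)
```

When the signal is f the cursor value returned by `obey` has no meaning, in line with the remark above that c is undefined after f.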

test C

This does command C but without advancing c. Its signal is the same as the signal of C, but following signal t, c is set back to its old value.

`test C`		is equivalent to		`not not C`
`test C₁ C₂`		is equivalent to		`C₁ and C₂`

fail C

This does C and gives signal f. It is equivalent to C false. Like false it is useful, but only rarely.

do C

This does C, puts c back to its old value and gives signal t. It is very useful as a way of suppressing the side effect of f signals and cursor movement.

`do C`		is equivalent to		`try test C`
		or		`test try C`

goto C

c is moved right until obeying C gives t. But if c cannot be moved right because it is at l the signal is f. c is set back to the position it had before the last obeying of C, so the effect is to leave c before the pattern which matched against C.

        $x goto 'ad'         /* positions c after 'anim' */
        $x goto 'ax'         /* signal f */

gopast C

Like goto, but c is not set back, so the effect is to leave c after the pattern which matched against C.

        $x gopast 'ad'       /* positions c after 'animad' */

repeat C

C is repeated until it gives f. When this happens c is set back to the position it had before the last repetition of C, and repeat C gives signal t. For example,

        $x repeat gopast 'a' /* position c after the last 'a' */
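A minimal Python model of goto, gopast and repeat, following the descriptions above. The names `Str`, `lit`, `goto`, `gopast`, `repeat` and `obey` are hypothetical, and the model makes no attempt to guard against a repeated command that gives t without moving c.

```python
class Str:
    """A string with cursor c and limit l."""
    def __init__(self, text):
        self.text, self.c, self.l = text, 0, len(text)

def lit(s):                     # 'S'
    def run(st):
        if st.text.startswith(s, st.c, st.l):
            st.c += len(s)
            return True
        return False
    return run

def _scan(cmd, st, set_back):
    while True:
        start = st.c
        if cmd(st):
            if set_back:
                st.c = start    # goto leaves c before the match
            return True
        st.c = start
        if st.c >= st.l:        # c cannot be moved right: signal f
            return False
        st.c += 1

def goto(cmd):
    return lambda st: _scan(cmd, st, True)

def gopast(cmd):
    return lambda st: _scan(cmd, st, False)

def repeat(cmd):
    def run(st):
        while True:
            saved = st.c
            if not cmd(st):
                st.c = saved    # undo the failed repetition
                return True
    return run

def obey(text, cmd):
    """Obey cmd on text and return (signal, final cursor)."""
    st = Str(text)
    signal = cmd(st)
    return signal, st.c

x = "animadversion"
print(obey(x, goto(lit("ad"))))            # (True, 4)
print(obey(x, gopast(lit("ad"))))          # (True, 6)
print(obey(x, repeat(gopast(lit("a")))))   # (True, 5)
```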

loop AE C

This is like C C ... C written out AE times, where AE is an arithmetic expression. For example,

        $x loop 2 gopast ('a' or 'e' or 'i' or 'o' or 'u')
            /* position c after the second vowel */

The equivalent expression in C has the shape,

        {    int i;
             int limit = AE;
             for (i = 0; i < limit; i++) C;
        }

atleast AE C

This is equivalent to loop AE C repeat C.

hop AE

moves c AE character positions towards l, but if AE is negative, or if there are fewer than AE characters between c and l the signal is f. For example,

        test hop 3

tests that c:l contains more than 2 characters.

next

is equivalent to hop 1.

c) Moving text about

We have seen in (a) that $x = y, when x and y are strings, sets c:l of x to the value of y. Conversely

        $x => y

sets the value of y to the c:l region of x.

A more delicate mechanism for pushing text around is to define a substring, or slice of the string being tested. Then

[: sets the left-end of the slice to c,
]: sets the right-end of the slice to c,
-> s: moves the slice to variable s,
<- S: replaces the slice with variable (or literal) S.

For example

        /* assume x holds 'animadversion' */
        $x ( [         // '[animadversion' - [ set as indicated
             loop 2 gopast 'a'
                       // '[anima|dversion' - c is marked by '|'
             ]         // '[anima]dversion' - ] set as indicated
             -> y      // y is 'anima'
           )

For any string, the slice ends should be assumed to be unset until they are set with the two commands [, ]. Thereafter the slice ends will retain the same values until altered.

delete: is equivalent to <- ''

This next example deletes all vowels in x,

        define vowel as ('a' or 'e' or 'i' or 'o' or 'u')
        ....
        $ x repeat ( gopast([vowel]) delete )

As this example shows, the slice markers [ and ] often appear as pairs in a bracketed style, which makes for easy reading of the Snowball scripts. But it must be remembered that, unusually in a computer programming language, they are not true brackets.
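The slice commands can be modelled as operations on two extra positions. In the Python sketch below all names (`Str`, `slice_to`, `slice_from`, `delete_vowels`) are hypothetical. The way c is adjusted after a replacement is an assumption of the sketch, since the text above does not specify it: a cursor at or beyond the slice end shifts with the text, and a cursor inside the slice moves to the slice start.

```python
class Str:
    def __init__(self, text):
        self.text, self.c, self.l = text, 0, len(text)
        self.bra = self.ket = 0     # slice ends, unset until [ and ] are obeyed

def slice_to(st):                   # -> s: the text of the slice
    return st.text[st.bra:st.ket]

def slice_from(st, s):              # <- S: replace the slice with s
    delta = len(s) - (st.ket - st.bra)
    st.text = st.text[:st.bra] + s + st.text[st.ket:]
    st.l += delta
    # Assumed cursor adjustment (see the lead-in).
    if st.c >= st.ket:
        st.c += delta
    elif st.c > st.bra:
        st.c = st.bra
    st.ket = st.bra + len(s)

def delete_vowels(text, vowels="aeiou"):
    """Model of: $x repeat ( gopast([vowel]) delete )"""
    st = Str(text)
    while True:
        # gopast ([vowel]): move right until a vowel lies at c
        while st.c < st.l and st.text[st.c] not in vowels:
            st.c += 1
        if st.c >= st.l:            # gopast gave f, so repeat stops
            return st.text
        st.bra = st.c               # [
        st.c += 1                   # vowel
        st.ket = st.c               # ]
        slice_from(st, "")          # delete

print(delete_vowels("animadversion"))   # nmdvrsn
```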

More simply, text can be inserted at c.

insert S: insert variable or literal S before c, moving c to the right of the insert. <+ is a synonym for insert.
attach S: the same, but leave c at the left of the insert.

d) Marks

The cursor, c, (and the limit, l) can be thought of as having a numeric value, from zero upwards:

         | a | n | i | m | a | d | v | e | r | s | i | o | n |
         0   1   2   3   4   5   6   7   8   9  10  11  12  13

It is these numeric values of c and l which are accessible through cursor and limit in arithmetic expressions.

setmark X: sets X to the current value of c, where X is an integer variable.
tomark AE: moves c forward to the position given by AE,
atmark AE: tests if c is at position AE (t or f signal).

In the case of tomark AE, a similar fail condition occurs as with hop AE. If c is already beyond AE, or if position l is before position AE, the signal is f.

In the stemming algorithms, certain regions of the word are defined by setting marks, and later the failure condition of tomark is used to see if c is inside a particular region.

Two other commands put c at l, and test if c is at l,

tolimit: moves c forward to l (signal t always),
atlimit: tests if c is at l (t or f signal).

e) Changing l

In this account of string commands we see c moving right towards l, while l stays fixed at the end. In fact l can be reset to a new position between c and its old position, to act as a shorter barrier for the movement of c.

setlimit C₁ for C₂

C₁ is obeyed, and if it gives f the signal from setlimit is f with no further action.

Otherwise, the final value of c becomes the new position of l. c is then set back to its old value before C₁ was obeyed, and C₂ is obeyed. Finally l is set back to its old position, and the signal of C₂ becomes the signal of setlimit.

So the signal is f if either C₁ or C₂ gives f, otherwise t. For example,

    $x ( setlimit goto 's'  // 'animadver}sion' new l as marked '}'
         for                // below, '|' marks c after each goto
         ( goto 'a' and     // '|animadver}sion'
           goto 'e' and     // 'animadv|er}sion'
           goto 'i'         // 'an|imadver}sion'
         )
       )

This checks that x has characters ‘a’, ‘e’ and ‘i’ before the first ‘s’.
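The same check can be reproduced on a small model of setlimit. This is a Python sketch with hypothetical helper names (`Str`, `lit`, `goto`, `and_`, `setlimit`, `obey`); it assumes that the commands involved do not insert or delete text, so restoring l is a plain assignment.

```python
class Str:
    """A string with cursor c and limit l."""
    def __init__(self, text):
        self.text, self.c, self.l = text, 0, len(text)

def lit(s):                     # 'S', matched only within c:l
    def run(st):
        if st.text.startswith(s, st.c, st.l):
            st.c += len(s)
            return True
        return False
    return run

def goto(cmd):
    def run(st):
        while True:
            start = st.c
            ok = cmd(st)
            st.c = start        # goto leaves c before the match
            if ok:
                return True
            if st.c >= st.l:
                return False
            st.c += 1
    return run

def and_(c1, c2):
    def run(st):
        saved = st.c
        if not c1(st):
            return False
        st.c = saved
        return c2(st)
    return run

def setlimit(c1, c2):           # setlimit C1 for C2
    def run(st):
        old_c, old_l = st.c, st.l
        if not c1(st):
            return False
        st.l = st.c             # final c of C1 becomes the new l
        st.c = old_c
        ok = c2(st)
        st.l = old_l
        return ok
    return run

def obey(text, cmd):
    st = Str(text)
    return cmd(st)

has_aei_before_s = setlimit(
    goto(lit("s")),
    and_(and_(goto(lit("a")), goto(lit("e"))), goto(lit("i"))))

print(obey("animadversion", has_aei_before_s))   # True
print(obey("animsei", has_aei_before_s))         # False
```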

f) Backward processing

String commands have been described with c to the left of l and moving right. But the process can be reversed.

backwards C: c and l are swapped over, and c moves left towards l. C is obeyed, the signal given by C becomes the signal of backwards C, and c and l are swapped back to their old values (except that l may have been adjusted because of deletions and insertions). C cannot contain another backwards command.
reverse C: A similar idea, but here c simply moves left instead of moving right, with the beginning of the string as the limit, l. C can contain other reverse commands, but it cannot contain commands to do deletions or insertions — it must be used for testing only. (Without this restriction Snowball's semantics would become very untidy.)

Forward and backward processing are entirely symmetric, except that forward processing is the default direction, and literal strings are always written out forwards, even when they are being tested backwards. So the following are equivalent,

    $x (
        'ani' 'mad' 'version' atlimit
    )

    $x backwards (
        'version' 'mad' 'ani' atlimit
    )
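The symmetry can be illustrated with a model in which a literal is compared against the text ending at c instead of the text starting at c. The Python sketch below is an illustration with hypothetical function names; it handles only sequences of literals followed by atlimit.

```python
def match_forward(text, pieces):
    """Model of: $x ( 'p1' 'p2' ... atlimit )"""
    c, l = 0, len(text)
    for p in pieces:
        if not text.startswith(p, c, l):
            return False
        c += len(p)
    return c == l                       # atlimit

def match_backward(text, pieces):
    """Model of: $x backwards ( 'p1' 'p2' ... atlimit )"""
    c, l = len(text), 0                 # c and l are swapped over
    for p in pieces:
        # The literal is still written forwards; it must end at c.
        if not text.endswith(p, l, c):
            return False
        c -= len(p)
    return c == l                       # atlimit

x = "animadversion"
print(match_forward(x, ["ani", "mad", "version"]))    # True
print(match_backward(x, ["version", "mad", "ani"]))   # True
```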

If a routine is defined for backwards mode processing, it must be included inside a backwardmode(...) declaration.

g) `substring` and `among`

The use of substring and among is central to the implementation of the stemming algorithms. It is like a case switch on strings. In its simpler form,

        substring among('S₁' 'S₂' 'S₃' ...)

searches for the longest matching substring 'S₁' or 'S₂' or 'S₃' ... from position c. (The 'S_i' must all be different.) So this has the same semantics as

        ('S₁' or 'S₂' or 'S₃' ...)

— so long as the 'S_i' are written out in decreasing order of length.
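The relation between the longest-match search and a chain of or alternatives can be seen in a short model. The Python sketch below uses hypothetical names and a plain linear scan for clarity; it makes no claim about how a Snowball compiler implements the search.

```python
def among(text, c, l, strings):
    """Return the longest member of strings found at position c, or None."""
    best = None
    for s in strings:
        if text.startswith(s, c, l) and (best is None or len(s) > len(best)):
            best = s
    return best

def first_alternative(text, c, l, strings):
    """Model of ('S1' or 'S2' or ...): the first alternative that matches wins."""
    for s in strings:
        if text.startswith(s, c, l):
            return s
    return None

text = "sses"
strings = ["s", "ss", "sses"]
by_length = sorted(strings, key=len, reverse=True)

print(among(text, 0, len(text), strings))                  # sses
print(first_alternative(text, 0, len(text), by_length))    # sses
print(first_alternative(text, 0, len(text), strings))      # s
```

The last line shows why the ordering matters: with the shortest string first, the or chain stops at 's' and never reaches the longer alternatives.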

substring may be omitted, in which case it is attached to its following among, so

    among(...)

without a preceding substring is equivalent to

    (substring among(...))

substring may also be detached from its among, although it must precede it textually in the same routine in which the among appears. The more general form of substring ... among is,

    substring
    ...
    among( 'S₁₁' 'S₁₂' ... (C₁)
           'S₂₁' 'S₂₂' ... (C₂)
           ...

           'S_n1' 'S_n2' ... (C_n)
         )

Obeying substring searches for a longest match among the 'S_ij'. The signal from substring is t if a match is found, otherwise f. When the among comes to be obeyed, the C_i corresponding to the matched 'S_ij' is obeyed, and its signal becomes the signal of the among command.

substring/among pairs must match up textually inside each routine definition. But there is no problem with an among containing other substring/among pairs, and substring is optional before among anyway. The essential constraint is that two substrings must be separated by an among, and each substring must be followed by an among.

The effect of obeying among when the preceding substring is not obeyed is undefined. This would happen for example here,

    try($x != 617 substring)
    among(...) // 'substring' is bypassed in the exceptional case where x == 617

The significance of separating the substring from the among is to allow them to work in different contexts. For example,

    setlimit tomark L for substring

    among( 'S₁₁' 'S₁₂' ... (C₁)
           ...

           'S_n1' 'S_n2' ... (C_n)
         )

Here the test for the longest 'S_ij' is constrained to the region between c and the mark point given by integer L. But the commands C_i operate outside this limit. Another example is

    reverse substring

    among( 'S₁₁' 'S₁₂' ... (C₁)
           ...

           'S_n1' 'S_n2' ... (C_n)
         )

The substring test is in the opposite direction in the string to the direction of the commands C_i.

The last (C_n) may be omitted, in which case (true) is assumed.

Another possible abbreviation is that when substring is omitted, a construct such as

    among( 'S₁₁' 'S₁₂' ... (C C₁)
           'S₂₁' 'S₂₂' ... (C C₂)
           ...
           'S_n1' 'S_n2' ... (C C_n)
         )

can be written

    among( (C)
           'S₁₁' 'S₁₂' ... (C₁)
           'S₂₁' 'S₂₂' ... (C₂)
           ...
           'S_n1' 'S_n2' ... (C_n)
         )

and this is just equivalent to

    substring C
    among( 'S₁₁' 'S₁₂' ... (C₁)
           'S₂₁' 'S₂₂' ... (C₂)
           ...
           'S_n1' 'S_n2' ... (C_n)
         )

In its most general form, each string 'S_ij' may be optionally followed by a routine name,

    among( (C)
           'S₁₁' R₁₁ 'S₁₂' R₁₂ ... (C₁)
           'S₂₁' R₂₁ 'S₂₂' R₂₂ ... (C₂)
           ...
           'S_n1' R_n1 'S_n2' R_n2 ... (C_n)
         )

So here each R_ij is either a routine name or is null. If null, it is equivalent to a routine which simply returns signal t,

    define null as true

— so we can imagine each 'S_ij' having its associated routine R_ij. Then obeying the among causes a search for the longest 'S_ij' whose corresponding routine R_ij gives t. The routines R_ij should be written without any side-effects, other than the inevitable cursor movement. (c is in any case set back to its old value following a call of R_ij.)

8 Booleans

set B and unset B set B to true and false respectively, where B is a boolean name. B as a command gives a signal t if it is set true, f otherwise. For example,

    booleans ( Y_found )   // declare the boolean

    ....

    unset Y_found          // unset it
    do ( ['y'] <-'Y' set Y_found )
       /* if c:l begins 'y' replace it by 'Y' and set Y_found */

    do repeat(goto (v ['y']) <-'Y' set Y_found)
       /* repeatedly move down the string looking for v 'y' and
          replacing 'y' with 'Y'. Whenever the replacement takes
          place set Y_found. v is a test for a vowel, defined as
          a grouping (see below). */


    /* Y_found means there are some letters Y in the string.
       Later we can use this to trigger a conversion back to
       lower case y. */

    ....

    do (Y_found repeat(goto (['Y']) <- 'y'))

9 Groupings

A grouping brings characters together and enables them to be looked for with a single test.

If G is declared as a grouping, it can be defined by

    define G G₁ op G₂ op G₃ ...

where op is + or -, and G₁, G₂, G₃ are literal strings, or groupings that have already been defined. (There can be zero or more of these additional op components). For example,

    define capital_letter  'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    define small_letter    'abcdefghijklmnopqrstuvwxyz'
    define letter          capital_letter + small_letter
    define vowel           'aeiou' + 'AEIOU'
    define consonant       letter - vowel
    define digit           '0123456789'
    define alphanumeric    letter + digit

Once G is defined, it can be used as a command, and is equivalent to a test

    'ch1' or 'ch2' or ...

where ch1, ch2 ... list all the characters in the grouping.

non G is the converse test, and matches any character except the characters of G. Note that non G is not the same as not G, in fact

    non G    is equivalent to     (not G next)

non may be optionally followed by hyphen, so one may write

    non-vowel
    non-digit

etc.
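Groupings behave like sets of characters, with + as union and - as difference. The Python sketch below models the definitions above and the difference between a grouping test and non; `grouping` and `non` are hypothetical function names, and each returns the new cursor position, or None for signal f.

```python
capital_letter = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
small_letter = set("abcdefghijklmnopqrstuvwxyz")
letter = capital_letter | small_letter      # capital_letter + small_letter
vowel = set("aeiou") | set("AEIOU")         # 'aeiou' + 'AEIOU'
consonant = letter - vowel                  # letter - vowel

def grouping(text, c, l, g):
    """G as a command: match one character of g at c."""
    return c + 1 if c < l and text[c] in g else None

def non(text, c, l, g):
    """non G, that is (not G next): match one character that is not in g."""
    return c + 1 if c < l and text[c] not in g else None

print(grouping("by", 0, 2, consonant))   # 1
print(non("by", 0, 2, vowel))            # 1
print(non("a", 1, 1, vowel))             # None
```

The last call shows where non G differs from not G: at the limit, not G would give t, but non G gives f because there is no character left for next to move over.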

10 A Snowball program

A complete program consists of a sequence of declarations followed by a sequence of definitions of groupings and routines. Routines which are implicitly defined as operating on c:l from right to left must be included in a backwardmode(...) declaration.

A Snowball program is called up via a simple API through its defined externals. For example,

    externals ( stem1 stem2 )
    ....
    define stem1 as ( ... /* stem1 commands */ )
    define stem2 as ( ... /* stem2 commands */ )

The API also allows a current string to be defined, and this becomes the c:l string for the external routine to work on. Its final value is the result handed back through the API.

The strings, integers and booleans are accessible from any point in the program, and exist throughout the running of the Snowball program. They are therefore like static declarations in C.

11 Comments, and other whitespace fillers

At a deeper level, a program is a sequence of tokens, interspersed with whitespace. Names, reserved words, literal numbers and strings are all tokens. Various symbols, made up of non-alphanumerics, are also tokens.

A name, reserved word or number is terminated by the first character that cannot form part of it. A symbol is recognised as the longest sequence of characters that forms a valid symbol. So +=- is two symbols, += and -, because += is a valid symbol in the language while +=- is not. Whitespace separates tokens but is otherwise ignored. This of course is like C.
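The longest-sequence rule for symbols can be sketched as follows. This is a Python illustration: `split_symbols` is a hypothetical name, and the symbol set is an assumption collected from the symbols that appear in this manual, not an authoritative list. It splits a run of symbol characters only, with no whitespace, names or numbers.

```python
# Assumed symbol set, gathered from the symbols used in this manual.
SYMBOLS = {"+", "-", "*", "/", "=", "+=", "-=", "*=", "/=", "==", "!=",
           "<", "<=", ">", ">=", "=>", "->", "<-", "<+",
           "$", "(", ")", "[", "]", "?"}
LONGEST = max(len(s) for s in SYMBOLS)

def split_symbols(text):
    """Split a run of symbol characters, always taking the longest valid symbol."""
    out, i = [], 0
    while i < len(text):
        for n in range(LONGEST, 0, -1):
            candidate = text[i:i + n]
            if candidate in SYMBOLS:
                out.append(candidate)
                i += len(candidate)
                break
        else:
            raise ValueError("no symbol starts at position %d" % i)
    return out

print(split_symbols("+=-"))   # ['+=', '-']
```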

Anywhere that whitespace can occur, there may also occur:

(a) Comments, in the usual multi-line /* .... */ or single line // ... format.

(b) Get directives. These are like #include commands in C, and have the form get 'S', where 'S' is a literal string. For example,

    get '/home/martin/snowball/main-hdr' // include the file contents

(c) stringescapes XY where X and Y are any two printing characters.

(d) stringdef m 'S' where m is a sequence of characters not including whitespace and terminated with whitespace, and 'S' is a literal string.

12 Character representation

In this description of Snowball, it is assumed that strings are composed of characters, and that characters can be defined numerically, but the numeric range of these characters is not defined. As implemented, three different schemes are supported. Characters can either be (a) bytes in the range 0 to 255, as in traditional C strings, or (b) byte pairs in the range 0 to 65535, as in Java strings, or (c) UTF-8 encoded byte sequences in the range 0 to 65535, so that a character may occupy 1, 2 or 3 bytes.

For case (c), we need to make a slight separation of the concept of characters into symbols, the units of text being represented, and slots, the units of space into which they map. (So in case (a), all slots are one byte; in case (b) all slots are two bytes.) c and l have numeric values that can be used in AEs (arithmetic expressions). These values count the number of slots. Similarly setmark, tomark and atmark are remembering and then using slot counts. size and sizeof measure string size in slots, not symbols. However, hop N moves c over N symbols, not N slots, and next is equivalent to hop 1.

So long as these simple distinctions are recognised, the same Snowball script can be compiled to work with any of the three encoding schemes.
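The difference between slots and symbols in case (c) can be seen in a short model. The Python sketch below uses hypothetical function names and assumes that c always sits on a symbol boundary and that symbols lie in the range 0 to 65535, so that no symbol needs more than 3 bytes.

```python
def utf8_symbol_len(first_byte):
    """Number of slots (bytes) occupied by the symbol starting with first_byte."""
    if first_byte < 0x80:
        return 1
    if first_byte < 0xE0:
        return 2
    return 3

def hop(buf, c, l, n):
    """hop n: move c over n symbols; return the new slot position, or None for f."""
    if n < 0:
        return None
    for _ in range(n):
        if c >= l:
            return None
        c += utf8_symbol_len(buf[c])
    return c if c <= l else None

word = "ni\u00f1o".encode("utf-8")   # n, i, n-tilde, o
print(len(word))                     # 5  (size counts slots)
print(hop(word, 0, len(word), 3))    # 4  (3 symbols occupy 4 slots)
print(hop(word, 0, len(word), 5))    # None  (only 4 symbols available)
```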

Snowball syntax

|| is used for alternatives, [X] means that X is optional, and [X]* means that X is repeated zero or more times. Meta-symbols are defined on the left. <char> means any character.

The definition of literal string does not allow for the escaping conventions established by the stringescapes directive. The command ? is a debugging aid.

<letter>        ::= a || b || ... || z || A || B || ... || Z
<digit>         ::= 0 || 1 || ... || 9
<name>          ::= <letter> [ <letter> || <digit> || _ ]*
<s_name>        ::= <name>
<i_name>        ::= <name>
<b_name>        ::= <name>
<r_name>        ::= <name>
<g_name>        ::= <name>
<literal string>::= '[<char>]*'
<number>        ::= <digit> [ <digit> ]*

S               ::= <s_name> || <literal string>
G               ::= <g_name> || <literal string>

<declaration>   ::= strings ( [<s_name>]* ) ||
                    integers ( [<i_name>]* ) ||
                    booleans ( [<b_name>]* ) ||
                    routines ( [<r_name>]* ) ||
                    externals ( [<r_name>]* ) ||
                    groupings ( [<g_name>]* )

<r_definition>  ::= define <r_name> as C
<plus_or_minus> ::= + || -
<g_definition>  ::= define <g_name> G [ <plus_or_minus> G ]*

AE              ::= (AE) ||
                    AE + AE || AE - AE || AE * AE || AE / AE || - AE ||
                    maxint || minint || cursor || limit || size ||
                    sizeof <s_name> || <i_name> || <number>

<i_command>     ::= $ <i_name> = AE ||
                    $ <i_name> += AE || $ <i_name> -= AE ||
                    $ <i_name> *= AE || $ <i_name> /= AE ||
                    $ <i_name> == AE || $ <i_name> != AE ||
                    $ <i_name> > AE || $ <i_name> >= AE ||
                    $ <i_name> < AE || $ <i_name> <= AE

<s_command>     ::= $ <s_name> C

C               ::= ( [C]* ) ||
                    <i_command> || <s_command> || C or C || C and C ||
                    not C || test C || try C || do C || fail C ||
                    goto C || gopast C || repeat C || loop AE C ||
                    atleast AE C || S || = S || insert S || attach S ||
                    <- S || delete ||  hop AE || next ||
                    => <s_name> || [ || ] || -> <s_name> ||
                    setmark <i_name> || tomark AE || atmark AE ||
                    tolimit || atlimit || setlimit C for C ||
                    backwards C || reverse C || substring ||
                    among ( [<literal string> [<r_name>] || (C)]* ) ||
                    set <b_name> || unset <b_name> || <b_name> ||
                    <r_name> || <g_name> || non [-] <g_name> ||
                    true || false || ?

P              ::=  [P]* || <declaration> ||
                    <r_definition> || <g_definition> ||
                    backwardmode ( P )

<program>      ::=  P



synonyms:      <+ for insert