The question occasionally arises of how far the English (or earlier Porter)
stemming algorithm can be adapted to handle older forms of the English
language.
Historically, English is usually divided into three periods of development: Old English, Middle English and Modern English.
Middle English is problematical for a number of reasons. There is no standard spelling in the original texts, and the grammatical differences between Middle and Modern English prevent the spelling from being simply ‘modernised’. It is, however, possible to normalise the spelling according to some modern scheme, but again there is no standard modern scheme. Middle English itself had great regional variations, so that, for example, the English of Chaucer and that of his contemporary the Gawain poet (both late 14th century) are strikingly different. Finally, grammar was fluid even for one writer, so Chaucer might use ‘they love’ or ‘they loven’, ‘he sitteth’ or ‘he sit’.
We may take Modern English to mean English that can be cast into a modern spelling form without too much damage being done to the original. From this point of view, Shakespeare and the Authorised Version of the Bible are in Modern English. The ending structure of words in early Modern English differs from that of contemporary English in the est and eth endings of verbs in the present indicative: est marks the second person singular (thou lovest) and eth the third person singular (he loveth).
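To put the endings into the Porter stemmer, the rules for ed and ing in Step 1b can be extended to cover est and eth, with the same condition that the remaining stem must contain a vowel.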
The inclusion of these endings does produce certain ‘side effects’. Since est is also the ending of adjectival superlatives (greatest, unkindest), it will be removed there too, and words such as brandreth and deforest will be mis-stemmed. Nevertheless, for the vocabulary of the Bible, the inclusion of these extra endings is not harmful (see the demonstration; for example, search for the text ‘love’ in 1000 verses).
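The effect of the extra endings can be illustrated with a minimal sketch of Step 1b of the original Porter algorithm. This sketch assumes that est and eth are simply added to the list containing ed and ing, under the same (*v*) condition: the stem left after removal must contain a vowel. The function names (`step1b`, `tidy`, `measure` and so on) are hypothetical, and only Step 1b is implemented, including its EED rule and the follow-up tidying. The results are therefore intermediate stems, not full Porter stems.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a letter other than a, e, i, o, u, and other
    than y preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # y is a consonant at the start of a word or after a vowel
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem: str) -> bool:
    """The (*v*) condition: the stem contains at least one vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def measure(stem: str) -> int:
    """Porter's m: the number of VC sequences in [C](VC)^m[V]."""
    forms = []
    for i in range(len(stem)):
        f = "c" if is_consonant(stem, i) else "v"
        if not forms or forms[-1] != f:  # collapse runs of c or v
            forms.append(f)
    return "".join(forms).count("vc")


def ends_double_consonant(stem: str) -> bool:
    n = len(stem)
    return n >= 2 and stem[-1] == stem[-2] and is_consonant(stem, n - 1)


def ends_cvc(stem: str) -> bool:
    """The *o condition: consonant-vowel-consonant, last not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")


def tidy(stem: str) -> str:
    """Porter's clean-up after an ed/ing (here also est/eth) removal."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"
    return stem


# est and eth are the assumed additions for early Modern English
STEP1B_ENDINGS = ("ed", "ing", "est", "eth")


def step1b(word: str) -> str:
    word = word.lower()
    if word.endswith("eed"):
        # (m>0) EED -> EE; the longest match wins even if m == 0
        stem = word[:-3]
        return stem + "ee" if measure(stem) > 0 else word
    for ending in STEP1B_ENDINGS:
        if word.endswith(ending):
            stem = word[: -len(ending)]
            return tidy(stem) if contains_vowel(stem) else word
    return word


if __name__ == "__main__":
    for w in ["loveth", "lovest", "loved", "loving", "sitteth",
              "unkindest", "deforest", "brandreth", "test"]:
        print(w, "->", step1b(w))
```

With this change, loveth, lovest, loved and loving all reduce to love at this step, and sitteth reduces to sit. The side effects described above also appear: unkindest gives unkind, deforest gives defor and brandreth gives brandr. A word such as test is left alone because removing est would leave a stem with no vowel.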