Re: [Snowball-discuss] Optimising among

From: Olly Betts (olly@survex.com)
Date: Mon Sep 18 2006 - 18:47:36 BST


On Mon, Sep 18, 2006 at 06:15:22PM +0200, Martin Porter wrote:
> For string-forward among, surely the byte to take is not byte 0, but byte
> n-1, where n is the size of the smallest string in the among.

Are you saying it's currently incorrect?

Or that taking this byte may give a better optimisation, because it
avoids the problem with Cyrillic characters always starting with one of
two bytes in UTF-8?

Assuming the latter, since we know the cases when we generate the
shortcut, we could actually look at all the different choices of
byte between 0 and n-1 and potentially choose a different strategy
for each among.
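
As a rough illustration of looking at every candidate byte, here is a
minimal sketch in Python. It is not the Snowball compiler's actual code:
the function name best_shortcut_offset and the "most distinct byte
values wins" heuristic are assumptions made for this example only.

```python
def best_shortcut_offset(among_strings):
    """Pick the byte offset in 0..n-1 that best separates the strings.

    n is the UTF-8 length of the shortest string, so every string has a
    byte at each candidate offset. "Best" here means the offset with the
    most distinct byte values (an assumed heuristic, not Snowball's).
    Returns (offset, distinct_count), or None if any string is empty.
    """
    encoded = [s.encode("utf-8") for s in among_strings]
    n = min(len(b) for b in encoded)
    if n == 0:
        return None  # an empty string leaves no byte to test

    best_offset, best_count = 0, 0
    for offset in range(n):
        # Count how many different byte values occur at this offset.
        distinct = len({b[offset] for b in encoded})
        if distinct > best_count:
            best_offset, best_count = offset, distinct
    return best_offset, best_count


if __name__ == "__main__":
    # Cyrillic endings: byte 0 is always 0xD0 or 0xD1 in UTF-8.
    print(best_shortcut_offset(["ая", "ое", "ые", "ий"]))
```

For these four Cyrillic strings, byte 0 takes only two values (0xD0 and
0xD1), while byte 1 takes four, so under this heuristic byte 1 would be
the one to test. For a pure ASCII among, byte 0 may well remain the best
choice, which is why choosing per among could pay off.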

Cheers,
    Olly


