On Mon, Sep 18, 2006 at 06:15:22PM +0200, Martin Porter wrote:
> For string-forward among, surely the byte to take is not byte 0, but byte
> n-1, where n is the size of the smallest string in the among.
Are you saying it's currently incorrect?
Or that taking this byte may give a better optimisation, because it
avoids the problem with Cyrillic characters always starting with one of
two bytes in UTF-8?
Assuming the latter: since we know all the cases when we generate the
shortcut, we could actually look at each choice of byte between 0 and
n-1 and potentially choose a different strategy for each among.
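
To make that concrete, here is a rough sketch of the per-among selection in
Python. It is not what the Snowball compiler does; the function name
best_shortcut_byte and the "most distinct byte values wins" scoring are
assumptions made purely for illustration, and it only covers the forward case.

```python
def best_shortcut_byte(strings):
    """Pick a byte offset in 0..n-1 to test first for a forward among.

    Hypothetical helper: n is the UTF-8 length of the shortest string, and
    the offset whose byte takes the most distinct values across the among
    is chosen. Returns (offset, distinct_count), or None if no byte is
    guaranteed to exist (empty among, or an among containing "").
    """
    encoded = [s.encode("utf-8") for s in strings]
    if not encoded:
        return None
    n = min(len(b) for b in encoded)
    if n == 0:
        return None

    best_pos, best_count = 0, 0
    for pos in range(n):
        # Number of different byte values seen at this offset.
        distinct = len({b[pos] for b in encoded})
        # Strict comparison keeps the lowest offset on ties.
        if distinct > best_count:
            best_pos, best_count = pos, distinct
    return best_pos, best_count


if __name__ == "__main__":
    # Cyrillic letters: byte 0 is always 0xD0 or 0xD1, byte 1 varies more.
    print(best_shortcut_byte(["а", "б", "в", "я"]))
    # ASCII suffixes: the shortest string has one byte, so only offset 0.
    print(best_shortcut_byte(["ing", "ed", "s"]))
```

For the Cyrillic example, offset 0 only separates the strings into two groups
while offset 1 separates all four, which is the case Martin's suggestion would
help with. A real implementation might weigh other things too, such as how
cheaply the resulting test can be expressed.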