Re: [Snowball-discuss] Minor bug in utf-8 handling

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Wed Feb 14 2007 - 13:49:36 GMT

Next message: Olly Betts: "Re: [Snowball-discuss] More patches"
Previous message: Olly Betts: "[Snowball-discuss] Minor bug in utf-8 handling"
In reply to: Olly Betts: "[Snowball-discuss] Minor bug in utf-8 handling"
Next in thread: Richard Boulton: "Re: [Snowball-discuss] Minor bug in utf-8 handling"
Reply: Richard Boulton: "Re: [Snowball-discuss] Minor bug in utf-8 handling"

Yes, that's a bug. Many thanks, Olly. I think we've been fortunate to get
away with that one all this time.

On Tue, 2007-02-13 at 17:01 +0000, Olly Betts wrote:
> I think I've spotted a bug in the handling of 3 byte utf-8 sequences
> while reading the code. Both get_utf8 and get_b_utf8 fetch the third
> byte with *p when they should use p[c].

. . . . .

> In current stemmers, this is probably harmless, as the characters in use
> in the languages snowball has stemmers for encode as one or two byte
> utf-8 sequences.
>
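
For readers without the source to hand, the bug described above can be sketched as follows. This is a minimal illustration, not the actual Snowball runtime code: the function name decode_utf8_fwd, its use_bug flag, and the exact layout are assumptions made for this example. It decodes sequences of up to three bytes from buffer p starting at cursor c, and the only difference between the two modes is whether the third byte is read from p[c] (correct) or from *p (the start of the buffer, as in the reported bug).

```c
typedef unsigned char symbol;  /* byte type assumed for this sketch */

/* Hypothetical decoder: reads the UTF-8 sequence that starts at p[c],
   never reading at or beyond index l. Stores the code point in *slot and
   returns the number of bytes consumed (0 if c is already at the limit).
   Only sequences of up to three bytes are handled.
   use_bug != 0 reproduces the faulty third-byte fetch for comparison. */
static int decode_utf8_fwd(const symbol *p, int c, int l, int *slot,
                           int use_bug) {
    int b0, b1, b2;
    if (c >= l) return 0;
    b0 = p[c++];
    if (b0 < 0xC0 || c == l) {      /* one byte, or input truncated */
        *slot = b0;
        return 1;
    }
    b1 = p[c++];
    if (b0 < 0xE0 || c == l) {      /* two bytes, or input truncated */
        *slot = (b0 & 0x1F) << 6 | (b1 & 0x3F);
        return 2;
    }
    /* Three bytes: the third byte sits at the cursor, p[c].
       Reading *p instead picks up the first byte of the buffer. */
    b2 = use_bug ? *p : p[c];
    *slot = (b0 & 0x0F) << 12 | (b1 & 0x3F) << 6 | (b2 & 0x3F);
    return 3;
}
```

With the four-byte buffer 'a', 0xE2, 0x82, 0xAC (the letter a followed by U+20AC) and c = 1, the p[c] version yields 0x20AC, while the *p version mixes in bits from 'a' and yields 0x20A1. One- and two-byte sequences never reach the faulty line, which is consistent with the observation that stemmers whose alphabets fit in one or two bytes would not be affected.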


This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:49 BST