[Snowball-discuss] Re: SnowBall German stemming

From: Martin Porter (martin_porter@softhome.net)
Date: Wed Mar 13 2002 - 11:04:19 GMT


Marcus,

Thank you for your encouragement. You are the first German user from whom we
have had any feedback!

(It is useful to post to "Snowball discuss".)

There was some confusion over codes a while ago, since I was using MS-DOS
Latin 1 but describing it as ISO-Latin 1 on the website (pure ignorance on
my part). But I think everything is consistent now. The Snowball scripts
and the sample data sets use MS-DOS Latin 1. The documentation on the
website refers to MS-DOS Latin 1 where relevant. From 'Character codes' on
the main page you can get to the header files for ISO-Latin 1 and
instructions on how to adjust the Snowball scripts to use them. The code
values you quote are ISO-Latin 1.

But perhaps I haven't understood your email, since 'ß' is E1 (225) in
MS-DOS Latin 1 but DF (223) in ISO-Latin 1.
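A minimal sketch of the two encodings side by side, assuming that "MS-DOS Latin 1" is code page 850 (Python's "cp850" codec) and "ISO-Latin 1" is ISO-8859-1 (Python's "latin-1" codec). The helper names code_values and latin1_to_cp850 are hypothetical, for illustration only; they are not part of Snowball.

```python
# Byte values of the German special characters in the two encodings.
# Assumption: "MS-DOS Latin 1" = code page 850 ("cp850"),
#             "ISO-Latin 1"    = ISO-8859-1 ("latin-1").
CHARS = "äöüß"


def code_values(encoding: str) -> dict:
    """Map each special character to its single-byte code in `encoding`."""
    return {ch: ch.encode(encoding)[0] for ch in CHARS}


def latin1_to_cp850(data: bytes) -> bytes:
    """Recode ISO-Latin 1 bytes into MS-DOS Latin 1 (cp850) bytes.

    Hypothetical helper: useful if the stemmer scripts expect cp850 input
    but the text was saved as ISO-Latin 1. Raises UnicodeEncodeError if a
    character has no cp850 equivalent.
    """
    return data.decode("latin-1").encode("cp850")


if __name__ == "__main__":
    for enc in ("cp850", "latin-1"):
        values = code_values(enc)
        print(enc, {ch: f"{v} ({v:02X})" for ch, v in values.items()})
```

Under these assumptions, a file in which 'ä', 'ö' and 'ü' appear as 228, 246 and 252 is ISO-Latin 1, so it would need either recoding to cp850 or the ISO-Latin 1 header files before the MS-DOS Latin 1 scripts treat those letters as intended.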

Martin

At 02:24 AM 3/13/02 -0800, Marcus Hassler wrote:
>Hello!
>
>First of all: you did a great job! I am using the Snowball
>concept to develop a natural-language information
>retrieval system for German. I downloaded everything and
>it is all working properly. There is just one
>problem:
>
>The special characters 'ü' (decimal 252), 'ö' (decimal
>246) and 'ä' (decimal 228) are not handled as they should be
>(as the input/output sample shows!). The special character
>'ß' (hex E1) is handled correctly! I am not sure whether the
>problem is in the Snowball file with the algorithm or
>somewhere else. I would be glad if you could help me with
>this!
>
>Best regards,
> Marcus

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss

This archive was generated by hypermail 2.1.3 : Thu Sep 20 2007 - 12:02:41 BST