Showing content from https://mail.python.org/pipermail/python-dev/2001-May/014744.html below:

[Python-Dev] RE: Ill-defined encoding for CP875?

Tim Peters tim.one@home.com
Sat, 12 May 2001 17:48:38 -0400
[Martin v. Loewis, whose encyclopedic knowledge of encoding details
 still isn't enough to get a clear answer (it's like somebody asking
 me for a simple answer to a floating point question <wink>)]

> ...
> So I think we can take one of two approaches:
>
> 1. admit that CP 875 is not round-trippable, and exclude it from the
>    test (although when looking at the first 128 characters only, it
>    is round-trippable).

As I noted later, 875 is already excluded from the roundtrip test across
range(128, 256).  What it's failing is the roundtrip test across range(128):
after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875')
has no way to know which range the original input came from.  So it's not
really round-trippable across range(128) either unless more info is given to
.encode().
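
A minimal sketch of why this fails, using a toy table rather than the real
cp875.py.  Only the 0x3f -> U+001A entry comes from the discussion above; the
other entries, and the names TOY_DECODE, build_encode_map and
non_roundtrippable, are made up for illustration.  It assumes the encoding
map is built by plainly inverting the decoding map, so a later byte wins when
two bytes decode to the same character:

```python
# Toy decoding table: byte value -> Unicode code point.
# Only 0x3F -> U+001A is taken from the post; the other entries are
# hypothetical and are NOT the real cp875 table.
TOY_DECODE = {
    0x3F: 0x1A,  # SUBSTITUTE
    0x40: 0x20,  # hypothetical ordinary character
    0xE1: 0x1A,  # hypothetical high byte that also decodes to SUBSTITUTE
}


def build_encode_map(decode_map):
    """Invert a decoding table; a later duplicate overwrites an earlier one."""
    return {cp: byte for byte, cp in decode_map.items()}


def non_roundtrippable(decode_map):
    """Return the bytes that do not survive decode followed by encode."""
    encode_map = build_encode_map(decode_map)
    return sorted(b for b, cp in decode_map.items() if encode_map[cp] != b)


TOY_ENCODE = build_encode_map(TOY_DECODE)
failures = non_roundtrippable(TOY_DECODE)
print([hex(b) for b in failures])
```

With this toy table the inverse keeps only one byte for U+001A, so the other
byte that decodes to it cannot come back unchanged.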

> 2. remove the SUBSTITUTE mappings from CP875, acknowledging that
>    apparently these characters have no meaning in that code page.
>    Unfortunately, I could not find any official IBM documentation
>    page that lists the characters supported in each of the EBCDIC
>    code pages.
>
> The second seems to be more correct to me, although it is a deviation
> from the Unicode consortium publications.

Until you and MAL agree on the best thing to do (I have no opinion:  my only
exposure to Unicode in daily programming life remains the Python test suite),
I'm going to opt for #1:  as cp875.py stands today, it's simply a fact that
it's not round-trippable across any range including 0x3f.



