>>>>> "Martin" == Martin v Loewis <martin@v.loewis.de> writes: Martin> Reliable detection of encodings is a good thing, though, I would think that UTF-8 can be quite reliably detected without the "BOM". I suppose you could construct short ambiguous sequences easily for ISO-8859-[678] (which are meaningful in the corresponding natural language), but it seems that even a couple dozen characters would make the odds astronomical that "in the wild" syntactic UTF-8 is intended to be UTF-8 Unicode (assuming you're expecting a text file, such as Python source). Is that wrong? Have you any examples? I'd be interested to see them; we (XEmacs) have some ideas about "statistical" autodetection of encodings, and they'd be useful test cases. Martin> as the Web has demonstrated. But the Web in general provides (mandatory) protocols for identifying content-type, yet I regularly see HTML files with incorrect http-equiv meta elements, and XHTML with no encoding declaration containing Shift JIS. Microsoft software for Japanese apparently ignores Content-Type headers and the like in favor of autodetection (probably because the same MS software regularly relies on users to set things like charset parameters in MIME Content-Type). I can't tell my boss that his mail is ill-formed (well, not to any effect). So I'd really love to be able to watch his face when Python 2.3 tells him his program is not legally encoded. But I guess that's not convincing enough reason for Guido to mandate UTF-8.<wink> -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Don't ask how you can "do" free software business; ask what your business can "do for" free software.