Showing content from https://mail.python.org/pipermail/python-dev/2002-March/021100.html below:

[Python-Dev] PEP 263 considered faulty (for some Japanese)

SUZUKI Hisao suzuki611@oki.com
Thu, 14 Mar 2002 15:10:05 +0900
>     SUZUKI> I should have appended to that, "And English people will
>     SUZUKI> distribute programs with no magic comments all over the
>     SUZUKI> world.  Japanese users will use them."
> But this "just works" as long as the default encoding is an ASCII
> superset (or even JIS X 0201 (^^; as Japanese users are now all
> equipped with YEN SIGN <-> REVERSE SOLIDUS codecs).

Yes, this is the problem I found.

>     SUZUKI> Certainly Japanese users are also free from putting
>     SUZUKI> encoding declarations, but we do not expect such programs
>     SUZUKI> to be usable in other countries than Japan, given the PEP
>     SUZUKI> as is.
> But this is also true for everyone else, except Americans.  All of the
> common non-ASCII encodings are non-universal and therefore
> non-portable, with the exception of UTF-8 and X Compound Text (and the
> latter is a non-starter in program sources because of the 0x22
> problem).

Indeed.  If we are to distribute Python programs to various
countries, I think, we must write them in UTF-8, _anyway_.
Under the PEP as is, the magic comment or BOM is mandatory,
unless all character codes happen to be less than 0x80.  This is
tedious and somewhat ugly, but not fatal.
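The rule above (a declaration is needed unless every byte is below 0x80) can be sketched as a small check. This is only an illustration: `needs_declaration` is a hypothetical helper, and the cookie pattern and the "first two lines" rule are assumptions modeled on the `-*- coding: xxx -*-` cookie discussed in this thread, not the interpreter's own code.

```python
import codecs
import re

# Simplified cookie pattern (an assumption for this sketch): a comment
# line containing "coding:" or "coding=" followed by an encoding name.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def needs_declaration(raw: bytes) -> bool:
    """Return True if raw contains bytes >= 0x80 but carries neither
    a UTF-8 BOM nor a coding cookie in its first two lines."""
    if all(b < 0x80 for b in raw):
        return False  # pure ASCII: no declaration required
    if raw.startswith(codecs.BOM_UTF8):
        return False  # the BOM marks the file as UTF-8
    first_two = raw.splitlines()[:2]
    return not any(COOKIE_RE.match(line) for line in first_two)
```

A file that is pure ASCII, or that starts with the BOM or a cookie, passes; any other file with high bytes is the "tedious" case.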

> I myself objected to this PEP because I think it's far too easy for my
> Croatian (Latin-2) friend working in Germany to paste a Latin-1 quote
> into a Latin-2 file.  He'll do it anyway on occasion, but if we start
> insisting _now_ that "Python programs are written in UTF-8", we'll
> avoid a lot of mojibake.  12 years in Japan makes that seem an important
> goal.<wink>  But such multiscript processing is surely a lot more
> rare in any country but Japan.

I agree with you on the problem of "mojibake".  UTF-8 is the
sole encoding at present in which people all over Asia, Europe,
or even the world can cooperate on the same Python source file
safely.
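As a small illustration of the mojibake in question, the sketch below reads a Latin-1 byte as Latin-2 and gets a different letter with no error at all. The sample character and the helper `is_valid_utf8` are chosen only for this example.

```python
# A Latin-1 "e with grave" is the single byte 0xE8.
original = "\u00e8"
raw = original.encode("latin-1")

# Read as Latin-2, the same byte silently becomes "c with caron".
misread = raw.decode("iso8859_2")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The stray Latin-1 byte is not valid UTF-8, so a UTF-8-only file
# would reject the paste instead of silently corrupting it.
print(repr(misread), is_valid_utf8(raw))
```

The point of the comparison: between two 8-bit local encodings the mistake is undetectable, while in a UTF-8 file the same paste fails loudly at decode time.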

For the present, the PEP will serve to make various local
encodings "official".  It will not do much to save us from the
chaos of the local encodings.

And almost every operating system in Japan is on the way to
adopting Unicode to save us from the chaos.  I am afraid the
mileage of the PEP will be fairly short and that it will just
result in loading a lot of burden onto the language, though it
is not fatal in itself.

>     SUZUKI> BTW, when transmitting Python source code between Unix and
>     SUZUKI> Windows, we do not necessarily convert encodings.
> But this is bad practice.  You can do it if it works for you, but
> Python should not avoid useful changes because people are treating
> different encodings as the same!

I know it is not the best practice either.  However, for now
you cannot safely write Shift_JIS into a Python source file
anyway, unless you hack up the interpreter's parser itself.
Strictly speaking, Shift_JIS is not compatible with ASCII, you
know.  With the present Python as is, you are safe to write
EUC-JP and UTF-8 in source.
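To make the incompatibility concrete, the sketch below looks for bytes below 0x80 inside the encoded form of a non-ASCII character.  In Shift_JIS the second byte of a two-byte character can be 0x5C, the ASCII backslash, which a byte-oriented parser would take for an escape inside a string literal.  The helper name and the sample character are chosen only for this illustration, and it uses the codecs of a current Python rather than the interpreter of the time.

```python
def ascii_range_bytes(text, encoding):
    """Collect bytes < 0x80 that appear inside the encoded form of
    non-ASCII characters of text."""
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII characters are not of interest
        found.extend(b for b in ch.encode(encoding) if b < 0x80)
    return found


sample = "\u8868"  # the kanji HYOU ("table"), a well-known problem case

for enc in ("shift_jis", "euc_jp", "utf-8"):
    print(enc, [hex(b) for b in ascii_range_bytes(sample, enc)])
```

Only the Shift_JIS line reports a byte (0x5c); EUC-JP and UTF-8 keep every byte of a multi-byte character at 0x80 or above.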

On a very serious project, it is reasonable to use the original
(i.e., not hacked) interpreter and (either EUC-JP or) UTF-8 on
both Unix and Windows even in the "present day, present time".

> There's a third option:
> 
>     3.  Make UTF-8 the only encoding acceptable for "standard Python",
>         and insert a hook for a codec to be automatically run on source
>         text.  Standard Python would _never_ put anything on this hook,
>         but an optional library would provide other codecs, including one
>         to implement PEP 263.
> 
> Guido thought the idea has merit, as an implementation.  Therefore
> UTF-8 would be encouraged by Python, but PEP 263 would give official
> sanction to the -*- coding: xxx -*- cookie.  And this would give you a
> lot of flexibility for experimentation (eg, with UTF-16 codecs, etc).

Certainly this will not load a burden onto the language itself
even if the mileage of the PEP is short.
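The shape of that third option can be sketched roughly as follows.  Every name here (`set_source_hook`, `read_source`, `euc_jp_hook`) is hypothetical and invented for this illustration; it is not how any Python release reads source files, only a minimal model of "UTF-8 by default, with an empty hook that an optional library may fill".

```python
from typing import Callable, Optional

# The hook turns raw source bytes into text.  The "standard"
# configuration leaves it empty, as the proposal describes.
SourceHook = Callable[[bytes], str]
_source_hook: Optional[SourceHook] = None


def set_source_hook(hook: Optional[SourceHook]) -> None:
    """Install (or clear, with None) the optional source codec hook."""
    global _source_hook
    _source_hook = hook


def read_source(raw: bytes) -> str:
    """Decode source bytes: use the hook if one is installed,
    otherwise accept UTF-8 only."""
    if _source_hook is not None:
        return _source_hook(raw)
    return raw.decode("utf-8")


def euc_jp_hook(raw: bytes) -> str:
    """Example hook an optional library might install."""
    return raw.decode("euc_jp")
```

With no hook installed, `read_source` rejects anything that is not UTF-8; after `set_source_hook(euc_jp_hook)` the same call accepts EUC-JP sources, and the core stays unchanged.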

--
SUZUKI Hisao <suzuki@acm.org> <suzuki611@okisoft.co.jp>


