Showing content from https://mail.python.org/pipermail/python-dev/2002-March/021279.html below:

[Python-Dev] PEP 263 - default encoding

Stephen J. Turnbull stephen@xemacs.org
16 Mar 2002 12:20:08 +0900
>>>>> "Guido" == Guido van Rossum <guido@python.org> writes:

    >> a. Does this really make sense for UTF-16?  It looks to me like
    >> a great way to induce bugs of the form "write a unicode literal
    >> containing 0x0A, then translate it to raw form by stripping the
    >> u prefix."

    Guido> Of course not. I don't expect anyone to put UTF-16 in their
    Guido> source encoding cookie.

Mr. Suzuki's friends.  People who use UTF-16 strings in other
applications (eg Java), but otherwise are happy with English.
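
A minimal sketch of the 0x0A concern in point (a), written for present-day Python 3 rather than the 2002 interpreter under discussion (that choice, and the helper name `utf16_bytes_with_newline`, are assumptions made for illustration). It shows only that the UTF-16 encoding of a non-newline character can contain the octet 0x0A, which is what a byte-oriented reader would take for a line break:

```python
# Illustration only: uses Python 3 codecs, not the PEP 263 parser itself.
NEWLINE = 0x0A  # the octet a byte-oriented tokenizer treats as end of line


def utf16_bytes_with_newline(text):
    """Encode text as UTF-16-LE and report whether a 0x0A octet appears."""
    data = text.encode("utf-16-le")
    return data, NEWLINE in data


# U+010A is a single letter, yet its low-order octet is 0x0A.
data, has_newline = utf16_bytes_with_newline("\u010a")

# A plain ASCII letter never produces 0x0A, but it does produce a NUL octet.
ascii_data, ascii_has_newline = utf16_bytes_with_newline("A")
```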

    Guido> But should we bother making a list of encodings that
    Guido> shouldn't be used?

I would say yes.  People will find reasons to inflict harm on
themselves if you don't.
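
One possible way to build such a list mechanically, sketched under assumptions: the criterion used here (every ASCII character must encode to its own single octet and decode back) is an illustrative test rather than anything PEP 263 specifies, and `is_ascii_compatible` is a hypothetical helper written against Python 3 codecs.

```python
def is_ascii_compatible(encoding):
    """Return True if the codec maps each ASCII character to the same single octet.

    This is an assumed criterion for "safe as a source encoding", not a rule
    taken from the PEP.
    """
    for code in range(128):
        char = chr(code)
        octet = bytes([code])
        try:
            if char.encode(encoding) != octet or octet.decode(encoding) != char:
                return False
        except UnicodeError:
            # The codec cannot represent or parse this octet on its own.
            return False
    return True


candidates = ["utf-8", "iso-8859-1", "koi8-r", "utf-16", "utf-32"]
unsuitable = [name for name in candidates if not is_ascii_compatible(name)]
```

With this test, the UTF-16 and UTF-32 family lands on the "shouldn't be used" list while the single-byte and UTF-8 encodings pass.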

    >> b. No editor is likely to implement correct display to
    >> distinguish between u"" and just "".

    Guido> That's fine.  Given phase 2, the editor should display the
    Guido> entire file using the encoding given in the cookie, despite
    Guido> that phase 1 only applies the encoding to u"" literals.
    Guido> The rest of the file is supposed to be ASCII, and if it
    Guido> isn't, that's the user's problem.

Huh?  I thought that people were regularly putting arbitrary text into
ordinary strings, and that the whole purpose of this PEP was to extend
that practice to Unicode.

Are you going to deprecate the practice of putting KOI8-R into
ordinary strings?  This means that Cyrillic users have to stop doing
that, change the string to Unicode, and apply codecs on IO.  They
aren't going to bother in phase 1, and will have a rude surprise in
phase 2.  That's human nature, of course, but I don't see how it serves
Python to risk it.
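
To make "change the string to Unicode, and apply codecs on IO" concrete, here is a small sketch in present-day Python 3 (an assumption; the thread predates it). The sample word and the in-memory buffer standing in for a file or terminal are illustrative choices:

```python
import io

# "Privet" in Cyrillic, written with escapes so this snippet itself stays ASCII.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# The implicitly coded octet string that the old practice puts in the source.
koi8_bytes = text.encode("koi8-r")

# The same octets read under another 8-bit encoding decode without error
# but yield different characters: nothing in the bytes names their encoding.
misread = koi8_bytes.decode("latin-1")

# The practice described above: keep Unicode internally, encode at the IO boundary.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="koi8-r")
writer.write(text)
writer.flush()
written = buffer.getvalue()
```

The octets that reach the output are the same either way; what changes is that the encoding is stated explicitly at the boundary instead of being implied by the editor that saved the file.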

    >> e. This causes problems for UTF-8 transition, since people will
    >> want to put arbitrary byte strings in a raw string.

    Guido> I'm not sure I understand.  What do you call a raw string?
    Guido> Do you mean an r"" literal?  Why would people want to use
    Guido> that for arbitrary binary data?  Arbitrary binary data
    Guido> should *always* be encoded using \xDD hex or \OOO octal
    Guido> escapes.

raw -> non-Unicode here.  Incorrect usage, my apologies.  "Arbitrary"
was the wrong word too, I meant non-UTF-8.  Eg, iso-8859-1 0xFF.  I
would have no problem with requiring people to use escapes to write
non-English strings.  But the whole point of this PEP is to allow
people to write those in their native encodings (for Unicode strings).
People are going to continue to squirt implicitly coded octet-strings
at their terminals (which just happen to have an appropriate font
installed<wink>) and expect it to work.
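
The iso-8859-1 0xFF case can be checked directly. This is a sketch using Python 3 codecs (an assumption, since the behaviour of the 2002 parser is exactly what was under debate); it shows that the octet is a valid Latin-1 character but not valid UTF-8 on its own:

```python
# The octet written with an escape, as suggested for non-UTF-8 data.
octets = b"\xff"

# Under iso-8859-1 this is U+00FF (y with diaeresis).
as_latin1 = octets.decode("iso-8859-1")

# Under UTF-8 the lone octet 0xFF is never valid.
try:
    octets.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The same character in UTF-8 takes two octets instead.
utf8_form = as_latin1.encode("utf-8")
```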

AFAICT this interpretation of the PEP saves no pain, simply postpones
it.  Worse, people who don't understand it fully are going to believe
it sanctions arbitrary encodings in string literals.  I don't see how
you can avoid widespread misunderstanding of that sort unless you have
the parser refuse to execute the program---it may actually increase
the pain when phase 2 starts.

    Guido> Sounds like a YAGNI to me.

Could be.  I'm sorry I can't be less fuzzy about the specific points.
But then, that's the whole problem, really---we're trying to serve
natural language usage which is inherently fuzzy.

I see lots of potential problems in interpretation of this PEP by the
people it's intended to serve: those who are attached to some native
encoding.  Better to raise each now, and have the scorn it deserves
heaped high, than to say "we coulda guessed this would happen" later.

If you think it's getting too abstract to be useful, I'll be quiet
until I've got something more concrete.  I'm hoping the discussion
seems useful despite the fuzz.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
              Don't ask how you can "do" free software business;
              ask what your business can "do for" free software.


