Showing content from https://mail.python.org/pipermail/python-dev/2000-May/004106.html below:

[Python-Dev] Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 17 May 2000 00:02:10 +0200

> perfectionist or not, I only want Python's Unicode support to
> be as intuitive as anything else in Python.  as it stands right
> now, Perl and Tcl's Unicode support is intuitive.  Python's not.

I haven't much experience with Perl, but I don't think Tcl is
intuitive in this area. I really think that they got it all wrong.
They use the string type for "plain bytes", just as we do, but then
have the notion of "correct" and "incorrect" UTF-8 (i.e. strings that
violate the encoding rules). For a "plain bytes" string, the
following might happen:

- the string is scanned for byte sequences that are not valid UTF-8
- if any are found, the string is converted into UTF-8, essentially
  treating the original string as Latin-1.
- it then continues to use the UTF-8 "version" of the original string,
  and converts it back on demand.
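
The three steps above can be sketched in present-day Python. This is only
an illustration of the behaviour as described in this message, not Tcl's
actual implementation; the helper names to_internal_utf8 and back_to_bytes
are hypothetical.

```python
def to_internal_utf8(raw: bytes) -> bytes:
    """Hypothetical helper mimicking the steps described above."""
    try:
        # Step 1: scan for violations of the UTF-8 encoding rules.
        raw.decode("utf-8")
        return raw  # already valid UTF-8, kept unchanged
    except UnicodeDecodeError:
        # Step 2: treat the original bytes as Latin-1, re-encode as UTF-8.
        return raw.decode("latin-1").encode("utf-8")


def back_to_bytes(internal: bytes) -> bytes:
    """Step 3: convert the UTF-8 "version" back on demand."""
    return internal.decode("utf-8").encode("latin-1")


latin1_input = b"caf\xe9"       # not valid UTF-8, so it gets converted
ambiguous_input = b"\xc3\xa9"   # two Latin-1 bytes that happen to be valid UTF-8

roundtrip_ok = back_to_bytes(to_internal_utf8(latin1_input))
roundtrip_bad = back_to_bytes(to_internal_utf8(ambiguous_input))
```

In this sketch the first input survives the round trip, while the second
comes back as a different byte string, because a "plain bytes" string that
happens to be valid UTF-8 is never marked as converted.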

Maybe I got something wrong, but the Unicode support in Tcl makes me
worry very much.

> btw, I thought we'd all agreed on GvR's solution for 1.6?
> 
> what did I miss?

I like the 'only ASCII is converted' approach very much, so I'm not
objecting to that solution - just as I wasn't objecting to the
previous one.

> so tell me, if "good enough" is what we're aiming at, why isn't
> my counter-proposal good enough?

Do you mean the one in

http://www.python.org/pipermail/python-dev/2000-April/005218.html

which I suppose is the same one as the "java-like approach"? AFAICT,
all it does is change the default encoding from UTF-8 to Latin-1.
I can't follow why this should be *better*, but it would certainly be
as good... In comparison, restricting the "character" interpretation
of the string type (in terms of your proposal) to 7-bit characters
has the advantage that it is less error-prone, as Guido points out.
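
To make the comparison concrete, here is a small sketch in present-day
Python (not the 1.6 API under discussion) of what each candidate default
encoding does with the same non-ASCII byte string. The function name
coerce is hypothetical and merely stands in for an implicit
string-to-Unicode conversion.

```python
def coerce(raw: bytes, default_encoding: str) -> str:
    """Hypothetical stand-in for an implicit bytes-to-Unicode conversion."""
    return raw.decode(default_encoding)


data = "\u00e9".encode("utf-8")  # the two bytes C3 A9

results = {}
for enc in ("ascii", "latin-1", "utf-8"):
    try:
        results[enc] = coerce(data, enc)
    except UnicodeDecodeError:
        # The conversion refuses to guess; recorded here as None.
        results[enc] = None
```

With ASCII as the default the conversion fails loudly, with Latin-1 it
always succeeds but yields two characters here, and with UTF-8 it yields
one character. Only the 7-bit restriction avoids silently picking an
interpretation.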

Regards,
Martin
