Showing content from https://mail.python.org/pipermail/python-dev/2000-October/009808.html below:

[Python-Dev] Tcl and Unicode

Fredrik Lundh <effbot@telia.com>
Sun, 8 Oct 2000 13:04:50 +0200

Previous message: [Python-Dev] Tcl and Unicode
Next message: [Python-Dev] A standard lexer?
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

guido:
> > This *should* be correct because Tcl/Tk always uses UTF-8 internally.
> > (Even though it is "lenient" when receiving strings -- if a sequence
> > of characters has no valid Unicode representation, it appears to fall
> > back to Latin-1; I don't know the details of this algorithm.)

Tcl/Tk uses a 16-bit (UCS-2) unicode string type internally, but
their 8-bit strings use UTF-8.

When converting from external 8-bit strings to unicode, they
convert valid UTF-8 sequences to unicode characters just like
Python, but "a lead-byte not followed by enough trail-bytes
represents itself." (in other words, it's cast from an unsigned
char to an unsigned short).
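
A minimal sketch of that rule, written in present-day Python 3 rather
than taken from the Tcl sources. The name lenient_utf8_decode is
hypothetical, and the sketch assumes a 16-bit target, so it only
recognizes one- to three-byte sequences:

```python
def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF-8, letting any byte that does not start a complete
    sequence stand for itself (code point == byte value)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Number of trail bytes implied by the lead byte.
        if 0xC0 <= b < 0xE0:
            need = 1
        elif 0xE0 <= b < 0xF0:
            need = 2
        else:
            need = 0  # ASCII, stray trail byte, or lead byte beyond 16 bits
        trail = data[i + 1:i + 1 + need]
        if need and len(trail) == need and all(0x80 <= t < 0xC0 for t in trail):
            # Complete sequence: assemble the code point from the payload bits.
            cp = b & (0x1F if need == 1 else 0x0F)
            for t in trail:
                cp = (cp << 6) | (t & 0x3F)
            out.append(chr(cp))
            i += 1 + need
        else:
            # Not enough trail bytes: the byte represents itself.
            out.append(chr(b))
            i += 1
    return "".join(out)
```

With this, both "café".encode("utf-8") and "café".encode("latin-1")
come back as the same four-character string.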

And the chance that any reasonable Latin-1 string would contain
a UTF-8 lead byte followed by the right number of UTF-8 trail
bytes is close to zero...
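
To see why, here is a rough Python 3 check (the helper name
has_accidental_utf8 is made up for this illustration). In Latin-1 the
lead-byte range 0xC0-0xEF is mostly accented letters, while the
trail-byte range 0x80-0xBF is control codes and symbols, so a false
match needs something like an accented letter immediately followed by
a copyright sign:

```python
import re

# A two- or three-byte UTF-8 sequence: lead byte plus the right
# number of trail bytes in the range 0x80-0xBF.
_UTF8_SEQ = re.compile(rb"[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}")

def has_accidental_utf8(text: str) -> bool:
    """True if the Latin-1 encoding of text contains a byte run that
    would also parse as a multi-byte UTF-8 sequence."""
    return _UTF8_SEQ.search(text.encode("latin-1")) is not None
```

Ordinary text such as "naïve résumé" gives False; an unlikely pair
such as "Ã©" (bytes C3 A9) gives True.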

(in case anyone wonders, Python's codec thinks that "close
to zero" isn't good enough, so it raises an exception instead)
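
For comparison, a small sketch of the strict behaviour using today's
Python 3 bytes.decode (the spelling differs from the 2000-era API, but
the outcome is the same idea):

```python
latin1_bytes = "café".encode("latin-1")   # 0xE9 is a lead byte with no trail bytes

try:
    latin1_bytes.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    # The strict codec refuses to guess.
    raised = True

# Leniency has to be asked for explicitly via an error handler.
replaced = latin1_bytes.decode("utf-8", errors="replace")
```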

tim:
> Dunno, but wouldn't be surprised if they had a notion of default encoding,
> and that it simply appears to be Latin-1 to us because American Windows uses
> a superset of Latin-1.

They have a system encoding, but it's not used here -- it's just
that Latin-1 is a subset of Unicode...
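
That subset relation is easy to check in Python 3: every byte value
decoded as Latin-1 yields the code point with the same number, which
is exactly what the unsigned char to unsigned short cast produces.

```python
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each Latin-1 byte maps to the Unicode code point of the same value.
identity = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))
```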

</F>
