>> Fix for next iteration of SF bug 115690 (Unicode headaches in
>> IDLE).
...

[Guido]
> I apologize, I should have explained when text.get() returns Unicode:
>
> Any string returned from Tcl/Tk that contains a byte with the 8th bit
> set is translated from UTF-8 into Unicode, unless the translation
> fails (in which case the original raw 8-bit string is returned as a
> fallback).

Except that's *why* it was muddy <wink>: in the specific case that popped
up in the bug, text.get() appeared to return a Unicode string of length 1
containing only a newline. No high-bit byte appeared to be involved.
However, that was an illusion I didn't unmask until later. All is clear
now.

> This *should* be correct because Tcl/Tk always uses UTF-8 internally.
> (Even though it is "lenient" when receiving strings -- if a sequence
> of characters has no valid Unicode representation, it appears to fall
> back to Latin-1; I don't know the details of this algorithm.)

Dunno, but wouldn't be surprised if they had a notion of default encoding,
and that it simply appears to be Latin-1 to us because American Windows
uses a superset of Latin-1. If BeOpen would like to buy me a version of
Chinese Windows, happy to lend it to you <wink>.

as-american-as-they-come-ly y'rs  - tim
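
For readers following along, the conversion rule Guido describes can be sketched roughly as below. This is only an illustration in modern Python, not the actual _tkinter code: the helper name from_tk is hypothetical, and bytes/str stand in for the 8-bit and Unicode string types of the time.

```python
def from_tk(raw: bytes):
    """Hypothetical helper mirroring the rule quoted above (not _tkinter itself)."""
    # No byte with the 8th bit set: hand back the 8-bit string unchanged.
    if all(b < 0x80 for b in raw):
        return raw
    try:
        # A high-bit byte is present: try to interpret the data as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Translation failed: fall back to the original raw 8-bit string.
        return raw


print(repr(from_tk(b"abc\n")))        # plain ASCII stays an 8-bit string
print(repr(from_tk(b"caf\xc3\xa9")))  # valid UTF-8 becomes a Unicode string
print(repr(from_tk(b"caf\xe9")))      # Latin-1 byte is not valid UTF-8: raw fallback
```

Under this rule a string of length 1 containing only a newline would never come back as Unicode on its own, which fits the remark that a high-bit byte had to be hiding somewhere.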