[Moshe Zadka]
> ...
> I'd much prefer Python to reflect a fundamental truth about Unicode,
> which at least makes sure binary-goop can pass through Unicode and
> remain unharmed, than to reflect a nasty problem with UTF-8 (not
> everything is legal).

Then you don't want Unicode at all, Moshe. All the official encoding schemes for Unicode 3.0 suffer illegal byte sequences. For example, 0xffff is illegal in UTF-16 (whether BE or LE); this isn't merely a matter of Unicode not yet having assigned a character to this position, it's that the standard explicitly makes this sequence illegal and guarantees it will always be illegal! The other place this comes up is with surrogates, where what's legal depends on both parts of a character pair; and, again, the illegalities here are guaranteed illegal for all time.

UCS-4 is the closest Unicode encodings get to binary transparency, but even there the length of a thing is constrained to be a multiple of 4 bytes. Unicode and binary goop will never coexist peacefully.
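
A minimal sketch of the point, assuming a current Python 3 and its standard codecs (the helper name is_decodable is made up for illustration and is not part of any library). It feeds a few pieces of "binary goop" to the strict decoders and reports whether they survive. The 0xffff case is left out on purpose, since codec implementations differ in how they treat noncharacters.

```python
def is_decodable(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a legal byte sequence in `encoding`.

    Hypothetical helper for illustration: it just runs the standard
    strict decoder and reports whether it raised.
    """
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Arbitrary bytes paired with the encoding they are (wrongly) claimed to be in.
samples = [
    (b"\xff\xfe\xfd", "utf-8"),          # 0xff can never start a UTF-8 sequence
    (b"\x00\xd8\x41\x00", "utf-16-le"),  # high surrogate followed by 'A'
    (b"\x00\xdc", "utf-16-le"),          # low surrogate with no partner
    (b"\x3d\xd8\x00\xde", "utf-16-le"),  # a well-formed surrogate pair
    (b"\x00\x00\x00", "utf-32-be"),      # length is not a multiple of 4
    (b"\x00\x00\x00\x41", "utf-32-be"),  # a single 'A'
]

for data, encoding in samples:
    print(encoding, data, is_decodable(data, encoding))
```

None of the three encodings accepts every byte string, which is why arbitrary binary data cannot be round-tripped through them unchanged.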