[Andrew Kuchling]
> ...
> This is for compatibility with Python string literals:
>
> kronos Python-1.6>./python
> >>> '\x00fffffff'
> '\377'
> >>> u'\x00fffffff'
> u'\uFFFF'
>
> (Where do these semantics come from, BTW? C's \x seems to take any
> number of hex digits but then reports an error if the character is
> greater than 256, too large to fit into a byte.)

The behavior of \x in C is mostly implementation-defined. The committee
knew that C had to do *something* to support "large characters" down the
road, but in those early days they had no clear idea exactly what. So,
rather than do something sensible <0.5 wink>, they invented a perfectly
general mechanism without portable semantics. "C itself" isn't
complaining if the character "is greater than 256"; it's the specific
implementation of C you're using that's complaining. A different
implementation is free to (& probably will!) do something different.

Guido adopted the most commonly implemented semantics (ignore all but the
last byte) in Python, apparently under the delusion that this would be a
Good Thing <wink>. Marc-Andre followed suit by generalizing this madness
to Unicode.

> Note that the \u escape for Unicode characters uses exactly 4 digits,
> no more, no less.

I pushed for that obnoxiously. Glad you appreciate it <wink>. Java does
the same.

> It would certainly be simpler and clearer to only support a fixed
> number of digits with \x, since I find the casting down behaviour is
> magical and not obvious.

Yes, it's basically nuts.

> But I don't know if we want to make that change now.

No from me, because it may break stuff. Wait for Python 2.0 <ahem>.

> (Guido now realizes the downside to numbering it 2.0, as everyone
> hurries to suggest their favorite backward-incompatible change.)

Guido always realized that, I believe. It's a "least of evils" kind of
thing, mixed with a celebration, not a pure win.
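
The two rules being contrasted above can be sketched in modern Python. This is a minimal illustration under stated assumptions, not CPython's actual tokenizer code: the helper names `legacy_x_escape` and `fixed_u_escape` are hypothetical, and since today's Python gives \x exactly two hex digits, the old greedy "keep only the last byte" behavior is emulated by hand. `width_bits=16` stands in for the 16-bit Unicode masking shown in the `u'\uFFFF'` transcript line.

```python
import string
from typing import Tuple

HEX_DIGITS = set(string.hexdigits)


def legacy_x_escape(digits: str, width_bits: int = 8) -> Tuple[int, int]:
    """Hypothetical emulation of the old greedy \\x rule.

    Consume every hex digit that follows the \\x, then keep only the low
    `width_bits` bits (8 for byte strings, 16 for the old Unicode strings).
    Returns (code point, number of digits consumed).
    """
    n = 0
    while n < len(digits) and digits[n] in HEX_DIGITS:
        n += 1
    if n == 0:
        raise ValueError("\\x escape with no hex digits")
    value = int(digits[:n], 16)
    # Silently discard the high bits: the "casting down" behavior.
    return value & ((1 << width_bits) - 1), n


def fixed_u_escape(digits: str) -> Tuple[int, int]:
    """Hypothetical sketch of the \\u rule: exactly four hex digits."""
    head = digits[:4]
    if len(head) != 4 or not all(c in HEX_DIGITS for c in head):
        raise ValueError("\\u escape needs exactly 4 hex digits")
    return int(head, 16), 4


if __name__ == "__main__":
    # The characters after the backslash-x in '\x00fffffff'.
    tail = "00fffffff"
    code, used = legacy_x_escape(tail)
    print(hex(code), used)               # prints: 0xff 9   (octal '\377')
    code, used = legacy_x_escape(tail, 16)
    print(hex(code), used)               # prints: 0xffff 9
    code, used = fixed_u_escape(tail)
    print(hex(code), repr(tail[used:]))  # prints: 0xff 'fffff'
```

The greedy rule swallows all nine digits and throws away the high bits, which is the "magical" casting down Andrew objects to. The fixed-width rule stops after four digits and leaves the trailing `fffff` as ordinary characters.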

> That doesn't help with regexes, of course, since a pattern might be
> written as a regular string but be intended to match Unicode. Maybe
> the simplest rule is the best; always take 4 digits, even if it winds
> up being incompatible with the \x in string literals.

I vote for backward compatibility for now, and not only because that will
irritate /F the most.