On Fri, Jun 30, 2000 at 04:18:13PM +0200, Fredrik Lundh wrote:
>    re.match('\\x00ffffffffffffff', '\377') != None
>
>or in other words, long hexadecimal escapes are cast
>down to 8-bit characters in RE.

This is for compatibility with Python string literals:

kronos Python-1.6>./python
>>> '\x00fffffff'
'\377'
>>> u'\x00fffffff'
u'\uFFFF'

(Where do these semantics come from, BTW?  C's \x seems to take any
number of hex digits but then reports an error if the character is
greater than 255, too large to fit into a byte.)

Note that the \u escape for Unicode characters uses exactly 4 digits,
no more, no less.  It would certainly be simpler and clearer to
support only a fixed number of digits with \x, since I find the
casting-down behaviour magical and not obvious.  But I don't know if
we want to make that change now.  (Guido now realizes the downside to
numbering it 2.0, as everyone hurries to suggest their favorite
backward-incompatible change.)

That doesn't help with regexes, of course, since a pattern might be
written as a regular string but be intended to match Unicode.  Maybe
the simplest rule is the best: always take 4 digits, even if it winds
up being incompatible with the \x in string literals.

--amk
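
To make the two rules discussed above concrete, here is a rough sketch in present-day Python. It is not the code from the string-literal parser or from the regex engine; the helper names parse_x_greedy and parse_x_fixed are hypothetical and exist only for illustration. The first helper consumes every hex digit after \x and keeps only the low bits (8 for a byte string, 16 for a Unicode string), which reproduces the results shown in the transcript. The second always takes exactly 4 digits, as proposed for regexes.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_x_greedy(text, bits=8):
    """Hypothetical helper: consume all hex digits after a backslash-x
    prefix and keep only the low `bits` bits of the value."""
    if not text.startswith("\\x"):
        raise ValueError("expected a \\x escape")
    end = 2
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    if end == 2:
        raise ValueError("no hex digits after \\x")
    # Masking is the "cast down" step: high-order digits are discarded.
    value = int(text[2:end], 16) & ((1 << bits) - 1)
    return value, text[end:]


def parse_x_fixed(text, digits=4):
    """Hypothetical helper: consume exactly `digits` hex digits after a
    backslash-x prefix and leave the rest of the text untouched."""
    if not text.startswith("\\x"):
        raise ValueError("expected a \\x escape")
    chunk = text[2:2 + digits]
    if len(chunk) != digits or any(c not in HEX_DIGITS for c in chunk):
        raise ValueError("expected exactly %d hex digits" % digits)
    return int(chunk, 16), text[2 + digits:]


if __name__ == "__main__":
    escape = r"\x00fffffff"
    print(parse_x_greedy(escape, bits=8))
    print(parse_x_greedy(escape, bits=16))
    print(parse_x_fixed(escape))
```

Under the fixed-width rule the same input yields character 0xFF followed by five literal "f" characters, which is the incompatibility with string literals mentioned above.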