tim wrote:
> > That doesn't help with regexes, of course, since a pattern might be
> > written as a regular string but be intended to match Unicode.  Maybe
> > the simplest rule is the best; always take 4 digits, even if it winds
> > up being incompatible with the \x in string literals.
>
> I vote for backward compatibility for now, and not only because that will
> irritate /F the most.

backward compatibility with what?  8-bit string literals or unicode
string literals?

the problem here is that the pattern is compiled once (from either
8-bit or unicode strings), and can then be used on either 8-bit or
unicode targets.

to be fully backwards compatible, this means that the compiler should
use 8 bits, no matter what string type you're using.

another solution would be to use the type of the pattern string to
choose between 8 and 16 bits.  I almost implemented that, before I
realized that it broke the following rather nice property:

    sre.compile("some pattern") == sre.compile(u"some pattern")

(well, the pattern type doesn't implement __cmp__, but you get
the idea).  the current implementation guarantees "==", but I'm
planning to change that to "is" (!).

anyway, I suspect it's too late to change this in 2.0b1.  if enough
people complain about this, we can always label it a "critical bug",
and do something about it in b2.

</F>
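
for comparison, here is a minimal sketch of how the same questions look in a current interpreter. it assumes Python 3's `re` module rather than the 2.0b1 `sre` discussed above, so it illustrates the trade-off and does not reproduce the 2.0 behaviour. all variable names are made up for the example.

```python
import re

# Assumption: Python 3, where "..." and u"..." are the same str type,
# so both calls compile the same pattern text.
p1 = re.compile("some pattern")
p2 = re.compile(u"some pattern")
same_pattern = p1.pattern == p2.pattern

# In a str pattern, \x consumes exactly two hex digits, so r"\x411"
# means "A" followed by a literal "1"; \u consumes exactly four.
m_x = re.fullmatch(r"\x411", "A1")
m_u = re.fullmatch(r"\u0041", "A")

# Python 3 no longer lets one compiled pattern serve both string types:
# a str pattern applied to a bytes target raises TypeError.
try:
    p1.search(b"some pattern")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(same_pattern, m_x is not None, m_u is not None, mixed_ok)
```

in other words, Python 3 settled the issue by tying the pattern to the type of the string it was compiled from, and by giving the two-digit and four-digit forms separate escapes.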