On 18/06/2012 00:55, Nick Coghlan wrote:
> On Mon, Jun 18, 2012 at 6:41 AM, Guido van Rossum <guido at python.org> wrote:
>> Would it make sense to detect and reject these in 3.3 if the 2.7 syntax is
>> used?
>
> Possibly - I'm trying not to actually *change* any of the internals of
> the string literal processing, though. (If I recall the way we
> implemented the change correctly, by the time we get to processing the
> string contents, we've forgotten which specific prefix was used)
>
> However, this question did remind me of another detail I wanted to
> check after realising this discrepancy existed: it turns out this
> semantic inconsistency already arises if you use "from __future__
> import unicode_literals" to get supposedly "Python 3 style" string
> literals in 2.x
>
> Python 2.7.3 (default, May 29 2012, 14:54:22)
>>>> from __future__ import unicode_literals
>>>> print(r"\u03b3")
> γ
>>>> print("\u03b3")
> γ
>
> Python 3.2.1 (default, Jul 11 2011, 18:54:42)
>>>> print(r"\u03b3")
> \u03b3
>>>> print("\u03b3")
> γ
>
> So, perhaps the answer is to leave this as is, and try to make 2to3
> smart enough to detect such escapes and replace them with their
> properly encoded (according to the source code encoding) Unicode
> equivalent?

What if it's not possible to encode that character? I suppose that it
could be expanded into a string expression so that a non-raw string
literal could be used, possibly using implicit concatenation,
parenthesised, if necessary (or always?).
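To make that expansion concrete, here is a minimal Python 3 sketch. The
helper name expand_raw_literal is hypothetical and is not part of 2to3. It
assumes a single-line, double-quoted literal body, and it assumes that 2.x
treats a "\u" or "\U" escape in a raw Unicode string as active only when the
backslash run leading up to it has odd length. It parenthesises whenever more
than one piece results.

```python
import re

# A \uXXXX or \UXXXXXXXX escape whose introducing backslash is not itself
# escaped: any backslashes before it must come in pairs (assumed 2.x rule).
_ESCAPE = re.compile(
    r"(?<!\\)(?:\\\\)*(\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})"
)


def expand_raw_literal(body):
    """Hypothetical helper: rewrite the body of a 2.x raw Unicode literal
    as a 3.x expression built from implicitly concatenated literals.

    Text outside the escapes stays in raw literals; each escape moves into
    a non-raw literal so that Python 3 will interpret it.
    """
    parts = []
    pos = 0
    for match in _ESCAPE.finditer(body):
        start = match.start(1)
        if start > pos:
            # Ordinary text (including paired backslashes) stays raw.
            parts.append('r"%s"' % body[pos:start])
        # The escape itself goes into a non-raw literal.
        parts.append('"%s"' % match.group(1))
        pos = match.end(1)
    if pos < len(body):
        parts.append('r"%s"' % body[pos:])
    if not parts:
        return 'r""'
    if len(parts) == 1:
        return parts[0]
    return "(" + " ".join(parts) + ")"
```

For the 2.x literal body \d+\u03b3\n this produces the expression
(r"\d+" "\u03b3" r"\n"), which Python 3 evaluates to the same text that 2.x
produced, without needing the source encoding to be able to represent γ.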
> After all, that's already the way to include such characters in a
> forward compatible way when using the future import:
>
> Python 2.7.3 (default, May 29 2012, 14:54:22)
>>>> from __future__ import unicode_literals
>>>> print("γ")
> γ
>>>> print(r"γ\n")
> γ\n
>
> Python 3.2.1 (default, Jul 11 2011, 18:54:42)
>>>> print("γ")
> γ
>>>> print(r"γ\n")
> γ\n
>
> So, rather than going ahead with reverting "ur" support as I first
> suggested (since it turns out that's not a *new* problem, but just a
> different way of spelling an *existing* problem), how about I do the
> following:
>
> 1. Add a note to PEP 414 and the Py3k porting guide regarding the
> discrepancy in escaping semantics for raw Unicode strings between 2.x
> and 3.x
> 2. Reject the tracker issue for reverting the ur support (the semantic
> problem already exists, and any solution we come up with for
> __future__.unicode_literals should handle the ur prefix as well)
> 3. Create a new feature request for 2to3 to see if it can
> automatically handle the problem of translating "\u" and "\U" escapes
> into properly encoded Unicode characters
>
> The scope of the problem is really quite small: you have to be using a
> raw Unicode string in 2.x (either via the string prefix, or the future
> import) *and* using a "\u" or "\U" escape within that string.
>
[snip]
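For comparison, the translation described in point 3 might look roughly like
the following Python 3 sketch. The names inline_escapes and source_encoding
are hypothetical and do not correspond to anything in 2to3. It makes the same
assumption about which escapes 2.x treats as active (an odd-length backslash
run before the "u" or "U"), and it leaves an escape untouched when the
character cannot be encoded in the source encoding, which is exactly the case
that would need some other treatment.

```python
import re

# Group 1: paired (escaped) backslashes that must be preserved.
# Group 2: the escape payload, e.g. "u03b3" or "U0001F600".
_ESCAPE = re.compile(
    r"(?<!\\)((?:\\\\)*)\\(u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})"
)


def inline_escapes(body, source_encoding="utf-8"):
    """Hypothetical helper: replace active \\u and \\U escapes in the body
    of a 2.x raw Unicode literal with the characters they denote, provided
    the source encoding can represent them.
    """
    def replace(match):
        try:
            char = chr(int(match.group(2)[1:], 16))
            # Check that the character survives the source encoding.
            char.encode(source_encoding)
        except ValueError:
            # Covers out-of-range code points and UnicodeEncodeError
            # (a ValueError subclass): keep the escape as written.
            return match.group(0)
        return match.group(1) + char

    return _ESCAPE.sub(replace, body)
```

With a UTF-8 source file the body \u03b3\n becomes γ\n, matching the
forward-compatible spelling shown in the quoted sessions. With an ASCII
source file the escape is left alone, so the literal would still behave
differently between 2.x and 3.x.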