On 14.06.13 23:03, PJ Eby wrote:

> On Fri, Jun 14, 2013 at 2:11 PM, Ron Adam <ron3200 at gmail.com> wrote:
>>
>> On 06/14/2013 10:36 AM, Guido van Rossum wrote:
>>>
>>> Not a bug. The same is done for file input -- CRLF is changed to LF before
>>> tokenizing.
>>
>> Should this be the same?
>>
>> python3 -c 'print(bytes("""\r\n""", "utf8"))'
>> b'\r\n'
>>
>> >>> eval('print(bytes("""\r\n""", "utf8"))')
>> b'\n'
>
> No, but:
>
> eval(r'print(bytes("""\r\n""", "utf8"))')
>
> should be. (And is.)
>
> What I believe you and Walter are missing is that the \r\n in the eval
> strings are converted early if you don't make the enclosing string
> raw. So what you're eval-ing is not what you think you are eval-ing,
> hence the confusion.

I expected that eval()ing a string that contains the characters

   U+0027: APOSTROPHE
   U+0027: APOSTROPHE
   U+0027: APOSTROPHE
   U+000D: CR
   U+000A: LF
   U+0027: APOSTROPHE
   U+0027: APOSTROPHE
   U+0027: APOSTROPHE

to return a string containing the characters:

   U+000D: CR
   U+000A: LF

Making the string raw, of course, turns it into:

   U+0027: APOSTROPHE
   U+0027: APOSTROPHE
   U+0027: APOSTROPHE
   U+005C: REVERSE SOLIDUS
   U+0072: LATIN SMALL LETTER R
   U+005C: REVERSE SOLIDUS
   U+006E: LATIN SMALL LETTER N
   U+0027: APOSTROPHE
   U+0027: APOSTROPHE
   U+0027: APOSTROPHE

and eval()ing that does indeed give "\r\n" as expected.

Hmm, it seems that codecs.unicode_escape_decode() does what I want:

   >>> codecs.unicode_escape_decode("\r\n\\r\\n\\x0d\\x0a\\u000d\\u000a")
   ('\r\n\r\n\r\n\r\n', 26)

Servus,
   Walter
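
For reference, the behaviour discussed in this message can be reproduced with a short script. This is only a sketch of the three cases above, not anything from the thread itself: the variable names are made up for illustration, and it assumes a Python 3 interpreter that normalizes CRLF to LF when tokenizing source passed to eval(), as described earlier in the thread.

```python
import codecs

# Non-raw literal: Python converts \r\n to a real CR LF pair while building
# this string, so eval() receives source text with a literal CRLF in it.
cooked_source = '"""\r\n"""'

# Raw literal: eval() receives the four characters backslash, r, backslash, n
# and resolves the escapes itself.
raw_source = r'"""\r\n"""'

cooked_result = eval(cooked_source)  # CRLF in the source is read as LF
raw_result = eval(raw_source)        # escapes are decoded to CR LF

print(repr(cooked_result))  # '\n'
print(repr(raw_result))     # '\r\n'

# unicode_escape_decode() leaves literal CR LF alone and decodes the escapes.
decoded, consumed = codecs.unicode_escape_decode("\r\n\\r\\n")
print(repr(decoded), consumed)
```

The difference between the first two results comes only from which layer sees the backslash escapes: the outer string literal or the source handed to eval().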