Showing content from https://mail.python.org/pipermail/python-dev/2018-May/153613.html below:

[Python-Dev] Why aren't escape sequences in literal strings handled by the tokenizer?

Eric V. Smith eric at trueblade.com
Thu May 17 18:38:59 EDT 2018
On 5/17/2018 3:01 PM, Larry Hastings wrote:
> 
> 
> I fed this into tokenize.tokenize():
> 
>     b''' x = "\u1234" '''
> 
> I was a bit surprised to see \Uxxxx in the output.  Particularly because 
> the output (t.string) was a *string* and not *bytes*.

For those (like me) who have no idea how to use tokenize.tokenize's 
wacky interface, the test code is:

list(tokenize.tokenize(io.BytesIO(b''' x = "\u1234" ''').readline))
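
As a rough sketch of what that call shows, the snippet below tokenizes a simplified variant of the same input and looks only at the STRING token. The names `source` and `raw` are mine, and I use a raw bytes literal (plus a trailing newline, without the leading space) so the backslash reaches the tokenizer unchanged; those are assumptions for illustration, not part of Larry's original test.

```python
import io
import tokenize

# Raw bytes literal: the backslash-u sequence is passed through as-is
# instead of being treated as an (invalid) bytes escape.
source = rb'x = "\u1234"' + b"\n"

# tokenize.tokenize() wants a readline callable that returns bytes.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# Keep only the string-literal token and look at its text.
string_tokens = [t for t in tokens if t.type == tokenize.STRING]
raw = string_tokens[0].string

# The token text is a str, but it still holds the quotes and the
# unprocessed escape: 8 characters, not a 1-character string.
print(type(raw).__name__, len(raw), raw)
```

So the token text is the source spelling of the literal, decoded from bytes to str, with no escape processing applied.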

> Maybe I'm making a parade of my ignorance, but I assumed that string 
> literals were parsed by the parser--just like everything else is parsed 
> by the parser, hey it seems like a good place for it--and in particular 
> that the escape sequence substitutions would be done in the tokenizer.  
> Having stared at it a little, I now detect a whiff of "this design 
> solved a real problem".  So... what was the problem, and how does this 
> design solve it?

I assume the intent is to not throw away any information in the lexer, 
and give the parser full access to the original string. But that's just 
a guess.

> BTW, my use case is that I hoped to use CPython's tokenizer to parse 
> some Python-ish-looking text and handle double-quoted strings for me.  
> *Especially* all the escape sequences--leveraging all CPython's support 
> for funny things like \U{penguin}.  The current behavior of the 
> tokenizer makes me think it'd be easier to roll my own!

Can you feed the token text to the ast?

 >>> ast.literal_eval(r'"\u1234"')
'ሴ'
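
Putting the two pieces together, here is a minimal sketch of that idea: tokenize the text, then hand each STRING token to ast.literal_eval so that CPython does the escape handling. The helper name `string_values` and the sample input are hypothetical, and this assumes the text tokenizes as valid Python and that each string token is a complete literal (f-strings, for example, are tokenized differently on newer Python versions and are not covered here).

```python
import ast
import io
import tokenize


def string_values(source: bytes):
    """Yield the evaluated value of every string literal token in source."""
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type == tokenize.STRING:
            # tok.string is the literal as written (quotes, prefix, escapes);
            # literal_eval turns it into the value it denotes.
            yield ast.literal_eval(tok.string)


# Raw bytes so the escapes reach the tokenizer unprocessed.
sample = rb'x = "\u1234" + "\N{PENGUIN}" + "a\tb"' + b"\n"
values = list(string_values(sample))
print(values)
```

Note that the string passed to literal_eval has to contain the backslash itself (hence the raw literals); otherwise the escape is already resolved before ast ever sees it.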

Eric