On 29 Sep 2010, at 00:22, "Martin v. Löwis" <martin at v.loewis.de> wrote:

>> I certainly wouldn't be opposed to an API that accepts a string as well
>> though.
>
> Notice that this can't really work for Python 2 source code (but of
> course, it doesn't need to).
>
> In Python 2, if you have a string literal in the source code, you need
> to know the source encoding in order to get the bytes *back*. Now,
> if you parse a Unicode string as source code, and it contains byte
> string literals, you wouldn't know what encoding to apply.
>
> Fortunately, Python 3 byte literals ban non-ASCII literal characters,
> so assuming an ASCII-compatible encoding for the original source is
> fairly safe.
>

The new API couldn't be ported to Python 2 *anyway*. As Nick pointed out,
the underlying tokenization happens on decoded strings - so starting with
source as Unicode will be fine.

Michael

> Regards,
> Martin
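
A minimal sketch of the two points made in this exchange, assuming Python 3.8 or later (where `tokenize.generate_tokens()` is documented as accepting a `readline` callable that returns `str`). The helper names `tokens_from_text` and `bytes_literal_compiles` are hypothetical and chosen only for illustration; this is not the API under discussion in the thread.

```python
import io
import tokenize


def tokens_from_text(source):
    """Hypothetical helper: tokenize source that is already a decoded str."""
    # generate_tokens() reads str lines, so no source encoding has to be
    # detected or guessed before tokenizing.
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


def bytes_literal_compiles(source):
    """Hypothetical helper: True if the expression compiles, else False."""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


# "\u00f6" is the o-umlaut character, written as an escape so that this
# sketch itself stays pure ASCII.
tokens = tokens_from_text('name = "L\u00f6wis"\n')

# An ASCII-only bytes literal compiles; a bytes literal containing a
# literal non-ASCII character is rejected by the Python 3 compiler.
ascii_ok = bytes_literal_compiles('b"abc"')
non_ascii_ok = bytes_literal_compiles('b"L\u00f6wis"')
```

Here `tokens` includes the string literal as a `STRING` token without any encoding step, while `non_ascii_ok` is false because the compiler raises `SyntaxError` for the non-ASCII bytes literal.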