Just van Rossum wrote:
> M.-A. Lemburg wrote:
>
>>Just van Rossum wrote:
>>
>>>Now that PEP 263 is in place (yet hotly debated on c.l.py ;-),
>>>wouldn't it be a fairly small step to fully support unicode strings
>>>in compile(), eval() and exec? I notice these still attempt to
>>>convert unicode to 8 bit with the default encoding, which isn't
>>>very useful.
>>
>>Patches are most welcome.
>
> Some guidance on where to look is more than welcome.

The tokenizer/compiler works as follows (quote from another email):

"""
source code using encoding ENC
-> via codec for ENC into Unicode
-> via UTF-8 codec into UTF-8 string
-> tokenizer
-> compiler

for 8-bit string literals in the source code
-> UTF-8 string is converted back into encoding ENC

Provided that the encoding ENC is roundtrip safe for all 256 base
character ordinals, 8-bit strings will turn out as-is in the compiled
byte code.
"""

Now, to accept Unicode it would probably be worthwhile to hook into
this chain at step 2 (the Unicode stage) rather than at step 1 (the
encoded source). The tokenizer code is in Parser/tokenizer.c and the
compiler code in Python/compile.c. However, this is difficult because
most APIs for compiling code are built on char* buffers.

A short-term solution would probably be to convert Unicode to UTF-8
and prepend a UTF-8 BOM so that the tokenizer knows it is getting
UTF-8. I haven't tested this, though.

A slightly better solution (on narrow Unicode Python builds) would be
to use UTF-16 for this. The UTF-16 support in the tokenizer would have
to be enabled first, though. It is currently disabled for some reason
I don't remember. Martin should know... but he's on vacation.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Software directly from the Source  (#1, Feb 09 2003)
>>> Python/Zope Products & Consulting ...         http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
Python UK 2003, Oxford:                                   51 days left
EuroPython 2003, Charleroi, Belgium:                     135 days left
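For illustration only, here is a small sketch of two ideas from the message: the roundtrip condition on ENC from the quoted conversion chain, and the untested "UTF-8 plus BOM" workaround. It assumes modern Python 3 semantics (bytes vs. str), not the Python 2.x of early 2003, and in Python 3 compile() already accepts str directly, so the second helper merely exercises the bytes code path of the tokenizer. The helper names roundtrips_all_bytes and compile_unicode are hypothetical, not existing APIs.

```python
import codecs


def roundtrips_all_bytes(enc):
    """Check the condition from the quoted chain: every one of the 256
    byte values survives ENC -> Unicode -> UTF-8 -> Unicode -> ENC."""
    for b in range(256):
        raw = bytes([b])
        try:
            text = raw.decode(enc)          # ENC -> Unicode
        except UnicodeDecodeError:
            return False                    # some byte has no mapping in ENC
        utf8 = text.encode("utf-8")         # Unicode -> UTF-8 (tokenizer input)
        if utf8.decode("utf-8").encode(enc) != raw:  # back into ENC
            return False
    return True


def compile_unicode(source, filename="<string>", mode="exec"):
    """Hypothetical wrapper for the short-term workaround: encode the
    Unicode source as UTF-8 and prepend a UTF-8 BOM so the tokenizer
    treats the byte buffer as UTF-8."""
    data = codecs.BOM_UTF8 + source.encode("utf-8")
    return compile(data, filename, mode)


if __name__ == "__main__":
    print(roundtrips_all_bytes("latin-1"))  # True: all 256 bytes map 1:1
    print(roundtrips_all_bytes("ascii"))    # False: bytes >= 0x80 fail
    ns = {}
    exec(compile_unicode('s = "caf\u00e9"\n'), ns)
    print(ns["s"] == "caf\u00e9")           # True
```

Latin-1 passes the roundtrip check because it maps each byte to the code point with the same ordinal, which is exactly the property the quoted explanation requires for 8-bit literals to come out as-is.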