M.-A. Lemburg wrote:

> Now, to accept Unicode it would probably be worthwhile hooking
> into this chain at step 2 rather than step 1 (the code for the
> tokenizer is in Parser/tokenizer.c, the compiler code in
> Python/compiler.c), however, this is difficult because most
> APIs for compiling code are built on char* buffers.
>
> A short-term solution would probably be to convert Unicode to
> UTF-8 and prepend a UTF-8 BOM mark so that the tokenizer
> knows that it is getting UTF-8. Haven't tested this though.

Hm. What I'm looking into now is to simply define a PyCompilerFlags
flag called PyCF_SOURCE_IS_UTF8. eval() and compile() will then
convert a unicode string to utf-8 and set this flag. This seems a
very low-impact solution. Does this make sense?

Just
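P.S. In code, the idea would be roughly this (just a sketch to show
the shape of it; the helper name compile_unicode_source is made up,
and the exact entry point would depend on where this hooks into the
eval()/compile() path):

    #include "Python.h"

    /* Sketch: encode a Unicode source object to UTF-8, set the
     * proposed PyCF_SOURCE_IS_UTF8 flag, and hand the resulting
     * char* buffer to the existing compilation chain unchanged. */
    static PyObject *
    compile_unicode_source(PyObject *source, const char *filename,
                           int start)
    {
        PyCompilerFlags cf;
        PyObject *utf8, *code;

        /* Unicode -> UTF-8 bytes; the tokenizer keeps seeing char* */
        utf8 = PyUnicode_AsUTF8String(source);
        if (utf8 == NULL)
            return NULL;

        /* Tell the compiler the buffer is known to be UTF-8 */
        cf.cf_flags = PyCF_SOURCE_IS_UTF8;

        code = Py_CompileStringFlags(PyString_AS_STRING(utf8),
                                     filename, start, &cf);
        Py_DECREF(utf8);
        return code;
    }

The point is that nothing below Py_CompileStringFlags() has to learn
about Unicode buffers; only the flag travels down.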