Guido van Rossum wrote:
> 
> Thinking about entering Japanese into raw_input() in IDLE more, I
> thought I figured a way to give Takeuchi a Unicode string when he
> enters Japanese characters.
> 
> I added an experimental patch to the readline method of the PyShell
> class: if the line just read, when converted to Unicode, has fewer
> characters but still compares equal (and no exceptions happen during
> this test) then return the Unicode version.
> 
> This doesn't currently work because the built-in raw_input() function
> requires that the readline() call it makes internally returns an 8-bit
> string.  Should I relax that requirement in general?  (I could also
> just replace __builtin__.[raw_]input with more liberal versions
> supplied by IDLE.)
> 
> I also discovered that the built-in unicode() function is not
> idempotent: unicode(unicode('a')) returns u'\000a'.  I think it should
> special-case this and return u'a' !

Good idea. I'll fix this in the next round.

> Finally, I believe we need a way to discover the encoding used by
> stdin or stdout.  I have to admit I know very little about the file
> wrappers that Marc wrote -- is it easy to get the encoding out of
> them?

I'm not sure what you mean: the name of the input encoding ?
Currently, only the names of the encoding and decoding functions
are available to be queried.

> IDLE should probably emulate this, as it's encoding is clearly
> UTF-8 (at least when using Tcl 8.1 or newer).

It should be possible to redirect sys.stdin/stdout using
the codecs.EncodedFile wrapper. Some tests show that raw_input()
doesn't seem to use the redirected sys.stdin though...

>>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1')
>>> s = raw_input()
äöü
>>> s
'\344\366\374'
>>> s = sys.stdin.read()
äöü
>>> s
'\303\244\303\266\303\274\012'

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
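
The recoding that the sys.stdin.read() call in the session above shows can be reproduced without a terminal. The following is only a minimal sketch, assuming a current Python 3 interpreter rather than the interpreter used in the session; it substitutes an in-memory io.BytesIO stream for the real sys.stdin, and the helper name recode_latin1_to_utf8 is hypothetical, introduced just for illustration.

```python
import codecs
import io


def recode_latin1_to_utf8(raw_bytes):
    """Read Latin-1 bytes through codecs.EncodedFile and return UTF-8 bytes.

    Hypothetical helper for illustration only. An io.BytesIO object stands
    in for the real sys.stdin used in the session above.
    """
    stream = io.BytesIO(raw_bytes)
    # Same argument order as in the session: data_encoding='utf-8' is what
    # the caller sees, file_encoding='latin-1' is what the stream holds.
    wrapped = codecs.EncodedFile(stream, 'utf-8', 'latin-1')
    return wrapped.read()


latin1_input = 'äöü\n'.encode('latin-1')
utf8_output = recode_latin1_to_utf8(latin1_input)

print(latin1_input)   # the bytes as they sit in the underlying stream
print(utf8_output)    # the bytes as seen through the wrapper
```

Reading through the wrapper yields the two-byte UTF-8 sequences for each character, which corresponds to the '\303\244\303\266\303\274\012' result in the session. The sketch says nothing about why raw_input() bypassed the redirected sys.stdin; it only illustrates the wrapper itself.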