Guido van Rossum wrote:
>
> > Let me tell you why you would want to have an encoding
> > which can be set:
> >
> > (1) say I am on a Japanese Windows box, I have a
> > string called 'address' and I do 'print address'. If
> > I see utf8, I see garbage. If I see Shift-JIS, I see
> > the correct Japanese address. At this point in time,
> > utf8 is an interchange format but 99% of the world's
> > data is in various native encodings.
> >
> > Analogous problems occur on input.
> >
> > (2) I'm using htmlgen, which 'prints' objects to
> > standard output. My web site is supposed to be
> > encoded in Shift-JIS (or EUC, or Big 5 for Taiwan,
> > etc.) Yes, browsers CAN detect and display UTF8 but
> > you just don't find UTF8 sites in the real world - and
> > most users just don't know about the encoding menu,
> > and will get pissed off if they have to reach for it.
> >
> > Ditto for streaming output in some protocol.
> >
> > Java solves this (and we could too by hacking stdout)
> > using Writer classes which are created as wrappers
> > around an output stream and can take an encoding, but
> > you lose the flexibility to 'just print'.
> >
> > I think being able to change encoding would be useful.
> > What I do not want is to auto-detect it from the
> > operating system when Python boots - that would be a
> > portability nightmare.
>
> You almost convinced me there, but I think this can still be done
> without changing the default encoding: simply reopen stdout with a
> different encoding. This is how Java does it. I/O streams with an
> encoding specified at open() are a very powerful feature. You can
> hide this in your $PYTHONSTARTUP.

True, and it probably covers all cases where setting the
default encoding to something other than UTF-8 makes sense.

I guess you've convinced me there ;-)

The current proposal has wrappers around streams for this purpose:

  For explicit handling of Unicode using files, the unicodec module
  could provide stream wrappers which provide transparent
  encoding/decoding for any open stream (file-like object):

    import unicodec
    file = open('mytext.txt','rb')
    ufile = unicodec.stream(file,'utf-16')
    u = ufile.read()
    ...
    ufile.close()

  XXX unicodec.file(<filename>,<mode>,<encname>) could be provided as
      short-hand for unicodec.stream(open(<filename>,<mode>),<encname>)
      which also assures that <mode> contains the 'b' character when
      needed.

The above can be done using:

    import sys,unicodec
    sys.stdin = unicodec.stream(sys.stdin,'jis')
    sys.stdout = unicodec.stream(sys.stdout,'jis')

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    50 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
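
As a rough illustration of the stream-wrapper idea discussed above, here is a minimal sketch. It assumes the standard library's codecs module as a stand-in for the proposed unicodec module; wrap_stream() is a hypothetical helper named for this example only and is not part of the proposal, and an in-memory buffer takes the place of a real file or stdout.

```python
import codecs
import io


def wrap_stream(stream, encoding):
    """Hypothetical stand-in for the proposed unicodec.stream().

    Wraps a binary file-like object so that write() encodes text and
    read() decodes bytes using the given encoding.
    """
    reader = codecs.getreader(encoding)
    writer = codecs.getwriter(encoding)
    return codecs.StreamReaderWriter(stream, reader, writer)


# An in-memory binary buffer stands in for a file opened in 'rb'/'wb' mode.
raw = io.BytesIO()

# Writing text through the wrapper stores Shift-JIS bytes in the buffer.
ufile = wrap_stream(raw, "shift_jis")
ufile.write("東京都")
encoded = raw.getvalue()

# Reading back through a wrapper with the same encoding recovers the text.
raw.seek(0)
decoded = wrap_stream(raw, "shift_jis").read()
```

The same pattern should apply to standard output by wrapping its underlying binary stream, which is roughly what the sys.stdout reassignment in the message does with the proposed API.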