[Guido going ASCII]

Do you mean going ASCII all the way (using it for all aspects where
Unicode gets converted to a string and for cases where strings get
converted to Unicode), or just for some aspects of conversion, e.g.
just for the silent conversions from strings to Unicode?

[BTW, I'm pretty sure that the Latin-1 folks won't like ASCII for the
same reason they don't like UTF-8: it's simply an inconvenient way to
write strings in their favorite encoding directly in Python source
code. My feeling in this whole discussion is that it's more about
convenience than anything else. Still, it's very amusing ;-) ]

FYI, here's the conversion table of (potentially) all conversions done
by the implementation:

Python:
-------
string + unicode:       unicode(string,'utf-8') + unicode
string.method(unicode): unicode(string,'utf-8').method(unicode)
print unicode:          print unicode.encode('utf-8'); with stdout
                        redirection this can be changed to any
                        other encoding
str(unicode):           unicode.encode('utf-8')
repr(unicode):          repr(unicode.encode('unicode-escape'))

C (PyArg_ParseTuple):
---------------------
"s" + unicode:          same as "s" + unicode.encode('utf-8')
"s#" + unicode:         same as "s#" + unicode.encode('unicode-internal')
"t" + unicode:          same as "t" + unicode.encode('utf-8')
"t#" + unicode:         same as "t#" + unicode.encode('utf-8')

This affects all C modules and builtins. In case a C module wants to
receive a certain predefined encoding, it can use the new "es" and
"es#" parser markers.

Ways to enter Unicode:
----------------------
u'' + string            same as unicode(string,'utf-8')
unicode(string,encname) any supported encoding
u'...unicode-escape...' unicode-escape currently accepts Latin-1
                        chars as single-char input; using escape
                        sequences any Unicode char can be entered (*)
codecs.open(filename,mode,encname)
                        opens an encoded file for reading and
                        writing Unicode directly
raw_input() + stdin redirection
(see one of my earlier posts for code)
                        returns UTF-8 strings based on the input
                        encoding

IO:
---
open(file,'w').write(unicode)
        same as open(file,'w').write(unicode.encode('utf-8'))
open(file,'wb').write(unicode)
        same as open(file,'wb').write(unicode.encode('unicode-internal'))
codecs.open(file,'wb',encname).write(unicode)
        same as open(file,'wb').write(unicode.encode(encname))
codecs.open(file,'rb',encname).read()
        same as unicode(open(file,'rb').read(),encname)
stdin + stdout
        can be redirected using StreamRecoders to handle any
        of the supported encodings

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
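
For anyone who wants to try the codecs.open() rows of the IO table and the StreamRecoder remark, here is a minimal sketch. It is not the implementation discussed above: it is written for a current Python 3 interpreter, where str plays the role of the unicode type in the table, and the sample text, the file names and the Latin-1 choice for encname are assumptions made purely for illustration. Newer Python releases may flag codecs.open() as deprecated, but the call still works.

```python
import codecs
import io
import os
import tempfile

text = "Gr\u00fc\u00dfe"   # sample non-ASCII text (illustrative assumption)
encname = "latin-1"        # stands in for "any supported encoding"

with tempfile.TemporaryDirectory() as tmpdir:
    path_a = os.path.join(tmpdir, "a.txt")   # hypothetical file names
    path_b = os.path.join(tmpdir, "b.txt")

    # codecs.open(file,'wb',encname).write(unicode)
    with codecs.open(path_a, "wb", encname) as f:
        f.write(text)

    # ... same as open(file,'wb').write(unicode.encode(encname))
    with open(path_b, "wb") as f:
        f.write(text.encode(encname))

    with open(path_a, "rb") as f:
        raw_a = f.read()
    with open(path_b, "rb") as f:
        raw_b = f.read()

    # codecs.open(file,'rb',encname).read() decodes back to text
    with codecs.open(path_a, "rb", encname) as f:
        decoded = f.read()

# codecs.EncodedFile() returns a codecs.StreamRecoder: the caller writes
# UTF-8 bytes, the wrapped stream receives the same text as Latin-1.
buf = io.BytesIO()
recoder = codecs.EncodedFile(buf, data_encoding="utf-8", file_encoding=encname)
recoder.write(text.encode("utf-8"))
recoded = buf.getvalue()

print(raw_a == raw_b, decoded == text, recoded == text.encode(encname))
```

The same recoder idea is what the stdin/stdout redirection in the table relies on: the program keeps seeing one fixed encoding while the stream on the other side uses whatever encoding the environment needs.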