Martin v. Löwis wrote:
>>> The ability to change the default encoding is a misfeature. There's
>>> essentially no way to write correct Python code in the presence of
>>> this feature.
>> How so? If every single piece of text in your project is encoded in a
>> superset of ascii (such as utf-8), why would this be a problem?

I guess I should have said "every single piece of text in your project is
encoded in a superset of ascii (such as utf-8) or is decoded into a unicode
object at the application boundaries, such as an incoming http request or in
the process of parsing a file off disk", in which case:

> What is "every single piece of text"? Every string occurring in source
> code?

Yes.

> or also every single string that may be read from a file,

Yes.

> a
> socket,

Yes.

> out of a database,

Yes.

> or from a user interface?

Yes. Any others I can say Yes to? ;-)

> How can you be certain that any string is UTF-8 when doing any
> reasonable IO?

Careful checking, and an understanding among the people working on the
app's development that anything else will result in severe pain, both
physical and mental ;-)

>> Even if you were evil/stupid and mixed encodings, surely all you'd get
>> is different unicode errors or maybe the odd strange character during
>> display?
>
> One specific problem is dictionaries will stop working correctly if you
> set the default encoding to anything but ASCII.

...except they haven't.

> The reason is that
> with UTF-8 as the default encoding, you get
>
> py> u"\u20ac" == u"\u20ac".encode("utf-8")
> True
> py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
> False
>
> So objects that compare equal will not hash equal. As a consequence, you
> may have two different values for what should be the same key in a
> dictionary.

Indeed, but this doesn't happen because the app never has a situation
where strings and unicodes are put in the same dict.
However, it does have plenty of situations where lists containing a
mixture of utf-8 encoded strings and unicodes exist, where changing the
default encoding removes a *lot* of pain.

> It has worked in your application. See my example above: it is very easy
> to create applications that stop working correctly if you use
> setdefaultencoding (at all - the only supported value is "latin-1",
> since Unicode strings hash the same as byte strings if all characters
> are in row 0).

Would anyone object if I added this snippet to the .rst that generates:

http://docs.python.org/library/sys.html

It doesn't seem to be recorded anywhere that anyone likely to use
setdefaultencoding would find it...

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
           - http://www.simplistix.co.uk
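
For anyone wanting to poke at the dictionary point quoted above without a
Python 2 interpreter, here is a rough Python 3 model of it. This is only a
sketch: Text, default_encoding and to_text are hypothetical names made up for
the illustration, with Text standing in for a Python 2 unicode object and
default_encoding standing in for whatever sys.setdefaultencoding() was given.
It also shows the "decode at the application boundaries" approach for
comparison.

```python
# Python 3 model of the Python 2 behaviour quoted above.
# Text, default_encoding and to_text are hypothetical names for this sketch.

class Text:
    """Stand-in for a Python 2 unicode object."""

    # Plays the role of sys.getdefaultencoding() in Python 2.
    default_encoding = "ascii"

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Text):
            return self.value == other.value
        if isinstance(other, bytes):
            # Python 2 implicitly decoded the byte string with the default
            # encoding before comparing; undecodable bytes compared unequal.
            try:
                return self.value == other.decode(self.default_encoding)
            except UnicodeDecodeError:
                return False
        return NotImplemented

    def __hash__(self):
        # Hash of the text itself; nothing ties it to the hash of the
        # UTF-8 encoded bytes.
        return hash(self.value)


def to_text(item, encoding="utf-8"):
    """Decode byte strings at the boundary; pass Text through unchanged."""
    if isinstance(item, bytes):
        return Text(item.decode(encoding))
    return item


euro = Text("\u20ac")
raw = "\u20ac".encode("utf-8")

# With the ASCII default, the two objects are simply unequal.
equal_under_ascii = (euro == raw)

# Model setdefaultencoding("utf-8"): now they compare equal.
Text.default_encoding = "utf-8"
equal_under_utf8 = (euro == raw)

# Equal objects with unrelated hashes end up as separate dict keys.
mixed = {raw: "from bytes"}
mixed[euro] = "from text"

# Decoding every key at the boundary leaves a single key.
normalised = {}
for key, value in [(raw, "from bytes"), (euro, "from text")]:
    normalised[to_text(key)] = value

print(equal_under_ascii, equal_under_utf8, len(mixed), len(normalised))
```

In this model the two objects compare equal once the default is "utf-8", yet
mixed still holds two entries, while normalised holds one. Real Python 2
differs in detail (for example, the failed comparison under the ASCII default
also emits a UnicodeWarning), so treat this as an illustration rather than a
reproduction.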