Neil Schemenauer wrote:
> Forgive me if I'm being obtuse, but I'm trying to understand the
> overall Python unicode design.  This works:
>
>     >>> sys.getdefaultencoding()
>     'utf-8'
>     >>> str(A())
>     '\xe1\x88\xb4'

Ah, ok, so you have changed sys.getdefaultencoding on your
installation. Doing so means that some programs will run only on your
installation and not on others (e.g. mine). One shouldn't change the
default encoding away from ASCII except to work around buggy
applications which would fail because of their unicode-unawareness.

> Can you be more specific about what is incorrect with the above
> class?

In the default installation, it gives a UnicodeEncodeError.

>>No. In some cases, str() needs to compromise, where unicode()
>>doesn't.
>
> Sorry, I don't understand that statement.  Are you saying that we
> will eventually get rid of __str__ and only have __unicode__?

No. Eventually, when strings are Unicode objects, the string
conversion function will return such a thing. Whether this will be
called __str__, __unicode__, or __string__, I don't know. However,
this won't happen until Python 3, and it is not clear to me how it
will look. We may also need a conversion routine into byte strings.

> If only we could. :-)  Seriously though, I'm trying to understand
> the point of __unicode__.  To me it seems to make the transition to
> unicode string needlessly more complicated.

Why do you say that? You don't *have* to implement __unicode__ if you
don't need it - just as you don't have to implement __len__ or
__nonzero__: If your class is fine with the standard "non-None is
true", implement neither. If you conceptually have a sequence type,
implement __len__ for "empty is false". If you have a more unusual
class, implement __nonzero__ for "I decide what false is".

Likewise, if you are happy with the standard '<Foo instance>',
implement neither __str__ nor __unicode__. If your class has a
canonical byte string representation, implement __str__. If this byte
string representation is not meaningful ASCII, and if a more
meaningful string representation using other Unicode characters would
be possible, also implement __unicode__.

Never rely on the default encoding being something other than ASCII,
though. Eventually, when strings are Unicode objects, you won't be
able to change it.

Regards,
Martin
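
A minimal sketch of the advice above, assuming a hypothetical class
`Letter` that does not appear in the thread. It keeps `__str__`
ASCII-only, puts the full text form in `__unicode__`, and encodes
explicitly instead of relying on the default encoding. The character
U+1234 is used because its UTF-8 encoding is the byte string quoted
above. The sketch calls `__unicode__()` directly so that it also runs on
interpreters that have no `unicode()` builtin; on Python 2,
`unicode(letter)` would invoke the same method.

```python
class Letter(object):
    """Hypothetical example class, not taken from the thread."""

    def __init__(self, char):
        self.char = char  # a single Unicode character, e.g. u'\u1234'

    def __unicode__(self):
        # Full text form; Python 2's unicode(obj) calls this method.
        return self.char

    def __str__(self):
        # ASCII-only form, valid whatever the default encoding is.
        return 'U+%04X' % ord(self.char)


letter = Letter(u'\u1234')
text = letter.__unicode__()

# Encoding explicitly never consults sys.getdefaultencoding().
utf8_bytes = text.encode('utf-8')

# The ASCII codec cannot represent U+1234, which is the failure that a
# default (ASCII) installation reports for implicit conversions.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

With this split, `str(letter)` is safe on any installation, and callers
that want the real character ask for the Unicode form and choose the
encoding themselves.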