"M.-A. Lemburg" <mal@lemburg.com> writes:

> Fredrik Lundh wrote:
> > can you take that again? shouldn't michael's example be
> > equivalent to:
> >
> >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> >
> > if not, I'd argue that your "decode" design is broken, instead
> > of just buggy...
>
> Well, it is sort of broken, I agree. The reason is that
> PyString_Encode() and PyString_Decode() guarantee the returned
> object to be a string object. To be able to reuse Unicode codecs
> I added code which converts Unicode back to a string in case the
> codec return an Unicode object (which the .decode() method does).
> This is what's failing.

It strikes me that if someone executes

    aString.decode("latin-1")

they're going to expect a unicode string. AIUI, what's currently
happening is that the string is converted from a latin-1 8-bit string
to the 16-bit unicode string I expected, and then there is an attempt
to convert it back to an 8-bit string using the default encoding. So
if I'd done a

    sys.setdefaultencoding("latin-1")

in my sitecustomize.py, then aString.decode("latin-1") would just be
aString again? This doesn't seem optimal.

> Perhaps I should simply remove the restriction and have both APIs
> return the codec's return object as-is ?! (I would be in favour of
> this, but I'm not sure whether this is already in use by someone...)

Are all the codecs distributed with Python 2.1 unicode-related? If
that's the case, PyString_Decode isn't terribly useful, is it? It seems
unlikely that it received much use. Could be wrong, of course.

OTOH, maybe I'm trying to wedge too much behaviour onto a particular
operation. Do we want

    open(file).read().decode("jpeg") -> some kind of PIL object

to be possible?

Cheers,
M.

-- 
GET *BONK* BACK *BONK* IN *BONK* THERE *BONK*
              -- Naich using the troll hammer in cam.misc
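For readers on a modern Python, the behaviour under discussion can be sketched in Python 3 terms. This is a minimal sketch under stated assumptions, not the Python 2.1 implementation: `str.decode` no longer exists in Python 3, so bytes and `codecs.decode()` stand in for it, and `decode_then_coerce` is a hypothetical helper that only models the "convert back to an 8-bit string using the default encoding" step described above. The last part shows that Python 3's `codecs.decode()` hands back whatever object the codec returns, which is roughly what "return the codec's return object as-is" would mean.

```python
import codecs

# 1. Fredrik's round-trip, spelled in Python 3: encoding text to latin-1
#    bytes and decoding with the same codec yields the original text.
original = "\u00e3"
raw = original.encode("latin-1")  # b'\xe3'
assert raw.decode("latin-1") == original


# 2. Hypothetical model of the Python 2.1 behaviour (not a real API):
#    the codec produces Unicode text, which is then coerced back to an
#    8-bit string using the interpreter's default encoding.
def decode_then_coerce(data: bytes, encoding: str, default_encoding: str) -> bytes:
    text = codecs.decode(data, encoding)  # what the codec actually returns: str
    return text.encode(default_encoding)  # the coercion back to 8-bit


# With an ASCII default encoding, the non-ASCII result cannot be coerced.
try:
    decode_then_coerce(raw, "latin-1", "ascii")
except UnicodeEncodeError:
    print("coercion to the default encoding failed")

# With a latin-1 default encoding, "decoding" just returns the input again.
assert decode_then_coerce(raw, "latin-1", "latin-1") == raw

# 3. Returning the codec's result as-is: a bytes-to-bytes codec such as
#    "hex" yields bytes, while a text codec yields str.
assert codecs.decode(b"e3", "hex") == b"\xe3"
assert isinstance(codecs.decode(raw, "latin-1"), str)
```

Python 3 keeps the type guarantee on the convenience methods instead: `bytes.decode()` only accepts text encodings, and arbitrary codecs go through `codecs.decode()`.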