Fredrik Lundh wrote:
> > mal wrote:
> > >
> > > I may be being dense, but can you explain what's going on here:
> > >
> > > >>> u'\u00e3'.encode('latin-1')
> > > '\xe3'
> > > >>> u'\u00e3'.encode("latin-1").decode("latin-1")
> > > Traceback (most recent call last):
> > >   File "<input>", line 1, in ?
> > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > The string.decode() method will try to reuse the Unicode
> > codecs here. To do this, it will have to convert the string
> > to Unicode first, and this fails due to the character not being
> > in the ASCII range.
>
> can you take that again? shouldn't michael's example be
> equivalent to:
>
>     unicode(u"\u00e3".encode("latin-1"), "latin-1")
>
> if not, I'd argue that your "decode" design is broken, instead
> of just buggy...

Well, it is sort of broken, I agree.

The reason is that PyString_Encode() and PyString_Decode() guarantee
the returned object to be a string object. To be able to reuse Unicode
codecs, I added code which converts Unicode back to a string in case
the codec returns a Unicode object (which the .decode() method does).
This is what's failing.

Perhaps I should simply remove the restriction and have both APIs
return the codec's return object as-is ?!

(I would be in favour of this, but I'm not sure whether this is
already in use by someone...)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                    http://www.lemburg.com/python/
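[Editor's note: for readers following along today, the round-trip under discussion works as expected in Python 3, where bytes.decode() returns str directly and the intermediate ASCII coercion step that tripped up the early Python 2.0 str.decode() implementation no longer exists. A minimal sketch of the same example in modern Python:]

```python
# The same round-trip from the thread above, in Python 3.
# str.encode() yields bytes; bytes.decode() yields str directly,
# with no implicit ASCII conversion in between.
encoded = '\u00e3'.encode('latin-1')
print(repr(encoded))    # b'\xe3'

decoded = encoded.decode('latin-1')
print(repr(decoded))    # 'ã'

# The round-trip is lossless for characters covered by the codec.
assert decoded == '\u00e3'
```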