Michael Hudson wrote:
>
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
> > Fredrik Lundh wrote:
> > > can you take that again?  shouldn't michael's example be
> > > equivalent to:
> > >
> > >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > >
> > > if not, I'd argue that your "decode" design is broken, instead
> > > of just buggy...
> >
> > Well, it is sort of broken, I agree. The reason is that
> > PyString_Encode() and PyString_Decode() guarantee the returned
> > object to be a string object. To be able to reuse Unicode codecs
> > I added code which converts Unicode back to a string in case the
> > codec returns a Unicode object (which the .decode() method does).
> > This is what's failing.
>
> It strikes me that if someone executes
>
>     aString.decode("latin-1")
>
> they're going to expect a unicode string.  AIUI, what's currently
> happening is that the string is converted from a latin-1 8-bit string
> to the 16-bit unicode string I expected and then there is an attempt
> to convert it back to an 8-bit string using the default encoding.  So
> if I'd done a
>
>     sys.setdefaultencoding("latin-1")
>
> in my sitecustomize.py, then aString.decode("latin-1") would just be
> aString again?  This doesn't seem optimal.

True, and that's why I am proposing to loosen the restriction
that the two APIs return strings only.

> > Perhaps I should simply remove the restriction and have both APIs
> > return the codec's return object as-is ?! (I would be in favour of
> > this, but I'm not sure whether this is already in use by someone...)
>
> Are all the codecs distributed with Python 2.1 unicode-related?  If
> that's the case, PyString_Decode isn't terribly useful is it?  It
> seems unlikely that it received much use.  Could be wrong of course.

All standard codecs in 2.0 and 2.1 are Unicode related. I am planning
to write up a bunch of string-to-string codecs next week, though,
which will then be the first non-Unicode related codecs in 2.2.
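
To make the failure mode concrete, here is a minimal sketch of the two
behaviours under discussion. It is written for present-day Python 3 rather
than the 2.1 C API, and restricted_decode() is a hypothetical helper that
only imitates the "convert the result back to an 8-bit string" step; it is
not the actual PyString_Decode() code.

```python
import codecs

# 8-bit input data, built the same way as in Fredrik's example.
raw = "\u00e3".encode("latin-1")          # b'\xe3'

# Returning the codec's result as-is: decoding yields a text object.
text = codecs.decode(raw, "latin-1")


def restricted_decode(data, encoding, default="ascii"):
    """Hypothetical stand-in for the restricted API: decode, then force
    the result back into 8-bit data using a default encoding (ASCII was
    Python 2's usual default)."""
    return codecs.decode(data, encoding).encode(default)


# With an ASCII default the conversion back to 8-bit data fails.
try:
    restricted_decode(raw, "latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True

# With a latin-1 default the caller simply gets the input back.
same = restricted_decode(raw, "latin-1", default="latin-1")
```

The last line corresponds to Michael's sitecustomize.py scenario: the
decode step is undone again and the call becomes a no-op.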
> OTOH, maybe I'm trying to wedge too much behaviour onto a particular
> operation.  Do we want
>
>     open(file).read().decode("jpeg") -> some kind of PIL object
>
> to be possible?

This would be possible indeed. Even though some may find this coding
style obscure, I think this technique has the same usefulness as e.g.
piping at OS level.

I am thinking of these use cases:

    "äöü".decode("latin-1") -> Unicode (object construction)
    "...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
    "äöü".decode("latin-1").encode("cp1521") -> string (recoding data)
    "...long data...".encode("gzip") -> string (transfer encoding)
    "...gzipped data...".decode("gzip") -> string (transfer decoding)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
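
For illustration only, three of the use cases listed above can be
approximated with codecs that ship with present-day Python 3. The codec
names are assumptions made for this sketch: "zlib_codec" stands in for the
"gzip" codec named in the list, and "cp1252" is simply a code page picked
as a recoding target. The JPEG case is left out because no such codec is
part of the standard library.

```python
import codecs

# Object construction: 8-bit data decoded into a text object.
text = codecs.decode(b"\xe4\xf6\xfc", "latin-1")    # the characters ä, ö, ü

# Recoding data: decode from one charset, then encode into another.
recoded = text.encode("cp1252")

# Transfer encoding and decoding with a bytes-to-bytes codec.
payload = b"long data " * 100
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")
```

The chained form reads much like an OS-level pipe: each step takes the
previous result and hands back whatever object its codec produces.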