On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:

>I like the idea of having encoding information carried with the data.
>I don't think that an ebytes type that can *optionally* have an encoding
>attribute makes the situation less confusing, though.

Agreed.  I think the attribute should always be there, but there probably
needs to be a magic value (perhaps None) that indicates an unknown, manual,
garbage, error, or broken encoding.

Examples: you read bytes off a socket and don't know what the encoding is;
you concatenate two ebytes that have incompatible encodings.

>To me the biggest
>problem with python-2.x's unicode/bytes handling was not that it threw
>exceptions but that it didn't always throw exceptions.  You might test this
>in python2::
>    t = u'cafe'
>    function(t)
>
>And say, ah my code works.  Then a user gives it this::
>    t = u'café'
>    function(t)
>
>And get a unicode error because the function only works with unicode in the
>ascii range.

That's an excellent point.

>ebytes seems to have the same pitfall where the code path exercised by your
>tests could work with::
>    eb = ebytes(b)
>    eb.encoding = 'euc-jp'
>    function(eb)
>
>but the user exercises a code path that does this and fails::
>    eb = ebytes(b)
>    function(eb)
>
>What do you think of making the encoding attribute a mandatory part of
>creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).

If ebytes is a separate type, then definitely +1.  If 'ebytes is bytes' then
I'd probably want to default the second argument to the magical
"i-don't-know" marker.

-Barry
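
To make the "separate type" variant concrete, here is a minimal sketch in Python 3. It is only an illustration under stated assumptions: ``ebytes`` is a hypothetical type that exists nowhere in the standard library, the mandatory second argument and the use of None as the "unknown" marker follow the suggestions above, and the ``decode_known`` helper is invented for the example.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that always carries an encoding attribute."""

    def __new__(cls, data, encoding):
        # 'encoding' has no default: callers must pass a codec name, or
        # None explicitly to say "I don't know".
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __add__(self, other):
        # Plain bytes carry no encoding, so treat them as unknown.
        other_encoding = getattr(other, "encoding", None)
        # Keep the encoding only when both sides agree on a known one;
        # anything else degrades to the None marker.
        if self.encoding is not None and self.encoding == other_encoding:
            result = self.encoding
        else:
            result = None
        return ebytes(bytes(self) + bytes(other), result)

    def decode_known(self):
        # Invented helper: refuse to guess when the encoding is unknown.
        if self.encoding is None:
            raise ValueError("encoding is unknown; decode explicitly")
        return self.decode(self.encoding)


a = ebytes("日本".encode("euc-jp"), "euc-jp")
b = ebytes("語".encode("euc-jp"), "euc-jp")
same = a + b                            # both euc-jp, so the result stays euc-jp
mixed = a + ebytes(b"\xff\xfe", None)   # unknown on one side, so result is None
```

With this shape the second code path in the quoted example, ``ebytes(b)`` with no encoding, fails with a TypeError at construction time rather than somewhere inside ``function``. The comparison of encoding names here is a plain string match; a fuller version would probably want to normalize codec aliases first.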