> I could live with this compromise as long as we document that a future
> version may use the "character is a character" model. I just don't want
> people to start depending on a catchable exception being thrown because
> that would stop us from ever unifying unmarked literal strings and
> Unicode strings.

Agreed (as I've said before).

> --
>
> Are there any steps we could take to make a future divorce of strings
> and byte arrays easier? What if we added a
>
> binary_read()
>
> function that returns some form of byte array. The byte array type could
> be just like today's string type except that its type object would be
> distinct, it wouldn't have as many string-ish methods and it wouldn't
> have any auto-conversion to Unicode at all.

You can do this now with the array module, although clumsily:

    >>> import array
    >>> f = open("/core", "rb")
    >>> a = array.array('B', [0]) * 1000
    >>> f.readinto(a)
    1000
    >>>

Or if you wanted to read raw Unicode (UTF-16):

    >>> a = array.array('H', [0]) * 1000
    >>> f.readinto(a)
    2000
    >>> u = unicode(a, "utf-16")
    >>>

There are some performance issues, e.g. you have to initialize the
buffer somehow and that seems a bit wasteful.

> People could start to transition code that reads non-ASCII data to the
> new function. We could put big warning labels on read() to state that it
> might not always be able to read data that is not in some small set of
> recognized encodings (probably UTF-8 and UTF-16).
>
> Or perhaps binary_open(). Or perhaps both.
>
> I do not suggest just using the text/binary flag on the existing open
> function because we cannot immediately change its behavior without
> breaking code.

A new method makes most sense -- there are definitely situations where
you want to read in text mode for a while and then switch to binary
mode (e.g. HTTP).

I'd like to put this off until after Python 1.6 -- but it deserves
attention.

--Guido van Rossum (home page: http://www.python.org/~guido/)
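
For readers trying the readinto approach on a current Python 3, here is a minimal sketch of the idea discussed above. The name binary_read is borrowed from the quoted proposal and is hypothetical here; it is not an existing built-in, and the in-memory file stands in for a real file opened with "rb". The zero-filled buffer is the initialization cost the message calls wasteful.

```python
import array
import io


def binary_read(f, size):
    """Hypothetical helper: read up to `size` bytes from the binary
    file object `f` and return them as an array of unsigned bytes."""
    # Pre-allocate a zero-filled buffer; readinto() needs writable storage.
    buf = array.array('B', bytes(size))
    # readinto() returns the number of bytes actually stored
    # (None is possible for non-blocking raw streams, treated as 0 here).
    n = f.readinto(buf) or 0
    # Drop the unused tail so the result holds only the data read.
    del buf[n:]
    return buf


# An in-memory stream stands in for open(path, "rb").
f = io.BytesIO(b"\x00\xffhello")
data = binary_read(f, 1000)
```

The result is not a str and has no implicit conversion to text; decoding has to be requested explicitly, for example with data.tobytes().decode("latin-1").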