Guido van Rossum wrote:
> ...
> I would have concluded that the buffer object is entirely useless, if
> it weren't for some very light use that is being made of it by the
> Unicode machinery. I can't quite tell whether that was done just
> because it was convenient, or whether that shows there is a real
> need.

I used the buffer object since I thought that buffer() objects
were to replace strings as the container for binary data. The buffer
object wraps a memory buffer into a Python object for the purpose
of decoding it into Unicode. 8-bit string objects would have worked
just as well...

> What Now?
> ---------
>
> I'm not convinced that we need the buffer object at all. For example,
> the mmap module defines a sequence object so doesn't seem to need the
> buffer object to help it support slices.

It would be nice to have an object for "copy by reference"
rather than "malloc + copy". This would be useful for strings
(e.g. to access substrings of a large string), Unicode and binary
data. The buffer object almost does this... it would only have
to stick to always returning buffer objects in coercion, slicing etc.
I also think that the name "buffer" is misleading, since it
really means "reference" in the context published by the Python
interface (the C API also has a way of defining new malloc areas
and referencing them through the buffer interface, but that is not
published in Python).

The other missing data type in Python is one for binary data.
Currently, string objects are in common use for this kind of
data. The problems with this are obvious: in some contexts
strings are expected to contain text data, in others binary
data. When the two meet there's great confusion.

I'd suggest either making arrays the Python standard type
for holding binary data, or creating a completely new type
(this should then be called something like "buffer").
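To illustrate the "copy by reference" idea above, here is a minimal sketch using memoryview, a built-in that appeared in later Python versions and is not the buffer() object under discussion. The helper name substring_view and the sample data are hypothetical; the point is only that slicing a view yields another view onto the same memory, whereas slicing the underlying object makes a copy.

```python
def substring_view(data, start, stop):
    """Return a zero-copy view of data[start:stop] (hypothetical helper)."""
    return memoryview(data)[start:stop]


# Sample data, assumed for illustration only.
big = bytearray(b"header:payload-bytes")

view = substring_view(big, 7, 14)   # references the bytes of "payload"
copy = bytes(big[7:14])             # ordinary slice: allocates and copies

# Change one byte in place; this does not resize the bytearray.
big[7] = ord("P")

print(bytes(view))  # the view sees the change
print(copy)         # the copy does not

# Slicing a view returns another view, not a bytes object.
print(type(view[:3]).__name__)
```

A view of this kind stays tied to the original object, so the original cannot be resized while views onto it exist.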
> Regarding the buffer API, it's clearly useful, although I'm not
> convinced that it needs the multiple segment count option or the char
> vs. binary buffer distinction, given that we're not using this for
> Unicode objects as we originally planned.

True.

> I also feel that it would be helpful if there was an explicit way to
> lock and unlock the data, so that a file object can release the global
> interpreter lock while it is doing the I/O. But that's not a high
> priority (and there are no *actual* problems caused by the lack of
> such an API -- just *theoretical*).

How about adding a generic low-level lock type for this kind
of task? The interpreter could be made aware of these types to
allow a much more fine-grained lock mechanism, e.g. to check
for acquired locks of certain objects only.

> For Python 3000, I think I'd like to rethink this whole mess. Perhaps
> byte buffers and character strings should be different beasts, and
> maybe character strings could have Unicode and 8-bit subclasses (and
> maybe other subclasses that explicitly know about their encoding).
> And maybe we'd have a real file base class. And so on.

Great... but 3000 is a long way ahead :-(

> What to do in the short run? I'm still for severely simplifying the
> buffer object (ripping out the unused operations) and deprecating it.

Since it isn't all that well known anyway, how about streamlining
the buffer object implementations of the various protocols
and removing the distinction between "s" and "t" ?!

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/