Greg Stein wrote:
>
> [ damn, I wish people would pay more attention to changing the subject
>   line to reflect the contents of the email ... I could not figure out if
>   there were any further responses to this without opening most of those
>   dang "Unicode debate" emails. sheesh... ]
>
> On Tue, 2 May 2000, M.-A. Lemburg wrote:
> > Guido van Rossum wrote:
> > >
> > > [MAL]
> > > > Let's not do the same mistake again: Unicode objects should *not*
> > > > be used to hold binary data. Please use buffers instead.
> > >
> > > Easier said than done -- Python doesn't really have a buffer data
> > > type.
>
> The buffer object. We *do* have the type.
>
> > > Or do you mean the array module? It's not trivial to read a
> > > file into an array (although it's possible, there are even two ways).
> > > Fact is, most of Python's standard library and built-in objects use
> > > (8-bit) strings as buffers.
>
> For historical reasons only. It would be very easy to change these to use
> buffer objects, except for the simple fact that callers might expect a
> *string* rather than something with string-like behavior.

Would this be too drastic a change, then ? I think that we should
at least make use of buffers in the standard lib.

>
> >...
> > > > BTW, I think that this behaviour should be changed:
> > > >
> > > > >>> buffer('binary') + 'data'
> > > > 'binarydata'
>
> In several places, bufferobject.c uses PyString_FromStringAndSize(). It
> wouldn't be hard at all to use PyBuffer_New() to allocate the memory, then
> copy the data in. A new API could also help out here:
>
>   PyBuffer_CopyMemory(void *ptr, int size)
>
>
> > > > while:
> > > >
> > > > >>> 'data' + buffer('binary')
> > > > Traceback (most recent call last):
> > > >   File "<stdin>", line 1, in ?
> > > > TypeError: illegal argument type for built-in operation
>
> The string object can't handle the buffer on the right side. Buffer
> objects use the buffer interface, so they can deal with strings on the
> right. Therefore: asymmetry :-(
>
> > > > IMHO, buffer objects should never coerce to strings, but instead
> > > > return a buffer object holding the combined contents. The
> > > > same applies to slicing buffer objects:
> > > >
> > > > >>> buffer('binary')[2:5]
> > > > 'nar'
> > > >
> > > > should preferably be buffer('nar').
>
> Sure. Wouldn't be a problem. The FromStringAndSize() thing.

Right.

Before digging deeper into this, I think we should hear Guido's
opinion on this again: he said that he wanted to use Java's binary
arrays for binary data... perhaps we need to tweak the array type and
make it more directly accessible (from C and Python) instead.

> > > Note that a buffer object doesn't hold data! It's only a pointer to
> > > data. I can't off-hand explain the asymmetry though.
> >
> > Dang, you're right...
>
> Untrue. There is an API call which will construct a buffer object with its
> own memory:
>
>   PyObject * PyBuffer_New(int size)
>
> The resulting buffer object will be read/write, and you can stuff values
> into it using the slice notation.

Yes, but that API is not reachable from within Python,
AFAIK.

> > > > Hmm, perhaps we need something like a data string object
> > > > to get this 100% right ?!
>
> Nope. The buffer object is intended to be exactly this.
>
> >...
> > > Not clear. I'd rather do the equivalent of byte arrays in Java, for
> > > which no "string literal" notations exist.
> >
> > Anyway, one way or another I think we should make it clear
> > to users that they should start using some other type for
> > storing binary data.
>
> Buffer objects. There are a couple changes to make this a bit easier for
> people:
>
>  1) buffer(ob [,offset [,size]]) should be changed to allow buffer(size) to
>     create a read/write buffer of a particular size. buffer() should create
>     a zero-length read/write buffer.

This looks a lot like function overloading... I don't think we
should get into this: how about having the buffer() API take
keywords instead ?!

buffer(size=1024,mode='rw') - 1K of owned read write memory
buffer(obj)                 - read-only referenced memory from obj
buffer(obj,mode='rw')       - read-write referenced memory in obj

etc.

Or we could allow passing None as object to obtain an owned
read-write memory block (much like passing NULL to the
C functions).

>  2) if slice assignment is updated to allow changes to the length (for
>     example: buf[1:2] = 'abcdefgh'), then the buffer object definition must
>     change. Specifically: when the buffer object owns the memory, it does
>     this by appending the memory after the PyObject_HEAD and setting its
>     internal pointer to it; when the dealloc() occurs, the target memory
>     goes with the object. A flag would need to be added to tell the buffer
>     object to do a second free() for the case where a realloc has returned
>     a new pointer.
>     [ I'm not sure that I would agree with this change, however; but it
>       does make them a bit easier to work with; on the other hand, people
>       have been working with immutable strings for a long time, so they're
>       okay with concatenation, so I'm okay with saying length-altering
>       operations must simply be done thru concatenation. ]

I don't think I like this either: what happens when the buffer
doesn't own the memory ?

> IMO, extensions should be using the buffer object for raw bytes. I know
> that Mark has been updating some of the Win32 extensions to do this.
> Python programs could use the objects if the buffer() builtin is tweaked
> to allow a bit more flexibility in the arguments.

Right.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
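
[ Editorial sketch, not part of the original message. The keyword-based
buffer() signature proposed above could look roughly like the following.
Everything here is an assumption made for illustration: make_buffer() is a
hypothetical name, and the sketch is built on bytearray and memoryview from
present-day Python 3 (3.8 or later, for memoryview.toreadonly()) rather
than on the buffer type under discussion. ]

```python
def make_buffer(obj=None, size=None, mode="r"):
    """Hypothetical keyword-driven factory modelled on the proposal above.

    make_buffer(size=n, mode='rw') -> n bytes of owned read-write memory
    make_buffer(obj)               -> read-only view of obj's memory
    make_buffer(obj, mode='rw')    -> read-write view of obj's memory
    """
    if mode not in ("r", "rw"):
        raise ValueError("mode must be 'r' or 'rw'")

    if obj is None:
        # No object given: allocate owned, zero-filled memory, much like
        # passing NULL to the C-level constructors.
        owned = memoryview(bytearray(size or 0))
        return owned if mode == "rw" else owned.toreadonly()

    if size is not None:
        raise TypeError("pass either obj or size, not both")

    view = memoryview(obj)  # references obj's memory without copying
    if mode == "rw":
        if view.readonly:
            raise TypeError("underlying object is not writable")
        return view
    return view.toreadonly()


# Same-length slice assignment into owned memory.
buf = make_buffer(size=4, mode="rw")
buf[1:3] = b"ab"

# Slicing a view yields another view rather than a string.
part = make_buffer(b"binary")[2:5]
```

[ In this sketch a length-changing slice assignment raises an error, which
sidesteps the question of who owns the memory after a realloc. ]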