On Sat, 13 Nov 1999, Mark Hammond wrote:
>...
> Im inclined to agree that holding 2 internal buffers for the unicode
> object is not ideal.  However, I _am_ concerned with getting decent
> PyArg_ParseTuple and Py_BuildValue support, and if the cost is an
> extra buffer I will survive.  So lets look for solutions that dont
> require it, rather than holding it up as evil when no other solution
> is obvious.

I believe Py_BuildValue is pretty straightforward. Simply state that it
is allowed to perform conversions and place the resulting object into
the resulting tuple (with appropriate refcounting).

In other words:

  tuple = Py_BuildValue("U", stringOb);

The stringOb will be converted to a Unicode object. The new Unicode
object will go into the tuple (with the tuple holding the only
reference!). The stringOb will NOT acquire any additional references.

[ "U" format may be wrong; it is here for example purposes ]


Okay... now the PyArg_ParseTuple() is the *real* kicker.

>...
> Prob1:
>   name = SomeComObject.GetFileName() # A Unicode object
>   f = open(name)
> Prob2:
>   SomeComObject.SetFileName("foo.txt")

Both of these issues are due to PyArg_ParseTuple. In Prob1, you want a
string-like object which can be passed to the OS as an 8-bit string. In
Prob2, you want a string-like object which can be passed to the OS as a
Unicode string.

I see three options for PyArg_ParseTuple:

1) allow it to return NEW objects which must be DECREF'd.
   [ current policy only loans out references ]

   This option could be difficult in the presence of errors during the
   parse. For example, the current idiom is:

     if (!PyArg_ParseTuple(args, "..."))
        return NULL;

   If an object was produced, but then a later argument causes a
   failure, then who is responsible for freeing the object?

2) like option 1, but PyArg_ParseTuple is smart enough to NOT return
   any new objects when an error occurred.

   This basically answers the last question in option (1) -- ParseTuple
   is responsible.

3) Return loaned-out references to objects which have been tested for
   convertibility. Helper functions perform the conversion and the
   caller will then free the reference.
   [ this is the model used in PyWin32 ]

   Code in PyWin32 typically looks like:

     if (!PyArg_ParseTuple(args, "O", &ob))
       return NULL;
     if ((unicodeOb = GiveMeUnicode(ob)) == NULL)
       return NULL;
     ...
     Py_DECREF(unicodeOb);

   [ GiveMeUnicode is descriptive here; I forget the name used in
     PyWin32 ]

   In a "real" situation, the ParseTuple format would be "U" and the
   object would be type-tested for PyStringType or PyUnicodeType.

   Note that GiveMeUnicode() would also do a type-test, but it can't
   produce a *specific* error like ParseTuple can (e.g. "string/unicode
   object expected" vs "parameter 3 must be a string/unicode object").

Are there more options? Anybody?


All three of these avoid the secondary buffer. The last is cleanest
w.r.t. keeping the existing "loaned references" behavior, but can get a
bit wordy when you need to convert a bunch of string arguments.

Option (2) adds a good amount of complexity to PyArg_ParseTuple -- it
would need to keep a "free list" in case an error occurred.

Option (1) adds DECREF logic to callers to ensure they clean up. The
add'l logic isn't much more than the other two options (the only change
is adding DECREFs before returning NULL from the
"if (!PyArg_ParseTuple..." condition). Note that the caller would
probably need to initialize each object to NULL before calling
ParseTuple.


Personally, I prefer (3) as it makes it very clear that a new object
has been created and must be DECREF'd at some point. Also note that
GiveMeUnicode() could accept a second argument for the type of decoding
to do (or NULL meaning "UTF-8").

Oh: note there are equivalents of all options for going from
unicode-to-string; the above is all about string-to-unicode. However,
the tricky part of unicode-to-string is determining whether backwards
compatibility will be a requirement. i.e. does existing code that uses
the "t" format suddenly acquire the capability to accept a Unicode
object? This obviously causes problems in all three options: since a
new reference must be created to handle the situation, then who
DECREF's it? The old code certainly doesn't.

[  <IMO> I'm with Fredrik in saying "no, old code *doesn't* suddenly
   get the ability to accept a Unicode object." The Python code must
   use str() to do the encoding manually (until the old code is
   upgraded to one of the above three options).  </IMO> ]

I think that's it for me. In the several years I've been thinking on
this problem, I haven't come up with anything but the above three.
There may be a whole new paradigm for argument parsing, but I haven't
tried to think on that one (I've just fit in around ParseTuple).

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
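
[ Illustration of the ownership rule in option (3), as a toy model in
  plain C. This is NOT the CPython or PyWin32 API: Obj, obj_new,
  obj_decref, give_me_unicode and set_file_name are all hypothetical
  names, and the "conversion" is simply a copy of the bytes. The only
  point is who holds which reference: the argument stays a loaned
  reference, the helper hands back a new one, and the caller releases
  it exactly once. ]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a reference-counted object (hypothetical, not PyObject). */
typedef enum { KIND_STRING, KIND_UNICODE, KIND_OTHER } Kind;

typedef struct {
    int refcnt;
    Kind kind;
    char *data;
} Obj;

/* Number of objects currently allocated; lets a caller check for leaks. */
static int live_objects = 0;

/* Create an object holding a copy of data, with a reference count of 1. */
static Obj *obj_new(Kind kind, const char *data)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->data = malloc(strlen(data) + 1);
    if (o->data == NULL) {
        free(o);
        return NULL;
    }
    strcpy(o->data, data);
    o->refcnt = 1;
    o->kind = kind;
    live_objects++;
    return o;
}

/* Drop one reference; free the object when the last one goes away. */
static void obj_decref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o->data);
        free(o);
        live_objects--;
    }
}

/* Helper in the spirit of GiveMeUnicode(): always returns a NEW reference
 * on success, or NULL if the object cannot be converted. */
static Obj *give_me_unicode(Obj *ob)
{
    if (ob->kind == KIND_UNICODE) {
        ob->refcnt++;               /* already Unicode: share it */
        return ob;
    }
    if (ob->kind == KIND_STRING)
        return obj_new(KIND_UNICODE, ob->data);   /* "convert" by copying */
    return NULL;                    /* neither string nor Unicode */
}

/* Caller in the option (3) style: arg is a loaned reference and is never
 * released here; only the converted object is. Returns 0 on success and
 * -1 if arg is not convertible. */
static int set_file_name(Obj *arg)
{
    Obj *unicode_ob = give_me_unicode(arg);
    if (unicode_ob == NULL)
        return -1;
    /* ... pass unicode_ob->data to the OS here ... */
    obj_decref(unicode_ob);
    return 0;
}
```

[ After set_file_name() returns, the argument's reference count is the
  same as before the call in both the string and the Unicode case, and
  no converted object is left behind. The early return on NULL needs no
  cleanup because nothing new was created yet, which is why this shape
  avoids the "who frees it on error?" question raised for option (1). ]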