"Martin v. Loewis" wrote:
>
> > I have a number of MacOSX API's that expect Unicode buffers, passed as
> > "long count, UniChar *buffer".
>
> Well, my first question would be: Are you sure that UniChar has the
> same underlying integral type as Py_UNICODE? If not, you lose.
>
> So you may need to do even more conversion.

This should be the first thing to check. Also note that Python
has two different flavors of Unicode support: UCS-2 and UCS-4,
so you'll have to be careful about this too.

> > I have the machinery in bgen to generate code for this, iff "u#" (or
> > something else) would work the same as "s#", i.e. it returns you a
> > pointer and a size, and it would work equally well for unicode
> > objects as for classic strings (after conversion).
>
> I see. u# could be made work for Unicode objects alone, but it would
> have to reject string objects.

Martin, I don't agree here: string objects could hold binary
UCS-2/UCS-4 data.

Jack, u# cannot auto-convert strings to Unicode since this would
require allocation of a temporary object and there's no logic
there to free that object after usage.

es# has logic in place which allows either copying the raw data
to a buffer you provide or having it allocate a buffer of the
right size for you. That's why I proposed to extend it to support
Unicode raw data as well.

> > But as a general solution it doesn't look right: "How do I call a C
> > routine with a string parameter?" "Use the "s" format and you get the
> > string pointer to pass". "How do I call a C routine with a unicode string
> > parameter?"
>
> For that, the answer is u. But you want the length also. So for that,
> the answer is u#. But your question is "How do I call a C routine with
> either a Unicode object or a string object, getting a reasonable
> Py_UNICODE* and the length?".
>
> For that, I'd recommend to use O&, with a conversion function
>
> PyObject *Py_UnicodeOrString(PyObject *o, void *ignored){
>   if (PyUnicode_Check(o)){
>     Py_INCREF(o);return o;
>   }
>   if (PyString_Check(o)){
>     return PyUnicode_FromObject(o);
>   }
>   PyErr_SetString(PyExc_TypeError,"unicode object expected");
>   return NULL;
> }

Martin, note that PyUnicode_FromObject() already does the Unicode
pass-through (even more: it makes sure that you get a true Unicode
object, not a subclass).

> > "Use O and PyUnicode_FromObject() and PyUnicode_AsUnicode and
> > make sure you get all your decrefs right and.....".
>
> With the function above, this becomes
>
> Use O&, passing a PyObject**, the function, and a NULL pointer, using
> PyUnicode_AS_UNICODE and PyUnicode_SIZE, performing a single DECREF at
> the end [allowing to specify an encoding is optional]
>
> In this scenario, somebody *has* to deallocate memory, you cannot get
> around this. It is your choice whether this is Py_DECREF or PyMem_Free
> that you have to call (as with the "esomething" conversions); the
> DECREF is more efficient as it will not copy a Unicode object.
>
> > The "es#" is a very strange beast, and a similar "eu#" would help me a
> > little, but it has some serious drawbacks. Aside from it being completely
> > different from the other converters (being a prefix operator in stead of a
> > postfix one, and having a value-return argument) I would also have to
> > pre-allocate the buffer in advance, and that sort of defeats the purpose.
>
> You don't. If you set the buffer to NULL before invoking getargs, you
> have to PyMem_Free it afterwards.

Right.

Let me see if I can summarize this:

Jack wants to get string and Unicode objects converted to Unicode
automagically and then receive a pointer to a Py_UNICODE buffer and
a size.

The current solution for this is to use the "O" parser, fetch the
object, pass it through PyUnicode_FromObject(), then use
PyUnicode_GET_SIZE() and PyUnicode_AS_UNICODE() to access the
Py_UNICODE buffer and finally to Py_DECREF() the object returned
by PyUnicode_FromObject().

What I proposed was to extend the "es#" parser marker with a new
modifier: "eu#" which does all of the above except that it either
copies the Py_UNICODE data to a buffer you provide or a newly
allocated buffer which you then have to PyMem_Free() after usage.

How does this sound ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
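
To make the buffer contract discussed in this thread concrete, here is a
minimal standalone sketch in plain C. It is only a model of the "es#"
behaviour that the proposed "eu#" would mirror, and it does not use the
Python API at all: the names uni_t and copy_unicode are hypothetical,
uni_t merely stands in for Py_UNICODE, and malloc()/free() stand in for
the PyMem_* allocator. If *buffer is NULL on entry, the function
allocates a buffer of the right size and the caller has to free it;
otherwise it copies into the caller's buffer and fails if that buffer
is too small.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE. The real type is 2 or 4 bytes
   wide depending on how Python was built (UCS-2 vs. UCS-4). */
typedef unsigned short uni_t;

/* Model of the buffer contract (an assumption-based sketch, not a real
   Python API function):
   - *buffer == NULL: allocate n + 1 code units; the caller must free()
     the result after use.
   - *buffer != NULL: *length gives the capacity of the caller's buffer
     in code units; fail if the data plus terminator does not fit.
   On success *length is set to the number of code units copied, and
   the data is followed by a 0 terminator. Returns 0 on success, -1 on
   failure. */
int copy_unicode(const uni_t *src, size_t n, uni_t **buffer, size_t *length)
{
    if (*buffer == NULL) {
        uni_t *p = malloc((n + 1) * sizeof(uni_t));
        if (p == NULL)
            return -1;          /* out of memory */
        *buffer = p;
    }
    else if (*length < n + 1) {
        return -1;              /* caller's buffer is too small */
    }
    memcpy(*buffer, src, n * sizeof(uni_t));
    (*buffer)[n] = 0;
    *length = n;
    return 0;
}
```

Either way somebody has to release memory: with the allocating variant
it is a single free call on the returned buffer, with the O& approach
it is a single DECREF on the returned object.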