> > I see. u# could be made to work for Unicode objects alone, but it
> > would have to reject string objects.
>
> Martin, I don't agree here: string objects could hold binary UCS-2/UCS-4
> data.

They could. Most likely, they don't. Explicit is better than implicit:
anybody wishing to pass UCS-2 binary data to a function expecting
character strings should do

  function(unicode(data, "UCS-2BE")) # or LE if appropriate

> es# has logic in place which allows either copying the raw data
> to a buffer you provide or have it allocate a buffer of the
> right size for you. That's why I proposed to extend it to support
> Unicode raw data as well.

Even though es# is cleanly defined, it is still undesirable to use,
IMO: it requires more copies of data than necessary. If explicit
memory management is required, it should be exposed through
Py_DECREF. That is easy to understand, and it allows immutable
objects to be shared, thus avoiding copies.

> > PyObject *Py_UnicodeOrString(PyObject *o, void *ignored){
> >   if (PyUnicode_Check(o)){
> >     Py_INCREF(o);return o;
> >   }
> >   if (PyString_Check(o)){
> >     return PyUnicode_FromObject(o);
> >   }
> >   PyErr_SetString(PyExc_TypeError,"unicode object expected");
> >   return NULL;
> > }
>
> Martin, note that PyUnicode_FromObject() already does the Unicode
> pass-through (even more: it makes sure that you get a true Unicode
> object, not a subclass).

I noticed. However, I'd like Py_UnicodeOrString to fail if you are not
passing a character string (and I'd see no problem in accepting
Unicode subtypes without copying them). This is a minor point, though;
I might have written

  PyObject *Py_UnicodeOrString(PyObject *o, void *ignored){
    return PyUnicode_FromObject(o);
  }

as well.

> Jack wants to get string and Unicode objects converted to Unicode
> automagically and then receive a pointer to a Py_UNICODE buffer and
> a size.
>
> The current solution for this is to use the "O" parser,
> fetch the object, pass it through PyUnicode_FromObject(), then
> use PyUnicode_GET_SIZE() and PyUnicode_AS_UNICODE() to access
> the Py_UNICODE buffer and finally to Py_DECREF() the object returned
> by PyUnicode_FromObject().

That is the solution, although I would claim that using the O& parser
is simpler and more flexible.

> What I proposed was to extend the "es#" parser marker with a new
> modifier: "eu#" which does all of the above except that it either
> copies the Py_UNICODE data to a buffer you provide or a newly
> allocated buffer which you then have to PyMem_Free() after usage.
>
> How does this sound ?

Terrible. It copies a Unicode object without any need. It also adds to
the inflation of format specifiers for getargs; this inflation is
terrible in itself.

Regards,
Martin
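
As a minimal sketch of the explicit-conversion point made at the top of this message, here is how it might look in current Python 3, where bytes.decode() takes the place of the unicode(data, ...) call. The function name char_count and the sample data are hypothetical, and the codec name "utf-16-be" is an assumption standing in for big-endian UCS-2 (the two agree for characters in the Basic Multilingual Plane).

```python
def char_count(text):
    """Hypothetical function that accepts character strings only."""
    if not isinstance(text, str):
        raise TypeError("character string expected")
    return len(text)


# Binary data that happens to hold big-endian 16-bit code units.
raw = "caf\u00e9".encode("utf-16-be")

# The caller converts explicitly, then passes a real character string.
decoded = raw.decode("utf-16-be")
count = char_count(decoded)

# Passing the raw bytes directly is rejected rather than guessed at.
try:
    char_count(raw)
    rejected = False
except TypeError:
    rejected = True
```

The function never has to guess whether a byte string holds text or binary data; the caller states the encoding once, at the call site.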