On 16.09.11 00:42, Nick Coghlan wrote:
> On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis"
> <martin at v.loewis.de> wrote:
>> Thinking about this, the following may work:
>>
>> - ASCIIObject: state, length, hash, wstr*, data follow
>>
>> - SingleBlockUnicode: ASCIIObject, wstr_len, utf8*, utf8_len, data
>>   follow
>>
>> - UnicodeObject: SingleBlockUnicode, data pointer, no data follow
>>
>> This is essentially your proposal, except that the wstr_len is
>> dropped for ASCII strings, and that it uses nested structs.
>>
>> The single-block variants would always be "ready", the full unicode
>> object is ready only if the data pointer is set.
>
> In your "UnicodeObject" here, is the 'data pointer' the
> any/latin1/ucs2/ucs4 union from the original structure definition?

Yes, it is. I'm considering dropping the union again, since you'll
have to cast the data pointer anyway in the compact cases.

> Also, what are the constraints on the "SingleBlockUnicode"? Does it
> only hold strings that can be represented in latin1? Or can the size
> of the individual elements be more than 1 byte?

Any size - what matters is whether the maximum character is known at
creation time (i.e. whether you've used PyUnicode_New(size, maxchar)
or PyUnicode_FromUnicode(NULL, size)). In the latter case, a Py_UNICODE
block will be allocated in wstr, and the data pointer left NULL.
Then, when PyUnicode_Ready is called, the maximum character is
determined in the Py_UNICODE block, and a new data block is allocated -
but that will have to be a second memory block (the Py_UNICODE block
is then dropped in _Ready).

Regards,
Martin
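For readers following the layout discussion, here is a minimal C sketch of the three nested structs and of the "ready" transition described in the reply. It is an illustration under stated assumptions, not CPython's code: the struct names follow the email, but the field types, the use of `state` to hold the element size, and the helpers `kind_for_maxchar`, `from_wstr` and `ready` are all hypothetical. It also assumes one `wchar_t` per code point (no surrogate pairs).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical, simplified mirrors of the three proposed layouts.
   Field types and the meaning of `state` are assumptions. */
typedef struct {
    unsigned int state;  /* here: 0 = not ready, else element size 1/2/4 */
    size_t length;       /* number of code points */
    long hash;
    wchar_t *wstr;       /* Py_UNICODE block, NULL when not materialized */
    /* ASCII case: character data follows the header in the same block */
} ASCIIObject;

typedef struct {
    ASCIIObject base;    /* nested struct: no wstr_len for pure ASCII */
    size_t wstr_len;
    char *utf8;
    size_t utf8_len;
    /* compact non-ASCII case: data follows in the same block */
} SingleBlockUnicode;

typedef struct {
    SingleBlockUnicode base;
    void *data;          /* separate block; NULL until the object is ready */
} UnicodeObject;

/* Smallest element size that can hold every character up to maxchar. */
static int kind_for_maxchar(uint32_t maxchar)
{
    if (maxchar < 0x100)   return 1;
    if (maxchar < 0x10000) return 2;
    return 4;
}

/* Hypothetical analogue of PyUnicode_FromUnicode(NULL, size) followed by
   filling the buffer: only the wstr block exists, data stays NULL. */
static UnicodeObject *from_wstr(const wchar_t *src)
{
    UnicodeObject *u = calloc(1, sizeof *u);
    if (!u) return NULL;
    size_t n = wcslen(src);
    u->base.base.wstr = malloc((n + 1) * sizeof(wchar_t));
    if (!u->base.base.wstr) { free(u); return NULL; }
    memcpy(u->base.base.wstr, src, (n + 1) * sizeof(wchar_t));
    u->base.base.length = n;
    u->base.wstr_len = n;
    return u;
}

/* Hypothetical analogue of PyUnicode_Ready: find the maximum character
   in wstr, allocate a second block of the matching element size, copy
   the characters over, then drop the wstr block. */
static int ready(UnicodeObject *u)
{
    ASCIIObject *a = &u->base.base;
    if (u->data != NULL) return 0;            /* already ready */
    assert(a->wstr != NULL);
    uint32_t maxchar = 0;
    for (size_t i = 0; i < a->length; i++)
        if ((uint32_t)a->wstr[i] > maxchar) maxchar = (uint32_t)a->wstr[i];
    int kind = kind_for_maxchar(maxchar);
    void *data = malloc((a->length + 1) * (size_t)kind);
    if (!data) return -1;
    for (size_t i = 0; i <= a->length; i++) { /* includes the NUL */
        uint32_t ch = (uint32_t)a->wstr[i];
        if (kind == 1)      ((uint8_t *)data)[i] = (uint8_t)ch;
        else if (kind == 2) ((uint16_t *)data)[i] = (uint16_t)ch;
        else                ((uint32_t *)data)[i] = ch;
    }
    free(a->wstr);                            /* Py_UNICODE block dropped */
    a->wstr = NULL;
    u->base.wstr_len = 0;
    u->data = data;
    a->state = (unsigned int)kind;
    return 0;
}
```

With this sketch, `ready(from_wstr(L"caf\u00e9"))` ends up with 1-byte elements, while a string containing U+20AC needs 2-byte elements. The nesting also shows why the ASCII header is the smallest of the three: it simply omits the `wstr_len` and UTF-8 fields.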