On 9/15/2011 11:50 AM, "Martin v. Löwis" wrote:

> To comply with the C aliasing rules, the structures would look like this:
>
> typedef struct {
>     PyObject_HEAD
>     Py_ssize_t length;
>     union {
>         void *any;
>         Py_UCS1 *latin1;
>         Py_UCS2 *ucs2;
>         Py_UCS4 *ucs4;
>     } data;
>     Py_hash_t hash;
>     int state;     /* may include SSTATE_SHORT_ASCII flag */
>     wchar_t *wstr;
> } PyASCIIObject;
>
>
> typedef struct {
>     PyASCIIObject _base;
>     Py_ssize_t utf8_length;
>     char *utf8;
>     Py_ssize_t wstr_length;
> } PyUnicodeObject;
>
> Code that directly accesses the structures would become more
> complex; code that use the accessor macros wouldn't notice.
...
> What do you think?

That nearly all code outside CPython itself should treat the unicode
types, especially, as opaque types and only access instances through
functions and macros -- the 'public' interfaces. We need to be free to
fiddle with internal implementation details as experience suggests changes.

> P.S. There are similar reductions that could be applied
> to the wstr_length in general: on 32-bit wchar_t systems,
> it could be always dropped, on a 16-bit wchar_t system,
> it could be dropped for UCS-2 strings. However, I'm not
> proposing these, as I think the increase in complexity
> is not worth the savings.

I would certainly do just the one change now and see how it goes. I
think you should be free to do more like the above if you change your
mind with experience.

-- 
Terry Jan Reedy
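
As a rough illustration of the point about opaque access, here is a minimal, self-contained C sketch. It is not CPython code: `demo_str`, `DEMO_STR_LENGTH`, `demo_str_read`, the `ucsN_t` typedefs and the `KIND_*` constants are all hypothetical stand-ins, and the `kind` field is an assumed simplification of the `state` field in the quoted proposal. It only shows how callers that go through an accessor never need to know which union member is live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Py_UCS1 / Py_UCS2 / Py_UCS4. */
typedef uint8_t  ucs1_t;
typedef uint16_t ucs2_t;
typedef uint32_t ucs4_t;

/* Assumed encoding of the character width; not taken from the proposal. */
enum demo_kind { KIND_UCS1 = 1, KIND_UCS2 = 2, KIND_UCS4 = 4 };

/* Simplified layout modelled on the union in the quoted struct. */
typedef struct {
    ptrdiff_t length;          /* stand-in for Py_ssize_t */
    union {
        void   *any;
        ucs1_t *latin1;
        ucs2_t *ucs2;
        ucs4_t *ucs4;
    } data;
    int kind;                  /* which union member is in use */
} demo_str;

/* "Public" accessors: callers use these instead of touching fields. */
#define DEMO_STR_LENGTH(s) ((s)->length)

static ucs4_t demo_str_read(const demo_str *s, ptrdiff_t i)
{
    assert(i >= 0 && i < s->length);
    switch (s->kind) {
    case KIND_UCS1: return s->data.latin1[i];
    case KIND_UCS2: return s->data.ucs2[i];
    default:        return s->data.ucs4[i];
    }
}
```

Because each buffer is read through a pointer of its own element type, no access goes through a mismatched pointer type, and the layout behind the accessors can change without touching callers.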