Oren Tirosh <oren-py-d@hishome.net> writes:

> In stringobject.c most references to ob_sinterned are to initialize it. The
> only place that uses it is string_hash: if ob_sinterned is not NULL it uses
> the hash of the string it points to instead of the current string object.

This is not true: PyString_InternInPlace has

    if ((t = s->ob_sinterned) != NULL) {

which checks whether the string being interned had been interned
before.

> Summary: As far as I can tell, indirectly interned strings are redundant.
> Without them the ob_sinterned field is effectively a boolean flag.
>
> Can anyone explain why interning is implemented the way it is? Can anyone
> explain why Mac/Python/macimport.c is messing with ob_sinterned?

I'm not sure what meaning you would associate with the boolean
flag. If this is meant to denote "this is an interned string", then

    if ((t = s->ob_sinterned) != NULL) {
        if (t == (PyObject *)s)
            return;

would become

    if (s->ob_isinterned)
        return;

To see the difference, I added

    if ((t = s->ob_sinterned) != NULL) {
        if (t == (PyObject *)s)
            return;
        fprintf(stderr, "reinterning\n");

If that code prints "reinterning", it can efficiently intern the
argument, but couldn't with your change. I agree that this is very
rare, but in the test suite, it triggers 5 times in test_descr.

> The size of all string objects can be reduced by 3 bytes.

That is not true. Taking a 32-bit architecture, and considering that
each string has 16 bytes minimum storage (without ob_sinterned), and
taking into account the 8-byte clustering of pymalloc, we get

stringsize  current-storage  new-storage  savings
    0             24              24          0
    1             24              24          0
    2             24              24          0
    3             24              24          0
    4             32              24          8
    5             32              24          8
    6             32              24          8
    7             32              32          0

So the size reduction depends on the actual length of the strings;
it's 3 bytes only on average, assuming a uniform distribution of
string sizes.

Regards,
Martin