I was looking into the string interning code to see how it works. Here are my observations: A Python string can be in one of three states: 1. Not interned: s->ob_sinterned == NULL 2. Directly-interned: s->ob_sinterned == s 3. Indirectly interned. ob_sinterned points to another string: s->ob_sinterned != s && s->ob_sinterned != NULL Indirectly interned strings are quite rare. Creating one requires that a string already exist in the interned dictionary and that an equal string with multiple references be interned. The reference used for internining is replaced with the previously interned string. The other references to the same string will become indirectly interned. References to the ob_sinterned field are found in in stringobject.c and dictobject.c and, inexplicably, in Mac/Python/macimport.c In stringobject.c most references to ob_sinterned are to initialize it. The only place that uses it is string_hash: if ob_sinterned is not NULL it uses the hash of the string it points to instead of the current string object. If the string is directly interned this is just a longer way of doing the same thing. If the string is indirectly interned this is merely redundant because the hash of the two strings should be equal (I hope so!). The only thing this test could have saved is recalculating the hash from the string if the cached hash is zero. This doesn't happen because if the string is indirectly interned it has been used as a key during interning which initializes its cached hash. In dictobject.c the only reference to ob_sinterned is in PyDict_SetItem: if the string is interned it uses the ob_sinterned pointer as the key instead of the argument. This could only make a difference if the string is indirectly interned. It turns out that this never happens. I couldn't find one occurence of SetItem with an indirectly interned string as key in the regression tests or any other Python code I have tested. Even if this did happen it wouldn't cause any problems to ignore this case - the lookup would still function correctly. Summary: As far as I can tell, indirectly interned strings are redundant. Without them the ob_sinterned field is effectively a boolean flag. The size of all string objects can be reduced by 3 bytes. Can anyone explain why interning is implemented the way it is? Can anyone explain why Mac/Python/macimport.c is messing with ob_sinterned? Oren
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4