Behind the scenes, Damien Morton has been playing with radically different designs for "small" dicts. This reminded me that, in previous lives, I always used chaining ("linked lists") for collision resolution in hash tables. I don't have time to pursue this now, but it would make a nice experiment for someone who wants to try a non-trivial but still-simple and interesting project. It may even pay off, but that shouldn't be a motivation <wink>.

Design notes:

PyDictEntry would grow a new

    PyDictEntry *next;

slot, boosting it from 12 to 16 bytes on a 32-bit box. This wasn't reasonable when Python was designed, but pymalloc allocates 16-byte chunks with virtually no wasted space.

PyDictObject would lose the ma_fill, ma_table and ma_smalltable members, and gain a pointer to a variable-size vector of pointers:

    PyDictEntry **first;

For a hash table with 2**n slots, this vector holds 2**n pointers, memset to NULL initially. The hash chain for an object with hash code h starts at first[h & ma_mask], and is linked together via PyDictEntry.next pointers. Hash chains for distinct hash codes are independent. There's no use for the "dummy" state. There's no *logical* need for the "unused" state either, although a micro-optimizer may want to retain that in some form.

Memory use isn't nearly as bad as it may first appear -- to the contrary, it's probably better on average! Assuming a 32-bit box:

Current: tables are normally 1/3 to 2/3 full. If there are N active objects in the table, at 1/3 full the table contains 3*N PyDictEntries, and at 2/3 full it contains 1.5*N PyDictEntries, for a total of (multiplying by 12 bytes per PyDictEntry) 18*N to 36*N bytes.

Chaining: assuming tables are still 1/3 to 2/3 full, at 1/3 full there are 3*N first pointers and at 2/3 full there are 1.5*N first pointers, for a total of 6*N to 12*N bytes for first pointers. Independent of load factor, 16*N bytes are consumed by the larger PyDictEntry structs. Adding, that's 22*N to 28*N bytes.
This relies on pymalloc's tiny wastage when allocating 16-byte chunks (under 1%). The worst case is worse than the current scheme, and the best case is better. The average is probably better.

Note that "full" is a misnomer here. A chained table with 2**i slots can actually hold any number of objects, even if i == 0; on average, each hash chain contains N/2**i PyDictEntry structs. Note too that a small load factor is less important with chained resolution than with open addressing, because collisions at different hash codes can't interfere with each other (IOW, an object in slot #i never slows access to an object in the slot #j collision list whenever i != j; "breathing room" to ease cross-hashcode collision pressure isn't needed; primary collisions are all that exist).

Collision resolution code: just a list walk. For example, lookdict_string could be, in its entirety:

static dictentry *
lookdict_string(dictobject *mp, PyObject *key, register long hash)
{
    dictentry *p = mp->first[hash & mp->ma_mask];

    if (PyString_CheckExact(key)) {
        for (; p != NULL; p = p->next) {
            if (p->me_key == key ||
                (p->me_hash == hash &&
                 PyString_Eq(p->me_key, key)))
                return p;
        }
        return NULL;
    }
    mp->ma_lookup = lookdict;
    return lookdict(mp, key, hash);
}

Resizing: probably much faster. The vector of first pointers can be realloc'ed, and may sometimes benefit from the platform malloc extending it in-place. No other memory allocation operation is needed on a resize. Instead, about half the PyDictEntry structs will need to move to "the other half" of the table (the structs themselves don't get copied or moved; they just get relinked via their next pointers).

Copying: probably slower, due to needing a PyObject_Malloc() call for each key/value pair.

Building a dict up: probably slower, again due to repeated PyObject_Malloc() calls.
Referencing a dict: probably a wash, although because the code can be so much simpler, compilers may do a better job of optimizing it, and no tests are needed to distinguish among three kinds of slot states. Out-of-cache dicts are killers either way. Also see the next point.

Optimizations: the cool algorithmic thing about chaining is that self-organizing gimmicks (like swap-toward-front (or move-to-front) on reference) are easy to code and run fast (again, the dictentry structs don't move; you just need to fiddle a few next pointers). When collision chains can collide, dynamic table reorganization is so complicated and expensive that nobody has even thought about trying it in Python. When they can't collide, it's simple. Note too that since the memory burden per unused slot falls from 12 to 4 bytes, sparser tables are less painful to contemplate.

Small dicts: there's no gimmick here to favor them.