Currently, dictionaries always grow until they are deallocated from memory. This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict): /* if fill >= 2/3 size, double in size */ if (mp->ma_fill*3 >= mp->ma_size*2) { if (dictresize(mp, mp->ma_used*2) != 0) { if (mp->ma_fill+1 > mp->ma_size) return -1; } } The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution. Which is: reduce the size of the dict by half when the nb of used items <= 1/6 the size. This situation occurs far less frequently than dict growing, but anyways, it seems useful for the degenerate cases where a dict has a peek usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned <dummy> entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could came with for this pb is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999 --- dictobject.c Tue Aug 10 12:21:15 1999 *************** *** 417,423 **** ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); return 0; } --- 417,430 ---- ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); ! /* For bigger dictionaries, if used <= 1/6 size, half the size */ ! if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) { ! if (dictresize(mp, mp->ma_used*2) != 0) { ! if (mp->ma_fill > mp->ma_size) ! return -1; ! } ! } return 0; }
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4