Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict): > > > > /* if fill >= 2/3 size, double in size */ > > if (mp->ma_fill*3 >= mp->ma_size*2) { > > if (dictresize(mp, mp->ma_used*2) != 0) { > > if (mp->ma_fill+1 > mp->ma_size) > > return -1; > > } > > } > > This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. At least 1/3rd of the entries are NULL, so "fill" > continues to climb as keys are added again: when the load factor > computation triggers again, "used" may be as small as 1, and dictresize can > shrink the dict dramatically. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments? > > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do > > d = d.copy() > > or even > > d.update({}) > > to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to- > see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4