[Tim Delaney]
>> What is the effect on peak memory usage over "average" programs?

[Raymond Hettinger]
> Since the amortized growth is the same, the effect is Nil on average.
> Special cases can be contrived with a specific number of records
> where the memory use is doubled, but in general it is nearly unchanged
> for average programs.

That doesn't make sense.  Dicts can be larger after the patch, but
never smaller, so there's nothing opposing the "can be larger" part:
on average, allocated address space must be strictly larger than
before.  Whether that *matters* on average to the average user is
something we can answer rigorously just as soon as we find an average
user with an average program <wink>.  I'm not inclined to worry much
about it.

>> This might be a worthwhile speedup on small dicts (up to a TBD
>> number of entries) but not worthwhile for large dicts.

> Actually, it helps large dictionaries even more than small
> dictionaries.  Collisions in large dicts are resolved through other
> memory probes which are almost certain not to be in the current
> cache line.

That part makes sense (the probe-sequence sketch at the end of this
message illustrates it).  Resizing a large dict is an expensive
operation too.

>> However, to add this capability in would of course add more code
>> to a very common code path (additional test for current size to
>> decide what factor to increase by).

> Your intuition is exactly correct.  All experiments to special-case
> various sizes resulted in decreased performance because they added
> a tiny amount to some of the most heavily exercised code in
> Python.

This part isn't clear:  the changed code is in the body of an if()
block that normally *isn't* entered.  Over an ever-growing dict's
life it's entered O(log(len(dict))) times, a count independent of the
number of dict lookups (the little simulation at the end makes this
concrete).  The change cuts the number of times it's entered by
approximately a factor of 2, but it isn't entered often even now.

> Further, it results in an unpredictable branch which is
> also not a good thing.

Since the body of the if-block isn't entered often, unpredictable
one-shot branches within the body shouldn't have a measurable effect.
The unpredictable branches when physically resizing the dict will
swamp them regardless.  The surrounding if-test continues to be
predictable in the "branch taken" direction.  What could be much
worse is that stuffing code into the if-block bloats the code so much
as to frustrate lookahead I-stream caching of the normal "branch
taken and return 0" path:

    if (mp->ma_used > n_used &&
        mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
        if (dictresize(mp, mp->ma_used*2) != 0)
            return -1;
    }
    return 0;

Rewriting as

    if (mp->ma_used <= n_used ||
        mp->ma_fill*3 < (mp->ma_mask+1)*2)
        return 0;
    return dictresize(mp, mp->ma_used*2) ? -1 : 0;

would help some compilers generate better code for the expected path,
especially if the blob after "return 0;" got hairier.

IOW, if fiddling with different growth factors at different sizes
slowed things down, we have to look for something that affected the
*normal* paths; it's hard to imagine that the guts of the if-block
execute often enough to matter (discounting its call to dictresize(),
which is an expensive routine).

> I found out that timing dict performance was hard.
> Capturing memory usage was harder.  Checking entry
> space, space plus unused space, calls to PyMalloc, and
> calls to the OS malloc -- only the last is important, but
> it depends on all kinds of things that are not easily
> controlled.

In my early Cray days, the Cray boxes were batch one-job-at-a-time,
and all memory was real.
If you had a CPU-bound program, it took the same number of nanoseconds each time you ran it. Benchmarking was hard then too <0.5 wink>.
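
To illustrate the cache-line point, here's a toy program built around
the probe sequence CPython's dictobject.c uses (i = (i << 2) + i +
perturb + 1, with perturb shifted right by PERTURB_SHIFT == 5 after
each probe).  The table size and hash value below are made up; the
only point is how far apart successive probes for a single key land:

    /* Toy demo of CPython's dict probe sequence.  The 1M-slot table
     * and the sample hash are arbitrary; we just print which slot
     * each successive probe for one key would touch. */
    #include <stdio.h>

    #define PERTURB_SHIFT 5

    int main(void)
    {
        unsigned long mask = (1UL << 20) - 1;  /* a 1M-slot table */
        unsigned long hash = 0x9e3779b9UL;     /* arbitrary hash value */
        unsigned long perturb = hash;
        unsigned long i = hash & mask;

        for (int probe = 0; probe < 8; probe++) {
            printf("probe %d: slot %lu\n", probe, i & mask);
            i = (i << 2) + i + perturb + 1;    /* i = 5*i + perturb + 1 */
            perturb >>= PERTURB_SHIFT;
        }
        return 0;
    }

After the first probe the slots hop all over the table, so in a dict
too big for cache nearly every extra probe is a likely cache miss.
That's why cutting the collision rate buys more for large dicts than
for small ones.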
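
And to put a number on the O(log(len(dict))) claim, a
back-of-the-envelope simulation of the resize schedule.  This is not
the real dictobject.c logic -- it assumes an initial table of 8 slots,
the fill*3 >= size*2 trigger quoted above, power-of-2 table sizes, and
no deletions (so fill == used) -- but it's enough to count how often
the body of that if-block runs:

    /* Count resize-trigger firings over n straight inserts, for a
     * given growth factor.  Simplified model: 8 initial slots,
     * resize when used*3 >= size*2, new size is the smallest power
     * of 2 exceeding used*growth. */
    #include <stdio.h>

    static long count_resizes(long n_inserts, int growth)
    {
        long size = 8;      /* slots in the table (power of 2) */
        long used = 0;      /* live entries; == fill, no deletions */
        long resizes = 0;

        for (long i = 0; i < n_inserts; i++) {
            used++;
            if (used * 3 >= size * 2) {      /* the resize trigger */
                long target = used * growth; /* dictresize(mp, used*growth) */
                while (size <= target)       /* round up to a power of 2 */
                    size <<= 1;
                resizes++;
            }
        }
        return resizes;
    }

    int main(void)
    {
        for (long n = 1000; n <= 1000000; n *= 10)
            printf("%8ld inserts: %2ld resizes at x2, %2ld at x4\n",
                   n, count_resizes(n, 2), count_resizes(n, 4));
        return 0;
    }

For a million straight inserts this reports 18 resizes at a growth
factor of 2 and 9 at a factor of 4:  logarithmically few either way,
and the factor-of-2 cut mentioned above.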