On Sat, May 25, 2002 at 08:17:47AM -0400, Guido van Rossum wrote: > Two observations: > > - Your benchmarks use an almost empty dict (for locals and globals at > least). I'd like to see how they perform with a realistic number of > other names in the dict True, the fastest code path (the macro) only works as long as the entry is in the first hash position. For the tests in my posting this is 100% of the cases. For real code the results are around 75%. To enable the display of dictionary statistics on exit compile with -dSHOW_DICTIONARY_STATS. One problem is that 75% is still not good enough - the fallback code path is significantly slower (although still faster than PyDict_GetItem). Another problem is that if you get a hash collision for a global symbol used inside a tight loop the hit rate can drop closer to 0%. One trick that may help is to shuffle the hash entries - for every 100th time the macro fails the entry will be moved up to the first hash position and the entry which previously occupied that position will be moved to the first empty hash position for its own hash chain. Statistically, this will ensure that the most commonly referenced names will tend to stay at the first hash position. I think it may improve the hit rate from 75% to 85% or higher and eliminate the worst-case scenario. > - I'm worried that negative dict entries could fill the dict beyond > its capacity. For each dictionary, the number of negative entries is bound by the number of global and builtin names in the co_names of all code objects that get executed with that dictionary as their namespace. For general-purpose dictionaries, of course, there is no such upper bound and therefore this optimization cannot be used in PyDict_GetItem. Oren
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4