> Since we're at it, it's worth mentioning another conclusion we came across
> at the time: the cache effects in the main loop are significant -- it is
> more important to keep the main loop small enough that those effects are
> minimized.

Yes, that's what Tim keeps hammering on too.

> An experiment I did at the time which gave some delta-speedup:
> I folded most of the UNOP & BINOP opcodes since they differ only by the
> function they call and they duplicate most of the opcode body. Like this:

[...]

> This reduced the code size of ceval.c, which resulted in fewer cache
> effects and gave more speed, despite the additional jumps. It possibly
> results in fewer page faults too, although this is unlikely.

I expect this is wholly attributable to the reduced code size. Most
binary operators aren't used frequently enough to make a difference in
other ways. If you put the common code at the end of the code for
binary '+', that would optimize the most common operator.

> Which makes me think that, if we want to do something about cache effects,
> it is probably not a bad idea to just "reorder" the bytecodes in the big
> switch by decreasing frequency (we have some stats about this -- I believe
> Skip and MAL have discussed the opcodes' frequency and the charts lie
> somewhere in the archives). I remember Marc-Andre had done something in
> this direction and reported some perf improvements too. Since reordering
> the opcodes doesn't really hurt, if I'm about to do something with the
> main loop, it'll be only this.

Go for it -- sounds good!

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)