Guido van Rossum wrote: > > > Since we're at it, it's worth mentioning another conclusion we came across > > at the time: the cache effects in the main loop are significant -- it is > > more important to try keeping at best the main loop small enough, so that > > those effects are minimized. > > Yes, that's what Tim keeps hammering on too. +1 from here ;-) I have done quite a bit of testing with the old 1.5 ceval loop (patch still available in case someone wants to look at it) and found these things: 1. It pays off to special case LOAD_FAST by handling it before going into the switch at all, since this is the one most often used opcode in Python. 2. Reordering the opcodes has some noticable effect on performance too, even though it is very touchy to cache lines. 3. Splitting the big switch in two with the first one holding the most of often used opcodes while the second one takes care of the more esoteric ones helps keeping the inner loop in the CPU cache and thus increases performance. From an old mail: """ Here is a little statistic that I've run over a total of more than 100M opcodes: Opcode frequencies: ------------------------------------------------------------------------ LOAD_FAST(124) : 19323126 ================================ SET_LINENO(127) : 15055591 ======================== LOAD_CONST(100) : 9254683 =============== LOAD_NAME(101) : 8218954 ============= LOAD_GLOBAL(116) : 7174876 =========== STORE_FAST(125) : 5927769 ========= POP_TOP( 1) : 5587424 ========= CALL_FUNCTION(131) : 5404709 ======== JUMP_IF_FALSE(111) : 5289262 ======== COMPARE_OP(106) : 4495179 ======= LOAD_ATTR(105) : 3481878 ===== BINARY_ADD( 23) : 3420811 ===== RETURN_VALUE( 83) : 2221212 === STORE_NAME( 90) : 2176228 === STORE_ATTR( 95) : 2085338 === BINARY_SUBSCR( 25) : 1834612 === JUMP_ABSOLUTE(113) : 1648327 == STORE_SUBSCR( 60) : 1446307 == JUMP_FORWARD(110) : 1014821 = BINARY_SUBTRACT( 24) : 910085 = POP_BLOCK( 87) : 806160 = STORE_GLOBAL( 97) : 779880 = FOR_LOOP(114) : 735245 = SETUP_LOOP(120) : 657432 = BINARY_MODULO( 22) : 610121 = 32( 32) : 530811 31( 31) : 530657 BINARY_MULTIPLY( 20) : 392274 SETUP_EXCEPT(121) : 285523 ... """ These stats were created using an instrumented Python interpreter running real applications (and recording all executed opcodes to a file), so they can be considered close enough to what Python typically does in its inner loop. > > An experiment I did at the time which gave some delta-speedup: > > I folded most of the UNOP & BINOP opcodes since they differ only by the > > functon they call and they duplicate most of the opcode body. Like this: > [...] > > This reduced the code size of ceval.c, which resulted in less cache effects > > and gave more speed, despite the additional jumps. It possibly results in > > less page-faults too, although this is unlikely. > > I expect this is wholly attributable to the reduced code size. Most > binary operators aren't used frequently enough to make a difference in > other ways. If you put the common code at the end of the code for > binary '+', that would optimize the most common operator. > > > Which makes me think that, if we want to do something about cache effects, > > it is probably not a bad idea to just "reorder" the bytecodes in the big > > switch by decreasing frequency (we have some stats about this -- I believe > > Skip and MAL have discussed the opcodes' frequency and the charts lie > > somewhere in the archives). I remember Marc-Andre had done something in > > this direction and reported some perf improvements too. Since reordering > > the opcodes doesn't really hurt, if I'm about to do something with the > > main loop, it'll be only this. > > Go for it -- sounds good! If you need help, I can dig up those old tools and patches... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4