I tried to find any research on this subject, but I couldn't find any, so I'll be daring and vulnerable and just try it out to see what your thoughts are. I single stepped a simple loop in Python to see where the efficiency bottlenecks are. I was impressed by the optimizations already in there, but I still dare to suggest an optimization that from my estimates might shave off a few cycles, speeding up Python about 5%. The idea is simple: change the byte code argument values from two bytes to one. Implications are: - code changes are relatively simple, see below - fewer memory reads, which are becoming more and more expensive - saves three instructions for every opcode with args (i.e. most of them) Code changes are, as far as I could find: compile.c: assemble_emit must produce extended opcodes for all cases of more than 8 bits instead of 16 ceval.c: NEXTARG and PEEKARG need adjustment EXTENDED_ARG needs adjustment (this will be a four byte instruction, which is ugly, I agree) peephole.c: GETARG, SETARG, need adjustment also GETJUMPTGT, CODESIZE routine tuple_of_constants, fold_binops_on_constants, PyCode_Optimize are dependent on instruction length, which will be 2 instead of 3 (search for the digit 3 will find all cases, as far as I checked) you probably will have to write a macro for codestr[i+3] there is a check for code length >32700, but I think this one might stay, maybe if a few extra checks are added. dis: minor adjustments Estimation of speed impact: about 80% of the instructions seem to have an argument, and I never saw an opcode >255 while looking at bytecode, so they are probably not frequent. The NEXTARG macro expands on my Macbook to: mov -408(%ebp),%edx (next_instr) movzbl 2(%edx),%eax (*second byte) shl $0x8,%eax (*shift) movzbl 1(%edx),%edx (first byte) add %edx,%eax (*combine) and the starred instructions will vanish. The main loop is approximately 40 instructions, so a saving of three instructions is significant. I don't dare to claim 3/40 = 7.5% savings, but I think 5% may be realistic. Did anyone try this already? If not, I might take up the gauntlet and try it myself, but I never did this before... - Jurjen PS I also saw that some scratch variables, mainly v and x, are carefull stored back in memory by the compiler and the end of the big interpreter loop, while their value isn't used anymore, of course. A few carefully placed braces might tell the compiler how useless this is and save another few percent.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4