> > Now that we have new bytecode optimizations, the pyc file magic
> > number needs to be changed.

We have several options:

1. Change the magic number to accommodate NOP.

2. Install an additional step that eliminates the NOPs from the
   bytecode (they are not strictly necessary). This will make the
   code even shorter and faster without a need to change the magic
   number. I've got this in my hip pocket if we decide that this is
   the way to go; the generated code is beautiful. (A sketch of such
   a pass appears after this list.)

3. Eliminate the last two optimizations, which were the only ones
   that needed a NOP:

   a) COMPARE_OP (is, in, is not, not in)      COMPARE_OP (is not, not in, is, in)
      UNARY_NOT                            -->  NOP

   b) UNARY_NOT                                 NOP
      JUMP_IF_FALSE (tgt)                  -->  JUMP_IF_TRUE (tgt)
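For illustration only, here is a minimal Python sketch of those two
patterns plus the NOP-squeezing pass from option 2. It works on a
made-up symbolic instruction format, not the real code string; the
actual optimizer is C code in Python/compile.c, and every name below
(peephole, eliminate_nops, the basic-block check) is an invention for
the sketch.

    # Symbolic instructions are (opcode, arg) pairs; jump args are
    # instruction indices.  This only models the idea -- the real
    # optimizer works on the raw bytecode string in C.

    JUMPS = {"JUMP_IF_TRUE", "JUMP_IF_FALSE",
             "JUMP_ABSOLUTE", "JUMP_FORWARD"}
    NEGATE = {"is": "is not", "is not": "is",
              "in": "not in", "not in": "in"}

    def jump_targets(code):
        return {arg for op, arg in code if op in JUMPS}

    def peephole(code):
        """Apply patterns (a) and (b), leaving NOPs behind."""
        code = list(code)
        targets = jump_targets(code)
        for i in range(len(code) - 1):
            # Basic-block test: never rewrite a pair that something
            # jumps into the middle of.
            if i + 1 in targets:
                continue
            op, arg = code[i]
            nxt, nxt_arg = code[i + 1]
            if op == "COMPARE_OP" and arg in NEGATE and nxt == "UNARY_NOT":
                # (a) negate the comparison, drop the UNARY_NOT
                code[i] = ("COMPARE_OP", NEGATE[arg])
                code[i + 1] = ("NOP", None)
            elif op == "UNARY_NOT" and nxt == "JUMP_IF_FALSE":
                # (b) fold the negation into the jump's sense
                code[i] = ("NOP", None)
                code[i + 1] = ("JUMP_IF_TRUE", nxt_arg)
        return code

    def eliminate_nops(code):
        """Option 2: squeeze out NOPs and retarget the jumps."""
        new_index, kept = [], []
        for op, arg in code:
            new_index.append(len(kept))
            if op != "NOP":
                kept.append([op, arg])
        new_index.append(len(kept))   # a target may point past the end
        for instr in kept:
            if instr[0] in JUMPS:
                instr[1] = new_index[instr[1]]
        return [tuple(instr) for instr in kept]

For example, "not (x in y)" collapses to a single inverted comparison:

    >>> eliminate_nops(peephole([("COMPARE_OP", "in"), ("UNARY_NOT", None)]))
    [('COMPARE_OP', 'not in')]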
> I wonder what the wisdom is of adding more code complexity.

Part of the benefit is that there will no longer be any need to
re-arrange branches and conditionals in order to avoid 'not'. As of
now, it has near-zero cost in most situations (except when used with
and/or).

> We're still holding off on Ping and Aahz's changes (see the
> cache-attr-branch) and Thomas and Brett's CALL_ATTR optimizations,
> for similar reasons (inconclusive evidence of speedups in real
> programs).
>
> What makes Raymond's changes different?

* They are thoroughly tested.

* They are decoupled from the surrounding code and will survive
  changes to ceval.c and newcompile.c.

* They provide some benefits without hurting anything else.

* They provide a framework for others to build upon. The scanning
  loop and basic-block tester (sketched above) make it a piece of
  cake to add, change, or remove code transformations.

CALL_ATTR ought to go in when it is ready. It certainly provides a
measurable speed-up in the targeted behavior; it just needs more
polish so that it doesn't slow down other pathways. The benefit is
real, but in real programs it is being offset by reduced performance
in non-targeted behavior. With some more work, it ought to be a real
gem. Unfortunately, it is tightly coupled to the implementation of
new- and old-style classes. Still, it looks like a winner.

What we're seeing is a consequence of Amdahl's law and Python's broad
scope. Instead of a single hotspot, Python exercises many different
types of code, and each needs to be optimized separately. People have
taken on many of these, and collectively they are having a great
effect. The proposals by Ping, Aahz, Brett, and Thomas are important
steps toward addressing untouched areas.

I took on the task of making sure that basic pure-Python code
slithers along quickly. The basics like "while", "for", "if", and
"not" have all been improved. Lowering the cost of those constructs
will mean less effort spent bypassing them with vectorized code
(map(), etc.). Code in something like sets.py won't show much benefit
because so much effort has been directed at using filter, map,
dict.update, and other high-volume C-coded functions and methods. Any
one person's optimizations will likely help by a few percent at most,
but taken together they will be a big win.

> I also wonder why this is done unconditionally, rather than only
> with -O.

Neal, Brett, and I discussed this a bit, and I came to the conclusion
that these code transformations are like the ones already built into
the compiler: they have some benefit, but cost almost nothing (two
passes over the code string at compile time). The -O option makes
sense for optimizations that have a high time overhead, throw away
debugging information, change semantics, or reduce feature access.
IOW, -O is for when you're trading something away in return for a bit
of speed in production code. There is essentially no benefit to not
using the optimized bytecode.

Raymond Hettinger
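[A quick, version-dependent way to see the effect described above --
that spelling a test with 'not' costs the same as the inverted
comparison, with no -O required -- is to compare disassemblies with
the dis module. The exact opcodes emitted vary across CPython
releases, so this only shows what your interpreter does:]

    import dis

    # If the negation is optimized away, both spellings compile to a
    # single inverted comparison; otherwise the first shows a
    # separate UNARY_NOT.  Opcode names vary by CPython version.
    dis.dis(compile("not (x in y)", "<demo>", "eval"))
    dis.dis(compile("x not in y", "<demo>", "eval"))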