Tim Peters <tim.one@comcast.net> writes:

> [Michael Hudson]
> > ...
> > This makes no sense; after you've commented out the trace stuff, the
> > only difference left is that the switch is smaller!
>
> When things like this don't make sense, it just means we're naive <wink>.
> The eval loop overwhelms most optimizers via a crushing overload of "too
> many" variables and "too many" basic blocks connected via a complex
> topology, and compiler optimization phases are in the business of using
> (mostly) linear-time heuristics to solve exponential-time optimization
> problems.  IOW, the performance of the eval loop is as touchy as a
> heterosexual sailor coming off 2 years at sea, and there's no predicting
> what minor changes will do to speed.  This has been observed repeatedly by
> everyone who has tried to speed it, across many platforms, and across a
> decade of staring at it: the eval loop is in unstable equilibrium on its
> best days.

I knew all this, but was still surprised by the magnitude of the slowdown.

> In the limit, the eval loop "should be" a little slower now under -O, just
> because we've added another test + taken-branch to the normal path.  From
> that POV, your
>
> > FWIW gcc makes my patch a small win even with -O.
>
> is as much "a mystery" as why MSVC 6 hates it.

No kidding.  I wonder if some of the slowdown comes from repeatedly
hauling the threadstate into the cache.  I guess wonderings like this
are almost exactly valueless.

> > Actually, there are some other changes, like always updating f->f_lasti,
> > and allocating 8 more bytes on the stack.  Does commenting out the
> > definition of instr_lb & instr_ub make any difference?
>
> I'll try that on Tuesday, but don't hold your breath.  It could be that I
> can get back all the loss by declaring tstate volatile -- or doing any
> other random thing <wink>.
>
> > ...
> > Does reading assembly give any clues?  Not that I'd really expect
> > anyone to read all of the main loop...
>
> I will if it's important, but a good HW simulator is a better tool for this
> kind of thing, and in any case I doubt I can make enough time to do what
> would be needed to address this for real.

On Linux there's cachegrind, which comes with valgrind and might prove
helpful.  But that only runs on Linux, and I'm not sure I want to explain
the Linux mystery, as it might go away :)

> > I'm baffled.
>
> Join the club -- we've held this invitation open for you for years <wink>.

Attempting a PhD in mathematics is providing enough bafflement for this
schmuck, but thanks for the offer.

> > Perhaps you can put SET_LINENO back in for the Windows build
> > <1e-6 wink>.
>
> If it's an unfortunate I-cache conflict among heavily-hit code addresses
> (something a good HW simulator can tell you), that could actually solve it!
> Then anything that manages to move one of the colliding code chunks to a
> different address could yield "a mysterious speedup".  These mysteries are
> only irritating when they work against you <wink>.

Well, quite.  Let's send Julian Seward an email asking him if he wants to
port valgrind to Windows <wink>.

Cheers,
M.

-- 
  surely, somewhere, somehow, in the history of computing, at least one
  manual has been written that you could at least remotely attempt to
  consider possibly glancing at.                           -- Adam Rixey
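For what it's worth, the bookkeeping the patch moves out of SET_LINENO boils
down to mapping an instruction offset (what f->f_lasti records) back to a
source line via the code object's line table; instr_lb/instr_ub just cache
the current table entry so the eval loop needn't redo the lookup on every
instruction.  A minimal sketch of that lookup in pure Python (the helper
name `line_for_offset` is mine, and `dis.findlinestarts` stands in for the
C-level table walk):

```python
import dis

def line_for_offset(code, offset):
    """Return the source line active at bytecode `offset`, or None.

    Walks the (offset, lineno) pairs from dis.findlinestarts and keeps
    the last entry whose offset does not exceed the one requested --
    roughly the lookup that instr_lb/instr_ub caching lets the eval
    loop skip on the fast path.
    """
    line = None
    for start, lineno in dis.findlinestarts(code):
        if start > offset:
            break
        if lineno is not None:
            line = lineno
    return line

def sample():
    x = 1
    y = x + 1
    return y

# Every recorded line start should map back to its own line number.
for start, lineno in dis.findlinestarts(sample.__code__):
    if lineno is not None:
        assert line_for_offset(sample.__code__, start) == lineno
```

This is just an illustration of the mapping, of course -- the actual patch
does the equivalent in C inside ceval.c, which is where the per-instruction
cost being argued about comes from.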