On Thu, Jan 04, 2001 at 09:16:39AM -0500, Guido van Rossum wrote:

> [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().]

> Thomas, I really don't understand it.  The getc() source code you
> showed calls getc_unlocked().  So how can it be faster?  The answer
> must be somewhere else...  Cache line conflicts, the rewriting of the
> loop that I did, a compiler bug, the inlining, who knows.  Can you
> compare the generated assembly code?  On other platforms,
> getc_unlocked() typically speeds the readline() test case up by a
> significant factor (as in your BSDI numbers, where it's almost 3x
> faster).

Nono, reread my message, and your code. getc() isn't faster than
getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ the
rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop,
etc.) Significantly so when there is only one thread running (which is still
the common case, for most systems, and FreeBSD's libc has easy inside
knowledge about) and marginally so when there is at least one other thread.
The small advantage in the multi-threaded case can be explained by the rest
of the changes.

You see, I was comparing a patched tree versus a non-patched tree, not a
getc_unlocked() enabled one versus a disabled one, so I was measuring the
speed difference of the *patch*, not of the use of getc_unlocked() vs getc().

Here is the speed difference of just the use of getc() vs getc_unlocked()
(same tree, hand-edited config.h) in a non-threaded environment:

> ./python-getc-disabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.271 0.273
readlines_sizehint    0.149 0.148
using_fileinput       0.898 0.898
while_readline        0.214 0.211

> ./python-getc-enabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.271 0.273
readlines_sizehint    0.148 0.148
using_fileinput       0.898 0.898
while_readline        0.214 0.211

As you see, no significant difference.

Here is the difference in a threaded environment (a second thread that does
just 'time.sleep(900)'):

> ./python-getc-disabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.429 0.422
readlines_sizehint    0.200 0.211
using_fileinput       1.604 1.594
while_readline        0.465 0.461

> ./python-getc-enabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.429 0.430
readlines_sizehint    0.201 0.203
using_fileinput       1.600 1.602
while_readline        0.463 0.461

... where I have to note that the getc-disabled version's 'using_fileinput'
time fluctuates a lot more, mostly upwards, in the threaded environment. (I
see it jump to 1.609, 1.617 cputime, every few runs.) Still not a terribly
significant difference, but a hint that we, too, can use inside knowledge ;)

> Could it be that you're mistaken and that somehow getc_unlocked() is
> *not* chosen on FreeBSD?  Then I could believe it, the rewritten loop
> is so different that the optimizer might have done something different
> to it.  (Check config.h.  When all else fails, I put an #error in the
> #ifdef branch that I expect not to be taken.)

Yah, #error is great for debugging, I use it a lot ;) But I'm sure of this.
FreeBSD's getc() is just craftily optimized. Note that if we can get
get_line using getc_unlocked() to run as fast as get_line using getc() on
FreeBSD, it should also benefit other platforms, because the only speed to be
had is in our own code :) Not that I'm saying it can be improved, just that
it apparently got slower, because of this patch.

I can't be much help doing any performance tuning, though, I've about used
up my lunch hour and I'm working late tonight ;P

Good-thing-my-boss-can't-tell-the-difference-between-Apache-and-Python-src-ly
y'rs,
-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
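
For readers following the thread, the two read loops being compared can be sketched roughly as below. This is only an illustration, not the get_line code from the patch: the function names read_line_getc and read_line_unlocked, the buffer handling, and the return convention are all made up for the example. It assumes a POSIX libc that provides flockfile(), funlockfile() and getc_unlocked().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for flockfile()/getc_unlocked() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical variant A: plain getc(). The C library handles the
 * stream lock itself on every call, and is free to take a shortcut
 * when it knows only one thread is running. */
size_t read_line_getc(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 0);
    while (n + 1 < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;              /* stop after the newline, keeping it */
    }
    buf[n] = '\0';
    return n;                   /* number of characters stored */
}

/* Hypothetical variant B: lock the stream once with flockfile(), read
 * with getc_unlocked(), then unlock. The explicit lock/unlock pair is
 * paid even in a single-threaded process. */
size_t read_line_unlocked(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 0);
    flockfile(fp);
    while (n + 1 < size && (c = getc_unlocked(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);
    buf[n] = '\0';
    return n;
}
```

Both variants return the same data for the same input; the only difference is who manages the stream lock, which is the cost the timings above are trying to isolate.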