[Tim responded]
>> total 131426612 chars and 514216 lines

>You average over 255 chars/line?  Really?  What kind of file are you
>reading?  I don't really want to measure the speed of line-at-a-time
>input on binary files where "line" doesn't actually make sense <0.6 wink>.

Real-life input, my boy! It's actually a syslog from my mailserver,
consisting mainly of sendmail log messages, and I have a current need to
process these things (MS Exchange, corrupted database, clobbered backup
tapes), so this thread came along at the right time...

>Guido pointed out that his readlines_sizehint test forced use of a 1Mb
>buffer (in the call, not only the default value).  For whatever
>reason, that was significantly slower than using an 8Kb sizehint on my
>box.

Removing the buffer size arg in the call to readlines_sizehint results
in this (using up-to-the-minute CVS):

total 131426612 chars and 514216 lines
count_chars_lines     4.922   4.916
readlines_sizehint    3.881   3.850
using_fileinput      10.371  10.366
while_readline       10.943  10.916
for_xreadlines        2.990   2.967

and with an 8Kb sizehint:

total 131426612 chars and 514216 lines
count_chars_lines     5.241   5.216
readlines_sizehint    2.917   2.900
using_fileinput      10.351  10.333
while_readline       10.990  10.983
for_xreadlines        2.877   2.867

>Another oddity is that while_readline is slower than using_fileinput
>for you.  From that I take it Python config does *not* #define
>
>    HAVE_GETC_UNLOCKED
>
>on your platform.  If that's true

Nope, HAVE_GETC_UNLOCKED is indeed #define'd

>(or esp. if it's not!), would you do me a
>favor?  Recompile fileobject.c with
>
>    USE_MS_GETLINE_HACK
>
>#define'd, try the timing test again (while_readline is the most
>interesting test for this), and run the test_bufio.py std test to make
>sure you're actually getting the right answers.

Sure:

With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd (although
defining the former makes the latter def irrelevant):
(test_bufio also OK)

total 131426612 chars and 514216 lines
count_chars_lines     5.056   5.050
readlines_sizehint    3.771   3.667
using_fileinput      11.128  11.116
while_readline        8.287   8.233
for_xreadlines        3.090   3.083

With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just for
completeness):

total 131426612 chars and 514216 lines
count_chars_lines     4.916   4.900
readlines_sizehint    3.875   3.867
using_fileinput      14.404  14.383
while_readline      322.728 321.837
for_xreadlines        7.113   7.100

So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement
<grin>

-- 
Mark Favas  -   m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
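
For anyone wanting to try a comparison along these lines, here is a rough sketch of the line-reading idioms named in the timings above. The timing script itself is not included in this message, so the function bodies are assumptions that merely reuse the labels from the tables. The sketch targets current Python 3, where xreadlines no longer exists, so plain iteration over the file object stands in for for_xreadlines. make_sample and run_all are hypothetical helpers, and the log lines make_sample writes are made up.

```python
import fileinput
import os
import tempfile
import time


def count_chars_lines(path):
    """Baseline: read the whole file in one call, then count."""
    with open(path) as f:
        data = f.read()
    # Assumes the file ends with a newline, so "\n" count == line count.
    return len(data), data.count("\n")


def readlines_sizehint(path, sizehint=8 * 1024):
    """Read batches of lines, roughly sizehint bytes per batch."""
    chars = lines = 0
    with open(path) as f:
        while True:
            batch = f.readlines(sizehint)
            if not batch:
                break
            for line in batch:
                chars += len(line)
                lines += 1
    return chars, lines


def using_fileinput(path):
    """Iterate with the stdlib fileinput module."""
    chars = lines = 0
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            chars += len(line)
            lines += 1
    return chars, lines


def while_readline(path):
    """One readline() call per line until EOF."""
    chars = lines = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            chars += len(line)
            lines += 1
    return chars, lines


def for_file_iteration(path):
    """Python 3 stand-in for for_xreadlines: iterate the file object."""
    chars = lines = 0
    with open(path) as f:
        for line in f:
            chars += len(line)
            lines += 1
    return chars, lines


IDIOMS = [count_chars_lines, readlines_sizehint, using_fileinput,
          while_readline, for_file_iteration]


def make_sample(n_lines=1000):
    """Hypothetical helper: write made-up syslog-like lines to a temp file."""
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
        for i in range(n_lines):
            f.write(f"mailhost sendmail[{i}]: sample message {i}\n")
    return f.name


def run_all(path):
    """Hypothetical helper: time each idiom once and print wall-clock seconds."""
    print(f"file size: {os.path.getsize(path)} bytes")
    for fn in IDIOMS:
        start = time.perf_counter()
        chars, lines = fn(path)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__:20s} {elapsed:8.3f}  ({chars} chars, {lines} lines)")
```

Calling `run_all(make_sample(500000))` gives a quick local comparison. The absolute numbers will not be comparable to the ones above, since those came from a C-level file implementation that has since been replaced.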