[Thomas Wouters]
> ...
> As for speed (which stays a secondary or tertiary consideration
> at best) do we really need the xreadlines method to accomplish
> that ? Couldn't fileinput get almost the same performance using
> readlines() with a sizehint ?

There was a long email discussion among Jeff, Paul Prescod, Neel
Krishnaswami, and Alex Martelli about this. I started getting copied on it
somewhere midstream, but didn't have time to follow it then (like I do now
<wink>). About two weeks ago Neel summarized all the approaches then under
discussion:

"""
[Neel Krishnaswami]
...
Quick performance summary of the current solutions:

Slowest: for line in fileinput.input('foo'):     # Time 100
       : while 1: line = file.readline()         # Time 75
       : for line in LinesOf(open('foo')):       # Time 25
Fastest: for line in file.readlines():           # Time 10
         while 1: lines = file.readlines(hint)   # Time 10
         for line in xreadlines(file):           # Time 10

The difference in speed between the slowest and fastest is about a
factor of 10.

LinesOf is Alex's Python wrapper class that takes a file and uses
readlines() with a size-hint to present a sequence interface. It's
around half as fast as the fastest idioms, and 3-4 times faster than
while 1:. Jeff's xreadlines is essentially the same thing in C, and
is indistinguishable in performance from the other fast idioms.
...
"""

On his box, line-at-a-time is >7x slower than the fastest Python methods,
which latter are usually close (depending on the platform) to Perl
line-at-a-time speeds. A factor of 7 is too large for most working
programmers to ignore in the interest of somebody else's notion of
theoretical purity <wink>.

Seriously, speed is not a secondary consideration to me when the gap is this
gross, and in an area so visible and common.

Alex's LinesOf appears a good predictor for how adding
fileinput.readlines(hint) would perform, since it appears to *be* that
(except off on its own). Then it buys a factor of 3 over line-at-a-time on
Neel's box but leaves a factor of 2.5 on the table. The cause of the latter
appears mostly to be the overhead of getting a Python method call into the
equation for each line returned.

Note that Jeff added .xreadlines() as a file object method at Neel's urging.
The way he started this is shown on the last line of Neel's table: a
function. If we threw out the fileinput and file method aspects, and just
added a new module xreadlines with a function xreadlines, then what? I bet
it would become as popular as the string module, and for good reason: it's
a specific approach that works, to a specific and common problem.

> ...
> And in the case of simple (x)range()es, I have yet to see a case
> where a 'real' list had significantly better performance than
> a generator.)

It varies by platform, but I don't think I've heard of variations larger
than 20% in either direction. 20% is nothing, though; in *this* case we're
talking order of magnitude. That's go/nogo territory.

> ...
> Gelukkig-Nieuwjaar-iedereen-ly y'rs

I understand people are passionate when reality clashes with the dream of a
wart-free language, but that's no reason to swear at me <wink>.

wishing-you-a-happy-new-year-like-a-civilized-man-ly y'rs - tim
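
For anyone who wants to see the readlines-with-a-sizehint idiom from Neel's table spelled out, here is a minimal sketch in present-day Python 3. It is not Alex's LinesOf or Jeff's xreadlines: the class name ChunkedLines, the 64 KB default for sizehint, and the use of a generator in place of a sequence interface are all assumptions made for illustration. The point is only that the file is asked for a batch of lines at a time, so the per-line cost is handing out an item from a list already in memory.

```python
import io


class ChunkedLines:
    """Hypothetical wrapper: iterate over a file's lines in batches.

    Each batch comes from fileobj.readlines(sizehint), which stops reading
    once the total size of the lines read so far exceeds sizehint.
    """

    def __init__(self, fileobj, sizehint=64 * 1024):
        self.fileobj = fileobj
        # A non-positive hint makes readlines() return every remaining line
        # in one batch, which is still correct but defeats the batching.
        self.sizehint = sizehint

    def __iter__(self):
        while True:
            batch = self.fileobj.readlines(self.sizehint)
            if not batch:
                # An empty list means end of file.
                break
            # Hand out the lines of this batch one at a time.
            yield from batch


# Usage: any object with a readlines(hint) method will do.
sample = io.StringIO("alpha\nbeta\ngamma\n")
for line in ChunkedLines(sample, sizehint=8):
    print(line, end="")
```

A small sizehint such as the 8 used above only serves to force several batches on a tiny input; on a real file something in the tens of kilobytes is a more plausible starting point, and the best value is likely to vary by platform.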