> with a.py having:
>
>     def asum(R):
>         sum([ x*x for x in R ])
>
>     def gen(R):
>         for x in R: yield x*x
>     def gsum(R, gen=gen):
>         sum(gen(R))
>
> I measure:
>
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.asum(R)'
> 10000 loops, best of 3: 96 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.gsum(R)'
> 10000 loops, best of 3: 60 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.asum(R)'
> 1000 loops, best of 3: 930 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.gsum(R)'
> 1000 loops, best of 3: 590 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.asum(R)'
> 100 loops, best of 3: 1.28e+04 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.gsum(R)'
> 100 loops, best of 3: 8.4e+03 usec per loop
>
> not sure why gsum's advantage ratio over asum seems to be roughly
> constant, but, this IS what I measure!-)

Great! This is a plus for iterator comprehensions (we need a better
term BTW). I guess that building up a list using repeated append()
calls slows things down more than the frame switching used by
generator functions; I knew the latter was fast, but this is a
pleasant result.

BTW, if I use a different function that calculates list() instead of
sum(), the generator version is a few percent slower than the list
comprehension. But that's because list(a) has a shortcut in case a
is a list, while sum(a) always uses PyIter_Next(). So this is
actually consistent: despite the huge win of the shortcut, the
generator version is barely slower.

I think the answer lies in the bytecode:

>>> def lc(a): return [x for x in a]
...
>>> import dis
>>> dis.dis(lc)
  2           0 BUILD_LIST               0
              3 DUP_TOP
              4 LOAD_ATTR                0 (append)
              7 STORE_FAST               1 (_[1])
             10 LOAD_FAST                0 (a)
             13 GET_ITER
        >>   14 FOR_ITER                16 (to 33)
             17 STORE_FAST               2 (x)
             20 LOAD_FAST                1 (_[1])
             23 LOAD_FAST                2 (x)
             26 CALL_FUNCTION            1
             29 POP_TOP
             30 JUMP_ABSOLUTE           14
        >>   33 DELETE_FAST              1 (_[1])
             36 RETURN_VALUE
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE
>>> def gen(a):
...     for x in a: yield x
...
>>> dis.dis(gen)
  2           0 SETUP_LOOP              18 (to 21)
              3 LOAD_FAST                0 (a)
              6 GET_ITER
        >>    7 FOR_ITER                10 (to 20)
             10 STORE_FAST               1 (x)
             13 LOAD_FAST                1 (x)
             16 YIELD_VALUE
             17 JUMP_ABSOLUTE            7
        >>   20 POP_BLOCK
        >>   21 LOAD_CONST               0 (None)
             24 RETURN_VALUE
>>>

The list comprehension executes 7 bytecodes per iteration; the
generator version only 5 (this could be more, of course, if the
expression were more complicated than 'x'). The YIELD_VALUE opcode
does very little work; falling out of the frame is like falling off
a log; and gen_iternext() is pretty sparse code too. On the list
comprehension side, calling the list's append method has a bunch of
overhead. (Some of which could be avoided if we had a special-purpose
opcode which called PyList_Append().)

But the executive summary remains: the generator wins because it
doesn't have to materialize the whole list.

--Guido van Rossum (home page: http://www.python.org/~guido/)
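
For readers who want to reproduce Alex's measurements, here is a
minimal, self-contained version of the same benchmark, using the
timeit module's Python API rather than the timeit.py command-line
script. It assumes nothing beyond the standard library; absolute
figures will of course vary with machine and interpreter version.

    # Self-contained version of the quoted a.py benchmark.
    import timeit

    def asum(R):
        # list comprehension: builds the full list of squares, then sums it
        sum([x * x for x in R])

    def gen(R):
        for x in R:
            yield x * x

    def gsum(R, gen=gen):
        # generator: sum() pulls the squares one at a time
        sum(gen(R))

    for n in (100, 1000, 10000):
        R = range(n)
        for func in (asum, gsum):
            # total seconds for 1000 calls -> microseconds per call
            t = timeit.timeit(lambda: func(R), number=1000)
            print('%s  n=%-5d  %.1f usec per call' % (func.__name__, n, t * 1000))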
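
The list()-versus-sum() aside can be checked the same way. The sketch
below (reusing the lc and gen definitions from the disassembly above)
times list() over a list comprehension against list() over a
generator: the former benefits from list()'s shortcut for list
arguments, while the latter has to go through the iterator protocol
one item at a time.

    # Timing list(lc(a)) vs. list(gen(a)) to exercise the list() shortcut.
    import timeit

    setup = """
    def lc(a):
        return [x for x in a]

    def gen(a):
        for x in a:
            yield x

    a = list(range(10000))
    """

    # list(lc(a)): list() receives an actual list and can copy it wholesale.
    print(timeit.timeit('list(lc(a))', setup=setup, number=1000))
    # list(gen(a)): list() must call the iterator protocol for every item.
    print(timeit.timeit('list(gen(a))', setup=setup, number=1000))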
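
The executive summary is visible in memory as well as in time.
sys.getsizeof() reports only an object's own size (the list's pointer
array, the generator object itself), but that is enough for a rough
illustration of what "materialize the whole list" costs:

    # The list allocates a slot for every element up front; the
    # generator is a small constant-size object producing values on demand.
    import sys

    def gen(R):
        for x in R:
            yield x * x

    n = 10 ** 6
    lst = [x * x for x in range(n)]
    it = gen(range(n))

    print(sys.getsizeof(lst))  # megabytes' worth of pointer slots
    print(sys.getsizeof(it))   # a small constant, independent of n
    print(sum(it))             # the values still all arrive, one at a time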