This PEP introduces generator expressions as a high-performance, memory-efficient generalization of list comprehensions (PEP 202) and generators (PEP 255).
Rationale

Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.
For instance, the following summation code will build a full list of squares in memory, iterate over those values, and, when the reference is no longer needed, delete the list:
sum([x*x for x in range(10)])
Memory is conserved by using a generator expression instead:
sum(x*x for x in range(10))
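To make the contrast concrete, here is a small sketch showing that both spellings compute the same value; only the list-comprehension form materializes an intermediate list:

```python
# Both spellings compute the same total; the generator expression
# version never builds the intermediate list of squares.
total_list = sum([x*x for x in range(10)])   # builds a 10-element list first
total_gen = sum(x*x for x in range(10))      # produces squares one at a time
assert total_list == total_gen == 285
```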
Similar benefits are conferred on constructors for container objects:
s = set(word for line in page for word in line.split())
d = dict((k, func(k)) for k in keylist)
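As a runnable sketch of the constructor pattern, with hypothetical stand-ins for the `page` and `func` used above:

```python
# Hypothetical stand-ins: `page` is a list of lines, and len() plays
# the role of the PEP's func().
page = ["the quick brown", "fox jumps"]
s = set(word for line in page for word in line.split())
assert s == {"the", "quick", "brown", "fox", "jumps"}

d = dict((w, len(w)) for w in ["apple", "fig"])
assert d == {"apple": 5, "fig": 3}
```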
Generator expressions are especially useful with functions like sum(), min(), and max() that reduce an iterable input to a single value:
max(len(line) for line in file if line.strip())
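A small self-contained version of the line-length example, with a list of strings standing in for the `file` object:

```python
# A list of strings stands in for the PEP's `file` object here;
# the if-clause skips blank lines before taking the maximum length.
file = ["short\n", "a much longer line\n", "   \n", "mid-sized\n"]
longest = max(len(line) for line in file if line.strip())
assert longest == len("a much longer line\n")
```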
Generator expressions also address some examples of functionals coded with lambda:
reduce(lambda s, a: s + a.myattr, data, 0)
reduce(lambda s, a: s + a[3], data, 0)
These simplify to:
sum(a.myattr for a in data)
sum(a[3] for a in data)
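A sketch confirming the equivalence, using made-up data (note that reduce() is a builtin in Py2.4; Python 3 moved it to functools):

```python
from functools import reduce  # builtin in Py2.4; moved here in Python 3

class Item:
    def __init__(self, myattr):
        self.myattr = myattr

data = [Item(2), Item(3), Item(5)]
assert reduce(lambda s, a: s + a.myattr, data, 0) == 10
assert sum(a.myattr for a in data) == 10

rows = [(0, 0, 0, 4), (0, 0, 0, 6)]
assert reduce(lambda s, a: s + a[3], rows, 0) == 10
assert sum(a[3] for a in rows) == 10
```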
List comprehensions greatly reduced the need for filter() and map(). Likewise, generator expressions are expected to minimize the need for itertools.ifilter() and itertools.imap(). In contrast, the utility of other itertools will be enhanced by generator expressions:
dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
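For a runnable version: itertools.izip() was the Python 2 spelling; in Python 3 the builtin zip() is already lazy, so the same pattern reads:

```python
# In Python 3, the builtin zip() is lazy, playing the role that
# itertools.izip() played in Python 2.4.
x_vector = [1, 2, 3]
y_vector = [4, 5, 6]
dotproduct = sum(x*y for x, y in zip(x_vector, y_vector))
assert dotproduct == 32  # 1*4 + 2*5 + 3*6
```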
Having a syntax similar to list comprehensions also makes it easy to convert existing code into a generator expression when scaling up application.
Early timings showed that generators had a significant performance advantage over list comprehensions. However, the latter were highly optimized for Py2.4 and now the performance is roughly comparable for small to mid-sized data sets. As the data volumes grow larger, generator expressions tend to perform better because they do not exhaust cache memory and they allow Python to re-use objects between iterations.
BDFL Pronouncements

This PEP is ACCEPTED for Py2.4.
The Details

(None of this is exact enough in the eye of a reader from Mars, but I hope the examples convey the intention well enough for a discussion in c.l.py. The Python Reference Manual should contain a 100% exact semantic and syntactic specification.)
g = (x**2 for x in range(10))
print g.next()
is equivalent to:
def __gen(exp):
    for x in exp:
        yield x**2
g = __gen(iter(range(10)))
print g.next()
Only the outermost for-expression is evaluated immediately, the other expressions are deferred until the generator is run:
g = (tgtexp for var1 in exp1 if exp2 for var2 in exp3 if exp4)
is equivalent to:
def __gen(bound_exp):
    for var1 in bound_exp:
        if exp2:
            for var2 in exp3:
                if exp4:
                    yield tgtexp
g = __gen(iter(exp1))
del __gen
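A small sketch (with hypothetical helper functions) that makes the evaluation order observable: the outer iterable runs when the expression is created, while inner expressions run only during iteration:

```python
# Hypothetical helpers that record when they are called.
calls = []

def outer():
    calls.append("outer")
    return [1, 2]

def inner(x):
    calls.append("inner")
    return [x, x]

g = (y for x in outer() for y in inner(x))
assert calls == ["outer"]            # outer() ran at definition time
result = list(g)
assert result == [1, 1, 2, 2]
assert calls == ["outer", "inner", "inner"]  # inner() ran lazily
```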
The rule:

atom: '(' [testlist] ')'

changes to:
atom: '(' [testlist_gexp] ')'
where testlist_gexp is almost the same as listmaker, but only allows a single test after ‘for’ … ‘in’:
testlist_gexp: test ( gen_for | (',' test)* [','] )
This means that you can write:
sum(x**2 for x in range(10))
but you would have to write:
reduce(operator.add, (x**2 for x in range(10)))
and also:
g = (x**2 for x in range(10))
i.e. if a function call has a single positional argument, it can be a generator expression without extra parentheses, but in all other cases you have to parenthesize it.
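This rule can be checked at runtime; compiling the unparenthesized form in a multi-argument call raises a SyntaxError (the hypothetical name `foo` never needs to exist, since compilation fails first):

```python
# A single positional argument needs no extra parentheses:
assert sum(x**2 for x in range(10)) == 285

# Anywhere else, the generator expression must be parenthesized;
# the bare form does not even compile:
ok = False
try:
    compile("foo(x, x for x in range(3))", "<example>", "eval")
except SyntaxError:
    ok = True
assert ok
```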
The exact details were checked in to Grammar/Grammar version 1.49.
Unlike a list comprehension, a generator expression does not leak its loop variable into the surrounding scope. For example:
x = "hello"
y = list(x for x in "abc")
print x   # prints "hello", not "c"
[x for x in S]    # This is a list comprehension.
[(x for x in S)]  # This is a list containing one generator expression.
Unfortunately, there is currently a slight syntactic difference. The expression:

[x for x in 1, 2, 3]

is legal, meaning:

[x for x in (1, 2, 3)]

But generator expressions will not allow the former version:

(x for x in 1, 2, 3)

is illegal.
The former list comprehension syntax will become illegal in Python 3.0, and should be deprecated in Python 2.4 and beyond.
List comprehensions also “leak” their loop variable into the surrounding scope. This will also change in Python 3.0, so that the semantic definition of a list comprehension in Python 3.0 will be equivalent to list(<generator expression>). Python 2.4 and beyond should issue a deprecation warning if a list comprehension’s loop variable has the same name as a variable used in the immediately surrounding scope.
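In current Python 3, the change anticipated here has in fact happened, as this quick check shows:

```python
# In Python 3, the list comprehension's loop variable no longer leaks
# into the surrounding scope (matching generator expression semantics).
x = "hello"
y = [x for x in "abc"]
assert x == "hello"
assert y == ["a", "b", "c"]
```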
Early Binding versus Late Binding

After much discussion, it was decided that the first (outermost) for-expression should be evaluated immediately and that the remaining expressions be evaluated when the generator is executed.
Asked to summarize the reasoning for binding the first expression, Guido offered [1]:
Consider sum(x for x in foo()). Now suppose there's a bug in foo() that raises an exception, and a bug in sum() that raises an exception before it starts iterating over its argument. Which exception would you expect to see? I'd be surprised if the one in sum() was raised rather than the one in foo(), since the call to foo() is part of the argument to sum(), and I expect arguments to be processed before the function is called. OTOH, in sum(bar(x) for x in foo()), where sum() and foo() are bug-free, but bar() raises an exception, we have no choice but to delay the call to bar() until sum() starts iterating -- that's part of the contract of generators. (They do nothing until their next() method is first called.)
Various use cases were proposed for binding all free variables when the generator is defined. And some proponents felt that the resulting expressions would be easier to understand and debug if bound immediately.
However, Python takes a late binding approach to lambda expressions and has no precedent for automatic, early binding. It was felt that introducing a new paradigm would unnecessarily introduce complexity.
After exploring many possibilities, a consensus emerged that binding issues were hard to understand and that users should be strongly encouraged to use generator expressions inside functions that consume their arguments immediately. For more complex applications, full generator definitions are always superior in terms of being obvious about scope, lifetime, and binding [2].
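The hybrid rule described above can be demonstrated directly: the outer iterable is bound when the expression is created, while free variables in the body are looked up late, when the generator runs:

```python
# The outer iterable (`data`) is bound immediately; the free variable
# `factor` is looked up late, when the generator body executes.
data = [1, 2, 3]
g = (x * factor for x in data)   # no NameError here: body not run yet
factor = 10                      # defined after g was created
data = []                        # rebinding data has no effect on g
assert list(g) == [10, 20, 30]
```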
Reduction Functions

The utility of generator expressions is greatly enhanced when combined with reduction functions like sum(), min(), and max(). The heapq module in Python 2.4 includes two new reduction functions: nlargest() and nsmallest(). Both work well with generator expressions and keep no more than n items in memory at one time.
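For instance, both functions can consume a generator expression directly:

```python
import heapq

# nlargest()/nsmallest() consume the generator while keeping at most
# n items in memory at a time.
squares = (x*x for x in range(10))
assert heapq.nlargest(3, squares) == [81, 64, 49]
assert heapq.nsmallest(2, (x*x for x in range(10))) == [0, 1]
```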
Copyright

This document has been placed in the public domain.