zip()
This PEP describes the ‘lockstep iteration’ proposal. This PEP tracks the status and ownership of this feature, slated for introduction in Python 2.0. It contains a description of the feature and outlines changes necessary to support the feature. This PEP summarizes discussions held in mailing list forums, and provides URLs for further information, where appropriate. The CVS revision history of this file contains the definitive historical record.
Motivation
Standard for-loops in Python iterate over every element in a sequence until the sequence is exhausted [1]. However, for-loops iterate over only a single sequence, and it is often desirable to loop over more than one sequence in a lock-step fashion. That is, the i-th iteration through the loop should return an object containing the i-th element from each sequence.
The common idioms used to accomplish this are unintuitive. This PEP proposes a standard way of performing such iterations by introducing a new builtin function called zip().
While the primary motivation for zip() comes from lock-step iteration, implementing zip() as a built-in function gives it additional utility in contexts other than for-loops.
Lockstep For-Loops
Lockstep for-loops are non-nested iterations over two or more sequences, such that at each pass through the loop, one element from each sequence is taken to compose the target. This behavior can already be accomplished in Python through the use of the map() built-in function:
    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> for i in map(None, a, b): print i
    ...
    (1, 4)
    (2, 5)
    (3, 6)
    >>> map(None, a, b)
    [(1, 4), (2, 5), (3, 6)]
The for-loop simply iterates over this list as normal.
While the map() idiom is a common one in Python, it has several disadvantages:
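- The use of the None first argument is non-obvious.
- When the sequences are not of the same length, the shorter sequences are padded with None: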
    >>> c = (4, 5, 6, 7)
    >>> map(None, a, c)
    [(1, 4), (2, 5), (3, 6), (None, 7)]
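The padding behaviour above can be reproduced on current interpreters with a short sketch. It assumes Python 3, where map() no longer treats None as an identity function; itertools.zip_longest from the standard library is used here as a stand-in for map(None, ...), and the variable name `padded` is illustrative only.

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# Stand-in for the Python 2 idiom map(None, a, c): the shorter
# sequence is padded with None so that no element of c is dropped.
padded = list(zip_longest(a, c))
print(padded)
```

The silent None in the last tuple is the behaviour the PEP describes as a disadvantage of the map() idiom.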
For these reasons, several proposals were floated in the Python 2.0 beta time frame for syntactic support of lockstep for-loops. Here are two suggestions:
    for x in seq1, y in seq2:
        # stuff

    for x, y in seq1, seq2:
        # stuff
Neither of these forms would work, since they both already mean something in Python and changing the meanings would break existing code. All other suggestions for new syntax suffered the same problem, or were in conflict with another proposed feature called ‘list comprehensions’ (see PEP 202).
The Proposed Solution
The proposed solution is to introduce a new built-in sequence generator function, available in the __builtin__ module. This function is to be called zip and has the following signature:
    zip(seqa, [seqb, [...]])
zip() takes one or more sequences and weaves their elements together, just as map(None, ...) does with sequences of equal length. The weaving stops when the shortest sequence is exhausted.
zip() returns a real Python list, the same way map() does.
Here are some examples, based on the reference implementation below:
    >>> a = (1, 2, 3, 4)
    >>> b = (5, 6, 7, 8)
    >>> c = (9, 10, 11)
    >>> d = (12, 13)
    >>> zip(a, b)
    [(1, 5), (2, 6), (3, 7), (4, 8)]
    >>> zip(a, d)
    [(1, 12), (2, 13)]
    >>> zip(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]
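As a rough illustration of the lockstep for-loop that motivates the proposal, the sketch below iterates over two sequences at once. It is written for Python 3, where zip() returns an iterator rather than a list, so list() is called to display the truncated result; the names `names`, `scores`, and `report` are hypothetical.

```python
names = ("ann", "bob", "cy")
scores = (90, 85)  # deliberately one element shorter

report = []
# Each pass binds the i-th element of every sequence; iteration
# stops as soon as the shortest sequence is exhausted.
for name, score in zip(names, scores):
    report.append("%s=%d" % (name, score))

print(report)
print(list(zip(names, scores)))
```

The third name never appears in the loop because the weaving stops with the shorter sequence.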
Note that when the sequences are of the same length, zip() is reversible:
    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> x = zip(a, b)
    >>> y = zip(*x) # alternatively, apply(zip, x)
    >>> z = zip(*y) # alternatively, apply(zip, y)
    >>> x
    [(1, 4), (2, 5), (3, 6)]
    >>> y
    [(1, 2, 3), (4, 5, 6)]
    >>> z
    [(1, 4), (2, 5), (3, 6)]
    >>> x == z
    1
It is not possible to reverse zip this way when the sequences are not all the same length.
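To see why the round trip fails for unequal lengths, consider this minimal sketch. Python 3 syntax is assumed, with list() added because zip() is lazy there; the sample tuples are borrowed from the earlier examples.

```python
a = (1, 2, 3, 4)
d = (12, 13)

x = list(zip(a, d))  # truncated to the shorter sequence
y = list(zip(*x))    # attempt to reverse the operation

print(x)
print(y)
# The last two elements of a were discarded by the first zip(),
# so the first recovered tuple is shorter than the original.
print(y[0] == a)
```

The shorter input is recovered intact, but the elements of the longer input that were dropped by truncation cannot be restored.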
Reference Implementation
Here is a reference implementation, in Python, of the zip() built-in function. This will be replaced with a C implementation after final approval:
    def zip(*args):
        if not args:
            raise TypeError('zip() expects one or more sequence arguments')
        ret = []
        i = 0
        try:
            while 1:
                item = []
                for s in args:
                    item.append(s[i])
                ret.append(tuple(item))
                i = i + 1
        except IndexError:
            return ret
BDFL Pronouncements
Note: the BDFL refers to Guido van Rossum, Python’s Benevolent Dictator For Life.
- The function’s name. Alternative names to zip() were proposed. In the face of no overwhelmingly better choice, the BDFL strongly prefers zip() due to its Haskell [2] heritage. See version 1.7 of this PEP for the list of alternatives.
- zip() shall be a built-in function.
- Optional padding. An earlier version of this PEP proposed an optional pad keyword argument, which would be used when the argument sequences were not the same length. This is similar behavior to the map(None, ...) semantics except that the user would be able to specify the pad object. This has been rejected by the BDFL in favor of always truncating to the shortest sequence, because of the KISS principle. If there’s a true need, it is easier to add later. If it is not needed, it would still be impossible to delete it in the future.
- Lazy evaluation. An earlier version of this PEP proposed that zip() return a built-in object that performed lazy evaluation using the __getitem__() protocol. This has been strongly rejected by the BDFL in favor of returning a real Python list. If lazy evaluation is desired in the future, the BDFL suggests an xzip() function be added.
- zip() with no arguments. The BDFL strongly prefers that this raise a TypeError exception.
- zip() with one argument. The BDFL strongly prefers that this return a list of 1-tuples.
Subsequent Change to zip()
In Python 2.4, zip() with no arguments was modified to return an empty list rather than raising a TypeError exception. The rationale for the original behavior was that the absence of arguments was thought to indicate a programming error. However, that thinking did not anticipate the use of zip() with the * operator for unpacking variable length argument lists. For example, the inverse of zip could be defined as: unzip = lambda s: zip(*s). That transformation also defines a matrix transpose or an equivalent row/column swap for tables defined as lists of tuples. The latter transformation is commonly used when reading data files with records as rows and fields as columns. For example, the code:
    date, rain, high, low = zip(*csv.reader(file("weather.csv")))
rearranges columnar data so that each field is collected into individual tuples for straightforward looping and summarization:
    print "Total rainfall", sum(map(float, rain))
Using zip(*args) is more easily coded if zip(*[]) is handled as an allowable case rather than an exception. This is especially helpful when data is either built up from or recursed down to a null case with no records.
Seeing this possibility, the BDFL agreed (with some misgivings) to have the behavior changed for Py2.4.
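A minimal sketch of the transpose idiom and of the empty case follows. It assumes Python 3 (an in-memory buffer in place of file(), and list() around the lazy zip()); the inline CSV text and the helper name `transpose` are made up for illustration and do not come from the PEP.

```python
import csv
import io

def transpose(rows):
    # zip(*rows) swaps rows and columns; with no rows at all,
    # zip() is called with no arguments and yields nothing.
    return list(zip(*rows))

# Hypothetical weather records: date, rain, high, low.
data = io.StringIO("2004-01-01,0.5,41,30\n2004-01-02,1.25,39,28\n")
date, rain, high, low = transpose(csv.reader(data))

# csv.reader yields strings, so convert before summing.
total_rain = sum(float(r) for r in rain)
print("Total rainfall", total_rain)

# The null case: no records at all is handled without an exception.
print(transpose([]))
```

Because the empty input simply produces an empty list, code that builds up or recurses down to a table with no records needs no special case.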
Other Changes
- The xzip() function discussed above was implemented in Py2.3 in the itertools module as itertools.izip(). This function provides lazy behavior, consuming single elements and producing a single tuple on each pass. The “just-in-time” style saves memory and runs faster than its list-based counterpart, zip().
- The itertools module also added itertools.repeat() and itertools.chain(). These tools can be used together to pad sequences with None (to match the behavior of map(None, seqn)):
    zip(firstseq, chain(secondseq, repeat(None)))
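The padding recipe can be tried out with the sketch below. It assumes Python 3, where the built-in zip() is already lazy in the manner of itertools.izip(), so list() is used to materialize the result; the sample sequences are hypothetical. Note that the recipe pads only the second sequence, so it matches map(None, ...) when the first sequence is the longer one.

```python
from itertools import chain, repeat

firstseq = (1, 2, 3, 4)
secondseq = ("a", "b")

# chain() appends an endless supply of None to secondseq; zip()
# still stops when firstseq, now the shortest input, runs out.
padded = list(zip(firstseq, chain(secondseq, repeat(None))))
print(padded)
```

The endless repeat(None) is safe here only because zip() truncates to the finite first sequence.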
Greg Wilson’s questionnaire on proposed syntax to some CS grad students http://www.python.org/pipermail/python-dev/2000-July/013139.html
Copyright
This document has been placed in the public domain.