A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://python.github.io/peps/pep-0472/ below:

PEP 472 – Support for indexing with keyword arguments

PEP 472 – Support for indexing with keyword arguments
Author:
Stefano Borini, Joseph Martinot-Lagarde
Discussions-To:
Python-Ideas list
Status:
Rejected
Type:
Standards Track
Created:
24-Jun-2014
Python-Version:
3.6
Post-History:
02-Jul-2014
Resolution:
Python-Dev message
Table of Contents Abstract

This PEP proposes an extension of the indexing operation to support keyword arguments. Notations in the form a[K=3,R=2] would become legal syntax. For future-proofing considerations, a[1:2, K=3, R=4] are considered and may be allowed as well, depending on the choice for implementation. In addition to a change in the parser, the index protocol (__getitem__, __setitem__ and __delitem__) will also potentially require adaptation.

Motivation

The indexing syntax carries a strong semantic content, differentiating it from a method call: it implies referring to a subset of data. We believe this semantic association to be important, and wish to expand the strategies allowed to refer to this data.

As a general observation, the number of indices needed by an indexing operation depends on the dimensionality of the data: one-dimensional data (e.g. a list) requires one index (e.g. a[3]), two-dimensional data (e.g. a matrix) requires two indices (e.g. a[2,3]) and so on. Each index is a selector along one of the axes of the dimensionality, and the position in the index tuple is the metainformation needed to associate each index to the corresponding axis.

The current python syntax focuses exclusively on position to express the association to the axes, and also contains syntactic sugar to refer to non-punctiform selection (slices)

>>> a[3]       # returns the fourth element of a
>>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
>>> a[3,2]     # multiple indexes (for multidimensional arrays)

The additional notation proposed in this PEP would allow notations involving keyword arguments in the indexing operation, e.g.

which would allow to refer to axes by conventional names.

One must additionally consider the extended form that allows both positional and keyword specification

This PEP will explore different strategies to enable the use of these notations.

Use cases

The following practical use cases present two broad categories of usage of a keyworded specification: Indexing and contextual option. For indexing:

  1. To provide a more communicative meaning to the index, preventing e.g. accidental inversion of indexes
    >>> gridValues[x=3, y=5, z=8]
    >>> rain[time=0:12, location=location]
    
  2. In some domain, such as computational physics and chemistry, the use of a notation such as Basis[Z=5] is a Domain Specific Language notation to represent a level of accuracy
    >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
    

    In this case, the index operation would return a basis set at the chosen level of accuracy (represented by the parameter Z). The reason behind an indexing is that the BasisSet object could be internally represented as a numeric table, where rows (the “coefficient” axis, hidden to the user in this example) are associated to individual elements (e.g. row 0:5 contains coefficients for element 1, row 5:8 coefficients for element 2) and each column is associated to a given degree of accuracy (“accuracy” or “Z” axis) so that first column is low accuracy, second column is medium accuracy and so on. With that indexing, the user would obtain another object representing the contents of the column of the internal table for accuracy level 3.

Additionally, the keyword specification can be used as an option contextual to the indexing. Specifically:

  1. A “default” option allows to specify a default return value when the index is not present
    >>> lst = [1, 2, 3]
    >>> value = lst[5, default=0]  # value is 0
    
  2. For a sparse dataset, to specify an interpolation strategy to infer a missing point from e.g. its surrounding data.
    >>> value = array[1, 3, interpolate=spline_interpolator]
    
  3. A unit could be specified with the same mechanism
    >>> value = array[1, 3, unit="degrees"]
    

How the notation is interpreted is up to the implementing class.

Current implementation

Currently, the indexing operation is handled by methods __getitem__, __setitem__ and __delitem__. These methods’ signature accept one argument for the index (with __setitem__ accepting an additional argument for the set value). In the following, we will analyze __getitem__(self, idx) exclusively, with the same considerations implied for the remaining two methods.

When an indexing operation is performed, __getitem__(self, idx) is called. Traditionally, the full content between square brackets is turned into a single object passed to argument idx:

Except for its unique ability to handle slice notation, the indexing operation has similarities to a plain method call: it acts like one when invoked with only one element; If the number of elements is greater than one, the idx argument behaves like a *args. However, as stated in the Motivation section, an indexing operation has the strong semantic implication of extraction of a subset out of a larger set, which is not automatically associated to a regular method call unless appropriate naming is chosen. Moreover, its different visual style is important for readability.

Specifications

The implementation should try to preserve the current signature for __getitem__, or modify it in a backward-compatible way. We will present different alternatives, taking into account the possible cases that need to be addressed

C0. a[1]; a[1,2]         # Traditional indexing
C1. a[Z=3]
C2. a[Z=3, R=4]
C3. a[1, Z=3]
C4. a[1, Z=3, R=4]
C5. a[1, 2, Z=3]
C6. a[1, 2, Z=3, R=4]
C7. a[1, Z=3, 2, R=4]    # Interposed ordering
Strategy “Strict dictionary”

This strategy acknowledges that __getitem__ is special in accepting only one object, and the nature of that object must be non-ambiguous in its specification of the axes: it can be either by order, or by name. As a result of this assumption, in presence of keyword arguments, the passed entity is a dictionary and all labels must be specified.

C0. a[1]; a[1,2]      -> idx = 1; idx = (1, 2)
C1. a[Z=3]            -> idx = {"Z": 3}
C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3]         -> raise SyntaxError
C4. a[1, Z=3, R=4]    -> raise SyntaxError
C5. a[1, 2, Z=3]      -> raise SyntaxError
C6. a[1, 2, Z=3, R=4] -> raise SyntaxError
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError
Pros Neutral Cons Strategy “mixed dictionary”

This strategy relaxes the above constraint to return a dictionary containing both numbers and strings as keys.

C0. a[1]; a[1,2]      -> idx = 1; idx = (1, 2)
C1. a[Z=3]            -> idx = {"Z": 3}
C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3]         -> idx = { 0: 1, "Z": 3}
C4. a[1, Z=3, R=4]    -> idx = { 0: 1, "Z": 3, "R": 4}
C5. a[1, 2, Z=3]      -> idx = { 0: 1, 1: 2, "Z": 3}
C6. a[1, 2, Z=3, R=4] -> idx = { 0: 1, 1: 2, "Z": 3, "R": 4}
C7. a[1, Z=3, 2, R=4] -> idx = { 0: 1, "Z": 3, 2: 2, "R": 4}
Pros Cons Strategy “named tuple”

Return a named tuple for idx instead of a tuple. Keyword arguments would obviously have their stated name as key, and positional argument would have an underscore followed by their order:

C0. a[1]; a[1,2]      -> idx = 1; idx = (_0=1, _1=2)
C1. a[Z=3]            -> idx = (Z=3)
C2. a[Z=3, R=2]       -> idx = (Z=3, R=2)
C3. a[1, Z=3]         -> idx = (_0=1, Z=3)
C4. a[1, Z=3, R=2]    -> idx = (_0=1, Z=3, R=2)
C5. a[1, 2, Z=3]      -> idx = (_0=1, _2=2, Z=3)
C6. a[1, 2, Z=3, R=4] -> (_0=1, _1=2, Z=3, R=4)
C7. a[1, Z=3, 2, R=4] -> (_0=1, Z=3, _1=2, R=4)
                      or (_0=1, Z=3, _2=2, R=4)
                      or raise SyntaxError

The required typename of the namedtuple could be Index or the name of the argument in the function definition, it keeps the ordering and is easy to analyse by using the _fields attribute. It is backward compatible, provided that C0 with more than one entry now passes a namedtuple instead of a plain tuple.

Pros Cons Strategy “New argument contents”

In the current implementation, when many arguments are passed to __getitem__, they are grouped in a tuple and this tuple is passed to __getitem__ as the single argument idx. This strategy keeps the current signature, but expands the range of variability in type and contents of idx to more complex representations.

We identify four possible ways to implement this strategy:

Some of these possibilities lead to degenerate notations, i.e. indistinguishable from an already possible representation. Once again, the proposed notation becomes syntactic sugar for these representations.

Under this strategy, the old behavior for C0 is unchanged.

C0: a[1]        -> idx = 1                    # integer
    a[1,2]      -> idx = (1,2)                # tuple

In C1, we can use either a dictionary or a tuple to represent key and value pair for the specific indexing entry. We need to have a tuple with a tuple in C1 because otherwise we cannot differentiate a["Z", 3] from a[Z=3].

C1: a[Z=3]      -> idx = {"Z": 3}             # P1/P2 dictionary with single key
                or idx = (("Z", 3),)          # P3 tuple of tuples
                or idx = keyword("Z", 3)      # P4 keyword object

As you can see, notation P1/P2 implies that a[Z=3] and a[{"Z": 3}] will call __getitem__ passing the exact same value, and is therefore syntactic sugar for the latter. Same situation occurs, although with different index, for P3. Using a keyword object as in P4 would remove this degeneracy.

For the C2 case:

C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}     # P1 dictionary/ordereddict
                or idx = ({"Z": 3}, {"R": 4}) # P2 tuple of two single-key dict
                or idx = (("Z", 3), ("R", 4)) # P3 tuple of tuples
                or idx = (keyword("Z", 3),
                          keyword("R", 4) )   # P4 keyword objects

P1 naturally maps to the traditional **kwargs behavior, however it breaks the convention that two or more entries for the index produce a tuple. P2 preserves this behavior, and additionally preserves the order. Preserving the order would also be possible with an OrderedDict as drafted by PEP 468.

The remaining cases are here shown:

C3. a[1, Z=3]   -> idx = (1, {"Z": 3})                     # P1/P2
                or idx = (1, ("Z", 3))                     # P3
                or idx = (1, keyword("Z", 3))              # P4

C4. a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4})          # P1
                   or idx = (1, {"Z": 3}, {"R": 4})        # P2
                   or idx = (1, ("Z", 3), ("R", 4))        # P3
                   or idx = (1, keyword("Z", 3),
                                keyword("R", 4))           # P4

C5. a[1, 2, Z=3]   -> idx = (1, 2, {"Z": 3})               # P1/P2
                   or idx = (1, 2, ("Z", 3))               # P3
                   or idx = (1, 2, keyword("Z", 3))        # P4

C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z":3, "R": 4})     # P1
                      or idx = (1, 2, {"Z": 3}, {"R": 4})  # P2
                      or idx = (1, 2, ("Z", 3), ("R", 4))  # P3
                      or idx = (1, 2, keyword("Z", 3),
                                      keyword("R", 4))     # P4

C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4})    # P1. Pack the keyword arguments. Ugly.
                      or raise SyntaxError                 # P1. Same behavior as in function calls.
                      or idx = (1, {"Z": 3}, 2, {"R": 4})  # P2
                      or idx =  (1, ("Z", 3), 2, ("R", 4)) # P3
                      or idx =  (1, keyword("Z", 3),
                                 2, keyword("R", 4))       # P4
Pros Cons Strategy “kwargs argument”

__getitem__ accepts an optional **kwargs argument which should be keyword only. idx also becomes optional to support a case where no non-keyword arguments are allowed. The signature would then be either

__getitem__(self, idx)
__getitem__(self, idx, **kwargs)
__getitem__(self, **kwargs)

Applied to our cases would produce:

C0. a[1,2]            -> idx=(1,2);  kwargs={}
C1. a[Z=3]            -> idx=None ;  kwargs={"Z":3}
C2. a[Z=3, R=4]       -> idx=None ;  kwargs={"Z":3, "R":4}
C3. a[1, Z=3]         -> idx=1    ;  kwargs={"Z":3}
C4. a[1, Z=3, R=4]    -> idx=1    ;  kwargs={"Z":3, "R":4}
C5. a[1, 2, Z=3]      -> idx=(1,2);  kwargs={"Z":3}
C6. a[1, 2, Z=3, R=4] -> idx=(1,2);  kwargs={"Z":3, "R":4}
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError # in agreement to function behavior

Empty indexing a[] of course remains invalid syntax.

Pros Cons C interface

As briefly introduced in the previous analysis, the C interface would potentially have to change to allow the new feature. Specifically, PyObject_GetItem and related routines would have to accept an additional PyObject *kw argument for Strategy “kwargs argument”. The remaining strategies would not require a change in the C function signatures, but the different nature of the passed object would potentially require adaptation.

Strategy “named tuple” would behave correctly without any change: the class returned by the factory method in collections returns a subclass of tuple, meaning that PyTuple_* functions can handle the resulting object.

Alternative Solutions

In this section, we present alternative solutions that would workaround the missing feature and make the proposed enhancement not worth of implementation.

Use a method

One could keep the indexing as is, and use a traditional get() method for those cases where basic indexing is not enough. This is a good point, but as already reported in the introduction, methods have a different semantic weight from indexing, and you can’t use slices directly in methods. Compare e.g. a[1:3, Z=2] with a.get(slice(1,3), Z=2).

The authors however recognize this argument as compelling, and the advantage in semantic expressivity of a keyword-based indexing may be offset by a rarely used feature that does not bring enough benefit and may have limited adoption.

Emulate requested behavior by abusing the slice object

This extremely creative method exploits the slice objects’ behavior, provided that one accepts to use strings (or instantiate properly named placeholder objects for the keys), and accept to use “:” instead of “=”.

>>> a["K":3]
slice('K', 3, None)
>>> a["K":3, "R":4]
(slice('K', 3, None), slice('R', 4, None))
>>>

While clearly smart, this approach does not allow easy inquire of the key/value pair, it’s too clever and esotheric, and does not allow to pass a slice as in a[K=1:10:2].

However, Tim Delaney comments

“I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c, 'd':e]. It’s simple to explain, and gives the greatest backwards compatibility. In particular, libraries that already abused slices in this way will just continue to work with the new syntax.”

We think this behavior would produce inconvenient results. The library Pandas uses strings as labels, allowing notation such as

to extract data from column “A” to column “F”. Under the above comment, this notation would be equally obtained with

which is weird and collides with the intended meaning of keyword in indexing, that is, specifying the axis through conventional names rather than positioning.

Pass a dictionary as an additional index

this notation, although less elegant, can already be used and achieves similar results. It’s evident that the proposed Strategy “New argument contents” can be interpreted as syntactic sugar for this notation.

References Copyright

This document has been placed in the public domain.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4