This lists various issues with built-in functions and methods, together with a plan for a solution and (if applicable) pointers to standards track PEPs discussing the details.
1. Naming
The word “built-in” is overused in Python. From a quick skim of the Python documentation, it mostly refers to things from the builtins module. In other words: things which are available in the global namespace without a need for importing them. This conflicts with the use of the word “built-in” to mean “implemented in C”.
Solution: since the C structure for built-in functions and methods is already called PyCFunctionObject, let’s use the names “cfunction” and “cmethod” instead of “built-in function” and “built-in method”.
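The two meanings of “built-in” can be seen side by side in a short session. This is a minimal sketch that assumes CPython; the choice of math.sqrt as the comparison is ours and is only meant as an illustration.

```python
import builtins
import math

# Meaning 1: available without an import, because it lives in the builtins module.
assert builtins.len is len

# Meaning 2: implemented in C. On CPython, math.sqrt has the same class as len,
# although it is not part of the builtins module.
len_class = type(len).__name__
sqrt_class = type(math.sqrt).__name__
sqrt_in_builtins = hasattr(builtins, "sqrt")

print(len_class, sqrt_class, sqrt_in_builtins)
```

Both objects report the class builtin_function_or_method, which is why a name such as “cfunction” describes them more precisely.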
2. Not extendable
The various classes involved (such as builtin_function_or_method) cannot be subclassed:
    >>> from types import BuiltinFunctionType
    >>> class X(BuiltinFunctionType):
    ...     pass
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: type 'builtin_function_or_method' is not an acceptable base type
This is a problem because it makes it impossible to add features such as introspection support to these classes.
If one wants to implement a function in C with additional functionality, an entirely new class must be implemented from scratch. The problem with this is that the existing classes like builtin_function_or_method are special-cased in the Python interpreter to allow faster calling (for example, by using METH_FASTCALL). It is currently impossible to have a custom class with the same optimizations.
Solution: make the existing optimizations available to arbitrary classes. This is done by adding a new PyTypeObject field tp_ccalloffset (or can we re-use tp_print for that?) specifying the offset of a PyCCallDef pointer. This is a new structure holding all information needed to call a cfunction and it would be used instead of PyMethodDef. This implements the new “C call” protocol.
For constructing cfunctions and cmethods, PyMethodDef arrays will still be used (for example, in tp_methods) but that will be the only remaining purpose of the PyMethodDef structure.
Additionally, we can also make some function classes subclassable. However, this seems less important once we have tp_ccalloffset.
Reference: PEP 580
3. cfunctions do not become methods
A cfunction like repr does not implement __get__ to bind as a method:
    >>> class X:
    ...     meth = repr
    >>> x = X()
    >>> x.meth()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: repr() takes exactly one argument (0 given)
In this example, one would have expected that x.meth() returns repr(x) by applying the normal rules of methods.
This is surprising and a needless difference between cfunctions and Python functions. For the standard built-in functions, this is not really a problem since those are not meant to be used as methods. But it does become a problem when one wants to implement a new cfunction with the goal of being usable as a method.
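The difference can be shown directly by placing a Python function and a cfunction next to each other in the same class. This is a small sketch assuming CPython; py_repr is a hypothetical helper introduced only for the comparison.

```python
def py_repr(obj):
    """Pure-Python wrapper around repr, used only for comparison."""
    return repr(obj)


class X:
    meth_py = py_repr  # Python function: implements __get__, so it binds
    meth_c = repr      # cfunction: no __get__, so it is returned unchanged


x = X()
py_result = x.meth_py()  # equivalent to py_repr(x)

try:
    x.meth_c()           # equivalent to repr() with no arguments
except TypeError as exc:
    c_error = exc
else:
    c_error = None

print(hasattr(py_repr, "__get__"), hasattr(repr, "__get__"))
```

The attribute lookup x.meth_c returns the very same object that is stored in the class __dict__, which is what “does not bind” means here.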
Again, a solution could be to create a new class behaving just like cfunctions but which bind as methods. However, that would lose some existing optimizations for methods, such as the LOAD_METHOD/CALL_METHOD opcodes.
Solution: the same as the previous issue. It just shows that handling self and __get__ should be part of the new C call protocol.
For backwards compatibility, we would keep the existing non-binding behavior of cfunctions. We would just allow it in custom classes.
Reference: PEP 580
4. Semantics of inspect.isfunction
Currently, inspect.isfunction returns True only for instances of types.FunctionType, that is, true Python functions.
A common use case for inspect.isfunction is checking for introspection: it guarantees for example that inspect.getfile() will work. Ideally, it should be possible for other classes to be treated as functions too.
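The current behaviour can be checked with a few callables. This is a sketch assuming CPython; python_function, CallableObject and getfile_works are hypothetical names used only for this illustration.

```python
import functools
import inspect


def python_function():
    pass


class CallableObject:
    def __call__(self):
        pass


candidates = {
    "python_function": python_function,
    "len": len,
    "partial": functools.partial(python_function),
    "callable_object": CallableObject(),
}

# Only the true Python function passes the check.
results = {name: inspect.isfunction(obj) for name, obj in candidates.items()}


def getfile_works(obj):
    """Report whether inspect.getfile accepts obj without raising TypeError."""
    try:
        inspect.getfile(obj)
    except TypeError:
        return False
    return True


print(results)
print(getfile_works(python_function), getfile_works(len))
```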
Solution: introduce a new InspectFunction abstract base class and use that to implement inspect.isfunction. Alternatively, use duck typing for inspect.isfunction (as proposed in [2]):
    def isfunction(obj):
        return hasattr(type(obj), "__code__")
5. C functions should have access to the function object
The underlying C function of a cfunction currently takes a self argument (for bound methods) and then possibly a number of arguments. There is no way for the C function to actually access the Python cfunction object (the self in __call__ or tp_call). Such access would for example allow implementing the C call protocol for Python functions (types.FunctionType): the C function which implements calling Python functions needs access to the __code__ attribute of the function.
This is also needed for PEP 573 where all cfunctions require access to their “parent” (the module for functions of a module or the defining class for methods).
Solution: add a new PyMethodDef flag to specify that the C function takes an additional argument (as first argument), namely the function object.
6. METH_FASTCALL is public but undocumented
The METH_FASTCALL mechanism allows calling cfunctions and cmethods using a C array of Python objects instead of a tuple. This was introduced in Python 3.6 for positional arguments only and extended in Python 3.7 with support for keyword arguments.
However, given that it is undocumented, it is presumably only supposed to be used by CPython itself.
Solution: since this is an important optimization, everybody should be encouraged to use it. Now that the implementation of METH_FASTCALL is stable, document it!
As part of the C call protocol, we should also add a C API function:
    PyObject *PyCCall_FastCall(PyObject *func, PyObject *const *args, Py_ssize_t nargs, PyObject *keywords)
Reference: PEP 580
7. Allowing native C arguments
A cfunction always takes its arguments as Python objects (say, an array of PyObject pointers). In cases where the cfunction is really wrapping a native C function (for example, coming from ctypes or some compiler like Cython), this is inefficient: calls from C code to C code are forced to use Python objects to pass arguments.
Analogous to the buffer protocol which allows access to C data, we should also allow access to the underlying C callable.
Solution: when wrapping a C function with native arguments (for example, a C long) inside a cfunction, we should also store a function pointer to the underlying C function, together with its C signature.
Argument Clinic could automatically do this by storing a pointer to the “impl” function.
8. Complexity
There are a huge number of classes involved to implement all variations of methods. This is not a problem by itself, but a compounding issue.
For ordinary Python classes, the table below gives the classes for various kinds of methods. The columns refer to the class in the class __dict__, the class for unbound methods (bound to the class) and the class for bound methods (bound to the instance):
    kind           __dict__      unbound   bound
    Normal method  function      function  method
    Static method  staticmethod  function  function
    Class method   classmethod   method    method
    Slot method    function      function  method
This is the analogous table for extension types (C classes):
    kind           __dict__                unbound                     bound
    Normal method  method_descriptor       method_descriptor           builtin_function_or_method
    Static method  staticmethod            builtin_function_or_method  builtin_function_or_method
    Class method   classmethod_descriptor  builtin_function_or_method  builtin_function_or_method
    Slot method    wrapper_descriptor      wrapper_descriptor          method-wrapper
There are a lot of classes involved and these two tables look very different. There is no good reason why Python methods should be treated fundamentally differently from C methods. Also the features are slightly different: for example, method supports __func__ but builtin_function_or_method does not.
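Both tables can be reproduced from Python itself. This is a sketch assuming CPython; PyClass and kinds are hypothetical names, and list.append, str.maketrans, float.fromhex and list.__len__ are our choice of representative extension-type methods.

```python
class PyClass:
    def normal(self):
        pass

    @staticmethod
    def static():
        pass

    @classmethod
    def klass(cls):
        pass

    def __len__(self):  # a slot method implemented in Python
        return 0


def kinds(cls, instance, name):
    """Class names of the __dict__ entry, the unbound lookup and the bound lookup."""
    return (
        type(cls.__dict__[name]).__name__,
        type(getattr(cls, name)).__name__,
        type(getattr(instance, name)).__name__,
    )


python_rows = {
    name: kinds(PyClass, PyClass(), name)
    for name in ("normal", "static", "klass", "__len__")
}
c_rows = {
    "normal": kinds(list, [], "append"),
    "static": kinds(str, "", "maketrans"),
    "class": kinds(float, 0.0, "fromhex"),
    "slot": kinds(list, [], "__len__"),
}

# The feature difference mentioned above: only method exposes __func__.
python_has_func = hasattr(PyClass().normal, "__func__")
c_has_func = hasattr([].append, "__func__")

for label, row in {**python_rows, **{"C " + k: v for k, v in c_rows.items()}}.items():
    print(label, row)
```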
Since CPython has optimizations for calls to most of these objects, the code for dealing with them can also become complex. A good example of this is the call_function function in Python/ceval.c.
Solution: all these classes should implement the C call protocol. Then the complexity in the code can mostly be fixed by checking for the C call protocol (tp_ccalloffset != 0) instead of doing type checks.
Furthermore, it should be investigated whether some of these classes can be merged and whether method can be re-used also for bound methods of extension types (see PEP 576 for the latter, keeping in mind that this may have some minor backwards compatibility issues). This is not a goal by itself but just something to keep in mind when working on these classes.
9. PyMethodDef is too limited
The typical way to create a cfunction or cmethod in an extension module is by using a PyMethodDef to define it. These are then stored in an array PyModuleDef.m_methods (for cfunctions) or PyTypeObject.tp_methods (for cmethods). However, because of the stable ABI (PEP 384), we cannot change the PyMethodDef structure.
So, this means that we cannot add new fields for creating cfunctions/cmethods this way. This is probably the reason for the hack that __doc__ and __text_signature__ are stored in the same C string (with the __doc__ and __text_signature__ descriptors extracting the relevant part).
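To make the hack concrete, here is a rough Python sketch of how such a combined string can be split. It assumes the layout we understand CPython to use, namely the function name, a parenthesized signature, the marker ")\n--\n\n" and then the docstring proper. The function split_internal_doc and the sample string are hypothetical and only approximate what the C descriptors do.

```python
SIGNATURE_END_MARKER = ")\n--\n\n"  # assumed separator between signature and docstring


def split_internal_doc(name, internal_doc):
    """Split a combined C docstring into (text_signature, doc).

    Returns (None, internal_doc) when no embedded signature is found.
    """
    end = internal_doc.find(SIGNATURE_END_MARKER)
    if not internal_doc.startswith(name + "(") or end == -1:
        return None, internal_doc
    signature = internal_doc[len(name):end + 1]         # "(" ... ")"
    doc = internal_doc[end + len(SIGNATURE_END_MARKER):]
    return signature, doc


# Hypothetical combined string, shaped like the ones found in extension modules.
raw = "example($module, obj, /)\n--\n\nReturn something useful."
signature, doc = split_internal_doc("example", raw)
plain_signature, plain_doc = split_internal_doc("example", "No signature here.")

print(signature)
print(doc)
```

Packing two attributes into one string works, but it shows how little room a single PyMethodDef entry leaves for additional metadata.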
Solution: stop assuming that a single PyMethodDef entry is sufficient to describe a cfunction/cmethod. Instead, we could add some flag which means that one of the PyMethodDef fields is instead a pointer to an additional structure. Or, we could add a flag to use two or more consecutive PyMethodDef entries in the array to store more data. Then the PyMethodDef array would be used only to construct cfunctions/cmethods but it would no longer be used after that.
10. Slot wrappers have no custom documentation
Right now, slot wrappers like __init__ or __lt__ only have very generic documentation, not at all specific to the class:
    >>> list.__init__.__doc__
    'Initialize self. See help(type(self)) for accurate signature.'
    >>> list.__lt__.__doc__
    'Return self<value.'
The same happens for the signature:
    >>> list.__init__.__text_signature__
    '($self, /, *args, **kwargs)'
As you can see, slot wrappers do support __doc__ and __text_signature__. The problem is that these are stored in struct wrapperbase, which is common for all wrappers of a specific slot (for example, the same wrapperbase is used for str.__eq__ and int.__eq__).
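The sharing is visible from Python: wrappers of the same slot on unrelated classes report identical documentation. This is a short check assuming CPython; the pairs of classes are our own choice.

```python
# Two unrelated classes, same slot: the wrappers carry the same generic text.
eq_docs_match = str.__eq__.__doc__ == int.__eq__.__doc__
init_docs_match = list.__init__.__doc__ == dict.__init__.__doc__
init_signatures_match = (
    list.__init__.__text_signature__ == dict.__init__.__text_signature__
)
wrapper_class = type(str.__eq__).__name__

print(wrapper_class, eq_docs_match, init_docs_match, init_signatures_match)
```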
Solution: rethink the slot wrapper class to allow docstrings (and text signatures) for each instance separately.
This still leaves the question of how extension modules should specify the documentation. The PyTypeObject entries like tp_init are just function pointers, we cannot do anything with those. One solution would be to add entries to the tp_methods array just for adding docstrings. Such an entry could look like
    {"__init__", NULL, METH_SLOTDOC, "pointer to __init__ doc goes here"}
11. Static methods and class methods should be callable
Instances of staticmethod and classmethod should be callable. Admittedly, there is no strong use case for this, but it has occasionally been requested (see for example [1]).
Making static/class methods callable would increase consistency. First of all, function decorators typically add functionality or modify a function, but the result remains callable. This is not true for @staticmethod and @classmethod.
Second, class methods of extension types are already callable:
    >>> fromhex = float.__dict__["fromhex"]
    >>> type(fromhex)
    <class 'classmethod_descriptor'>
    >>> fromhex(float, "0xff")
    255.0
Third, one can see function, staticmethod and classmethod as different kinds of unbound methods: they all become method when bound, but the implementation of __get__ is slightly different. From this point of view, it looks strange that function is callable but the others are not.
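The inconsistency can be checked with callable(). This is a sketch assuming CPython; plain is a hypothetical helper. The result for staticmethod depends on the Python version (to our knowledge staticmethod objects became callable in Python 3.10), so the sketch reports it without relying on a particular value.

```python
def plain(*args):
    return args


c_class_method = float.__dict__["fromhex"]  # classmethod_descriptor from an extension type

callability = {
    "function": callable(plain),
    "staticmethod": callable(staticmethod(plain)),  # version-dependent result
    "classmethod": callable(classmethod(plain)),
    "classmethod_descriptor": callable(c_class_method),
}

# The extension-type class method can be called directly with an explicit class.
value = c_class_method(float, "0xff")

print(callability, value)
```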
Solution: when changing the implementation of staticmethod and classmethod, we should consider making instances callable. Even if this is not a goal by itself, it may happen naturally because of the implementation.