This PEP proposes a Python-level API for the buffer protocol, which is currently accessible only to C code. This allows type checkers to evaluate whether objects implement the protocol.
Motivation
The CPython C API provides a versatile mechanism for accessing the underlying memory of an object: the buffer protocol introduced in PEP 3118. Functions that accept binary data are usually written to handle any object implementing the buffer protocol. For example, at the time of writing, there are around 130 functions in CPython using the Argument Clinic Py_buffer type, which accepts the buffer protocol.
Currently, there is no way for Python code to inspect whether an object supports the buffer protocol. Moreover, the static type system does not provide a type annotation to represent the protocol. This is a common problem when writing type annotations for code that accepts generic buffers.
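As a rough illustration of the gap described above, the closest Python code can get today is to try building a memoryview and catch the failure. The helper name supports_buffer is hypothetical and not part of any library; this is a sketch of the workaround, not a recommended API.

```python
def supports_buffer(obj: object) -> bool:
    """Best-effort runtime probe: try to export a buffer, then release it at once."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        # memoryview() raises TypeError for objects without buffer support.
        return False


print(supports_buffer(b"xy"))             # bytes exports a buffer
print(supports_buffer(bytearray(b"xy")))  # so does bytearray
print(supports_buffer("xy"))              # str does not
```

Such a probe has to actually acquire the buffer, and it is invisible to static type checkers, which is why it does not address the annotation problem.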
Similarly, it is impossible for a class written in Python to support the buffer protocol. A buffer class in Python would give users the ability to easily wrap a C buffer object, or to test the behavior of an API that consumes the buffer protocol. Granted, this is not a particularly common need. However, there has been a CPython feature request for supporting buffer classes written in Python that has been open since 2012.
Rationale
Current options
There are two known workarounds for annotating buffer types in the type system, but neither is adequate.
First, the current workaround for buffer types in typeshed is a type alias that lists well-known buffer types in the standard library, such as bytes, bytearray, memoryview, and array.array. This approach works for the standard library, but it does not extend to third-party buffer types.
Second, the documentation for typing.ByteString currently states:
    This type represents the types bytes, bytearray, and memoryview of byte sequences.
    As a shorthand for this type, bytes can be used to annotate arguments of any of the types mentioned above.
Although this sentence has been in the documentation since 2015, the use of bytes to include these other types is not specified in any of the typing PEPs. Furthermore, this mechanism has a number of problems. It does not include all possible buffer types, and it makes the bytes type ambiguous in type annotations. After all, there are many operations that are valid on bytes objects, but not on memoryview objects, and it is perfectly possible for a function to accept bytes but not memoryview objects. A mypy user reports that this shortcut has caused significant problems for the psycopg project.
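To make the ambiguity concrete, here is a small sketch of a function that is correct for bytes but fails for memoryview. The function name shout is hypothetical; the point is only that a checker applying the bytes shorthand would accept both calls.

```python
def shout(data: bytes) -> bytes:
    # Relies on a bytes method; memoryview has no upper().
    return data.upper()


print(shout(b"abc"))  # fine for a real bytes object

try:
    shout(memoryview(b"abc"))  # passes a checker that applies the shorthand
except AttributeError as exc:
    print("memoryview rejected at runtime:", exc)
```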
The C buffer protocol supports many options, affecting strides, contiguity, and support for writing to the buffer. Some of these options would be useful in the type system. For example, typeshed currently provides separate type aliases for writable and read-only buffers.
However, in the C buffer protocol, most of these options cannot be queried directly on the type object. The only way to figure out whether an object supports a particular flag is to actually ask for the buffer. For some types, such as memoryview, the supported flags depend on the instance. As a result, it would be difficult to represent support for these flags in the type system.
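The instance dependence mentioned above can be seen with memoryview itself. The following sketch uses only built-in types; the variable names are illustrative.

```python
ro_view = memoryview(b"abc")             # backed by immutable bytes
rw_view = memoryview(bytearray(b"abc"))  # backed by mutable bytearray

# Same type, different capabilities.
print(type(ro_view) is type(rw_view))
print(ro_view.readonly, rw_view.readonly)

rw_view[0] = ord("A")      # allowed: the underlying buffer is writable
try:
    ro_view[0] = ord("A")  # fails, but only at runtime
except TypeError as exc:
    print("read-only view:", exc)
```

Because both objects have the type memoryview, an annotation on the type alone cannot tell a checker which of the two assignments is valid.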
We propose to add two Python-level special methods, __buffer__ and __release_buffer__. Python classes that implement these methods are usable as buffers from C code. Conversely, classes implemented in C that support the buffer protocol acquire synthesized methods accessible from Python code.
The __buffer__ method is called to create a buffer from a Python object, for example by the memoryview() constructor. It corresponds to the bf_getbuffer C slot. The Python signature for this method is:
    def __buffer__(self, flags: int, /) -> memoryview: ...
The method must return a memoryview object. If the bf_getbuffer slot is invoked on a Python class with a __buffer__ method, the interpreter extracts the underlying Py_buffer from the memoryview returned by the method and returns it to the C caller. Similarly, if Python code calls the __buffer__ method on an instance of a C class that implements bf_getbuffer, the returned buffer is wrapped in a memoryview for consumption by Python code.
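A minimal sketch of both directions follows. It assumes a Python version that implements this PEP (3.12 or later) and is guarded accordingly; the class name Wrapper is hypothetical.

```python
import sys


class Wrapper:
    """Hypothetical Python class exposing an internal bytearray as a buffer."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)

    def __buffer__(self, flags: int, /) -> memoryview:
        # Must return a memoryview; the flags are ignored in this sketch.
        return memoryview(self._data)


if sys.version_info >= (3, 12):
    # Python calling into C: bytes implements bf_getbuffer, so it has a
    # synthesized __buffer__ method that returns a memoryview.
    view = b"abc".__buffer__(0)  # 0 is PyBUF_SIMPLE
    print(type(view).__name__, view.tobytes())

    # C calling into Python: memoryview() invokes bf_getbuffer, which in
    # turn calls Wrapper.__buffer__.
    print(memoryview(Wrapper(b"xyz")).tobytes())
else:
    print("__buffer__ is not available on this Python version")
```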
The __release_buffer__ method should be called when a caller no longer needs the buffer returned by __buffer__. It corresponds to the bf_releasebuffer C slot. This is an optional part of the buffer protocol. The Python signature for this method is:
    def __release_buffer__(self, buffer: memoryview, /) -> None: ...
The buffer to be released is wrapped in a memoryview. When this method is invoked through CPython’s buffer API (for example, through calling memoryview.release on a memoryview returned by __buffer__), the passed memoryview is the same object as was returned by __buffer__. It is also possible to call __release_buffer__ on a C class that implements bf_releasebuffer.
If __release_buffer__ exists on an object, Python code that calls __buffer__ directly on the object must call __release_buffer__ on the same object when it is done with the buffer. Otherwise, resources used by the object may not be reclaimed. Similarly, it is a programming error to call __release_buffer__ without a previous call to __buffer__, or to call it multiple times for a single call to __buffer__. For objects that implement the C buffer protocol, calls to __release_buffer__ where the argument is not a memoryview wrapping the same object will raise an exception. After a valid call to __release_buffer__, the memoryview is invalidated (as if its release() method had been called), and any subsequent calls to __release_buffer__ with the same memoryview will raise an exception. The interpreter will ensure that misuse of the Python API will not break invariants at the C level; for example, it will not cause memory safety violations.
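The pairing rule can be sketched as follows. The first half uses memoryview only and runs on any Python 3; the second half calls the new methods directly and is guarded for Python 3.12 or later. The helper name read_all is hypothetical.

```python
import sys

data = bytearray(b"abc")

# While a buffer is exported, bytearray refuses to resize.
view = memoryview(data)
try:
    data.extend(b"d")
except BufferError as exc:
    print("still exported:", exc)
view.release()     # ends the export (bytearray's bf_releasebuffer runs)
data.extend(b"d")  # fine once the export has been released
print(data)


def read_all(obj) -> bytes:
    """Hypothetical helper pairing a direct __buffer__ call with its release."""
    view = obj.__buffer__(0)  # 0 is PyBUF_SIMPLE
    try:
        return view.tobytes()
    finally:
        # __release_buffer__ is optional, so call it only when present.
        release = getattr(obj, "__release_buffer__", None)
        if release is not None:
            release(view)


if sys.version_info >= (3, 12):
    print(read_all(data))
    data.extend(b"e")  # succeeds because read_all released its buffer
```

Using try/finally keeps the release paired with the acquisition even if reading the buffer raises.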
inspect.BufferFlags
To help implementations of __buffer__, we add inspect.BufferFlags, a subclass of enum.IntFlag. This enum contains all flags defined in the C buffer protocol. For example, inspect.BufferFlags.SIMPLE has the same value as the PyBUF_SIMPLE constant.
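A brief sketch of how a __buffer__ implementation might inspect the requested flags. It assumes a Python version that provides inspect.BufferFlags and degrades gracefully otherwise; the helper name wants_writable is hypothetical.

```python
import inspect

# inspect.BufferFlags exists only on Python versions that implement this PEP.
BufferFlags = getattr(inspect, "BufferFlags", None)

if BufferFlags is None:
    print("inspect.BufferFlags is not available on this Python version")
else:
    print(int(BufferFlags.SIMPLE))  # same value as PyBUF_SIMPLE

    def wants_writable(flags: int) -> bool:
        # IntFlag members are tested with ordinary bitwise operators.
        return bool(flags & BufferFlags.WRITABLE)

    print(wants_writable(BufferFlags.FULL))     # FULL includes WRITABLE
    print(wants_writable(BufferFlags.FULL_RO))  # FULL_RO does not
```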
collections.abc.Buffer
We add a new abstract base class, collections.abc.Buffer, which requires the __buffer__ method. This class is intended primarily for use in type annotations:
    from collections.abc import Buffer

    def need_buffer(b: Buffer) -> memoryview:
        return memoryview(b)

    need_buffer(b"xy")  # ok
    need_buffer("xy")  # rejected by static type checkers
It can also be used in isinstance and issubclass checks:
    >>> from collections.abc import Buffer
    >>> isinstance(b"xy", Buffer)
    True
    >>> issubclass(bytes, Buffer)
    True
    >>> issubclass(memoryview, Buffer)
    True
    >>> isinstance("xy", Buffer)
    False
    >>> issubclass(str, Buffer)
    False
In the typeshed stub files, the class should be defined as a Protocol, following the precedent of other simple ABCs in collections.abc such as collections.abc.Iterable or collections.abc.Sized.
The following is an example of a Python class that implements the buffer protocol:
    import contextlib
    import inspect

    class MyBuffer:
        def __init__(self, data: bytes):
            self.data = bytearray(data)
            self.view = None

        def __buffer__(self, flags: int) -> memoryview:
            if flags != inspect.BufferFlags.FULL_RO:
                raise TypeError("Only BufferFlags.FULL_RO supported")
            if self.view is not None:
                raise RuntimeError("Buffer already held")
            self.view = memoryview(self.data)
            return self.view

        def __release_buffer__(self, view: memoryview) -> None:
            assert self.view is view  # guaranteed to be true
            self.view.release()
            self.view = None

        def extend(self, b: bytes) -> None:
            if self.view is not None:
                raise RuntimeError("Cannot extend held buffer")
            self.data.extend(b)

    buffer = MyBuffer(b"capybara")
    with memoryview(buffer) as view:
        view[0] = ord("C")

        with contextlib.suppress(RuntimeError):
            buffer.extend(b"!")  # raises RuntimeError

    buffer.extend(b"!")  # ok, buffer is no longer held

    with memoryview(buffer) as view:
        assert view.tobytes() == b"Capybara!"
Equivalent for older Python versions
New typing features are usually backported to older Python versions in the typing_extensions package. Because the buffer protocol is currently accessible only in C, this PEP cannot be fully implemented in a pure-Python package like typing_extensions. As a temporary workaround, an abstract base class typing_extensions.Buffer will be provided for Python versions that do not have collections.abc.Buffer available.
After this PEP is implemented, inheriting from collections.abc.Buffer will not be necessary to indicate that an object supports the buffer protocol. However, in older Python versions, it will be necessary to explicitly inherit from typing_extensions.Buffer to indicate to type checkers that a class supports the buffer protocol, since objects supporting the buffer protocol will not have a __buffer__ method. It is expected that this will happen primarily in stub files, because buffer classes are necessarily implemented in C code, which cannot have types defined inline. For runtime uses, the ABC.register API can be used to register buffer classes with typing_extensions.Buffer.
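The runtime registration route could look roughly like this. NativeArray is a hypothetical stand-in for a class implemented in a C extension, and the local fallback Buffer class exists only so the sketch runs when neither collections.abc.Buffer nor typing_extensions is available.

```python
import abc

try:
    from collections.abc import Buffer        # Python versions implementing this PEP
except ImportError:
    try:
        from typing_extensions import Buffer  # backport, if installed
    except ImportError:
        class Buffer(abc.ABC):                # stand-in so the sketch still runs
            """Placeholder for typing_extensions.Buffer."""


class NativeArray:
    """Hypothetical stand-in for a buffer class implemented in a C extension."""


# Runtime registration; stub files would inherit from Buffer instead.
Buffer.register(NativeArray)

print(isinstance(NativeArray(), Buffer))
print(issubclass(NativeArray, Buffer))
```

Registration affects only isinstance and issubclass checks at runtime; static type checkers still need the explicit inheritance in stubs.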
No special meaning for bytes
The special case stating that bytes may be used as a shorthand for other ByteString types will be removed from the typing documentation. With collections.abc.Buffer available as an alternative, there will be no good reason to allow bytes as a shorthand. Type checkers currently implementing this behavior should deprecate and eventually remove it.
__buffer__ and __release_buffer__ attributes
As the runtime changes in this PEP only add new functionality, there are few backwards compatibility concerns.
However, code that uses a __buffer__ or __release_buffer__ attribute for other purposes may be affected. While all dunders are technically reserved for the language, it is still good practice to ensure that a new dunder does not interfere with too much existing code, especially widely used packages. A survey of publicly accessible code found:
- PyPy supports a __buffer__ method with compatible semantics to those proposed in this PEP. A PyPy core developer expressed his support for this PEP.
- [...] __buffer__ method.
- [...] SupportsBuffer protocol that would be equivalent to this PEP’s collections.abc.Buffer.
- NumPy previously read a __buffer__ attribute (not method) to get an object’s buffer. This was removed in 2019 for NumPy 1.17. The behavior would have last worked in NumPy 1.16, which only supported Python 3.7 and older. Python 3.7 will have reached its end of life by the time this PEP is expected to be implemented.
Thus, this PEP’s use of the __buffer__ method will improve interoperability with PyPy and not interfere with the current versions of any major Python packages.
No publicly accessible code uses the name __release_buffer__.
bytes special case
Separately, the recommendation to remove the special behavior for bytes in type checkers does have a backwards compatibility impact on their users. An experiment with mypy shows that several major open source projects that use it for type checking will see new errors if the bytes promotion is removed. Many of these errors can be fixed by improving the stubs in typeshed, as has already been done for the builtins, binascii, pickle, and re modules. A review of all usage of bytes types in typeshed is in progress. Overall, the change improves type safety and makes the type system more consistent, so we believe the migration cost is worth it.
We will add notes pointing to collections.abc.Buffer in appropriate places in the documentation, such as typing.python.org and the mypy cheat sheet. Type checkers may provide additional pointers in their error messages. For example, when they encounter a buffer object being passed to a function that is annotated to only accept bytes, the error message could include a note suggesting the use of collections.abc.Buffer instead.
An implementation of this PEP is available in the author’s fork.
Rejected Ideas
types.Buffer
An earlier version of this PEP proposed adding a new types.Buffer type with an __instancecheck__ implemented in C so that isinstance() checks can be used to check whether a type implements the buffer protocol. This avoids the complexity of exposing the full buffer protocol to Python code, while still allowing the type system to check for the buffer protocol.
However, that approach does not compose well with the rest of the type system, because types.Buffer would be a nominal type, not a structural one. For example, there would be no way to represent “an object that supports both the buffer protocol and __len__”. With the current proposal, __buffer__ is like any other special method, so a Protocol can be defined combining it with another method.
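A combined protocol of that kind might be written as below. The names SizedBuffer, Chunk, and describe are hypothetical, and runtime_checkable is used only so the sketch can print something; static checking does not need it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedBuffer(Protocol):
    """Hypothetical protocol: supports the buffer protocol and len()."""

    def __buffer__(self, flags: int, /) -> memoryview: ...
    def __len__(self) -> int: ...


class Chunk:
    """Small Python class that satisfies SizedBuffer structurally."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def __buffer__(self, flags: int, /) -> memoryview:
        return memoryview(self._data)

    def __len__(self) -> int:
        return len(self._data)


def describe(obj: SizedBuffer) -> str:
    return f"{len(obj)} bytes"


print(describe(Chunk(b"abc")))
print(isinstance(Chunk(b"abc"), SizedBuffer))  # has both methods
print(isinstance("abc", SizedBuffer))          # str has __len__ but no __buffer__
```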
More generally, no other part of Python works like the proposed types.Buffer. The current proposal is more consistent with the rest of the language, where C-level slots usually have corresponding Python-level special methods.
Keep bytearray compatible with bytes
It has been suggested to remove the special case where memoryview is always compatible with bytes, but keep it for bytearray, because the two types have very similar interfaces. However, several standard library functions (e.g., re.compile, socket.getaddrinfo, and most functions accepting path-like arguments) accept bytes but not bytearray. In most codebases, bytearray is also not a very common type. We prefer to have users spell out accepted types explicitly (or use Protocol from PEP 544 if only a specific set of methods is required). This aspect of the proposal was specifically discussed on the typing-sig mailing list, without any strong disagreement from the typing community.
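Two of the cases named above can be checked directly. This sketch uses os.fspath as a representative of functions accepting path-like arguments; it only demonstrates the runtime behavior and makes no claim about how type checkers report it.

```python
import os
import re

print(os.fspath(b"data.bin"))     # bytes is accepted as a path
print(re.compile(b"a+").pattern)  # and as a pattern

for call, arg in ((os.fspath, bytearray(b"data.bin")),
                  (re.compile, bytearray(b"a+"))):
    try:
        call(arg)
    except TypeError as exc:
        print(f"{call.__name__} rejected bytearray: {exc}")
```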
The most frequently used distinction within buffer types is whether or not the buffer is mutable. Some functions accept only mutable buffers (e.g., bytearray, some memoryview objects), others accept all buffers.
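As an example of a function that accepts only mutable buffers, io.BytesIO.readinto fills its argument in place. The sketch below uses only the standard library.

```python
import io

stream = io.BytesIO(b"abc")

target = bytearray(3)
print(stream.readinto(target), target)  # fills the writable buffer in place

stream.seek(0)
try:
    stream.readinto(bytes(3))            # bytes is a buffer, but read-only
except TypeError as exc:
    print("read-only buffer rejected:", exc)
```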
An earlier version of this PEP proposed using the presence of the bf_releasebuffer slot to determine whether a buffer type is mutable. This rule holds for most standard library buffer types, but the relationship between mutability and the presence of this slot is not absolute. For example, numpy arrays are mutable but do not have this slot.
The current buffer protocol does not provide any way to reliably determine whether a buffer type represents a mutable or immutable buffer. Therefore, this PEP does not add type system support for this distinction. The question can be revisited in the future if the buffer protocol is enhanced to provide static introspection support. A sketch for such a mechanism exists.
Acknowledgments
Many people have provided useful feedback on drafts of this PEP. Petr Viktorin has been particularly helpful in improving my understanding of the subtleties of the buffer protocol.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.