- Status: Accepted
- Implementation: Done by Dag Sverre Seljebotn during Google Summer of Code 2008
This feature allows convenient use of the buffer API found in Python 3. It makes it possible to easily write very fast code against popular Python libraries such as NumPy and PIL (the Python Imaging Library). For Python 2, the same end result is emulated through a Cython-specific mechanism.
See also the PEP-3118 (Python buffers).
The "prototype" for the syntax for declaring that a variable buf supports efficient buffer access is
    #!python
    cdef sometype[dtype, *, ndim, mode] buf
The * means that the arguments after it must be given as keyword arguments. (This is done so that the best choice for second positional argument can be picked later).
Examples:
    #!python
    # buf1 can hold any Python object with 1-dimensional int buffer access
    cdef object[int] buf1
    # buf2 can hold MyType instances with 2-dimensional float buffer access in strided mode
    cdef MyType[float, ndim=2, mode="strided"] buf2

* **dtype** -- The datatype that the indexing operator will work with. This is not passed to the buffer API, but the type information of the acquired buffer is checked to see whether it matches. At first only native datatypes and Python objects will be supported, although support for structs can be added at a later point (as the buffer API supports storing structs).
* **ndim** -- The number of dimensions. Default value is 1. If set to 1, Cython can acquire non-ND access if wanted (but currently ND access is always required).
* **mode** -- Can be set to one of the following values. "full" is the default.

| Mode | PEP flags | Description |
|---|---|---|
| full | PyBUF_INDIRECT | Can access any buffer. However, it is a rather inefficient access method if the data is stored in a more efficient manner. |
| strided | PyBUF_STRIDED | Can access any strided data. |
| c | PyBUF_C_CONTIGUOUS | The data must be C contiguous. The last index is not multiplied by a dynamic stride (but implicitly by the item size by the C compiler), which could lead to some speedup (particularly for low-dimensional access). Otherwise behaves like strided. |
| fortran | PyBUF_F_CONTIGUOUS | Like c, but Fortran-style array ordering is assumed and the first index is the one not multiplied by a dynamic stride. |
* **negative_indices** -- Boolean, defaults to `True`. If set to `False`, the buffer will not support the usual wrap-around for negative indices, i.e. all negative indices are out-of-bounds. This can make it more convenient to write code that is optimal when bounds checking is turned off (i.e. no casting to unsigned ints is necessary to get optimal access without any if-tests).
* **cast** -- Boolean, defaults to `False`. Skips the check of the format string and relies only on the `itemsize` exported by the buffer to determine whether the data access is valid. Note that this only casts individual buffer items, which must have the same size, so that looking up items still works reliably on a cast buffer. The main use case for this is a buffer that exports data in a non-native endian and alignment mode: this is not supported directly by Cython, but with `cast` set to `True` it is still possible to access such buffers, as long as one makes sure by other means that the data is interpreted correctly.
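The checks implied by these options can be imitated in plain Python with `memoryview`, which exposes the same PEP-3118 metadata. The following is only a sketch of the idea, not Cython's generated code: `check_buffer` is a hypothetical helper, and the "full" and "strided" modes are assumed to accept any layout.

```python
import array
import struct


def check_buffer(obj, fmt, ndim=1, mode="full", cast=False):
    """Hypothetical helper: return a memoryview of obj if it matches the options."""
    view = memoryview(obj)  # raises TypeError if obj exports no buffer
    if view.ndim != ndim:
        raise ValueError("expected %d dimension(s), got %d" % (ndim, view.ndim))
    if cast:
        # cast=True: only the item size has to agree with the declared dtype
        if view.itemsize != struct.calcsize(fmt):
            raise ValueError("item size mismatch")
    elif view.format != fmt:
        raise ValueError("dtype mismatch: expected %r, got %r" % (fmt, view.format))
    if mode == "c" and not view.c_contiguous:
        raise ValueError("buffer is not C contiguous")
    if mode == "fortran" and not view.f_contiguous:
        raise ValueError("buffer is not Fortran contiguous")
    return view  # "full" and "strided" accept any layout in this sketch


ints = array.array("i", range(6))
every_other = memoryview(ints)[::2]  # a strided, non-contiguous 1-D view

check_buffer(ints, "i", mode="c")                        # passes
check_buffer(every_other, "i", mode="strided")           # passes
check_buffer(array.array("I", [1, 2]), "i", cast=True)   # passes: same item size
```

Asking for mode "c" on `every_other`, or for dtype "i" on the unsigned array without `cast=True`, raises `ValueError` in this sketch.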
The writable flag is not set explicitly -- if the function that acquires the buffer only uses the read version of the indexing operator, read only access is all that is asked for, whereas if the buffer is ever written to in the function scope then PyBUF_WRITABLE is passed.
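Whether a writable buffer can be obtained depends on the exporter. As a plain-Python illustration (this shows the exporter side through `memoryview`, not the flags Cython passes), `bytes` only grants read-only access while `bytearray` grants write access:

```python
readonly_view = memoryview(b"abc")             # bytes exports a read-only buffer
writable_view = memoryview(bytearray(b"abc"))  # bytearray exports a writable one

print(readonly_view.readonly)  # True
print(writable_view.readonly)  # False

writable_view[0] = ord("x")    # fine: the exporter granted write access
try:
    readonly_view[0] = ord("x")
except TypeError as exc:
    print("write rejected:", exc)
```

A function that only reads from its buffer argument would therefore work with both objects, while one that writes would need the second kind.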
Once buffer access is set up on a variable, it is used for indexing operations that have exactly the right number of integer indices. If the number of integer indices differs, or if slices or non-integer objects are involved, indexing falls back to normal Python behaviour. Example:
    #!python
    def f(object[float, ndim=2] buf):
        print buf[0]     # Python indexing, no. of indices != 2
        a = 2
        print buf[a, a]  # Python indexing, object indices
        cdef int b = 3
        print buf[b, 2]  # Efficient buffer indexing: 2 int indices
        print buf[2, :]  # Python indexing: Object indices
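To make concrete what efficient indexing with two integer indices amounts to for strided data, here is a small pure-Python sketch of the address arithmetic. The helper `strided_item` is hypothetical, and the code Cython actually emits is C, not Python.

```python
import array
import struct

data = array.array("i", range(6))
# Reinterpret the six ints as a 2 x 3 C-contiguous buffer.
view = memoryview(data).cast("B").cast("i", shape=[2, 3])


def strided_item(i, j):
    """Hypothetical helper: fetch item (i, j) using explicit stride arithmetic."""
    offset = i * view.strides[0] + j * view.strides[1]  # byte offset into data
    return struct.unpack_from(view.format, data, offset)[0]


print(strided_item(1, 2) == view[1, 2])  # True
```

In "c" mode the last stride is known to equal the item size, so the second multiplication can be left to the C compiler's pointer arithmetic.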
When a variable is reassigned, the old buffer is released and a new one acquired.
The buffer is always acquired within local function scope (upon entering and whenever the buffer variable is reassigned). Currently, only variables within local function scope can be buffers.
`tests/run/bufaccess.pyx` contains many more examples.
Bounds checking happens by default and raises an IndexError as usual. This can be controlled using the `boundscheck` compiler directive.
Negative indices are supported through a check for whether the index is negative. This check has almost no extra cost if bounds checking is turned on; otherwise it carries a real performance penalty.
If an `unsigned` index is used, the check for negative indices is not performed. So for the fastest possible access, one must turn off bounds checking and use unsigned indices.
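The interplay of wrap-around and bounds checking can be written out as a short Python sketch. `resolve_index` is a hypothetical helper that models the logic per axis; it is not the code Cython generates, and the error message is made up.

```python
def resolve_index(i, n, negative_indices=True, boundscheck=True):
    """Hypothetical helper: map index i onto an axis of length n."""
    if negative_indices and i < 0:
        i += n  # wrap-around, as for Python sequences
    if boundscheck and not 0 <= i < n:
        raise IndexError("index out of range")
    return i  # without boundscheck the index is used unchecked


print(resolve_index(-1, 5))                     # 4
print(resolve_index(7, 5, boundscheck=False))   # 7 (would read out of bounds)
```

With `negative_indices=False`, or with an unsigned index, the first branch disappears entirely, which is why that combination is the cheapest once bounds checking is off.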
If one assigns "None" to a buffer variable, no buffer can be acquired. The buffer info is then set up so that 0 is always dereferenced on indexing, which should trigger a segfault.
Buffer acquisition and Python versions

Buffers can be used with all Python versions from 2.3 upwards; however, for some Python versions emulation is needed, which results in slightly different semantics.
Note that you can create your own buffer exporters in Cython by implementing `__getbuffer__` and `__releasebuffer__` (see tests/run/buffer.pyx).
Breakdown:
`__getbuffer__`. The implementation bodies must be present (an exception is made so that such method bodies can be provided in pxd files -- the exact syntax used for this is currently considered unstable and may change).

Gotcha: Currently, inheritance may fail to work correctly for this mechanism. Don't rely on having more than one `__getbuffer__` in your hierarchy unless you are using Python >= 2.6.
1. The `tp_flags` field of the object is checked for the presence of `Py_TPFLAGS_HAVE_NEWBUFFER`. If present, the method of Python 3 is used. (All Cython-generated extension classes have this flag set.)
1. Otherwise, the emulation used for earlier Python versions is used.
Note that if the object has the right flag set, the emulation is never used (in particular a `bf_getbuffer` of `NULL` will cause a run-time exception). In effect, emulation is only used for objects coming from other non-Cython extension libraries written for earlier Python versions.
PEP-3118 is used directly, i.e. the bf_getbuffer and bf_releasebuffer slots are called and that is the only mechanism used. Objects assigned to buffer variables must have bf_getbuffer filled in or otherwise a run-time exception is raised.
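From Python code, the same requirement can be observed through `memoryview`, which raises `TypeError` for objects that do not export a buffer. The helper `exports_buffer` below is hypothetical and only illustrates the Python 3 rule; it says nothing about the emulation used for older versions.

```python
def exports_buffer(obj):
    """Hypothetical helper: True if obj can be acquired through PEP-3118."""
    try:
        with memoryview(obj):  # acquires the buffer, releases it on exit
            return True
    except TypeError:
        return False


print(exports_buffer(bytearray(4)))  # True
print(exports_buffer([1, 2, 3]))     # False
```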
Example of difference in behaviour

The following pyx file works in Python >= 2.6 with any buffer object, while it will not work for any objects under Python 2.5:
    def f(object[unsigned char, ndim=2] a):
        print a[0,0]
By adding two cimports, under Python <= 2.5 the code will work with objects from PIL and NumPy, but no other objects. For Python >= 2.6 it of course still works with any buffer object.
    cimport numpy
    cimport Imaging

    def f(object[unsigned char, ndim=2] a):
        print a[0,0]
Here numpy.pxd and Imaging.pxd are assumed to have `__getbuffer__` defined with an implementation body in the pxd file.
A `cdef class` can, if wanted, provide a special `__cythonbufferdefaults__` field which supplies default buffer options for that specific class. Example:
    #!python
    cdef class MyArray:
        cdef __cythonbufferdefaults__ = {"ndim": 2, "mode": "strided"}

    def f(MyArray[int] buf):
        print buf[3, 4] # this is 2D strided buffer access

    def g(object[int] buf):
        print buf[3, 4] # this is inefficient Python access, ...
                        # also when g is passed an instance of MyArray
Unfortunately, the dtype parameter cannot be set by this mechanism presently; this can be considered a known bug.
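One way to picture how such class defaults combine with the global defaults listed earlier is a layered dictionary merge. This is a sketch under assumptions: `resolve_options` and `GLOBAL_DEFAULTS` are hypothetical names, and the rule that options written explicitly in the declaration take precedence over class defaults is assumed rather than taken from the text above.

```python
# Global defaults as listed for ndim, mode, negative_indices and cast.
GLOBAL_DEFAULTS = {"ndim": 1, "mode": "full", "negative_indices": True, "cast": False}


def resolve_options(class_defaults, **explicit):
    """Hypothetical helper: layer class defaults and explicit options."""
    options = dict(GLOBAL_DEFAULTS)
    options.update(class_defaults)  # e.g. MyArray.__cythonbufferdefaults__
    options.update(explicit)        # assumed to win over class defaults
    return options


my_array_defaults = {"ndim": 2, "mode": "strided"}
print(resolve_options(my_array_defaults))   # like MyArray[int]
print(resolve_options({}))                  # like object[int]
```

The first call yields 2D strided access as in `f` above, while the second keeps `ndim` at 1, which is why `buf[3, 4]` in `g` falls back to Python indexing.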