
Numba CUDA Array Interface (Version 3)

The CUDA Array Interface (or CAI) was created for interoperability between different implementations of CUDA array-like objects in various projects. The idea is borrowed from the NumPy array interface.

Note

Currently, we only define the Python-side interface. In the future, we may add a C-side interface for efficient exchange of the information in compiled code.

Python Interface Specification

Note

Experimental feature. Specification may change.

The __cuda_array_interface__ attribute returns a dictionary (dict) that must contain the following entries:

  * shape: (integer, ...). A tuple of integers giving the size of each dimension.

  * typestr: str. The type string, with the same definition as typestr in the NumPy array interface.

  * data: (integer, boolean). A 2-tuple whose first element is the device-accessible data pointer as a Python int (0 for zero-size arrays) and whose second element is the read-only flag.

  * version: integer. The version of the interface being exported; the current version is 3.

The following are optional entries:

  * strides: None or (integer, ...). If absent or None, the array is in C-contiguous layout; otherwise, a tuple of integers giving the number of bytes to skip to reach the next element in each dimension.

  * descr. For describing more complicated types, following the same specification as in the NumPy array interface.

  * mask: None or an object exposing __cuda_array_interface__. If None, all values in data are valid. Note that Numba does not currently support masked CUDA arrays and raises NotImplementedError if one is passed to a GPU function.

  * stream: None or integer. A stream upon which synchronization must take place at the point of consumption. The value 1 denotes the legacy default stream, 2 the per-thread default stream, and any other integer is a CUDA stream handle; 0 is disallowed because it is ambiguous. If None or absent, no synchronization is required.
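For illustration, here is a minimal sketch of a Producer-side object exporting these entries. The DeviceVector class is hypothetical, and it borrows Numba's device_array purely to obtain a device allocation; a real Producer would typically manage its own memory:

import numpy as np
from numba import cuda

class DeviceVector:
    # Hypothetical Producer-side wrapper exporting the CUDA Array Interface.

    def __init__(self, nelem, dtype=np.float32):
        self._dtype = np.dtype(dtype)
        self._nelem = nelem
        # Device allocation obtained via Numba, purely for illustration.
        self._buf = cuda.device_array(nelem, dtype=self._dtype)

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": (self._nelem,),
            "typestr": self._dtype.str,  # e.g. "<f4"
            "data": (self._buf.device_ctypes_pointer.value, False),
            "strides": None,             # C-contiguous layout
            "stream": None,              # no synchronization required
            "version": 3,
        }

An object like this can then be passed to any Consumer of the interface, for example cuda.as_cuda_array(DeviceVector(100)).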

Synchronization Definitions

When discussing synchronization, the following definitions are used:

  * Producer: the library or application that produces data and exports it through __cuda_array_interface__.

  * Consumer: the library or application that consumes data exported through the __cuda_array_interface__ of a Producer.

  * User Code: code that causes a Producer and a Consumer to exchange data through the CAI.

  * User: the person writing or maintaining the User Code. The User may write User Code with no knowledge of the CAI, since its use can be hidden from view.

In the following example:

import cupy
from numba import cuda

@cuda.jit
def add(x, y, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        out[i] = x[i] + y[i]

a = cupy.arange(10)
b = a * 2
out = cupy.zeros_like(a)

add[1, 32](a, b, out)

When the add kernel is launched:

  * a, b and out are produced by CuPy, which acts as the Producer: each is exported through its __cuda_array_interface__ property.

  * Numba acts as the Consumer: the kernel launch consumes the interfaces of a, b and out to build device array views for the kernel's arguments.

  * The example code is the User Code, and its author is the User.

Design Motivations

Elements of the CAI design related to synchronization seek to fulfill these requirements:

  1. Producers and Consumers that exchange data through the CAI must be able to do so without data races.

  2. Requirement 1 should be met without requiring the user to be aware of any particulars of the CAI - in other words, exchanging data between Producers and Consumers that operate on data asynchronously should be correct by default.

  3. Where the User is aware of the particulars of the CAI and implementation details of the Producer and Consumer, they should be able to, at their discretion, override some of the synchronization semantics of the interface to reduce the synchronization overhead. Overriding synchronization semantics implies that:

     * the Producer and Consumer no longer guarantee that asynchronous operations on the exchanged data are free of data races, and

     * responsibility for ensuring correctness passes to the User who chose to override the defaults.

Interface Requirements

The stream entry enables Producers and Consumers to avoid hazards when exchanging data. Expected behaviour of the Consumer is as follows (a sketch of a conforming Consumer appears at the end of this section):

  * When a stream is provided: before any access to the data, the Consumer must either synchronize on the provided stream, or enqueue the work that accesses the data on the provided stream.

  * When the stream entry is absent or None: the Consumer may access the data without any synchronization.

When exporting an array through the CAI, Producers must ensure that:

  * if a stream is exported, synchronizing on that stream is sufficient to guarantee that all pending work on the exported data has completed;

  * if there is no pending work on the data, the stream entry may be set to None or omitted.

Optionally, to facilitate the User relaxing conformance to synchronization semantics:

  * Producers may provide an option to not export a stream on the interface;

  * Consumers may provide an option to elide synchronizing on the exported stream.

These options should not be set by default in either a Producer or a Consumer. The CAI specification does not prescribe the exact mechanism by which these options are set, or related options that Producers or Consumers might provide to allow the user further control over synchronization behavior.
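As an illustration of the Consumer-side rules above, the following sketch (not part of the specification; the consume function is hypothetical) synchronizes on an exported stream before touching the data:

from numba import cuda

def consume(obj):
    # Hypothetical Consumer honouring the CAI synchronization semantics.
    cai = obj.__cuda_array_interface__
    stream_handle = cai.get("stream")

    if stream_handle is not None:
        # Wrap the exported handle and synchronize on it before any access,
        # so that work enqueued by the Producer is guaranteed to be complete.
        # Handles 1 and 2 denote the legacy and per-thread default streams,
        # which the driver also accepts as stream handles.
        cuda.external_stream(stream_handle).synchronize()

    # The data can now be accessed safely on any stream, e.g. by viewing it
    # as a Numba device array (sync=False avoids a second synchronization).
    return cuda.as_cuda_array(obj, sync=False)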

Synchronization in Numba

Numba is neither strictly a Producer nor a Consumer - it may be used to implement either by a User. In order to facilitate the correct implementation of synchronization semantics, Numba exhibits the following behaviors related to synchronization of the interface:

  * When Numba acts as a Consumer (for example when an array-like object is passed to a kernel launch or to cuda.as_cuda_array()): if the interface exports a stream, Numba synchronizes on that stream before operating on the data, and the device array it creates uses the exported stream as its default stream.

  * When Numba acts as a Producer (when the __cuda_array_interface__ property of a Numba CUDA array is accessed): if the array has a default stream, it is exported as the stream entry; otherwise stream is set to None.

Note

In Numba’s terminology, an array’s default stream is a property specifying the stream that Numba will enqueue asynchronous transfers in if no other stream is provided as an argument to the function invoking the transfer. It is not the same as the Default Stream in normal CUDA terminology.
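For example, an array created or copied with an explicit stream argument has that stream as its Numba default stream. A short sketch:

import numpy as np
from numba import cuda

s = cuda.stream()

# Both arrays have s as their Numba default stream: asynchronous transfers
# involving them are enqueued on s unless another stream is passed explicitly.
d_x = cuda.device_array(100, stream=s)
d_y = cuda.to_device(np.arange(100), stream=s)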

Numba’s synchronization behavior results in the following intended consequences:

  * Exchanging data as either a Producer or a Consumer is correct by default, with no further action from the User, provided that the other party to the exchange also follows the CAI synchronization semantics.

  * Work enqueued on an imported array’s default stream needs no additional synchronization; the User is responsible for synchronizing appropriately when operating on the data in any other stream (as in the example further below).

The User may override Numba’s synchronization behavior by setting the environment variable NUMBA_CUDA_ARRAY_INTERFACE_SYNC or the config variable CUDA_ARRAY_INTERFACE_SYNC to 0 (see GPU Support Environment Variables). When set, Numba will not synchronize on the streams of imported arrays, and it is the responsibility of the user to ensure correctness with respect to stream synchronization. Synchronization when creating a Numba CUDA Array from an object exporting the CUDA Array Interface may also be elided by passing sync=False when creating the Numba CUDA Array with numba.cuda.as_cuda_array() or numba.cuda.from_cuda_array_interface().
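For instance, a User who already ensures correct ordering through other means might elide Numba's synchronization for a single import (a sketch; CuPy is assumed as the Producer here):

import cupy
from numba import cuda

a = cupy.arange(1024)

# Equivalent global switch: set NUMBA_CUDA_ARRAY_INTERFACE_SYNC=0 in the
# environment before running the program.

# Elide synchronization for this conversion only; the User is now responsible
# for ordering work on the imported array with respect to its exported stream.
d_a = cuda.as_cuda_array(a, sync=False)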

There is scope for Numba’s synchronization implementation to be optimized in the future, by eliding synchronizations when a kernel or driver API operation (e.g. a memcopy or memset) is launched on the same stream as an imported array.

An example launching on an array’s non-default stream

This example shows how to ensure that a Consumer can safely consume an array with a default stream when it is passed to a kernel launched in a different stream.

First we need to import NumPy (for specifying the array dtype), Numba, and a consumer library (a fictitious library named other_cai_library for this example):

import numpy as np

from numba import cuda, int32, void
import other_cai_library

Now we’ll define a kernel - this initializes the elements of the array, setting each entry to its index:

@cuda.jit(void(int32[::1]))
def initialize_array(x):
    i = cuda.grid(1)
    if i < len(x):
        x[i] = i

Next we will create two streams:

array_stream = cuda.stream()
kernel_stream = cuda.stream()

Then create an array with one of the streams as its default stream:

N = 16384
x = cuda.device_array(N, dtype=np.int32, stream=array_stream)

Now we launch the kernel in the other stream:

nthreads = 256
nblocks = N // nthreads

initialize_array[nblocks, nthreads, kernel_stream](x)

If we were to pass x to a Consumer now, there is a risk that the Consumer would operate on it in array_stream whilst the kernel is still running in kernel_stream. To prevent operations in array_stream from starting before the kernel has finished, we create an event and wait on it:

# Create event
evt = cuda.event()
# Record the event after the kernel launch in kernel_stream
evt.record(kernel_stream)
# Wait for the event in array_stream
evt.wait(array_stream)

It is now safe for other_cai_library to consume x:

other_cai_library.consume(x)
Lifetime management

Data

Obtaining the value of the __cuda_array_interface__ property of any object has no effect on the lifetime of the object from which it was created. In particular, note that the interface has no slot for the owner of the data.

The User code must preserve the lifetime of the object owning the data for as long as the Consumer might use it.
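A minimal sketch of this responsibility, assuming CuPy as the Producer and using the from_cuda_array_interface function described later in this document (which by default records no owner):

import cupy
from numba import cuda

owner = cupy.arange(1024)  # the object that owns the device memory

# Created from the interface description alone, with no owner recorded, so
# nothing here keeps the CuPy allocation alive automatically.
view = cuda.from_cuda_array_interface(owner.__cuda_array_interface__)

# The User must keep owner referenced for as long as view may be used; if
# owner were garbage collected, view could be left pointing at freed memory.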

Streams

Like data, CUDA streams also have a finite lifetime. It is therefore required that a Producer exporting data on the interface with an associated stream ensures that the exported stream’s lifetime is equal to or surpasses the lifetime of the object from which the interface was exported.
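A sketch of a Producer-side pattern that satisfies this requirement (the ProducedArray class is hypothetical):

from numba import cuda

class ProducedArray:
    # Hypothetical Producer whose exported stream outlives the exported data.

    def __init__(self, nelem):
        # Keeping the stream as an attribute ensures it lives at least as long
        # as this object, and therefore at least as long as any interface
        # obtained from it while the Data lifetime rule above is respected.
        self._stream = cuda.stream()
        self._buf = cuda.device_array(nelem, stream=self._stream)

    @property
    def __cuda_array_interface__(self):
        # Numba exports the buffer's default stream (self._stream) here, so
        # the stream referenced by the interface is kept alive by this object.
        return self._buf.__cuda_array_interface__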

Lifetime management in Numba

Producing Arrays

Numba takes no steps to maintain the lifetime of an object from which the interface is exported - it is the user’s responsibility to ensure that the underlying object is kept alive for the duration that the exported interface might be used.

The lifetime of any Numba-managed stream exported on the interface is guaranteed to equal or surpass the lifetime of the underlying object, because the underlying object holds a reference to the stream.

Note

Numba-managed streams are those created with cuda.default_stream(), cuda.legacy_default_stream(), or cuda.per_thread_default_stream(). Streams not managed by Numba are created from an external stream with cuda.external_stream().
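For reference, a short sketch creating each kind of stream mentioned above (CuPy is used only to supply an externally owned stream handle):

import cupy
from numba import cuda

# Numba-managed streams: the Python objects below keep the streams alive.
s_default = cuda.default_stream()
s_legacy = cuda.legacy_default_stream()
s_per_thread = cuda.per_thread_default_stream()

# Not managed by Numba: wraps a handle owned by another library (here CuPy),
# which remains responsible for keeping the underlying CUDA stream alive.
cupy_stream = cupy.cuda.Stream()
s_external = cuda.external_stream(cupy_stream.ptr)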

Consuming Arrays

Numba provides two mechanisms for creating device arrays from objects exporting the CUDA Array Interface. Which to use depends on whether the created device array should maintain the lifetime of the object from which it is created:

  * as_cuda_array: the created device array holds a reference to the exporting object, so the underlying data is kept alive for as long as the device array is alive.

  * from_cuda_array_interface: by default the created device array holds no reference to the exporting object; an owning object can be passed explicitly via the owner argument.

The interfaces of these functions are:

cuda.as_cuda_array(obj, sync=True)

Create a DeviceNDArray from any object that implements the CUDA Array Interface.

A view of the underlying GPU buffer is created. No copying of the data is done. The resulting DeviceNDArray will hold a reference to obj.

If sync is True, then the imported stream (if present) will be synchronized.

cuda.from_cuda_array_interface(desc, owner=None, sync=True)

Create a DeviceNDArray from the CUDA Array Interface description desc. owner is the owner of the underlying memory; if given, the resulting DeviceNDArray will hold a reference to it, otherwise no reference to the owning object is kept.

If sync is True, then the imported stream (if present) will be synchronized.
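A usage sketch contrasting the two mechanisms, assuming CuPy as the Producer:

import cupy
from numba import cuda

a = cupy.arange(1024)

# Keeps a alive: the device array holds a reference to the CuPy array.
d_view = cuda.as_cuda_array(a)

# Holds no reference to a unless an owner is passed explicitly.
d_noref = cuda.from_cuda_array_interface(a.__cuda_array_interface__)
d_owned = cuda.from_cuda_array_interface(a.__cuda_array_interface__, owner=a)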

Pointer Attributes

Additional information about the data pointer can be retrieved using cuPointerGetAttribute or cudaPointerGetAttributes. Such information includes:

  * the CUDA context that owns the pointer;

  * whether the pointer points to device memory or to managed (unified) memory;

  * whether the memory is accessible from the host.

Differences with CUDA Array Interface (Version 0)

Version 0 of the CUDA Array Interface did not have the optional mask attribute to support masked arrays.

Differences with CUDA Array Interface (Version 1)

Versions 0 and 1 of the CUDA Array Interface neither clarified the strides attribute for C-contiguous arrays nor specified the treatment for zero-size arrays.

Differences with CUDA Array Interface (Version 2)

Prior versions of the CUDA Array Interface made no statement about synchronization.

