CUDA Runtime API :: CUDA Toolkit Documentation

__host__ ​cudaError_t cudaChooseDevice ( int* device, const cudaDeviceProp* prop )

Select compute-device which best matches criteria.

device
- Device with best match
prop
- Desired device properties
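
A minimal usage sketch (not from the original documentation; assumes at least one CUDA device is present). Only the fields the application cares about are set; the rest are zeroed:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop = {};   // zero all fields we do not constrain
    prop.major = 7;             // e.g. prefer compute capability 7.x or better
    int dev = 0;
    if (cudaChooseDevice(&dev, &prop) == cudaSuccess) {
        cudaSetDevice(dev);
        printf("Selected device %d\n", dev);
    }
    return 0;
}
```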
__host__ ​cudaError_t cudaDeviceFlushGPUDirectRDMAWrites ( cudaFlushGPUDirectRDMAWritesTarget target, cudaFlushGPUDirectRDMAWritesScope scope )

Blocks until remote writes are visible to the specified scope.

target
- The target of the operation, see cudaFlushGPUDirectRDMAWritesTarget
scope
- The scope of the operation, see cudaFlushGPUDirectRDMAWritesScope
__host__ ​ __device__ ​cudaError_t cudaDeviceGetAttribute ( int* value, cudaDeviceAttr attr, int  device )

Returns information about the device.

value
- Returned device attribute value
attr
- Device attribute to query
device
- Device number to query
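
A short sketch querying two common attributes of device 0 (illustrative only; assumes device 0 exists):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int smCount = 0, maxThreads = 0;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);
    cudaDeviceGetAttribute(&maxThreads, cudaDevAttrMaxThreadsPerBlock, 0);
    printf("SMs: %d, max threads per block: %d\n", smCount, maxThreads);
    return 0;
}
```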
__host__ ​cudaError_t cudaDeviceGetByPCIBusId ( int* device, const char* pciBusId )

Returns a handle to a compute device.

device
- Returned device ordinal
pciBusId
- String in one of the following forms, where domain, bus, device, and function are all hexadecimal values: [domain]:[bus]:[device].[function], [domain]:[bus]:[device], or [bus]:[device].[function]
__host__ ​ __device__ ​cudaError_t cudaDeviceGetCacheConfig ( cudaFuncCache ** pCacheConfig )

Returns the preferred cache configuration for the current device.

pCacheConfig
- Returned cache configuration
__host__ ​cudaError_t cudaDeviceGetDefaultMemPool ( cudaMemPool_t* memPool, int  device )

Returns the default mempool of a device.

__host__ ​cudaError_t cudaDeviceGetHostAtomicCapabilities ( unsigned int* capabilities, const cudaAtomicOperation ** operations, unsigned int  count, int  device )

Queries details about atomic operations supported between the device and host.

capabilities
- Returned capability details of each requested operation
operations
- Requested operations
count
- Count of requested operations and size of capabilities
device

Returns in *capabilities the details about the requested atomic *operations over the link between device and the host. The allocated size of *operations and *capabilities must be count.

For each cudaAtomicOperation in *operations, the corresponding result in *capabilities will be a bitmask indicating which of cudaAtomicOperationCapability the link supports natively.

Returns cudaErrorInvalidDevice if device is not valid.

Returns cudaErrorInvalidValue if *capabilities or *operations is NULL, if count is 0, or if any of *operations is not valid.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaDeviceGetAttribute, cudaDeviceGetP2PAtomicCapabilities, cuDeviceGetHostAtomicCapabilities

__host__ ​ __device__ ​cudaError_t cudaDeviceGetLimit ( size_t* pValue, cudaLimit limit )

Return resource limits.

pValue
- Returned size of the limit
limit
- Limit to query
__host__ ​cudaError_t cudaDeviceGetMemPool ( cudaMemPool_t* memPool, int  device )

Gets the current mempool for a device.

__host__ ​cudaError_t cudaDeviceGetNvSciSyncAttributes ( void* nvSciSyncAttrList, int  device, int  flags )

Return NvSciSync attributes that this device can support.

nvSciSyncAttrList
- Return NvSciSync attributes supported.
device
- Valid Cuda Device to get NvSciSync attributes for.
flags
- flags describing NvSciSync usage.

Returns in nvSciSyncAttrList the properties of NvSciSync that this CUDA device can support. The returned nvSciSyncAttrList can be used to create an NvSciSync that matches this device's capabilities.

If the NvSciSyncAttrKey_RequiredPerm field in nvSciSyncAttrList is already set, this API will return cudaErrorInvalidValue.

Applications should set nvSciSyncAttrList to a valid NvSciSyncAttrList; otherwise this API will return cudaErrorInvalidHandle.

The flags parameter controls how the application intends to use the NvSciSync created from the nvSciSyncAttrList. The valid flags are:

At least one of these flags must be set; otherwise the API returns cudaErrorInvalidValue. The two flags are orthogonal to one another: a developer may set both, which allows both wait-specific and signal-specific attributes to be set in the same nvSciSyncAttrList.

Note that this API updates the input nvSciSyncAttrList with values equivalent to the following public attribute key-values: NvSciSyncAttrKey_RequiredPerm is set to

cudaSuccess, cudaErrorDeviceUninitialized, cudaErrorInvalidValue, cudaErrorInvalidHandle, cudaErrorInvalidDevice, cudaErrorNotSupported, cudaErrorMemoryAllocation

See also:

cudaImportExternalSemaphore, cudaDestroyExternalSemaphore, cudaSignalExternalSemaphoresAsync, cudaWaitExternalSemaphoresAsync

__host__ ​cudaError_t cudaDeviceGetP2PAtomicCapabilities ( unsigned int* capabilities, const cudaAtomicOperation ** operations, unsigned int  count, int  srcDevice, int  dstDevice )

Queries details about atomic operations supported between two devices.

capabilities
- Returned capability details of each requested operation
operations
- Requested operations
count
- Count of requested operations and size of capabilities
srcDevice
- The source device of the target link
dstDevice
- The destination device of the target link
__host__ ​cudaError_t cudaDeviceGetP2PAttribute ( int* value, cudaDeviceP2PAttr attr, int  srcDevice, int  dstDevice )

Queries attributes of the link between two devices.

value
- Returned value of the requested attribute
attr
srcDevice
- The source device of the target link.
dstDevice
- The destination device of the target link.
__host__ ​cudaError_t cudaDeviceGetPCIBusId ( char* pciBusId, int  len, int  device )

Returns a PCI Bus Id string for the device.

pciBusId
- Returned identifier string for the device in the following format [domain]:[bus]:[device].[function] where domain, bus, device, and function are all hexadecimal values. pciBusId should be large enough to store 13 characters including the NULL-terminator.
len
- Maximum length of string to store in pciBusId
device
- Device to get identifier string for

Returns an ASCII string identifying the device dev in the NULL-terminated string pointed to by pciBusId. len specifies the maximum length of the string that may be returned.
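
A sketch that round-trips device 0 through its bus-id string (illustrative; assumes device 0 exists). Per the parameter description above, 13 characters plus the NUL terminator suffice:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    char busId[16] = {0};   // 13 chars + NUL fits comfortably
    if (cudaDeviceGetPCIBusId(busId, sizeof(busId), 0) == cudaSuccess) {
        printf("Device 0 PCI bus id: %s\n", busId);
        int dev = -1;       // round-trip through the string form
        cudaDeviceGetByPCIBusId(&dev, busId);
        printf("Looked-up ordinal: %d\n", dev);
    }
    return 0;
}
```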

See also:

cudaDeviceGetByPCIBusId, cuDeviceGetPCIBusId

__host__ ​cudaError_t cudaDeviceGetStreamPriorityRange ( int* leastPriority, int* greatestPriority )

Returns numerical values that correspond to the least and greatest stream priorities.

leastPriority
- Pointer to an int in which the numerical value for least stream priority is returned
greatestPriority
- Pointer to an int in which the numerical value for greatest stream priority is returned

Returns in *leastPriority and *greatestPriority the numerical values that correspond to the least and greatest stream priorities respectively. Stream priorities follow a convention where lower numbers imply greater priorities. The range of meaningful stream priorities is given by [*greatestPriority, *leastPriority]. If the user attempts to create a stream with a priority value outside the meaningful range as specified by this API, the priority is automatically clamped to *leastPriority or *greatestPriority as appropriate. See cudaStreamCreateWithPriority for details on creating a priority stream. A NULL may be passed in for *leastPriority or *greatestPriority if the value is not desired.

This function will return '0' in both *leastPriority and *greatestPriority if the current context's device does not support stream priorities (see cudaDeviceGetAttribute).
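
The priority convention above (lower number = higher priority) can be sketched as follows (illustrative, not from the original documentation):

```cpp
#include <cuda_runtime.h>

int main() {
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    // Lower numbers mean higher priority, so 'greatest' is the most urgent.
    cudaStream_t hi, lo;
    cudaStreamCreateWithPriority(&hi, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&lo, cudaStreamNonBlocking, least);
    // ... launch latency-sensitive work on 'hi', bulk work on 'lo' ...
    cudaStreamDestroy(hi);
    cudaStreamDestroy(lo);
    return 0;
}
```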

See also:

cudaStreamCreateWithPriority, cudaStreamGetPriority, cuCtxGetStreamPriorityRange

__host__ ​cudaError_t cudaDeviceGetTexture1DLinearMaxWidth ( size_t* maxWidthInElements, const cudaChannelFormatDesc* fmtDesc, int  device )

Returns the maximum number of elements allocatable in a 1D linear texture for a given element size.

maxWidthInElements
- Returns maximum number of texture elements allocatable for given fmtDesc.
fmtDesc
- Texture format description.
device

Returns in maxWidthInElements the maximum number of elements allocatable in a 1D linear texture for given format descriptor fmtDesc.

See also:

cuDeviceGetTexture1DLinearMaxWidth

__host__ ​cudaError_t cudaDeviceRegisterAsyncNotification ( int  device, cudaAsyncCallback callbackFunc, void* userData, cudaAsyncCallbackHandle_t* callback )

Registers a callback function to receive async notifications.

device
- The device on which to register the callback
callbackFunc
- The function to register as a callback
userData
- A generic pointer to user data. This is passed into the callback function.
callback
- A handle representing the registered callback instance

Registers callbackFunc to receive async notifications.

The userData parameter is passed to the callback function at async notification time. Likewise, callback is also passed to the callback function to distinguish between multiple registered callbacks.

The callback function being registered should be designed to return quickly (~10ms). Any long running tasks should be queued for execution on an application thread.

Callbacks may not call cudaDeviceRegisterAsyncNotification or cudaDeviceUnregisterAsyncNotification. Doing so will result in cudaErrorNotPermitted. Async notification callbacks execute in an undefined order and may be serialized.

Returns in *callback a handle representing the registered callback instance.
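
A register/unregister sketch (illustrative; the callback signature is hedged against recent toolkits and should be checked against the cudaAsyncCallback typedef). The callback only records the notification and returns quickly, as advised above:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

static void onNotify(cudaAsyncNotificationInfo_t* info, void* userData,
                     cudaAsyncCallbackHandle_t handle) {
    // Keep this fast (~10 ms); hand real work off to an application thread.
    printf("notification type %d for tag %s\n",
           (int)info->type, (const char*)userData);
}

int main() {
    static const char tag[] = "dev0";   // hypothetical user data
    cudaAsyncCallbackHandle_t handle;
    cudaDeviceRegisterAsyncNotification(0, onNotify, (void*)tag, &handle);
    // ... run workload ...
    cudaDeviceUnregisterAsyncNotification(0, handle);
    return 0;
}
```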

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaDeviceUnregisterAsyncNotification

__host__ ​cudaError_t cudaDeviceReset ( void )

Destroy all allocations and reset all state on the current device in the current process.

Explicitly destroys and cleans up all resources associated with the current device in the current process. It is the caller's responsibility to ensure that these resources are not accessed or passed in subsequent API calls; doing so will result in undefined behavior. These resources include CUDA types cudaStream_t, cudaEvent_t, cudaArray_t, cudaMipmappedArray_t, cudaPitchedPtr, cudaTextureObject_t, cudaSurfaceObject_t, textureReference, surfaceReference, cudaExternalMemory_t, cudaExternalSemaphore_t and cudaGraphicsResource_t. These resources also include memory allocations by cudaMalloc, cudaMallocHost, cudaMallocManaged and cudaMallocPitch. Any subsequent API call to this device will reinitialize the device.

Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
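
The dangling-resource hazard described above, as a sketch (illustrative only):

```cpp
#include <cuda_runtime.h>

int main() {
    float* d = nullptr;
    cudaMalloc(&d, 1 << 20);
    // ... use the allocation ...
    cudaDeviceReset();   // frees 'd' and all other state on this device
    // 'd' is now dangling: passing it to any further API call is
    // undefined behavior. The next CUDA call reinitializes the device.
    return 0;
}
```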

See also:

cudaDeviceSynchronize

__host__ ​cudaError_t cudaDeviceSetCacheConfig ( cudaFuncCache cacheConfig )

Sets the preferred cache configuration for the current device.

cacheConfig
- Requested cache configuration
__host__ ​cudaError_t cudaDeviceSetLimit ( cudaLimit limit, size_t value )

Set resource limits.

limit
- Limit to set
value
- Size of limit

Setting limit to value is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet hardware requirements (this could be clamping to minimum or maximum values, rounding up to the nearest element size, etc.). The application can use cudaDeviceGetLimit() to find out exactly what the limit has been set to.
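
The request-then-verify pattern described above can be sketched as (illustrative; cudaLimitPrintfFifoSize chosen as an example limit):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Request a larger device-side printf buffer; the driver may adjust it.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 << 20);
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize);
    printf("printf FIFO is now %zu bytes\n", actual);
    return 0;
}
```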

Setting each cudaLimit has its own specific restrictions, so each is discussed here.

See also:

cudaDeviceGetLimit, cuCtxSetLimit

__host__ ​cudaError_t cudaDeviceSetMemPool ( int  device, cudaMemPool_t memPool )

Sets the current memory pool of a device.

The memory pool must be local to the specified device. Unless a mempool is specified in the cudaMallocAsync call, cudaMallocAsync allocates from the current mempool of the provided stream's device. By default, a device's current memory pool is its default memory pool.
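
A sketch of installing a device-local pool as the current pool, so plain cudaMallocAsync on that device's streams draws from it (illustrative; assumes device 0 supports memory pools):

```cpp
#include <cuda_runtime.h>

int main() {
    // Create a pool local to device 0 and make it the device's current pool.
    cudaMemPoolProps props = {};
    props.allocType = cudaMemAllocationTypePinned;
    props.location.type = cudaMemLocationTypeDevice;
    props.location.id = 0;
    cudaMemPool_t pool;
    cudaMemPoolCreate(&pool, &props);
    cudaDeviceSetMemPool(0, pool);

    cudaStream_t s;
    cudaStreamCreate(&s);
    void* p = nullptr;
    cudaMallocAsync(&p, 1 << 20, s);   // served from 'pool'
    cudaFreeAsync(p, s);
    cudaStreamSynchronize(s);
    cudaStreamDestroy(s);
    cudaMemPoolDestroy(pool);
    return 0;
}
```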

Note:

Use cudaMallocFromPoolAsync to specify asynchronous allocations from a device different than the one the stream runs on.

See also:

cuDeviceSetMemPool, cudaDeviceGetMemPool, cudaDeviceGetDefaultMemPool, cudaMemPoolCreate, cudaMemPoolDestroy, cudaMallocFromPoolAsync

__host__ ​ __device__ ​cudaError_t cudaDeviceSynchronize ( void )

Wait for compute device to finish.

Blocks until the device has completed all preceding requested tasks. cudaDeviceSynchronize() returns an error if one of the preceding tasks has failed. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the device has finished its work.
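
A common use is surfacing asynchronous kernel errors on the host, as in this sketch (illustrative only):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void kernel(int* out) { *out = 42; }

int main() {
    int* d = nullptr;
    cudaMalloc(&d, sizeof(int));
    kernel<<<1, 1>>>(d);
    // Block until all preceding work has finished, surfacing any
    // asynchronous error the kernel produced.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "device error: %s\n", cudaGetErrorString(err));
    cudaFree(d);
    return 0;
}
```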

See also:

cudaDeviceReset, cuCtxSynchronize

__host__ ​cudaError_t cudaDeviceUnregisterAsyncNotification ( int  device, cudaAsyncCallbackHandle_t callback )

Unregisters an async notification callback.

device
- The device from which to remove callback.
callback
- The callback instance to unregister from receiving async notifications.

Unregisters callback so that the corresponding callback function will stop receiving async notifications.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaDeviceRegisterAsyncNotification

__host__ ​ __device__ ​cudaError_t cudaGetDevice ( int* device )

Returns which device is currently being used.

device
- Returns the device on which the active host thread executes the device code.
__host__ ​ __device__ ​cudaError_t cudaGetDeviceCount ( int* count )

Returns the number of compute-capable devices.

count
- Returns the number of devices with compute capability greater than or equal to 2.0
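
These two queries are often used together, as in this sketch (illustrative only):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0, current = 0;
    cudaGetDeviceCount(&count);
    cudaGetDevice(&current);
    printf("%d device(s); this thread currently targets device %d\n",
           count, current);
    return 0;
}
```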
__host__ ​cudaError_t cudaGetDeviceFlags ( unsigned int* flags )

Gets the flags for the current device.

flags
- Pointer to store the device flags

Returns in flags the flags for the current device. If there is a current device for the calling thread, the flags for the device are returned. If there is no current device, the flags for the first device are returned, which may be the default flags. Compare to the behavior of cudaSetDeviceFlags.

Typically, the flags returned should match the behavior that will be seen if the calling thread uses a device after this call, provided neither this thread nor another changes the flags or the current device in between. Note that if the device is not initialized, it is possible for another thread to change the flags for the current device before it is initialized. Additionally, when using exclusive mode, if this thread has not requested a specific device, it may use a device other than the first device, contrary to the assumption made by this function.

If a context has been created via the driver API and is current to the calling thread, the flags for that context are always returned.

Flags returned by this function may specifically include cudaDeviceMapHost even though it is not accepted by cudaSetDeviceFlags because it is implicit in runtime API flags. The reason for this is that the current context may have been created via the driver API in which case the flag is not implicit and may be unset.

See also:

cudaGetDevice, cudaGetDeviceProperties, cudaSetDevice, cudaSetDeviceFlags, cudaInitDevice, cuCtxGetFlags, cuDevicePrimaryCtxGetState

__host__ ​cudaError_t cudaGetDeviceProperties ( cudaDeviceProp* prop, int  device )

Returns information about the compute-device.

prop
- Properties for the specified device
device
- Device number to get properties for
__host__ ​cudaError_t cudaInitDevice ( int  device, unsigned int  deviceFlags, unsigned int  flags )

Initialize device to be used for GPU executions.

device
- Device on which the runtime will initialize itself.
deviceFlags
- Parameters for device operation.
flags
- Flags for controlling the device initialization.
__host__ ​cudaError_t cudaIpcCloseMemHandle ( void* devPtr )

Attempts to close memory mapped with cudaIpcOpenMemHandle.

__host__ ​cudaError_t cudaIpcGetEventHandle ( cudaIpcEventHandle_t* handle, cudaEvent_t event )

Gets an interprocess handle for a previously allocated event.

Takes as input a previously allocated event. This event must have been created with the cudaEventInterprocess and cudaEventDisableTiming flags set. This opaque handle may be copied into other processes and opened with cudaIpcOpenEventHandle to allow efficient hardware synchronization between GPU work in different processes.

After the event has been opened in the importing process, cudaEventRecord, cudaEventSynchronize, cudaStreamWaitEvent and cudaEventQuery may be used in either process. Performing operations on the imported event after the exported event has been freed with cudaEventDestroy will result in undefined behavior.

IPC functionality is restricted to devices with support for unified addressing on Linux and Windows operating systems. IPC functionality on Windows is supported for compatibility purposes but is not recommended, as it comes with a performance cost. Users can test their device for IPC functionality by calling cudaDeviceGetAttribute with cudaDevAttrIpcEventSupport.
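
The support check and handle export described above, as a sketch for the exporting process (illustrative; the transport of the handle to the other process is left abstract):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrIpcEventSupport, 0);
    if (!supported) {
        printf("IPC events not supported on this device\n");
        return 0;
    }
    cudaEvent_t ev;
    // Both flags are required for IPC events, per the description above.
    cudaEventCreateWithFlags(&ev, cudaEventInterprocess | cudaEventDisableTiming);
    cudaIpcEventHandle_t h;
    cudaIpcGetEventHandle(&h, ev);
    // 'h' is an opaque handle that may be copied to another process
    // (e.g. over a pipe) and opened there with cudaIpcOpenEventHandle.
    cudaEventDestroy(ev);
    return 0;
}
```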

See also:

cudaEventCreate, cudaEventDestroy, cudaEventSynchronize, cudaEventQuery, cudaStreamWaitEvent, cudaIpcOpenEventHandle, cudaIpcGetMemHandle, cudaIpcOpenMemHandle, cudaIpcCloseMemHandle, cuIpcGetEventHandle

__host__ ​cudaError_t cudaIpcGetMemHandle ( cudaIpcMemHandle_t* handle, void* devPtr )

Gets an interprocess memory handle for an existing device memory allocation.

handle
- Pointer to user allocated cudaIpcMemHandle to return the handle in.
devPtr
- Base pointer to previously allocated device memory
__host__ ​cudaError_t cudaIpcOpenEventHandle ( cudaEvent_t* event, cudaIpcEventHandle_t handle )

Opens an interprocess event handle for use in the current process.

event
- Returns the imported event
handle
- Interprocess handle to open
__host__ ​cudaError_t cudaIpcOpenMemHandle ( void** devPtr, cudaIpcMemHandle_t handle, unsigned int  flags )

Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process.

devPtr
- Returned device pointer
handle
- cudaIpcMemHandle to open
flags
- Flags for this operation. Must be specified as cudaIpcMemLazyEnablePeerAccess
__host__ ​cudaError_t cudaSetDevice ( int  device )

Set device to be used for GPU executions.

device
- Device on which the active host thread should execute the device code.

Sets device as the current device for the calling host thread. Valid device IDs are 0 to (cudaGetDeviceCount() - 1).

Any device memory subsequently allocated from this host thread using cudaMalloc(), cudaMallocPitch() or cudaMallocArray() will be physically resident on device. Any host memory allocated from this host thread using cudaMallocHost() or cudaHostAlloc() or cudaHostRegister() will have its lifetime associated with device. Any streams or events created from this host thread will be associated with device. Any kernels launched from this host thread using the <<<>>> operator or cudaLaunchKernel() will be executed on device.

This call may be made from any host thread, to any device, and at any time. This function will do no synchronization with the previous or new device, and should only take significant time when it initializes the runtime's context state. This call will bind the primary context of the specified device to the calling thread and all the subsequent memory allocations, stream and event creations, and kernel launches will be associated with the primary context. This function will also immediately initialize the runtime state on the primary context, and the context will be current on device immediately. This function will return an error if the device is in cudaComputeModeExclusiveProcess and is occupied by another process or if the device is in cudaComputeModeProhibited.

It is not required to call cudaInitDevice before using this function.
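
A multi-GPU sketch of the thread-local binding described above (illustrative; launches a trivial kernel on every device in turn):

```cpp
#include <cuda_runtime.h>

__global__ void work() {}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    // Each cudaSetDevice call rebinds subsequent allocations and
    // launches from this thread to that device.
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);
        work<<<1, 1>>>();
    }
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
    return 0;
}
```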

See also:

cudaGetDeviceCount, cudaGetDevice, cudaGetDeviceProperties, cudaChooseDevice, cudaInitDevice, cuCtxSetCurrent

__host__ ​cudaError_t cudaSetDeviceFlags ( unsigned int  flags )

Sets flags to be used for device executions.

flags
- Parameters for device operation

Records flags as the flags for the current device. If the current device has been set and that device has already been initialized, the previous flags are overwritten. If the current device has not been initialized, it is initialized with the provided flags. If no device has been made current to the calling thread, a default device is selected and initialized with the provided flags.

The three LSBs of the flags parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.
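
One of those scheduling flags in use, as a sketch (illustrative; flags must be set before the device is initialized in this process):

```cpp
#include <cuda_runtime.h>

int main() {
    // cudaDeviceScheduleBlockingSync: block the CPU thread on sync calls
    // instead of spin-waiting, trading wake-up latency for lower CPU usage.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    cudaSetDevice(0);
    // ... launches; cudaDeviceSynchronize() now blocks rather than spins ...
    return 0;
}
```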

See also:

cudaGetDeviceFlags, cudaGetDeviceCount, cudaGetDevice, cudaGetDeviceProperties, cudaSetDevice, cudaSetValidDevices, cudaInitDevice, cudaChooseDevice, cuDevicePrimaryCtxSetFlags

__host__ ​cudaError_t cudaSetValidDevices ( int* device_arr, int  len )

Set a list of devices that can be used for CUDA.

device_arr
- List of devices to try
len
- Number of devices in specified list

Sets a list of devices for CUDA execution in priority order using device_arr. The parameter len specifies the number of elements in the list. CUDA will try devices from the list sequentially until it finds one that works. If this function is not called, or if it is called with a len of 0, then CUDA will go back to its default behavior of trying devices sequentially from a default list containing all of the available CUDA devices in the system. If a specified device ID in the list does not exist, this function will return cudaErrorInvalidDevice. If len is not 0 and device_arr is NULL or if len exceeds the number of devices in the system, then cudaErrorInvalidValue is returned.
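
A sketch of the priority-list behavior described above (illustrative; device ordinals are examples):

```cpp
#include <cuda_runtime.h>

int main() {
    // Restrict CUDA to devices 2 and 0, tried in that order.
    int allowed[] = { 2, 0 };
    cudaSetValidDevices(allowed, 2);
    // With no explicit cudaSetDevice, the runtime initializes on the first
    // usable device from 'allowed' at the next CUDA call.
    void* p = nullptr;
    cudaMalloc(&p, 256);
    cudaFree(p);
    return 0;
}
```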

See also:

cudaGetDeviceCount, cudaSetDevice, cudaGetDeviceProperties, cudaSetDeviceFlags, cudaChooseDevice
