A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__OCCUPANCY.html below:

CUDA Driver API :: CUDA Toolkit Documentation

CUresult cuOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, CUfunction func, int  numBlocks, int  blockSize )

Returns dynamic shared memory available per block when launching numBlocks blocks on SM.

dynamicSmemSize
- Returned maximum dynamic shared memory
func
- Kernel function for which occupancy is calculated
numBlocks
- Number of blocks to fit on SM
blockSize
- Size of the blocks

Returns in *dynamicSmemSize the maximum size of dynamic shared memory to allow numBlocks blocks per SM.

Note that the API can also be used with context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will be the current context.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, CUfunction func, int  blockSize, size_t dynamicSMemSize )

Returns occupancy of a function.

numBlocks
- Returned occupancy
func
- Kernel for which occupancy is calculated
blockSize
- Block size the kernel is intended to be launched with
dynamicSMemSize
- Per-block dynamic shared memory usage intended, in bytes

Returns in *numBlocks the number of the maximum active blocks per streaming multiprocessor.

Note that the API can also be used with context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will be the current context.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessor

CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, CUfunction func, int  blockSize, size_t dynamicSMemSize, unsigned int  flags )

Returns occupancy of a function.

numBlocks
- Returned occupancy
func
- Kernel for which occupancy is calculated
blockSize
- Block size the kernel is intended to be launched with
dynamicSMemSize
- Per-block dynamic shared memory usage intended, in bytes
flags
- Requested behavior for the occupancy calculator

Returns in *numBlocks the number of the maximum active blocks per streaming multiprocessor.

The Flags parameter controls how special cases are handled. The valid flags are:

Note that the API can also be with launch context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will be the current context.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags

CUresult cuOccupancyMaxActiveClusters ( int* numClusters, CUfunction func, const CUlaunchConfig* config )

Given the kernel function (func) and launch configuration (config), return the maximum number of clusters that could co-exist on the target device in *numClusters.

numClusters
- Returned maximum number of clusters that could co-exist on the target device
func
- Kernel function for which maximum number of clusters are calculated
config
- Launch configuration for the given kernel function

If the function has required cluster size already set (see cudaFuncGetAttributes / cuFuncGetAttribute), the cluster size from config must either be unspecified or match the required size. Without required sizes, the cluster size must be specified in config, else the function will return an error.

Note that various attributes of the kernel function may affect occupancy calculation. Runtime environment may affect how the hardware schedules the clusters, so the calculated occupancy is not guaranteed to be achievable.

Note that the API can also be used with context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will either be taken from the specified stream config->hStream or the current context in case of NULL stream.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaFuncGetAttributes, cuFuncGetAttribute

CUresult cuOccupancyMaxPotentialBlockSize ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int  blockSizeLimit )

Suggest a launch configuration with reasonable occupancy.

minGridSize
- Returned minimum grid size needed to achieve the maximum occupancy
blockSize
- Returned maximum block size that can achieve the maximum occupancy
func
- Kernel for which launch configuration is calculated
blockSizeToDynamicSMemSize
- A function that calculates how much per-block dynamic shared memory func uses based on the block size
dynamicSMemSize
- Dynamic shared memory usage intended, in bytes
blockSizeLimit
- The maximum block size func is designed to handle

Returns in *blockSize a reasonable block size that can achieve the maximum occupancy (or, the maximum number of active warps with the fewest blocks per multiprocessor), and in *minGridSize the minimum grid size to achieve the maximum occupancy.

If blockSizeLimit is 0, the configurator will use the maximum block size permitted by the device / function instead.

If per-block dynamic shared memory allocation is not needed, the user should leave both blockSizeToDynamicSMemSize and dynamicSMemSize as 0.

If per-block dynamic shared memory allocation is needed, then if the dynamic shared memory size is constant regardless of block size, the size should be passed through dynamicSMemSize, and blockSizeToDynamicSMemSize should be NULL.

Otherwise, if the per-block dynamic shared memory size varies with different block sizes, the user needs to provide a unary function through blockSizeToDynamicSMemSize that computes the dynamic shared memory needed by func for any given block size. dynamicSMemSize is ignored. An example signature is:

‎    // Take block size, returns dynamic shared memory needed
          size_t blockToSmem(int blockSize);

Note that the API can also be used with context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will be the current context.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxPotentialBlockSize

CUresult cuOccupancyMaxPotentialBlockSizeWithFlags ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int  blockSizeLimit, unsigned int  flags )

Suggest a launch configuration with reasonable occupancy.

minGridSize
- Returned minimum grid size needed to achieve the maximum occupancy
blockSize
- Returned maximum block size that can achieve the maximum occupancy
func
- Kernel for which launch configuration is calculated
blockSizeToDynamicSMemSize
- A function that calculates how much per-block dynamic shared memory func uses based on the block size
dynamicSMemSize
- Dynamic shared memory usage intended, in bytes
blockSizeLimit
- The maximum block size func is designed to handle
flags
- Options
CUresult cuOccupancyMaxPotentialClusterSize ( int* clusterSize, CUfunction func, const CUlaunchConfig* config )

Given the kernel function (func) and launch configuration (config), return the maximum cluster size in *clusterSize.

clusterSize
- Returned maximum cluster size that can be launched for the given kernel function and launch configuration
func
- Kernel function for which maximum cluster size is calculated
config
- Launch configuration for the given kernel function

The cluster dimensions in config are ignored. If func has a required cluster size set (see cudaFuncGetAttributes / cuFuncGetAttribute),*clusterSize will reflect the required cluster size.

By default this function will always return a value that's portable on future hardware. A higher value may be returned if the kernel function allows non-portable cluster sizes.

This function will respect the compile time launch bounds.

Note that the API can also be used with context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will either be taken from the specified stream config->hStream or the current context in case of NULL stream.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaFuncGetAttributes, cuFuncGetAttribute


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4