This section describes the APIs for creation and manipulation of green contexts in the CUDA driver. Green contexts are a lightweight alternative to traditional contexts, with the ability to pass in a set of resources that they should be initialized with. This allows the developer to represent distinct spatial partitions of the GPU, provision resources for them, and target them via the same programming model that CUDA exposes (streams, kernel launches, etc.).
There are four main steps to using this new set of APIs.
(1) Start with an initial set of resources, for example via cuDeviceGetDevResource. Only SM type is supported today.
(2) Partition this set of resources by providing them as input to a partition API, for example: cuDevSmResourceSplitByCount.
(3) Finalize the specification of resources by creating a descriptor via cuDevResourceGenerateDesc.
(4) Provision the resources and create a green context via cuGreenCtxCreate.
For CU_DEV_RESOURCE_TYPE_SM, the partitions created have minimum SM count requirements, often rounding up and aligning the minCount provided to cuDevSmResourceSplitByCount. These requirements can be queried with cuDeviceGetDevResource from step (1) above to determine the minimum partition size (sm.minSmPartitionSize) and alignment granularity (sm.smCoscheduledAlignment).
While it's recommended to use cuDeviceGetDevResource for accurate information, here is a guideline for each compute architecture:
On Compute Architecture 6.X: The minimum count is 2 SMs and must be a multiple of 2.
On Compute Architecture 7.X: The minimum count is 2 SMs and must be a multiple of 2.
On Compute Architecture 8.X: The minimum count is 4 SMs and must be a multiple of 2.
On Compute Architecture 9.0+: The minimum count is 8 SMs and must be a multiple of 8.
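The rounding and alignment of minCount can be sketched as a small helper. This is a minimal illustration, not the driver's actual algorithm: the name align_sm_count is hypothetical, and its inputs stand in for the sm.minSmPartitionSize and sm.smCoscheduledAlignment values reported by cuDeviceGetDevResource. It assumes the requested count is first raised to the minimum partition size and then rounded up to the next multiple of the alignment.

```c
#include <assert.h>

/* Hypothetical helper: estimates the SM count a partition would receive
 * when minCount is raised to the minimum partition size and then rounded
 * up to the alignment granularity. The driver's real policy may differ. */
unsigned int align_sm_count(unsigned int minCount,
                            unsigned int minPartitionSize,
                            unsigned int alignment)
{
    assert(alignment > 0);
    /* Never go below the minimum partition size. */
    unsigned int count = minCount < minPartitionSize ? minPartitionSize : minCount;
    /* Round up to the next multiple of the alignment. */
    unsigned int rem = count % alignment;
    return rem == 0 ? count : count + (alignment - rem);
}
```

Under this assumption, the 9.0+ guideline (minimum 8, multiple of 8) would turn a request for 10 SMs into 16, and the 8.X guideline (minimum 4, multiple of 2) would turn a request for 5 SMs into 6.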
In the future, flags may be provided to trade off functional and performance characteristics against finer-grained SM partitions.
Even if the green contexts have disjoint SM partitions, it is not guaranteed that the kernels launched in them will run concurrently or have forward progress guarantees. This is due to other resources (like HW connections, see CUDA_DEVICE_MAX_CONNECTIONS) that could cause a dependency. Additionally, in certain scenarios, it is possible for the workload to run on more SMs than were provisioned (but never fewer). The following are two scenarios which can exhibit this behavior:
On Volta+ MPS: When CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is used, the set of SMs that are used for running kernels can be scaled up to the value of SMs used for the MPS client.
On Compute Architecture 9.x: When a module with dynamic parallelism (CDP) is loaded, all future kernels running under green contexts may use and share an additional set of 2 SMs.
An opaque descriptor handle. The descriptor encapsulates multiple created and configured resources. Created via cuDevResourceGenerateDesc
Type of resource
Converts a green context into the primary context.
The API converts a green context into the primary context returned in pContext. It is important to note that the converted context pContext is a normal primary context but with the resources of the specified green context hCtx. Once converted, it can then be used to set the context current with cuCtxSetCurrent or with any of the CUDA APIs that accept a CUcontext parameter.
Users are expected to call this API before calling any CUDA APIs that accept a CUcontext. Failing to do so will result in the APIs returning CUDA_ERROR_INVALID_CONTEXT.
Get context resources.
Get the resources of the requested type available to the context represented by hCtx.
Note: The API is not supported on 32-bit platforms.
Generate a resource descriptor.
Generates a single resource descriptor with the set of resources specified in resources. The generated resource descriptor is necessary for the creation of green contexts via the cuGreenCtxCreate API. Resources of the same type can be passed in, provided they meet the requirements as noted below.
A successful API call must have:
A valid output pointer for the phDesc descriptor as well as a valid array of resources pointers, with the array size passed in nbResources.
If multiple resources are provided in resources, the device they came from must be the same, otherwise CUDA_ERROR_INVALID_RESOURCE_CONFIGURATION is returned.
If multiple resources are provided in resources and they are of type CU_DEV_RESOURCE_TYPE_SM, they must be outputs (whether result or remaining) from the same split API instance, otherwise CUDA_ERROR_INVALID_RESOURCE_CONFIGURATION is returned.
Note: The API is not supported on 32-bit platforms.
Splits CU_DEV_RESOURCE_TYPE_SM resources.
Splits CU_DEV_RESOURCE_TYPE_SM resources into nbGroups, adhering to the minimum SM count specified in minCount and the usage flags in useFlags. If result is NULL, the API simulates a split and returns in nbGroups the number of groups that would be created. Otherwise, nbGroups must point to the number of elements in result, and on return the API overwrites nbGroups with the number of groups actually created. The groups are written to the array in result. nbGroups can be less than the total number possible if fewer groups are needed.
This API is used to spatially partition the input resource. The input resource needs to come from one of cuDeviceGetDevResource, cuCtxGetDevResource, or cuGreenCtxGetDevResource. A limitation of the API is that the output results cannot be split again without first creating a descriptor and a green context with that descriptor.
When creating the groups, the API will take into account the performance and functional characteristics of the input resource, and guarantee a split that will create a disjoint set of symmetrical partitions. This may lead to fewer groups created than purely dividing the total SM count by the minCount due to cluster requirements or alignment and granularity requirements for the minCount. These requirements can be queried with cuDeviceGetDevResource, cuCtxGetDevResource, and cuGreenCtxGetDevResource for CU_DEV_RESOURCE_TYPE_SM, using the minSmPartitionSize and smCoscheduledAlignment fields to determine minimum partition size and alignment granularity, respectively.
The remainder set does not have the same functional or performance guarantees as the groups in result. Its use should be carefully planned and future partitions of the remainder set are discouraged.
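The relationship between the requested group count, the groups created, and the remainder set can be illustrated with a simplified model. This sketch is an approximation under stated assumptions rather than the driver's behavior: SplitPlan and plan_split are hypothetical names, perGroup is taken to be the already aligned SM count per group, and the result is only an upper bound, since the driver may create fewer groups because of cluster or alignment requirements.

```c
#include <assert.h>

/* Hypothetical model of a split outcome. */
typedef struct {
    unsigned int groups;     /* number of groups created */
    unsigned int remaining;  /* SMs left over for the remainder set */
} SplitPlan;

/* totalSms:  SM count of the input resource
 * perGroup:  aligned SM count per group (must be > 0)
 * requested: capacity passed in through nbGroups */
SplitPlan plan_split(unsigned int totalSms, unsigned int perGroup,
                     unsigned int requested)
{
    assert(perGroup > 0);
    SplitPlan plan;
    /* Upper bound on how many groups fit in the input. */
    unsigned int possible = totalSms / perGroup;
    /* Asking for fewer groups than possible is allowed. */
    plan.groups = requested < possible ? requested : possible;
    /* Everything not assigned to a group falls into the remainder. */
    plan.remaining = totalSms - plan.groups * perGroup;
    return plan;
}
```

For a hypothetical 132-SM input with 16 SMs per group, the model yields at most 8 groups with 4 SMs remaining; requesting only 2 groups leaves 100 SMs in the remainder set.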
The following flags are supported:
CU_DEV_SM_RESOURCE_SPLIT_IGNORE_SM_COSCHEDULING : Lower the minimum SM count and alignment, and treat each SM independently of its hierarchy. This allows finer-grained partitions but at the cost of advanced features (such as large clusters on compute capability 9.0+).
CU_DEV_SM_RESOURCE_SPLIT_MAX_POTENTIAL_CLUSTER_SIZE : Compute Capability 9.0+ only. Attempt to create groups that may allow for maximally sized thread clusters. This can be queried post green context creation using cuOccupancyMaxPotentialClusterSize.
A successful API call must either have:
A valid array of result pointers of size passed in nbGroups, with input of type CU_DEV_RESOURCE_TYPE_SM. Value of minCount must be between 0 and the SM count specified in input. remaining may be NULL.
NULL passed in for result, with a valid integer pointer in nbGroups and input of type CU_DEV_RESOURCE_TYPE_SM. Value of minCount must be between 0 and the SM count specified in input. remaining may be NULL. This queries the number of groups that would be created by the API.
Note: The API is not supported on 32-bit platforms.
See also:
cuGreenCtxGetDevResource, cuCtxGetDevResource, cuDeviceGetDevResource
Get device resources.
Get the resources of the requested type available to the device. This is often the starting point for further partitioning or configuring of resources.
Note: The API is not supported on 32-bit platforms.
Creates a green context with a specified set of resources.
This API creates a green context with the resources specified in the descriptor desc and returns it in the handle represented by phCtx. This API will retain the primary context on device dev, which is released when the green context is destroyed. It is advised to have the primary context active before calling this API to avoid the heavy cost of triggering primary context initialization and deinitialization multiple times.
The API does not set the green context current. To make it current, first convert the green context to a CUcontext using cuCtxFromGreenCtx, then call cuCtxSetCurrent or cuCtxPushCurrent. Note that a green context can be current to only one thread at a time. There is no internal synchronization to make API calls accessing the same green context from multiple threads work.
Note: The API is not supported on 32-bit platforms.
The supported flags are:
CU_GREEN_CTX_DEFAULT_STREAM : Creates a default stream to use inside the green context. Required.
See also:
cuGreenCtxDestroy, cuCtxFromGreenCtx, cuCtxSetCurrent, cuCtxPushCurrent, cuDevResourceGenerateDesc, cuDevicePrimaryCtxRetain, cuCtxCreate
Destroys a green context.
Get green context resources.
Returns the unique Id associated with the green context supplied.
Returns in greenCtxId the unique Id which is associated with a given green context. The Id is unique for the life of the program for this instance of CUDA. If green context is supplied as NULL and the current context is set to a green context, the Id of the current green context is returned.
Note: This function may also return error codes from previous, asynchronous launches.
Records an event.
Create a stream for use in the green context.
Creates a stream for use in the specified green context greenCtx and returns a handle in phStream. The stream can be destroyed by calling cuStreamDestroy(). Note that the API ignores the context that is current to the calling thread and creates a stream in the specified green context greenCtx.
The supported values for flags are:
CU_STREAM_NON_BLOCKING: This must be specified. It indicates that work running in the created stream may run concurrently with work in the default stream, and that the created stream should perform no implicit synchronization with the default stream.
Specifying priority affects the scheduling priority of work in the stream. Priorities provide a hint to preferentially run work with higher priority when possible, but do not preempt already-running work or provide any other functional guarantee on execution order. priority follows a convention where lower numbers represent higher priorities. '0' represents default priority. The range of meaningful numerical priorities can be queried using cuCtxGetStreamPriorityRange. If the specified priority is outside the numerical range returned by cuCtxGetStreamPriorityRange, it will automatically be clamped to the lowest or the highest number in the range.
Note: This function may also return error codes from previous, asynchronous launches.
In the current implementation, only compute kernels launched in priority streams are affected by the stream's priority. Stream priorities have no effect on host-to-device and device-to-host memory operations.
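The clamping of out-of-range priorities can be sketched as follows. This is an illustrative helper only: clamp_stream_priority is a hypothetical name, and its two range arguments are assumed to be the leastPriority and greatestPriority values reported by cuCtxGetStreamPriorityRange.

```c
#include <assert.h>

/* Hypothetical helper mirroring the documented clamping rule.
 * Lower numbers mean higher priority, so greatestPriority is
 * numerically less than or equal to leastPriority. */
int clamp_stream_priority(int priority, int leastPriority, int greatestPriority)
{
    assert(greatestPriority <= leastPriority);
    if (priority < greatestPriority) {
        return greatestPriority;  /* requested more urgent than supported */
    }
    if (priority > leastPriority) {
        return leastPriority;     /* requested less urgent than supported */
    }
    return priority;              /* already within the meaningful range */
}
```

Assuming a reported range of 0 (least) to -5 (greatest), a requested priority of -10 would be treated as -5, and a requested priority of 7 would be treated as 0.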
See also:
cuStreamDestroy, cuGreenCtxCreate, cuStreamCreate, cuStreamGetPriority, cuCtxGetStreamPriorityRange, cuStreamGetFlags, cuStreamGetDevice, cuStreamWaitEvent, cuStreamQuery, cuStreamSynchronize, cuStreamAddCallback, cudaStreamCreateWithPriority
Make a green context wait on an event.
Query the green context associated with a stream.
Returns the CUDA green context that the stream is associated with, or NULL if the stream is not associated with any green context.
The stream handle hStream can refer to any of the following:
a stream created via any of the CUDA driver APIs such as cuStreamCreate, cuStreamCreateWithPriority and cuGreenCtxStreamCreate, or their runtime API equivalents such as cudaStreamCreate, cudaStreamCreateWithFlags and cudaStreamCreateWithPriority. If during stream creation the context that was active in the calling thread was obtained with cuCtxFromGreenCtx, that green context is returned in phCtx. Otherwise, *phCtx is set to NULL instead.
a special stream such as the NULL stream or CU_STREAM_LEGACY. In that case, if the context that is active in the calling thread was obtained with cuCtxFromGreenCtx, that green context is returned in phCtx. Otherwise, *phCtx is set to NULL instead.
Passing an invalid handle will result in undefined behavior.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cuStreamDestroy, cuStreamCreate, cuStreamCreateWithPriority, cuStreamGetCtx, cuGreenCtxStreamCreate, cuStreamGetPriority, cuStreamGetFlags, cuStreamGetDevice, cuStreamWaitEvent, cuStreamQuery, cuStreamSynchronize, cuStreamAddCallback, cudaStreamCreate, cudaStreamCreateWithFlags