Create a device handle.
Creates an instance of a cuDLA device that can be used to submit DLA operations. The application can create the handle in hybrid or standalone mode. In hybrid mode, this API uses the currently set GPU device to decide the association of the created DLA device handle. This function returns cudlaErrorUnsupportedOperation if the currently set GPU device is a dGPU, because cuDLA is not presently supported on dGPUs. cuDLA supports 16 cuDLA device handles per DLA HW instance.
Destroy device handle.
Destroys the instance of the cuDLA device which was created with cudlaCreateDevice. Before destroying the handle, it is important to ensure that all the tasks submitted previously to the device are completed. Failure to do so can lead to application crashes.
In hybrid mode, cuDLA internally performs memory allocations with CUDA using the primary context. As a result, before destroying or resetting a CUDA primary context, it is mandatory that all cuDLA device initializations are destroyed.
Note: This API can return task execution errors from previous DLA task submissions.
Get cuDLA device attributes.
UVA addressing between CUDA and DLA requires special support in the underlying kernel mode drivers. Applications are expected to query the cuDLA runtime to check if the current version of cuDLA supports UVA addressing.
Note: This API can return task execution errors from previous DLA task submissions.
Get device count.
Get number of DLA devices available to use.
Gets the last asynchronous error in task execution.
The DLA tasks execute asynchronously on the DLA HW. As a result, the status of the task execution is not known at the time of task submission. The status of the task executed by the DLA HW most recently for the particular device handle can be queried using this interface.
Note that a return code of cudlaSuccess from this function does not necessarily imply that the most recent task executed successfully. Since this function returns immediately, it can only report the status of the tasks at the moment it is called. To guarantee task completion, applications must synchronize on the submitted tasks in hybrid or standalone mode and then call this API to check for errors.
Get cuDLA's NvSciSync attributes.
Gets the NvSciSync's attributes in the attribute list created by the application.
cuDLA supports two types of NvSciSync object primitives:
Sync point
Deterministic semaphore
By default, cuDLA prioritizes the sync point primitive over the deterministic semaphore primitive and sets these priorities in the NvSciSync attribute list.
For a deterministic semaphore, the NvSciSync attribute list used to create the NvSciSync object must have the NvSciSyncAttrKey_RequireDeterministicFences key set to true.
cuDLA also supports the timestamp feature on NvSciSync objects. A waiter can request this by setting the NvSciSync attribute "NvSciSyncAttrKey_WaiterRequireTimestamps" to true.
In the event of failed NvSci initialization this function would return cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.
This API updates the input nvSciSyncAttrList with values equivalent to the following public attribute key-values:
NvSciSyncAttrKey_RequiredPerm is set to
NvSciSyncAccessPerm_SignalOnly if value of flag is set to CUDLA_NVSCISYNC_ATTR_SIGNAL.
NvSciSyncAccessPerm_WaitOnly if value of flag is set to CUDLA_NVSCISYNC_ATTR_WAIT.
NvSciSyncAccessPerm_WaitSignal if value of flag is set to CUDLA_NVSCISYNC_ATTR_SIGNAL | CUDLA_NVSCISYNC_ATTR_WAIT.
As NvSciSyncAttrKey_RequiredPerm is internally set by cuDLA, setting this value by the application is disallowed.
Note: Users of cuDLA can only append attributes to the output attrList using NvSci APIs. Modifying already populated values of the output attrList can result in undefined behavior.
Returns the version number of the library.
cuDLA is semantically versioned. This function will return the version as 1000000*major + 1000*minor + patch.
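The packed encoding above can be unpacked with integer division and remainder. The following is a minimal sketch, assuming the packed value has already been obtained from the library as a 64-bit unsigned integer; the `VersionParts` type and `decode_version` helper are hypothetical names used for illustration and are not part of cuDLA.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of cuDLA. */
typedef struct {
    uint64_t major;
    uint64_t minor;
    uint64_t patch;
} VersionParts;

/* Splits a value packed as 1000000*major + 1000*minor + patch
 * back into its three fields. */
VersionParts decode_version(uint64_t packed)
{
    VersionParts v;
    v.major = packed / 1000000u;
    v.minor = (packed / 1000u) % 1000u;
    v.patch = packed % 1000u;
    return v;
}
```

With this scheme a packed value of 1002003 would decode to major 1, minor 2, patch 3.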
Imports external memory into cuDLA.
Imports the allocated external memory by registering it with DLA. After successful registration, the returned pointer can be used in a task submit.
On Tegra, cuDLA supports importing NvSciBuf objects in standalone mode only. In the event of failed NvSci initialization (either due to usage of this API in hybrid mode or an issue in the NvSci library initialization), this function would return cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.
Note: cuDLA only supports importing NvSciBuf objects of type NvSciBufType_RawBuffer or NvSciBufType_Tensor. Importing an NvSciBuf object of any other type can result in undefined behavior.
Note: This API can return task execution errors from previous DLA task submissions.
Imports external semaphore into cuDLA.
Imports the allocated external semaphore by registering it with DLA. After successful registration, the returned pointer can be used in a task submission to signal synchronization objects.
On Tegra, cuDLA supports importing NvSciSync objects in standalone mode only. NvSciSync object primitives that cuDLA supports are sync point and deterministic semaphore.
cuDLA also supports the timestamp feature on NvSciSync objects, which lets the user get a snapshot of the DLA clock at the moment a particular fence is signaled. At any point in time, only 512 valid timestamp buffers can be associated with fences. For example, if the user has created 513 fences from a single NvSciSync object with timestamps enabled, the timestamp buffer associated with the 1st fence is the same as the one associated with the 513th fence.
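The reuse of timestamp buffers can be pictured as a fixed pool indexed by fence number. The sketch below assumes a simple round-robin model, which is consistent with the 1st/513th example above but is not confirmed as the actual internal allocation policy; `timestamp_slot` and `TIMESTAMP_BUFFER_COUNT` are hypothetical names, not cuDLA identifiers.

```c
#include <stddef.h>

/* Number of valid timestamp buffers, taken from the text above. */
#define TIMESTAMP_BUFFER_COUNT 512u

/* Returns the zero-based buffer slot for the n-th fence created from one
 * NvSciSync object. fence_number is one-based and must be >= 1.
 * Assumption: slots are reused round-robin (illustrative model only). */
size_t timestamp_slot(size_t fence_number)
{
    return (fence_number - 1u) % TIMESTAMP_BUFFER_COUNT;
}
```

Under this model, an application that needs the timestamp of an older fence would have to read it before 512 further fences are created from the same object.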
In the event of failed NvSci initialization (either due to usage of this API in hybrid mode or an issue in the NvSci library initialization), this function would return cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.
Note: This API can return task execution errors from previous DLA task submissions.
Registers the CUDA memory to DLA engine.
As part of registration, a system mapping is created through which the DLA HW can access the underlying CUDA memory. The resultant mapping is available in devPtr, and applications must use this mapping when referring to this memory in submit operations.
This function will return cudlaErrorInvalidAddress if the pointer or size to be registered is invalid. In addition, if the input pointer was already registered, then this function will return cudlaErrorMemoryRegistered. Attempting to re-register memory does not cause any irrecoverable error in cuDLA and applications can continue to use cuDLA APIs even after this error has occurred.
Note: This API can return task execution errors from previous DLA task submissions.
Unregisters the input memory from DLA engine.
Get DLA module attributes.
Get module attributes from the loaded module. This API returns cudlaErrorInvalidDevice if the module is not loaded in any device.
Note: This API can return task execution errors from previous DLA task submissions.
Load a DLA module.
Loads the module into the current device handle.
Multiple loadables cannot be loaded onto a single cuDLA device handle.
A loadable can be loaded only once in the lifecycle of a cuDLA device handle.
This API can return task execution errors from previous DLA task submissions.
Unload a DLA module.
Unload the module from the device handle that it was loaded into. This API returns cudlaErrorInvalidDevice if the module is not loaded into a valid device.
Note: This API can return task execution errors from previous DLA task submissions.
Set task timeout in milliseconds.
Sets the task timeout, in milliseconds, for each device handle. cuDLA uses a default timeout of 30 seconds if the user does not explicitly set one.
If the device handle is invalid, the timeout is 0, or the timeout is greater than 1000 seconds, this function returns cudlaErrorInvalidParam; otherwise it returns cudlaSuccess.
Note: This API can return task execution errors from previous DLA task submissions.
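An application may want to check a timeout value against the documented range before passing it to the library. The sketch below encodes only the limits stated above (non-zero, at most 1000 seconds, expressed in milliseconds); `timeout_ms_is_valid` and both macros are hypothetical names, and the 32-bit parameter type is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the text above, converted to milliseconds. */
#define DEFAULT_TIMEOUT_MS 30000u    /* 30 seconds, the documented default */
#define MAX_TIMEOUT_MS     1000000u  /* 1000 seconds, the documented upper bound */

/* Returns true when timeout_ms lies in the documented valid range:
 * non-zero and no greater than 1000 seconds. */
bool timeout_ms_is_valid(uint32_t timeout_ms)
{
    return timeout_ms != 0u && timeout_ms <= MAX_TIMEOUT_MS;
}
```

The check does not cover the device-handle condition, which only the library itself can evaluate.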
Submits the inference operation on DLA.
cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorInvalidModule, cudlaErrorCuda, cudlaErrorUmd, cudlaErrorOutOfResources, cudlaErrorInvalidAddress, cudlaErrorUnsupportedOperation, cudlaErrorInvalidAttribute, cudlaErrorNvSci, cudlaErrorOs
This operation takes in a sequence of tasks and submits them to the DLA HW for execution in the same sequence as they appear in the input task array. The input and output tensors (and statistics buffer if used) are assumed to be pre-registered using cudlaMemRegister (in hybrid mode) or cudlaImportExternalMemory (in standalone mode). Failure to do so can result in this function returning cudlaErrorInvalidAddress.
The stream parameter must be specified as the CUDA stream on which the DLA task is submitted for execution in hybrid mode. In standalone mode, this parameter must be passed as NULL and failure to do so will result in this function returning cudlaErrorInvalidParam.
The cudlaTask structure has a provision to specify wait and signal events that cuDLA must wait on and signal respectively as part of cudlaSubmitTask(). Each submitted task will wait for all its wait events to be signaled before beginning execution and will provide a signal event (if one is requested for during cudlaSubmitTask) that the application (or any other entity) can wait on to ensure that the submitted task has completed execution. In cuDLA 1.0, only NvSciSync fences are supported as part of wait events. Furthermore, only NvSciSync objects (registered as part of cudlaImportExternalSemaphore) can be signaled as part of signal events and the fence corresponding to the signaled event is returned as part of cudlaSubmitTask.
In standalone mode, if inputTensor and outputTensor fields are set to NULL inside the cudlaTask structure, the task submission is interpreted as an enqueue of wait and signal events that must be considered for subsequent task submission. No actual task submission is done. Multiple such subsequent task submissions with NULL fields in the input/outputTensor fields will overwrite the list of wait and signal events to be considered. In other words, the latest non-null wait events and/or latest non-null signal events before a non-null submission are considered for subsequent actual task submission. During an actual task submit in standalone mode, the effective wait events and signal events that will be considered are what the application sets using NULL data submissions and what is set for that particular task submission in the waitEvents and signalEvents fields. The wait events set as part of NULL data submission are considered as dependencies for only the first task and the signal events set as part of NULL data submission are signaled when the last task of task list is complete. All constraints that apply to waitEvents and signalEvents individually (as described below) are also applicable to the combined list.
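The overwrite rule for NULL data submissions can be modeled as a small piece of state in which only non-null lists replace what was recorded earlier. This is an illustrative model of the behavior described above, not cuDLA code: `PendingEvents` and `record_null_submission` are hypothetical names, and the event lists are represented as opaque pointers.

```c
#include <stddef.h>

/* Hypothetical model of the events remembered between NULL data
 * submissions; not a cuDLA type. */
typedef struct {
    const void *wait_events;    /* latest non-null wait list, or NULL */
    const void *signal_events;  /* latest non-null signal list, or NULL */
} PendingEvents;

/* Records one NULL data submission. A non-null list overwrites the
 * previously recorded one; a NULL list leaves the earlier value in place. */
void record_null_submission(PendingEvents *pending,
                            const void *wait_events,
                            const void *signal_events)
{
    if (wait_events != NULL) {
        pending->wait_events = wait_events;
    }
    if (signal_events != NULL) {
        pending->signal_events = signal_events;
    }
}
```

In this model, two NULL data submissions in a row, the second carrying only wait events, leave the first submission's signal events in effect while replacing its wait events.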
cuDLA supports 3 kinds of fences - preFence, SOF fence and EOF fence.
preFence is the type of fence that DLA waits on to start the task execution. Use cudlaFenceType as CUDLA_NVSCISYNC_FENCE to mark a fence as preFence.
SOF(Start Of Frame) fence is the type of fence which is signaled before the task execution on DLA starts. Use cudlaFenceType as CUDLA_NVSCISYNC_FENCE_SOF to mark a fence as SOF fence.
EOF(End Of Frame) fence is the type of fence which is signaled after the task execution on DLA is complete. Use cudlaFenceType as CUDLA_NVSCISYNC_FENCE to mark a fence as EOF fence.
For wait events, applications are expected to
register their synchronization objects using cudlaImportExternalSemaphore.
create the required number of preFence placeholders using CudlaFence.
fill in the placeholders with the relevant fences from the application.
list out all the fences in cudlaWaitEvents.
For signal events, applications are expected to
register their synchronization objects using cudlaImportExternalSemaphore.
create the required number of SOF and EOF fence placeholder fences using CudlaFence.
place the registered objects and the corresponding fences in cudlaSignalEvents. In the case of a deterministic semaphore, the fence is not required to be passed in cudlaSignalEvents.
When cudlaSubmitTask returns successfully, the fences present in cudlaSignalEvents can be used to wait for the particular task to be completed. cuDLA supports 1 sync point and any number of semaphores as part of cudlaSignalEvents. If more than 1 sync point is specified, cudlaErrorInvalidParam is returned.
cuDLA adheres to DLA's restriction to support 29 preFences and SOF fences combined together and 29 EOF fences per DLA Task.
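The per-task fence limits can be checked before submission by counting fences of each kind. The following is a minimal sketch that takes the three counts directly; `fence_counts_within_limits` and the two macros are hypothetical names, and how the counts are gathered from the wait and signal event lists is left to the application.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-task limits taken from the text above. */
#define MAX_PRE_PLUS_SOF 29u  /* preFences and SOF fences combined */
#define MAX_EOF          29u  /* EOF fences */

/* Returns true when the fence counts for a single task respect both limits.
 * The combined check is written to avoid overflow in num_pre + num_sof. */
bool fence_counts_within_limits(size_t num_pre, size_t num_sof, size_t num_eof)
{
    if (num_pre > MAX_PRE_PLUS_SOF || num_sof > MAX_PRE_PLUS_SOF - num_pre) {
        return false;
    }
    return num_eof <= MAX_EOF;
}
```

For example, 20 preFences with 9 SOF fences and 29 EOF fences passes, while 20 preFences with 10 SOF fences does not.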
During submission, users have an option to enable layerwise statistics profiling for the individual layers of the network. This option needs to be exercised by specifying additional output buffers that would contain the profiling information. Specifically,
"cudlaTask::numOutputTensors" should be the sum of value returned by cudlaModuleGetAttributes(...,CUDLA_NUM_OUTPUT_TENSORS,...) and cudlaModuleGetAttributes(...,CUDLA_NUM_OUTPUT_TASK_STATISTICS,...)
"cudlaTask::outputTensor" should contain the array of output tensors appended with array of statistics output buffer.
This function can return cudlaErrorUnsupportedOperation if
stream being used in hybrid mode is in capturing state.
application attempts to use NvSci functionalities in hybrid mode.
loading of NvSci libraries failed for a particular platform.
fence type other than CUDLA_NVSCISYNC_FENCE is specified.
waitEvents or signalEvents is not NULL in hybrid mode.
inputTensor or outputTensor is NULL in hybrid mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor is NULL and outputTensor is not NULL, or vice versa, in standalone mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor and outputTensor are NULL and the number of tasks is not equal to 1 in standalone mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor is not NULL or output tensor is NULL and the flags are CUDLA_SUBMIT_DIAGNOSTICS_TASK.
the effective signal events list has multiple sync points to signal.
the layerwise statistics feature is unsupported.
the per-task limit on preFences, SOF fences and EOF fences is exceeded.
This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.
This function can return cudlaErrorOs if an internal system operation fails.
Note: This API can return task execution errors from previous DLA task submissions.