Showing content from http://docs.nvidia.com/cuda/cudla-api/index.html below:

cuDLA API :: CUDA Toolkit Documentation

cudlaStatus cudlaCreateDevice ( const uint64_t device, const cudlaDevHandle* devHandle, const uint32_t flags )

Create a device handle.

device

- Device number (can be 0 or 1).

devHandle

- Pointer to hold the created cuDLA device handle.

flags

- Flags controlling device creation. Valid values for flags are:

CUDLA_CUDA_DLA - In this mode, cuDLA serves as a programming model extension of CUDA wherein DLA work can be submitted using CUDA constructs.
CUDLA_STANDALONE - In this mode, cuDLA works standalone without any interaction with CUDA.

Creates an instance of a cuDLA device which can be used to submit DLA operations. The application can create the handle in hybrid or standalone mode. In hybrid mode, this API uses the currently set GPU device to decide the association of the created DLA device handle. This function returns cudlaErrorUnsupportedOperation if the currently set GPU device is a dGPU, as cuDLA is not presently supported on dGPU. cuDLA supports 16 cuDLA device handles per DLA HW instance.

cudlaStatus cudlaDestroyDevice ( const cudlaDevHandle devHandle )

Destroy device handle.

devHandle: - A valid device handle.

Destroys the instance of the cuDLA device which was created with cudlaCreateDevice. Before destroying the handle, it is important to ensure that all the tasks submitted previously to the device are completed. Failure to do so can lead to application crashes.

In hybrid mode, cuDLA internally performs memory allocations with CUDA using the primary context. As a result, before destroying or resetting a CUDA primary context, it is mandatory that all cuDLA device initializations are destroyed.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaDeviceGetAttribute ( const cudlaDevHandle devHandle, const cudlaDevAttributeType attrib, const cudlaDevAttribute* pAttribute )

Get cuDLA device attributes.

devHandle: - The input cuDLA device handle.
attrib: - The attribute that is being requested.
pAttribute: - The output pointer where the attribute will be available.

UVA addressing between CUDA and DLA requires special support in the underlying kernel mode drivers. Applications are expected to query the cuDLA runtime to check if the current version of cuDLA supports UVA addressing.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaDeviceGetCount ( const uint64_t* pNumDevices )

Get device count.

pNumDevices: - The number of DLA devices will be available in this variable upon successful completion.

Get number of DLA devices available to use.

cudlaStatus cudlaGetLastError ( const cudlaDevHandle devHandle )

Gets the last asynchronous error in task execution.

devHandle: - A valid device handle.

The DLA tasks execute asynchronously on the DLA HW. As a result, the status of the task execution is not known at the time of task submission. The status of the task executed by the DLA HW most recently for the particular device handle can be queried using this interface.

Note that a return code of cudlaSuccess from this function does not necessarily imply that the most recent task executed successfully. Since this function returns immediately, it can only report the status of the tasks at the moment it is called. To be certain that tasks have completed, applications must synchronize on the submitted tasks in hybrid or standalone mode and then call this API to check for errors.

cudlaStatus cudlaGetNvSciSyncAttributes ( uint64_t* attrList, const uint32_t flags )

Get cuDLA's NvSciSync attributes.

attrList

- Attribute list created by the application.

flags

- Applications can use this flag to specify how they intend to use the NvSciSync object created from the attrList. The valid values of flags can be one of the following (or an OR of these values):

CUDLA_NVSCISYNC_ATTR_WAIT, specifies that the application intends to use the NvSciSync object created using this attribute list as a waiter in cuDLA and therefore needs cuDLA to fill waiter specific NvSciSyncAttr.
CUDLA_NVSCISYNC_ATTR_SIGNAL, specifies that the application intends to use the NvSciSync object created using this attribute list as a signaler in cuDLA and therefore needs cuDLA to fill signaler specific NvSciSyncAttr.

cudlaSuccess, The API call returned with no errors.
cudlaErrorInvalidParam, This API call failed because invalid parameter attrList was passed.
cudlaErrorUnsupportedOperation, This error code indicates that the API call failed because the operation is not supported in hybrid mode.
cudlaErrorInvalidAttribute, The API call failed as parameter attrList has invalid values.
cudlaErrorNvSci, This error code indicates error in the NvSci operation as part of the API call.
cudlaErrorNotPermittedOperation, This error code indicates that the API call is not permitted when DRIVE OS is in Operational state.
cudlaErrorUnknown, This error code indicates that an unknown error has occurred.

Gets the NvSciSync's attributes in the attribute list created by the application.

cuDLA supports two types of NvSciSync object primitives -

Sync point
Deterministic semaphore

cuDLA prioritizes sync point primitive over deterministic semaphore primitive by default and sets these priorities in the NvSciSync attribute list.

For a deterministic semaphore, the NvSciSync attribute list used to create the NvSciSync object must have the NvSciSyncAttrKey_RequireDeterministicFences key set to true.

cuDLA also supports the timestamp feature on NvSciSync objects. A waiter can request it by setting the NvSciSync attribute "NvSciSyncAttrKey_WaiterRequireTimestamps" to true.

In the event of failed NvSci initialization this function would return cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

This API updates the input nvSciSyncAttrList with values equivalent to the following public attribute key-values:

NvSciSyncAttrKey_RequiredPerm is set to

NvSciSyncAccessPerm_SignalOnly if value of flag is set to CUDLA_NVSCISYNC_ATTR_SIGNAL.
NvSciSyncAccessPerm_WaitOnly if value of flag is set to CUDLA_NVSCISYNC_ATTR_WAIT.
NvSciSyncAccessPerm_WaitSignal if value of flag is set to CUDLA_NVSCISYNC_ATTR_SIGNAL | CUDLA_NVSCISYNC_ATTR_WAIT.

As NvSciSyncAttrKey_RequiredPerm is internally set by cuDLA, setting this value by the application is disallowed.

Note:

Users of cuDLA can only append attributes to the output attrList using the NvSci API; modifying already populated values of the output attrList can result in undefined behavior.

cudlaStatus cudlaGetVersion ( const uint64_t* version )

Returns the version number of the library.

version: - cuDLA library version will be available in this variable upon successful execution.

cuDLA is semantically versioned. This function will return the version as 1000000*major + 1000*minor + patch.
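The packed version value can be split back into its components with integer division and remainder. The following is a minimal sketch rather than part of cuDLA: DemoCudlaVersion and demo_decode_version are hypothetical names, and the sketch assumes that minor and patch are each below 1000, as the packing formula implies.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of the cuDLA API. */
typedef struct {
    uint64_t major;
    uint64_t minor;
    uint64_t patch;
} DemoCudlaVersion;

/* Split a value packed as 1000000*major + 1000*minor + patch.
 * Assumes minor < 1000 and patch < 1000. */
static DemoCudlaVersion demo_decode_version(uint64_t packed)
{
    DemoCudlaVersion v;
    v.major = packed / 1000000u;
    v.minor = (packed / 1000u) % 1000u;
    v.patch = packed % 1000u;
    return v;
}
```

An application would pass the value written by cudlaGetVersion to a helper like this one before comparing against a required minimum version.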

cudlaStatus cudlaImportExternalMemory ( const cudlaDevHandle devHandle, const cudlaExternalMemoryHandleDesc* desc, const uint64_t** devPtr, const uint32_t flags )

Imports external memory into cuDLA.

devHandle

- A valid device handle.

desc

- Contains description about allocated external memory.

devPtr

- The output pointer where the mapping will be available.

flags

- Application can use this flag to specify the memory access permissions of the memory that needs to be registered with DLA. The valid values of flags can be one of the following:

CUDLA_READ_WRITE_PERM, specifies that the external memory needs to be registered with DLA as read-write memory.
CUDLA_READ_ONLY_PERM, specifies that the external memory needs to be registered with DLA as read-only memory.
CUDLA_TASK_STATISTICS, specifies that the external memory needs to be registered with DLA for layerwise statistics.

Imports the allocated external memory by registering it with DLA. After successful registration, the returned pointer can be used in a task submit.

On Tegra, cuDLA supports importing NvSciBuf objects in standalone mode only. In the event of failed NvSci initialization (either due to usage of this API in hybrid mode or an issue in the NvSci library initialization), this function would return cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

Note:

cuDLA only supports importing NvSciBuf objects of type NvSciBufType_RawBuffer or NvSciBufType_Tensor. Importing an NvSciBuf object of any other type can result in undefined behaviour.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaImportExternalSemaphore ( const cudlaDevHandle devHandle, const cudlaExternalSemaphoreHandleDesc* desc, const uint64_t** devPtr, const uint32_t flags )

Imports external semaphore into cuDLA.

devHandle: - A valid device handle.
desc: - Contains semaphore object.
devPtr: - The output pointer where the mapping will be available.
flags: - Reserved for future. Must be set to 0.

Imports the allocated external semaphore by registering it with DLA. After successful registration, the returned pointer can be used in a task submission to signal synchronization objects.

On Tegra, cuDLA supports importing NvSciSync objects in standalone mode only. NvSciSync object primitives that cuDLA supports are sync point and deterministic semaphore.

cuDLA also supports the timestamp feature on NvSciSync objects, which lets the user get a snapshot of the DLA clock at the moment a particular fence is signaled. At any point in time, only 512 valid timestamp buffers can be associated with fences. For example, if the user has created 513 fences from a single NvSciSync object with timestamps enabled, the timestamp buffer associated with the 1st fence is the same as the one associated with the 513th fence.
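The reuse of timestamp buffers can be pictured as a ring of 512 slots. The sketch below is only a model of the 1st/513th example given above: demo_timestamp_slot is a hypothetical helper, and the modulo mapping is an assumption inferred from that example, not a documented description of how cuDLA assigns buffers internally.

```c
#include <stdint.h>

/* Number of valid timestamp buffers, as stated in the text. */
#define DEMO_TIMESTAMP_BUFFERS 512u

/* Hypothetical model: fence_number is 1-based, so the first fence
 * created from an NvSciSync object is number 1. Fences whose numbers
 * differ by a multiple of 512 map to the same slot. */
static uint32_t demo_timestamp_slot(uint32_t fence_number)
{
    return (fence_number - 1u) % DEMO_TIMESTAMP_BUFFERS;
}
```

Under this model an application that needs the timestamp of a fence should read it before 512 further fences have been created from the same object.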

In the event of failed NvSci initialization (either due to usage of this API in hybrid mode or an issue in the NvSci library initialization), this function would return cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaMemRegister ( const cudlaDevHandle devHandle, const uint64_t* ptr, const size_t size, const uint64_t** devPtr, const uint32_t flags )

Registers the CUDA memory to DLA engine.

devHandle

- A valid cuDLA device handle created by a previous call to cudlaCreateDevice.

ptr

- The CUDA pointer to be registered.

size

- The size of the mapping, i.e., the number of bytes from ptr that must be mapped.

devPtr

- The output pointer where the mapping will be available.

flags

- Applications can use this flag to control several aspects of the registration process. The valid values of flags can be one of the following (or an OR of these values):

0, default
CUDLA_TASK_STATISTICS, specifies that the external memory needs to be registered with DLA for layerwise statistics.

As part of registration, a system mapping is created whereby the DLA HW can access the underlying CUDA memory. The resultant mapping is available in devPtr and applications must use this mapping when referring to this memory in submit operations.

This function will return cudlaErrorInvalidAddress if the pointer or size to be registered is invalid. In addition, if the input pointer was already registered, then this function will return cudlaErrorMemoryRegistered. Attempting to re-register memory does not cause any irrecoverable error in cuDLA and applications can continue to use cuDLA APIs even after this error has occurred.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaMemUnregister ( const cudlaDevHandle devHandle, const uint64_t* devPtr )

Unregisters the input memory from DLA engine.

devHandle: - A valid cuDLA device handle created by a previous call to cudlaCreateDevice.
devPtr: - The pointer to be unregistered.

cudlaStatus cudlaModuleGetAttributes ( const cudlaModule hModule, const cudlaModuleAttributeType attrType, const cudlaModuleAttribute* attribute )

Get DLA module attributes.

hModule: - The input DLA module.
attrType: - The attribute type that is being requested.
attribute: - The output pointer where the attribute will be available.

Get module attributes from the loaded module. This API returns cudlaErrorInvalidDevice if the module is not loaded in any device.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaModuleLoadFromMemory ( const cudlaDevHandle devHandle, const uint8_t* pModule, const size_t moduleSize, const cudlaModule* hModule, const uint32_t flags )

Load a DLA module.

devHandle

- The input cuDLA device handle. The module will be loaded in the context of this handle.

pModule

- A pointer to an in-memory module.

moduleSize

- The size of the module.

hModule

- The address in which the loaded module handle will be available upon successful execution.

flags

- Applications can use this flag to specify how the module is going to be used. The valid values of flags can be one of the following:

CUDLA_MODULE_DEFAULT, Default value which is 0.
CUDLA_MODULE_ENABLE_FAULT_DIAGNOSTICS, Application can specify this flag to load a module that is used for performing fault diagnostics for DLA HW. With this flag set, the pModule and moduleSize parameters shall be NULL and 0 as the diagnostics module is loaded internally.

Loads the module into the current device handle.

Multiple loadables cannot be loaded onto a single cuDLA device handle.
A loadable can be loaded only once in the lifecycle of a cuDLA device handle.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaModuleUnload ( const cudlaModule hModule, const uint32_t flags )

Unload a DLA module.

hModule: - Handle to the loaded module.
flags: - Reserved for future. Must be set to 0.

Unload the module from the device handle that it was loaded into. This API returns cudlaErrorInvalidDevice if the module is not loaded into a valid device.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaSetTaskTimeoutInMs ( const cudlaDevHandle devHandle, const uint32_t timeout )

Set task timeout in milliseconds.

devHandle: - A valid device handle.
timeout: - Task timeout value in ms.

Sets the task timeout in ms for each device handle. cuDLA uses a default timeout of 30 seconds if the user does not explicitly set one.

If the device handle is invalid, or the timeout is 0 or greater than 1000 seconds, this function returns cudlaErrorInvalidParam; otherwise it returns cudlaSuccess.
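The documented range can be checked on the application side before calling the API. This is a minimal sketch under stated assumptions: the DEMO_ macros and demo_timeout_is_valid are hypothetical names that merely restate the limits above (non-zero, at most 1000 seconds, 30 seconds by default); they are not cuDLA definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values restated from the text; the macro names are hypothetical. */
#define DEMO_DEFAULT_TIMEOUT_MS 30000u          /* 30 s default */
#define DEMO_MAX_TIMEOUT_MS     (1000u * 1000u) /* 1000 s upper bound */

/* Returns true when timeout_ms lies in the range the text describes
 * as accepted: greater than 0 and not greater than 1000 seconds. */
static bool demo_timeout_is_valid(uint32_t timeout_ms)
{
    return timeout_ms != 0u && timeout_ms <= DEMO_MAX_TIMEOUT_MS;
}
```

A pre-check like this only covers the timeout argument; an invalid device handle is still reported by the API itself.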

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaSubmitTask ( const cudlaDevHandle devHandle, const cudlaTask* ptrToTasks, const uint32_t numTasks, const void* stream, const uint32_t flags )

Submits the inference operation on DLA.

devHandle

- A valid cuDLA device handle.

ptrToTasks

- A list of inferencing tasks.

numTasks

- The number of tasks.

stream

- The stream on which the DLA task has to be submitted.

flags

- Applications can use this flag to control several aspects of the submission process. The valid values of flags can be one of the following (or an OR of these values):

0, default
CUDLA_SUBMIT_NOOP, specifies that the submitted task must be skipped during execution on the DLA. However, all the waitEvents and signalEvents dependencies must be satisfied. This flag is ignored when NULL data submissions are being done as in that case only the wait and signal events are internally stored for the next task submission.
CUDLA_SUBMIT_SKIP_LOCK_ACQUIRE, specifies that the submitted task is being enqueued in a device handle and that no other task is being enqueued in that device handle at that time in any other thread. This is a flag that apps can use as an optimization. Ordinarily, the cuDLA APIs acquire a global lock internally to guarantee thread safety. However, this lock causes unwanted serialization in cases where the applications are submitting tasks to different device handles. If an application submits one or more tasks from multiple threads, the submissions go to different device handles, and no shared data is provided as part of the task information in the respective submissions, then the application can specify this flag during submission so that the internal lock acquire is skipped. Shared data also includes the input stream in hybrid mode operation. Therefore, if the same stream is used to submit two different tasks, the usage of this flag is invalid even if the two device handles are different.
CUDLA_SUBMIT_DIAGNOSTICS_TASK, specifies that the submitted task is to run permanent fault diagnostics for DLA HW. Users can use this task to probe the state of the DLA HW. With this flag set, event-only submissions (where tensor information is NULL and only wait events, signal events, or both are present in the task) are not allowed in standalone mode, because the task always runs on an internally loaded diagnostic module. This diagnostic module does not expect any input tensors and therefore needs no input tensor memory; however, the user is expected to query the number of output tensors, allocate the output tensor memory, and pass it when submitting the task.

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorInvalidModule, cudlaErrorCuda, cudlaErrorUmd, cudlaErrorOutOfResources, cudlaErrorInvalidAddress, cudlaErrorUnsupportedOperation, cudlaErrorInvalidAttribute, cudlaErrorNvSci, cudlaErrorOs

This operation takes in a sequence of tasks and submits them to the DLA HW for execution in the same sequence as they appear in the input task array. The input and output tensors (and statistics buffer if used) are assumed to be pre-registered using cudlaMemRegister (in hybrid mode) or cudlaImportExternalMemory (in standalone mode). Failure to do so can result in this function returning cudlaErrorInvalidAddress.

The stream parameter must be specified as the CUDA stream on which the DLA task is submitted for execution in hybrid mode. In standalone mode, this parameter must be passed as NULL and failure to do so will result in this function returning cudlaErrorInvalidParam.

The cudlaTask structure has a provision to specify wait and signal events that cuDLA must wait on and signal respectively as part of cudlaSubmitTask(). Each submitted task will wait for all its wait events to be signaled before beginning execution and will provide a signal event (if one is requested for during cudlaSubmitTask) that the application (or any other entity) can wait on to ensure that the submitted task has completed execution. In cuDLA 1.0, only NvSciSync fences are supported as part of wait events. Furthermore, only NvSciSync objects (registered as part of cudlaImportExternalSemaphore) can be signaled as part of signal events and the fence corresponding to the signaled event is returned as part of cudlaSubmitTask.

In standalone mode, if inputTensor and outputTensor fields are set to NULL inside the cudlaTask structure, the task submission is interpreted as an enqueue of wait and signal events that must be considered for subsequent task submission. No actual task submission is done. Multiple such subsequent task submissions with NULL fields in the input/outputTensor fields will overwrite the list of wait and signal events to be considered. In other words, the latest non-null wait events and/or latest non-null signal events before a non-null submission are considered for subsequent actual task submission. During an actual task submit in standalone mode, the effective wait events and signal events that will be considered are what the application sets using NULL data submissions and what is set for that particular task submission in the waitEvents and signalEvents fields. The wait events set as part of NULL data submission are considered as dependencies for only the first task and the signal events set as part of NULL data submission are signaled when the last task of task list is complete. All constraints that apply to waitEvents and signalEvents individually (as described below) are also applicable to the combined list.
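The overwrite rule for NULL data submissions can be modeled as a small piece of pending state. The sketch below is an illustration of the paragraph above, not cuDLA code: DemoPendingEvents and demo_null_data_submit are hypothetical names, the event lists are represented as opaque pointers, and what happens to the pending state after an actual task submission is deliberately not modeled because the text does not specify it.

```c
#include <stddef.h>

/* Hypothetical model of the events remembered between submissions. */
typedef struct {
    const void *wait_events;   /* latest non-NULL wait events, if any */
    const void *signal_events; /* latest non-NULL signal events, if any */
} DemoPendingEvents;

/* Models one submission whose input/output tensor fields are NULL.
 * A non-NULL list replaces the one stored earlier; a NULL list leaves
 * the earlier one in place. */
static void demo_null_data_submit(DemoPendingEvents *pending,
                                  const void *wait_events,
                                  const void *signal_events)
{
    if (wait_events != NULL) {
        pending->wait_events = wait_events;
    }
    if (signal_events != NULL) {
        pending->signal_events = signal_events;
    }
}
```

In this model, two NULL data submissions where the second supplies only wait events leave the signal events of the first submission in effect for the next actual task.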

cuDLA supports 3 kinds of fences - preFence, SOF fence and EOF fence.

preFence is the type of fence that DLA waits on to start the task execution. Use cudlaFenceType as CUDLA_NVSCISYNC_FENCE to mark a fence as preFence.
SOF (Start Of Frame) fence is the type of fence which is signaled before the task execution on DLA starts. Use cudlaFenceType as CUDLA_NVSCISYNC_FENCE_SOF to mark a fence as SOF fence.
EOF (End Of Frame) fence is the type of fence which is signaled after the task execution on DLA is complete. Use cudlaFenceType as CUDLA_NVSCISYNC_FENCE to mark a fence as EOF fence.

For wait events, applications are expected to

register their synchronization objects using cudlaImportExternalSemaphore.
create the required number of preFence placeholders using CudlaFence.
fill in the placeholders with the relevant fences from the application.
list out all the fences in cudlaWaitEvents.

For signal events, applications are expected to

register their synchronization objects using cudlaImportExternalSemaphore.
create the required number of SOF and EOF fence placeholder fences using CudlaFence.
place the registered objects and the corresponding fences in cudlaSignalEvents. In the case of a deterministic semaphore, the fence is not required to be passed in cudlaSignalEvents.

When cudlaSubmitTask returns successfully, the fences present in cudlaSignalEvents can be used to wait for the particular task to be completed. cuDLA supports 1 sync point and any number of semaphores as part of cudlaSignalEvents. If more than 1 sync point is specified, cudlaErrorInvalidParam is returned.

cuDLA adheres to DLA's restriction to support 29 preFences and SOF fences combined together and 29 EOF fences per DLA Task.
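An application can verify these per-task limits before building its event lists. The following sketch restates the limits from the text; the DEMO_ macros and demo_fence_counts_ok are hypothetical names and are not provided by cuDLA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-task limits restated from the text; macro names are hypothetical. */
#define DEMO_MAX_PRE_PLUS_SOF_FENCES 29u
#define DEMO_MAX_EOF_FENCES          29u

/* Returns true when the fence counts for a single task stay within
 * the documented limits: preFences and SOF fences share one budget,
 * EOF fences have their own. The sum is widened to avoid overflow. */
static bool demo_fence_counts_ok(uint32_t num_pre, uint32_t num_sof,
                                 uint32_t num_eof)
{
    uint64_t pre_plus_sof = (uint64_t)num_pre + (uint64_t)num_sof;
    return pre_plus_sof <= DEMO_MAX_PRE_PLUS_SOF_FENCES
        && num_eof <= DEMO_MAX_EOF_FENCES;
}
```

For example, 10 preFences with 19 SOF fences and 29 EOF fences fit, while 15 preFences with 15 SOF fences do not.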

During submission, users have an option to enable layerwise statistics profiling for the individual layers of the network. This option needs to be exercised by specifying additional output buffers that would contain the profiling information. Specifically,

"cudlaTask::numOutputTensors" should be the sum of value returned by cudlaModuleGetAttributes(...,CUDLA_NUM_OUTPUT_TENSORS,...) and cudlaModuleGetAttributes(...,CUDLA_NUM_OUTPUT_TASK_STATISTICS,...)
"cudlaTask::outputTensor" should contain the array of output tensors appended with array of statistics output buffer.

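The layout of the combined output list described above can be sketched as follows. This is an illustration only: demo_build_output_list is a hypothetical helper, the pointer element type is a stand-in for the registered device pointers, and the two counts are assumed to have been obtained from cudlaModuleGetAttributes beforehand.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies the regular output tensor pointers followed by the statistics
 * buffer pointers into combined[]. Returns the total number of entries
 * (the value intended for numOutputTensors), or 0 if combined[] is too
 * small to hold them all. */
static size_t demo_build_output_list(uint64_t *const *outputs,
                                     size_t num_outputs,
                                     uint64_t *const *stats,
                                     size_t num_stats,
                                     uint64_t **combined,
                                     size_t capacity)
{
    size_t total = num_outputs + num_stats;
    size_t i;

    if (total > capacity) {
        return 0;
    }
    for (i = 0; i < num_outputs; ++i) {
        combined[i] = outputs[i];
    }
    for (i = 0; i < num_stats; ++i) {
        combined[num_outputs + i] = stats[i];
    }
    return total;
}
```

Keeping the statistics buffers strictly after the regular outputs matches the "appended" ordering that the text requires.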
This function can return cudlaErrorUnsupportedOperation if

stream being used in hybrid mode is in capturing state.
application attempts to use NvSci functionalities in hybrid mode.
loading of NvSci libraries failed for a particular platform.
fence type other than CUDLA_NVSCISYNC_FENCE is specified.
waitEvents or signalEvents is not NULL in hybrid mode.
inputTensor or outputTensor is NULL in hybrid mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor is NULL and outputTensor is not NULL and vice versa in standalone mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor and outputTensor is NULL and number of tasks is not equal to 1 in standalone mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor is not NULL or output tensor is NULL and the flags are CUDLA_SUBMIT_DIAGNOSTICS_TASK.
the effective signal events list has multiple sync points to signal.
the layerwise statistics feature is unsupported.
the per-task limit on preFences, SOF fences and EOF fences is not met.

This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

This function can return cudlaErrorOs if an internal system operation fails.

Note:

This API can return task execution errors from previous DLA task submissions.
