Advanced cluster configurations can be used to tailor your NVIDIA Run:ai cluster deployment to meet specific operational requirements and optimize resource management. By fine-tuning these settings, you can enhance functionality, ensure compatibility with organizational policies, and achieve better control over your cluster environment. This article provides guidance on implementing and managing these configurations to adapt the NVIDIA Run:ai cluster to your unique needs.
After the NVIDIA Run:ai cluster is installed, you can adjust various settings to better align with your organization's operational needs and security requirements.
Modify Cluster Configurations
Advanced cluster configurations in NVIDIA Run:ai are managed through the runaiconfig Kubernetes Custom Resource. To edit the cluster configurations, run:
kubectl edit runaiconfig runai -n runai
To see the full runaiconfig object structure, use:
kubectl get crds/runaiconfigs.run.ai -n runai -o yaml
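As an alternative to interactive editing, a configuration change can also be applied non-interactively with kubectl patch. The following is an illustrative sketch that sets the spec.project-controller.createNamespaces flag described below; adjust the field and value to your needs:
kubectl patch runaiconfig runai -n runai --type merge -p '{"spec": {"project-controller": {"createNamespaces": false}}}'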
The following configurations allow you to enable or disable features, control permissions, and customize the behavior of your NVIDIA Run:ai cluster:
spec.global.affinity (object)
Sets the system nodes where NVIDIA Run:ai system-level services are scheduled. Using global.affinity will overwrite the node roles set using the Administrator CLI (runai-adm). Default: prefer to schedule on nodes that are labeled with node-role.kubernetes.io/runai-system.
spec.global.nodeAffinity.restrictScheduling (boolean)
Enables setting node roles and restricting workload scheduling to designated nodes. Default: false
spec.global.tolerations (object)
Configures Kubernetes tolerations for NVIDIA Run:ai system-level services.
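For illustration, a minimal sketch that restricts scheduling to designated nodes and adds a toleration for the system-level services. The taint key is illustrative, and the tolerations field is assumed to follow the standard Kubernetes tolerations schema:
spec:
  global:
    nodeAffinity:
      restrictScheduling: true
    tolerations:                                    # assumed: standard Kubernetes tolerations schema
      - key: node-role.kubernetes.io/runai-system   # illustrative taint key
        operator: Exists
        effect: NoSchedule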
spec.global.subdomainSupport (boolean)
spec.global.devicePluginBindings (boolean)
Instructs NVIDIA Run:ai fractions to use the device plugin for host mounts instead of an explicit host-path mount configuration on the pod. See GPU fractions and dynamic GPU fractions. Default: false
spec.global.enableWorkloadOwnershipProtection (boolean)
Prevents users within the same project from deleting workloads created by others. This enhances workload ownership security and ensures better collaboration by restricting unauthorized modifications or deletions. Default: false
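As an illustration, enabling workload ownership protection together with the device plugin bindings described above is a matter of setting the corresponding flags (values shown are examples):
spec:
  global:
    devicePluginBindings: true
    enableWorkloadOwnershipProtection: true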
spec.project-controller.createNamespaces (boolean)
Allows Kubernetes namespace creation for new projects. Default: true
spec.project-controller.createRoleBindings (boolean)
Specifies if role bindings should be created in the project's namespace. Default: true
spec.project-controller.limitRange (boolean)
Specifies if limit ranges should be defined for projects. Default: true
spec.project-controller.clusterWideSecret (boolean)
Allows Kubernetes Secrets creation at the cluster scope. See Credentials for more details. Default: true
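For example, an organization that manages namespaces and role bindings itself could disable the corresponding controller behavior (values shown are illustrative):
spec:
  project-controller:
    createNamespaces: false
    createRoleBindings: false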
spec.workload-controller.additionalPodLabels (object)
Sets workload pod labels as key/value pairs. These labels are applied to all pods.
spec.workload-controller.failureResourceCleanupPolicy
Determines which of a failed workload's resources NVIDIA Run:ai cleans up:
All - Removes all resources of the failed workload
None - Retains all resources
KeepFailing - Removes all resources except those that encountered issues (primarily for debugging purposes)
Default: All
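A short sketch of both workload-controller settings above (the label key and value are illustrative):
spec:
  workload-controller:
    additionalPodLabels:
      team: ml-research          # illustrative label applied to all workload pods
    failureResourceCleanupPolicy: KeepFailing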
spec.workload-controller.GPUNetworkAccelerationEnabled
spec.mps-server.enabled (boolean)
spec.daemonSetsTolerations (object)
Configures Kubernetes tolerations for NVIDIA Run:ai daemonSets / engine.
spec.runai-container-toolkit.logLevel (string)
Specifies the NVIDIA Run:ai container toolkit logging level: either 'SPAM', 'DEBUG', 'INFO', 'NOTICE', 'WARN', or 'ERROR'. Default: INFO
spec.runai-container-toolkit.enabled (boolean)
spec.node-scale-adjuster.args.gpuMemoryToFractionRatio (object)
A scaling pod requesting a single GPU device will be created for every 1 to 10 pods requesting fractional GPU memory (1/gpuMemoryToFractionRatio). This value represents the ratio (0.1-0.9) used to convert fractional GPU memory (of any size) to a GPU fraction (portion). Default: 0.1
spec.global.core.dynamicFractions.enabled (boolean)
spec.global.core.swap.enabled (boolean)
spec.global.core.swap.limits.cpuRam (string)
Sets the CPU memory size used to swap GPU workloads. Default: 100Gi
spec.global.core.swap.limits.reservedGpuRam (string)
Sets the reserved GPU memory size used to swap GPU workloads. Default: 2Gi
spec.global.core.swap.biDirectional (boolean)
Sets the read/write memory mode of GPU memory swap to bi-directional (full duplex). This produces higher performance (typically +80%) than uni-directional (simplex) read/write operations. For more details, see GPU memory swap. Default: false
spec.global.core.swap.mode (string)
Sets the GPU to CPU memory swap method to use UVA and optimized memory prefetch for improved performance in some scenarios. For more details, see GPU memory swap. Default: none; the parameter is not set by default. To add this parameter, set mode=mapped.
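For example, a sketch that enables GPU memory swap with the limits and mode described above (the sizes shown are illustrative):
spec:
  global:
    core:
      swap:
        enabled: true
        biDirectional: true
        mode: mapped
        limits:
          cpuRam: 100Gi          # CPU memory used to swap GPU workloads
          reservedGpuRam: 2Gi    # reserved GPU memory used for swapping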
spec.global.core.nodeScheduler.enabled (boolean)
spec.global.core.timeSlicing.mode (string)
Sets the GPU time-slicing mode. Possible values:
timesharing - All pods on a GPU share the GPU compute time evenly.
strict - Each pod gets an exact time slice according to its memory fraction value.
fair - Each pod gets an exact time slice according to its memory fraction value, and any unused GPU compute time is split evenly between the running pods.
Default: timesharing
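A minimal sketch for switching the time-slicing mode, using the values listed above:
spec:
  global:
    core:
      timeSlicing:
        mode: fair     # timesharing | strict | fair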
spec.runai-scheduler.args.fullHierarchyFairness (boolean)
Enables fairness between departments, on top of fairness between projects. Default: true
spec.runai-scheduler.args.defaultStalenessGracePeriod
Sets the timeout in seconds before the scheduler evicts a stale pod-group (gang) that went below its min-members in running state:
0s - Immediately (no timeout)
Default: 60s
spec.pod-grouper.args.gangSchedulingKnative (boolean)
Enables gang scheduling for inference workloads. For backward compatibility with versions earlier than v2.19, change the value to false. Default: true
spec.pod-grouper.args.gangScheduleArgoWorkflow (boolean)
Groups all pods of a single ArgoWorkflow workload into a single pod-group for gang scheduling. Default: true
spec.runai-scheduler.args.verbosity (int)
Configures the level of detail in the logs generated by the scheduler service. Default: 4
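A combined sketch of the scheduler and pod-grouper arguments above (values shown are examples):
spec:
  runai-scheduler:
    args:
      fullHierarchyFairness: true
      defaultStalenessGracePeriod: 60s
      verbosity: 4
  pod-grouper:
    args:
      gangScheduleArgoWorkflow: true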
spec.limitRange.cpuDefaultRequestCpuLimitFactorNoGpu (string)
Sets a default ratio between the CPU request and the limit for workloads without GPU requests. Default: 0.1
spec.limitRange.memoryDefaultRequestMemoryLimitFactorNoGpu (string)
Sets a default ratio between the memory request and the limit for workloads without GPU requests. Default: 0.1
spec.limitRange.cpuDefaultRequestGpuFactor (string)
Sets a default amount of CPU allocated per GPU when the CPU is not specified. Default: 100
spec.limitRange.cpuDefaultLimitGpuFactor (int)
Sets a default CPU limit based on the number of GPUs requested when no CPU limit is specified. Default: NO DEFAULT
spec.limitRange.memoryDefaultRequestGpuFactor (string)
Sets a default amount of memory allocated per GPU when the memory is not specified. Default: 100Mi
spec.limitRange.memoryDefaultLimitGpuFactor (string)
Sets a default memory limit based on the number of GPUs requested when no memory limit is specified. Default: NO DEFAULT
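For example, a sketch that adjusts the CPU request-to-limit ratio and the per-GPU memory request factor (the values shown are illustrative and should be adapted to your workloads):
spec:
  limitRange:
    cpuDefaultRequestCpuLimitFactorNoGpu: "0.2"
    memoryDefaultRequestGpuFactor: 200Mi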
The NVIDIA Run:ai cluster includes many different services. To simplify resource management, the configuration structure allows you to configure the container CPU/memory resources for each service individually or for a group of services together.
schedulingServices - Containers associated with the NVIDIA Run:ai Scheduler: Scheduler, StatusUpdater, MetricsExporter, PodGrouper, PodGroupAssigner, Binder
syncServices - Containers associated with syncing updates between the NVIDIA Run:ai cluster and the NVIDIA Run:ai control plane: Agent, ClusterSync, AssetsSync
workloadServices - Containers associated with submitting NVIDIA Run:ai workloads: WorkloadController, JobController
Apply the following configuration to change the resource requests and limits for a group of services:
spec:
global:
<service-group-name>: # schedulingServices | syncServices | workloadServices
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 100m
memory: 512Mi
Or, apply the following configuration to change the resource requests and limits for each service individually:
spec:
<service-name>: # for example: pod-grouper
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 100m
memory: 512Mi
For resource recommendations, see Vertical scaling.
NVIDIA Run:ai Services Replicas
By default, all NVIDIA Run:ai containers are deployed with a single replica. Some services support multiple replicas for redundancy and performance.
To simplify configuring replicas, a global replicas configuration can be set and is applied to all supported services:
spec:
global:
replicaCount: 1 # default
This can be overwritten for specific services (if supported). Services without the replicas configuration do not support replicas:
spec:
<service-name>: # for example: pod-grouper
replicas: 1 # default
Prometheus
The Prometheus instance in NVIDIA Run:ai is used for metrics collection and alerting.
The configuration scheme follows the official PrometheusSpec and supports additional custom configurations. The PrometheusSpec schema is available using the spec.prometheus.spec configuration.
A common use case for the PrometheusSpec is metrics retention. Configuring local temporary metrics retention prevents metrics loss during potential connectivity issues. For more information, see Prometheus Storage:
spec:
prometheus:
spec: # PrometheusSpec
retention: 2h # default
retentionSize: 20GB
In addition to the PrometheusSpec schema, some custom NVIDIA Run:ai configurations are also available:
Additional labels – Set additional labels for NVIDIA Run:ai's built-in alerts sent by Prometheus.
Log level configuration – Configure the logLevel setting for the Prometheus container.
spec:
prometheus:
logLevel: info # debug | info | warn | error
additionalAlertLabels:
- env: prod # example
NVIDIA Run:ai Managed Nodes
To include or exclude specific nodes from running workloads within a cluster managed by NVIDIA Run:ai, use the nodeSelectorTerms flag. For additional details, see Kubernetes nodeSelector.
Each node selector term uses the following fields:
key: Label key (e.g., zone, instance-type).
operator: Operator defining the inclusion/exclusion condition (In, NotIn, Exists, DoesNotExist).
values: List of values for the key when using In or NotIn.
The example below shows how to include only nodes with NVIDIA GPUs and exclude all other GPU types in a cluster with mixed nodes, based on the GPU product label:
spec:
global:
managedNodes:
inclusionCriteria:
nodeSelectorTerms:
- matchExpressions:
- key: nvidia.com/gpu.product
operator: Exists
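Similarly, a sketch that keeps NVIDIA Run:ai off nodes in a particular zone by using the NotIn operator; the label key and value are illustrative and must match the labels actually applied to your nodes:
spec:
  global:
    managedNodes:
      inclusionCriteria:
        nodeSelectorTerms:
          - matchExpressions:
              - key: zone              # illustrative label key
                operator: NotIn
                values:
                  - us-east-1a         # illustrative excluded zone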
S3 and Git Sidecar Images
For air-gapped environments, when working with a Local Certificate Authority, you must replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
spec:
workload-controller:
s3FileSystemImage:
name: goofys
registry: runai.jfrog.io/op-containers-prod
tag: 3.12.24
gitSyncImage:
name: git-sync
registry: registry.k8s.io
tag: v4.4.0