Showing content from https://run-ai-docs.nvidia.com/self-hosted/infrastructure-setup/advanced-setup/node-roles below:

Node Roles | Run:ai Documentation

Node Roles

This article explains how to designate specific node roles in a Kubernetes cluster to ensure optimal performance and reliability in production deployments.

For optimal performance in production clusters, it is essential to avoid extensive CPU usage on GPU nodes where possible. This can be done by ensuring the following:

NVIDIA Run:ai services are scheduled on the defined node roles by applying Kubernetes node affinity using node labels.

To perform these tasks, make sure to install the NVIDIA Run:ai Administrator CLI.

The following node roles can be configured on the cluster:

NVIDIA Run:ai system nodes run the system-level services that NVIDIA Run:ai requires to operate. You can assign this role using kubectl (recommended) or the NVIDIA Run:ai Administrator CLI.

By default, NVIDIA Run:ai applies a node affinity rule to prefer nodes that are labeled with node-role.kubernetes.io/runai-system for system services scheduling. You can modify the default node affinity rule by:

Note

To set a system role for a node in your Kubernetes cluster using kubectl, follow these steps:

  1. Use the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.

  2. Run one of the following commands to label the node with its role:

    kubectl label nodes <node-name> node-role.kubernetes.io/runai-system=true
    kubectl label nodes <node-name> node-role.kubernetes.io/runai-system=false
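
After labeling, it can be useful to confirm which nodes currently carry the system role. The following is a minimal sketch that uses a standard kubectl label selector (`-l`). The helper name `list_system_nodes_cmd` and the `EXECUTE` switch are illustrative assumptions, not part of NVIDIA Run:ai. By default the script only prints the command.

```shell
#!/usr/bin/env sh
# Label key as given in the documentation above.
RUNAI_SYSTEM_LABEL="node-role.kubernetes.io/runai-system"

# Hypothetical helper (not part of any NVIDIA Run:ai tooling): builds the
# kubectl command that lists nodes whose system-role label equals "true".
list_system_nodes_cmd() {
  printf 'kubectl get nodes -l %s=true\n' "$RUNAI_SYSTEM_LABEL"
}

# Print the command. It is executed only when EXECUTE=1 is set, so the
# sketch can be tried without cluster access.
cmd=$(list_system_nodes_cmd)
echo "$cmd"
if [ "${EXECUTE:-0}" = "1" ]; then
  $cmd
fi
```

Running the script with `EXECUTE=1` against a cluster should list only the nodes labeled `node-role.kubernetes.io/runai-system=true`.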

NVIDIA Run:ai Administrator CLI

Note

The NVIDIA Run:ai Administrator CLI only supports the default node affinity.

To set a system role for a node in your Kubernetes cluster, follow these steps:

  1. Run the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.

  2. Run one of the following commands to set or remove a node’s role:

    runai-adm set node-role --runai-system-worker <node-name>
    runai-adm remove node-role --runai-system-worker <node-name>

The set node-role command will label the node and set relevant cluster configurations.

NVIDIA Run:ai worker nodes run user-submitted workloads and the system-level DaemonSets that NVIDIA Run:ai requires to operate. You can manage this role using kubectl (recommended) or the NVIDIA Run:ai Administrator CLI.

By default, GPU workloads are scheduled on GPU nodes based on the nvidia.com/gpu.present label. When global.nodeAffinity.restrictScheduling is set to true via the Advanced cluster configurations:

To set a worker role for a node in your Kubernetes cluster using Kubectl, follow these steps:

  1. Verify that global.nodeAffinity.restrictScheduling is set to true in the cluster's configurations.

  2. Use the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.

  3. Run one of the following commands to label the node with its role. Replace the label and value (true/false) to enable or disable GPU/CPU roles as needed:

    kubectl label nodes <node-name> node-role.kubernetes.io/runai-gpu-worker=true
    kubectl label nodes <node-name> node-role.kubernetes.io/runai-cpu-worker=false
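
Because the worker label key and value vary with the role, a small wrapper can reduce typing mistakes. The sketch below is one possible approach, assuming a hypothetical helper named `worker_label_cmd` that only prints the resulting command. It appends kubectl's standard `--overwrite` flag, which kubectl requires when changing the value of a label that already exists on the node. The node name `node-1` is a placeholder.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of NVIDIA Run:ai): prints the kubectl command
# that sets a worker-role label on a node.
#   $1 = role: gpu or cpu
#   $2 = node name
#   $3 = label value: true or false (defaults to true)
worker_label_cmd() {
  role=$1
  node=$2
  value=${3:-true}

  case "$role" in
    gpu|cpu) ;;
    *) echo "role must be 'gpu' or 'cpu'" >&2; return 1 ;;
  esac
  case "$value" in
    true|false) ;;
    *) echo "value must be 'true' or 'false'" >&2; return 1 ;;
  esac
  if [ -z "$node" ]; then
    echo "node name is required" >&2
    return 1
  fi

  # --overwrite lets kubectl replace an existing value for the same label key.
  printf 'kubectl label nodes %s node-role.kubernetes.io/runai-%s-worker=%s --overwrite\n' \
    "$node" "$role" "$value"
}

# Example: print the command that marks node-1 as a GPU worker.
worker_label_cmd gpu node-1 true
```

Review the printed command before running it against the cluster.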

NVIDIA Run:ai Administrator CLI

To set a worker role for a node in your Kubernetes cluster using the NVIDIA Run:ai Administrator CLI, follow these steps:

  1. Use the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.

  2. Run one of the following commands to set or remove a node’s role. <node-role> must be either --gpu-worker or --cpu-worker:

    runai-adm set node-role <node-role> <node-name>
    runai-adm remove node-role <node-role> <node-name>

The set node-role command labels the node and sets the cluster configuration global.nodeAffinity.restrictScheduling to true.

Note

Use the --all flag to set a role on, or remove a role from, all nodes.

