Running multi-instance GPUs | Google Kubernetes Engine (GKE)

This page provides instructions on how to increase utilization and reduce costs by running multi-instance GPUs. With this configuration, you partition an NVIDIA A100, H100, H200, or B200 graphics processing unit (GPU) to share a single GPU across multiple containers on Google Kubernetes Engine (GKE).

Before reading this page, ensure that you're familiar with Kubernetes concepts such as Pods, nodes, deployments, and namespaces and are familiar with GKE concepts such as node pools, autoscaling, and auto-provisioning.

Introduction

Kubernetes allocates one full GPU per container even if the container needs only a fraction of the GPU for its workload, which might lead to wasted resources and cost overruns, especially if you are using the latest generation of powerful GPUs. To improve GPU utilization, multi-instance GPUs let you partition a single supported GPU into up to seven slices. Each slice can be allocated to one container on the node independently, for a maximum of seven containers per GPU. Multi-instance GPUs provide hardware isolation between the workloads, and consistent, predictable QoS for all containers running on the GPU.

For CUDA® applications, multi-instance GPUs are largely transparent. Each GPU partition appears as a regular GPU resource, and the programming model remains unchanged.

For more information on multi-instance GPUs, refer to the NVIDIA multi-instance GPU user guide.

Supported GPUs

The following GPU types support multi-instance GPUs:

  * NVIDIA A100 (40GB) (nvidia-tesla-a100)
  * NVIDIA A100 (80GB) (nvidia-a100-80gb)
  * NVIDIA H100 (80GB) (nvidia-h100-80gb and nvidia-h100-mega-80gb)
  * NVIDIA H200 (141GB) (nvidia-h200-141gb)
  * NVIDIA B200 (nvidia-b200)

Note: If you split NVIDIA H100, H200, or B200 GPUs with multi-instance GPUs, you can't use GPUDirect technologies (including TCPX, TCPXO, and RDMA).

Multi-instance GPU partitions

The A100, H100, H200, and B200 GPUs each consist of seven compute units and eight memory units, which you can partition into GPU instances of varying sizes. To configure GPU partition sizes, use the following syntax: [compute]g.[memory]gb. For example, a GPU partition size of 1g.5gb refers to a GPU instance with one compute unit (1/7th of the streaming multiprocessors on the GPU) and one memory unit (5 GB). You specify the partition size for the GPUs when you deploy an Autopilot workload or when you create a Standard cluster.

The partitioning table in the NVIDIA multi-instance GPU user guide lists all the different GPU partition sizes, along with the amount of compute and memory resources available on each GPU partition. The table also shows the number of GPU instances for each partition size that can be created on the GPU.

The following table lists the partition sizes that GKE supports:

GPU: NVIDIA A100 (40GB) (nvidia-tesla-a100)

  Partition size   GPU instances
  1g.5gb           7
  2g.10gb          3
  3g.20gb          2
  7g.40gb          1

GPU: NVIDIA A100 (80GB) (nvidia-a100-80gb)

  Partition size   GPU instances
  1g.10gb          7
  2g.20gb          3
  3g.40gb          2
  7g.80gb          1

GPU: NVIDIA H100 (80GB) (nvidia-h100-80gb and nvidia-h100-mega-80gb)

  Partition size   GPU instances
  1g.10gb          7
  1g.20gb          4
  2g.20gb          3
  3g.40gb          2
  7g.80gb          1

GPU: NVIDIA H200 (141GB) (nvidia-h200-141gb)

  Partition size   GPU instances
  1g.18gb          7
  1g.35gb          4
  2g.35gb          3
  3g.71gb          2
  4g.71gb          1
  7g.141gb         1

GPU: NVIDIA B200 (nvidia-b200)

  Partition size   GPU instances
  1g.23gb          7
  1g.45gb          4
  2g.45gb          3
  3g.90gb          2
  4g.90gb          1
  7g.180gb         1

Each GPU on each node within a node pool is partitioned the same way. For example, consider a node pool with two nodes, four GPUs on each node, and a partition size of 1g.5gb. GKE creates seven partitions of size 1g.5gb on each GPU. Since there are four GPUs on each node, there are 28 1g.5gb GPU partitions available on each node. Since there are two nodes in the node pool, a total of 56 1g.5gb GPU partitions are available in the entire node pool.

To create a GKE Standard cluster with more than one type of GPU partition, you must create multiple node pools. For example, if you want nodes with 1g.5gb and 3g.20gb GPU partitions in a cluster, you must create two node pools: one with the GPU partition size set to 1g.5gb, and the other with 3g.20gb.

A GKE Autopilot cluster automatically creates nodes with the correct partition configuration when you create workloads that require different partition sizes.

Each node is labeled with the size of GPU partitions that are available on the node. This labeling allows workloads to target nodes with the needed GPU partition size. For example, on a node with 1g.5gb GPU instances, the node is labeled as:

cloud.google.com/gke-gpu-partition-size=1g.5gb
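
For example, to list the nodes in the cluster that expose 1g.5gb GPU partitions, you can filter on this label with a standard kubectl label selector:

kubectl get nodes -l cloud.google.com/gke-gpu-partition-size=1g.5gb
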
How it works

To use multi-instance GPUs, you perform the following tasks:

  1. Create a cluster with multi-instance GPUs enabled.
  2. Manually install drivers.
  3. Verify how many GPU resources are on the node.
  4. Deploy containers using multi-instance GPUs.
Pricing

Multi-instance GPUs are exclusive to A100 GPUs, H100 GPUs, H200 GPUs, and B200 GPUs, and are subject to the corresponding GPU pricing in addition to any other products used to run your workloads. You can only attach whole GPUs to nodes in your cluster for partitioning. For GPU pricing information, refer to the GPUs pricing page.

Limitations

Before you begin

Before you start, make sure that you have performed the following tasks:

Create a cluster with multi-instance GPUs enabled

If you use GKE Standard, you must enable multi-instance GPUs in the cluster. Autopilot clusters that run version 1.29.3-gke.1093000 and later enable multi-instance GPUs by default. To use multi-instance GPUs in Autopilot, see the Deploy containers using multi-instance GPU section of this page.

When you create a Standard cluster with multi-instance GPUs, you must specify gpuPartitionSize along with acceleratorType and acceleratorCount. The acceleratorType must be nvidia-tesla-a100, nvidia-a100-80gb, nvidia-h100-80gb, nvidia-h200-141gb, or nvidia-b200.

The following example shows how to create a GKE cluster with one node, and seven GPU partitions of size 1g.5gb on the node. The other steps in this page use a GPU partition size of 1g.5gb, which creates seven partitions on each GPU. You can also use any of the supported GPU partition sizes mentioned earlier.

You can use the Google Cloud CLI or Terraform.

gcloud

Create a cluster with multi-instance GPUs enabled:

gcloud container clusters create CLUSTER_NAME  \
    --project=PROJECT_ID  \
    --location CONTROL_PLANE_LOCATION  \
    --cluster-version=CLUSTER_VERSION  \
    --accelerator type=nvidia-tesla-a100,count=1,gpu-partition-size=1g.5gb,gpu-driver-version=DRIVER_VERSION  \
    --machine-type=a2-highgpu-1g  \
    --num-nodes=1

Replace the following:

  * CLUSTER_NAME: the name of the new cluster.
  * PROJECT_ID: your Google Cloud project ID.
  * CONTROL_PLANE_LOCATION: the Compute Engine region or zone of the cluster's control plane.
  * CLUSTER_VERSION: a GKE version that supports multi-instance GPUs.
  * DRIVER_VERSION: the NVIDIA driver version to install. The value can be default, latest, or disabled. If you choose disabled, you must manually install drivers as described in the Install drivers section of this page.

Terraform

To create a cluster with multi-instance GPUs enabled using Terraform, refer to the following example:
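
As a minimal sketch only (it assumes the Terraform google provider's google_container_cluster and google_container_node_pool resources, and that the guest_accelerator block in node_config accepts a gpu_partition_size field; resource names and placeholder values are illustrative), a node pool with 1g.5gb partitions on A100 GPUs might look like the following:

resource "google_container_cluster" "default" {
  name     = "CLUSTER_NAME"
  location = "CONTROL_PLANE_LOCATION"
  project  = "PROJECT_ID"

  # The GPU node pool is managed separately below.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "mig_gpu_pool" {
  name       = "mig-gpu-pool"
  cluster    = google_container_cluster.default.name
  location   = google_container_cluster.default.location
  node_count = 1

  node_config {
    machine_type = "a2-highgpu-1g"

    # One A100 GPU per node, partitioned into seven 1g.5gb instances.
    guest_accelerator {
      type               = "nvidia-tesla-a100"
      count              = 1
      gpu_partition_size = "1g.5gb"
    }
  }
}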

To learn more about using Terraform, see Terraform support for GKE.

Connect to the cluster

Configure kubectl to connect to the newly created cluster:

gcloud container clusters get-credentials CLUSTER_NAME
Install drivers

If you chose to disable automatic driver installation when creating the cluster, or if you're running a GKE version earlier than 1.27.2-gke.1200, you must manually install a compatible NVIDIA driver after creation completes. Multi-instance GPUs require an NVIDIA driver version 450.80.02 or later.

After the driver is installed, multi-instance GPU mode is enabled. If you automatically installed drivers, your nodes reboot when the GPU device plugin starts to create GPU partitions. If you manually installed drivers, your nodes reboot when driver installation completes. The reboot might take a few minutes to complete.
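
For example, on Container-Optimized OS nodes, manual installation means applying the NVIDIA driver installer DaemonSet that Google publishes; see the GKE documentation on manually installing NVIDIA drivers for the manifest that matches your node image and GKE version:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
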

Verify how many GPU resources are on the node

Run the following command to verify that the node's capacity and allocatable count of nvidia.com/gpu resources are both 7:

kubectl describe nodes

Here's the output from the command:

...
Capacity:
  ...
  nvidia.com/gpu:             7
Allocatable:
  ...
  nvidia.com/gpu:             7
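
For a per-node summary instead of the full node description, you can also query the allocatable GPU count directly:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
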
Deploy containers using multi-instance GPU

You can deploy up to one container per multi-instance GPU device on the node. In this example, with a partition size of 1g.5gb, there are seven multi-instance GPU partitions available on the node. As a result, you can deploy up to seven containers that request GPUs on this node.

Caution: If your GKE nodes include both multi-instance GPU nodes and non-multi-instance GPU nodes, use node anti-affinity on your Pods that don't need multi-instance GPUs. Node anti-affinity helps ensure that these Pods don't accidentally land on nodes that have multi-instance GPUs.
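
Kubernetes expresses this kind of node anti-affinity as a required node affinity rule with the DoesNotExist operator. A minimal sketch of the relevant Pod spec fragment, keyed on the partition-size label described earlier:

  # Pod spec fragment: schedule only onto nodes that do NOT expose GPU partitions.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-gpu-partition-size
            operator: DoesNotExist
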
  1. Here's an example that starts the cuda:11.0.3-base-ubi7 container and runs nvidia-smi to print the UUID of the GPU within the container. In this example, there are seven containers, and each container receives one GPU partition. This example also sets the cloud.google.com/gke-gpu-partition-size node selector to target nodes with 1g.5gb GPU partitions.

    Autopilot
    kubectl apply -f -  <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cuda-simple
    spec:
      replicas: 7
      selector:
        matchLabels:
          app: cuda-simple
      template:
        metadata:
          labels:
            app: cuda-simple
        spec:
          nodeSelector:
            cloud.google.com/gke-gpu-partition-size: 1g.5gb
            cloud.google.com/gke-accelerator: nvidia-tesla-a100
            cloud.google.com/gke-accelerator-count: "1"
          containers:
          - name: cuda-simple
            image: nvidia/cuda:11.0.3-base-ubi7
            command:
            - bash
            - -c
            - |
              /usr/local/nvidia/bin/nvidia-smi -L; sleep 300
            resources:
              limits:
                nvidia.com/gpu: 1
    EOF
          

    This manifest does the following:

      * Creates a Deployment named cuda-simple with seven replicas, one for each 1g.5gb GPU partition.
      * Sets the cloud.google.com/gke-gpu-partition-size, cloud.google.com/gke-accelerator, and cloud.google.com/gke-accelerator-count node selectors so that Autopilot provisions nodes with one A100 GPU partitioned into 1g.5gb instances.
      * Requests one GPU resource (nvidia.com/gpu: 1) per container, so each container receives a single GPU partition.
      * Runs nvidia-smi -L to print the UUID of the GPU partition, then sleeps for 300 seconds.

    Standard
    kubectl apply -f -  <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cuda-simple
    spec:
      replicas: 7
      selector:
        matchLabels:
          app: cuda-simple
      template:
        metadata:
          labels:
            app: cuda-simple
        spec:
          nodeSelector:
            cloud.google.com/gke-gpu-partition-size: 1g.5gb
          containers:
          - name: cuda-simple
            image: nvidia/cuda:11.0.3-base-ubi7
            command:
            - bash
            - -c
            - |
              /usr/local/nvidia/bin/nvidia-smi -L; sleep 300
            resources:
              limits:
                nvidia.com/gpu: 1
    EOF
          

    This manifest does the following:

      * Creates a Deployment named cuda-simple with seven replicas, one for each 1g.5gb GPU partition.
      * Sets the cloud.google.com/gke-gpu-partition-size node selector to target nodes with 1g.5gb GPU partitions.
      * Requests one GPU resource (nvidia.com/gpu: 1) per container, so each container receives a single GPU partition.
      * Runs nvidia-smi -L to print the UUID of the GPU partition, then sleeps for 300 seconds.

  2. Verify that all seven Pods are running:

    kubectl get pods
    

    Here's the output from the command:

    NAME                           READY   STATUS    RESTARTS   AGE
    cuda-simple-849c47f6f6-4twr2   1/1     Running   0          7s
    cuda-simple-849c47f6f6-8cjrb   1/1     Running   0          7s
    cuda-simple-849c47f6f6-cfp2s   1/1     Running   0          7s
    cuda-simple-849c47f6f6-dts6g   1/1     Running   0          7s
    cuda-simple-849c47f6f6-fk2bs   1/1     Running   0          7s
    cuda-simple-849c47f6f6-kcv52   1/1     Running   0          7s
    cuda-simple-849c47f6f6-pjljc   1/1     Running   0          7s
    
  3. View the logs to see the GPU UUID, using the name of any Pod from the previous command:

    kubectl logs cuda-simple-849c47f6f6-4twr2
    

    Here's the output from the command:

    GPU 0: A100-SXM4-40GB (UUID: GPU-45eafa61-be49-c331-f8a2-282736687ab1)
      MIG 1g.5gb Device 0: (UUID: MIG-GPU-45eafa61-be49-c331-f8a2-282736687ab1/11/0)
    
What's next
