Resource requests in Autopilot | Google Kubernetes Engine (GKE)

This page describes how Google Kubernetes Engine (GKE) Autopilot manages the values of workload resource requests, such as CPU, memory, or ephemeral storage. You can use this information to plan efficient, stable, and cost-effective workloads.

This page is for Operators and Developers who provision and configure cloud resources, and deploy workloads. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Before reading this page, ensure that you're familiar with Kubernetes resource management concepts.

Overview of resource requests in Autopilot

Autopilot uses the resource requests that you specify in your workload configuration to configure the nodes that run your workloads. Autopilot enforces minimum and maximum resource requests based on the compute class or the hardware configuration that your workloads use. If you don't specify requests for some containers, Autopilot assigns default values to let those containers run correctly.

When you deploy a workload in an Autopilot cluster, GKE validates the workload configuration against the allowed minimum and maximum values for the selected compute class or hardware configuration (such as GPUs). If your requests are less than the minimum, Autopilot automatically modifies your workload configuration to bring your requests within the allowed range. If your requests are greater than the maximum, Autopilot rejects your workload and displays an error message.

The following list summarizes the categories of resource requests that this page covers:

- Default resource requests, which Autopilot applies when you don't specify values
- Minimum and maximum resource requests, which Autopilot enforces for each compute class or hardware configuration
- Resource requests for workload separation and extended duration
- Resource limits, and how they interact with requests

How to request resources

In Autopilot, you request resources in your Pod specification. The supported minimum and maximum resources that you can request change based on the hardware configuration of the node on which the Pods run. To learn how to request specific hardware configurations, refer to the GKE documentation pages for those configurations, such as the pages for compute classes, GPUs, and TPUs.
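For example, the following minimal Pod manifest sets requests for CPU, memory, and ephemeral storage. This is an illustrative sketch, not from the original page; the Pod, container, and image names are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod        # placeholder name
spec:
  containers:
  - name: app              # placeholder name
    image: nginx           # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
        ephemeral-storage: "1Gi"
      limits:
        cpu: "500m"
        memory: "2Gi"
        ephemeral-storage: "1Gi"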

Default resource requests

If you don't specify resource requests for some containers in a Pod, Autopilot applies default values. These defaults are suitable for many smaller workloads.

Note: We recommend that you explicitly set resource requests for each container to meet your application requirements, because these default values might not be sufficient or optimal.
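As an illustration (a sketch, not from the original page; the names are placeholders), a container that omits the resources block entirely receives the defaults for its compute class:

containers:
- name: app      # placeholder name
  image: nginx   # placeholder image
  # No resources block: Autopilot applies the default requests for the
  # Pod's compute class, for example 0.5 vCPU and 2 GiB of memory on
  # the general-purpose class (see the following tables).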

Additionally, Autopilot applies some default resource requests regardless of the selected compute class or hardware configuration.

For more information about Autopilot cluster limits, see Quotas and limits.

Default requests for compute classes

Autopilot applies the following default values to resources that are not defined in the Pod specification for Pods that run on compute classes. If you set only one of the requests and leave the other blank, GKE uses the CPU:memory ratio defined in the Minimum and maximum resource requests section to set the missing request to a value that complies with the ratio.

| Compute class | Resource | Default request |
| --- | --- | --- |
| General-purpose (default) | CPU | 0.5 vCPU |
| General-purpose (default) | Memory | 2 GiB |
| Accelerator | | See the Default requests for accelerators section. |
| Balanced | CPU | 0.5 vCPU |
| Balanced | Memory | 2 GiB |
| Performance | CPU | |
| Performance | Memory | |
| Performance | Ephemeral storage | |
| Scale-Out | CPU | 0.5 vCPU |
| Scale-Out | Memory | 2 GiB |
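For example (an illustrative sketch, not from the original page), if a container on the general-purpose class sets only a CPU request, GKE fills in a memory request that complies with the class ratio:

resources:
  requests:
    cpu: "1"
    # memory omitted: GKE sets a memory request that satisfies the
    # class's CPU:memory ratio, for example at least 1 GiB for 1 vCPU
    # at the 1:1 minimum ratio.

Default requests for accelerators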

In version 1.29.4-gke.1427000 and later, Autopilot doesn't enforce default requests for accelerators. To learn more, see Pricing.

The following table describes the default values that GKE assigns to Pods that don't specify values in the requests field of the Pod specification. This table applies to Pods that run on versions earlier than 1.29.4-gke.1427000 that use the Accelerator compute class, which is the recommended way to run accelerators in Autopilot clusters.

| Accelerator | Resource | Total default request |
| --- | --- | --- |
| NVIDIA B200 (nvidia-b200) | | No default requests enforced. |
| NVIDIA H200 (141GB) (nvidia-h200-141gb) | | No default requests enforced. |
| NVIDIA H100 Mega (80GB) (nvidia-h100-mega-80gb) | CPU | |
| NVIDIA H100 Mega (80GB) (nvidia-h100-mega-80gb) | Memory | |
| NVIDIA H100 Mega (80GB) (nvidia-h100-mega-80gb) | Ephemeral storage | |
| NVIDIA H100 (80GB) (nvidia-h100-80gb) | CPU | |
| NVIDIA H100 (80GB) (nvidia-h100-80gb) | Memory | |
| NVIDIA H100 (80GB) (nvidia-h100-80gb) | Ephemeral storage | |
| NVIDIA A100 (40GB) (nvidia-tesla-a100) | CPU | |
| NVIDIA A100 (40GB) (nvidia-tesla-a100) | Memory | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | CPU | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Memory | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Ephemeral storage | |
| NVIDIA L4 (nvidia-l4) | CPU | |
| NVIDIA L4 (nvidia-l4) | Memory | |
| NVIDIA T4 (nvidia-tesla-t4) | CPU | |
| NVIDIA T4 (nvidia-tesla-t4) | Memory | |
| TPU Trillium (v6e) (tpu-v6e-slice, single-host) | CPU | All topologies: 1 mCPU |
| TPU Trillium (v6e) (tpu-v6e-slice, single-host) | Memory | All topologies: 1 MiB |
| TPU Trillium (v6e) (tpu-v6e-slice, multi-host) | CPU | All topologies: 1 mCPU |
| TPU Trillium (v6e) (tpu-v6e-slice, multi-host) | Memory | All topologies: 1 MiB |
| TPU v5e (tpu-v5-lite-podslice, multi-host) | CPU | All topologies: 1 mCPU |
| TPU v5e (tpu-v5-lite-podslice, multi-host) | Memory | All topologies: 1 MiB |
| TPU v5p (tpu-v5p-slice) | CPU | All topologies: 1 mCPU |
| TPU v5p (tpu-v5p-slice) | Memory | All topologies: 1 MiB |
| TPU v4 (tpu-v4-podslice) | CPU | All topologies: 1 mCPU |
| TPU v4 (tpu-v4-podslice) | Memory | All topologies: 1 MiB |

Supported GPUs without the Accelerator compute class

If you don't use the Accelerator compute class, only the following GPUs are supported: NVIDIA A100 (40GB), NVIDIA A100 (80GB), NVIDIA L4, and NVIDIA Tesla T4. The default resource requests for these GPUs are the same as in the Accelerator compute class.

Minimum and maximum resource requests

The total resources that your deployment configuration requests must be within the supported minimum and maximum values that Autopilot allows for the selected compute class or hardware configuration.

Minimums and maximums for compute classes

The following table describes the minimum, maximum, and allowed CPU-to-memory ratio for each compute class that Autopilot supports:

| Compute class | CPU:memory ratio (vCPU:GiB) | Resource | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| General-purpose (default) | Between 1:1 and 1:6.5 | CPU | Depends on whether your cluster supports bursting; see Bursting availability in GKE. | 30 vCPU |
| General-purpose (default) | Between 1:1 and 1:6.5 | Memory | Depends on whether your cluster supports bursting; see Bursting availability in GKE. | 110 GiB |
| Accelerator | | | See Minimums and maximums for accelerators. | |
| Balanced | Between 1:1 and 1:8 | CPU | 0.25 vCPU | 222 vCPU (lower if a minimum CPU platform is selected) |
| Balanced | Between 1:1 and 1:8 | Memory | 0.5 GiB | 851 GiB (lower if a minimum CPU platform is selected) |
| Performance | N/A | CPU | 0.001 vCPU | |
| Performance | N/A | Memory | 1 MiB | |
| Performance | N/A | Ephemeral storage | 10 MiB | 56 TiB in GKE version 1.29.3-gke.1038000 and later |
| Scale-Out | Exactly 1:4 | CPU | 0.25 vCPU | |
| Scale-Out | Exactly 1:4 | Memory | 1 GiB | |

The C4D machine series is available with version 1.33.0-gke.1439000 or later and supports ephemeral storage requests of up to 56 TiB, with or without Local SSD. Lower ephemeral storage limits apply to versions earlier than 1.29.3-gke.1038000.

To learn how to request compute classes in your Autopilot Pods, refer to Choose compute classes for Autopilot Pods.
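As a hedged sketch of how a compute class is selected (the Pod, container, and image names are placeholders), an Autopilot Pod requests a compute class through the cloud.google.com/compute-class node selector:

apiVersion: v1
kind: Pod
metadata:
  name: balanced-pod       # placeholder name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Balanced"
  containers:
  - name: app              # placeholder name
    image: nginx           # placeholder image
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"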

Minimums and maximums for accelerators

The following sections describe the minimum, maximum, and allowed CPU-to-memory ratio for Pods that use hardware accelerators like GPUs and TPUs.

Unless specified, the maximum ephemeral storage supported is 122 GiB in versions 1.28.6-gke.1369000 or later, and 1.29.1-gke.1575000 or later. For earlier versions, the maximum ephemeral storage supported is 10 GiB.

Minimums and maximums for the Accelerator compute class

The following table shows the minimum and maximum resource requests for Pods that use the Accelerator compute class, which is the recommended way to run accelerators with GKE Autopilot clusters. In the Accelerator compute class, GKE doesn't enforce CPU-to-memory request ratios.

| Accelerator type | Resource | Minimum | Maximum |
| --- | --- | --- | --- |
| NVIDIA B200 (nvidia-b200) | CPU | No minimum requests enforced | |
| NVIDIA B200 (nvidia-b200) | Memory | No minimum requests enforced | |
| NVIDIA B200 (nvidia-b200) | Ephemeral storage | No minimum requests enforced | |
| NVIDIA H200 (141GB) (nvidia-h200-141gb) | CPU | No minimum requests enforced | |
| NVIDIA H200 (141GB) (nvidia-h200-141gb) | Memory | No minimum requests enforced | |
| NVIDIA H200 (141GB) (nvidia-h200-141gb) | Ephemeral storage | No minimum requests enforced | |
| NVIDIA H100 Mega (80GB) (nvidia-h100-mega-80gb) | CPU | | |
| NVIDIA H100 Mega (80GB) (nvidia-h100-mega-80gb) | Memory | | |
| NVIDIA H100 Mega (80GB) (nvidia-h100-mega-80gb) | Ephemeral storage | | |
| NVIDIA H100 (80GB) (nvidia-h100-80gb) | CPU | | |
| NVIDIA H100 (80GB) (nvidia-h100-80gb) | Memory | | |
| NVIDIA H100 (80GB) (nvidia-h100-80gb) | Ephemeral storage | | |
| NVIDIA A100 (40GB) (nvidia-tesla-a100) | CPU | 0.001 vCPU | |
| NVIDIA A100 (40GB) (nvidia-tesla-a100) | Memory | 1 MiB | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | CPU | 0.001 vCPU | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Memory | 1 MiB | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Ephemeral storage | 512 MiB | |
| NVIDIA L4 (nvidia-l4) | CPU | 0.001 vCPU | |
| NVIDIA L4 (nvidia-l4) | Memory | 1 MiB | |
| NVIDIA Tesla T4 (nvidia-tesla-t4) | CPU | 0.001 vCPU | |
| NVIDIA Tesla T4 (nvidia-tesla-t4) | Memory | 1 MiB | |
| TPU v5e (tpu-v5-lite-podslice) | CPU | 0.001 vCPU | |
| TPU v5e (tpu-v5-lite-podslice) | Memory | 1 MiB | |
| TPU v5e (tpu-v5-lite-podslice) | Ephemeral storage | 10 MiB | 56 TiB |
| TPU v5p (tpu-v5p-slice) | CPU | 0.001 vCPU | 280 vCPU |
| TPU v5p (tpu-v5p-slice) | Memory | 1 MiB | 448 GiB |
| TPU v5p (tpu-v5p-slice) | Ephemeral storage | 10 MiB | 56 TiB |
| TPU v4 (tpu-v4-podslice) | CPU | 0.001 vCPU | 240 vCPU |
| TPU v4 (tpu-v4-podslice) | Memory | 1 MiB | 407 GiB |
| TPU v4 (tpu-v4-podslice) | Ephemeral storage | 10 MiB | 56 TiB |

On A100 (40GB), A100 (80GB), and L4 GPU nodes, the sum of the CPU requests of all DaemonSets that run on the node must not exceed 2 vCPU, and the sum of their memory requests must not exceed 14 GiB.

Note: All A100 (80GB) GPU nodes use local SSDs for node boot disks at fixed sizes based on the number of GPUs. You're billed separately for the attached Local SSDs. For details, see Autopilot pricing. A100 (40GB) GPUs don't use local SSDs for node boot disks.
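As a hedged sketch of requesting TPUs (the Pod name, image, and topology value are placeholders, and the selectors follow the documented pattern rather than this page), a Pod requests TPU chips with the google.com/tpu resource and selects the slice type and topology through node selectors:

apiVersion: v1
kind: Pod
metadata:
  name: tpu-pod                # placeholder name
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x4    # placeholder topology
  containers:
  - name: tpu-container        # placeholder name
    image: python:3.10         # placeholder image
    resources:
      requests:
        google.com/tpu: 8
      limits:
        google.com/tpu: 8      # TPU requests and limits must be equal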

To learn how to request GPUs in your Autopilot Pods, refer to Deploy GPU workloads in Autopilot.
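For instance, a hedged sketch of a GPU request on the Accelerator compute class (the Pod, container, and image names are placeholders) selects the GPU model with a node selector and requests the GPU count as a resource limit:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                # placeholder name
spec:
  nodeSelector:
    cloud.google.com/compute-class: Accelerator
    cloud.google.com/gke-accelerator: nvidia-l4
  containers:
  - name: gpu-container        # placeholder name
    image: nginx               # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1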

Minimums and maximums for GPUs without a compute class

The following table shows the minimum and maximum resource requests for Pods that don't use the Accelerator compute class:

| GPU type | CPU:memory ratio (vCPU:GiB) | Resource | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| NVIDIA A100 (40GB) (nvidia-tesla-a100) | Not enforced | CPU | | |
| NVIDIA A100 (40GB) (nvidia-tesla-a100) | Not enforced | Memory | | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Not enforced | CPU | | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Not enforced | Memory | | |
| NVIDIA A100 (80GB) (nvidia-a100-80gb) | Not enforced | Ephemeral storage | | |
| NVIDIA L4 (nvidia-l4) | | CPU | | |
| NVIDIA L4 (nvidia-l4) | | Memory | | |
| NVIDIA Tesla T4 (nvidia-tesla-t4) | Between 1:1 and 1:6.25 | CPU | 0.5 vCPU | |
| NVIDIA Tesla T4 (nvidia-tesla-t4) | Between 1:1 and 1:6.25 | Memory | 0.5 GiB | |

On A100 (40GB), A100 (80GB), and L4 GPU nodes, the sum of the CPU requests of all DaemonSets that run on the node must not exceed 2 vCPU, and the sum of their memory requests must not exceed 14 GiB.

Note: All A100 (80GB) GPU nodes use local SSDs for node boot disks at fixed sizes based on the number of GPUs. You're billed separately for the attached Local SSDs. For details, see Autopilot pricing. This doesn't apply to A100 (40GB) GPUs.

To learn how to request GPUs in your Autopilot Pods, refer to Deploy GPU workloads in Autopilot.

Resource requests for workload separation and extended duration

Autopilot lets you manipulate Kubernetes scheduling and eviction behavior using methods such as workload separation (for example, taints, tolerations, and node selectors) and extended run duration. A sketch of workload separation appears after the table in this section.

If your specified requests are less than these minimums, the behavior of Autopilot changes based on the method that you used.

The following table describes the default requests and the minimum resource requests that you can specify. If a configuration or compute class isn't in this table, Autopilot doesn't enforce special minimum or default values.

| Compute class | Resource | Default | Minimum |
| --- | --- | --- | --- |
| General-purpose | CPU | 0.5 vCPU | 0.5 vCPU |
| General-purpose | Memory | 2 GiB | 0.5 GiB |
| Balanced | CPU | 2 vCPU | 1 vCPU |
| Balanced | Memory | 8 GiB | 4 GiB |
| Scale-Out | CPU | 0.5 vCPU | 0.5 vCPU |
| Scale-Out | Memory | 2 GiB | 2 GiB |
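As a hedged sketch of workload separation (referenced earlier in this section; not from the original page), a Pod typically pairs a toleration with a matching node selector on an arbitrary key-value pair. The group=special pair and all names are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: separated-pod          # placeholder name
spec:
  tolerations:
  - key: group                 # placeholder key
    operator: Equal
    value: special             # placeholder value
    effect: NoSchedule
  nodeSelector:
    group: special             # must match the toleration's key-value pair
  containers:
  - name: app                  # placeholder name
    image: nginx               # placeholder image
    resources:
      requests:
        cpu: "0.5"
        memory: "2Gi"

Init containers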

Init containers run serially and must complete before the application containers start. If you don't specify resource requests for your Autopilot init containers, GKE allocates the total resources available to the Pod to each init container. This behavior is different from GKE Standard, where each init container can use any unallocated resources available on the node on which the Pod is scheduled.

Unlike for application containers, GKE recommends that you don't specify resource requests for Autopilot init containers, so that each container gets the full resources available to the Pod. If you request fewer resources than the defaults, you constrain your init container; if you request more resources than the Autopilot defaults, you might increase your bill for the lifetime of the Pod.
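For example (an illustrative sketch, not from the original page; all names are placeholders), leave the init container's resources block out and set requests only on the application container:

apiVersion: v1
kind: Pod
metadata:
  name: init-example           # placeholder name
spec:
  initContainers:
  - name: setup                # placeholder name
    image: busybox             # placeholder image
    command: ["sh", "-c", "echo initializing"]
    # No resources block: Autopilot gives this init container the
    # full resources available to the Pod.
  containers:
  - name: app                  # placeholder name
    image: nginx               # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"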

Setting resource limits in Autopilot

Kubernetes lets you set both requests and limits for resources in your Pod specification. The behavior of your Pods changes depending on whether your limits are different from your requests, as described in the following cases:

requests equal to limits

Pods use the Guaranteed QoS class. Note: Ephemeral storage limits must always be explicitly set equal to requests. GKE modifies your Pods to enforce this rule.

requests set, limits not set

The behavior depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE.

requests not set, limits set

Autopilot sets requests to the value of limits, which is the default Kubernetes behavior.

Before:

resources:
  limits:
    cpu: "400m"

After:

resources:
  requests:
    cpu: "400m"
  limits:
    cpu: "400m"

requests less than limits

The behavior depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE.

requests greater than limits

Autopilot sets requests to the value of limits.

Before:

resources:
  requests:
    cpu: "450m"
  limits:
    cpu: "400m"

After:

resources:
  requests:
    cpu: "400m"
  limits:
    cpu: "400m"

requests not set, limits not set

Autopilot sets requests to the default values for the compute class or hardware configuration. The behavior for limits depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE.
In most situations, set adequate resource requests and equal limits for your workloads.

For workloads that temporarily need more resources than their steady state, such as during startup or during periods of higher traffic, set your limits higher than your requests to let the Pods burst. For details, see Configure Pod bursting in GKE.
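For example, a burstable container might look like the following illustrative snippet, assuming the cluster supports bursting:

resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "500m"   # the container can burst up to 500m when capacity allows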

Automatic resource management in Autopilot

If the resource requests that you specify for your workloads are outside the allowed ranges, or if you don't request resources for some containers, Autopilot modifies your workload configuration to comply with the allowed limits. Autopilot calculates resource ratios and the resource scale-up requirements after applying default values to containers with no requests specified.

By default, when Autopilot automatically scales a resource up to meet a minimum or default resource value, GKE allocates the extra capacity to the first container in the Pod manifest. In GKE version 1.27.2-gke.2200 and later, you can tell GKE to allocate the extra resources to a specific container by adding the following to the annotations field in your Pod manifest:

autopilot.gke.io/primary-container: "CONTAINER_NAME"

Replace CONTAINER_NAME with the name of the container.
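For example, the following illustrative manifest (the Pod, container, and image names are placeholders) directs any extra capacity that Autopilot adds to the main-app container:

apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod    # placeholder name
  annotations:
    autopilot.gke.io/primary-container: "main-app"
spec:
  containers:
  - name: main-app             # receives any extra resources GKE adds
    image: nginx               # placeholder image
  - name: sidecar              # placeholder name
    image: busybox             # placeholder image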

Resource modification examples

The following example scenarios show how Autopilot modifies your workload configuration to meet the requirements of your running Pods and containers.

Single container with < 0.05 vCPU

| Container number | Original requests | Modified requests |
| --- | --- | --- |
| 1 | CPU: 30 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB | CPU: 50 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB |

Multiple containers with total CPU < 0.05 vCPU

| Container number | Original requests | Modified requests |
| --- | --- | --- |
| 1 | CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB | CPU: 30 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB |
| 2 | CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB | CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB |
| 3 | CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB | CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB |
| Total Pod resources | | CPU: 50 mCPU; Memory: 1.5 GiB; Ephemeral storage: 30 MiB |

Single container with memory too low for requested CPU

In this example, the requested memory is too low for the amount of CPU: the minimum allowed CPU-to-memory ratio is 1:1 (1 vCPU to 1 GiB). Because the requested ratio is lower than the minimum, Autopilot increases the memory request.

| Container number | Original requests | Modified requests |
| --- | --- | --- |
| 1 | CPU: 4 vCPU; Memory: 1 GiB; Ephemeral storage: 10 MiB | CPU: 4 vCPU; Memory: 4 GiB; Ephemeral storage: 10 MiB |
| Total Pod resources | | CPU: 4 vCPU; Memory: 4 GiB; Ephemeral storage: 10 MiB |
