This page shows you how to scale your Deployments in Google Kubernetes Engine (GKE) by automatically adjusting your resources using metrics like resource allocation, load balancer traffic, custom metrics, or multiple metrics simultaneously. This page also provides step-by-step instructions for configuring a Horizontal Pod Autoscaler (HPA) profile, including how to view, delete, clean up, and troubleshoot your HPA object. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster.
This page is for Operators and Developers who manage application scaling in GKE and want to understand how to dynamically optimize performance and maintain cost efficiency through horizontal Pod autoscaling. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.
Before you begin
Before you start, make sure that you have performed the following tasks:
If you use the Google Cloud CLI, get the latest version by running gcloud components update.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you use primarily zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.
API versions for HorizontalPodAutoscaler objects
When you use the Google Cloud console, HorizontalPodAutoscaler objects are created using the autoscaling/v2 API.
When you use kubectl to create or view information about a Horizontal Pod Autoscaler, you can specify either the autoscaling/v1 API or the autoscaling/v2 API.
apiVersion: autoscaling/v1 is the default, and lets you autoscale based only on CPU utilization. To autoscale based on other metrics, use apiVersion: autoscaling/v2. The example in Autoscaling based on resources utilization uses apiVersion: autoscaling/v1.
apiVersion: autoscaling/v2 is recommended for creating new HorizontalPodAutoscaler objects. It lets you autoscale based on multiple metrics, including custom or external metrics. All other examples in this page use apiVersion: autoscaling/v2.
To check which API versions are supported, use the kubectl api-versions command.
You can specify which API to use when viewing details about a Horizontal Pod Autoscaler that uses apiVersion: autoscaling/v2.
Create the example Deployment
Before you can create a Horizontal Pod Autoscaler, you must create the workload it monitors. The examples in this page apply different Horizontal Pod Autoscaler configurations to the following nginx Deployment. Separate examples show a Horizontal Pod Autoscaler based on resource utilization, based on a custom or external metric, and based on multiple metrics.
Save the following to a file named nginx.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"
This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you don't specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
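As a rough illustration of why a request is needed for percentage targets, the following Python sketch expresses usage as a percentage of the request. The function name and the sample usage figure of 125m are assumptions chosen for illustration; they are not part of GKE or the Kubernetes API.

```python
from typing import Optional


def utilization_percent(usage_millicores: float,
                        request_millicores: Optional[float]) -> Optional[float]:
    """Return usage as a percentage of the request, or None if no request is set."""
    if not request_millicores:
        # Without a request there is no denominator, so only the absolute
        # value (for example, 125m) can be compared against a target.
        return None
    return 100.0 * usage_millicores / request_millicores


# The nginx container above requests 250m of CPU.
print(utilization_percent(125, 250))   # 50.0
print(utilization_percent(125, None))  # None
```

With the 250m request from the manifest, a Pod that uses 125m of CPU sits exactly at a 50% utilization target.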
To create the Deployment, apply the nginx.yaml manifest:
kubectl apply -f nginx.yaml
The Deployment has spec.replicas set to 3, so three Pods are deployed. You can verify this using the kubectl get deployment nginx command.
Each of the examples in this page applies a different Horizontal Pod Autoscaler to an example nginx Deployment.
Autoscaling based on resources utilization
This example creates a HorizontalPodAutoscaler object to autoscale the nginx Deployment when CPU utilization surpasses 50%, and ensures that there is always a minimum of 1 replica and a maximum of 10 replicas.
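To make the effect of the 50% target and the 1 to 10 replica bounds concrete, here is a minimal Python sketch of the replica calculation described in the upstream Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The function name is hypothetical, and the sketch deliberately ignores stabilization windows, Pod readiness, and missing metrics, so it is a simplification rather than the controller's actual implementation.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 50.0,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a single utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 90))   # 6
print(desired_replicas(3, 52))   # 3
print(desired_replicas(3, 400))  # 10
print(desired_replicas(3, 0))    # 1
```

In a real cluster, scale-down is additionally delayed by a stabilization window, so a Deployment at 0% utilization does not drop to one replica immediately.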
You can create a Horizontal Pod Autoscaler that targets CPU using the Google Cloud console, the kubectl apply command, or, for average CPU only, the kubectl autoscale command.
Note: This example uses apiVersion: autoscaling/v1. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.
Console
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.
Click Actions > Autoscale.
Specify the following values:
Click Done.
Click Autoscale.
kubectl apply
Save the following YAML manifest as a file named nginx-hpa.yaml:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
To create the HPA, apply the manifest using the following command:
kubectl apply -f nginx-hpa.yaml
kubectl autoscale
To create a HorizontalPodAutoscaler object that only targets average CPU utilization, you can use the kubectl autoscale command:
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10
Note: You can combine the --dry-run=client and -o yaml flags to print a YAML manifest for a Horizontal Pod Autoscaler without actually creating it.
To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:
kubectl get hpa
The output is similar to the following:
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/50%    1         10        3          61s
To get details about the Horizontal Pod Autoscaler, you can use the Google Cloud console or the kubectl command.
Console
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.
View the Horizontal Pod Autoscaler configuration in the Autoscaler section.
View more details about autoscaling events in the Events tab.
kubectl get
To get details about the Horizontal Pod Autoscaler, you can use kubectl get hpa with the -o yaml flag. The status field contains information about the current number of replicas and any recent autoscaling events.
kubectl get hpa nginx -o yaml
The output is similar to the following:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent
      recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the
      HPA was able to successfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the
      desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3
Before following the remaining examples in this page, delete the HPA:
kubectl delete hpa nginx
When you delete a Horizontal Pod Autoscaler, the number of replicas of the Deployment remains the same. A Deployment does not automatically revert to its state before the Horizontal Pod Autoscaler was applied.
For more information, see Deleting a Horizontal Pod Autoscaler.
Autoscaling based on load balancer traffic
Traffic-based autoscaling is a capability of GKE that integrates traffic utilization signals from load balancers to autoscale Pods.
Using traffic as an autoscaling signal might be helpful since traffic is a leading indicator of load that is complementary to CPU and memory. Built-in integration with GKE ensures that the setup is easy and that autoscaling reacts to traffic spikes quickly to meet demand.
Traffic-based autoscaling is enabled by the Gateway controller and its global traffic management capabilities. To learn more, see Traffic-based autoscaling.
Autoscaling based on load balancer traffic is only available for Gateway workloads.
Requirements
Traffic-based autoscaling has the following requirements:
The Gateway uses the gke-l7-global-external-managed, gke-l7-regional-external-managed, gke-l7-rilb, or gke-l7-gxlb GatewayClass.
Limitations
Traffic-based autoscaling has the following limitations:
The multi-cluster GatewayClasses (gke-l7-global-external-managed-mc, gke-l7-regional-external-managed-mc, gke-l7-rilb-mc, and gke-l7-gxlb-mc) are not supported.
Traffic that uses Services of type LoadBalancer is not supported.
After you configure the maxRatePerEndpoint field, allow sufficient time (usually one minute, but potentially up to 15 minutes in large clusters) for the load balancer to be updated with this change before you configure the Horizontal Pod Autoscaler with traffic-based metrics. Otherwise, your cluster might temporarily try to autoscale based on metrics emitted by a load balancer that is still being configured.
The following exercise uses the HorizontalPodAutoscaler to autoscale the store-autoscale Deployment based on the traffic it receives. A Gateway accepts ingress traffic from the internet for the Pods. The autoscaler compares traffic signals from the Gateway with the per-Pod traffic capacity that is configured on the store-autoscale Service resource. By generating traffic to the Gateway, you influence the number of Pods deployed.
The following diagram demonstrates how traffic-based autoscaling works:
To deploy traffic-based autoscaling, perform the following steps:
For Standard clusters, confirm that the GatewayClasses are installed in your cluster. For Autopilot clusters, the GatewayClasses are installed by default.
kubectl get gatewayclass
The output confirms that the GKE GatewayClass resources are ready to use in your cluster:
NAME                               CONTROLLER                  ACCEPTED   AGE
gke-l7-global-external-managed     networking.gke.io/gateway   True       16h
gke-l7-regional-external-managed   networking.gke.io/gateway   True       16h
gke-l7-gxlb                        networking.gke.io/gateway   True       16h
gke-l7-rilb                        networking.gke.io/gateway   True       16h
If you don't see this output, enable the Gateway API in your GKE cluster.
Deploy the sample application and Gateway load balancer to your cluster:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-autoscale.yaml
The sample application creates:
A GCPBackendPolicy, associated with the store-autoscale Service, that sets maxRatePerEndpoint to 10. To learn more about Gateway capabilities, see GatewayClass capabilities.
[...] store-autoscale Service.
The Service capacity is a critical element when using traffic-based autoscaling because it determines the amount of per-Pod traffic that triggers an autoscaling event. It is configured using a maxRatePerEndpoint field on a GCPBackendPolicy associated with the Service, which defines the maximum traffic a Service should receive in requests per second, per Pod. Service capacity is specific to your application.
For more information, see Determining your Service's capacity.
Save the following manifest as hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: store-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-autoscale
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: store-autoscale
      metric:
        name: "autoscaling.googleapis.com|gclb-capacity-fullness"
      target:
        averageValue: 70
        type: AverageValue
Note: If you previously used the autoscaling.googleapis.com|gclb-capacity-utilization metric name, we recommend that you switch to the autoscaling.googleapis.com|gclb-capacity-fullness metric name instead.
This manifest describes a HorizontalPodAutoscaler with the following properties:
minReplicas and maxReplicas: sets the minimum and maximum number of replicas for this Deployment. In this configuration, the number of Pods can scale from 1 to 10 replicas.
describedObject.name: store-autoscale: the reference to the store-autoscale Service that defines the traffic capacity.
scaleTargetRef.name: store-autoscale: the reference to the store-autoscale Deployment that defines the resource that is scaled by the Horizontal Pod Autoscaler.
averageValue: 70: target average value of capacity utilization. This gives the Horizontal Pod Autoscaler a growth margin so that the running Pods can process excess traffic while new Pods are being created.
The Horizontal Pod Autoscaler results in the following traffic behavior:
The target of 70% of capacity corresponds to 7 RPS per Pod when maxRatePerEndpoint=10.
You can also deploy a traffic generator to validate traffic-based autoscaling behavior.
At 30 RPS, the Deployment is scaled to 5 replicas so that each replica ideally receives 6 RPS of traffic, which would be 60% utilization per Pod. This is under the 70% target utilization and so the Pods are scaled appropriately. Depending on traffic fluctuations, the number of autoscaled replicas might also fluctuate. For a more detailed description of how the number of replicas is computed, see Autoscaling behavior.
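The arithmetic behind the figures above can be reproduced with a short Python sketch. It is a back-of-the-envelope model under stated assumptions, not the logic GKE actually runs: the function name is hypothetical, and the defaults are taken from this example (maxRatePerEndpoint of 10, target of 70, and 1 to 10 replicas).

```python
import math


def replicas_for_traffic(total_rps: float,
                         max_rate_per_endpoint: float = 10.0,
                         target_fullness_percent: float = 70.0,
                         min_replicas: int = 1,
                         max_replicas: int = 10) -> int:
    """Estimate the replica count that keeps each Pod at or under the target."""
    # 70% of 10 RPS gives a per-Pod target of 7 RPS.
    per_pod_target_rps = max_rate_per_endpoint * target_fullness_percent / 100.0
    wanted = math.ceil(total_rps / per_pod_target_rps)
    # Clamp to the configured minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))


replicas = replicas_for_traffic(30)
print(replicas)                    # 5
print(30 / replicas)               # 6.0 RPS per Pod
print(100 * (30 / replicas) / 10)  # 60.0 percent of capacity
```

Four replicas would each receive 7.5 RPS, which exceeds the 7 RPS target, so the estimate rounds up to five.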
Autoscaling based on a custom or external metric
To create horizontal Pod autoscalers for custom metrics and external metrics, see Optimize Pod autoscaling based on metrics.
Autoscaling based on multiple metrics
This example creates a Horizontal Pod Autoscaler that autoscales based on CPU utilization, average memory usage, and a custom metric named packets_per_second.
If you followed the previous example and still have a Horizontal Pod Autoscaler named nginx, delete it before following this example.
This example requires apiVersion: autoscaling/v2. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.
Before you can autoscale based on a custom metric, you must create the custom metric and configure your workload to export the metric to Cloud Monitoring. For this reason, the packets_per_second metric in the manifest below is included for illustration, but commented out. See custom metrics and the Monitoring documentation for creating custom metrics.
Save this YAML manifest as a file named nginx-multiple.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100
Apply the YAML manifest:
kubectl apply -f nginx-multiple.yaml
When created, the Horizontal Pod Autoscaler monitors the nginx Deployment for average CPU utilization, average memory usage, and (if you uncommented it) the custom packets_per_second metric. The Horizontal Pod Autoscaler autoscales the Deployment based on the metric whose value would result in the largest number of replicas.
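The following Python sketch illustrates how one replica count can be chosen from several metrics: each metric yields its own proposal, and the largest proposal is used. The function names and the current metric values are hypothetical and chosen for illustration; the targets (50% CPU, 100Mi memory, 100 packets per second) and the 1 to 10 replica bounds come from the manifest above. Tolerance and stabilization behavior are left out to keep the sketch short.

```python
import math


def proposal(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count that a single metric would ask for."""
    return math.ceil(current_replicas * current_value / target_value)


def largest_proposal(current_replicas, metrics, min_replicas=1, max_replicas=10):
    """Evaluate every metric and keep the largest proposal, clamped to the bounds."""
    proposals = {
        name: proposal(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    chosen = max(min_replicas, min(max_replicas, max(proposals.values())))
    return chosen, proposals


# (current value, target value) per metric; current values are made up.
metrics = {
    "cpu_utilization_percent": (80.0, 50.0),
    "memory_average_mib": (60.0, 100.0),
    "packets_per_second": (150.0, 100.0),
}
chosen, proposals = largest_proposal(2, metrics)
print(proposals)
print(chosen)  # 4
```

With two current replicas, CPU asks for four replicas, memory for two, and packets_per_second for three, so the Deployment would be scaled to four.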
GKE uses a high-performance architecture for Horizontal Pod Autoscaling (HPA) that provides faster reaction times for scaling decisions and supports up to 1,000 HorizontalPodAutoscaler objects in a cluster. The Performance HPA profile is enabled by default for the following cluster configurations:
You can also enable the Performance HPA profile on existing clusters if they meet the requirements.
A Standard cluster is exempt from auto-enablement of the Performance HPA profile if it meets all of the following conditions:
To enable the Performance HPA profile, verify that your Autopilot and Standard clusters meet the following requirements:
[...] roles/autoscaling.metricsWriter role assigned.
To create a new Autopilot cluster that has the Performance HPA profile enabled, use the following command:
gcloud container clusters create-auto CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance
Replace the following:
CLUSTER_NAME: the name for the cluster you're creating.
LOCATION: the compute zone or region (for example, us-central1-a or us-central1) for the cluster.
PROJECT_ID: your Google Cloud project ID.
CLUSTER_VERSION: GKE version 1.31 or later.
To create a new Standard cluster with the Performance HPA profile enabled, use the following command:
gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance
Replace the following:
CLUSTER_NAME: the name for the cluster you're creating.
LOCATION: the compute zone or region (for example, us-central1-a or us-central1) for the cluster.
PROJECT_ID: your Google Cloud project ID.
CLUSTER_VERSION: GKE version 1.31 or later.
To enable the Performance HPA profile in an existing cluster, use the following command:
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=performance
Replace the following:
CLUSTER_NAME: the name of the cluster.
LOCATION: the compute zone or region (for example, us-central1-a or us-central1) for the cluster.
PROJECT_ID: your Google Cloud project ID.
Note: Enabling the Performance HPA profile on an existing cluster changes the gke-metrics-agent resource requests, and triggers a simultaneous restart of its Pods. This might cause temporary disruption on resource-constrained nodes due to Pod rescheduling.
Disable the Performance HPA profile
To disable the Performance HPA profile in a cluster, use the following command:
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=none
Replace the following:
CLUSTER_NAME: the name of the cluster.
LOCATION: the compute zone or region (for example, us-central1-a or us-central1) for the cluster.
PROJECT_ID: your Google Cloud project ID.
To view a Horizontal Pod Autoscaler's configuration and statistics, use the following command:
kubectl describe hpa HPA_NAME
Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.
If the Horizontal Pod Autoscaler uses apiVersion: autoscaling/v2 and is based on multiple metrics, the kubectl describe hpa command only shows the CPU metric. To see all metrics, use the following command instead:
kubectl describe hpa.v2.autoscaling HPA_NAME
Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.
Each Horizontal Pod Autoscaler's current status is shown in the Conditions field, and autoscaling events are listed in the Events field.
Events: Reason is listed as HpaProfilePerformance.
The output is similar to the following:
Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp:                                     Tue, 05 May 2020 20:07:11 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource memory on pods:                             2220032 / 100Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:                                                <none>
Deleting a Horizontal Pod Autoscaler
You can delete a Horizontal Pod Autoscaler using the Google Cloud console or the kubectl delete command.
Console
To delete the nginx Horizontal Pod Autoscaler:
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.
Click Actions > Autoscale.
Click Delete.
kubectl delete
To delete the nginx Horizontal Pod Autoscaler, use the following command:
kubectl delete hpa nginx
When you delete a Horizontal Pod Autoscaler, the Deployment (or other deployment object) remains at its existing scale, and does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the kubectl scale command:
kubectl scale deployment nginx --replicas=3
Cleaning up
Delete the Horizontal Pod Autoscaler, if you have not done so:
kubectl delete hpa nginx
Delete the nginx Deployment:
kubectl delete deployment nginx
Optionally, delete the cluster.
This section shows troubleshooting steps for problems using Horizontal Pod Autoscaling.
Horizontal Pod Autoscaler displays an "unable to fetch pod metrics for pod" error
When you set up a Horizontal Pod Autoscaler, you might see warning messages like the following:
unable to fetch pod metrics for pod
It's normal to see this message when the metrics server starts up. However, if you continue to see the warnings and you notice that Pods are not scaling for your workload, ensure you have specified resource requests for each container in your workload. To use resource utilization percentage targets with horizontal Pod autoscaling, you must configure requests for that resource for each container running in each Pod in the workload. Otherwise, the Horizontal Pod Autoscaler cannot perform the calculations it needs to, and takes no action related to that metric.
Horizontal Pod Autoscaler displays a "multiple services selecting the same target of..." event
A Horizontal Pod Autoscaler displays a "multiple services selecting the same target of <hpa>: <services>" error if it detects that you are using traffic-based autoscaling with multiple Services associated with the target of the Horizontal Pod Autoscaler (typically a Deployment).
Traffic-based autoscaling only supports configurations where exactly one Service is associated with the autoscaled resource. For details, see Autoscaling based on load balancer traffic. The error message lists the Services that have been found.
To resolve the issue, ensure only one service is associated with the Horizontal Pod Autoscaler.