Source: https://cloud.google.com/kubernetes-engine/docs/how-to/horizontal-pod-autoscaling

Configuring horizontal Pod autoscaling | Google Kubernetes Engine (GKE)

This page shows you how to scale your deployments in Google Kubernetes Engine (GKE) by automatically adjusting your resources using metrics such as resource utilization, load balancer traffic, custom metrics, or multiple metrics simultaneously. This page also provides step-by-step instructions for configuring a Horizontal Pod Autoscaler (HPA) profile, and for viewing, deleting, cleaning up, and troubleshooting your HPA object. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster.

This page is for Operators and Developers who manage application scaling in GKE and want to understand how to dynamically optimize performance and maintain cost efficiency through horizontal Pod autoscaling. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.

Before you begin

Before you start, make sure that you have performed the following tasks:

API versions for HorizontalPodAutoscaler objects

When you use the Google Cloud console, HorizontalPodAutoscaler objects are created using the autoscaling/v2 API.

When you use kubectl to create or view information about a Horizontal Pod Autoscaler, you can specify either the autoscaling/v1 API or the autoscaling/v2 API.

To check which API versions are supported, use the kubectl api-versions command.

You can specify which API to use when viewing details about a Horizontal Pod Autoscaler that uses apiVersion: autoscaling/v2.

Create the example Deployment

Before you can create a Horizontal Pod Autoscaler, you must create the workload it monitors. The examples in this page apply different Horizontal Pod Autoscaler configurations to the following nginx Deployment. Separate examples show a Horizontal Pod Autoscaler based on resource utilization, based on a custom or external metric, and based on multiple metrics.

Save the following to a file named nginx.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"

This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you don't specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
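Utilization as a percentage is the container's usage divided by its request. This means a percentage target has no meaning for a container that has no request. The following Python sketch shows the calculation for a single container; the Horizontal Pod Autoscaler averages the value across the Pods it targets. The function name and the usage figures are hypothetical.

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Return CPU usage as a percentage of the container's CPU request."""
    if request_millicores <= 0:
        # Without a request there is no denominator, so a percentage is undefined.
        raise ValueError("a CPU request is required for a percentage-based target")
    return usage_millicores * 100 / request_millicores


# The nginx container in nginx.yaml requests 250m of CPU.
print(cpu_utilization_percent(125, 250))  # 50.0, which matches the 50% target used on this page
print(cpu_utilization_percent(300, 250))  # 120.0, because usage can exceed the request
```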

To create the Deployment, apply the nginx.yaml manifest:

kubectl apply -f nginx.yaml

The Deployment has spec.replicas set to 3, so three Pods are deployed. You can verify this using the kubectl get deployment nginx command.

Each of the examples in this page applies a different Horizontal Pod Autoscaler to an example nginx Deployment.

Autoscaling based on resource utilization

This example creates a HorizontalPodAutoscaler object that autoscales the nginx Deployment when CPU utilization exceeds 50%. It also ensures that there is always a minimum of 1 replica and a maximum of 10 replicas.
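The following Python sketch shows how a 50% target translates into a replica count. It approximates the rule documented for the upstream Kubernetes HPA controller: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). The controller makes no change when the ratio is within a tolerance (0.1 by default upstream), and it clamps the result to the minimum and maximum. This is a simplification: it ignores the scale-down stabilization window, Pod readiness, and missing metrics, and GKE's controller might differ in detail.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Approximate the upstream HPA rule for a single utilization target."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        proposal = current_replicas
    else:
        # desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
        proposal = math.ceil(current_replicas * ratio)
    # The result is always clamped to [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, proposal))


print(desired_replicas(3, 80, 50, 1, 10))   # 5: 3 * 1.6 = 4.8, rounded up
print(desired_replicas(3, 52, 50, 1, 10))   # 3: within the 10% tolerance
print(desired_replicas(3, 0, 50, 1, 10))    # 1: clamped to minReplicas
print(desired_replicas(8, 100, 50, 1, 10))  # 10: clamped to maxReplicas
```

In the real controller, scale-down is also delayed by a stabilization window. An idle Deployment therefore does not drop to its minimum immediately.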

You can create a Horizontal Pod Autoscaler that targets CPU using the Google Cloud console or the kubectl apply command. If you only need to target average CPU utilization, you can also use the kubectl autoscale command.

Note: This example uses apiVersion: autoscaling/v1. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.

Console
  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. Click the name of the nginx Deployment.

  3. Click Actions > Autoscale.

  4. Specify the following values: a minimum of 1 replica, a maximum of 10 replicas, and a CPU utilization target of 50%.

  5. Click Done.

  6. Click Autoscale.

kubectl apply

Save the following YAML manifest as a file named nginx-hpa.yaml:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

To create the HPA, apply the manifest using the following command:

kubectl apply -f nginx-hpa.yaml

kubectl autoscale

To create a HorizontalPodAutoscaler object that only targets average CPU utilization, you can use the kubectl autoscale command:

kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10

Note: You can combine the --dry-run=client and -o yaml flags to print a YAML manifest for a Horizontal Pod Autoscaler without actually creating it.

To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:

kubectl get hpa

The output is similar to the following:

NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/50%    1         10        3          61s

To get details about the Horizontal Pod Autoscaler, you can use the Google Cloud console or the kubectl command.

Console
  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. Click the name of the nginx Deployment.

  3. View the Horizontal Pod Autoscaler configuration in the Autoscaler section.

  4. View more details about autoscaling events in the Events tab.

kubectl get

To get details about the Horizontal Pod Autoscaler, you can use kubectl get hpa with the -o yaml flag. The status field contains the current and desired number of replicas and the current CPU utilization. In the autoscaling/v1 output, the autoscaler's conditions and current metrics appear as annotations.

kubectl get hpa nginx -o yaml

The output is similar to the following:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent
      recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the
      HPA was able to successfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the
      desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3
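If you script against this information, the JSON form (-o json) is straightforward to parse with only the Python standard library. The following sketch reads the spec and status fields shown in the preceding output; the summarize_hpa name is hypothetical. When minReplicas is omitted from the spec, it defaults to 1.

```python
import json


def summarize_hpa(hpa_json: str) -> str:
    """Summarize replica counts from `kubectl get hpa NAME -o json` output."""
    hpa = json.loads(hpa_json)
    spec = hpa["spec"]
    status = hpa.get("status", {})
    return (
        f'{hpa["metadata"]["name"]}: '
        f'{status.get("currentReplicas", 0)} current / '
        f'{status.get("desiredReplicas", 0)} desired '
        f'(min {spec.get("minReplicas", 1)}, max {spec["maxReplicas"]})'
    )


# Trimmed from the example output above, in JSON form.
sample = """{
  "metadata": {"name": "nginx", "namespace": "default"},
  "spec": {"minReplicas": 1, "maxReplicas": 10, "targetCPUUtilizationPercentage": 50},
  "status": {"currentCPUUtilizationPercentage": 0, "currentReplicas": 3, "desiredReplicas": 3}
}"""
print(summarize_hpa(sample))  # nginx: 3 current / 3 desired (min 1, max 10)
```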

Before following the remaining examples in this page, delete the HPA:

kubectl delete hpa nginx

When you delete a Horizontal Pod Autoscaler, the number of replicas of the Deployment remains the same. The Deployment does not automatically revert to its state before the Horizontal Pod Autoscaler was applied.

You can learn more about deleting a Horizontal Pod Autoscaler.

Autoscaling based on load balancer traffic

Traffic-based autoscaling is a capability of GKE that integrates traffic utilization signals from load balancers to autoscale Pods.

Using traffic as an autoscaling signal can be helpful because traffic is a leading indicator of load that complements CPU and memory. Because the integration is built into GKE, setup is straightforward and autoscaling can react quickly to traffic spikes.

Traffic-based autoscaling is enabled by the Gateway controller and its global traffic management capabilities. To learn more, see Traffic-based autoscaling.

Autoscaling based on load balancer traffic is only available for Gateway workloads.

Requirements

Traffic-based autoscaling has the following requirements:

Limitations

Traffic-based autoscaling has the following limitations:

Deploy traffic-based autoscaling

The following exercise uses the HorizontalPodAutoscaler to autoscale the store-autoscale Deployment based on the traffic it receives. A Gateway accepts ingress traffic from the internet for the Pods. The autoscaler compares traffic signals from the Gateway with the per-Pod traffic capacity that is configured on the store-autoscale Service resource. By generating traffic to the Gateway, you influence the number of Pods deployed.

The following diagram demonstrates how traffic-based autoscaling works:

To deploy traffic-based autoscaling, perform the following steps:

  1. For Standard clusters, confirm that the GatewayClasses are installed in your cluster. For Autopilot clusters, the GatewayClasses are installed by default.

    kubectl get gatewayclass
    

    The output confirms that the GKE GatewayClass resources are ready to use in your cluster:

    NAME                               CONTROLLER                  ACCEPTED   AGE
    gke-l7-global-external-managed     networking.gke.io/gateway   True       16h
    gke-l7-regional-external-managed   networking.gke.io/gateway   True       16h
    gke-l7-gxlb                        networking.gke.io/gateway   True       16h
    gke-l7-rilb                        networking.gke.io/gateway   True       16h
    

    If you don't see this output, enable the Gateway API in your GKE cluster.

  2. Deploy the sample application and Gateway load balancer to your cluster:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-autoscale.yaml
    

    The sample application creates:

    The Service capacity is a critical element when using traffic-based autoscaling because it determines the amount of per-Pod traffic that triggers an autoscaling event. It is configured using a maxRatePerEndpoint field on a GCPBackendPolicy associated with the Service, which defines the maximum traffic a Service should receive in requests per second, per Pod. Service capacity is specific to your application.

    For more information, see Determining your Service's capacity.

  3. Save the following manifest as hpa.yaml:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: store-autoscale
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: store-autoscale
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Object
        object:
          describedObject:
            kind: Service
            name: store-autoscale
          metric:
            name: "autoscaling.googleapis.com|gclb-capacity-fullness"
          target:
            averageValue: 70
            type: AverageValue
    
    Note: If you previously used the autoscaling.googleapis.com|gclb-capacity-utilization metric name, we recommend that you switch to the autoscaling.googleapis.com|gclb-capacity-fullness metric name instead.

    This manifest describes a HorizontalPodAutoscaler with the following properties:

    Note: A Deployment or a Service cannot be referenced by more than one Horizontal Pod Autoscaler. If more than one Horizontal Pod Autoscaler references the same Deployment or Service, autoscaling stops and errors appear in the Horizontal Pod Autoscaler events.
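Before you apply hpa.yaml, you might want to check that no other Horizontal Pod Autoscaler already references the same Deployment or Service. The following Python sketch is one way to do that. It assumes you pass it the items list from kubectl get hpa --all-namespaces -o json. The function and helper names, and the second HPA in the sample, are hypothetical.

```python
from collections import defaultdict


def shared_targets(hpas: list) -> dict:
    """Return Deployments or Services referenced by more than one HPA.

    `hpas` is the `items` list from `kubectl get hpa --all-namespaces -o json`.
    Keys are (namespace, kind, name); values are the sorted HPA names.
    """
    refs = defaultdict(set)
    for hpa in hpas:
        namespace = hpa["metadata"].get("namespace", "default")
        hpa_name = hpa["metadata"]["name"]
        target = hpa["spec"]["scaleTargetRef"]
        refs[(namespace, target["kind"], target["name"])].add(hpa_name)
        # Object metrics, like the one in hpa.yaml, also reference a Service.
        for metric in hpa["spec"].get("metrics", []):
            if metric.get("type") == "Object":
                obj = metric["object"]["describedObject"]
                refs[(namespace, obj["kind"], obj["name"])].add(hpa_name)
    return {key: sorted(names) for key, names in refs.items() if len(names) > 1}


def make_hpa(name: str, deployment: str, service: str) -> dict:
    """Build a minimal HPA object shaped like hpa.yaml (hypothetical helper)."""
    return {
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "scaleTargetRef": {"kind": "Deployment", "name": deployment},
            "metrics": [{"type": "Object",
                         "object": {"describedObject": {"kind": "Service",
                                                        "name": service}}}],
        },
    }


items = [make_hpa("store-autoscale", "store-autoscale", "store-autoscale"),
         make_hpa("store-autoscale-copy", "store-autoscale", "store-autoscale")]
for (namespace, kind, name), hpa_names in shared_targets(items).items():
    print(f"{kind} {namespace}/{name} is referenced by {hpa_names}")
```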

The Horizontal Pod Autoscaler results in the following traffic behavior:

You can also deploy a traffic generator to validate traffic-based autoscaling behavior.

At 30 RPS, the Deployment is scaled to 5 replicas so that each replica ideally receives 6 RPS of traffic. Given a per-Pod capacity of 10 RPS, that is 60% utilization per Pod, which is under the 70% target. With 4 replicas, each Pod would receive 7.5 RPS (75%), which exceeds the target. Depending on traffic fluctuations, the number of autoscaled replicas might also fluctuate. For a more detailed description of how the number of replicas is computed, see Autoscaling behavior.
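The following Python sketch reproduces this arithmetic. It assumes a per-Pod capacity (maxRatePerEndpoint) of 10 RPS, which is the value that makes 6 RPS equal 60%. It also treats the fullness metric as average per-Pod RPS divided by that capacity. This is a simplification for illustration, not GKE's implementation. The recommendation applies the same ratio rule that Kubernetes documents for other HPA metrics.

```python
import math


def capacity_fullness(total_rps: float, replicas: int, max_rate_per_endpoint: float) -> float:
    """Average per-Pod traffic as a percentage of the per-Pod capacity (simplified)."""
    return (total_rps / replicas) * 100 / max_rate_per_endpoint


def recommended_replicas(total_rps: float, current_replicas: int,
                         max_rate_per_endpoint: float, target: float = 70.0) -> int:
    """Apply the HPA ratio rule, ceil(current * fullness / target), to the fullness value."""
    fullness = capacity_fullness(total_rps, current_replicas, max_rate_per_endpoint)
    return math.ceil(current_replicas * fullness / target)


MAX_RATE_PER_ENDPOINT = 10  # assumed per-Pod capacity in RPS

print(capacity_fullness(30, 5, MAX_RATE_PER_ENDPOINT))     # 60.0, under the 70% target
print(capacity_fullness(30, 4, MAX_RATE_PER_ENDPOINT))     # 75.0, over the 70% target
print(recommended_replicas(30, 1, MAX_RATE_PER_ENDPOINT))  # 5
print(recommended_replicas(30, 5, MAX_RATE_PER_ENDPOINT))  # 5, stable at this load
```

Starting from 1 replica, 30 RPS puts that Pod at 300% of capacity, so the ratio rule asks for ceil(1 × 300 / 70) = 5 replicas. At 5 replicas the fullness drops to 60%, and the recommendation stays at 5.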

Autoscaling based on a custom or external metric

To create horizontal Pod autoscalers for custom metrics and external metrics, see Optimize Pod autoscaling based on metrics.

Autoscaling based on multiple metrics

This example creates a Horizontal Pod Autoscaler that autoscales based on CPU utilization, average memory usage, and, optionally, a custom metric named packets_per_second.

If you followed the previous example and still have a Horizontal Pod Autoscaler named nginx, delete it before following this example.

This example requires apiVersion: autoscaling/v2. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.

Before you can autoscale based on a custom metric, you must create the custom metric and configure your workload to export the metric to Cloud Monitoring. For this reason, the packets_per_second metric in the manifest below is included for illustration, but commented out. See custom metrics and the Monitoring documentation for creating custom metrics.

Save this YAML manifest as a file named nginx-multiple.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100

Apply the YAML manifest:

kubectl apply -f nginx-multiple.yaml

When created, the Horizontal Pod Autoscaler monitors the nginx Deployment for average CPU utilization, average memory usage, and (if you uncommented it) the custom packets_per_second metric. The Horizontal Pod Autoscaler computes a replica count for each metric and scales the Deployment to the largest of them.
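The following Python sketch illustrates that rule. Each metric proposes a replica count using ceil(currentReplicas × currentValue / targetValue). The autoscaler keeps the largest proposal, clamped to minReplicas and maxReplicas. The readings are hypothetical, and the sketch omits the tolerance and stabilization behavior of the real controller.

```python
import math


def proposal(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count one metric asks for: ceil(current * currentValue / targetValue)."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas(current_replicas: int, metrics: dict,
                     min_replicas: int, max_replicas: int) -> tuple:
    """Keep the largest per-metric proposal, then clamp it to [min, max]."""
    proposals = {name: proposal(current_replicas, current, target)
                 for name, (current, target) in metrics.items()}
    largest = max(proposals.values())
    return max(min_replicas, min(max_replicas, largest)), proposals


# Hypothetical readings for 2 replicas of nginx: CPU at 75% (target 50%) and
# memory averaging 250Mi per Pod (target 100Mi), both memory values in Mi.
replicas, per_metric = desired_replicas(
    2, {"cpu": (75, 50), "memory": (250, 100)}, min_replicas=1, max_replicas=10)
print(per_metric)  # {'cpu': 3, 'memory': 5}
print(replicas)    # 5: memory wins because it asks for more Pods
```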

Configure the Performance HPA profile

GKE uses a high-performance architecture for Horizontal Pod Autoscaling (HPA) that provides faster reaction times for scaling decisions and supports up to 1,000 HorizontalPodAutoscaler objects in a cluster. The Performance HPA profile is enabled by default for the following cluster configurations:

You can also enable the Performance HPA profile on existing clusters if they meet the requirements.

A Standard cluster is exempt from auto-enablement of the Performance HPA profile if it meets all of the following conditions:

Requirements

To enable the Performance HPA profile, verify that your Autopilot and Standard clusters meet the following requirements:

Enable the Performance HPA profile in a new cluster

Autopilot

To create a new Autopilot cluster that has the Performance HPA profile enabled, use the following command:

gcloud container clusters create-auto CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance

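Replace CLUSTER_NAME with the name of your new cluster, LOCATION with the Compute Engine region for the cluster, PROJECT_ID with your Google Cloud project ID, and CLUSTER_VERSION with the GKE version for the cluster, which must meet the requirements for the Performance HPA profile.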

Standard

To create a new Standard cluster with Performance HPA profile enabled, use the following command:

gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance

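Replace CLUSTER_NAME with the name of your new cluster, LOCATION with the Compute Engine region or zone for the cluster, PROJECT_ID with your Google Cloud project ID, and CLUSTER_VERSION with the GKE version for the cluster, which must meet the requirements for the Performance HPA profile.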

Enable the Performance HPA profile in an existing cluster

To enable the Performance HPA profile in an existing cluster, use the following command:

gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=performance

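Replace CLUSTER_NAME with the name of your existing cluster, LOCATION with the Compute Engine region or zone of the cluster, and PROJECT_ID with your Google Cloud project ID.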

Note: The Performance HPA profile enhances monitoring by increasing the gke-metrics-agent resource requests, and triggers a simultaneous restart of its Pods. This may cause temporary disruption on resource-constrained nodes due to Pod rescheduling.

Disable the Performance HPA profile

To disable the Performance HPA profile in a cluster, use the following command:

gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=none

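Replace CLUSTER_NAME with the name of your cluster, LOCATION with the Compute Engine region or zone of the cluster, and PROJECT_ID with your Google Cloud project ID.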

Viewing details about a Horizontal Pod Autoscaler

To view a Horizontal Pod Autoscaler's configuration and statistics, use the following command:

kubectl describe hpa HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

If the Horizontal Pod Autoscaler uses apiVersion: autoscaling/v2 and is based on multiple metrics, the kubectl describe hpa command only shows the CPU metric. To see all metrics, use the following command instead:

kubectl describe hpa.v2.autoscaling HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

Each Horizontal Pod Autoscaler's current status is shown in the Conditions field, and autoscaling events are listed in the Events field.

Note: If you've enabled the Performance HPA profile, Events: Reason is listed as HpaProfilePerformance.

The output is similar to the following:

Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp:                                     Tue, 05 May 2020 20:07:11 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource memory on pods:                             2220032 / 100Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:                                                <none>
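When you check many autoscalers, you might want to scan their conditions programmatically. The following Python sketch assumes autoscaling/v2 JSON, for example from kubectl get hpa.v2.autoscaling HPA_NAME -o json. It flags conditions in the states that usually deserve a closer look. The function name and the failing sample are hypothetical; the healthy sample mirrors the preceding output.

```python
import json

# Condition states that usually deserve a closer look.
ATTENTION = {
    "AbleToScale": "False",    # the HPA can't change the replica count
    "ScalingActive": "False",  # the HPA couldn't compute a replica count, e.g. no metrics
    "ScalingLimited": "True",  # the desired count was capped at minReplicas or maxReplicas
}


def conditions_needing_attention(hpa_json: str) -> list:
    """Return (type, reason, message) tuples for conditions in an ATTENTION state."""
    status = json.loads(hpa_json).get("status", {})
    return [
        (c["type"], c.get("reason", ""), c.get("message", ""))
        for c in status.get("conditions", [])
        if ATTENTION.get(c["type"]) == c.get("status")
    ]


# Mirrors the Conditions in the kubectl describe output above.
healthy = json.dumps({"status": {"conditions": [
    {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
    {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"},
    {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange"},
]}})
# Hypothetical failure, for illustration only.
failing = json.dumps({"status": {"conditions": [
    {"type": "ScalingActive", "status": "False", "reason": "FailedGetResourceMetric",
     "message": "the HPA was unable to compute the replica count"},
]}})
print(conditions_needing_attention(healthy))  # []
print(conditions_needing_attention(failing))
```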
Deleting a Horizontal Pod Autoscaler

You can delete a Horizontal Pod Autoscaler using the Google Cloud console or the kubectl delete command.

Console

To delete the nginx Horizontal Pod Autoscaler:

  1. Go to the Workloads page in the Google Cloud console.

    Go to Workloads

  2. Click the name of the nginx Deployment.

  3. Click Actions > Autoscale.

  4. Click Delete.

kubectl delete

To delete the nginx Horizontal Pod Autoscaler, use the following command:

kubectl delete hpa nginx

When you delete a Horizontal Pod Autoscaler, the Deployment (or other workload object) remains at its existing scale. It does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the kubectl scale command:

kubectl scale deployment nginx --replicas=3

Cleaning up
  1. Delete the Horizontal Pod Autoscaler, if you have not done so:

    kubectl delete hpa nginx
    
  2. Delete the nginx Deployment:

    kubectl delete deployment nginx
    
  3. Optionally, delete the cluster.

Troubleshooting

This section shows troubleshooting steps for problems using Horizontal Pod Autoscaling.

Horizontal Pod Autoscaler displays an unable to fetch pod metrics for pod error

When you set up a Horizontal Pod Autoscaler, you might see warning messages like the following:

unable to fetch pod metrics for pod

It's normal to see this message when the metrics server starts up. However, if you continue to see the warnings and you notice that Pods are not scaling for your workload, ensure you have specified resource requests for each container in your workload. To use resource utilization percentage targets with horizontal Pod autoscaling, you must configure requests for that resource for each container running in each Pod in the workload. Otherwise, the Horizontal Pod Autoscaler cannot perform the calculations it needs to, and takes no action related to that metric.
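To find which containers lack requests, you can inspect the Deployment's Pod template, for example the output of kubectl get deployment nginx -o json loaded with json.loads. The following Python sketch is one way to do it. The function name and the log-shipper sidecar in the sample are hypothetical.

```python
def containers_missing_requests(deployment: dict, resource: str = "cpu") -> list:
    """Return names of containers in the Pod template without a request for `resource`."""
    containers = deployment["spec"]["template"]["spec"].get("containers", [])
    missing = []
    for container in containers:
        requests = (container.get("resources") or {}).get("requests") or {}
        if resource not in requests:
            missing.append(container["name"])
    return missing


# Shaped like nginx.yaml, plus a hypothetical sidecar that sets no requests.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "nginx", "image": "nginx:1.7.9",
         "resources": {"requests": {"cpu": "250m"}}},
        {"name": "log-shipper", "image": "example/log-shipper:1.0"},
    ]}}}
}
print(containers_missing_requests(deployment))  # ['log-shipper']
```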

Horizontal Pod Autoscaler displays a multiple services selecting the same target of... event

A Horizontal Pod Autoscaler displays a multiple services selecting the same target of <hpa>: <services> error if it detects that you are using traffic-based autoscaling with multiple services associated with the target of the Horizontal Pod Autoscaler (typically a Deployment).

Traffic-based autoscaling only supports configurations where exactly one Service is associated with the autoscaled resource. For more information, see Autoscaling based on load balancer traffic. The error message lists the Services that were found.

To resolve the issue, ensure only one service is associated with the Horizontal Pod Autoscaler.
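To see which Services are associated with the autoscaled workload, you can compare each Service's selector with the Pod template labels. A Service selects a Pod when every key and value in its selector also appears in the Pod's labels, and a Service without a selector selects no Pods. The following Python sketch applies that rule. The Service names and labels in the sample are hypothetical, and the sketch approximates, rather than reproduces, how the Horizontal Pod Autoscaler finds associated Services.

```python
def services_selecting(pod_labels: dict, services: list) -> list:
    """Return names of Services whose selector matches `pod_labels`.

    A Service selects a Pod when every key/value pair in its selector is also
    in the Pod's labels; a Service with no selector selects no Pods.
    """
    matches = []
    for service in services:
        selector = service.get("spec", {}).get("selector") or {}
        if selector and all(pod_labels.get(key) == value for key, value in selector.items()):
            matches.append(service["metadata"]["name"])
    return matches


# Hypothetical Services; the labels are illustrative, not taken from store-autoscale.yaml.
services = [
    {"metadata": {"name": "store-autoscale"}, "spec": {"selector": {"app": "store"}}},
    {"metadata": {"name": "store-debug"}, "spec": {"selector": {"app": "store"}}},
    {"metadata": {"name": "frontend"}, "spec": {"selector": {"app": "frontend"}}},
    {"metadata": {"name": "external-db"}, "spec": {}},
]
matching = services_selecting({"app": "store", "tier": "web"}, services)
print(matching)  # ['store-autoscale', 'store-debug']
if len(matching) > 1:
    print("more than one Service selects these Pods")
```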

What's next
