This page discusses how automatic and manual upgrades work on Google Kubernetes Engine (GKE) Standard clusters, including links to more information about related tasks and settings. You can use this information to keep your clusters updated for stability and security with minimal disruptions to your workloads.
For a general overview of cluster upgrades, see About GKE cluster upgrades. For information on how cluster upgrades work specifically for Autopilot, see Autopilot cluster upgrades.
How cluster and node pool upgrades work

This section discusses what happens in your cluster during automatic or manual upgrades. For auto-upgrades, GKE initiates the upgrade for you. GKE monitors automatic and manual upgrades across all GKE clusters, and intervenes if problems are detected.
To upgrade a cluster, GKE updates the version that the control plane and nodes run. Clusters are upgraded to either a newer minor version (for example, 1.24 to 1.25) or a newer patch version (for example, 1.24.2-gke.100 to 1.24.5-gke.200). For more information, see GKE versioning and support.
Note: A cluster's control plane and nodes do not necessarily run the same version at all times; however, they must adhere to the GKE version skew policy. In this topic, cluster upgrade and control plane upgrade are used interchangeably, and both are distinct from node upgrades. To learn more about how versions work, see Versioning.

If you enroll your cluster in a release channel, nodes run the same version of GKE as the cluster, except during a brief period (typically a few days, depending on the current release) between completing the cluster's control plane upgrade and starting the node pool upgrade, or if the control plane was manually upgraded. Check the release notes for more information.
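To compare the versions that your control plane and nodes are currently running, you can describe the cluster. The following is a minimal sketch; the cluster name and location are placeholders:

# Print the control plane version and the node version.
gcloud container clusters describe CLUSTER_NAME \
    --location=us-central1-a \
    --format="value(currentMasterVersion,currentNodeVersion)"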
Cluster upgrades

This section discusses what to expect when GKE auto-upgrades your cluster or you initiate a manual upgrade.
Zonal clusters have only a single control plane. During the upgrade, your workloads continue to run, but you cannot deploy new workloads, modify existing workloads, or make other changes to the cluster's configuration until the upgrade is complete.
Regional clusters have multiple replicas of the control plane, and only one replica is upgraded at a time, in an undefined order. During the upgrade, the cluster remains highly available, and each control plane replica is unavailable only while it is being upgraded.
If you configure a maintenance window or exclusion, it is honored if possible.
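If you want upgrades to happen only at predictable times, you can configure a maintenance window on the cluster. The following is a minimal sketch that restricts maintenance to weekends; the cluster name, location, and times are placeholders:

# Allow maintenance only between 09:00 and 17:00 UTC on Saturdays and Sundays.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1-a \
    --maintenance-window-start=2024-01-06T09:00:00Z \
    --maintenance-window-end=2024-01-06T17:00:00Z \
    --maintenance-window-recurrence='FREQ=WEEKLY;BYDAY=SA,SU'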
Node pool upgrades

This section discusses what to expect when GKE auto-upgrades your node pool or you initiate a manual node pool upgrade.
GKE automatically upgrades one node pool at a time in a cluster. Alternatively, you can manually upgrade one or more node pools in parallel. By default, nodes within a node pool are upgraded one at a time. In a node pool spread across multiple zones, upgrades take place zone by zone; within a zone, nodes are upgraded in an undefined order.
GKE node pool upgrades offer two configurable, built-in upgrade strategies that let you tune the upgrade process to your cluster environment's needs. To learn more about the surge and blue-green upgrade strategies, see Upgrade strategies.
During a node pool upgrade, you can't make changes to the cluster configuration unless you cancel the upgrade.
GKE honors maintenance windows and exclusions during automatic upgrades when possible. Manual upgrades bypass your configured maintenance windows and exclusions.
Note: Cluster Autoscaler scale-up events may still occur during a node pool upgrade. For a multi-zone node pool, a node may be created running the older node version if that zone's nodes have not yet been upgraded.

How nodes are upgraded

During a node pool upgrade, how the nodes are upgraded depends on the node pool upgrade strategy and how you configure it. However, the basic steps remain consistent: to upgrade a node, GKE removes Pods from the node so that the node can be upgraded.
When a node is upgraded, the following happens with the Pods:

- The node is cordoned so that Kubernetes does not schedule new Pods on it.
- The node is drained, meaning that its Pods are evicted. GKE respects the Pod's PodDisruptionBudget and graceful termination settings.
- Pods managed by a controller, such as a Deployment or StatefulSet, are rescheduled onto other available nodes. Standalone Pods that are not managed by a controller are not recreated.
The node pool upgrade process may take up to a few hours depending on the upgrade strategy, the number of nodes, and their workload configurations.
Considerations affecting node upgrade duration

Configurations that can cause a node upgrade to take longer to complete include:

- Restrictive PodDisruptionBudgets, which limit how many Pods can be evicted at a time.
- Long Pod termination grace periods, which slow down node drains.
- A large number of nodes in the node pool.
- For blue-green upgrades, the configured soak time.
GKE offers built-in, configurable strategies that determine how the node pool is upgraded. To learn more about the types of changes that use a node upgrade strategy, see When GKE uses surge upgrades and When GKE uses blue-green upgrades.
Surge upgrades

By default, GKE uses the surge upgrade strategy for node pool upgrades. Surge upgrades use a rolling method to upgrade nodes, and this strategy is best for applications that can handle incremental, non-disruptive changes. With the strategy's settings, you can change how many nodes are upgraded at once and how disruptive the upgrade can be, finding the balance of speed and disruption that best fits your environment's needs.
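For example, you can tune the surge settings on an existing node pool. The following is a minimal sketch; the node pool name, cluster name, and location are placeholders:

# Create up to 2 surge nodes at a time, and keep every existing node
# available until its replacement is ready.
gcloud container node-pools update NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=us-central1-a \
    --max-surge-upgrade=2 \
    --max-unavailable-upgrade=0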
Blue-green upgrades

The alternative approach is the blue-green upgrade strategy, where two sets of environments (the original and the new) are maintained at once, making rollback as easy as possible. Blue-green upgrades are more resource-intensive and better suited to applications that are more sensitive to changes. With this strategy, workloads are gradually migrated from the original "blue" environment to the new "green" environment and given soak time to validate them with the new configuration. If needed, the workloads can be quickly rolled back to the original "blue" environment.
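For example, you can switch an existing node pool to the blue-green strategy and set how long workloads soak on the new nodes. A minimal sketch; the names, location, and duration are placeholders:

# Use blue-green upgrades with a 30-minute soak on the green environment.
gcloud container node-pools update NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=us-central1-a \
    --enable-blue-green-upgrade \
    --node-pool-soak-duration=1800s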
To learn more about how the node upgrade strategies work, see Node upgrade strategies.
Resource requirements for node upgrade strategies

Surge upgrades create extra nodes if maxSurge is set to more than 0, and blue-green upgrades temporarily double the number of nodes in a node pool. This requires additional resources, which are subject to Compute Engine quota, resource availability, and reservation capacity. If your node pool doesn't have sufficient resources, upgrades can take longer or fail.
To learn more about how to ensure your project has enough resources for node upgrades, and what to do if your environment is resource-constrained, see Ensure resources for node upgrades.
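Before a large node pool upgrade, you can check the relevant Compute Engine quotas in the node pool's region. A sketch assuming the us-central1 region; the format expression is one way to tabulate the quota fields:

# List each quota's metric, current usage, and limit for the region.
gcloud compute regions describe us-central1 \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.usage,quotas.limit)"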
Upgrading automatically

When you create a Standard cluster, by default, auto-upgrade is enabled on the cluster and its node pools.
GKE is responsible for securing your cluster's control plane, and upgrades your clusters when a new GKE version is selected for auto-upgrade. Infrastructure security is a high priority for GKE, so control planes are upgraded on a regular basis, and control plane upgrades cannot be disabled. However, you can apply maintenance windows and exclusions to temporarily suspend upgrades for control planes and nodes.
As part of the GKE shared responsibility model, you are responsible for securing your nodes, containers, and Pods. Node auto-upgrade is enabled by default. Although it is not recommended, you can disable node auto-upgrade. Opting out of node auto-upgrades does not block your cluster's control plane upgrades. If you opt out of node auto-upgrades, you are responsible for ensuring that the cluster's nodes run a version compatible with the cluster's version, and that the version adheres to the GKE version skew policy.
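If you do opt out, node auto-upgrade is disabled per node pool. A minimal sketch with placeholder names; again, disabling auto-upgrade is not recommended:

# Disable automatic node upgrades for one node pool.
gcloud container node-pools update NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=us-central1-a \
    --no-enable-autoupgrade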
For more control over when an auto-upgrade can occur (or must not occur), you can configure maintenance windows and exclusions.
A cluster's node pools can be no more than two minor versions behind the control plane version, to maintain compatibility with the cluster API. The node pool version also determines the versions of software packages installed on each node. It is recommended to keep node pools updated to the cluster version.
How versions are selected for auto-upgrade

New GKE versions are released regularly, but a version is not selected for auto-upgrade right away. When a GKE version has accumulated enough cluster usage to prove stability over time, GKE selects it as an auto-upgrade target for clusters running a subset of older versions. To get auto-upgrade targets for a specific cluster, see Get information about a cluster's upgrades.
New auto-upgrade targets are announced in the release notes. Until an available version is selected for auto-upgrade, you can upgrade to it manually. Occasionally, a version is selected for cluster auto-upgrade and node auto-upgrade during different weeks.
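To see which versions are currently available, and which are defaults, in a given location, you can query the GKE server config. A sketch assuming the us-central1-a zone:

# Show available and default versions and release channel configurations.
gcloud container get-server-config --location=us-central1-a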
Soon after a new minor version becomes generally available, the oldest available minor version typically becomes unsupported. Clusters running minor versions that become unsupported are automatically upgraded to the next minor version.
Within a minor version (such as 1.24.x), clusters can be automatically upgraded to a new patch release.
Release channels allow you to control your cluster and node pool version based on a version's stability rather than managing the version directly.
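For example, you can enroll an existing cluster in the Regular channel so that GKE manages its version based on that channel's stability level. A minimal sketch with placeholder names:

# Enroll the cluster in the Regular release channel.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1-a \
    --release-channel=regular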
Note: Node auto-upgrade is not available for Alpha clusters. Also, Alpha clusters cannot be enrolled in release channels.

Factors that affect version rollout timing

To ensure the stability and reliability of clusters on new versions, GKE follows certain practices during version rollouts.
These practices include, but are not limited to:

- Rolling out new versions gradually, across zones and regions, over multiple days.
- Monitoring the health of upgraded clusters, and pausing a rollout if problems are detected.
By default, auto-upgrades can occur at any time to preserve infrastructure security. Auto-upgrades are minimally disruptive, especially for regional clusters. However, some workloads may require finer-grained control. You can configure maintenance windows and exclusions to manage when auto-upgrades can and must not occur.
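In addition to recurring maintenance windows, you can add maintenance exclusions to block upgrades during sensitive periods, such as a year-end freeze. A minimal sketch; the exclusion name, dates, and cluster details are placeholders:

# Block upgrades for the named period.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1-a \
    --add-maintenance-exclusion-name=end-of-year-freeze \
    --add-maintenance-exclusion-start=2024-12-20T00:00:00Z \
    --add-maintenance-exclusion-end=2025-01-05T00:00:00Z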
Note: If you configure maintenance windows and exclusions, the upgrade does not start until the current time is within a maintenance window. If a maintenance window expires before the upgrade completes, GKE attempts to pause the upgrade, and attempts to resume it during the next occurrence of the maintenance window.

Upgrading manually

You can request to manually upgrade your cluster or its node pools to an available and compatible version at any time. Manual upgrades bypass any configured maintenance windows and maintenance exclusions.
Note: You cannot upgrade your cluster more than one minor version at a time. For example, you can upgrade a cluster from version 1.24.x to 1.25.x, but not directly from 1.23.x to 1.25.x. For more information, see Versioning and upgrades.

When you manually upgrade a cluster, its availability depends on whether the cluster is zonal or regional:
For zonal clusters, the control plane is unavailable while it is being upgraded. For the most part, workloads run normally but cannot be modified during the upgrade.
For regional clusters, one replica of the control plane is unavailable at a time while it is upgraded, but the cluster remains highly available during the upgrade.
You can manually initiate a node upgrade to a version compatible with the control plane.
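For example, you can manually upgrade the control plane and then a node pool. A minimal sketch; the cluster name, node pool name, location, and version are placeholders:

# Upgrade the control plane to a specific version.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=us-central1-a \
    --master \
    --cluster-version=1.25.8-gke.500

# Then upgrade a node pool to match the control plane's version.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=us-central1-a \
    --node-pool=NODE_POOL_NAME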
How GKE responds to auto-upgrade failure

Node pool auto-upgrades can fail because of issues with the underlying Compute Engine instances, or because of issues with Kubernetes. For example, an auto-upgrade fails if your maxSurge setting exceeds your Compute Engine resource quota.

When issues occur with individual node upgrades, GKE retries the upgrade a few times, with an increasing interval between retries. If nodes in the node pool fail to upgrade, GKE does not roll back the upgraded nodes. Instead, GKE retries the node pool auto-upgrade until all the nodes are successfully upgraded.
If your node upgrades fail because your surge node requests exceed your Compute Engine quota, GKE reduces the number of concurrent surge nodes to attempt to meet the quota and continue the upgrade.
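To check the status of recent node upgrade operations, including retries after a failure, you can list cluster operations. A sketch assuming a zonal location:

# List node upgrade operations and their status.
gcloud container operations list \
    --location=us-central1-a \
    --filter="operationType=UPGRADE_NODES"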
Note: GKE maintains the existing node capacity during a surge upgrade as long as maxSurge is more than 0 and maxUnavailable=0. Your workloads continue to run even when the node upgrade fails.

Receiving upgrade notifications
GKE publishes notifications about events relevant to your cluster, such as version upgrades and security bulletins, to Pub/Sub, providing you with a channel to receive information from GKE about your clusters.
For more information, see Receiving cluster notifications.
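For example, you can enable notifications on an existing cluster by pointing it at a Pub/Sub topic. A minimal sketch; the cluster, project, and topic names are placeholders:

# Publish cluster notifications, such as upgrade events, to a Pub/Sub topic.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1-a \
    --notification-config=pubsub=ENABLED,pubsub-topic=projects/PROJECT_ID/topics/TOPIC_NAME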
Check upgrade logs

GKE logs control plane and node pool upgrade events to Cloud Logging by default. Upgrade event logs provide visibility into the upgrade process, and include valuable information for troubleshooting if needed.
Control plane upgrade logs

Cluster upgrade events can be queried using the following filter:
resource.type="gke_cluster" protoPayload.metadata.operationType=~"(UPDATE_CLUSTER|UPGRADE_MASTER)" resource.labels.cluster_name="CLUSTER_NAME"
These logs are recorded in a structured format. You can use the following fields for the details of the upgrade events:
protoPayload.metadata.operationType
There are two types of cluster upgrade events: UPGRADE_MASTER and UPDATE_CLUSTER. UPGRADE_MASTER changes the Kubernetes control plane version; UPDATE_CLUSTER is an update that does not change the Kubernetes control plane version.

protoPayload.methodName
Shows whether the event was triggered manually or automatically:
- google.container.v1.ClusterManager.UpdateCluster: manual control plane upgrade
- google.container.internal.ClusterManagerInternal.UpdateClusterInternal: automatic control plane upgrade
- google.container.v1.ClusterManager.PatchCluster: cluster configuration change

protoPayload.metadata.previousMasterVersion
Used only for the UPGRADE_MASTER operation type; contains the control plane version used before the upgrade.

protoPayload.metadata.currentMasterVersion
Used only for the UPGRADE_MASTER operation type; contains the new control plane version used after the upgrade.
Node pool upgrade logs
Use the following query to view node pool upgrade events:
resource.type="gke_nodepool" protoPayload.metadata.operationType="UPGRADE_NODES" resource.labels.cluster_name="CLUSTER_NAME"
Use the following field for details about the upgrade event:
The protoPayload.methodName field shows whether the upgrade was triggered manually or automatically:
- google.container.v1.ClusterManager.UpdateNodePool: manual node pool upgrade
- google.container.internal.ClusterManagerInternal.UpdateClusterInternal: automatic node pool upgrade

GKE runs system workloads on worker nodes to support specific capabilities for clusters. For example, the gke-metadata-server system workload supports Workload Identity Federation for GKE. GKE is responsible for the health of these workloads. To learn more about these components, refer to the documentation for the associated capabilities.
When new features or fixes become available for a component, GKE indicates the patch version in which they are included. To obtain the latest version of a component, refer to the associated documentation or release notes for instructions on upgrading your control plane or nodes to the appropriate version.