Troubleshoot network isolation in GKE | GKE networking

This page shows you how to resolve issues with Google Kubernetes Engine (GKE) network isolation.

GKE cluster not running

Deleting the firewall rules that allow ingress traffic from the cluster control plane to nodes on port 10250, or deleting the default route to the default internet gateway, causes a cluster to stop functioning. If you delete the default route, you must ensure traffic to necessary Google Cloud services is routed. For more information, see custom routing.
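If you suspect that the firewall rule was deleted, you can list the GKE-managed rules and recreate one that allows the control plane to reach the kubelet port. The following is a minimal sketch; NETWORK_NAME, CONTROL_PLANE_CIDR, NODE_TAG, and the rule name are placeholders that you must replace with your own values:

# Check which GKE-related firewall rules still exist
gcloud compute firewall-rules list --filter="name~^gke-CLUSTER_NAME"

# Recreate a rule that allows the control plane to reach the kubelet port
gcloud compute firewall-rules create allow-control-plane-to-kubelet \
    --network=NETWORK_NAME \
    --direction=INGRESS \
    --source-ranges=CONTROL_PLANE_CIDR \
    --allow=tcp:10250 \
    --target-tags=NODE_TAG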

Timeout when creating a cluster
Symptoms
Clusters created in version 1.28 or earlier with private nodes require a peering route between VPCs. However, only one peering operation can happen at a time. If you attempt to create multiple clusters with the preceding characteristics at the same time, cluster creation may time out.
Resolution

Use one of the following solutions:

VPC Network Peering connection is accidentally deleted
Symptoms

When you accidentally delete a VPC Network Peering connection, the cluster goes into a repair state and all nodes show an UNKNOWN status. You can't perform any operations on the cluster because connectivity to the control plane is lost. When you inspect the control plane, the logs display an error similar to the following:

error checking if node NODE_NAME is shutdown: unimplemented
Potential causes

You accidentally deleted the VPC Network Peering connection.

Resolution

  1. Create a new GKE cluster with a version that predates the Private Service Connect (PSC) switch and its specific configurations. This step is necessary to force the re-creation of the VPC Network Peering connection, which restores the old cluster to normal operation.
  2. Monitor the status of the original cluster, for example with the command sketch after this list.
  3. Delete the temporarily created GKE cluster.
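To monitor the original cluster (step 2) until it returns to a RUNNING status, a minimal sketch; CLUSTER_NAME and LOCATION are placeholders:

# Poll the cluster status until it reports RUNNING
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(status)"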
Private Service Connect endpoint and forwarding rule are accidentally deleted
Symptoms

When you accidentally delete a Private Service Connect endpoint or forwarding rule, the cluster goes into a repair state and all nodes show an UNKNOWN status. You can't perform any operations on the cluster because access to the control plane is lost. When you inspect the control plane, the logs display an error similar to the following:

error checking if node NODE_NAME is shutdown: unimplemented
Potential causes

You accidentally deleted the Private Service Connect endpoint or forwarding rule. Both resources are named gke-[cluster-name]-[cluster-hash:8]-[uuid:8]-pe and let the control plane and nodes connect privately.

Resolution

Rotate your control plane IP address.
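A rotation can be started and completed with the gcloud CLI; a minimal sketch, assuming CLUSTER_NAME and LOCATION placeholders:

# Start the control plane IP rotation
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --start-ip-rotation

# Complete the rotation after the nodes have been updated
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --complete-ip-rotation

After starting the rotation, make sure your node pools have been updated to use the new control plane IP address before you complete the rotation.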

Cluster overlaps with active peer
Symptoms

Attempting to create a cluster without an external endpoint returns an error similar to the following:

Google Compute Engine: An IP range in the peer network overlaps with an IP
range in an active peer of the local network.
Potential causes

You chose an overlapping control plane CIDR.

Resolution

Use one of the following solutions:

Can't reach control plane of a cluster with no external endpoint

Increase the likelihood that your cluster control plane is reachable by implementing any of the cluster endpoint access configurations. For more information, see access to cluster endpoints.

Symptoms

After creating a cluster with no external endpoint, attempting to run kubectl commands against the cluster returns an error similar to one of the following:

Unable to connect to the server: dial tcp [IP_ADDRESS]: connect: connection
timed out.
Unable to connect to the server: dial tcp [IP_ADDRESS]: i/o timeout.
Potential causes

kubectl is unable to talk to the cluster control plane.

Resolution

Use one of the following solutions:
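For example, one approach is to run kubectl from a host inside the same VPC network (such as a bastion VM) and fetch credentials that use the control plane's internal IP address; a hedged sketch with CLUSTER_NAME and LOCATION placeholders:

# Fetch kubeconfig credentials that point at the internal control plane endpoint
gcloud container clusters get-credentials CLUSTER_NAME \
    --location=LOCATION \
    --internal-ip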

Can't create cluster due to overlapping IPv4 CIDR block
Symptoms

gcloud container clusters create returns an error similar to the following:

The given master_ipv4_cidr 10.128.0.0/28 overlaps with an existing network
10.128.0.0/20.
Potential causes

You specified a control plane CIDR block that overlaps with an existing subnet in your VPC.

Resolution

Specify a CIDR block for --master-ipv4-cidr that does not overlap with an existing subnet.
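To choose a non-overlapping block, you can first list the ranges already in use in your VPC network; a minimal sketch with a placeholder network name:

# List the subnet ranges already allocated in the VPC network
gcloud compute networks subnets list \
    --network=NETWORK_NAME \
    --format="table(name,ipCidrRange)"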

Can't create cluster due to services range already in use by another cluster
Symptoms

Attempting to create a cluster returns an error similar to the following:

Services range [ALIAS_IP_RANGE] in network [VPC_NETWORK], subnetwork
[SUBNET_NAME] is already used by another cluster.
Potential causes

The following configurations might cause this error:

Resolution

Follow these steps:

  1. Check whether the services range is in use by an existing cluster. You can use the gcloud container clusters list command with the filter flag to search for the cluster, as shown in the sketch after this list. If an existing cluster is using the services range, you must delete that cluster or create a new services range.
  2. If the services range is not in use by an existing cluster, then manually remove the metadata entry that matches the services range you want to use.
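To search for a cluster that uses the services range (step 1), a hedged sketch; replace SERVICES_IP_RANGE with the range from the error message:

# Find clusters whose services range matches the one in the error
gcloud container clusters list \
    --filter="servicesIpv4Cidr=SERVICES_IP_RANGE"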
Can't create a subnet
Symptoms

When you attempt to create a cluster with an automatic subnet, or to create a custom subnet, you might encounter any of the following errors:

An IP range in the peer network overlaps
with an IP range in one of the active peers of the local network.
Error: Error waiting for creating GKE cluster: Invalid value for field
PrivateClusterConfig.MasterIpv4CidrBlock: x.x.x.x/28 conflicts with an
existing subnet in one of the peered VPCs.
Potential causes

The control plane CIDR range you specified overlaps with another IP range in the cluster. This subnet creation error can also occur if you're attempting to reuse a master-ipv4-cidr CIDR that was used in a recently deleted cluster.

Resolution

Try using a different CIDR range.

Can't pull image from public Docker Hub
Symptoms

A Pod running in your cluster displays a warning in kubectl describe:

Failed to pull image: rpc error: code = Unknown desc = Error response
from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled
while waiting for connection (Client.Timeout exceeded while awaiting
headers)
Potential causes

Nodes that have only private IP addresses need additional configuration to meet internet access requirements. However, the nodes can access Google Cloud APIs and services, including Artifact Registry, if you have enabled Private Google Access and met its network requirements.

Resolution

Use one of the following solutions:

API request that triggers admission webhook timing out
Symptoms

An API request that triggers an admission webhook configured to use a service with a targetPort other than 443 times out, causing the request to fail:

Error from server (Timeout): request did not complete within requested timeout 30s
Potential causes

By default, the firewall does not allow TCP connections to nodes except on ports 443 (HTTPS) and 10250 (kubelet). An admission webhook attempting to communicate with a Pod on a port other than 443 fails unless a custom firewall rule permits the traffic.

Resolution

Add a firewall rule for your specific use case.
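For example, if the webhook service targets port 8443 on the Pods (a hypothetical port used only for illustration), you could allow ingress from the control plane CIDR to the nodes on that port; NETWORK_NAME, CONTROL_PLANE_CIDR, and NODE_TAG are placeholders:

# Allow the control plane to reach the webhook's target port on the nodes
gcloud compute firewall-rules create allow-webhook-from-control-plane \
    --network=NETWORK_NAME \
    --direction=INGRESS \
    --source-ranges=CONTROL_PLANE_CIDR \
    --allow=tcp:8443 \
    --target-tags=NODE_TAG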

Can't create cluster due to health check failing
Symptoms

After you create a Standard cluster with private node pools, the cluster gets stuck at the health check step and reports an error similar to one of the following:

All cluster resources were brought up, but only 0 of 2 have registered.
All cluster resources were brought up, but: 3 nodes out of 4 are unhealthy
Potential causes

The following configurations might cause this error:

Resolution

Use one of the following solutions:

kubelet Failed to create pod sandbox
Symptoms

After you create a cluster with private nodes, the cluster reports an error similar to one of the following:

Warning  FailedCreatePodSandBox  12s (x9 over 4m)      kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = Error response from daemon: Get https://registry.k8s.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Potential causes

The calico-node or netd Pod can't reach *.gcr.io.

Resolution

Ensure you have completed the required setup for Container Registry or Artifact Registry.
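One common check is whether Private Google Access is enabled on the cluster's subnet, so that nodes without external IP addresses can reach Google-hosted registries; a hedged sketch with SUBNET_NAME and REGION placeholders:

# Check whether Private Google Access is enabled on the subnet
gcloud compute networks subnets describe SUBNET_NAME \
    --region=REGION \
    --format="value(privateIpGoogleAccess)"

# Enable it if the previous command returns False
gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-private-ip-google-access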

Private nodes created but not joining the cluster

In clusters that use nodes with only private IP addresses, custom routing and third-party network appliances on the VPC often redirect the default route (0.0.0.0/0) to the appliance instead of to the default internet gateway. In addition to control plane connectivity, you must ensure that the following destinations are reachable:

Configure Private Google Access for all three domains. This best practice allows the new nodes to start up and join the cluster while keeping internet-bound traffic restricted.
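If your default route points to a network appliance, you can add a more specific route for the Private Google Access range so that this traffic still uses the default internet gateway. The following is a sketch under the assumption that you use the private.googleapis.com range (199.36.153.8/30); verify the range and the matching DNS configuration for your environment:

# Route Private Google Access traffic through the default internet gateway
gcloud compute routes create route-private-google-access \
    --network=NETWORK_NAME \
    --destination-range=199.36.153.8/30 \
    --next-hop-gateway=default-internet-gateway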

Workloads on GKE clusters unable to access internet

Pods running on nodes with private IP addresses can't access the internet. For example, running the apt update command from a Pod's exec shell reports an error similar to the following:

0% [Connecting to deb.debian.org (199.232.98.132)] [Connecting to security.debian.org (151.101.130.132)]

If the subnet's secondary IP address range that is used for Pods in the cluster is not configured on the Cloud NAT gateway, the Pods can't connect to the internet because they don't have external IP addresses and their range isn't covered by the Cloud NAT gateway.

Ensure you configure the Cloud NAT gateway to apply at least the following subnet IP address ranges for the subnet that your cluster uses:

To learn more, see how to add a secondary subnet IP address range used for Pods.
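As a hedged sketch, you can configure an existing Cloud NAT gateway to cover both the subnet's primary range and the Pods' secondary range; NAT_NAME, ROUTER_NAME, REGION, SUBNET_NAME, and PODS_RANGE_NAME are placeholders:

# Include the subnet's primary range and the Pods' secondary range in Cloud NAT
gcloud compute routers nats update NAT_NAME \
    --router=ROUTER_NAME \
    --region=REGION \
    --nat-custom-subnet-ip-ranges=SUBNET_NAME,SUBNET_NAME:PODS_RANGE_NAME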

Direct IP access can't be disabled for public clusters
Symptoms

After disabling the IP address endpoint, you see an error message similar to the following:

Direct IP access can't be disabled for public clusters
Potential causes

Your cluster uses a legacy network.

Resolution

Migrate your clusters to Private Service Connect. For more information about the status of the migration, contact support.

Direct IP access can't be disabled for clusters in the middle of PSC migration
Symptoms

After disabling the IP address endpoint, you see an error message similar to the following:

Direct IP access can't be disabled for public clusters
Potential causes

Your cluster uses a legacy network.

Resolution

Use one of the following solutions:

Control plane internal endpoint can't be enabled
Symptoms

When attempting to enable the internal endpoint of your cluster's control plane, you see error messages similar to the following:

private_endpoint_enforcement_enabled can't be enabled when envoy is disabled
private_endpoint_enforcement_enabled is unsupported. Please upgrade to the minimum support version
Potential causes

Your cluster needs an IP address rotation or a version upgrade.

Resolution

Use one of the following solutions:

Cluster creation fails when organization policies are defined
Symptoms

When attempting to create a cluster, you see an error message similar to the following:

compute.disablePrivateServiceConnectCreationForConsumers violated for projects
Potential causes

The cluster endpoint or backend is blocked by a consumer organization policy.

Resolution

Allow instances to create endpoints with the compute.restrictPrivateServiceConnectProducer constraint by completing the steps in Consumer-side organization policies.

The Private Service Connect endpoint might leak during cluster deletion
Symptoms

After deleting a cluster, you might see one of the following symptoms:

Potential causes

On GKE clusters that use Private Service Connect, GKE deploys a Private Service Connect endpoint by using a forwarding rule that allocates an internal IP address to access the cluster's control plane in the control plane's network. To protect the communication between the control plane and the nodes by using Private Service Connect, GKE keeps the endpoint invisible, and you can't see it in the Google Cloud console or the gcloud CLI.

Resolution

To prevent leaking the Private Service Connect endpoint before cluster deletion, complete the following steps:

  1. Assign the Kubernetes Engine Service Agent role to the GKE service account, as shown in the sketch after this list.
  2. Ensure that the compute.forwardingRules.* and compute.addresses.* permissions are not explicitly denied to the GKE service account.
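To assign the Kubernetes Engine Service Agent role (step 1), a minimal sketch; PROJECT_ID and PROJECT_NUMBER are placeholders for your project:

# Grant the Kubernetes Engine Service Agent role to the GKE service agent
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
    --role="roles/container.serviceAgent"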

If you see that the Private Service Connect endpoint has leaked, contact support.

Unable to parse the cluster's authorized network
Symptoms

You can't create a cluster in version 1.29 or later. An error message similar to the following appears:

Unable to parse cluster.master_ipv4_cidr "" into a valid IP address and mask
Potential causes

Your Google Cloud project uses private IP-based webhooks. Therefore, you can't create a cluster with Private Service Connect. Instead, your cluster uses VPC Network Peering, which parses the master-ipv4-cidr flag.

Resolution

Use one of the following solutions:

Unable to define internal IP address range in clusters with public nodes
Symptoms

You can't define an internal IP address range by using the --master-ipv4-cidr flag. An error message similar to the following appears:

ERROR: (gcloud.container.clusters.create) Cannot specify --master-ipv4-cidr
  without --enable-private-nodes
Potential causes

You are defining an internal IP address range for the control plane with the master-ipv4-cidr flag in a cluster without the enable-private-nodes flag enabled. To create a cluster with master-ipv4-cidr defined, you must configure your cluster to provision nodes with internal IP addresses (private nodes) by using the enable-private-nodes flag.

Resolution

Use one of the following solutions:
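For example, one option is to create the cluster with private nodes so that the --master-ipv4-cidr flag is accepted; a hedged sketch where the CIDR value is illustrative only:

# Create a cluster with private nodes and a control plane range
gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --enable-private-nodes \
    --master-ipv4-cidr=172.16.0.32/28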

Unable to schedule public workloads on Autopilot clusters
Symptoms
On Autopilot clusters, if your cluster uses only private nodes, you can't schedule public workloads by using the cloud.google.com/private-node=false nodeSelector.
Potential causes
Setting the private-node flag to false in the Pod's nodeSelector is only available in clusters running version 1.30.3 or later.
Resolution
Upgrade your cluster to version 1.30.3 or later.
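A hedged sketch of upgrading the control plane with the gcloud CLI; VERSION is a placeholder and must be a version available in your release channel:

# Upgrade the cluster control plane to the target version
gcloud container clusters upgrade CLUSTER_NAME \
    --location=LOCATION \
    --master \
    --cluster-version=VERSION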
Access to the DNS-based endpoint is disabled
Symptoms

Attempting to run kubectl commands against the cluster returns an error similar to the following:

couldn't get current server API group list:
control_plane_endpoints_config.dns_endpoint_config.allow_external_traffic is
disabled
Potential causes

DNS-based access has been disabled on your cluster.

Resolution

Enable access to the control plane by using the DNS-based endpoint of the control plane. To learn more, see Modify the control plane access.
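A hedged sketch of enabling DNS-based access on the control plane and then fetching credentials that use the DNS endpoint; these flags assume a recent gcloud CLI version:

# Enable DNS-based access to the control plane
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --enable-dns-access

# Fetch credentials that use the DNS-based endpoint
gcloud container clusters get-credentials CLUSTER_NAME \
    --location=LOCATION \
    --dns-endpoint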

Nodes fail to allocate IP address during scaling
Symptoms

Attempting to expand the subnet's primary IP address range to the list of authorized networks returns an error similar to the following:

 authorized networks fields cannot be mutated if direct IP access is disabled
Potential causes

You have disabled the cluster IP-based endpoint.

Resolution

Disable and then re-enable the cluster's IP-based endpoint by using the enable-ip-access flag.
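A hedged sketch of toggling the IP-based endpoint off and back on; these flags assume a recent gcloud CLI version:

# Disable the IP-based endpoint
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --no-enable-ip-access

# Re-enable the IP-based endpoint
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --enable-ip-access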

Too many CIDR blocks

gcloud returns the following error when attempting to create or update a cluster with more than 50 CIDR blocks:

ERROR: (gcloud.container.clusters.update) argument --master-authorized-networks: too many args

To resolve this issue, try the following:

Unable to connect to the server

kubectl commands time out due to incorrectly configured CIDR blocks:

Unable to connect to the server: dial tcp MASTER_IP: getsockopt: connection timed out

When you create or update a cluster, ensure that you specify the correct CIDR blocks.
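For example, a hedged sketch of re-specifying the authorized networks with correct CIDR blocks; the ranges shown are placeholders:

# Replace the authorized networks list with the correct CIDR blocks
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --enable-master-authorized-networks \
    --master-authorized-networks=203.0.113.0/29,198.51.100.0/30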

Nodes can access public container images despite network isolation
Symptoms

You might observe that in a GKE cluster configured for network isolation, pulling a common public image like redis works, but pulling a less common or private image fails.

This behavior is expected due to GKE's default configuration and doesn't indicate that GKE has bypassed your network isolation.

Potential causes

This behavior occurs because of two features working together:

When you try to pull an image like redis, your node uses the private path from Private Google Access to connect to mirror.gcr.io. Because redis is a very common image, it exists in the cache, and the pull succeeds. However, if you request an image that isn't in this public cache, the pull fails because your isolated node has no other way to reach its original source.

Resolution

If an image that you need isn't available in the mirror.gcr.io cache, host it in your own private Artifact Registry repository. Your network-isolated nodes can access this repository using Private Google Access.
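A hedged sketch of copying a public image into your own Artifact Registry repository from a machine that has internet access; the repository name, region, and project are placeholders:

# Create a Docker repository in Artifact Registry
gcloud artifacts repositories create my-images \
    --repository-format=docker \
    --location=us-central1

# Configure Docker to authenticate to Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev

# Pull the public image, retag it, and push it to your private repository
docker pull redis:latest
docker tag redis:latest us-central1-docker.pkg.dev/PROJECT_ID/my-images/redis:latest
docker push us-central1-docker.pkg.dev/PROJECT_ID/my-images/redis:latest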

What's next
