This page shows you how to resolve storage-related issues on your Google Kubernetes Engine (GKE) clusters.
Error 400: Cannot attach RePD to an optimized VM

Regional persistent disks are restricted from being used with memory-optimized machines or compute-optimized machines.
Consider using a non-regional persistent disk storage class if using a regional persistent disk is not a hard requirement. If using a regional persistent disk is a hard requirement, consider scheduling strategies such as taints and tolerations to ensure that the Pods that need regional persistent disks are scheduled on a node pool that doesn't use optimized machines.
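For example, the following is a minimal sketch of a zonal (non-regional) StorageClass that uses the Compute Engine Persistent Disk CSI driver; the StorageClass name and the pd-balanced disk type are illustrative values, not taken from this page:

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zonal-balanced              # example name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced                 # zonal disk type; pd-ssd or pd-standard also work
  replication-type: none            # zonal persistent disk instead of regional-pd
volumeBindingMode: WaitForFirstConsumer
EOF

Reference the StorageClass from the storageClassName field of the PersistentVolumeClaims that don't need regional persistent disks.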
Troubleshooting issues with disk performance

The performance of the boot disk is important because the boot disk for GKE nodes is not only used for the operating system but also for the following:

- /tmp
- emptyDir volumes, unless the node uses local SSD.

Disk performance is shared for all disks of the same disk type on a node. For example, if you have a 100 GB pd-standard boot disk and a 100 GB pd-standard PersistentVolume with lots of activity, the performance of the boot disk is that of a 200 GB disk. Also, if there is a lot of activity on the PersistentVolume, this impacts the performance of the boot disk as well.
If you encounter messages similar to the following on your nodes, these could be symptoms of low disk performance:
INFO: task dockerd:2314 blocked for more than 300 seconds.
fs: disk usage and inodes count on following dirs took 13.572074343s
PLEG is not healthy: pleg was last seen active 6m46.842473987s ago; threshold is 3m0s
To help resolve such issues, review the amount of I/O that your workloads generate on the boot disk, including I/O from emptyDir volumes.
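As one possible mitigation (an assumption based on the shared-performance behavior described earlier, not a step listed on this page), you could provision node pools with a larger or faster boot disk. A sketch, with placeholder cluster, node pool, and zone names:

# Create a node pool whose nodes use a 200 GB pd-ssd boot disk.
gcloud container node-pools create example-pool \
    --cluster=example-cluster \
    --zone=us-central1-a \
    --disk-type=pd-ssd \
    --disk-size=200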
Volume mounting fails due to the fsGroup setting
One issue that can cause PersistentVolume mounting to fail is a Pod that is configured with the fsGroup setting. Normally, mounts automatically retry and the mount failure resolves itself. However, if the PersistentVolume has a large number of files, the kubelet attempts to change ownership of each file on the file system, which can increase volume mount latency. When this happens, you might see an error similar to the following:

Unable to attach or mount volumes for pod; skipping pod ... timed out waiting for the condition
To confirm whether a failed mount error is due to the fsGroup setting, check the logs for the Pod. If the issue is related to the fsGroup setting, you see the following log entry:
Setting volume ownership for /var/lib/kubelet/pods/POD_UUID and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
If the PersistentVolume does not mount within a few minutes, try the following steps to resolve this issue:

- Remove the fsGroup setting from the Pod specification.
- Set fsGroupChangePolicy to OnRootMismatch, so that the kubelet changes ownership only when the root of the volume doesn't already match the expected fsGroup.
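A minimal sketch of the second option, assuming your Pod already sets fsGroup and mounts an existing PersistentVolumeClaim (the Pod name, image, and claim name are placeholders):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                     # example name
spec:
  securityContext:
    fsGroup: 1000                        # your existing fsGroup value
    fsGroupChangePolicy: OnRootMismatch  # skip the recursive ownership change when the volume root already matches
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9     # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc                  # your existing PersistentVolumeClaim
EOF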
Error: failed to reserve container name

For more information about this issue, refer to containerd issue #4604.
Affected GKE node versions: 1.18, 1.19, 1.20.0 to 1.20.15-gke.2100, 1.21.0 to 1.21.9-gke.2000, 1.21.10 to 1.21.10-gke.100, 1.22.0 to 1.22.6-gke.2000, 1.22.7 to 1.22.7-gke.100, 1.23.0 to 1.23.3-gke.700, 1.23.4 to 1.23.4-gke.100
The following example errors might be displayed in the k8s_node container-runtime logs:
Error: failed to reserve container name "container-name-abcd-ef12345678-91011_default_12131415-1234-5678-1234-12345789012_0": name "container-name-abcd-ef12345678-91011_default_12131415-1234-5678-1234-12345789012_0" is reserved for "1234567812345678123456781234567812345678123456781234567812345678"
Mitigation

Use restartPolicy:Always or restartPolicy:OnFailure in your PodSpec. This issue is fixed in containerd 1.6.0+. GKE versions with this fix are 1.20.15-gke.2100+, 1.21.9-gke.2000+, 1.21.10-gke.100+, 1.22.6-gke.2000+, 1.22.7-gke.100+, 1.23.3-gke.1700+, and 1.23.4-gke.100+.
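A minimal sketch of a Pod that sets restartPolicy explicitly (the Pod name and image are placeholders):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo          # example name
spec:
  restartPolicy: Always              # or OnFailure
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9 # placeholder image
EOF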
Volume expansion changes not reflecting in the container file system

When performing volume expansion, always make sure to update the PersistentVolumeClaim. Changing a PersistentVolume directly can result in volume expansion not happening. This could lead to one of the following scenarios:
- If a PersistentVolume object is modified directly, both the PersistentVolume and PersistentVolumeClaim values are updated to a new value, but the file system size is not reflected in the container and is still using the old volume size.
- If a PersistentVolume object is modified directly, followed by updates to the PersistentVolumeClaim where the status.capacity field is updated to a new size, this can result in changes to the PersistentVolume but not the PersistentVolumeClaim or the container file system.
To resolve this issue, complete the following steps:

1. Edit the PersistentVolumeClaim and set spec.resources.requests.storage to a value that is higher than was used in the PersistentVolume (see the sketch after these steps). After these changes, the PersistentVolume, PersistentVolumeClaim, and container file system should be automatically resized by the kubelet.
2. Verify whether the changes are reflected in the Pod:

kubectl exec POD_NAME -- /bin/bash -c "df -h"

Replace POD_NAME with the name of the Pod attached to the PersistentVolumeClaim.
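As a sketch of the edit in step 1, you could patch the claim with kubectl; my-pvc and 200Gi are placeholder values:

# Increase the requested size on the PersistentVolumeClaim, not the PersistentVolume.
kubectl patch pvc my-pvc --type merge -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Check the reported capacity and resize-related events on the claim.
kubectl describe pvc my-pvc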
Error: Local SSD count doesn't match the machine type

You might encounter the following error when creating a cluster or a node pool that uses Local SSD:
The selected machine type (c3-standard-22-lssd) has a fixed number of local SSD(s): 4. The EphemeralStorageLocalSsdConfig's count field should be left unset or set to 4, but was set to 1.
In the error message, you might see LocalNvmeSsdBlockConfig instead of EphemeralStorageLocalSsdConfig, depending on which you specified.
This error occurs when the number of Local SSD disks specified does not match the number of Local SSD disks included with the machine type.
To resolve this issue, specify a number of Local SSD disks that matches the machine type that you want. For third generation machine series, you must omit the Local SSD count flag, and the correct value is configured automatically.
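For example, a sketch of creating a node pool with a third generation machine type and letting GKE configure the Local SSD count; the cluster, node pool, and zone names are placeholders, and you should check the gcloud reference for the exact flags available in your gcloud CLI version:

# Omit the Local SSD count flag (for example, --ephemeral-storage-local-ssd)
# so that the fixed count for the machine type is configured automatically.
gcloud container node-pools create example-pool \
    --cluster=example-cluster \
    --zone=us-central1-a \
    --machine-type=c3-standard-22-lssd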
Error: ZONE_RESOURCE_POOL_EXHAUSTED with Hyperdisk Storage Pools

You might encounter the ZONE_RESOURCE_POOL_EXHAUSTED error or similar Compute Engine resource errors when trying to provision Hyperdisk Balanced disks as your node's boot or attached disks in a Hyperdisk Storage Pool.
This happens when you're trying to create a GKE cluster or node pool in a zone that's running low on resources, for example, capacity for the c3-standard-4 machine type.

To resolve this issue, try provisioning the cluster or node pool in a zone that has capacity for both the machine type and the Hyperdisk Storage Pool resources that you need, or try again later when capacity is available.
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:

- Using the google-kubernetes-engine tag to search for similar issues.
- Joining the #kubernetes-engine Slack channel for more community support.