This guide covers how to simplify and accelerate the loading of AI/ML model weights on Google Kubernetes Engine (GKE) using Hyperdisk ML. The Compute Engine Persistent Disk CSI driver is the primary way for you to access Hyperdisk ML storage with GKE clusters.
Overview

Hyperdisk ML is a high-performance storage solution that you can use to scale out your applications. It provides high aggregate throughput to many virtual machines concurrently, making it ideal for AI/ML workloads that need access to large amounts of data.
When enabled in ReadOnlyMany mode, you can use Hyperdisk ML to accelerate the loading of model weights by up to 11.9X relative to loading directly from a model registry. This acceleration is made possible by the Google Cloud Hyperdisk architecture, which allows scaling to 2,500 concurrent nodes at 1.2 TiB/s. This lets you improve load times and reduce Pod over-provisioning for your AI/ML inference workloads.
The high level steps to create and use Hyperdisk ML are as follows:
Hyperdisk ML disks are available in a single zone only. Optionally, you can use the Hyperdisk ML multi-zone feature to dynamically link multiple zonal disks that contain the same content into a single logical PersistentVolumeClaim and PersistentVolume. The zonal disks referenced by the multi-zone feature must be located in the same region. For example, if your regional cluster is created in us-central1, the multi-zone disks must be located in zones of that region (for example, us-central1-a and us-central1-b).
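Since every Compute Engine zone name is its region plus a one-letter suffix, you can derive the region that your multi-zone disks must share with plain shell. The zone value here is only an illustration:

```shell
# Illustrative zone; any Compute Engine zone name follows the same pattern.
ZONE="us-central1-a"

# Strip the trailing "-<letter>" suffix to get the region. All zonal disks
# referenced by a multi-zone Hyperdisk ML volume must share this region.
REGION="${ZONE%-*}"
echo "$REGION"   # prints: us-central1
```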
A common use case for AI/ML inference is to run Pods across zones for improved accelerator availability and cost efficiency with Spot VMs. Because Hyperdisk ML is zonal, if your inference server runs many Pods across zones, GKE automatically clones the disks across zones to ensure that your data follows your application.
Multi-zone Hyperdisk ML volumes have the following limitations:
To learn more, see Create a multi-zone ReadOnlyMany Hyperdisk ML volume from a VolumeSnapshot.
Before you begin

Before you start, make sure you have performed the following tasks. To get the latest version of the gcloud CLI, run:

gcloud components update

Note: For existing gcloud CLI installations, make sure to set the compute/region and compute/zone properties. By setting default locations, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location.
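As the note suggests, you can set these defaults once per gcloud CLI installation. The region and zone below are placeholder values, so substitute the location you plan to use:

```shell
# Placeholder location values; replace them with your own region and zone.
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a

# Confirm the defaults are in place.
gcloud config list
```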
To use Hyperdisk ML volumes in GKE, your clusters must meet the following requirements:
To get access to the Gemma models for deployment to GKE, you must first sign the license consent agreement then generate a Hugging Face access token.
Sign the license consent agreement

You must sign the consent agreement to use Gemma. Follow these instructions:
To access the model through Hugging Face, you'll need a Hugging Face token.
Follow these steps to generate a new token if you don't have one already:
You can serve LLMs on GPUs in a GKE Autopilot or Standard cluster. We recommend that you use an Autopilot cluster for a fully managed Kubernetes experience. To choose the GKE mode of operation that's the best fit for your workloads, see Choose a GKE mode of operation.
Autopilot

In Cloud Shell, run the following command:
gcloud container clusters create-auto hdml-gpu-l4 \
--project=PROJECT \
--location=CONTROL_PLANE_LOCATION \
--release-channel=rapid \
--cluster-version=1.30.2-gke.1394000
Replace the following values:

PROJECT: your Google Cloud project ID.
CONTROL_PLANE_LOCATION: the Compute Engine region of your cluster's control plane. Choose a region that supports the accelerator type you want to use, for example, us-east4 for L4 GPUs.

GKE creates an Autopilot cluster with CPU and GPU nodes as requested by the deployed workloads.
Configure kubectl to communicate with your cluster:
gcloud container clusters get-credentials hdml-gpu-l4 \
--location=CONTROL_PLANE_LOCATION
Standard

In Cloud Shell, run the following command to create a Standard cluster and node pools:
gcloud container clusters create hdml-gpu-l4 \
--location=CONTROL_PLANE_LOCATION \
--num-nodes=1 \
--machine-type=c3-standard-44 \
--release-channel=rapid \
--cluster-version=CLUSTER_VERSION \
--node-locations=ZONES \
--project=PROJECT
gcloud container node-pools create gpupool \
--accelerator type=nvidia-l4,count=2,gpu-driver-version=latest \
--location=CONTROL_PLANE_LOCATION \
--project=PROJECT \
--node-locations=ZONES \
--cluster=hdml-gpu-l4 \
--machine-type=g2-standard-24 \
--num-nodes=2
Replace the following values:

CONTROL_PLANE_LOCATION: the Compute Engine location of your cluster's control plane.
CLUSTER_VERSION: the GKE cluster version (for example, 1.30.2-gke.1394000).
ZONES: the zones for your nodes. The zones must be within the region specified by the --location flag. For zonal clusters, --node-locations must contain the cluster's primary zone.
PROJECT: your Google Cloud project ID.

The cluster creation might take several minutes.
Configure kubectl to communicate with your cluster:
gcloud container clusters get-credentials hdml-gpu-l4 \
    --location=CONTROL_PLANE_LOCATION
To use Hyperdisk ML, you pre-cache data in a disk image, and create a Hyperdisk ML volume for read access by your workload in GKE. This approach (also called data hydration) ensures that your data is available when your workload needs it.
To copy the data from Cloud Storage to pre-cache a Persistent Disk disk image, follow these steps:
Create a StorageClass that supports Hyperdisk ML

Save the following StorageClass manifest in a file named hyperdisk-ml.yaml.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml
parameters:
  type: hyperdisk-ml
  provisioned-throughput-on-create: "2400Mi"
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - read_ahead_kb=4096
Tip: If you want to tune the readahead value, add the read_ahead_kb parameter to the mountOptions field.

Create the StorageClass by running this command:
kubectl create -f hyperdisk-ml.yaml
Save the following PersistentVolumeClaim manifest in a file named producer-pvc.yaml. You'll use the StorageClass you created earlier. Make sure that your disk has sufficient capacity to store your data.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: producer-pvc
spec:
  storageClassName: hyperdisk-ml
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi
Create the PersistentVolumeClaim by running this command:
kubectl create -f producer-pvc.yaml
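Because the StorageClass uses the WaitForFirstConsumer binding mode, the claim is expected to stay in the Pending phase until a Pod consumes it. You can confirm that the claim was created with:

```shell
# The STATUS column shows Pending until the producer Job's Pod is scheduled.
kubectl get pvc producer-pvc
```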
This section shows an example of creating a Kubernetes Job that provisions a disk and downloads the Gemma 7B instruction tuned model from Hugging Face onto the mounted Google Cloud Hyperdisk volume.
To access the Gemma LLM that the examples in this guide use, create a Kubernetes Secret that contains the Hugging Face token:
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=HF_TOKEN \
    --dry-run=client -o yaml | kubectl apply -f -
Replace HF_TOKEN with the Hugging Face token you generated earlier.
Save the following example manifest as producer-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: producer-job
spec:
  template: # Template for the Pods the Job will create
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cloud.google.com/compute-class
                    operator: In
                    values:
                      - "Performance"
              - matchExpressions:
                  - key: cloud.google.com/machine-family
                    operator: In
                    values:
                      - "c3"
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "ZONE"
      containers:
        - name: copy
          resources:
            requests:
              cpu: "32"
            limits:
              cpu: "32"
          image: huggingface/downloader:0.17.3
          command: [ "huggingface-cli" ]
          args:
            - download
            - google/gemma-1.1-7b-it
            - --local-dir=/data/gemma-7b
            - --local-dir-use-symlinks=False
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-secret
                  key: hf_api_token
          volumeMounts:
            - mountPath: "/data"
              name: volume
      restartPolicy: Never
      volumes:
        - name: volume
          persistentVolumeClaim:
            claimName: producer-pvc
  parallelism: 1 # Run 1 Pod concurrently
  completions: 1 # Once 1 Pod completes successfully, the Job is done
  backoffLimit: 4 # Max retries on failure
Replace ZONE with the compute zone where you want the Hyperdisk to be created. If you're using it with the Deployment example, ensure it is a zone that has G2 machine capacity.
Create the Job by running this command:
kubectl apply -f producer-job.yaml
It might take a few minutes for the Job to finish copying data to the Persistent Disk volume. When the Job completes, its status is marked "Complete".
To check the progress of your Job status, run the following command:
kubectl get job producer-job
Once the Job is complete, you can clean up the Job by running this command:
kubectl delete job producer-job
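If you later need the name of the Compute Engine disk that now holds the model data (for example, to reference it as a pre-existing disk in the next section), one way to look it up is through the claim's bound PersistentVolume. The jsonpath fields below are standard Kubernetes fields:

```shell
# Find the PersistentVolume bound to the producer claim.
PV_NAME=$(kubectl get pvc producer-pvc -o jsonpath='{.spec.volumeName}')

# The CSI volumeHandle has the form
# projects/PROJECT/zones/ZONE/disks/DISK_NAME; its last path segment
# is the Compute Engine disk name.
kubectl get pv "$PV_NAME" -o jsonpath='{.spec.csi.volumeHandle}'
```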
This section covers the steps for creating a ReadOnlyMany (ROM) PersistentVolume and PersistentVolumeClaim pair from a pre-existing Google Cloud Hyperdisk volume. To learn more, see Using pre-existing persistent disks as PersistentVolumes.
In GKE version 1.30.2-gke.1394000 and later, GKE automatically converts the access mode of a READ_WRITE_SINGLE Google Cloud Hyperdisk volume to READ_ONLY_MANY.
If you are using a pre-existing Google Cloud Hyperdisk volume on an earlier version of GKE, you must modify the access mode manually by running the following command:
gcloud compute disks update HDML_DISK_NAME \
--zone=ZONE \
--access-mode=READ_ONLY_MANY
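You can verify that the change took effect by describing the disk; the accessMode field should now report READ_ONLY_MANY:

```shell
gcloud compute disks describe HDML_DISK_NAME \
    --zone=ZONE \
    --format="value(accessMode)"
```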
Replace the following values:

HDML_DISK_NAME: the name of your Hyperdisk ML volume.
ZONE: the compute zone where the disk is located.
Create a PersistentVolume and PersistentVolumeClaim pair, referencing the disk you previously populated.
Save the following manifest as hdml-static-pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdml-static-pv
spec:
  storageClassName: "hyperdisk-ml"
  capacity:
    storage: 300Gi
  accessModes:
    - ReadOnlyMany
  claimRef:
    namespace: default
    name: hdml-static-pvc
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/PROJECT/zones/ZONE/disks/DISK_NAME
    fsType: ext4
    readOnly: true
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.gke.io/zone
              operator: In
              values:
                - ZONE
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: default
  name: hdml-static-pvc
spec:
  storageClassName: "hyperdisk-ml"
  volumeName: hdml-static-pv
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 300Gi
Replace the following values:

PROJECT: your Google Cloud project ID.
ZONE: the zone where the disk is located.
DISK_NAME: the name of your pre-existing Google Cloud Hyperdisk volume.
Create the PersistentVolume and PersistentVolumeClaim resources by running this command:
kubectl apply -f hdml-static-pv.yaml
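The volumeHandle in the manifest must be the disk's full resource path. A small shell sketch, using placeholder values, shows the expected shape:

```shell
# Placeholder values; substitute your project ID, zone, and disk name.
PROJECT="my-project"
ZONE="us-central1-a"
DISK_NAME="my-hdml-disk"

# This is the format the pd.csi.storage.gke.io driver expects in volumeHandle.
VOLUME_HANDLE="projects/${PROJECT}/zones/${ZONE}/disks/${DISK_NAME}"
echo "$VOLUME_HANDLE"   # prints: projects/my-project/zones/us-central1-a/disks/my-hdml-disk
```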
This section covers the steps for creating a multi-zone Hyperdisk ML volume in ReadOnlyMany access mode. You use a VolumeSnapshot for a pre-existing Persistent Disk disk image. To learn more, see Back up Persistent Disk storage using volume snapshots.
To create the multi-zone Hyperdisk ML volume, follow these steps:
Create a VolumeSnapshot of your disk

Save the following manifest as a file called disk-image-vsc.yaml.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: disk-image-vsc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
parameters:
  snapshot-type: images
Create the VolumeSnapshotClass by running the following command:
kubectl apply -f disk-image-vsc.yaml
Save the following manifest as a file called my-snapshot.yaml. You'll reference the PersistentVolumeClaim you created earlier in Create a ReadWriteOnce (RWO) PersistentVolumeClaim.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
spec:
  volumeSnapshotClassName: disk-image-vsc
  source:
    persistentVolumeClaimName: producer-pvc
Create the VolumeSnapshot by running the following command:
kubectl apply -f my-snapshot.yaml
Wait for the VolumeSnapshot to be ready by running the following command:
kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
--timeout=300s volumesnapshot my-snapshot
If you want copies of your data to be accessible in more than one zone, specify the enable-multi-zone-provisioning parameter in your StorageClass. This creates disks in the zones that you specify in the allowedTopologies field.
To create the StorageClass, follow these steps:
Save the following manifest as a file called hyperdisk-ml-multi-zone.yaml.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml-multi-zone
parameters:
  type: hyperdisk-ml
  provisioned-throughput-on-create: "4800Mi"
  enable-multi-zone-provisioning: "true"
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values:
          - ZONE_1
          - ZONE_2
mountOptions:
  - read_ahead_kb=8192
Replace ZONE_1 and ZONE_2 with the zones where your storage can be accessed. You can add more zones to the allowedTopologies list as needed.
This example sets the volumeBindingMode to Immediate, which allows GKE to provision the PersistentVolumeClaim before any consumer references it.
Create the StorageClass by running the following command:
kubectl apply -f hyperdisk-ml-multi-zone.yaml
The next step is to create a PersistentVolumeClaim that references the StorageClass. GKE uses the content of the disk image specified in the snapshot to automatically provision a Hyperdisk ML volume in each zone specified in the StorageClass.
To create the PersistentVolumeClaim, follow these steps:
Save the following manifest as a file called hdml-consumer-pvc.yaml.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hdml-consumer-pvc
spec:
  dataSource:
    name: my-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadOnlyMany
  storageClassName: hyperdisk-ml-multi-zone
  resources:
    requests:
      storage: 300Gi
Create the PersistentVolumeClaim by running the following command:
kubectl apply -f hdml-consumer-pvc.yaml
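Because the StorageClass uses Immediate binding, the volume is provisioned right away. You can inspect the generated PersistentVolume to confirm that its node affinity covers the zones you listed:

```shell
# The claim binds to a dynamically named PersistentVolume; look it up first.
PV_NAME=$(kubectl get pvc hdml-consumer-pvc -o jsonpath='{.spec.volumeName}')

# The PV's nodeAffinity should list every zone from allowedTopologies.
kubectl get pv "$PV_NAME" -o yaml
```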
When using Pods with PersistentVolumes, we recommend that you use a workload controller (such as a Deployment or StatefulSet).
If you want to use a pre-existing PersistentVolume in ReadOnlyMany mode with a Deployment, refer to Use persistent disks with multiple readers.
To create and test your Deployment, follow these steps:
Save the following example manifest as vllm-gemma-deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
        ai.gke.io/model: gemma-7b
        ai.gke.io/inference-server: vllm
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: security
                      operator: In
                      values:
                        - S2
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: inference-server
          image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:latest
          resources:
            requests:
              cpu: "2"
              memory: "25Gi"
              ephemeral-storage: "25Gi"
              nvidia.com/gpu: 2
            limits:
              cpu: "2"
              memory: "25Gi"
              ephemeral-storage: "25Gi"
              nvidia.com/gpu: 2
          command: ["python3", "-m", "vllm.entrypoints.api_server"]
          args:
            - --model=$(MODEL_ID)
            - --tensor-parallel-size=2
          env:
            - name: MODEL_ID
              value: /models/gemma-7b
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - mountPath: /models
              name: gemma-7b
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: gemma-7b
          persistentVolumeClaim:
            claimName: CLAIM_NAME
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
---
apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: gemma-server
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
Replace CLAIM_NAME with one of these values:

hdml-static-pvc: if you are using a Hyperdisk ML volume from an existing Google Cloud Hyperdisk.
hdml-consumer-pvc: if you are using a Hyperdisk ML volume from a VolumeSnapshot disk image.

Run the following command to wait for the inference server to be available:
kubectl wait --for=condition=Available --timeout=700s deployment/vllm-gemma-deployment
To test that your vLLM server is up and running, follow these steps:
Run the following command to set up port forwarding to the model:
kubectl port-forward service/llm-service 8000:8000
Run a curl command to send a request to the model:
USER_PROMPT="I'm new to coding. If you could only recommend one programming language to start with, what would it be and why?"
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"prompt": "<start_of_turn>user\n${USER_PROMPT}<end_of_turn>\n",
"temperature": 0.90,
"top_p": 1.0,
"max_tokens": 128
}
EOF
The following output shows an example of the model response:
{"predictions":["Prompt:\n<start_of_turn>user\nI'm new to coding. If you could only recommend one programming language to start with, what would it be and why?<end_of_turn>\nOutput:\nPython is often recommended for beginners due to its clear, readable syntax, simple data types, and extensive libraries.\n\n**Reasons why Python is a great language for beginners:**\n\n* **Easy to read:** Python's syntax is straightforward and uses natural language conventions, making it easier for beginners to understand the code.\n* **Simple data types:** Python has basic data types like integers, strings, and lists that are easy to grasp and manipulate.\n* **Extensive libraries:** Python has a vast collection of well-documented libraries covering various tasks, allowing beginners to build projects without reinventing the wheel.\n* **Large supportive community:**"]}
If you have workloads that perform sequential I/O, they may benefit from tuning the readahead value. This typically applies to inference or training workloads that need to load AI/ML model weights into memory. Most workloads with sequential I/O typically see a performance improvement with a readahead value of 1024 KB or higher.
Tune the readahead value for new volumes

You can specify this option by adding read_ahead_kb to the mountOptions field on your StorageClass. The following example shows how you can tune the readahead value to 4096 KB. This applies to new dynamically provisioned PersistentVolumes created using the hyperdisk-ml StorageClass.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml
parameters:
  type: hyperdisk-ml
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - read_ahead_kb=4096
Note: The StorageClass mountOptions are copied to the PersistentVolume when it is created. To apply the mount options to an existing PersistentVolume, you must either delete and re-create the PersistentVolume, or modify the mountOptions field of the PersistentVolume directly.

Tune the readahead value for existing volumes

For statically provisioned volumes, or pre-existing PersistentVolumes, you can specify this option by adding read_ahead_kb to the spec.mountOptions field. The following example shows how you can tune the readahead value to 4096 KB.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: DISK_NAME
spec:
  accessModes:
    - ReadOnlyMany
  capacity:
    storage: 300Gi
  csi:
    driver: pd.csi.storage.gke.io
    fsType: ext4
    readOnly: true
    volumeHandle: projects/PROJECT/zones/ZONE/disks/DISK_NAME
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.gke.io/zone
              operator: In
              values:
                - ZONE
  storageClassName: hyperdisk-ml
  mountOptions:
    - read_ahead_kb=4096
Note: The mountOptions are applied only at volume mount time, when a Pod is bound to a node. To apply the mount options to a workload that is already running, you must delete and re-create all consumer Pods.
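One way to check the readahead value a running Pod actually sees is to read it from sysfs inside the container. This sketch assumes the container image ships the mountpoint utility and that the volume is mounted at /models; POD_NAME is a placeholder for one of your consumer Pods:

```shell
# mountpoint -d prints the MAJ:MIN device number of the filesystem at /models;
# the kernel exposes that device's readahead (in KB) under /sys/class/bdi/.
kubectl exec POD_NAME -- sh -c \
  'cat /sys/class/bdi/$(mountpoint -d /models)/read_ahead_kb'
```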
Replace the following values:

PROJECT: your Google Cloud project ID.
ZONE: the zone where the disk is located.
DISK_NAME: the name of your Google Cloud Hyperdisk volume.
This section shows how you can use Flexible I/O Tester (FIO) to benchmark the performance of your Hyperdisk ML volumes for reading pre-existing data. You can use these metrics to evaluate your volume's performance for specific workloads and configurations.
Save the following example manifest as benchmark-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-job
spec:
  template: # Template for the Pods the Job will create
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cloud.google.com/compute-class
                    operator: In
                    values:
                      - "Performance"
              - matchExpressions:
                  - key: cloud.google.com/machine-family
                    operator: In
                    values:
                      - "c3"
      containers:
        - name: fio
          resources:
            requests:
              cpu: "32"
          image: litmuschaos/fio
          args:
            - fio
            - --filename
            - /models/gemma-7b/model-00001-of-00004.safetensors:/models/gemma-7b/model-00002-of-00004.safetensors:/models/gemma-7b/model-00003-of-00004.safetensors:/models/gemma-7b/model-00004-of-00004.safetensors
            - --direct=1
            - --rw=read
            - --readonly
            - --bs=4096k
            - --ioengine=libaio
            - --iodepth=8
            - --runtime=60
            - --numjobs=1
            - --name=read_benchmark
          volumeMounts:
            - mountPath: "/models"
              name: volume
      restartPolicy: Never
      volumes:
        - name: volume
          persistentVolumeClaim:
            claimName: hdml-static-pvc
  parallelism: 1 # Run 1 Pod concurrently
  completions: 1 # Once 1 Pod completes successfully, the Job is done
  backoffLimit: 1 # Max retries on failure
If your PersistentVolumeClaim has a different name, replace hdml-static-pvc in the claimName field with the name of your PersistentVolumeClaim.
Create the Job by running the following command:
kubectl apply -f benchmark-job.yaml
Use kubectl logs to view the output of the fio tool:
kubectl logs benchmark-job-nrk88 -f
The output looks similar to the following:
read_benchmark: (g=0): rw=read, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=8
fio-2.2.10
Starting 1 process
read_benchmark: (groupid=0, jobs=1): err= 0: pid=32: Fri Jul 12 21:29:32 2024
read : io=18300MB, bw=2407.3MB/s, iops=601, runt= 7602msec
slat (usec): min=86, max=1614, avg=111.17, stdev=64.46
clat (msec): min=2, max=33, avg=13.17, stdev= 1.08
lat (msec): min=2, max=33, avg=13.28, stdev= 1.06
clat percentiles (usec):
| 1.00th=[11072], 5.00th=[12352], 10.00th=[12608], 20.00th=[12736],
| 30.00th=[12992], 40.00th=[13120], 50.00th=[13248], 60.00th=[13376],
| 70.00th=[13504], 80.00th=[13632], 90.00th=[13888], 95.00th=[14016],
| 99.00th=[14400], 99.50th=[15296], 99.90th=[22144], 99.95th=[25728],
| 99.99th=[33024]
bw (MB /s): min= 2395, max= 2514, per=100.00%, avg=2409.79, stdev=29.34
lat (msec) : 4=0.39%, 10=0.31%, 20=99.15%, 50=0.15%
cpu : usr=0.28%, sys=8.08%, ctx=4555, majf=0, minf=8203
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=4575/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=8
Run status group 0 (all jobs):
READ: io=18300MB, aggrb=2407.3MB/s, minb=2407.3MB/s, maxb=2407.3MB/s, mint=7602msec, maxt=7602msec
Disk stats (read/write):
nvme0n2: ios=71239/0, merge=0/0, ticks=868737/0, in_queue=868737, util=98.72%
To monitor the provisioned performance of your Hyperdisk ML volume, see Analyze provisioned IOPS and throughput in the Compute Engine documentation.
To update the provisioned throughput or IOPS of an existing Hyperdisk ML volume, or to learn about additional Google Cloud Hyperdisk parameters you can specify in your StorageClass, refer to Scale your storage performance using Google Cloud Hyperdisk.
Troubleshooting

This section provides troubleshooting guidance to resolve issues with Hyperdisk ML volumes on GKE.
The disk access mode cannot be updated

The following error occurs when a Hyperdisk ML volume is already in use and attached to a node in ReadWriteOnce access mode.
AttachVolume.Attach failed for volume ... Failed to update access mode:
failed to set access mode for zonal volume ...
'Access mode cannot be updated when the disk is attached to instance(s).'., invalidResourceUsage
GKE automatically updates the Hyperdisk ML volume's accessMode from READ_WRITE_SINGLE to READ_ONLY_MANY when the volume is used by a PersistentVolume with the ReadOnlyMany access mode. This update is done when the disk is attached to a new node.
To resolve this issue, delete all Pods that are referencing the disk using a PersistentVolume in ReadWriteOnce mode. Wait for the disk to be detached, and then re-create the workload that consumes the PersistentVolume in ReadOnlyMany mode.
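The resolution above can be sketched as a short command sequence; the Pod name, disk name, and workload manifest filename are placeholders for your own resources:

```shell
# 1. Delete every Pod that mounts the disk through a ReadWriteOnce volume.
kubectl delete pod POD_NAME

# 2. Wait until the disk reports no attached instances; the users field
#    is empty once the disk is fully detached.
gcloud compute disks describe HDML_DISK_NAME \
    --zone=ZONE \
    --format="value(users)"

# 3. Re-create the workload that consumes the volume in ReadOnlyMany mode.
kubectl apply -f your-readonly-workload.yaml
```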
The disk cannot be attached with READ_WRITE mode

The following error indicates that GKE attempted to attach a Hyperdisk ML volume in READ_ONLY_MANY access mode to a GKE node using ReadWriteOnce access mode.
AttachVolume.Attach failed for volume ...
Failed to Attach: failed cloud service attach disk call ...
The disk cannot be attached with READ_WRITE mode., badRequest
GKE automatically updates the Hyperdisk ML volume's accessMode from READ_WRITE_SINGLE to READ_ONLY_MANY when the volume is used by a PersistentVolume with the ReadOnlyMany access mode. However, GKE doesn't automatically update the access mode from READ_ONLY_MANY back to READ_WRITE_SINGLE. This is a safety mechanism that ensures multi-zone disks aren't written to by accident, because writes could cause content to diverge between multi-zone disks.
To resolve this issue, we recommend that you follow the Pre-cache data to a Persistent Disk disk image workflow if you need updated content. If you need more control over the Hyperdisk ML volume's access mode and other settings, see Modify the settings for a Google Cloud Hyperdisk volume.
Quota exceeded: insufficient throughput quota

The following error indicates that there was insufficient Hyperdisk ML throughput quota at the time of disk provisioning.
failed to provision volume with StorageClass ... failed (QUOTA_EXCEEDED): Quota 'HDML_TOTAL_THROUGHPUT' exceeded
To resolve this issue, see Disk Quotas to learn more about Hyperdisk quota and how to increase the disk quota in your project.
For additional troubleshooting guidance, refer to Scale your storage performance with Google Cloud Hyperdisk.
What's next