Performance tuning best practices | Cloud Storage

This document describes how to use key Cloud Storage FUSE features and configurations to achieve maximum throughput and optimal performance, especially for artificial intelligence and machine learning (AI/ML) workloads such as training, serving, and checkpointing.

Considerations

Before you apply the configurations recommended on this page, consider the following:

Use buckets with hierarchical namespace enabled

Always use buckets with hierarchical namespace enabled. Hierarchical namespace organizes your data into a hierarchical file system structure, which makes operations within the bucket more efficient, resulting in quicker response times and fewer overall list calls for every operation.

The benefits of hierarchical namespace include the following:

Atomic and faster directory renames, which are particularly important for checkpointing workloads.
Higher queries per second (QPS) for reads and writes compared to buckets without hierarchical namespace.

To learn how to create a bucket with hierarchical namespace enabled, see Create buckets with hierarchical namespace enabled. To learn how to mount a hierarchical namespace-enabled bucket, see Mount buckets with hierarchical namespace enabled. Hierarchical namespace is supported on Google Kubernetes Engine versions 1.31.1-gke.2008000 or later.
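
For reference, the following is a minimal sketch of creating a bucket with hierarchical namespace enabled by using the Google Cloud CLI; the bucket name and location are placeholders, and hierarchical namespace requires uniform bucket-level access:

gcloud storage buckets create gs://BUCKET_NAME \
    --location=LOCATION \
    --uniform-bucket-level-access \
    --enable-hierarchical-namespace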

Perform a directory-specific mount

If you want to access a specific directory within a bucket, you can mount only that directory using the only-dir mount option instead of mounting the entire bucket. A directory-specific mount accelerates list calls and reduces the overall number of list and stat calls by limiting the number of directories to traverse when resolving a filename, because LookUpInode calls and bucket or directory access requests automatically generate list and stat calls for each file or directory in the path.

To mount a specific directory, use one of the following methods:

Google Kubernetes Engine

Use the following mount configuration with the Cloud Storage FUSE CSI driver for Google Kubernetes Engine:

volumeHandle: BUCKET_NAME
mountOptions:
    - only-dir:DIRECTORY_NAME

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
DIRECTORY_NAME: the name of the directory you want to mount.

Compute Engine

Run the gcsfuse --only-dir command to mount a specific directory on a Compute Engine virtual machine:

gcsfuse --only-dir DIRECTORY_NAME BUCKET_NAME MOUNT_POINT

Replace the following:

DIRECTORY_NAME: the name of the directory you want to mount.
BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

For more information on how to perform a directory mount, see Mount a directory within a bucket.

Increase metadata cache values

To improve performance for repeat reads, you can configure Cloud Storage FUSE to cache a large amount of metadata and bypass metadata expiration, which avoids repeated metadata requests to Cloud Storage and significantly improves performance.

Increasing metadata cache values is beneficial for workloads with repeat reads to avoid repetitive Cloud Storage calls and for read-only volumes where an infinite TTL can be set.

Consider the following before you increase metadata cache values:

Use the following instructions to configure Cloud Storage FUSE to cache a large amount of metadata and to bypass metadata expiration:

gcsfuse options
gcsfuse --metadata-cache-ttl-secs=-1 \
      --stat-cache-max-size-mb=-1 \
      --type-cache-max-size-mb=-1 \
      BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
metadata-cache:
  stat-cache-max-size-mb: -1
  ttl-secs: -1
  type-cache-max-size-mb: -1
Google Kubernetes Engine
  mountOptions:
      - metadata-cache:ttl-secs:-1
      - metadata-cache:stat-cache-max-size-mb:-1
      - metadata-cache:type-cache-max-size-mb:-1
Compute Engine
gcsfuse --metadata-cache-ttl-secs=-1 \
      --stat-cache-max-size-mb=-1 \
      --type-cache-max-size-mb=-1 \
      BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Pre-populate the metadata cache

Before you run a workload, we recommend that you pre-populate the metadata cache, which significantly improves performance and substantially reduces the number of metadata calls to Cloud Storage, particularly if the implicit-dirs field or --implicit-dirs option is used. The Cloud Storage FUSE CSI driver for GKE provides an API that handles pre-populating the metadata cache. For more information, see Use metadata prefetch to pre-populate the metadata cache.

Note: Running the ls -R command can pre-populate the metadata cache if the application doesn't do this itself. If the application accesses the entire directory structure under the location where ls -R is run, this quickly populates the entire metadata cache and, if enabled, the list cache.

To pre-populate the metadata cache, use one of the following methods:

Google Kubernetes Engine

Set the gcsfuseMetadataPrefetchOnMount CSI volume attribute flag to true:

On Google Kubernetes Engine versions 1.32.1-gke.1357001 or later, you can enable metadata prefetch for a given volume using the gcsfuseMetadataPrefetchOnMount configuration option in the volumeAttributes field of your PersistentVolume definition. The initContainer method isn't needed when you use the gcsfuseMetadataPrefetchOnMount configuration option.

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: training-bucket-pv
  spec:
    ...
    csi:
      volumeHandle: BUCKET_NAME
      volumeAttributes:
        ...
        gcsfuseMetadataPrefetchOnMount: "true"
  

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.

Init container resources may vary depending on bucket contents and hierarchical layout, so consider setting custom metadata prefetch sidecar resources for higher limits.

Linux

Manually run the ls -R command on the Cloud Storage FUSE mount point to recursively list all files and pre-populate the metadata cache:

ls -R MOUNT_POINT > /dev/null

Replace the following:

MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Compute Engine

Manually run the ls -R command on the Cloud Storage FUSE mount point to recursively list all files and pre-populate the metadata cache:

ls -R MOUNT_POINT > /dev/null

Replace the following:

MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Enable file caching and parallel downloads

File caching lets you store frequently accessed file data locally on your machine, speeding up repeat reads and reducing Cloud Storage costs. When you enable file caching, parallel downloads are automatically enabled as well. Parallel downloads utilize multiple workers to download a file in parallel using the file cache directory as a prefetch buffer, resulting in nine times faster model load time.

To learn how to enable and configure file caching and parallel downloads, see Enable and configure file caching behavior. To use a sample configuration, see Sample configuration for enabling file caching and parallel downloads.

Cloud GPUs and Cloud TPU considerations for using file caching and parallel downloads

The file cache can be hosted on Local SSDs, RAM, Persistent Disk, or Google Cloud Hyperdisk with the following guidance. In all cases, the data, or an individual large file, must fit within the file cache directory's available capacity, which is controlled by the max-size-mb field or the --file-cache-max-size-mb option.

Cloud GPUs considerations

Local SSDs are ideal for training data and checkpoint downloads. Some Cloud GPUs machine types include Local SSD capacity that can be used for the file cache; for example, A4 machine types include 12 TiB of Local SSD.

Cloud TPU considerations

Cloud TPU doesn't support Local SSDs. If you use file caching on Cloud TPU without modification, the default location used is the boot volume, which isn't recommended and results in poor performance.

Instead of the boot volume, we recommend using a RAM disk, which is preferred for its performance and has no incremental cost. However, a RAM disk is often constrained in size and is most useful for serving model weights or checkpoint downloads, depending on the size of the checkpoint and the available RAM. You can also use Persistent Disk or Google Cloud Hyperdisk for caching.
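
For example, here's a minimal sketch of hosting the file cache on a RAM disk; the mount path, tmpfs size, and placeholder names are illustrative, and you should size the RAM disk according to the memory your workload leaves free:

# Create a RAM disk (tmpfs) to host the Cloud Storage FUSE file cache.
sudo mkdir -p /mnt/gcsfuse-ram-cache
sudo mount -t tmpfs -o size=64G tmpfs /mnt/gcsfuse-ram-cache

# Mount the bucket and point the file cache at the RAM disk.
gcsfuse --cache-dir=/mnt/gcsfuse-ram-cache \
    --file-cache-max-size-mb=-1 \
    --file-cache-enable-parallel-downloads=true \
    BUCKET_NAME MOUNT_POINT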

Sample configuration for enabling file caching and parallel downloads

By default, the file cache uses a Local SSD if the ephemeral-storage-local-ssd mode is enabled for the Google Kubernetes Engine node. If no Local SSD is available, for example, on Cloud TPU machines, the file cache uses the Google Kubernetes Engine node's boot disk, which is not recommended. In this case, you can use a RAM disk as the cache directory, but consider the amount of RAM available for file caching versus what is needed by the pod.

gcsfuse options
gcsfuse --file-cache-max-size-mb=-1 \
      --file-cache-cache-file-for-range-read=true \
      --file-cache-enable-parallel-downloads=true \
      BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
file-cache:
  max-size-mb: -1
  cache-file-for-range-read: true
  enable-parallel-downloads: true
Cloud GPUs
mountOptions:
    - file-cache:max-size-mb:-1
    - file-cache:cache-file-for-range-read:true
    - file-cache:enable-parallel-downloads:true

# RAM disk file cache if LSSD not available. Uncomment to use
# volumes:
#   - name: gke-gcsfuse-cache
#     emptyDir:
#       medium: Memory
Cloud TPU
mountOptions:
    - file-cache:max-size-mb:-1
    - file-cache:cache-file-for-range-read:true
    - file-cache:enable-parallel-downloads:true

volumes:
    - name: gke-gcsfuse-cache
      emptyDir:
        medium: Memory
Compute Engine
gcsfuse --file-cache-max-size-mb=-1 \
      --file-cache-cache-file-for-range-read=true \
      --file-cache-enable-parallel-downloads=true \
      BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Disable negative stat cache entries

By default, Cloud Storage FUSE caches negative stat entries, meaning entries for files that don't exist, with a TTL of five seconds. In workloads where files are frequently created or deleted, such as distributed checkpointing, these cached entries can become stale quickly, which leads to performance issues. To avoid this, we recommend that you disable the negative stat cache for training, serving, and checkpointing workloads using the negative-ttl-secs field or the --metadata-cache-negative-ttl-secs option.

Note: Disabling the negative stat cache requires Cloud Storage FUSE version 2.8, available on Google Kubernetes Engine versions 1.32.1-gke.1200000 or later.

Use the following instructions to disable the negative stat cache:

gcsfuse option
gcsfuse --metadata-cache-negative-ttl-secs=0 \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
metadata-cache:
 negative-ttl-secs: 0
Google Kubernetes Engine
mountOptions:
    - metadata-cache:negative-ttl-secs:0
Compute Engine
gcsfuse --metadata-cache-negative-ttl-secs=0 \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Enable streaming writes

Streaming writes upload data directly to Cloud Storage as it's written, which reduces latency and disk space usage. This is particularly beneficial for large, sequential writes such as checkpoints. Streaming writes are enabled by default on Cloud Storage FUSE version 3.0 and later.

Note: Streaming writes are designed for sequential writes to a new, single file only. If you modify existing files, or perform out-of-order writes, it can cause Cloud Storage FUSE to automatically revert to the existing behavior of staging writes to a temporary file on disk.

If streaming writes aren't enabled by default, use the following instructions to enable them. Enabling streaming writes requires Cloud Storage FUSE version 3.0 which is available on Google Kubernetes Engine versions 1.32.1-gke.1729000 or later.

gcsfuse option
gcsfuse --enable-streaming-writes=true \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
write:
 enable-streaming-writes: true
Google Kubernetes Engine
mountOptions:
    - write:enable-streaming-writes:true
Compute Engine
gcsfuse --enable-streaming-writes=true \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Enable buffered reads

Buffered reads can improve sequential read performance for large files by two to five times by asynchronously prefetching parts of a Cloud Storage object into an in-memory buffer. This allows subsequent reads to be served from the buffer instead of requiring network calls.

Consider the following before you enable buffered reads:

To enable buffered reads, use the following instructions:

CLI options
gcsfuse --enable-buffered-read=true \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
read:
 enable-buffered-read: true
Note: We recommend using the --read-global-max-blocks option or read:global-max-blocks field to specify the maximum number of blocks available for buffered reads across all file handles.

Increase kernel read-ahead size

For workloads that primarily involve sequential reads of large files such as serving and checkpoint restores, increasing the read-ahead size can significantly enhance performance. This can be done using the read_ahead_kb Linux kernel parameter on your local machine. We recommend that you increase the read_ahead_kb kernel parameter to 1 MB instead of using the default amount of 128 KB that's set on most Linux distributions. For Compute Engine instances, either sudo or root permissions are required to successfully increase the kernel parameter.

Note: Increasing the read-ahead size requires GKE versions 1.32.1-gke.1200000 or later.

To increase the read_ahead_kb kernel parameter to 1 MB for a specific Cloud Storage FUSE mounted directory, use the following instructions. Your bucket must be mounted with Cloud Storage FUSE before you run the command; otherwise, the kernel parameter isn't increased.

Google Kubernetes Engine
mountOptions:
    - read_ahead_kb=1024
Compute Engine
export MOUNT_POINT=/path/to/mount/point
echo 1024 | sudo tee /sys/class/bdi/0:$(stat -c "%d" $MOUNT_POINT)/read_ahead_kb

Replace the following:

MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Disable Security Token Service to avoid redundant checks

The Cloud Storage FUSE CSI driver for Google Kubernetes Engine performs access checks to ensure that pods can recover from user misconfiguration of workload identity bindings between the bucket and the GKE service account. At scale, these checks can hit default Security Token Service API quotas. You can disable the checks by setting the skipCSIBucketAccessCheck volume attribute of the CSI PersistentVolume. If you do, make sure that the GKE service account has the right access to the target Cloud Storage bucket to avoid mount failures for the pod.

Additionally, if a Google Kubernetes Engine cluster consists of more than 6,000 nodes, you must increase the Security Token Service quota beyond its default value of 6,000; otherwise, large-scale deployments can encounter 429 errors. Increase the quota through the Quotas and limits page. We recommend keeping the quota equal to the number of mounts; for example, if there are 10,000 mounts in the cluster, increase the quota to 10,000.

To set the skipCSIBucketAccessCheck volume attribute, see the following sample configuration:

  volumeAttributes:
      skipCSIBucketAccessCheck: "true"
Other performance considerations

Beyond the primary optimizations discussed, several other factors can significantly impact the overall performance of Cloud Storage FUSE. The following sections describe additional factors to consider when you use Cloud Storage FUSE.

Increase the rename limit for non-HNS buckets

Checkpointing workloads should always be done with a bucket that has hierarchical namespace enabled because of atomic and faster renames and higher QPS for reads and writes. However, if you perform checkpointing with buckets that don't have hierarchical namespace and accept the risk of directory renames not being atomic and taking longer, you can use the rename-dir-limit field or the --rename-dir-limit option to specify a limit on the number of files or operations involved in a directory rename operation at any given time.

We recommend setting this option to a high value to prevent checkpointing failures. Because Cloud Storage uses a flat namespace and objects are immutable, a directory rename operation involves renaming and deleting every individual file within the directory. You can control the number of files affected by a rename operation by setting the rename-dir-limit gcsfuse option.

Use the following instructions to set the rename-dir-limit configuration option:

gcsfuse option
gcsfuse --rename-dir-limit=200000 \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
file-system:
 rename-dir-limit: 200000
Google Kubernetes Engine
mountOptions:
    - rename-dir-limit=200000
Compute Engine
gcsfuse --rename-dir-limit=200000 \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Kernel list caching

The list cache is a cache for directory and file list, or ls, responses that improves the speed of list operations. Unlike the stat and type caches, which are managed by Cloud Storage FUSE, the list cache is kept in the kernel's page cache and is controlled by the kernel based on memory availability.

Enabling kernel list caching is most beneficial for the following use cases:

Enabling kernel list caching should be done with caution and should be used only if the file system is truly read-only, with no expected directory content changes during the execution of a job. This is because, with list caching enabled, the local application never sees updates made by other clients, especially if the TTL is set to -1.

For example, Client 1 lists directoryA, which causes directoryA to become resident in the kernel list cache. Client 2 creates fileB under directoryA in the Cloud Storage bucket. Client 1 continuously checks for fileB in directoryA, which essentially checks the kernel list cache entry and never goes over the network. Client 1 doesn't see that a new file is in the directory because the list of files is continuously served from the local kernel list cache. Client 1 eventually times out, and the program breaks.

Use the following instructions to enable list caching:

gcsfuse option
gcsfuse --kernel-list-cache-ttl-secs=-1 \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

Configuration file
file-system:
 kernel-list-cache-ttl-secs: -1
Google Kubernetes Engine
mountOptions:
    - file-system:kernel-list-cache-ttl-secs:-1
Compute Engine
gcsfuse --kernel-list-cache-ttl-secs=-1 \
  BUCKET_NAME MOUNT_POINT

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket.
MOUNT_POINT: the path to your Cloud Storage FUSE mount point.

When you use the file-system:kernel-list-cache-ttl-secs mount option, the values mean the following:

A value of 0 disables the list cache.
A positive value specifies the TTL, in seconds, for keeping a directory's list response in the kernel's page cache.
A value of -1 bypasses entry expiration and returns the list response from the cache whenever it's available.

Use JAX persistent compilation (JIT) cache with Cloud Storage FUSE

JAX supports the Just-In-Time (JIT) cache, an optional persistent compilation cache that stores compiled function artifacts. Using this cache can significantly speed up subsequent script executions by avoiding redundant compilation steps.
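
As an illustrative sketch, you can point the JIT cache at a directory on a Cloud Storage FUSE mount so that compiled artifacts persist across runs; the mount path and script name are placeholders, and this assumes JAX's jax_compilation_cache_dir setting, which can typically also be set through the JAX_COMPILATION_CACHE_DIR environment variable:

# Store compiled artifacts in the mounted bucket so later runs can reuse them.
export JAX_COMPILATION_CACHE_DIR=MOUNT_POINT/jax-jit-cache

# Run the training or serving script as usual (train.py is a placeholder).
python train.py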

To enable JIT caching, you must meet the following requirements:

What's next
