This document provides an overview of Cloud Storage FUSE, a FUSE adapter that lets you mount and access Cloud Storage buckets as local file systems, so applications can read and write objects in your bucket using standard file system semantics.
This documentation always reflects the latest version of Cloud Storage FUSE. For details on the latest version, see Cloud Storage FUSE releases on GitHub.
Cloud Storage FUSE is an open source product supported by Google. Cloud Storage FUSE uses FUSE and Cloud Storage APIs to transparently expose buckets as locally mounted folders on your file system.
To use Cloud Storage FUSE, you install the Cloud Storage FUSE package on a machine with a compatible operating system such as Linux or Windows Subsystem for Linux, ensure proper Google Cloud authentication and permissions, and then run the gcsfuse command to mount a specific Cloud Storage bucket to a local directory.
Cloud Storage FUSE is integrated with other Google Cloud services. For example, the Cloud Storage FUSE CSI driver lets you use the Google Kubernetes Engine (GKE) API to consume buckets as volumes, so you can read from and write to Cloud Storage from within your Kubernetes pods. For more information on other integrations, see Integrations.
How Cloud Storage FUSE works
Cloud Storage FUSE works by translating object storage names into a directory-like structure, interpreting the slash character (/) in object names as a directory separator. Objects with the same common prefix are treated as files in the same directory, allowing applications to interact with the mounted bucket like a file system. Objects can also be organized into a logical file system structure using hierarchical namespace, which lets you organize objects into folders.
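As an illustration of this translation, the following sketch shows how slash-delimited object names can be interpreted as a directory tree. This is not gcsfuse's actual implementation, and the function name is hypothetical:

```python
# Sketch (not gcsfuse's actual code): interpreting slash-delimited
# object names from a flat bucket namespace as a directory tree.

def build_tree(object_names):
    """Nest object names into dicts keyed by path component.

    A trailing slash marks a directory object (e.g. "logs/");
    intermediate components become directories; other names are files.
    """
    root = {}
    for name in object_names:
        parts = [p for p in name.split("/") if p]
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        if name.endswith("/"):          # directory object
            node.setdefault(parts[-1], {})
        else:                           # regular file object
            node[parts[-1]] = None
    return root

tree = build_tree(["my-directory/my-object.txt", "logs/", "logs/app.log"])
# tree == {"my-directory": {"my-object.txt": None}, "logs": {"app.log": None}}
```

Objects sharing the prefix my-directory/ appear as files inside a my-directory folder, which is how applications see the mounted bucket.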
Cloud Storage FUSE can be run from anywhere with connectivity to Cloud Storage, including Google Kubernetes Engine, Compute Engine VMs, or on-premises systems.
Cloud Storage FUSE for machine learning
Cloud Storage FUSE is ideal for use cases where Cloud Storage has the right performance and scalability characteristics for an application that requires file system semantics. For example, Cloud Storage FUSE is useful for machine learning (ML) projects because it provides a way to store data, models, checkpoints, and logs directly in Cloud Storage. For more information, see Cloud Storage FUSE for ML workloads.
Cloud Storage FUSE is a common choice for developers looking to store and access ML training and model data as objects in Cloud Storage. Cloud Storage FUSE provides several benefits for developing ML projects:
Cloud Storage FUSE lets you mount Cloud Storage buckets as a local file system so your applications can access training and model data using standard file system semantics. This means that you can avoid the cost of rewriting or refactoring your application's code when using Cloud Storage to store ML data.
From training to inference, Cloud Storage FUSE lets you use the built-in high scalability, performance, and cost effectiveness of Cloud Storage, so you can run your ML workloads at scale.
Cloud Storage FUSE lets you start training jobs quickly by providing compute resources with direct access to data in Cloud Storage, so you don't need to download training data to the compute resource.
For more information, see Frameworks, operating systems, and architectures supported by Cloud Storage FUSE.
Frameworks, operating systems, and architectures
Cloud Storage FUSE has been validated with the following frameworks:
TensorFlow V2.x
TensorFlow V1.x
PyTorch V2.x
PyTorch V1.x
JAX 0.4.x
Cloud Storage FUSE supports the following operating systems:
Rocky Linux 8.9 or later
Ubuntu 18.04 or later
Debian 10 or later
CentOS 7.9 or later
RHEL 7.9 or later
SLES 15 or later
Cloud Storage FUSE supports the following architectures:
x86_64
ARM64
Cloud Storage FUSE integrates with several Google Cloud products; for details, see Integrations.
For a list of Google Cloud products that are integrated with Cloud Storage generally, see Integration with Google Cloud services and tools.
Caching
Cloud Storage FUSE offers four types of caching to help increase performance and reduce cost: file caching, stat caching, type caching, and list caching. For more information about these caches, see Overview of caching.
Directory semantics
Cloud Storage offers buckets with a flat namespace and buckets with hierarchical namespace enabled. By default, Cloud Storage FUSE can infer explicitly defined directories, also known as folders, in buckets with hierarchical namespace enabled, but it can't infer implicitly defined directories in buckets with a flat namespace, including simulated folders and managed folders.
Explicitly defined directories are folders that are represented by their own objects in Cloud Storage buckets. Implicitly defined directories are directories that don't have their own corresponding objects in Cloud Storage buckets.
For example, say you mount a bucket named my-bucket, which contains an object named my-directory/my-object.txt, where my-directory/ is a simulated folder. When you run ls on the bucket mount point, by default, Cloud Storage FUSE cannot access the simulated directory my-bucket/my-directory/ nor the object my-object.txt within it. To enable Cloud Storage FUSE to infer the simulated folder and the object within it, include the --implicit-dirs gcsfuse option or the implicit-dirs configuration file field as part of your gcsfuse mount command when mounting a flat namespace bucket.
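As a sketch, a minimal configuration file that enables this behavior might look like the following; the file name, bucket name, and mount point used in the example invocation are placeholders:

```yaml
# Minimal gcsfuse configuration file (for example, config.yaml)
# enabling inference of implicitly defined directories.
implicit-dirs: true
```

You would then mount with something like gcsfuse --config-file config.yaml my-bucket /mnt/my-bucket, or equivalently pass --implicit-dirs directly on the command line without a configuration file.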
If you need to store and access your data using a file system, use buckets with hierarchical namespace enabled. To learn how to create such buckets, see Create buckets with hierarchical namespace enabled.
For more information about directory semantics, including how to mount buckets with implicitly-defined directories, see Files and directories in the Cloud Storage FUSE GitHub documentation.
Cloud Storage FUSE retry strategies
By default, failed requests from Cloud Storage FUSE to Cloud Storage are retried with exponential backoff up to a specified maximum backoff duration, which is 30s (30 seconds) by default. Once the backoff duration exceeds the specified maximum, retries continue using the maximum duration. To specify the maximum backoff duration, use the --max-retry-sleep option or the gcs-retries:max-retry-sleep field as part of a gcsfuse mount call.
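The capped exponential backoff described above can be sketched as follows. The starting delay and multiplier here are illustrative assumptions, not gcsfuse's actual internal values:

```python
# Illustrative only: exponential backoff capped at a maximum sleep,
# mirroring the retry behavior described above (default cap: 30 seconds,
# adjustable with --max-retry-sleep). Base and multiplier are assumptions.

def backoff_schedule(attempts, base=1.0, multiplier=2.0, max_sleep=30.0):
    """Return the sleep duration before each of the given retry attempts."""
    sleeps = []
    delay = base
    for _ in range(attempts):
        sleeps.append(min(delay, max_sleep))
        delay *= multiplier
    return sleeps

print(backoff_schedule(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Once the computed delay passes the cap, every subsequent retry waits the maximum duration, which is the behavior the --max-retry-sleep option controls.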
GET or READ requests
When you perform a GET or READ request with Cloud Storage FUSE, a timeout period is applied. If the request exceeds the timeout period, Cloud Storage FUSE cancels the request and retries it using an exponential backoff algorithm.
The timeout is dynamic and is based on the 99th percentile latency of past successful or canceled GET or READ requests, with a 1.5-second minimum. This ensures that only the slowest 1% of requests, those exceeding the historical 99th percentile latency, are retried.
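The dynamic-timeout idea can be sketched like this. The nearest-rank percentile method is an assumption for illustration; only the p99-with-a-floor behavior comes from the text above:

```python
# Sketch of a dynamic timeout: the 99th percentile of past request
# latencies, with a 1.5 second floor. Nearest-rank p99 is an assumption.

def dynamic_timeout(latencies, floor=1.5):
    if not latencies:
        return floor
    ordered = sorted(latencies)
    idx = max(0, int(len(ordered) * 0.99) - 1)  # nearest-rank p99 index
    return max(ordered[idx], floor)

print(dynamic_timeout([2.0] * 100))  # 2.0: p99 latency above the floor
print(dynamic_timeout([0.2] * 100))  # 1.5: fast requests, floor applies
```

With the floor, even a history of very fast requests never produces a timeout short enough to cancel ordinary requests.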
Large file writes are uploaded in chunks. To help reduce tail-end write latencies, if a chunk-level write operation stalls or fails, Cloud Storage FUSE retries it after 10 seconds. A maximum of four retries are performed for each stalled chunk.
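The chunk-level retry policy above can be sketched as follows. The upload callable and function name are stand-ins, not a real gcsfuse API:

```python
# Sketch of the chunk retry policy described above: on failure, wait a
# fixed 10 seconds and retry, at most 4 times per chunk. Names are
# hypothetical; gcsfuse's real implementation differs.

import time

def upload_chunk_with_retries(upload, retry_wait=10, max_retries=4,
                              sleep=time.sleep):
    """Call upload(); on IOError wait retry_wait seconds and retry,
    up to max_retries times, then re-raise the last error."""
    for attempt in range(max_retries + 1):
        try:
            return upload()
        except IOError:
            if attempt == max_retries:
                raise
            sleep(retry_wait)
```

Injecting the sleep function keeps the sketch testable without real waiting; the fixed wait (rather than exponential backoff) matches the chunk-level behavior described above.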
Cloud Storage FUSE operations associated with Cloud Storage operations
When you perform an operation using Cloud Storage FUSE, you also perform the Cloud Storage operations associated with the Cloud Storage FUSE operation. The following table describes common Cloud Storage FUSE commands and their associated Cloud Storage JSON API operations. You can display information about the Cloud Storage FUSE operations by setting the --log-severity option or the logging:severity field to TRACE in your gcsfuse command.
gcsfuse --log-severity=TRACE example-bucket mp
  Objects.list (to check credentials)
cd mp
  n/a
ls mp
  Objects.list("")
mkdir subdir
  Objects.get("subdir")
  Objects.get("subdir/")
  Objects.insert("subdir/")
cp ~/local.txt subdir/
  Objects.get("subdir/local.txt")
  Objects.get("subdir/local.txt/")
  Objects.insert("subdir/local.txt"), to create an empty object
  Objects.insert("subdir/local.txt"), when closing after done writing
rm -rf subdir
  Objects.list("subdir")
  Objects.list("subdir/")
  Objects.delete("subdir/local.txt")
  Objects.list("subdir/")
  Objects.delete("subdir/")
Metrics
Cloud Storage offers in-depth metrics that can help you optimize Cloud Storage FUSE performance and costs. To learn more about metrics for Cloud Storage FUSE, see Cloud Storage FUSE metrics.
Security
Cloud Storage FUSE applies Google Cloud's standard authentication through Application Default Credentials to identify users or service accounts. Access to the contents within buckets is governed by Identity and Access Management permissions. Local system permissions secure the mount point itself and any locally cached data.
Pricing for Cloud Storage FUSE
Cloud Storage FUSE is available free of charge, but the storage, metadata, and network I/O it generates to and from Cloud Storage are charged like any other Cloud Storage interface. In other words, all data transfer and operations performed by Cloud Storage FUSE map to Cloud Storage transfers and operations, and are charged accordingly. For more information on common Cloud Storage FUSE operations and how they map to Cloud Storage operations, see operations mapping.
To avoid surprises, you should estimate how your use of Cloud Storage FUSE translates to Cloud Storage charges. For example, if you are using Cloud Storage FUSE to store log files, you can incur charges quickly if logs are aggressively flushed on hundreds or thousands of machines at the same time.
See Cloud Storage pricing for information on charges such as storage, network usage, and operations.
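A back-of-envelope count of the operations such a workload generates makes the log-flushing example above concrete. The machine count and flush rate are hypothetical, and per-operation prices should be looked up on the Cloud Storage pricing page rather than assumed:

```python
# Rough estimate of object-write operations generated by periodic log
# flushes across a fleet, as discussed above. Inputs are hypothetical.

def monthly_write_ops(machines, flushes_per_minute, days=30):
    """Object writes per month if every flush uploads one object."""
    return machines * flushes_per_minute * 60 * 24 * days

ops = monthly_write_ops(machines=1000, flushes_per_minute=1)
print(ops)  # 43200000 object writes per month from flushing alone
```

Multiplying such a count by the current per-operation rate shows how quickly aggressive flushing across a fleet adds up.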
Limitations
While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the backend. Additionally, Cloud Storage FUSE is not POSIX compliant. For a POSIX file system product in Google Cloud, see Filestore.
When using Cloud Storage FUSE, be aware of its limitations and semantics, which are different from those of POSIX file systems. Cloud Storage FUSE should only be used within its capabilities.
Limitations and differences from POSIX file systems
The following list describes the limitations of Cloud Storage FUSE:
Concurrent writes from multiple sources: when multiple sources modify the same object, writers can receive a syscall.ESTALE error when attempting to save their edits due to precondition checks. To ensure that your data is consistently written, we strongly recommend against multiple sources modifying the same object.
Note that multiple readers can access the same or different objects within a bucket, and multiple writers can modify different objects in the same bucket simultaneously. Concurrent writes to the same Cloud Storage object are supported from the same mount and behave similarly to built-in file systems.
Objects with content-encoding: gzip in metadata: any such object in a Cloud Storage FUSE-mounted directory does not undergo decompressive transcoding. Instead, the object remains compressed, in the same form in which it's stored in the bucket.
For example, a 1,000-byte file uploaded to a bucket using the gcloud storage cp command with the --gzip-local flag might become 60 bytes as a Cloud Storage object (the actual compressed size depends on the content and on the gzip implementation used by the gcloud CLI). If the bucket is mounted using gcsfuse and the corresponding file is listed or read from the mount directory, its size is returned as 60 bytes and its contents are a compressed version of the original 1,000-byte content.
This is in contrast to a download using gcloud storage cp gs://bucket/path /local/path, which undergoes decompressive transcoding: in the gcloud command, the content is automatically decompressed during the download and the original, uncompressed content is served.
Writing to objects that have content-encoding: gzip set can produce unpredictable behavior. This is because Cloud Storage FUSE uploads the object content as is (without compressing it) while retaining content-encoding: gzip, and if the content is not properly gzip-compressed, reads by other clients, such as the gcloud CLI, might fail. This is because other clients employ decompressive transcoding while reading, and transcoding fails for improper gzip content.
Retention policies: Cloud Storage FUSE supports reading objects from buckets with a retention policy, but the bucket must be mounted as read-only by passing the -o RO flag during bucket mounting.
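The size difference described above can be reproduced locally with Python's gzip module; the byte counts here come from the module itself, not from an actual bucket:

```python
# Demonstrates the gzip behavior described above: reading through a
# mount returns the stored (compressed) bytes, while decompressive
# transcoding would return the original content.

import gzip

original = b"log line\n" * 100      # 900 bytes of repetitive content
stored = gzip.compress(original)    # what the bucket actually holds

# A mounted read sees the stored, compressed size and bytes ...
assert len(stored) < len(original)
# ... whereas decompressive transcoding recovers the original content.
assert gzip.decompress(stored) == original
```

This is why a tool that reads such a file through a mount must decompress it itself, while a transcoding download receives the uncompressed content directly.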
rsync limitations: Cloud Storage FUSE's file system latency affects rsync, which reads and writes only one file at a time. To transfer multiple files to or from your bucket in parallel, use the Google Cloud CLI by running gcloud storage rsync. For more information, see the rsync documentation.
List operations: when you run ls, Cloud Storage FUSE calls the Objects: list API on Cloud Storage. The API paginates results, which means that Cloud Storage FUSE might need to issue multiple calls, depending on how many objects are in your bucket, which can make a list operation expensive and slow.
For a list of known issues in Cloud Storage FUSE, see the open Cloud Storage FUSE issues in GitHub.
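The cost of listing grows with the number of paginated API calls, which the following sketch makes concrete. The page size of 1,000 is an assumption for illustration; check the Objects: list API reference for actual limits:

```python
# Illustrates why listing can be expensive: the Objects: list API
# returns results in pages, so a directory with many objects needs
# multiple calls. The 1000-object page size is an assumption.

import math

def list_calls_needed(object_count, page_size=1000):
    """Number of paginated Objects.list calls one ls would trigger."""
    return max(1, math.ceil(object_count / page_size))

print(list_calls_needed(250_000))  # 250 API calls for a single ls
```

Each of those calls is billed as a Cloud Storage operation and adds latency, which is why listing very large directories through a mount can be slow and costly.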
Get support
You can get support, submit general questions, and request new features by using one of Google Cloud's official support channels. You can also get support by filing issues in GitHub.
For solutions to commonly-encountered issues, see Troubleshooting for production issues in the Cloud Storage FUSE GitHub documentation.
What's next
Learn how to install the Cloud Storage FUSE CLI.
Discover Cloud Storage FUSE by completing a quickstart.
Learn how to mount buckets.
Learn how to configure the behavior of Cloud Storage FUSE using the gcsfuse CLI or a Cloud Storage FUSE configuration file.