fsspec: Filesystem interfaces for Python
Filesystem Spec (fsspec) is a project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage.
There are many places to store bytes: in memory, on the local disk, in cluster-distributed storage, or in the cloud. Many files also contain internal mappings of names to bytes, sometimes arranged as a hierarchical, directory-oriented tree. Working with all these different storage media, and their associated libraries, is a pain. fsspec exists to provide a familiar API that works the same whatever the storage backend. As much as possible, we iron out the quirks specific to each implementation, so you need do no more than provide credentials for each service you access (if needed) and thereafter need not worry about the implementation again.
fsspec provides two main concepts: a set of filesystem classes with uniform APIs (i.e., functions such as cp, rm, cat, mkdir, …) supplying operations on a range of storage systems; and top-level convenience functions like fsspec.open(), to allow you to quickly get from a URL to a file-like object that you can use with a third-party library or your own code.
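The two concepts can be sketched side by side. This is a minimal illustration, not taken from the original text: it assumes the built-in in-memory backend (protocol name "memory") so that it runs without credentials or network access, and the directory and file names are made up for the example.

```python
import fsspec

# Concept 1: a filesystem class, obtained here by protocol name.
# "memory" is fsspec's built-in in-memory backend (chosen for illustration).
fs = fsspec.filesystem("memory")
fs.mkdir("/fsspec_demo")
fs.pipe("/fsspec_demo/a.txt", b"hello")        # write bytes to a path
fs.cp("/fsspec_demo/a.txt", "/fsspec_demo/b.txt")
content = fs.cat("/fsspec_demo/b.txt")         # read the whole file as bytes
fs.rm("/fsspec_demo/a.txt")

# Concept 2: a top-level convenience function, going from a URL
# straight to a file-like object.
with fsspec.open("memory://fsspec_demo/b.txt", "rb") as f:
    via_open = f.read()

print(content, via_open)
```

The same calls should work against other backends by changing only the protocol or URL, provided the relevant optional dependency is installed.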
The section Background gives motivation and history of this project, but most users will want to skip straight to Usage to find out how to use the package and Features of fsspec to see the long list of added functionality included along with the basic file-system interface.
Who uses fsspec?
You can use fsspec’s file objects with any python function that accepts file objects, because of duck typing.
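As a small sketch of what duck typing means here, the standard-library json module can read from and write to an fsspec file object without knowing where the bytes live. The in-memory URL and the file name below are assumptions chosen so the example runs offline.

```python
import json

import fsspec

# Write JSON through an fsspec file object opened in text mode.
with fsspec.open("memory://fsspec_duck/config.json", "w") as f:
    json.dump({"answer": 42}, f)

# json.load only needs an object with a .read() method,
# which the fsspec file object provides.
with fsspec.open("memory://fsspec_duck/config.json", "r") as f:
    loaded = json.load(f)

print(loaded)
```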
You may well be using fsspec already without knowing it. The following libraries use fsspec internally for path and file handling:
Dask, the parallel, out-of-core and distributed programming platform
Intake, the data source cataloguing and loading library and its plugins
pandas, the tabular data analysis package
xarray and zarr, multidimensional array storage and labelled operations
DVC, version control system for machine learning projects
Kedro, a Python framework for reproducible, maintainable and modular data science code
pyxet, a Python library for mounting and accessing very large datasets from XetHub
Hugging Face 🤗 Datasets, a popular library to load and manipulate data for deep learning models
fsspec filesystems are also supported by:
pyarrow, the in-memory data layout engine
petl, a general purpose package for extracting, transforming and loading tables of data.
… plus many more that we don’t know about.
Installation
fsspec can be installed from PyPI or conda and has no dependencies of its own:
pip install fsspec
conda install -c conda-forge fsspec
Not all filesystem implementations are available without installing extra dependencies. For example, to access data in GCS, you can use the optional pip install syntax below, or install the specific package required:
pip install fsspec[gcs]
conda install -c conda-forge gcsfs
fsspec tries to give a helpful message when you use a filesystem that needs additional dependencies. The current list of known implementations can be found as follows:
from fsspec.registry import known_implementations
known_implementations
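A short sketch of how that registry might be inspected. It relies on known_implementations being a dictionary keyed by protocol name, whose entries name the implementing class and may carry an install hint under an "err" key; treat the exact entry layout as an assumption that could vary between fsspec versions.

```python
from fsspec.registry import known_implementations

# All protocol names fsspec knows about, installed or not.
protocols = sorted(known_implementations)

# Look up one entry; "gcs" is used because it is the example above.
entry = known_implementations["gcs"]
target_class = entry["class"]   # dotted path of the implementing class
hint = entry.get("err", "")     # install hint, if the entry provides one

print(len(protocols), target_class, hint)
```

Being listed here does not mean a backend is usable: the implementing package still has to be installed before the filesystem can be instantiated.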