xet-core enables huggingface_hub to use Xet storage for uploading to and downloading from the HF Hub. Xet storage provides chunk-based deduplication, efficient storage/retrieval with local disk caching, and backwards compatibility with Git LFS. This library is not meant to be used directly; it is intended to be used from huggingface_hub.
chunk-based deduplication implementation: avoid transferring and storing chunks that are shared across binary files (models, datasets, etc).
Python bindings: bindings for huggingface_hub package.
network communications: concurrent communication to HF Hub Xet backend services (CAS).
local disk caching: chunk-based cache that sits alongside the existing huggingface_hub disk cache.
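To make the idea of chunk-based deduplication concrete, here is a minimal sketch in Python. It is not xet-core's implementation: the `ChunkStore` class, the `CHUNK_SIZE` constant, fixed-size chunking, and SHA-256 keys are all assumptions chosen to keep the example short. It only illustrates the principle that a chunk shared by two files is stored (and would be transferred) once.

```python
import hashlib

# Hypothetical value, tiny on purpose so the example is easy to follow.
CHUNK_SIZE = 4


class ChunkStore:
    """Illustrative content-addressed store; not part of xet-core."""

    def __init__(self):
        self.chunks = {}  # maps chunk hash -> chunk bytes

    def put(self, data):
        """Split data into chunks and store only unseen ones.

        Returns (recipe, new_count): the ordered list of chunk hashes
        needed to rebuild the data, and how many chunks were new.
        """
        recipe = []
        new_count = 0
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                new_count += 1
            recipe.append(key)
        return recipe, new_count

    def get(self, recipe):
        """Rebuild the original bytes from an ordered list of hashes."""
        return b"".join(self.chunks[key] for key in recipe)


store = ChunkStore()
recipe_a, new_a = store.put(b"AAAABBBBCCCC")
recipe_b, new_b = store.put(b"AAAABBBBDDDD")
print(new_a, new_b)  # prints "3 1": the second file adds only one new chunk
```

The second file shares its first two chunks with the first, so only the `DDDD` chunk is added to the store, and both files can still be rebuilt from their recipes.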
Repo Organization - Rust Crates
To build xet-core, look at the requirements in the GitHub Actions CI Workflow for the Rust toolchain to install. Follow the Rust documentation for installing rustup and that version of the toolchain. Use the following steps for building, testing, and benchmarking.
Many of us on the team use VSCode, so we have checked in some settings in the .vscode directory. Install the rust-analyzer extension.
Build:
cargo build
Test:
cargo test
Benchmark:
cargo bench
Linting:
cargo clippy -r --verbose -- -D warnings
Formatting (requires nightly toolchain):
cargo +nightly fmt --manifest-path ./Cargo.toml --all
Building Python package and running locally (on *nix systems):
python3 -mvenv ~/venv
source ~/venv/bin/activate
pip3 install maturin ipython
cd hf_xet
maturin develop
ipython
import hf_xet
hf_xet.upload_files()
hf_xet.download_files()
Building a universal wheel for macOS:
From hf_xet directory:
MACOSX_DEPLOYMENT_TARGET=10.9 maturin build --release --target universal2-apple-darwin --features openssl_vendored
Note: You may need to install the x86_64 target: rustup target add x86_64-apple-darwin
Unit tests are run with cargo test, and benchmarks are run with cargo bench. Some crates have a main.rs that can be run for manual testing.
Please join us in making xet-core better. We value everyone's contributions. Code is not the only way to help. Answering questions, helping each other, improving documentation, and filing issues all help immensely. If you are interested in contributing (please do!), check out the contribution guide for this repository.
To limit the size of our built binaries, we release Python wheels with binaries that are stripped of debugging symbols. If you encounter a panic while running hf-xet, you can use the debug symbols to help identify the part of the library that failed.
Here are the recommended steps:
Find the location of the hf-xet package with pip show hf-xet. The Location field shows the location of all the site packages. The hf_xet package will be within that directory.
Choose the debug symbols that match your system:
Windows: hf_xet.pdb
macOS: libhf_xet-macosx-x86_64.dylib.dSYM for Intel based Macs and libhf_xet-macosx-aarch64.dylib.dSYM for Apple Silicon.
Linux: cat the WHEEL file within the hf_xet-*.dist-info directory in your site packages. The Tag entry in that file names the Linux build and architecture. Eg: cat /home/ubuntu/.venv/lib/python3.12/site-packages/hf_xet-*.dist-info/WHEEL. You will use the file named hf_xet-<manylinux | musllinux>-<x86_64 | arm64>.abi3.so.dbg, choosing the distribution and platform that matches your wheel. Eg: hf_xet-manylinux-x86_64.abi3.so.dbg.
Copy the symbols into the hf_xet directory in your site packages. Eg: cp -r hf_xet-1.1.2-manylinux-x86_64.abi3.so.dbg /home/ubuntu/.venv/lib/python3.12/site-packages/hf_xet
Set RUST_BACKTRACE=full and recreate your failure.
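The Linux lookup in the steps above (reading the WHEEL file and matching it to a symbols file) could be scripted roughly as follows. This is a sketch under stated assumptions: `wheel_tags` and `guess_symbols_name` are hypothetical helper names, and the mapping from an `aarch64` platform tag to the `arm64` symbols file is a guess based on the file name pattern given above. It relies on WHEEL metadata listing the platform in `Tag:` lines and on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def wheel_tags(wheel_text):
    """Return the values of all 'Tag:' lines in WHEEL metadata text."""
    return [
        line.split(":", 1)[1].strip()
        for line in wheel_text.splitlines()
        if line.startswith("Tag:")
    ]


def guess_symbols_name(tag):
    """Map a Linux wheel tag to a debug symbols file name, or None.

    Assumption: wheels tagged aarch64 use the 'arm64' symbols file.
    """
    platform = tag.split("-")[-1]  # e.g. "manylinux_2_17_x86_64"
    if "musllinux" in platform:
        dist = "musllinux"
    elif "manylinux" in platform:
        dist = "manylinux"
    else:
        return None  # not a Linux wheel
    if platform.endswith("x86_64"):
        arch = "x86_64"
    elif platform.endswith("aarch64"):
        arch = "arm64"
    else:
        return None  # architecture not covered by this sketch
    return f"hf_xet-{dist}-{arch}.abi3.so.dbg"


if __name__ == "__main__":
    try:
        # read_text returns None if the file is missing from the metadata.
        text = metadata.distribution("hf-xet").read_text("WHEEL") or ""
    except metadata.PackageNotFoundError:
        text = ""  # hf-xet is not installed in this environment
    for tag in wheel_tags(text):
        print(tag, "->", guess_symbols_name(tag))
```

Run it with the same Python interpreter that has hf-xet installed, so the metadata lookup sees the right site packages.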