Documentation: https://tamp.readthedocs.io/en/latest/
Source Code: https://github.com/BrianPugh/tamp
Online Demo: https://brianpugh.github.io/tamp
Tamp is a low-memory, DEFLATE-inspired lossless compression library optimized for embedded and resource-constrained environments.
Tamp delivers the highest data compression ratios, while using the least amount of RAM and firmware storage.
Tamp contains 4 implementations:

- A reference desktop CPython implementation (`tamp/__init__.py`, `tamp/compressor.py`, `tamp/decompressor.py`). `pip install tamp` will use a python-bound C implementation optimized for speed.
- A MicroPython Native Module implementation (`mpy_bindings/`).
- A MicroPython Viper implementation (`tamp/__init__.py`, `tamp/compressor_viper.py`, `tamp/decompressor_viper.py`).
- A C implementation (`tamp/_c_src/`) for use in C projects.

WebAssembly bindings (`wasm/`) are also available.

This section describes how to install each implementation.
The Tamp library and CLI require Python >=3.8 and can be installed via:
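```bash
pip install tamp
```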
Tamp provides pre-compiled *native modules* that are easy to install, small, and incredibly fast.
Download the appropriate `.mpy` file from the release page (e.g. `armv6m` for a Pi Pico). Rename the file to `tamp.mpy` and transfer it to your board. If using Belay, tamp can be installed by adding the following to `pyproject.toml`:
```toml
[tool.belay.dependencies]
tamp = "https://github.com/BrianPugh/tamp/releases/download/v1.7.0/tamp-1.7.0-mpy1.23-armv6m.mpy"
```
NOT RECOMMENDED, PLEASE USE NATIVE MODULE
For MicroPython use, there are 3 main files:

- `tamp/__init__.py` - Always required.
- `tamp/decompressor_viper.py` - Required for on-device decompression.
- `tamp/compressor_viper.py` - Required for on-device compression.

For example, if on-device decompression isn't used, then do not include `decompressor_viper.py`. If manually installing, just copy these files to your microcontroller's `/lib/tamp` folder.
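One way to copy the files manually is with `mpremote` (an illustrative sketch; any file-transfer tool for your board works):

```bash
# Create the target directory on the device (create :/lib first if your board lacks it),
# then copy the required files.
mpremote mkdir :/lib/tamp
mpremote cp tamp/__init__.py :/lib/tamp/__init__.py
mpremote cp tamp/compressor_viper.py :/lib/tamp/compressor_viper.py
mpremote cp tamp/decompressor_viper.py :/lib/tamp/decompressor_viper.py
```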
If using mip, tamp can be installed by specifying the appropriate `package-*.json` file:
```bash
mip install github:brianpugh/tamp                            # Defaults to package.json: Compressor & Decompressor
mip install github:brianpugh/tamp/package-compressor.json    # Compressor only
mip install github:brianpugh/tamp/package-decompressor.json  # Decompressor only
```
If using Belay, tamp can be installed by adding the following to `pyproject.toml`:
```toml
[tool.belay.dependencies]
tamp = [
    "https://github.com/BrianPugh/tamp/blob/main/tamp/__init__.py",
    "https://github.com/BrianPugh/tamp/blob/main/tamp/compressor_viper.py",
    "https://github.com/BrianPugh/tamp/blob/main/tamp/decompressor_viper.py",
]
```
Copy the `tamp/_c_src/tamp` folder into your project. For more information, see the documentation.
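As a rough sketch of how the copied sources might be built with GCC (the destination folder, include path, and `main.c` are placeholders, not part of Tamp):

```bash
# Assuming the tamp/_c_src/tamp folder was copied to ./tamp/ in the project root
# and headers are included as "tamp/...":
gcc -O3 -I. -c tamp/*.c           # compile the Tamp sources
gcc -O3 -I. -o app main.c ./*.o   # link them into your application
```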
Tamp works on desktop Python and MicroPython. On desktop, Tamp is bundled with the `tamp` command line tool for compressing and decompressing tamp files.
Use `tamp compress` to compress a file or stream. If no input file is specified, data from stdin will be read. If no output is specified, the compressed output stream will be written to stdout.
```text
$ tamp compress --help
Usage: tamp compress [ARGS] [OPTIONS]

Compress an input file or stream.

╭─ Parameters ──────────────────────────────────────────────────────────────────────────────╮
│ INPUT,--input    -i  Input file to compress. Defaults to stdin.                            │
│ OUTPUT,--output  -o  Output compressed file. Defaults to stdout.                           │
│ --window         -w  Number of bits used to represent the dictionary window. [default: 10]│
│ --literal        -l  Number of bits used to represent a literal. [default: 8]             │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
```
Example usage:
```bash
tamp compress enwik8 -o enwik8.tamp         # Compress a file
echo "hello world" | tamp compress | wc -c  # Compress a stream and print the compressed size.
```
The following options can impact compression ratios and memory usage:

- `window` - The compressor looks back over the last `2^window` plaintext bytes to try and find a pattern. A larger window size increases the chance of finding a longer pattern match, but uses more memory, increases compression time, and causes each pattern-token to take up more space. Try smaller window values if compressing highly repetitive data or short messages.
- `literal` - Number of bits used in each plaintext byte. For example, if all input data is 7-bit ASCII, then setting this to 7 will improve literal compression ratios by 11.1%. The default, 8 bits, can encode any binary data.
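For example, short ASCII-only messages might be compressed with a smaller window and 7-bit literals (the file names below are placeholders):

```bash
# 8-bit window (256-byte dictionary) and 7-bit literals for ASCII-only data.
tamp compress data.log -o data.tamp --window 8 --literal 7
```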
Use `tamp decompress` to decompress a file or stream. If no input file is specified, data from stdin will be read. If no output is specified, the decompressed output stream will be written to stdout.
```text
$ tamp decompress --help
Usage: tamp decompress [ARGS] [OPTIONS]

Decompress an input file or stream.

╭─ Parameters ──────────────────────────────────────────────────────────────────────────────╮
│ INPUT,--input    -i  Input file to decompress. Defaults to stdin.                          │
│ OUTPUT,--output  -o  Output decompressed file. Defaults to stdout.                         │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
```
Example usage:
```bash
tamp decompress enwik8.tamp -o enwik8
echo "hello world" | tamp compress | tamp decompress
```
The Python library can perform one-shot compression, as well as operate on files/streams.
```python
import tamp

# One-shot compression
string = b"I scream, you scream, we all scream for ice cream."
compressed_data = tamp.compress(string)
reconstructed = tamp.decompress(compressed_data)
assert reconstructed == string

# Streaming compression
with tamp.open("output.tamp", "wb") as f:
    for _ in range(10):
        f.write(string)

# Streaming decompression
with tamp.open("output.tamp", "rb") as f:
    reconstructed = f.read()
```
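The streaming interface also makes it easy to compress an existing file in chunks without loading it fully into memory. A minimal sketch (the `compress_file` helper and chunk size are illustrative, not part of Tamp's API):

```python
import tamp

def compress_file(src_path, dst_path, chunk_size=4096):
    """Compress src_path into dst_path using Tamp's streaming file interface."""
    with open(src_path, "rb") as src, tamp.open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)

compress_file("enwik8", "enwik8.tamp")
```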
In the following section, we compare Tamp against:

- zlib
- heatshrink

All of these are LZ-based compression algorithms, and tests were performed using a 1KB (10-bit) window. Since zlib already uses significantly more memory by default, the lowest memory level (`memLevel=1`) was used in these benchmarks. It should be noted that higher zlib memory levels will have greater compression ratios than Tamp. Currently, there is no micropython-compatible zlib or heatshrink compression implementation, so these numbers are provided simply as a reference.
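For reference, a zlib configuration roughly matching these settings (raw DEFLATE, 10-bit window, lowest memory level) can be expressed in Python as follows; the exact benchmark harness is an assumption:

```python
import zlib

# wbits=-10 selects a raw DEFLATE stream with a 2**10-byte (1KB) window;
# memLevel=1 minimizes zlib's internal state, as described above.
compressor = zlib.compressobj(level=9, wbits=-10, memLevel=1)
compressed = compressor.compress(b"example data") + compressor.flush()
```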
The following table shows compression algorithm performance over a variety of input data sourced from the Silesia Corpus and Enwik8. This should give a general idea of how these algorithms perform over a variety of input data types.
| dataset | raw | tamp | tamp (lazy matching) | zlib | heatshrink |
|---|---:|---:|---:|---:|---:|
| enwik8 | 100,000,000 | 51,635,633 | 51,252,113 | 56,205,166 | 56,110,394 |
| build/silesia/dickens | 10,192,446 | 5,546,761 | 5,511,604 | 6,049,169 | 6,155,768 |
| build/silesia/mozilla | 51,220,480 | 25,121,385 | 24,936,067 | 25,104,966 | 25,435,908 |
| build/silesia/mr | 9,970,564 | 5,027,032 | 4,886,272 | 4,864,734 | 5,442,180 |
| build/silesia/nci | 33,553,445 | 8,643,610 | 8,645,299 | 5,765,521 | 8,247,487 |
| build/silesia/ooffice | 6,152,192 | 3,814,938 | 3,798,261 | 4,077,277 | 3,994,589 |
| build/silesia/osdb | 10,085,684 | 8,520,835 | 8,506,443 | 8,625,159 | 8,747,527 |
| build/silesia/reymont | 6,627,202 | 2,847,981 | 2,820,870 | 2,897,661 | 2,910,251 |
| build/silesia/samba | 21,606,400 | 9,102,594 | 9,060,692 | 8,862,423 | 9,223,827 |
| build/silesia/sao | 7,251,944 | 6,137,755 | 6,101,744 | 6,506,417 | 6,400,926 |
| build/silesia/webster | 41,458,703 | 18,694,172 | 18,567,288 | 20,212,235 | 19,942,817 |
| build/silesia/x-ray | 8,474,240 | 7,510,606 | 7,405,814 | 7,351,750 | 8,059,723 |
| build/silesia/xml | 5,345,280 | 1,681,687 | 1,672,660 | 1,586,985 | 1,665,179 |

All sizes are in bytes.

Tamp usually out-performs heatshrink, and is generally very competitive with zlib. While this is intended to be an apples-to-apples comparison, zlib still uses significantly more memory during both compression and decompression (see the next section). Tamp accomplishes competitive performance while using around 10x less memory.
Lazy matching is a simple technique that improves compression ratios at the expense of CPU time while requiring very little code. One can expect 50-75% more CPU usage for modest compression gains (around 0.5-2.0%). Because of this poor trade-off, it is disabled by default; however, in applications where data is compressed once on a powerful machine (like a desktop/server) and decompressed on an embedded device, it may be worth spending the extra compute. Lazy-matched compressed data uses the exact same format; it appears no different to the Tamp decoder.
One might wonder, "Why did Tamp perform so much worse than zlib on the nci dataset?" The `nci` dataset contains highly compressible data with long patterns. For example, the following 49-character text appears repeatedly in the dataset:
0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
Tamp's maximum pattern length peaks at around 15 characters, meaning that these 49 characters have to be compressed as four pattern-matches (49 / 15 rounds up to 4). Zlib can handle patterns with a maximum length of 258, so it can encode this highly repetitive data more efficiently. Given Tamp's excellent performance on most of the other data compression benchmark files, this is a good tradeoff for most real-world scenarios.
The following table shows approximately how much memory each algorithm uses during compression and decompression.
| | Compression | Decompression |
|---|---|---|
| Tamp | `(1 << windowBits)` | `(1 << windowBits)` |
| ZLib | `(1 << (windowBits + 2)) + 7KB` | `(1 << windowBits) + 7KB` |
| Heatshrink | `(1 << (windowBits + 1))` | `(1 << (windowBits + 1))` |
| Deflate (micropython) | `(1 << windowBits)` | `(1 << windowBits)` |

All libraries have a few dozen bytes of overhead in addition to the primary window buffer, but these overheads are implementation-specific and are ignored here for clarity. Tamp uses significantly less memory than ZLib, and half the memory of Heatshrink.
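As a worked example, plugging the 10-bit window used throughout these benchmarks into the expressions above gives:

```python
window_bits = 10

tamp_compress       = 1 << window_bits                      # 1,024 bytes
zlib_compress       = (1 << (window_bits + 2)) + 7 * 1024   # 11,264 bytes
heatshrink_compress = 1 << (window_bits + 1)                # 2,048 bytes
```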
As a rough benchmark, here is the performance (in seconds) of these different compression algorithms on the 100MB enwik8 dataset. These tests were performed on an M1 MacBook Air.

| | Compression (s) | Decompression (s) |
|---|---:|---:|
| Tamp (Python Reference) | 109.5 | 76.0 |
| Tamp (C) | 16.45 | 0.142 |
| ZLib | 0.98 | 0.98 |
| Heatshrink (with index) | 6.22 | 0.82 |
| Heatshrink (without index) | 41.73 | 0.82 |

Heatshrink v0.4.1 was used in these benchmarks. When heatshrink uses an index, an additional `(1 << (windowBits + 1))` bytes of memory are used, resulting in 4x more memory usage than Tamp. Tamp could use similar indexing to increase compression speed, but has chosen not to in order to stay focused on its primary goal of being a low-memory compressor.
To give an idea of Tamp's speed on an embedded device, the following table shows compression/decompression throughput in bytes/second for the first 100KB of enwik8 on a Pi Pico (RP2040) at the default 125MHz clock rate. The C benchmark does not use a filesystem nor dynamic memory allocation, so it represents the maximum speed Tamp can achieve. In all tests, a 1KB (10-bit) window was used.

| | Compression (bytes/s) | Decompression (bytes/s) |
|---|---:|---:|
| Tamp (MicroPython Viper) | 4,300 | 42,000 |
| Tamp (MicroPython Native Module) | 12,770 | 644,000 |
| Tamp (C) | 28,500 | 1,042,524 |
| Deflate (micropython builtin) | 6,715 | 146,477 |

Tamp resulted in a 51,637-byte archive, while MicroPython's builtin `deflate` resulted in a larger, 59,442-byte archive.
To give an idea of the resulting binary sizes, Tamp and other libraries were compiled for the Pi Pico (`armv6m`). All libraries were compiled with `-O3`. Numbers are reported in bytes.

Heatshrink doesn't include a high-level API; in an apples-to-apples comparison, the Tamp library would be even smaller.