⚡ DeepSeek v3/R1 model support.
⚡ New flexible weight packing: allow quantized weights to be packed to [int32, int16, int8] dtypes. Triton and Torch kernels support the full range of the new QuantizeConfig.pack_dtype.
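A minimal sketch of selecting a pack dtype, assuming the usual GPTQModel load/quantize flow; that pack_dtype accepts torch integer dtypes is an assumption based on the list above, and the model id is illustrative:

```python
import torch
from gptqmodel import GPTQModel, QuantizeConfig

# Assumption: pack_dtype takes one of the torch dtypes named above.
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    pack_dtype=torch.int8,  # pack quantized weights into int8 words instead of the int32 default
)

model = GPTQModel.load("meta-llama/Llama-3.2-1B", quant_config)
```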
⚡ Over 50% speedup for vision-language (VL) model quantization (Qwen 2.5-VL + Ovis).
⚡ New auto_gc: bool control in quantize() which can reduce quantization time for small models with no risk of OOM.
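A hedged sketch of the new flag, continuing the load example above; the tiny calibration set here is only illustrative:

```python
# Assumption: auto_gc toggles the periodic garbage-collection pass during quantize().
# For a small model that comfortably fits in VRAM, skipping GC saves wall-clock time.
calibration = [
    "gptqmodel is an llm model quantization toolkit.",
    "the quick brown fox jumps over the lazy dog.",
]
model.quantize(calibration, auto_gc=False)
```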
⚡ New GPTQModel.push_to_hub() API for easy upload of quantized models to an HF repo.
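A sketch of the upload call; the parameter names here (repo_id, quantized_path) are assumptions, so verify them against the API reference:

```python
from gptqmodel import GPTQModel

model.save("./llama-3.2-1b-gptq-4bit")  # save the quantized model locally first

# Assumed parameter names; check the GPTQModel docs for the exact signature.
GPTQModel.push_to_hub(
    repo_id="your-user/Llama-3.2-1B-gptq-4bit",
    quantized_path="./llama-3.2-1b-gptq-4bit",
)
```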
⚡ New buffered_fwd: bool control in model.quantize().
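As a sketch, the flag slots into the same quantize() call; what it trades off is not stated in these notes, so treat the comment below as an assumption:

```python
# Assumption: buffered_fwd buffers layer forward outputs during quantization,
# potentially trading extra memory for quantization throughput.
model.quantize(calibration, buffered_fwd=True)
```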
🐛 Fixed bits=3 packing and group_size=-1 regression in v1.7.4.
🐛 Fixed Google Colab install requiring two install passes.
🐛 Fixed Python 3.10 compatibility.
What's Changed
* Add pack_dtype to dynamic config and fix validate by @Qubitium in #1178
* max_memory arg by @Qubitium in #1197
* Add buffered_fwd quantize control by @Qubitium in #1205
* Add GPTQModel.push_to_hub() support by @Qubitium in #1216

Full Changelog: v1.7.4...v1.8.1