Release v3.2.0 · mudler/LocalAI · GitHub
🚀 LocalAI 3.2.0
Welcome to LocalAI 3.2.0! This release refactors our architecture to be more flexible and lightweight.
The core is now separated from all the backends, making LocalAI faster to download, easier to manage, more portable, and much smaller.
TL;DR – What’s New in LocalAI 3.2.0 🎉
- 🧩 Modular Backends: All backends now live outside the main binary in our new Backend Gallery. This means you can update, add, or manage backends independently of LocalAI releases.
- 📉 Leaner Than Ever: The LocalAI binary and container images are drastically smaller, making for faster downloads and a reduced footprint.
- 🤖 Smart Backend Installation: It just works! When you install a model, LocalAI automatically detects your hardware (CPU, NVIDIA, AMD, Intel) and downloads the necessary backend. No more manual configuration!
- 🛠️ Simplified Build Process: The new modular architecture significantly simplifies the build process for contributors and power users.
- ⚡️ Intel GPU Support for Whisper: Transcription with Whisper can now be accelerated on Intel GPUs using SYCL, bringing more hardware options to our users.
- 🗣️ Enhanced Realtime Audio: We've added speech started and stopped events for more interactive applications and OpenAI-compatible support for the input_audio field in the chat API.
- 🧠 Massive Model Expansion: The gallery has been updated with over 50 new models, including the latest from Qwen3, Gemma, Mistral, Nemotron, and more!
Note: CI is still in the process of building all the backends for this release, and they will be available soon. If you hit any issue, please try again in a few minutes. Thanks for understanding!
Note: Some parts of the documentation and the installation scripts (which download the release binaries) have yet to be adapted to the latest changes and might not reflect the current state.
A New Modular Architecture 🧩
The biggest change in v3.2.0 is the complete separation of inference backends from the core LocalAI binary. Backends like llama.cpp, whisper.cpp, piper, and stablediffusion-ggml are no longer bundled in.
This fundamental shift makes LocalAI:
- Lighter: Significantly smaller binary and container image sizes.
- More Flexible: Update backends anytime from the gallery without waiting for a new LocalAI release.
- Easier to Maintain: A cleaner, more streamlined codebase for faster development.
- Easier to Customize: You can build your own backends and install them in your LocalAI instances.
Smart, Automatic Backend Installation 🤖
To make the new modular system seamless, LocalAI now features automatic backend installation.
When you install a model from the gallery (or a YAML file), LocalAI intelligently detects the required backend and your system's capabilities, then downloads the correct version for you. Whether you're running on a standard CPU, an NVIDIA GPU, an AMD GPU, or an Intel GPU, LocalAI handles it automatically.
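When installing from a YAML file, the backend can also be pinned explicitly instead of relying on auto-detection. Here is a minimal sketch of such a model definition; the file name, model name, and model file below are illustrative placeholders, not values from this release:

```shell
# Sketch: pin a backend explicitly in a model definition.
# All names here (my-model, my-model.gguf) are hypothetical examples.
mkdir -p models
cat > models/my-model.yaml <<'EOF'
name: my-model
backend: llama-cpp
parameters:
  model: my-model.gguf
EOF
```

If the `backend` field is omitted, LocalAI falls back to the automatic detection and installation described above.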
For advanced use cases or to override auto-detection, you can use the LOCALAI_FORCE_META_BACKEND_CAPABILITY environment variable. Here are the available options:
- `default`: Forces CPU-only backends. This is the fallback if no specific hardware is detected.
- `nvidia`: Forces backends compiled with CUDA support for NVIDIA GPUs.
- `amd`: Forces backends compiled with ROCm support for AMD GPUs.
- `intel`: Forces backends compiled with SYCL/oneAPI support for Intel GPUs.
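For example, to force the CUDA-enabled backends on a machine where auto-detection does not pick them up, set the variable before starting LocalAI. The variable name comes from this release; the startup command is shown only as a commented illustration:

```shell
# Force CUDA-capable backends regardless of what hardware detection reports.
# Valid values: default | nvidia | amd | intel
export LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia

# Then start LocalAI as usual, e.g.:
# local-ai run
```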
The Backend Gallery & CLI Control 🖼️
You are in full control. You can browse, install, and manage all available backends directly from the WebUI or using the new CLI commands:
```shell
# List all available backends in the gallery
local-ai backends list

# Install a specific backend (e.g., llama-cpp)
local-ai backends install llama-cpp

# Uninstall a backend
local-ai backends uninstall llama-cpp
```
For development, offline, or air-gapped environments, you can now also install backends directly from a local OCI tar file:

```shell
local-ai backends install "ocifile://<PATH_TO_TAR_FILE>"
```
Other Key Improvements
- 🗣️ Enhanced Realtime and Audio APIs: Building voice-activated applications is now easier.
- The new speech started and stopped events give you precise control over realtime audio streams.
- We now support the input_audio field in the /v1/chat/completions endpoint for multimodal audio inputs, improving OpenAI compatibility.
- ⚡️ Intel GPU Acceleration for Whisper: Our Whisper backend now supports SYCL, enabling hardware-accelerated transcriptions on Intel GPUs.
- ✅ UI and Bug Fixes: We've squashed several bugs for a smoother experience, including a fix that correctly shows the download status for backend images in the gallery, so you always know what's happening.
- 🧠 Massive Model Gallery Expansion: Our model gallery has never been bigger! We've added over 50 new and updated models, with a focus on powerful new releases like qwen3, devstral-small, and nemotron.
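The `input_audio` field follows the OpenAI chat format: an audio content part carrying base64-encoded bytes plus a format tag. A hedged sketch of such a request against a local instance follows; the model name, port, and audio bytes are placeholders:

```shell
# Placeholder bytes; in practice: AUDIO_B64=$(base64 -w0 question.wav)
AUDIO_B64="UklGRg=="

# Build an OpenAI-style chat request with an input_audio content part.
PAYLOAD=$(cat <<EOF
{
  "model": "my-model",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is said in this recording?"},
      {"type": "input_audio",
       "input_audio": {"data": "${AUDIO_B64}", "format": "wav"}}
    ]
  }]
}
EOF
)

# Send it to a running LocalAI instance (assumed on the default port):
# curl http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```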
🚨 Important Note for Upgrading
Due to the new modular architecture, models installed with a version prior to 3.2.0 might not have a specific backend assigned.
After upgrading, you may need to install the required backend manually for these models to work. You can do this easily from the WebUI or via the CLI: local-ai backends install <backend_name>.
The Complete Local Stack for Privacy-First AI
LocalAI
The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
Link: https://github.com/mudler/LocalAI
LocalAGI
A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
Link: https://github.com/mudler/LocalAGI
LocalRecall
A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI.
Link: https://github.com/mudler/LocalRecall
Thank you! ❤️
A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 34,100 stars, and LocalAGI has already rocketed past 900 stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software), and its sibling projects are community-driven, not backed by VCs or a company. We rely on contributors donating their spare time and on our sponsors providing the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
What's Changed
Breaking Changes 🛠
- feat: do not bundle llama-cpp anymore by @mudler in #5790
- feat: refactor build process, drop embedded backends by @mudler in #5875
Bug fixes 🐛
Exciting New Features 🎉
- feat(llama.cpp): allow to set kv-overrides by @mudler in #5745
- feat(backends): add metas in the gallery by @mudler in #5784
- feat(system): detect and allow to override capabilities by @mudler in #5785
- chore(cli): add backends CLI to manipulate and install backends by @mudler in #5787
- feat(whisper): Enable SYCL by @richiejp in #5802
- feat(cli): allow to install backends from OCI tar files by @mudler in #5816
- feat(cli): add command to create custom OCI images from directories by @mudler in #5844
- feat(realtime): Add speech started and stopped events by @richiejp in #5856
- fix: autoload backends when installing models from YAML files by @mudler in #5859
- feat: split piper from main binary by @mudler in #5858
- feat: remove stablediffusion-ggml from main binary by @mudler in #5861
- feat: split whisper from main binary by @mudler in #5863
- feat(openai): support input_audio chat api field by @mgoltzsche in #5870
- fix(realtime): Reset speech started flag on commit by @richiejp in #5879
- fix(build): Add and update ONEAPI_VERSION by @richiejp in #5874
🧠 Models
- chore(model gallery): add qwen3-55b-a3b-total-recall-v1.3-i1 by @mudler in #5746
- chore(model gallery): add qwen3-55b-a3b-total-recall-deep-40x by @mudler in #5747
- chore(model gallery): add qwen3-42b-a3b-stranger-thoughts-deep20x-abliterated-uncensored-i1 by @mudler in #5748
- chore(model gallery): add mistral-small-3.2-46b-the-brilliant-raconteur-ii-instruct-2506 by @mudler in #5749
- chore(model gallery): add qwen3-22b-a3b-the-harley-quinn by @mudler in #5750
- chore(model gallery): add gemma-3-4b-it-max-horror-uncensored-dbl-x-imatrix by @mudler in #5751
- chore(model gallery): add qwen3-33b-a3b-stranger-thoughts-abliterated-uncensored by @mudler in #5755
- chore(model gallery): add thedrummer_anubis-70b-v1.1 by @mudler in #5771
- chore(model gallery): add steelskull_l3.3-shakudo-70b by @mudler in #5772
- chore(model gallery): add pinkpixel_crystal-think-v2 by @mudler in #5773
- chore(model gallery): add helpingai_dhanishtha-2.0-preview by @mudler in #5791
- chore(model gallery): add agentica-org_deepswe-preview by @mudler in #5792
- chore(model gallery): add zerofata_ms3.2-paintedfantasy-visage-33b by @mudler in #5793
- chore(model gallery): add ockerman0_anubislemonade-70b-v1 by @mudler in #5794
- chore(model gallery): add sicariussicariistuff_impish_llama_4b by @mudler in #5799
- chore(model gallery): add nano_imp_1b-q8_0 by @mudler in #5800
- chore(model gallery): add compumacy-experimental-32b by @mudler in #5803
- chore(model gallery): add mini-hydra by @mudler in #5804
- chore(model gallery): add zonui-3b-i1 by @mudler in #5805
- chore(model gallery): add huihui-jan-nano-abliterated by @mudler in #5806
- chore(model gallery): add cognitivecomputations_dolphin-mistral-24b-venice-edition by @mudler in #5813
- chore(model gallery): add ockerman0_anubislemonade-70b-v1.1 by @mudler in #5814
- chore(model gallery): add qwen3-8b-shiningvaliant3 by @mudler in #5815
- chore(model gallery): add lyranovaheart_starfallen-snow-fantasy-24b-ms3.2-v0.0 by @mudler in #5818
- chore(model gallery): add zerofata_l3.3-geneticlemonade-opus-70b by @mudler in #5819
- chore(model gallery): add huggingfacetb_smollm3-3b by @mudler in #5820
- chore(model gallery): add delta-vector_plesio-70b by @mudler in #5825
- chore(model gallery): add thedrummer_big-tiger-gemma-27b-v3 by @mudler in #5826
- chore(model gallery): add thedrummer_tiger-gemma-12b-v3 by @mudler in #5827
- chore(model gallery): add microsoft_nextcoder-32b by @mudler in #5832
- chore(model gallery): add huihui-ai_huihui-gemma-3n-e4b-it-abliterated by @mudler in #5833
- chore(model gallery): add mistralai_devstral-small-2507 by @mudler in #5834
- chore(model gallery): add nvidia_llama-3_3-nemotron-super-49b-genrm-multilingual by @mudler in #5837
- chore(model gallery): add mistral-2x24b-moe-power-coder-magistral-devstral-reasoning-ultimate-neo-max-44b by @mudler in #5838
- chore(model gallery): add impish_magic_24b-i1 by @mudler in #5839
- chore(model gallery): add google_medgemma-4b-it by @mudler in #5842
- chore(model gallery): add google_medgemma-27b-it by @mudler in #5843
- chore(model gallery): add zhi-create-qwen3-32b-i1 by @mudler in #5847
- chore(model gallery): add sophosympatheia_strawberrylemonade-70b-v1.1 by @mudler in #5848
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #5865
- chore(model gallery): add omega-qwen3-atom-8b by @mudler in #5883
- chore(model gallery): add dream-org_dream-v0-instruct-7b by @mudler in #5884
- chore(model gallery): add entfane_math-genius-7b by @mudler in #5885
- chore(model gallery): add menlo_lucy by @mudler in #5886
- chore(model gallery): add qwen3-235b-a22b-instruct-2507 by @mudler in #5887
- chore(model gallery): add qwen3-coder-480b-a35b-instruct by @mudler in #5888
📖 Documentation and examples
- fix(docs): Improve Header Responsiveness - Hide "Star us on GitHub!" on Mobile by @dedyf5 in #5770
👒 Dependencies
- chore: ⬆️ Update ggml-org/llama.cpp to 27208bf657cfe7262791df473927225e48efe482 by @localai-bot in #5753
- chore: ⬆️ Update ggml-org/llama.cpp to caf5681fcb47dfe9bafee94ef9aa8f669ac986c7 by @localai-bot in #5758
- chore: ⬆️ Update ggml-org/llama.cpp to 0a5a3b5cdfd887cf0f8e09d9ff89dee130cfcdde by @localai-bot in #5759
- chore: ⬆️ Update ggml-org/whisper.cpp to bca021c9740b267c2973fba56555be052006023a by @localai-bot in #5776
- chore: ⬆️ Update ggml-org/llama.cpp to de569441470332ff922c23fb0413cc957be75b25 by @localai-bot in #5777
- chore: ⬆️ Update ggml-org/whisper.cpp to d9999d54c868b8bfcd376aa26067e787d53e679e by @localai-bot in #5782
- chore: ⬆️ Update ggml-org/llama.cpp to e75ba4c0434eb759eb7ff74e034ebe729053e575 by @localai-bot in #5783
- chore(bark-cpp): generalize and move to bark-cpp by @mudler in #5786
- chore: ⬆️ Update PABannier/bark.cpp to 5d5be84f089ab9ea53b7a793f088d3fbf7247495 by @localai-bot in #4786
- chore: ⬆️ Update ggml-org/llama.cpp to bee28421be25fd447f61cb6db64d556cbfce32ec by @localai-bot in #5788
- chore: ⬆️ Update ggml-org/llama.cpp to ef797db357e44ecb7437fa9d22f4e1614104b342 by @localai-bot in #5795
- chore: ⬆️ Update ggml-org/llama.cpp to a0374a67e2924f2e845cdc59dd67d9a44065a89c by @localai-bot in #5798
- chore: ⬆️ Update ggml-org/llama.cpp to 6491d6e4f1caf0ad2221865b4249ae6938a6308c by @localai-bot in #5801
- chore: ⬆️ Update ggml-org/llama.cpp to 12f55c302b35cfe900b84c5fe67c262026af9c44 by @localai-bot in #5808
- chore: ⬆️ Update ggml-org/whisper.cpp to 869335f2d58d04010535be9ae23a69a9da12a169 by @localai-bot in #5809
- chore: ⬆️ Update ggml-org/llama.cpp to 6efcd65945a98cf6883cdd9de4c8ccd8c79d219a by @localai-bot in #5817
- chore: ⬆️ Update ggml-org/llama.cpp to 0b8855775c6b873931d40b77a5e42558aacbde52 by @localai-bot in #5830
- chore: ⬆️ Update ggml-org/llama.cpp to f5e96b368f1acc7f53c390001b936517c4d18999 by @localai-bot in #5835
- chore: ⬆️ Update ggml-org/llama.cpp to c31e60647def83d671bac5ab5b35579bf25d9aa1 by @localai-bot in #5840
- chore: ⬆️ Update ggml-org/whisper.cpp to 3775c503d5133d3d8b99d7d062e87a54064b0eb8 by @localai-bot in #5841
- chore: ⬆️ Update ggml-org/whisper.cpp to a16da91365700f396da916d16a7f5a2ec99364b9 by @localai-bot in #5846
- chore: ⬆️ Update ggml-org/llama.cpp to 982e347255723fe6d02e60ee30cfdd0559c884c5 by @localai-bot in #5845
- chore: ⬆️ Update ggml-org/whisper.cpp to 032697b9a850dc2615555e2a93a683cc3dd58559 by @localai-bot in #5849
- chore: ⬆️ Update ggml-org/llama.cpp to bdca38376f7e8dd928defe01ce6a16218a64b040 by @localai-bot in #5850
- chore: ⬆️ Update ggml-org/llama.cpp to 4a4f426944e79b79e389f9ed7b34831cb9b637ad by @localai-bot in #5852
- chore: ⬆️ Update ggml-org/llama.cpp to 496957e1cbcb522abc63aa18521036e40efce985 by @localai-bot in #5854
- chore: ⬆️ Update ggml-org/llama.cpp to d6fb3f6b49b27ef1c0f4cf5128e041f7e7dc03af by @localai-bot in #5857
- chore(deps): bump securego/gosec from 2.22.5 to 2.22.7 by @dependabot[bot] in #5878
- chore: ⬆️ Update richiejp/stable-diffusion.cpp to 10c6501bd05a697e014f1bee3a84e5664290c489 by @localai-bot in #5732
- fix(stablediffusion-cpp): Switch back to upstream and update by @richiejp in #5880
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5752
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5775
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5781
- chore: ⬆️ Update ggml-org/llama.cpp to bf9087f59aab940cf312b85a67067ce33d9e365a by @localai-bot in #5860
- chore: ⬆️ Update ggml-org/llama.cpp to a979ca22db0d737af1e548a73291193655c6be99 by @localai-bot in #5862
- chore: ⬆️ Update ggml-org/llama.cpp to 2be60cbc2707359241c2784f9d2e30d8fc7cdabb by @localai-bot in #5867
- chore: ⬆️ Update ggml-org/whisper.cpp to 1f5cf0b2888402d57bb17b2029b2caa97e5f3baf by @localai-bot in #5876
- chore: ⬆️ Update ggml-org/llama.cpp to 6c9ee3b17e19dcc82ab93d52ae46fdd0226d4777 by @localai-bot in #5877
- chore: drop vllm for cuda 11 by @mudler in #5881
- chore: ⬆️ Update ggml-org/llama.cpp to acd6cb1c41676f6bbb25c2a76fa5abeb1719301e by @localai-bot in #5882
- fix: rename Dockerfile.go --> Dockerfile.golang to avoid IDE errors by @dave-gray101 in #5892
- chore(Makefile): drop unused targets by @mudler in #5893
- chore: ⬆️ Update ggml-org/llama.cpp to a86f52b2859dae4db5a7a0bbc0f1ad9de6b43ec6 by @localai-bot in #5894
- fix: untangle pkg and core by @dave-gray101 in #5896
- Update quickstart.md by @Shinrai in #5898
New Contributors
Full Changelog: v3.1.1...v3.2.0