Dynamo 0.4.0 Release Notes
Dynamo is a high-performance, low-latency inference framework designed to serve generative AI models—across any framework, architecture, or deployment scale. It's an open-source project under the Apache 2.0 license. Dynamo is available for installation via pip wheels and containers from NVIDIA NGC.
As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:
- NVIDIA TensorRT-LLM
- vLLM
- SGLang
Major Features and Improvements
Increasing Framework Support
vLLM Updates
- Added E2E integration tests (#1935) and multimodal example with Llama4 Maverick (#1990)
- Prefill-aware routing for improved performance (#1895)
- Configurable namespace support for vLLM examples (#1909)
- Routing via ApproxKvIndexer with use_kv_events flag (#1869)
- Updated all vLLM examples to new UX (#1756)
SGLang Updates
- Receive KV metrics from scheduler (#1789)
- Disaggregated deployment examples (#2137)
- Launch and deploy examples added (#2068)
TRT-LLM Updates
- New/speculative decoding example: Llama-4 + Eagle-3 (#1828)
Routing Performance
- Removed router hot-path lock for faster request handling (#1963)
- Added radix tree dumps as router events (#2057)
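These notes do not describe how ApproxKvIndexer works internally. As a rough illustration of the general idea behind routing without KV events, the sketch below assumes the router estimates each worker's cached prefix blocks purely from its own past routing decisions and lets those estimates expire after a time-to-live. Every name here (ApproxIndex, block_hashes, BLOCK_SIZE, ttl_seconds) and every default value is hypothetical and is not part of the Dynamo API.

```python
import time
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # tokens per KV block; hypothetical value for illustration


def block_hashes(tokens: List[int]) -> List[int]:
    """Hash each full block chained with its prefix, so equal hashes imply equal prefixes."""
    hashes: List[int] = []
    prev = 0
    for start in range(0, len(tokens) - BLOCK_SIZE + 1, BLOCK_SIZE):
        prev = hash((prev, tuple(tokens[start:start + BLOCK_SIZE])))
        hashes.append(prev)
    return hashes


class ApproxIndex:
    """Guesses which worker holds which prefix blocks, using only past routing decisions."""

    def __init__(self, ttl_seconds: float = 120.0) -> None:
        self.ttl = ttl_seconds  # hypothetical default, not taken from Dynamo
        # (worker_id, block_hash) -> time after which the block is assumed evicted
        self.expiry: Dict[Tuple[str, int], float] = {}

    def record(self, worker: str, tokens: List[int], now: Optional[float] = None) -> None:
        """Assume the worker now caches every full block of this request."""
        now = time.monotonic() if now is None else now
        for h in block_hashes(tokens):
            self.expiry[(worker, h)] = now + self.ttl

    def overlap(self, worker: str, tokens: List[int], now: Optional[float] = None) -> int:
        """Count leading blocks of the request that the worker is assumed to still hold."""
        now = time.monotonic() if now is None else now
        count = 0
        for h in block_hashes(tokens):
            if self.expiry.get((worker, h), float("-inf")) <= now:
                break  # first missing or expired block ends the reusable prefix
            count += 1
        return count

    def pick(self, workers: List[str], tokens: List[int], now: Optional[float] = None) -> str:
        """Choose the worker with the longest assumed prefix overlap (first one wins ties)."""
        return max(workers, key=lambda w: self.overlap(w, tokens, now))
```

A caller would invoke pick to choose a worker and then record for the chosen worker. A production router would also weigh worker load, which this sketch leaves out.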
UX Updates
Deployment, Kubernetes, and CLI
Performance and Observability
Bug Fixes
- Fixed GPU resource specifications in LLM deployments (#1812)
- Corrected vLLM, SGLang, and TRT-LLM deployment issues, including container builds, runtime packaging, and helm chart updates (#1942, #2062, #1825)
- Resolved port conflicts, made port assignments deterministic, and improved health checks (#1937, #1996)
- Improved error handling for empty message lists and invalid configurations (#2067, #2071)
- Fixed nil pointer dereference issues in the Dynamo controller (#2299, #2335)
- Locked dependencies to avoid breaking changes (e.g., Triton 3.4.0 with TRT-LLM 1.0.0) (#2233)
Documentation
Guides and Examples
- New hello world Python binding example (#2083)
- Added multinode, disaggregated, and Grove deployment guides (#2155, #2086)
- Added AKS/EKS deployment guides (#2080)
Docs Restructuring
- Updated for new Python UX (#2070)
- Refactored README and reorganized examples (#2141, #2174)
Build, CI, and Test
- Added support for SGLang runtime image builds (#1770)
- Optional TRT-LLM dependency and custom build support (#2113)
- New end-to-end router tests with mockers (#2073)
- Fixed vLLM builds for Blackwell GPUs (#2020)
Release Assets
Python Wheels:
Rust Crates:
Containers:
Helm Charts:
Open Issues
- The x86 TRT-LLM container image is not compatible with B200 out of the box. The dev container still works for B200/GB200.
Contributors
We welcome new contributors in this release:
@umang-kedia-hpe, @Ethan-ES, @messiaen, @galletas1712, @mc-nv, @zaristei, @jhaotingc, @saurabh-nvidia.
For the full list of changes, see the changelog.