Showing content from https://github.com/ai-dynamo/dynamo/releases/latest below:

Release Dynamo Release v0.4.0 · ai-dynamo/dynamo · GitHub

Dynamo 0.4.0 Release Notes

Dynamo is a high-performance, low-latency inference framework designed to serve generative AI models—across any framework, architecture, or deployment scale. It's an open-source project under the Apache 2.0 license. Dynamo is available for installation via pip wheels and containers from NVIDIA NGC.

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

NVIDIA TensorRT-LLM
vLLM
SGLang

Major Features and Improvements

Increasing Framework Support

vLLM Updates
- Added E2E integration tests (#1935) and multimodal example with Llama4 Maverick (#1990)
- Prefill-aware routing for improved performance (#1895)
- Configurable namespace support for vLLM examples (#1909)
- Routing via ApproxKvIndexer with use_kv_events flag (#1869)
- Updated all vLLM examples to new UX (#1756)
SGLang Updates
- Receive KV metrics from scheduler (#1789)
- Disaggregated deployment examples (#2137)
- Launch and deploy examples added (#2068)
TRT-LLM Updates
- New speculative decoding example: Llama-4 + Eagle-3 (#1828)
Routing Performance
- Removed router hot-path lock for faster request handling (#1963)
- Added radix tree dumps as router events (#2057)
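
To illustrate the idea behind KV-aware routing without KV events, here is a minimal sketch. It is not Dynamo's implementation: `ApproxIndex`, `block_hashes`, and `BLOCK_SIZE` are hypothetical names, and a set of chained block hashes stands in for the router's radix tree. The sketch assumes that a worker caches whatever prefix it was last routed, and it picks the worker with the longest assumed prefix overlap, breaking ties by lowest load.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per KV block; hypothetical value for this demo


def block_hashes(tokens):
    """Hash each full block chained with all blocks before it."""
    hashes, running = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # drop the partial tail block
    for i in range(0, full, BLOCK_SIZE):
        running.update(repr(tokens[i:i + BLOCK_SIZE]).encode())
        hashes.append(running.hexdigest())
    return hashes


class ApproxIndex:
    """Tracks which block hashes each worker is assumed to hold (hypothetical)."""

    def __init__(self, workers):
        self.cached = {w: set() for w in workers}
        self.load = {w: 0 for w in workers}

    def overlap(self, worker, hashes):
        # Count leading blocks the worker is assumed to have cached.
        count = 0
        for h in hashes:
            if h not in self.cached[worker]:
                break
            count += 1
        return count

    def route(self, tokens):
        hashes = block_hashes(tokens)
        # Prefer the longest prefix overlap, then the least-loaded worker.
        best = max(self.cached, key=lambda w: (self.overlap(w, hashes), -self.load[w]))
        # No KV events are received: assume the chosen worker now caches these blocks.
        self.cached[best].update(hashes)
        self.load[best] += 1
        return best
```

A request that shares a prefix with an earlier one is sent back to the same worker, while unrelated requests spread across workers by load.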

UX Updates

Migration to New Python UX
- Updated all Python launch flows to the new UX structure (#2003), including refactoring vLLM backend integration (#1983).
- Removed outdated examples that relied on the old UX (#1899).
CLI and Packaging Enhancements
- Added Python bindings for Dynamo CLI tools (#1799).
- Updated Python packaging to align with the new UX (#2054).
- Introduced a Python frontend/ingress node for easier deployment integration (#1912).
- Added a convenience script to uninstall Dynamo Deploy CRDs (#1933).
Kubernetes Deployment UX
- Enhanced Helm chart flexibility:
  - Added ability to override any podSpec property (#2116).
  - Enabled Helm upgrade via deploy script for smoother iteration (#1936).
  - Added Grove scheduling support to the graph Helm chart (#1954).
- Introduced Kubernetes deployment examples for vLLM, SGLang, and TRT-LLM (#2062, #2133).
- New Hello World Kubernetes deployment example (#1854).
Examples & Docs Overhaul
- Hello World Python binding example (#2083).
- Documentation updated for UX (#2070), reorganized example READMEs (#2174), and refactored core README structure (#2141).

Deployment, Kubernetes, and CLI

Helm and Graph Deployments
- Liveness/readiness probes in graph Helm chart (#1888)
- Added ability to override any podSpec property (#2116)
- Support for Grove scheduling in Helm (#1954)
Planner and Profiling
- Deploy SLA profiler and SLA planner to Kubernetes (#2030, #2135)

Performance and Observability

Structured Logging Improvements
- Enhanced structured JSONL logs with span start/close events, trace ID and span ID injection, durations formatted in microseconds, and improved context capture for distributed tracing workflows (#2061).
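
As a rough illustration of what span-based JSONL logging looks like, the following sketch emits one JSON line when a span starts and one when it closes. The field names (`event`, `span_name`, `trace_id`, `span_id`, `duration_us`) and the `span` helper are assumptions made for this example, not Dynamo's actual log schema. The ID lengths follow the W3C Trace Context sizes.

```python
import io
import json
import secrets
import time
from contextlib import contextmanager


@contextmanager
def span(name, out, trace_id=None):
    """Write a span_start line on entry and a span_close line on exit."""
    # Hypothetical schema: 32-hex trace ID, 16-hex span ID.
    trace_id = trace_id or secrets.token_hex(16)
    base = {"span_name": name, "trace_id": trace_id, "span_id": secrets.token_hex(8)}
    start = time.perf_counter()
    out.write(json.dumps({**base, "event": "span_start"}) + "\n")
    try:
        yield base
    finally:
        # Report the elapsed time as an integer number of microseconds.
        duration_us = int((time.perf_counter() - start) * 1_000_000)
        closing = {**base, "event": "span_close", "duration_us": duration_us}
        out.write(json.dumps(closing) + "\n")


def run_demo():
    """Log a request span with one nested span and return the parsed lines."""
    buf = io.StringIO()
    with span("handle_request", buf) as ctx:
        # The nested span reuses the parent's trace ID so the lines correlate.
        with span("tokenize", buf, trace_id=ctx["trace_id"]):
            pass
    return [json.loads(line) for line in buf.getvalue().splitlines()]
```

Because every line carries the trace ID, a log processor can group all events of one request even when they are interleaved with other requests.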
Tokenizer & Runtime
- Detokenization performance improved by ~50% (#1868)
- Runtime now uses all available parallelism (#1858)
Metrics
- Hierarchical Prometheus metrics registry (#2008)
- Generic ingress handler metrics (#2090)
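
One way to picture a hierarchical metrics registry is as a tree in which each level contributes labels that its children inherit. The sketch below is a plain-Python illustration under that assumption. `MetricsRegistry` and the label names `namespace`, `component`, and `endpoint` are hypothetical and do not reflect Dynamo's actual API. The output lines use the Prometheus text exposition style.

```python
class MetricsRegistry:
    """A node in a registry tree; children inherit their parents' labels."""

    def __init__(self, parent=None, **labels):
        self.parent = parent
        self.labels = labels
        self.children = []
        self.counters = {}

    def child(self, **labels):
        # Create a nested registry that adds labels on top of this one's.
        node = MetricsRegistry(parent=self, **labels)
        self.children.append(node)
        return node

    def full_labels(self):
        inherited = self.parent.full_labels() if self.parent else {}
        return {**inherited, **self.labels}

    def inc(self, name, amount=1):
        self.counters[name] = self.counters.get(name, 0) + amount

    def render(self):
        # Emit this node's counters, then recurse into the children.
        pairs = sorted(self.full_labels().items())
        label_text = ",".join(f'{k}="{v}"' for k, v in pairs)
        lines = [f"{name}{{{label_text}}} {value}"
                 for name, value in sorted(self.counters.items())]
        for node in self.children:
            lines.extend(node.render())
        return lines


def build_demo():
    namespace = MetricsRegistry(namespace="demo")
    component = namespace.child(component="backend")
    endpoint = component.child(endpoint="generate")
    endpoint.inc("requests_total")
    endpoint.inc("requests_total")
    component.inc("restarts_total")
    return namespace.render()
```

With this layout, code at the endpoint level only names the metric, and the enclosing levels supply the labels that identify where it came from.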

Bug Fixes

Fixed GPU resource specifications in LLM deployments (#1812)
Corrected vLLM, SGLang, and TRT-LLM deployment issues, including container builds, runtime packaging, and Helm chart updates (#1942, #2062, #1825)
Resolved port conflicts, made port assignments deterministic, and improved health checks (#1937, #1996)
Improved error handling for empty message lists and invalid configurations (#2067, #2071)
Fixed nil pointer dereference issues in the Dynamo controller (#2299, #2335)
Locked dependencies to avoid breaking changes (e.g., Triton 3.4.0 with TRT-LLM 1.0.0) (#2233)

Documentation

Guides and Examples
- New hello world Python binding example (#2083)
- Added multinode, disaggregated, and Grove deployment guides (#2155, #2086)
- Added AKS/EKS deployment guides (#2080)
Docs Restructuring
- Updated for new Python UX (#2070)
- Refactored README and reorganized examples (#2141, #2174)

Build, CI, and Test

Added support for SGLang runtime image builds (#1770)
Optional TRT-LLM dependency and custom build support (#2113)
New end-to-end router tests with mockers (#2073)
Fixed vLLM builds for Blackwell GPUs (#2020)

Release Assets

Python Wheels:

Rust Crates:

Containers:

Helm Charts:

Open Issues

The x86 TRT-LLM container image is not compatible with B200 out of the box. The dev container still works for B200/GB200.

Contributors

We welcome new contributors in this release:
@umang-kedia-hpe, @Ethan-ES, @messiaen, @galletas1712, @mc-nv, @zaristei, @jhaotingc, @saurabh-nvidia.

For the full list of changes, see the changelog.
