[Roadmap] Supporting Ascend NPU on 2025 H2 · Issue #8004 · sgl-project/sglang · GitHub
SGLang NPU support on 2025 H2
During 2025 H1, we contributed initial NPU support (#3853, #7022), which makes it possible for users to run SGLang on NPU hardware.
Our goal for 2025 H2 is to provide a seamless experience on NPUs. Here is a rough development roadmap:
CI on NPU hardware
User / Developer experience
User experience is also a priority: containers and documentation will be provided soon.
Model support
We will start by supporting the hottest models:
- [July] DeepseekV2 / V3 family
- [July] Qwen3 family
- [July] Qwen3-MoE family
Performance Enhancement
Attention Backend
Parallelism
Quantization
Cache
- [July] A new transfer-engine implementation that supports device-to-device transfer on NPUs (see "[feature] kv transfer support of ascend npu", #7795)
- [November] A new cache pooling system that supports mixed HBM and DRAM pooling, coherent memory access, and direct copy from remote L3 cache to L1 cache on NPUs
- [October] An optimized bucketing router policy for extremely uneven prompt lengths
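
The roadmap does not describe how the bucketing router policy will work. As a rough illustration of the general idea only, the sketch below groups requests by prompt length and sends each group to its own set of workers, so that very long prompts do not queue in front of short ones. The class `BucketRouter`, its parameters, and the worker names are all hypothetical and are not part of SGLang.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class BucketRouter:
    """Hypothetical length-bucketed router; not SGLang's actual policy."""

    # Ascending token-count boundaries. Bucket i holds prompts with
    # boundaries[i-1] <= length < boundaries[i].
    boundaries: list[int]
    # One worker group per bucket, so len(boundaries) + 1 groups in total.
    workers_per_bucket: list[list[str]]
    # Per-bucket round-robin cursor, filled in after construction.
    _next: list[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        if len(self.workers_per_bucket) != len(self.boundaries) + 1:
            raise ValueError("need exactly len(boundaries) + 1 worker groups")
        self._next = [0] * len(self.workers_per_bucket)

    def bucket_of(self, prompt_len: int) -> int:
        # Binary search for the bucket that contains this prompt length.
        return bisect_right(self.boundaries, prompt_len)

    def route(self, prompt_len: int) -> str:
        # Round-robin among the workers assigned to the prompt's bucket.
        b = self.bucket_of(prompt_len)
        workers = self.workers_per_bucket[b]
        worker = workers[self._next[b] % len(workers)]
        self._next[b] += 1
        return worker


router = BucketRouter(
    boundaries=[512, 4096],
    workers_per_bucket=[["short-0", "short-1"], ["mid-0"], ["long-0"]],
)
print(router.route(100), router.route(5000))
```

A real policy would likely also weigh current load and cache locality; fixed boundaries with round-robin are used here only to keep the sketch small.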
Support Graph Mode
EPLB
- [October] Support Expert Distribution Recorder on NPUs
- [October] Support Async loading of experts' weights
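
For readers unfamiliar with what an expert distribution recorder collects, here is a minimal, framework-free sketch: it counts how many tokens the MoE router sends to each expert and reports a simple imbalance ratio that a load balancer could act on. The class `ExpertLoadRecorder` and its methods are hypothetical illustrations, not SGLang's recorder API.

```python
from collections import Counter


class ExpertLoadRecorder:
    """Hypothetical recorder that counts tokens routed to each expert."""

    def __init__(self, num_experts: int) -> None:
        self.num_experts = num_experts
        self.counts: Counter = Counter()

    def record(self, topk_expert_ids) -> None:
        # topk_expert_ids: one list of selected expert ids per token.
        for ids in topk_expert_ids:
            self.counts.update(ids)

    def imbalance(self) -> float:
        """Return max load divided by mean load; 1.0 means perfectly balanced."""
        total = sum(self.counts.values())
        if total == 0:
            return 1.0
        mean = total / self.num_experts
        return max(self.counts.values()) / mean


recorder = ExpertLoadRecorder(num_experts=4)
# Three tokens, each routed to its top-2 experts; expert 0 is overloaded.
recorder.record([[0, 1], [0, 2], [0, 3]])
print(recorder.counts, recorder.imbalance())
```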
Speculative Decoding
- [August] Support DeepSeek-R1's MTP
Community
The #npu-support channel is being actively built up on the SGLang Slack.