Showing content from https://github.com/sgl-project/sglang/issues/9104 below:

[Feature] Support AWQ quantization on NPU (Issue #9104, sgl-project/sglang)

Checklist

1. If the issue you raised is not a feature request but a question, please start a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
2. Please use English, otherwise it will be closed.

Motivation

AWQ is a high-performance 4-bit weight quantization method that offers an excellent trade-off between efficiency and accuracy. Enabling AWQ quantization on the NPU backend in SGLang would allow an 8-card NPU setup to run the DeepSeek 671B model. This feature follows the NPU roadmap (#8004).

Proposal

Add AWQ quantization format support for the Ascend NPU backend.
Use MM kernels for MLP layers and GMM kernels for MoE layers to fully utilize NPU performance.
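
To make the proposal concrete, here is a minimal sketch in plain Python of what 4-bit, group-wise weight handling involves. It is not SGLang or Ascend kernel code. It assumes that MM and GMM mean matrix multiplication and grouped matrix multiplication, that eight 4-bit values are packed into one 32-bit word lowest nibble first (real AWQ checkpoints may use a different nibble order), and that each group of weights shares one scale and one zero point. All function names and the tiny `GROUP_SIZE` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: plain Python, not SGLang or Ascend kernel code.
GROUP_SIZE = 4  # hypothetical toy value; real checkpoints typically use larger groups


def pack_int4(values):
    """Pack eight unsigned 4-bit values into one 32-bit word, lowest nibble first."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("value out of 4-bit range")
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Inverse of pack_int4: extract eight 4-bit values from one word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize_row(words, scales, zeros, group_size=GROUP_SIZE):
    """Recover float weights as (q - zero) * scale, one scale/zero per group."""
    q = [v for word in words for v in unpack_int4(word)]
    return [
        (v - zeros[i // group_size]) * scales[i // group_size]
        for i, v in enumerate(q)
    ]


def matvec(rows, x):
    """Dense matrix-vector product, standing in for an MM kernel (MLP layers)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def grouped_matvec(expert_rows, tokens, expert_ids):
    """Multiply each token by its routed expert's weights, standing in for GMM (MoE layers)."""
    return [matvec(expert_rows[e], x) for x, e in zip(tokens, expert_ids)]
```

For example, `dequantize_row([pack_int4([0, 1, 2, 3, 4, 5, 6, 7])], scales=[0.5, 2.0], zeros=[1, 4])` unpacks one word into two groups of four weights and rescales each group with its own scale and zero point. A real backend would fuse the dequantization into the MM or GMM kernel rather than materializing float weights first.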

Related resources

No response
