After several prerequisite features were merged, we can run SGLang on Ascend servers in eager mode. But to get better performance, we now need to implement ACLGraph/NPUGraph support.
Goals
Goal 1: Define an NPUGraphRunner class for SGLang that provides the basic functionality and supports Llama and Qwen models.
Goal 2: Adapt to TP/DP, GraphTree, and dynamic-shape scenarios, including memory reuse.
Goal 3: Improve performance based on torch.compile.
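For Goal 1, the rough shape of the class might look like the sketch below, modeled on SGLang's existing CUDAGraphRunner: one captured graph per padded batch size, with static buffers that real inputs are copied into before replay. Everything here is illustrative; helper names like `make_dummy_inputs` and the `torch_npu.npu.graph` capture context are assumptions, not the final design.

```python
import torch
import torch_npu


class NPUGraphRunner:
    """Sketch of Goal 1: capture the model forward into NPU graphs,
    one graph per pre-selected batch size, and replay them at decode time."""

    def __init__(self, model_runner, capture_bs):
        self.model_runner = model_runner
        self.capture_bs = capture_bs   # batch sizes to pre-capture, e.g. [1, 2, 4, 8]
        self.graphs = {}               # bs -> torch_npu.npu.NPUGraph
        self.static_inputs = {}        # bs -> pre-allocated input buffers
        self.static_outputs = {}       # bs -> output buffers written by replay

    def can_run(self, batch_size: int) -> bool:
        # Fall back to eager mode when no graph was captured for this size.
        return batch_size in self.graphs

    def capture(self):
        for bs in self.capture_bs:
            inputs = self.model_runner.make_dummy_inputs(bs)  # assumed helper
            graph = torch_npu.npu.NPUGraph()
            with torch_npu.npu.graph(graph):  # assumed analogue of torch.cuda.graph
                outputs = self.model_runner.forward(**inputs)
            self.graphs[bs] = graph
            self.static_inputs[bs] = inputs
            self.static_outputs[bs] = outputs

    def replay(self, bs: int, real_inputs: dict):
        # Copy live data into the captured buffers, then rerun the graph.
        for name, buf in self.static_inputs[bs].items():
            buf.copy_(real_inputs[name])
        self.graphs[bs].replay()
        return self.static_outputs[bs]
```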
Key messages
We have torch_npu.npu.NPUGraph, which has interfaces and functionality similar to torch.cuda.CUDAGraph (sketched below).
Concerning the RTS level, we can refer to this document.
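Given that parity, basic capture and replay should follow the familiar torch.cuda.CUDAGraph pattern. A minimal sketch, assuming torch_npu.npu.graph mirrors the torch.cuda.graph capture context:

```python
import torch
import torch_npu

model = torch.nn.Linear(1024, 1024).npu().eval()
static_input = torch.randn(8, 1024, device="npu")

# Warm up once so one-time initialization is not captured into the graph.
with torch.no_grad():
    model(static_input)

graph = torch_npu.npu.NPUGraph()
with torch_npu.npu.graph(graph):  # assumed analogue of torch.cuda.graph(...)
    static_output = model(static_input)

# Replay with new data: refill the captured input buffer in place,
# then relaunch the whole captured kernel sequence at once.
static_input.copy_(torch.randn(8, 1024, device="npu"))
graph.replay()
```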
Phase 1: Basic support
Implement NPUGraphRunner by referring to CUDAGraphRunner, but we need to handle a special case:
Because we use the torch_npu.npu_fused_infer_attention_score API, which has a host_list input, we have to update its value on every replay using torch_npu.npu.NPUGraph.update (sketched below). For more details, please refer to task update.
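A minimal sketch of that update-then-replay step. Only the method name NPUGraph.update comes from the docs; the payload shape (cpu_update_input carrying the actual sequence lengths) is an assumption about how the host_list values get refreshed:

```python
def replay_with_host_update(graph, seq_lens_host):
    # host_list inputs such as the actual sequence lengths live on the host,
    # so an in-place device-buffer copy cannot refresh them; the captured
    # graph itself must be updated before each replay.
    graph.update(cpu_update_input=[{"actual_seq_lengths": seq_lens_host}])  # assumed kwargs
    graph.replay()
```

In the runner, this would run once per decode step with the current batch's sequence lengths, in place of a plain graph.replay().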
Phase 2: