Easier Potential Overall Design · Issue #39 · deepseek-ai/DeepEP · GitHub

For extreme GPU memory saving, we currently use communication queues for NVLink and RDMA buffers. This means tokens cyclically reuse a small buffer: when the queue is full, no new tokens are transmitted, and transfers resume only once the queue has free space.
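To make the queue behaviour concrete, here is a minimal host-side sketch in Python. It is only an illustration of a fixed-size circular buffer with back-pressure, not DeepEP's actual kernel code: the names `TokenQueue`, `transfer`, `num_slots` and `recv_every` are all hypothetical, and the single-threaded loop merely simulates a sender and a slower receiver taking turns.

```python
class TokenQueue:
    """Fixed-size circular buffer (hypothetical sketch, not DeepEP's API)."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.head = 0  # total number of tokens consumed by the receiver
        self.tail = 0  # total number of tokens written by the sender

    def is_full(self):
        return self.tail - self.head == len(self.slots)

    def try_send(self, token):
        # When the queue is full, nothing is transmitted; the sender must retry.
        if self.is_full():
            return False
        self.slots[self.tail % len(self.slots)] = token  # slot is reused cyclically
        self.tail += 1
        return True

    def try_recv(self):
        if self.tail == self.head:  # queue is empty
            return None
        token = self.slots[self.head % len(self.slots)]
        self.head += 1  # frees one slot for the sender
        return token


def transfer(tokens, num_slots, recv_every):
    """Send all tokens; the receiver drains one slot every `recv_every` steps.

    Returns the received tokens and the number of polls in which the sender
    found the queue full and could not make progress.
    """
    queue = TokenQueue(num_slots)
    received, stalled_polls, step, next_idx = [], 0, 0, 0
    while len(received) < len(tokens):
        step += 1
        if next_idx < len(tokens):
            if queue.try_send(tokens[next_idx]):
                next_idx += 1
            else:
                stalled_polls += 1  # sender polled a full queue
        if step % recv_every == 0:
            token = queue.try_recv()
            if token is not None:
                received.append(token)
    return received, stalled_polls


# A small queue saves memory but makes the sender poll; a queue large
# enough to hold every token never stalls.
small = transfer(list(range(8)), num_slots=2, recv_every=3)
large = transfer(list(range(8)), num_slots=8, recv_every=3)
```

In this toy model, shrinking `num_slots` reduces the buffer footprint but increases the number of stalled polls, which is the memory-versus-latency trade-off described above.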

However, this approach has drawbacks: it can potentially cause deadlocks, repeatedly polling the queue introduces latency, and reaching maximum performance requires a complex implementation. You can see this reflected in our internode code, where adding new features comes at a significant cost.

If you're referencing our code but want to design your own implementation, we also suggest a simpler overall design for your consideration:

Overall, this approach might use more GPU memory (the exact amount depends on the specific scenario), but the implementation would be much simpler. You could more easily add new features, and the performance ceiling might be slightly better.

Thanks to @KnowingNothing from ByteDance for discussing and suggesting this approach!
