Re-structured the OpenAI-compatible server to support production and enterprise environments. Key improvements include:
Consistent metrics and logging for better observability and debugging.
Unified error handling, request validation, and processing logic for improved reliability and maintainability.
Improved request tracking across sessions and components.
Fixed bugs in embedding requests and reasoning parsers.
This work was a collaborative effort involving engineers from academic and industry institutions. Special thanks to the Oracle Cloud team and the SGLang team and community — including @slin1237, @CatherineSue, @key4ng, @JustinTong0323, @jhinpan, @yhyang201, @woodx9 and @whybeyoung — for their invaluable contributions.
DeepSeek R1 FP4 on Blackwell GPU
Added support for DeepSeek R1 with FP4 and MTP on NVIDIA Blackwell GPU.
Integrated FlashInfer NVFP4 MoE, supporting TP, EP, and DP.
Supported 2-stream shared expert execution.
Achieved up to 90 tokens per second (TPS) per user at isl/osl/bs = 1k/1k/16 (input sequence length / output sequence length / batch size) on B200.
Further optimization in progress. Special thanks to the FlashInfer, NVIDIA Enterprise Products, Novita AI, DataCrunch, Google Cloud, and SGLang teams — especially @Alcanderian and @pyc96 — for their critical contributions.
Breaking Change: OpenAI-Compatible API Module Moved
The sglang/srt/openai_api directory has been removed and replaced with sglang/srt/entrypoints/openai.
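Code that has to run against releases on both sides of this move can guard the import instead of hard-coding one path. The following is a minimal sketch, not part of SGLang: the helper name `import_first_available` and the tuple `PROTOCOL_MODULES` are hypothetical, and the version comments assume the move landed in v0.4.8, as the changelog range of this release suggests. Only the two module paths are taken from this note.

```python
import importlib

# Candidate module paths, newest first. Both paths come from this release note;
# the version comments are an assumption based on the v0.4.7...v0.4.8 range.
PROTOCOL_MODULES = (
    "sglang.srt.entrypoints.openai.protocol",  # new location
    "sglang.srt.openai_api.protocol",          # removed location
)


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper: raises ImportError listing every failure if none
    of the candidates can be imported.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )


def load_tool_class():
    """Resolve the Tool class from whichever protocol module is installed."""
    return import_first_available(PROTOCOL_MODULES).Tool
```

Calling `load_tool_class()` resolves `Tool` lazily, so the module containing this sketch can still be imported where SGLang is not installed. Projects that only target the current release do not need the fallback and can use the new path directly.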
Update your imports to the new module path. For example:
- from sglang.srt.openai_api.protocol import Tool
+ from sglang.srt.entrypoints.openai.protocol import Tool
What's Changed
[...] INFO to DEBUG for dp and add force quit for tokenizer manager by @ishandhanani in #7251
[...] _normalize_rid before other normalization in io_struct by @CatherineSue in #7363
[...] openai_api with entrypoints/openai by @CatherineSue in #7351
[...] TokenToKVPoolAllocator by @hnyls2002 in #7414
[...] BaseFormatDetector.parse_streaming_increment by @CatherineSue in #7479
Full Changelog: v0.4.7...v0.4.8