Day 0 support for the OpenAI gpt-oss models in SGLang is here! 🎉 It's the result of a collaborative effort across Eigen AI, AMD, NVIDIA, SGLang, and the broader open-source community!
Installation
Docker
# hopper
docker pull lmsysorg/sglang:v0.5.0rc2-cu126
# blackwell cu128
docker pull lmsysorg/sglang:v0.5.0rc2-cu128-b200
# blackwell cu129
docker pull lmsysorg/sglang:b200-cu129
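A minimal way to launch a server from one of these images is sketched below; the port mapping, shared-memory/IPC settings, and HuggingFace cache mount are assumptions and may need adjusting for your environment.
# example: run the Hopper image and serve gpt-oss-20b (adjust image tag, ports, and mounts as needed)
docker run --gpus all --ipc=host -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  lmsysorg/sglang:v0.5.0rc2-cu126 \
  python3 -m sglang.launch_server --model openai/gpt-oss-20b --host 0.0.0.0 --port 30000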
Build from source
# build from source
git clone https://github.com/sgl-project/sglang
cd sglang
pip3 install pip --upgrade
pip3 install -e "python[all]"
# ROCm 6.3
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
git clone https://github.com/triton-lang/triton
cd triton/python/triton_kernels
pip3 install .
# hopper
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip3 install sgl-kernel==0.3.5 --force-reinstall
# blackwell cu128
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip3 install https://github.com/sgl-project/whl/releases/download/v0.3.5/sgl_kernel-0.3.5+cu128-cp310-abi3-manylinux2014_x86_64.whl --force-reinstall
# blackwell cu129
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
pip3 install https://github.com/sgl-project/whl/releases/download/v0.3.5/sgl_kernel-0.3.5+cu129-cp310-abi3-manylinux2014_x86_64.whl --force-reinstall
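After installing, a quick sanity check can confirm the expected torch/CUDA build is in place and that the kernel package imports cleanly; these one-liners are a suggested check, with the module names sgl_kernel and sglang assumed to match the installed packages.
# verify the torch build and its CUDA version
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
# verify that sgl-kernel and sglang import without errors
python3 -c "import sgl_kernel, sglang; print(sglang.__version__)"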
Launch commands
MXFP4
# 20b mxfp4 tp 1
python3 -m sglang.launch_server --model openai/gpt-oss-20b
# 120b mxfp4 tp 2
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 2
FP8/BF16
# 20b fp8/bf16 tp 1
python3 -m sglang.launch_server --model lmsys/gpt-oss-20b-bf16
# 120b fp8/bf16 tp 4
python3 -m sglang.launch_server --model lmsys/gpt-oss-120b-bf16 --tp 4
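Once any of the servers above is running (default port 30000), you can smoke-test it through the OpenAI-compatible Chat Completions endpoint; substitute the model name you launched with.
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello, what can you do?"}], "max_tokens": 64}'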
AMD/ROCm
Early access docker image [MI308x, MI300x]: henryx/haisgl:sgl-v0.4.10.post2-vllm-v0.9.2-rocm630-mi30x-gpt-oss-0806
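A typical way to start that ROCm container is sketched below; the device mappings, group, and shm settings are the usual ROCm container flags rather than anything specific to this image, and the model volume mount is an assumption matching the paths used in the eval commands.
# example: run the early access ROCm image on MI300x/MI308x
docker run -it --ipc=host --shm-size 16G \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  --security-opt seccomp=unconfined \
  -v /data/models:/data/models \
  henryx/haisgl:sgl-v0.4.10.post2-vllm-v0.9.2-rocm630-mi30x-gpt-oss-0806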
lm_eval lmsys/gpt-oss-20b-bf16
with TP 1:
/sgl-workspace/sglang# SGLANG_USE_AITER=0 python3 -m sglang.launch_server --model /data/models/gpt-oss-20b-bf16 --attention-backend triton
/sgl-workspace/sglang# lm_eval --model local-chat-completions --model_args model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks gsm8k --batch_size 1024 --apply_chat_template --num_fewshot 1
... ... ... ...
local-chat-completions (model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048), gen_kwargs: (None), limit: None, num_fewshot: 1, batch_size: 1024
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 1|exact_match|↑ |0.8370|± |0.0102|
| | |strict-match | 1|exact_match|↑ |0.0273|± |0.0045|
lm_eval lmsys/gpt-oss-120b-bf16
with TP 4:
/sgl-workspace/sglang# SGLANG_USE_AITER=0 python3 -m sglang.launch_server --model /data/models/gpt-oss-120b-bf16 --attention-backend triton --tp 4
/sgl-workspace/sglang# lm_eval --model local-chat-completions --model_args model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks gsm8k --batch_size 1024 --apply_chat_template --num_fewshot 1
... ... ... ...
local-chat-completions (model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048), gen_kwargs: (None), limit: None, num_fewshot: 1, batch_size: 1024
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 1|exact_match|↑ |0.8923|± |0.0085|
| | |strict-match | 1|exact_match|↑ |0.0857|± |0.0077|
MI300x [untuned] lmsys/gpt-oss-120b-bf16
with TP 4:
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| | max_concurrency | input_throughput | output_throughput | mean_ttft_ms | median_ttft_ms | p99_ttft_ms | mean_tpot_ms | median_tpot_ms | p99_tpot_ms | per_user_throughput |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
| 0 | 1.000 | 83.347 | 166.694 | 81.911 | 70.766 | 124.400 | 5.923 | 5.923 | 5.925 | 166.694 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 1 | 4.000 | 219.549 | 439.098 | 948.246 | 1109.688 | 1229.610 | 8.189 | 8.197 | 8.198 | 109.775 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 2 | 16.000 | 640.980 | 1281.960 | 661.724 | 986.559 | 1057.074 | 11.843 | 11.809 | 12.929 | 80.123 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 3 | 32.000 | 968.566 | 1937.131 | 1115.647 | 1167.554 | 1954.841 | 15.440 | 15.680 | 16.718 | 60.535 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
MI300x [untuned] lmsys/gpt-oss-20b-bf16
with TP 1:
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|    | max_concurrency   | input_throughput   | output_throughput   | mean_ttft_ms   | median_ttft_ms   | p99_ttft_ms   | mean_tpot_ms   | median_tpot_ms   | p99_tpot_ms   | per_user_throughput   |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
| 1 | 1.000 | 96.461 | 192.921 | 182.145 | 47.580 | 612.522 | 5.009 | 5.014 | 5.020 | 192.921 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 2 | 4.000 | 219.506 | 439.013 | 632.336 | 662.481 | 1165.204 | 8.500 | 8.647 | 8.729 | 109.753 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 3 | 16.000 | 679.230 | 1358.460 | 829.124 | 751.873 | 1132.749 | 10.976 | 10.839 | 11.914 | 84.904 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 4 | 32.000 | 1140.442 | 2280.883 | 700.543 | 803.830 | 912.796 | 13.354 | 13.399 | 14.152 | 71.278 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
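The tables above appear to come from a serving benchmark sweep; the command below is a sketch of how one row might be reproduced with sglang.bench_serving against a running server, where the dataset choice, sequence lengths, and prompt count are assumptions rather than the exact settings used above.
# example: benchmark the running server at a fixed concurrency
python3 -m sglang.bench_serving --backend sglang --host 127.0.0.1 --port 30000 \
  --dataset-name random --random-input-len 1024 --random-output-len 1024 \
  --num-prompts 128 --max-concurrency 16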
Further plan