Day 0 support for the OpenAI gpt-oss models in SGLang is here! 🎉 It's the result of a collaborative effort across Eigen AI, AMD, NVIDIA, SGLang, and the broader open-source community!
Installation
Docker
# hopper
docker pull lmsysorg/sglang:v0.5.0rc2-cu126
# blackwell cu128
docker pull lmsysorg/sglang:v0.5.0rc2-cu128-b200
# blackwell cu129
docker pull lmsysorg/sglang:b200-cu129
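A minimal way to launch a server from one of these images is sketched below; the port mapping, shared-memory/IPC settings, and HuggingFace cache mount are assumptions and may need adjusting for your environment.
# example: run the Hopper image and serve gpt-oss-20b (adjust image tag, ports, and mounts as needed)
docker run --gpus all --ipc=host -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  lmsysorg/sglang:v0.5.0rc2-cu126 \
  python3 -m sglang.launch_server --model openai/gpt-oss-20b --host 0.0.0.0 --port 30000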
Build from source
# build from source
git clone https://github.com/sgl-project/sglang
cd sglang
pip3 install pip --upgrade
pip3 install -e "python[all]"
# ROCm 6.3
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
git clone https://github.com/triton-lang/triton
cd triton/python/triton_kernels
pip3 install .
# hopper
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip3 install sgl-kernel==0.3.5 --force-reinstall
# blackwell cu128
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip3 install https://github.com/sgl-project/whl/releases/download/v0.3.5/sgl_kernel-0.3.5+cu128-cp310-abi3-manylinux2014_x86_64.whl --force-reinstall
# blackwell cu129
pip3 install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
pip3 install https://github.com/sgl-project/whl/releases/download/v0.3.5/sgl_kernel-0.3.5+cu129-cp310-abi3-manylinux2014_x86_64.whl --force-reinstall
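After installing, a quick sanity check can confirm the expected torch/CUDA build is in place and that the kernel package imports cleanly; these one-liners are a suggested check, with the module names sgl_kernel and sglang assumed to match the installed packages.
# verify the torch build and its CUDA version
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
# verify that sgl-kernel and sglang import without errors
python3 -c "import sgl_kernel, sglang; print(sglang.__version__)"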
Launch commands
MXFP4
# 20b mxfp4 tp 1
python3 -m sglang.launch_server --model openai/gpt-oss-20b
# 120b mxfp4 tp 2
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 2
FP8/BF16
# 20b fp8/bf16 tp 1
python3 -m sglang.launch_server --model lmsys/gpt-oss-20b-bf16
# 120b fp8/bf16 tp 4
python3 -m sglang.launch_server --model lmsys/gpt-oss-120b-bf16 --tp 4
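Once any of the servers above is running (default port 30000), you can smoke-test it through the OpenAI-compatible Chat Completions endpoint; substitute the model name you launched with.
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello, what can you do?"}], "max_tokens": 64}'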
AMD/ROCm
Early access docker image [MI308x, MI300x]: henryx/haisgl:sgl-v0.4.10.post2-vllm-v0.9.2-rocm630-mi30x-gpt-oss-0806
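A typical way to start that ROCm container is sketched below; the device mappings, group, and shm settings are the usual ROCm container flags rather than anything specific to this image, and the model volume mount is an assumption matching the paths used in the eval commands.
# example: run the early access ROCm image on MI300x/MI308x
docker run -it --ipc=host --shm-size 16G \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  --security-opt seccomp=unconfined \
  -v /data/models:/data/models \
  henryx/haisgl:sgl-v0.4.10.post2-vllm-v0.9.2-rocm630-mi30x-gpt-oss-0806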
lm_eval lmsys/gpt-oss-20b-bf16
with TP 1:
/sgl-workspace/sglang# SGLANG_USE_AITER=0 python3 -m sglang.launch_server --model /data/models/gpt-oss-20b-bf16 --attention-backend triton
/sgl-workspace/sglang# lm_eval --model local-chat-completions --model_args model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks gsm8k --batch_size 1024 --apply_chat_template --num_fewshot 1
... ... ... ...
local-chat-completions (model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048), gen_kwargs: (None), limit: None, num_fewshot: 1, batch_size: 1024
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 1|exact_match|↑ |0.8370|± |0.0102|
| | |strict-match | 1|exact_match|↑ |0.0273|± |0.0045|
lm_eval lmsys/gpt-oss-120b-bf16
with TP 4:
/sgl-workspace/sglang# SGLANG_USE_AITER=0 python3 -m sglang.launch_server --model /data/models/gpt-oss-120b-bf16 --attention-backend triton --tp 4
/sgl-workspace/sglang# lm_eval --model local-chat-completions --model_args model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks gsm8k --batch_size 1024 --apply_chat_template --num_fewshot 1
... ... ... ...
local-chat-completions (model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048), gen_kwargs: (None), limit: None, num_fewshot: 1, batch_size: 1024
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 1|exact_match|↑ |0.8923|± |0.0085|
| | |strict-match | 1|exact_match|↑ |0.0857|± |0.0077|
MI300x [untuned] lmsys/gpt-oss-120b-bf16
with TP 4:
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| | max_concurrency | input_throughput | output_throughput | mean_ttft_ms | median_ttft_ms | p99_ttft_ms | mean_tpot_ms | median_tpot_ms | p99_tpot_ms | per_user_throughput |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
| 0 | 1.000 | 83.347 | 166.694 | 81.911 | 70.766 | 124.400 | 5.923 | 5.923 | 5.925 | 166.694 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 1 | 4.000 | 219.549 | 439.098 | 948.246 | 1109.688 | 1229.610 | 8.189 | 8.197 | 8.198 | 109.775 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 2 | 16.000 | 640.980 | 1281.960 | 661.724 | 986.559 | 1057.074 | 11.843 | 11.809 | 12.929 | 80.123 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 3 | 32.000 | 968.566 | 1937.131 | 1115.647 | 1167.554 | 1954.841 | 15.440 | 15.680 | 16.718 | 60.535 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
MI300x [untuned] lmsys/gpt-oss-20b-bf16
with TP 1:
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
|    | max_concurrency   | input_throughput   | output_throughput   | mean_ttft_ms   | median_ttft_ms   | p99_ttft_ms   | mean_tpot_ms   | median_tpot_ms   | p99_tpot_ms   | per_user_throughput   |
+====+===================+====================+=====================+================+==================+===============+================+==================+===============+=======================+
| 1 | 1.000 | 96.461 | 192.921 | 182.145 | 47.580 | 612.522 | 5.009 | 5.014 | 5.020 | 192.921 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 2 | 4.000 | 219.506 | 439.013 | 632.336 | 662.481 | 1165.204 | 8.500 | 8.647 | 8.729 | 109.753 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 3 | 16.000 | 679.230 | 1358.460 | 829.124 | 751.873 | 1132.749 | 10.976 | 10.839 | 11.914 | 84.904 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
| 4 | 32.000 | 1140.442 | 2280.883 | 700.543 | 803.830 | 912.796 | 13.354 | 13.399 | 14.152 | 71.278 |
+----+-------------------+--------------------+---------------------+----------------+------------------+---------------+----------------+------------------+---------------+-----------------------+
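The tables above appear to come from a serving benchmark sweep; the command below is a sketch of how one row might be reproduced with sglang.bench_serving against a running server, where the dataset choice, sequence lengths, and prompt count are assumptions rather than the exact settings used above.
# example: benchmark the running server at a fixed concurrency
python3 -m sglang.bench_serving --backend sglang --host 127.0.0.1 --port 30000 \
  --dataset-name random --random-input-len 1024 --random-output-len 1024 \
  --num-prompts 128 --max-concurrency 16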
Further plan