Collection of scripts demonstrating different optimization and fine-tuning techniques for OpenAI's GPT-OSS models (20B and 120B parameters).
Resources
generate_tp.py - Model with Tensor Parallelism.
generate_flash_attention.py - Model with Flash Attention + Tensor Parallelism.
generate_tp_continuous_batching.py - Model with Flash Attention + Tensor Parallelism and Continuous Batching.
generate_all.py - Model with all optimizations: Expert Parallelism, Tensor Parallelism, Flash Attention.
sft.py - Script for fine-tuning the model using supervised fine-tuning (SFT). Supports both full-parameter training and LoRA training.

All generation scripts support both 20B and 120B models. To switch between model sizes, simply edit the model_path variable at the top of each script:
# Model configuration - uncomment the model size you want to use
model_path = "openai/gpt-oss-120b"  # 120B model (default)
# model_path = "openai/gpt-oss-20b"  # 20B model - uncomment this line and comment the line above
The scripts automatically configure the appropriate device mapping and settings based on the selected model size.
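For reference, the core loading pattern looks roughly like the sketch below. This is a minimal illustration, not the exact contents of the generate_*.py scripts; the prompt and generation settings are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "openai/gpt-oss-120b"  # switch to "openai/gpt-oss-20b" for the smaller model

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # let Accelerate spread layers across the available GPUs
)

inputs = tokenizer("Explain tensor parallelism in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))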
First, create a virtual environment using e.g. uv:
uv venv gpt-oss --python 3.11 && source gpt-oss/bin/activate && uv pip install --upgrade pip
Next install PyTorch and Triton kernels:
uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
If your hardware supports the MXFP4 quantization format, you can also install Triton kernels for optimized performance:
uv pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
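Whether MXFP4 is usable depends on the GPU architecture. The snippet below is a quick check under the assumption that the MXFP4 Triton kernels target Hopper-class or newer GPUs (CUDA compute capability 9.0+); verify the exact requirement against the Triton kernels documentation.

import torch

# Assumption: MXFP4 kernels require Hopper-class or newer hardware (compute capability >= 9.0).
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: {major}.{minor}")
if major >= 9:
    print("MXFP4 Triton kernels should be usable on this GPU.")
else:
    print("MXFP4 likely unsupported; expect a fallback to a dequantized path.")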
Finally, install the remaining dependencies:
uv pip install -r requirements.txt
Important
Before running any script, edit the model_path variable to select your desired model size (20B or 120B).
Run a generation script:
python generate_<script_name>.py
or, for distributed inference (where x is the number of GPUs per node):
torchrun --nproc_per_node=x generate_<script_name>.py
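Under torchrun, the script shards the model across the launched processes. Below is a minimal tensor-parallel sketch, assuming the scripts rely on the transformers tp_plan="auto" path; it is an illustration, not the repo's exact code, and the prompt is a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",  # shard the weights across the processes started by torchrun
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))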
For full-parameter training on a single node with 8 GPUs, run:
# Eager attention
accelerate launch --config_file configs/zero3.yaml sft.py --config configs/sft_full.yaml

# FlashAttention3
accelerate launch --config_file configs/zero3.yaml sft.py --config configs/sft_full.yaml --attn_implementation kernels-community/vllm-flash-attn3
For LoRA training on one GPU, run:
python sft.py --config configs/sft_lora.yaml
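Conceptually, the LoRA path boils down to something like the sketch below (assumptions: sft.py is built on TRL's SFTTrainer; the dataset, LoRA hyperparameters, and output directory are placeholders rather than the values in configs/sft_lora.yaml).

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gpt-oss-20b-lora",  # placeholder output path
        per_device_train_batch_size=1,
        gradient_checkpointing=True,
        num_train_epochs=1,
    ),
)
trainer.train()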
To change the dataset or training hyperparameters, either modify the sft_lora.yaml or sft_full.yaml files, or pass them as command-line arguments, e.g.:
accelerate launch --config_file configs/zero3.yaml \
    sft.py --config configs/sft_full.yaml \
    --dataset_name DATASET_NAME