Configuration Guide | ROLL

Pipeline Config

Refer to RLVR Pipeline Start and Agentic Pipeline Start for more details about RLVR/Agentic pipeline configurations and reward settings.

rollout_batch_size: 64
prompt_length: 2048
response_length: 4096
num_return_sequences_in_group: 8
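With these values, each rollout produces rollout_batch_size × num_return_sequences_in_group training samples. A quick sanity check in Python (a sketch of the arithmetic, not ROLL code):

```python
# Sketch: sample count per rollout step, using the pipeline config values above.
rollout_batch_size = 64            # prompts sampled per rollout
num_return_sequences_in_group = 8  # responses generated per prompt

samples_per_rollout = rollout_batch_size * num_return_sequences_in_group
print(samples_per_rollout)  # 512 samples per rollout
```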
Worker Config

ActorTrain/ActorInfer/Critic/Reference
actor_train:
  model_args:
    dtype: bf16
    disable_gradient_checkpointing: False
    ...
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 32
    warmup_steps: 20
    ...
  data_args:
    template: native
    file_name: xxx/train.json
    prompt: instruction
  strategy_args:
    strategy_name: megatron_train
    strategy_config:
      tensor_model_parallel_size: 1
      pipeline_model_parallel_size: 1
      expert_model_parallel_size: 1
  infer_batch_size: 4
  device_mapping: list(range(0,16))
actor_infer:
  model_args:
    ...
  generating_args:
    max_new_tokens: ${response_length}
    temperature: 0.99
    ...
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.6
      block_size: 16
      max_model_len: 8000
  num_gpus_per_worker: 1
  device_mapping: list(range(0,16))
reference:
  model_args:
    ...
  strategy_args:
    strategy_name: megatron_infer
    strategy_config:
      tensor_model_parallel_size: 1
  device_mapping: list(range(0,16))
Model Arguments (model_args)

Data Arguments (data_args)

Configure data_args under actor_train.

Generating Arguments (generating_args)

Configure generating_args under actor_infer.

Strategy Arguments (strategy_args)

Commonly used strategy configs cover Megatron, vLLM, SGLang, and DeepSpeed.

Ready-made DeepSpeed configurations are provided in ./examples/config/ and can be pulled into your config via the defaults list to set the strategy configuration.

For example, to use the deepspeed_zero2 strategy, add the following to your config:

defaults:
  - ../config/envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

actor_train:
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero2}
Training Arguments (training_args)

Used for configuring training parameters such as learning_rate, weight_decay, warmup_steps, etc.

In DeepSpeed training, the global train batch size is per_device_train_batch_size * gradient_accumulation_steps * world_size, where world_size is the length of device_mapping for actor_train/critic.
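Using the example worker config values (per_device_train_batch_size: 1, gradient_accumulation_steps: 32, and a 16-GPU device_mapping), the formula works out as follows; a sketch of the arithmetic, not ROLL code:

```python
# Sketch: DeepSpeed global train batch size from the example worker config.
per_device_train_batch_size = 1
gradient_accumulation_steps = 32
world_size = len(list(range(0, 16)))  # length of device_mapping for actor_train

global_train_batch_size = (per_device_train_batch_size
                           * gradient_accumulation_steps
                           * world_size)
print(global_train_batch_size)  # 512
```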

In Megatron training, the global train batch size is per_device_train_batch_size * gradient_accumulation_steps * world_size / tensor_model_parallel_size / pipeline_model_parallel_size / context_parallel_size (there is no division by expert_model_parallel_size).

If you want to perform exactly one optimization step per rollout, set gradient_accumulation_steps to rollout_batch_size * num_return_sequences_in_group * tensor_model_parallel_size * pipeline_model_parallel_size * context_parallel_size / per_device_train_batch_size / world_size.
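Plugging in the example values above (all Megatron parallel sizes equal to 1), this recovers the gradient_accumulation_steps: 32 used in the worker config; a sketch of the arithmetic, not ROLL code:

```python
# Sketch: solve for gradient_accumulation_steps so each rollout is one optimizer step.
rollout_batch_size = 64
num_return_sequences_in_group = 8
tensor_model_parallel_size = 1
pipeline_model_parallel_size = 1
context_parallel_size = 1
per_device_train_batch_size = 1
world_size = 16  # length of device_mapping

gradient_accumulation_steps = (rollout_batch_size
                               * num_return_sequences_in_group
                               * tensor_model_parallel_size
                               * pipeline_model_parallel_size
                               * context_parallel_size
                               // per_device_train_batch_size
                               // world_size)
print(gradient_accumulation_steps)  # 32, matching the worker config above
```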

