We are thrilled to announce that NeMo RL is now officially open source! We welcome the community to use it, contribute to it, and help shape the future of reinforcement learning.
Highlights

DeepScaleR Reproducer in NeMo RL

This release features a reproducer for the DeepScaleR work by Agentica AI, where a 1.5B-parameter model surpassed O1-Preview on the AIME benchmark (Pass@1). Our implementation replicates this by iteratively scaling DeepSeek's GRPO algorithm from 8K → 16K → 24K context lengths.
You can start the first stage of training (8K context window) using the following command:
uv run examples/run_grpo_math.py --config=examples/configs/grpo-deepscaler-1.5b-8K.yaml
For the complete 3-stage iterative training instructions and more details, please see our GRPO on DeepScaleR guide.
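If you want a preview of the later stages, here is a rough sketch. The 16K and 24K config names are assumed to follow the 8K config's naming pattern, and the checkpoint hand-off between stages is shown with an assumed policy.model_name override and placeholder paths; follow the GRPO on DeepScaleR guide for the exact procedure.

# Sketch only: config names assumed from the 8K naming pattern; the checkpoint
# override key and paths are placeholders -- consult the guide for the exact steps.
uv run examples/run_grpo_math.py \
    --config=examples/configs/grpo-deepscaler-1.5b-16K.yaml \
    policy.model_name=<path-to-stage-1-checkpoint>
uv run examples/run_grpo_math.py \
    --config=examples/configs/grpo-deepscaler-1.5b-24K.yaml \
    policy.model_name=<path-to-stage-2-checkpoint>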
OpenMathInstruct-2 SFT in NeMo RL

This release includes a Supervised Fine-Tuning (SFT) recipe that follows the OpenMathInstruct-2 paper. Using this recipe, training a Llama-3.1-8B model on the train_1M split of the nvidia/OpenMathInstruct-2 dataset achieves a score of 0.5020 on the MATH-500 benchmark, matching the reference implementation in NeMo-Skills.
You can run the OpenMathInstruct-2 recipe using the following command:
uv run examples/run_sft.py --config=examples/configs/sft_openmathinstruct2.yaml
For more details on dataset splits, training times, and evaluation, please see our SFT on OpenMathInstruct-2 guide.
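If run_sft.py follows the same key=value override convention shown for run_grpo_math.py later in these notes, you can tweak the recipe without editing the YAML. The keys in this sketch are illustrative only; verify them against sft_openmathinstruct2.yaml before relying on them.

# Illustrative overrides only -- confirm key names in sft_openmathinstruct2.yaml.
uv run examples/run_sft.py \
    --config=examples/configs/sft_openmathinstruct2.yaml \
    cluster.gpus_per_node=8 \
    logger.wandb_enabled=True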
Faster GRPO with Dynamic Batching

GRPO end-to-end (E2E) performance has been significantly improved with the introduction of dynamic batching. This feature optimizes GPU utilization by sorting variable-length responses by sequence length and bucketing them into microbatches whose total token counts are close to train_mb_tokens for the training stage and logprob_mb_tokens for the logprob stage. For example, with max_total_sequence_length=4096 and train_micro_batch_size=4, the default expression below resolves to train_mb_tokens = 4096 × 4 = 16384 tokens per training microbatch.

Important: Dynamic batching requires dtensor to be enabled.

You can enable dynamic batching and dtensor in your YAML configuration like so:
policy:
  # Enable DTensor (required for dynamic batching)
  dtensor_cfg:
    enabled: True
    # Other dtensor settings like tensor_parallel_size, sequence_parallel, etc.
    # tensor_parallel_size: 1
    # sequence_parallel: False
    # activation_checkpointing: True

  # Dynamic batching settings
  dynamic_batching:
    enabled: True
    # Target number of tokens for training microbatches
    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
    # Target number of tokens for logprob microbatches
    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
    # Round sequence lengths to the nearest multiple of this value for bucketing
    sequence_length_round: 64

  # Other policy settings like max_total_sequence_length, train_micro_batch_size, etc.
  # max_total_sequence_length: 4096
  # train_micro_batch_size: 4
  # logprob_batch_size: 8
Alternatively, you can enable these features and configure them via command-line overrides when running a script (e.g., run_grpo_math.py):
uv run examples/run_grpo_math.py \
    --config=<your_base_config.yaml> \
    policy.dtensor_cfg.enabled=True \
    policy.dynamic_batching.enabled=True
    # Optionally override other dynamic batching or dtensor parameters:
    # policy.dynamic_batching.train_mb_tokens=16384 \
    # policy.dynamic_batching.logprob_mb_tokens=32768 \
    # policy.dtensor_cfg.tensor_parallel_size=2
Make sure to adjust train_mb_tokens, logprob_mb_tokens, and other parameters according to your sequence length and batch size configuration. For instance, if you raise max_total_sequence_length to 8192 and want roughly two full-length sequences per training microbatch, set policy.dynamic_batching.train_mb_tokens=16384.
NeMo RL enables users to leverage powerful open models from families such as Qwen, Llama, and Gemma for reinforcement learning. For this v0.2.1 release, we've enhanced support, particularly for Gemma3 models, addressing their unique characteristics like tied weights across all model sizes (which require special handling for tensor parallelism) and specific vLLM initialization needs. NeMo RL automatically handles these model quirks to ensure seamless training and inference. For more details on this, please see our Model Quirks guide.
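As a quick sanity check outside of NeMo RL, you can inspect a checkpoint's Hugging Face config to see whether its input and output embeddings are tied. The model ID below is only an example and assumes access to the gated Gemma3 weights plus a transformers version that supports them.

# Quick check (not part of NeMo RL): does this checkpoint tie its embeddings?
# The model ID is an example; the gated repo requires accepting Google's license.
python -c "from transformers import AutoConfig; \
print(AutoConfig.from_pretrained('google/gemma-3-1b-it').tie_word_embeddings)"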
Bug Fixes

We have published TensorBoard logs for the release runs to give you a head start on what to expect from our recipes. To make these logs easy to view, we've also provided a Google Colab notebook that downloads and serves them.
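If you would rather browse the logs locally than in Colab, a standard TensorBoard invocation works once they are downloaded; the directory below is a placeholder for wherever you put them.

# Serve the downloaded release logs locally (the path is a placeholder).
pip install tensorboard
tensorboard --logdir ./nemo-rl-release-logs --port 6006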
What's Changed

- token_ids for consistency by @ashors1 in #290
- feat: add and log a very rough entropy approximation (342) into r0.2.1 by @ko3n1g in #358
- fix: recipes missing args (365) into r0.2.1 by @ko3n1g in #372
- fix: add missing multi-turn, container information in README (369) into r0.2.1 by @ko3n1g in #376
- fix: Save last checkpoint (368) into r0.2.1 by @ko3n1g in #380
- feat: Handle Gemma3 special cases in code (379) into r0.2.1 by @ko3n1g in #386
- feat: Fixed metric calculation and made all grpo metrics token-level (373) into r0.2.1 by @ko3n1g in #390
- feat: SFT on OpenMathInstruct-2 (360) into r0.2.1 by @ko3n1g in #393
- feat: add aime24 validation set (388) into r0.2.1 by @ko3n1g in #396
- feat: add deepscaler guide (391) into r0.2.1 by @ko3n1g in #397
- feat: dynamic batching for training and log prob stages (274) into r0.2.1 by @ko3n1g in #400
- docs: deepscaler guide on sidebar (401) into r0.2.1 by @ko3n1g in #402

Full Changelog: v0.2.0...v0.2.1