Democratizing Reinforcement Learning for LLM Agents
DeepSWE Overview
DeepSWE-Preview is a fully open-sourced, state-of-the-art coding agent trained with only reinforcement learning (RL) to excel at software engineering (SWE) tasks. DeepSWE-Preview demonstrates strong reasoning capabilities in navigating complex codebases and viewing/editing multiple files, and it serves as a foundational model for future coding agents. The model achieves an impressive 59.0% on SWE-Bench-Verified, which is currently #1 in the open-weights category.
DeepSWE-Preview is trained on top of Qwen3-32B with thinking mode enabled. With just 200 steps of RL training, its SWE-Bench-Verified score increases by ~20%.
Discover more about DeepSWE-Preview's development and capabilities in our technical blog post.
Figure 1: SWE-Bench-Verified Performance vs. Model Size for LLM Agents. Trained with only reinforcement learning (RL, no SFT), DeepSWE-Preview with test-time scaling (TTS) solves 59% of problems, beating all open-source agents by a large margin. We note that DeepSWE-Preview's Pass@1 performance (42.2%, averaged over 16 runs) is one of the best among open-weights coding agents.
Usage Recommendations
To get the best performance out of DeepSWE-Preview, we suggest using R2E-Gym's tools (file_editor.py, execution_bash.py, search.py, finish.py). See here for more details.

Figure 2: Validation score for SWE-Bench-Hard, where an agent receives a positive reward only if it submits the final answer and passes all tests. With just 200 steps of RL training, the SWE-Bench-Verified score increases from 23% to 42% (+20 points).
Data 🗄️
Our dataset contains 4.5K problems drawn from a subset of R2E-Gym. To avoid data contamination during training, we filtered out problems derived from the same repositories as SWE-Bench-Verified, such as sympy. All problems map to individual Docker images.
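As a rough illustration of this decontamination step, here is a minimal sketch; the problem record format and the overlap set shown are assumptions for illustration, not the actual R2E-Gym schema.

```python
# Minimal sketch of repository-level decontamination (illustrative, not the actual pipeline).
SWE_BENCH_VERIFIED_REPOS = {"sympy/sympy", "django/django"}  # example overlap set

def decontaminate(problems: list[dict]) -> list[dict]:
    """Drop training problems whose source repository also appears in SWE-Bench-Verified."""
    return [p for p in problems if p["repo"] not in SWE_BENCH_VERIFIED_REPOS]
```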
Environment
Our environment wraps around R2E-Gym, an existing Gym environment for scalable curation of high-quality, executable SWE environments.
State & Action. R2E-Gym
defines a set of four tools as part of the action space. The output of each tool (a Python program with stdout/stderr) represents the returned state. More specifically:
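For intuition, here is a minimal sketch of this state/action loop (not the actual R2E-Gym API). llm_choose_action and run_tool are hypothetical placeholders for the policy LLM and the sandboxed tool executor, and the step cap mirrors the 100-step evaluation setting described later.

```python
# Hypothetical sketch of the agent loop: actions are tool calls, states are tool outputs.
TOOLS = ["file_editor.py", "execution_bash.py", "search.py", "finish.py"]

def llm_choose_action(state: str, tools: list[str]) -> tuple[str, dict]:
    raise NotImplementedError("placeholder: query the policy LLM for a tool call")

def run_tool(tool: str, args: dict) -> tuple[str, str]:
    raise NotImplementedError("placeholder: execute the tool inside the problem's Docker sandbox")

def rollout(initial_observation: str, max_steps: int = 100) -> str:
    state = initial_observation                       # problem statement + repository context
    for _ in range(max_steps):
        tool, args = llm_choose_action(state, TOOLS)  # action = a single tool call
        if tool == "finish.py":                       # agent submits its final answer
            break
        stdout, stderr = run_tool(tool, args)
        state = stdout + stderr                       # returned state = the tool's output
    return state
```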
Reward. To keep things simple, our reward function employs a sparse Outcome Reward Model (ORM):

1: The LLM's generated patch passes a selected sample of tests (Pass2Pass and Fail2Pass) within a time limit. To accelerate training, our max time limit is 5 minutes, while the official SWE-Bench evaluation allows 30 minutes.
0: We assign no reward if the LLM's code fails on at least one test case or times out. (A minimal code sketch of this reward is given below.)

RL Algorithm. We enhance the original GRPO algorithm, integrating insights from DAPO, Dr. GRPO, and LOOP/RLOO together with our own innovations, to enable stable training and improved performance. Our final algorithm amalgamates these modifications.
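As an illustration of the sparse ORM reward above (not the actual training code), here is a minimal sketch; the test command and working directory are assumptions about how the selected Pass2Pass/Fail2Pass tests are invoked inside the problem's Docker image.

```python
import subprocess

TIME_LIMIT_S = 5 * 60  # training-time limit; official SWE-Bench evaluation allows 30 minutes

def outcome_reward(test_cmd: list[str], workdir: str) -> float:
    """Sparse ORM: 1.0 only if all selected Pass2Pass/Fail2Pass tests pass within the time limit."""
    try:
        result = subprocess.run(test_cmd, cwd=workdir, timeout=TIME_LIMIT_S, capture_output=True)
    except subprocess.TimeoutExpired:
        return 0.0  # timed out -> no reward
    return 1.0 if result.returncode == 0 else 0.0  # any failing test -> no reward
```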
A more detailed description of the training recipe can be found in our blog post.
Evaluation
DeepSWE-Preview is evaluated via the official R2E-Gym codebase at a 64K max context length and 100 max environment steps. DeepSWE-Preview's generated patches are then ported over to the official SWE-bench repo to calculate the final score. Below, we report Pass@1 accuracy averaged over 16 runs.
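For clarity, Pass@1 averaged over 16 runs is simply the mean resolution rate across problems; the sketch below assumes a simple bookkeeping format (one boolean per run per problem) rather than the official harness output.

```python
# Hypothetical bookkeeping: results maps each problem id to a list of per-run resolution outcomes.
def mean_pass_at_1(results: dict[str, list[bool]], k: int = 16) -> float:
    """Average per-problem resolution rate over the first k runs, as a percentage."""
    per_problem = [sum(runs[:k]) / k for runs in results.values()]
    return 100.0 * sum(per_problem) / len(per_problem)
```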
Figure 3: SWE-Bench-Verified performance w.r.t. different TTS strategies. With hybrid TTS, DeepSWE-Preview achieves 59%, beating the current SOTA open-weights model (SkyWork + TTS, 47%) by 12%. We note that using only execution-based and execution-free verifiers is still effective and can bring a 10+% performance gain.
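As a purely illustrative sketch of verifier-guided best-of-n selection in the spirit of the hybrid TTS result above (not the actual DeepSWE-Preview pipeline), both scoring functions below are hypothetical placeholders:

```python
# Hypothetical verifier-guided best-of-n selection over candidate patches.
def execution_based_score(patch: str) -> float:
    raise NotImplementedError("placeholder: e.g., run generated tests against the patch")

def execution_free_score(patch: str) -> float:
    raise NotImplementedError("placeholder: e.g., score the patch with a verifier model")

def select_best_patch(candidate_patches: list[str]) -> str:
    """Pick the candidate patch with the highest combined verifier score."""
    return max(candidate_patches,
               key=lambda p: execution_based_score(p) + execution_free_score(p))
```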
Serving DeepSWE-Preview
Our model can be served using popular high-performance inference systems, all of which support the OpenAI Chat Completions API format.
vLLM (Recommended)
We suggest using vllm>=0.8.5 and enabling long context in vLLM to serve DeepSWE-Preview.
export MAX_CONTEXT_LEN=65536
export TENSOR_PARALLEL_SIZE=8
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve agentica-org/DeepSWE-Preview --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_CONTEXT_LEN --hf-overrides "{\"max_position_embeddings\": $MAX_CONTEXT_LEN}" --enable_prefix_caching
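Once the server is up, the model can be queried through vLLM's OpenAI-compatible Chat Completions endpoint (default port 8000); the base_url, api_key, and prompt below are illustrative.

```python
# Example client call against the locally served model via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="agentica-org/DeepSWE-Preview",
    messages=[{"role": "user", "content": "Outline a plan to localize a failing unit test in a large repo."}],
    max_tokens=2048,
)
print(response.choices[0].message.content)
```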
License
This project is released under the MIT License, reflecting our commitment to open and accessible AI development. We believe in democratizing AI technology by making our work freely available for anyone to use, modify, and build upon. This permissive license ensures that researchers, developers, and enthusiasts worldwide can leverage and extend our work without restrictions, fostering innovation and collaboration in the AI community.
Acknowledgement
DeepSWE-Preview is trained on top of Qwen/Qwen3-32B.

Citation
@misc{deepswe2025,
  title={DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL},
  author={Michael Luo and Naman Jain and Jaskirat Singh and Sijun Tan and Ameen Patel and Qingyang Wu and Alpay Ariyak and Colin Cai and Tarun Venkat and Shang Zhu and Ben Athiwaratkun and Manan Roongta and Ce Zhang and Li Erran Li and Raluca Ada Popa and Koushik Sen and Ion Stoica},
  howpublished={\url{N/A}},
  note={Notion Blog},
  year={2025}
}