thinkwee/AgentsMeetRL

Interactive Dashboard

When LLM Agents Meet Reinforcement Learning

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM agents with reinforcement learning:

  • 🤖 A project counts as an agent project if it has at least one of the following: multi-turn interaction or tool use (so Tool-Integrated Reasoning (TIR) projects are included in this repo).
  • ⚠️ The entries are based on analysis of open-source repositories by LLM coding agents and may contain inaccuracies. Although they have been manually reviewed, omissions may remain. If you find any errors, please let us know through an issue or PR; we warmly welcome corrections!
  • 🚀 We focus in particular on the RL frameworks, RL algorithms, rewards, and environments each project depends on, as a reference for how these excellent open-source projects make their technical choices. See "📋 Click to view technical details" under each table.
  • 📅 Last updated: 2026-04-18
  • 🤗 Feel free to submit your own projects anytime - we welcome contributions!
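The inclusion criterion above (multi-turn interaction or tool use) describes a rollout loop that most listed projects share in some form. The following is a minimal sketch, not code from any listed project: the `<tool>`/`<answer>`/`<obs>` tag format, `scripted_policy` (a stand-in for the LLM being trained), and the toy `calculator` tool are all hypothetical.

```python
import ast
import operator
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical tag format: the policy emits <tool>EXPR</tool> to call a tool,
# or <answer>TEXT</answer> to finish. Each real project defines its own format.
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.S)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.S)
OBS_RE = re.compile(r"<obs>(.*?)</obs>", re.S)

# The toy calculator only allows + - * / on numeric literals.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node: ast.AST) -> float:
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def calculator(expr: str) -> str:
    """Toy tool: evaluate a restricted arithmetic expression, returning text."""
    try:
        return str(_eval(ast.parse(expr, mode="eval").body))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return "error"


def scripted_policy(history: List[str]) -> str:
    """Stand-in for an LLM: call the tool once, then answer with its output."""
    obs = OBS_RE.search(history[-1])
    if obs:
        return f"<answer>{obs.group(1)}</answer>"
    return "<tool>17 * 3 + 4</tool>"


def rollout(policy: Callable[[List[str]], str], prompt: str,
            tool: Callable[[str], str], max_turns: int = 4
            ) -> Tuple[Optional[str], List[str]]:
    """Alternate policy turns and tool observations until an answer or max_turns."""
    history = [prompt]
    for _ in range(max_turns):
        action = policy(history)
        history.append(action)
        answer = ANSWER_RE.search(action)
        if answer:
            return answer.group(1).strip(), history
        call = TOOL_RE.search(action)
        if call:
            history.append(f"<obs>{tool(call.group(1))}</obs>")
    return None, history  # ran out of turns without an answer


answer, history = rollout(scripted_policy, "What is 17*3+4?", calculator)
print(answer)  # 55
```

In actual RL training the policy is the LLM being optimized, and the tool observations are commonly masked out of the loss so that only model-generated tokens are trained on; that masking is omitted here.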

Taxonomy:

  • Base Framework: General-purpose RL training frameworks for LLM agents (e.g., veRL, OpenRLHF, trl)
  • General/MultiTask: Agent systems trained/evaluated across multiple tasks or environments
  • Search & RAG: Search-augmented reasoning agents that use retrieval tools to enhance LLM reasoning
  • Web & GUI: Agents that interact with web browsers, mobile/desktop GUIs, or operating systems
  • Tool-Use: Agents trained to invoke external tools (APIs, code executors, MCP, etc.)
  • Code & SWE: Software engineering and code generation agents
  • Reasoning: Reasoning agents with tool-integrated or multi-turn reasoning (math, QA, visual)
  • Multi-Agent RL: Multi-agent collaboration, negotiation, or credit assignment via RL
  • Memory: Agents that learn to manage, retrieve, or evolve memory
  • Embodied: Agents operating in embodied/physical simulation environments
  • Domain-Specific: RL agents for specialized domains (medical, OS tuning, etc.)
  • Reward & Training: Process/outcome reward models and training methodologies for agents
  • Safety: RL for agent safety alignment, adversarial red-teaming, and jailbreak defense/attack
  • VLM Agent: Vision-language model agents trained with RL for multimodal interaction
  • Self-Evolution: Agents that self-evolve via RL feedback loops (⚠️ definition still evolving in the community)
  • Environment: Benchmarks, gyms, and sandbox environments for agent training/evaluation

Enumerations:

  • Reward Type:
    • External Verifier: e.g., a compiler or math solver
    • Rule-Based: e.g., a LaTeX parser with exact match scoring
    • Model-Based: e.g., a trained verifier LLM or reward LLM
    • Custom
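The Rule-Based reward type can be made concrete with a minimal sketch. It assumes the model wraps its final answer in LaTeX `\boxed{...}` (a common but not universal convention); the function names are hypothetical, and the normalization is deliberately minimal compared with real math verifiers.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    """Minimal normalization; real verifiers also check symbolic equivalence."""
    return ans.replace(" ", "").replace("\\left", "").replace("\\right", "").rstrip(".")


def rule_based_reward(response: str, ground_truth: str) -> float:
    """Binary outcome reward: 1.0 on exact match after normalization, else 0.0."""
    pred = extract_boxed(response)
    if pred is None:
        return 0.0  # a missing or malformed answer gets no reward
    return 1.0 if normalize(pred) == normalize(ground_truth) else 0.0


print(rule_based_reward(r"So the answer is \boxed{\frac{1}{2}}.", r"\frac{1}{2}"))  # 1.0
print(rule_based_reward(r"\boxed{5}", "4"))  # 0.0
```

An External Verifier would replace the string comparison with a call to a compiler, test runner, or math solver, and a Model-Based reward would replace it with a score from a verifier or reward LLM.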

Updates

  • 📢 2026-04 Update: Added 67 new repositories covering Apr 2025 – Apr 2026 across nearly every category (notably VLM Agent +9, Search & RAG +10, Web & GUI +7, Tool-Use +7). Also reclassified SkyRL (→ General) and SPIRAL (→ Multi-Agent), and updated the VAGEN entry to its NeurIPS'25 upstream repo.
  • 📢 2026-03 Update: Restructured taxonomy from 12 to 16 categories (added Multi-Agent RL, Reward & Training, Safety, VLM Agent, Self-Evolution, Domain-Specific; merged GUI into Web & GUI; retired TextGame/Biomedical). Added ~70 new repositories covering Sep 2025 – Mar 2026, growing the total from ~134 to 205.

🔧 Base Framework

Github Repo 🌟 Stars Date Org Paper Link
Open-AgentRL Stars 2026.2 Gen-Verse Paper
OpenClaw-RL Stars 2026.3 Gen-Verse Paper
Claw-R1 Stars 2026.3 USTC --
prime-rl Stars 2025.2 Prime Intellect --
NeMo-RL Stars 2026.1 NVIDIA --
RLinf Stars 2025.8 Tsinghua/Infinigence AI/PKU Paper
siiRL Stars 2025.7 Shanghai Innovation Institute Paper
slime 2025.6 Tsinghua University (THUDM) blog
agent-lightning Stars 2025.6 Microsoft Research Paper
AReaL Stars 2025.6 AntGroup/Tsinghua Paper
ROLL Stars 2025.6 Alibaba Paper
MARTI Stars 2025.5 Tsinghua --
Tunix Stars 2025.4 Google --
RL2 Stars 2025.4 Accio
verifiers Stars 2025.3 Individual --
oat Stars 2024.11 NUS/Sea AI Paper
veRL Stars 2024.10 ByteDance Paper
OpenRLHF Stars 2023.7 OpenRLHF Paper
trl Stars 2019.11 HuggingFace --
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Open-AgentRL GRPO-TCR Single Both Multi Reasoning/GUI/Coding Model (PRM) Yes (SandboxFusion)
OpenClaw-RL GRPO/OPD Both Both Multi Terminal/GUI/SWE/Tool-call Model/External Yes
Claw-R1 Generic RL Framework Multi Both Multi General Agent All Yes (Framework-agnostic)
prime-rl GRPO/PPO Multi Outcome Multi Math/Code/Search Model/External Yes
NeMo-RL GRPO/DAPO/GDPO/DPO Single Outcome Multi Math/Reasoning/Code Rule/External No
RLinf PPO/GRPO/DAPO/SAC/REINFORCE++/CrossQ/RLPD Both Both Multi Robotics/Math/Code/QA/VQA All (Rule/Model/External) Yes
siiRL PPO/GRPO/CPGD/MARFT Multi Both Multi LLM/VLM/LLM-MAS Post-Training Model/Rule Planned
slime GRPO/GSPO/REINFORCE++ Single Both Both Math/Code External Verifier Yes
agent-lightning PPO/Custom/Automatic Prompt Optimization Multi Outcome Multi Calculator/SQL Model/External/Rule Yes
AReaL PPO Both Outcome Both Math/Code External Yes
ROLL PPO/GRPO/Reinforce++/TOPR/RAFT++ Multi Both Multi Math/QA/Code/Alignment All Yes
MARTI PPO/GRPO/REINFORCE++/TTRL Multi Both Multi Math All Yes
Tunix PPO/GRPO/GSPO-Token/DAPO/Dr.GRPO Single Outcome Multi Math/Code/Game Rule/External Yes
RL2 Dr. GRPO/PPO/DPO Single Both Both QA/Dialogue Rule/Model/External Yes
verifiers GRPO Multi Outcome Both Reasoning/Math/Code All Code
oat PPO/GRPO Single Outcome Multi Math/Alignment External No
veRL PPO/GRPO Single Outcome Both Math/QA/Reasoning/Search All Yes
OpenRLHF PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO Multi Both Both Dialogue/Chat/Completion Rule/Model/External Yes
trl PPO/GRPO/DPO Single Both Single QA Custom No
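GRPO and its variants appear in most of the tables in this list. As a hedged sketch of the core idea (not the implementation of any listed framework): for each prompt, sample a group of responses, score them, and use each response's group-normalized reward as the advantage for all of its tokens, which removes the need for a learned critic. Setting `scale_by_std=False` mirrors the Dr. GRPO variant (listed for RL2 and Tunix) in dropping the per-group std division; Dr. GRPO also changes the loss's length normalization, which is not shown here.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[List[float]], scale_by_std: bool = True,
                              eps: float = 1e-6) -> List[List[float]]:
    """One advantage per response: reward minus its group mean (optionally / group std).

    rewards[p][i] is the scalar reward of the i-th sampled response to prompt p.
    """
    advantages = []
    for group in rewards:
        mean = sum(group) / len(group)
        centered = [r - mean for r in group]
        if scale_by_std:
            # Population std here; some implementations use the sample std instead.
            std = math.sqrt(sum(c * c for c in centered) / len(group))
            centered = [c / (std + eps) for c in centered]
        advantages.append(centered)
    return advantages


# Two prompts, four sampled responses each, binary outcome rewards.
print(group_relative_advantages([[1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]))
```

The first group yields advantages of roughly +1 for the correct responses and -1 for the incorrect ones, while the second group, where every response receives the same reward, yields all zeros and therefore contributes no gradient. Techniques such as DAPO's dynamic sampling, which filters out such groups, target exactly this case.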

💪 General/MultiTask

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MetaClaw Stars 2026.3 UNC-Chapel Hill (AIMING Lab) Paper Custom
SkillRL Stars 2026.2 UNC-Chapel Hill (AIMING Lab) Paper Custom
LLM-in-Sandbox Stars 2026.1 RUC/MSRA/THU Paper rllm (w/ veRL)
youtu-agent Stars 2025.12 Tencent Youtu Lab Paper Custom
DEPO Stars 2025.11 HKUST/SJTU Paper LLaMA-Factory
SPEAR Stars 2025.10 Tencent Youtu Lab Paper veRL/verl-agent
DeepAgent Stars 2025.10 RUC/Xiaohongshu Paper Custom
AgentRL Stars 2025.9 Tsinghua Paper veRL
AgentGym-RL Stars 2025.9 Fudan University Paper veRL
Agent_Foundation_Models Stars 2025.8 OPPO Personal AI Lab Paper veRL
Trinity-RFT Stars 2025.5 Alibaba Paper veRL
SPA-RL-Agent Stars 2025.5 PolyU Paper TRL
verl-agent Stars 2025.5 NTU/Skywork Paper veRL
SkyRL Stars 2025.4 UC Berkeley / NovaSky-AI Paper Self (skyrl-train)
VAGEN Stars 2025.3 Northwestern University (mll-lab-nu) Paper veRL
ART Stars 2025.3 OpenPipe Paper TRL
OpenManus-RL Stars 2025.3 UIUC/MetaGPT -- Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MetaClaw GRPO (LoRA) Single Process Multi General Agentic Model (PRM) Yes (Skill-augmented)
SkillRL GRPO Single Outcome Multi ALFWorld/WebShop/Search Rule Yes (Web search, actions)
LLM-in-Sandbox GRPO++ Single Outcome Multi Math/Physics/Chemistry/Biomedicine/Long-context/IF/SWE Rule Yes (Code Sandbox w/ Terminal, File, Internet)
youtu-agent Training-Free GRPO Single Outcome Multi Deep Research/Data Analysis/Tool-use Model/External Yes (Web search, code, file)
DEPO KTO + Efficiency Loss Single Both Multi Agent (BabyAI/WebShop) Rule Yes
SPEAR GRPO/GiGPO + SIL Single Both Multi Math/Agent Rule/External Yes (Search, Sandbox, Browser)
DeepAgent ToolPO Single Outcome Multi ToolBench/ALFWorld/WebShop/GAIA/HLE Model Yes (16,000+ RapidAPIs)
AgentRL GRPO/REINFORCE++/RLOO/ReMax/GAE Single Outcome Multi Agent Tasks External Yes
AgentGym-RL PPO/GRPO/RLOO/REINFORCE++ Single Outcome Multi Web/Search/Game/Embodied/Science Rule/Model/External Yes (Web, Search, Env APIs)
Agent_Foundation_Models DAPO/PPO Single Outcome Single QA/Code/Math Rule/External Yes
Trinity-RFT PPO/GRPO Single Outcome Both Math/TextGame/Web All Yes
SPA-RL-Agent PPO Single Process Multi Navigation/Web/TextGame Model No
verl-agent PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ Multi Both Multi Phone Use/Math/Code/Web/TextGame All Yes
SkyRL GRPO/PPO Single Both Multi Long-horizon Agents (SWE-Bench/Search/Math/SQL) Rule/External/Custom Yes
VAGEN PPO/GRPO (World Modeling RL) Single Both Multi Navigation/TextGame/Multimodal All Yes
ART GRPO Multi Both Multi TextGame All Yes
OpenManus-RL PPO/DPO/GRPO Multi Outcome Multi TextGame All Yes

🔍 Search & RAG Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
ProRAG Stars 2026.1 RUC Paper Custom
MemSearcher Stars 2025.11 CAS Paper Custom
ReSeek Stars 2025.10 Tencent PCG BAC/Tsinghua University Paper veRL
AutoGraph-R1 Stars 2025.10 HKUST KnowComp Paper Custom
Tree-GRPO Stars 2025.9 AMAP Paper veRL
ASearcher Stars 2025.8 Ant Research RL Lab / Tsinghua University & UW Paper RealHF/AReaL
Graph-R1 Stars 2025.7 BUPT/NTU/NUS Paper veRL
Kimi-Researcher Stars 2025.6 Moonshot AI blog Custom
R-Search Stars 2025.6 Individual -- veRL
R1-Searcher-plus Stars 2025.5 RUC Paper Custom
StepSearch Stars 2025.5 SenseTime Paper veRL
AutoRefine Stars 2025.5 USTC Paper veRL
ZeroSearch Stars 2025.5 Alibaba Paper veRL
ReasonRAG Stars 2025.5 CityU HK / Huawei Paper Custom
Agentic-RAG-R1 Stars 2025.12 PKU -- Custom
WebThinker Stars 2025.4 RUC Paper Custom
DeepResearcher Stars 2025.4 SJTU Paper veRL
Search-R1 Stars 2025.3 UIUC/Google paper1, paper2 veRL
R1-Searcher Stars 2025.3 RUC Paper OpenRLHF
C-3PO Stars 2025.2 Alibaba Paper OpenRLHF
DeepRetrieval Stars 2025.2 UIUC Paper veRL
SSRL Stars 2025.8 Tsinghua Paper Custom
Research-Venus Stars 2025.8 Ant Group Paper Custom
DeepResearch Stars 2025.9 Alibaba/Tongyi Lab Paper Custom
DeepDive Stars 2025.9 Tsinghua/THUDM Paper Custom
O-Researcher Stars 2026.1 OPPO PersonalAI Lab Paper Custom
DR Tulu Stars 2025.11 AI2 / UW / CMU / MIT Paper Open-Instruct
WebSeer Stars 2025.10 Individual Paper veRL
HiPRAG Stars 2025.10 Individual Paper veRL
VRAG Stars 2025.5 USTC / Tongyi Lab, Alibaba Paper veRL
MaskSearch Stars 2025.5 Tongyi Lab, Alibaba Paper DAPO / veRL
R3-RAG Stars 2025.5 Fudan NLP Paper OpenRLHF
O2-Searcher Stars 2025.5 KnowledgeXLab Paper veRL
s3 Stars 2025.5 UIUC Paper veRL
knowledge-r1 Stars 2025.5 CAS / UCAS Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ProRAG GRPO + DGA (dual-granularity advantage) Single Both Multi Multi-hop RAG Model (PRM via MCTS) Yes (Retrieval)
MemSearcher Multi-context GRPO Single Outcome Multi Search/QA + Memory Rule/Model Yes (Web search + Memory)
ReSeek GRPO/PPO Single Both Multi QA/Search Rule Search/JUDGE
AutoGraph-R1 GRPO (via VeRL) Single Outcome Multi KG Construction for QA Rule Yes (Graph retrieval)
Tree-GRPO GRPO/Tree-GRPO Single Outcome Multi Search Rule Search
ASearcher PPO/GRPO + Decoupled PPO Single Outcome Multi Math/Code/SearchQA External/Rule Yes
Graph-R1 GRPO/REINFORCE++/PPO Single Outcome Multi KGQA Rule (EM/F1) Yes (Graph retrieval)
Kimi-Researcher REINFORCE Single Outcome Multi Research Outcome Search, Browse, Coding
R-Search PPO/GRPO Single Both Multi QA/Search All Yes
R1-Searcher-plus Custom Single Outcome Multi Search Model Search
StepSearch PPO Single Process Multi QA Model Search
AutoRefine PPO/GRPO Multi Both Multi RAG QA Rule Search
ZeroSearch PPO/GRPO/REINFORCE Single Outcome Multi QA/Search Rule Yes
ReasonRAG DPO + MCTS-based PRM Single Process Multi Multi-hop QA Model (PRM) Yes (Wikipedia search)
Agentic-RAG-R1 GRPO Single Outcome Multi Knowledge-intensive QA Rule/Model Yes (Wiki/Doc search)
WebThinker DPO Single Outcome Multi Reasoning/QA/Research Model/External Web Browsing
DeepResearcher PPO/GRPO Multi Outcome Multi Research All Yes
Search-R1 PPO/GRPO Single Outcome Multi Search All Search
R1-Searcher PPO/DPO Single Both Multi Search All Yes
C-3PO PPO Multi Outcome Multi Search Model Yes
DeepRetrieval GRPO Single Outcome Multi Query Generation/IR Rule Yes (Search)
SSRL GRPO Single Outcome Multi Self-Search Rule Yes (Self-search)
Research-Venus GRPO Single Both Multi Deep Research Model (atomic thought) Yes (Search)
DeepResearch RL-based Single Outcome Multi Deep Research Model Yes (Search, Browse)
DeepDive GRPO Single Outcome Multi KG-augmented Search Rule Yes (KG + Search)
O-Researcher GRPO + RLAIF Multi Process Multi Deep Research (Zhihu-KOL/WideSearch/ELI5) Model (LLM-as-Judge) Yes (Search/Crawl)
DR Tulu GRPO + evolving rubrics Single Outcome Multi Long-form Deep Research Model (rubrics) Yes (Search/MCP)
WebSeer GRPO-style Single Outcome Multi Web Search QA (w/ self-reflection) Rule/Model Yes (Search)
HiPRAG PPO Single Process Multi Efficient Agentic RAG Model/Rule Yes (Retrieval)
VRAG GRPO Single Both Multi Visually-rich RAG Rule/Model Yes (Visual retrieval)
MaskSearch DAPO Single Outcome Multi RAMP Pretraining + QA Rule/Model Yes (Search)
R3-RAG PPO Single Both Multi Multi-hop QA Rule Yes (Retrieval)
O2-Searcher GRPO Single Outcome Multi Open-ended QA Rule/Model Yes (Search)
s3 GRPO Single Outcome Multi RAG / Medical QA Model (Gain-Beyond-RAG) Yes (Retrieval)
knowledge-r1 GRPO Single Outcome Multi Knowledge-intensive QA (KB-aware) Rule Yes (Retrieval)

🌐 Web & GUI Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MobileAgent Stars 2025.9 X-PLUG (TongyiQwen) Paper veRL
InfiGUI-G1 Stars 2025.8 InfiX AI Paper veRL
UI-AGILE Stars 2025.7 Xiamen University Paper Custom
gui-rcpo Stars 2025.8 Zhejiang University Paper Custom
Grounding-R1 Stars 2025.6 Salesforce blog trl
AgentCPM-GUI Stars 2025.6 OpenBMB/Tsinghua/RUC Paper HuggingFace
TTI Stars 2025.6 CMU Paper Custom
SE-GUI Stars 2025.5 Nankai University/vivo Paper trl
ARPO Stars 2025.5 CUHK/HKUST Paper veRL
GUI-G1 Stars 2025.5 RUC Paper TRL
WebAgent-R1 Stars 2025.5 Amazon/UVA Paper Custom
GUI-R1 Stars 2025.4 CAS/NUS Paper veRL
UI-R1 Stars 2025.3 vivo/CUHK Paper TRL
CollabUIAgents Stars 2025.2 Tsinghua/Alibaba/HKUST Paper Custom
WebAgent Stars 2025.1 Alibaba paper1, paper2 LLaMA-Factory
UI-TARS Stars 2025.9 ByteDance Seed Paper Custom
DigiQ Stars 2025.2 UC Berkeley/CMU/Amazon Paper Custom
ZeroGUI Stars 2025.5 Shanghai AI Lab Paper Custom
InfiGUI-R1 Stars 2025.4 Zhejiang University Paper Custom
GUI-Agent-RL Stars 2025.2 Microsoft Paper Custom
GUI-Libra Stars 2026.2 GUI-Libra (MS-affiliated) Paper Custom
MobileRL Stars 2025.9 Tsinghua / Zhipu AI (THUDM) Paper Custom
DART-GUI Stars 2025.9 Computer-use-agents Paper veRL
Mano-P Stars 2025.9 Mininglamp AI Paper Mano-SDK
GUI-G2 Stars 2025.7 Zhejiang University (ZJU-REAL) Paper Custom (VLM-R1)
MagicGUI Stars 2025.7 Honor (MagicAgent-GUI) Paper Custom
GTA1 Stars 2025.6 Salesforce / ANU Paper Custom (DeepSpeed)
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MobileAgent semi-online RL Single Both Multi MobileGUI/Automation Rule Yes
InfiGUI-G1 AEPO Single Outcome Single GUI/Grounding Rule No
UI-AGILE GRPO Single Outcome Single GUI Grounding Rule (continuous) No
gui-rcpo RCPO Single Outcome Single GUI Grounding Rule (self-supervised) No
Grounding-R1 GRPO Single Outcome Multi GUI Grounding Model Yes
AgentCPM-GUI GRPO Single Outcome Multi Mobile GUI Model Yes
TTI REINFORCE/BC Single Outcome Multi Web External Web Browsing
SE-GUI GRPO Single Both Single GUI Grounding Rule Yes
ARPO GRPO Single Outcome Multi GUI External Computer Use
GUI-G1 GRPO Single Outcome Single GUI Rule/External No
WebAgent-R1 M-GRPO Single Outcome Multi Web Navigation (WebArena-Lite) Rule (task success) Yes (Web browsing)
GUI-R1 GRPO Single Outcome Multi GUI Rule No
UI-R1 GRPO Single Process Both GUI Rule Computer/Phone Use
CollabUIAgents DPO (credit re-assignment) Multi Process Multi GUI (Mobile + Web) Model (LLM) Yes (GUI interaction)
WebAgent DAPO Multi Process Multi Web Model Yes
UI-TARS Multi-turn RL Single Both Multi GUI (Cross-platform) Model Yes (GUI actions)
DigiQ Value-based offline RL Single Outcome Multi Android Device Control Model (Q-function) Yes
ZeroGUI Online RL Single Outcome Multi GUI Agent Rule Yes (GUI actions)
InfiGUI-R1 RL + sub-goal guidance Single Both Multi GUI Reasoning Rule Yes
GUI-Agent-RL Value-based RL (VEM) Single Outcome Multi GUI (Web Shopping) Model Yes
GUI-Libra KL-regularized GRPO (Partially Verifiable RL) Single Outcome Multi GUI (AndroidWorld/WebArena/Online-Mind2Web) Rule Yes
MobileRL AdaGRPO (Difficulty-Adaptive) Single Outcome Multi Mobile GUI (AndroidWorld/AndroidLab) Rule Yes (Android)
DART-GUI Decoupled GRPO Single Outcome Multi GUI (OSWorld) Rule Yes
Mano-P Three-stage SFT→Offline RL→Online RL Single Both Multi GUI (OSWorld) Rule Yes
GUI-G2 GRPO (Gaussian Reward) Single Outcome Single GUI Grounding Rule (continuous) No
MagicGUI Reinforcement Fine-Tuning (RFT) Single Outcome Multi Mobile GUI Model/Rule Yes
GTA1 GRPO-style (click-success reward) Single Outcome Multi GUI Grounding (OSWorld/ScreenSpot-Pro) Rule Yes

🔨 Tool-Use Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MATPO Stars 2025.10 MiroMind AI Paper Custom
MiroRL Stars 2025.8 MiroMindAI HF Repo veRL
verl-tool Stars 2025.6 TIGER-Lab X veRL
Multi-Turn-RL-Agent Stars 2025.5 University of Minnesota Paper Custom
Tool-N1 Stars 2025.5 NVIDIA Paper veRL
Tool-Star Stars 2025.5 RUC Paper LLaMA-Factory
RL-Factory Stars 2025.5 Simple-Efficient model veRL
ReTool Stars 2025.4 ByteDance Paper veRL
AWorld Stars 2025.3 Ant Group (inclusionAI) Paper veRL
Agent-R1 Stars 2025.3 USTC Paper veRL
ReCall Stars 2025.3 BaiChuan Paper veRL
ToolRL Stars 2025.4 UIUC Paper veRL
ToolOrchestra Stars 2025.11 NVIDIA / HKU Paper Custom (veRL-based)
ToolMaster Stars 2025.11 Northeastern University (NEUIR) Paper Custom
CodeGym Stars 2025.9 Academic Paper Custom
UserRL Stars 2025.9 Salesforce AI Research Paper veRL
ToolBrain Stars 2025.9 ToolBrain (AAMAS 2026) Paper Custom
Tool-R1 Stars 2025.9 Individual (YBYBZhang) Paper Custom
calculator_agent_rl Stars 2025.5 Individual (Danau5tin) -- Verifiers
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MATPO GRPO (multi-agent) Multi Outcome Multi Tool-use/Search Rule Yes (MCP: Serper, Web scraping)
MiroRL GRPO Single Both Multi Reasoning/Planning/ToolUse Rule-based MCP
verl-tool PPO/GRPO Single Both Both Math/Code Rule/External Yes
Multi-Turn-RL-Agent GRPO Single Both Multi Tool-use/Math Rule/External Yes
Tool-N1 PPO Single Outcome Multi Math/Dialogue All Yes
Tool-Star PPO/DPO/ORPO/SimPO/KTO Single Outcome Multi Multi-modal/Tool Use/Dialogue Model/External Yes
RL-Factory GRPO Multi Both Multi Tool-use/NL2SQL All MCP
ReTool PPO Single Outcome Multi Math External Code
AWorld GRPO Both Outcome Multi Search/Web/Code External/Rule Yes
Agent-R1 PPO/GRPO Single Both Multi Tool-use/QA Model Yes
ReCall PPO/GRPO/RLOO/REINFORCE++/ReMax Single Outcome Multi Tool-use/Math/QA All Yes
ToolRL GRPO/PPO Single Outcome Multi Tool Learning Rule/External Yes
ToolOrchestra End-to-end RL (outcome+efficiency+preference) Single Both Multi Tool orchestration / agentic workflows All Yes (Search/Code/LLMs)
ToolMaster SFT + GRPO (trial-then-execute) Single Outcome Multi Tool trialing + execution (ToolHop/TMDB/StableToolBench) Rule/External Yes (Simulated tools)
CodeGym GRPO-family Single Outcome Multi Synthetic Multi-turn Tool-Use Rule (verifiable) Yes (Synthesized tools)
UserRL GRPO (multi-turn credit) Single Both Multi User-centric (Function/Persuade/Search/Tau Gyms) Model/External Yes
ToolBrain GRPO/DPO Single Outcome Multi Agentic tool training Rule/Model Yes (User-defined tools)
Tool-R1 Policy optimization (PPO-style) Single Outcome Multi Agentic Tool Use (GAIA) Model + External Yes (Python exec)
calculator_agent_rl GRPO Single Outcome Multi Calculator Tool Use Model (Claude-judge) Yes

💻 Code & SWE Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
CUDA-Agent Stars 2026.2 ByteDance/Tsinghua Paper Custom
LLM-in-Sandbox Stars 2026.1 RUC/MSRA/THU Paper rllm (w/ veRL)
PPP-Agent Stars 2025.11 CMU/OpenHands Paper veRL
RepoDeepSearch Stars 2025.8 PKU, Bytedance, BIT Paper veRL
CUDA-L1 Stars 2025.7 DeepReinforce AI Paper Custom
MedAgentGym Stars 2025.6 Emory/Georgia Tech Paper HuggingFace
CURE Stars 2025.6 University of Chicago / Princeton/ByteDance Paper HuggingFace
Time-R1 Stars 2025.5 UIUC Paper veRL
ML-Agent Stars 2025.5 MASWorks Paper Custom
digitalhuman Stars 2025.4 Tencent Paper veRL
sweet_rl Stars 2025.3 Meta/UCB Paper OpenRLHF
swe-rl Stars 2025.2 Meta/UIUC/CMU Paper Custom
rllm Stars 2025.1 Berkeley Sky Computing Lab / BAIR / Together AI Notion Blog veRL
open-r1 Stars 2025.1 HuggingFace -- TRL
R1-Code-Interpreter Stars 2025.5 MIT Paper Custom
CTRL Stars 2025.2 HKU/ByteDance Paper Custom
DeepAnalyze Stars 2025.10 RUC/Tsinghua Paper Custom
AceCoder Stars 2025.2 Waterloo (TIGER-Lab) Paper Custom
SWE-World Stars 2026.2 RUC (RUCAIBox) Paper OpenRLHF + veRL
CUDA-L2 Stars 2026.1 DeepReinforce AI Paper Custom
SWE-Swiss Stars 2025.7 Tsinghua / ByteDance -- veRL
Skywork-OR1 Stars 2025.4 Skywork AI Paper Custom (veRL fork)
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
CUDA-Agent Agentic RL (staged) Single Outcome Multi CUDA Kernel Generation Rule (correctness + performance) Yes (compile/verify/profile)
LLM-in-Sandbox GRPO++ Single Outcome Multi Code/SWE + General (Math/Sci/Bio) Rule Yes (Code Sandbox w/ Terminal, File, Internet)
PPP-Agent PPP-RL Single Both Multi SWE/Research Rule+Model Search, Ask, Browse
RepoDeepSearch GRPO Single Both Multi Search/Repair Rule/External Yes
CUDA-L1 Contrastive RL Single Outcome Single CUDA Optimization Rule (performance) No
MedAgentGym SFT/DPO/PPO/GRPO Single Outcome Multi Medical/Code External Yes
CURE PPO Single Outcome Single Code External No
Time-R1 PPO/GRPO/DPO Multi Outcome Multi Temporal All Code
ML-Agent Custom Single Process Multi Code All Yes
digitalhuman PPO/GRPO/ReMax/RLOO Multi Outcome Multi Empathy/Math/Code/MultimodalQA Rule/Model/External Yes
sweet_rl DPO Multi Process Multi Design/Code Model Web Browsing
swe-rl RL-based Single Outcome Single SWE (SWE-bench) Rule (similarity) No
rllm PPO/GRPO Single Outcome Multi Code Edit External Yes
open-r1 GRPO Single Outcome Single Math/Code All Yes
R1-Code-Interpreter GRPO Single Outcome Multi Code Interpretation Rule/External Yes (Code exec)
CTRL RL (critique-revision) Single Process Multi Code Refinement Model Yes (Code exec)
DeepAnalyze Curriculum RL Single Outcome Multi Data Science Rule/External Yes (Code exec)
AceCoder GRPO Single Outcome Single Code Generation External (test cases) Yes
SWE-World RL with learned world model (SWT + SWR) Single Both Multi Docker-free SWE (SWE-Bench Verified) Model (surrogate) + Rule Yes
CUDA-L2 Contrastive RL Single Outcome Single HGEMM / CUDA Matmul Rule (TFLOPs) Yes (compile/benchmark)
SWE-Swiss Two-stage RL curriculum Single Outcome Multi SWE (Localization/Repair/Unit-Test) Rule (test-based) Yes
Skywork-OR1 Large-scale rule-based RL (GRPO variant) Single Outcome Single Math + Code (AIME/LiveCodeBench) Rule (verifiable) No

🤔 Reasoning Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
Agent0 Stars 2025.10 UNC‑Chapel Hill / Salesforce Research / Stanford University Paper veRL
KG-R1 Stars 2025.9 UIUC/Google Paper1, Paper2 veRL
AgentFlow Stars 2025.9 Stanford University arXiv veRL
ARPO Stars 2025.7 RUC, Kuaishou Paper veRL
terminal-bench-rl Stars 2025.7 Individual (Danau5tin) N/A rLLM
MOTIF Stars 2025.6 University of Maryland Paper trl
cmriat/l0 Stars 2025.6 CMRIAT Paper veRL
agent-distillation Stars 2025.5 KAIST Paper Custom
EasyR1 Stars 2025.4 Individual repo1/paper2 veRL
AutoCoA Stars 2025.3 BJTU Paper veRL
ToRL Stars 2025.3 SJTU Paper veRL
ReMA Stars 2025.3 SJTU, UCL Paper veRL
Agentic-Reasoning Stars 2025.2 Oxford Paper Custom
SimpleTIR Stars 2025.2 NTU, Bytedance Notion Blog veRL
openrlhf_async_pipline Stars 2024.5 OpenRLHF Paper OpenRLHF
THOR Stars 2025.9 USTC / iFLYTEK Paper veRL
Tool-Light Stars 2025.9 RUC (RUC-NLPIR) Paper LLaMA-Factory
AutoTIR Stars 2025.7 Beihang University / BAAI Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Agent0 ADPO Multi Process Multi Math/Visual Model/Verifier Yes
KG-R1 GRPO/PPO Single Both Multi KGQA Rule/Model KG Retrieval
AgentFlow Flow-GRPO Single Outcome Multi Search/Math/QA Model/External Yes
ARPO GRPO Single Outcome Multi Math/Coding Model/Rule Yes
terminal-bench-rl GRPO Single Outcome Multi Coding/Terminal Model+External Verifier Yes
MOTIF GRPO Single Outcome Multi QA Rule No
cmriat/l0 PPO Multi Process Multi QA All Yes
agent-distillation PPO Single Process Multi QA/Math External Yes
EasyR1 GRPO Single Process Multi Vision-Language Model Yes
AutoCoA GRPO Multi Outcome Multi Reasoning/Math/QA All Yes
ToRL GRPO Single Outcome Single Math Rule/External Yes
ReMA PPO Multi Outcome Multi Math Rule No
Agentic-Reasoning Custom Single Process Multi QA/Math External Web Browsing
SimpleTIR PPO/GRPO (with extensions) Single Outcome Multi Math, Coding All Yes
openrlhf_async_pipline PPO/REINFORCE++/DPO/RLOO Single Outcome Multi Dialogue/Reasoning/QA All No
THOR Hierarchical GRPO (trajectory+step) Single Both Multi Math (MATH500/AIME/Olympiad) External (SandboxFusion) Yes (Python)
Tool-Light Self-Evolved DPO Single Outcome Multi Tool-Integrated Reasoning Model (preference) Yes (FlashRAG/Python)
AutoTIR PPO Single Outcome Multi Autonomous Tool Selection (QA/Math/IF) Rule Yes (Search/Python)

👥 Multi-Agent RL

Github Repo 🌟 Stars Date Org Paper Link RL Framework
PettingLLMs Stars 2025.10 Intel / UCSD Paper Custom
MASPRM Stars 2025.10 UBC / Huawei Paper Custom
ARIA Stars 2025.6 Fudan University Paper Custom
AMPO Stars 2025.5 Tongyi Lab, Alibaba Paper veRL
MAPoRL Stars 2025.8 Academic -- Custom
FlowReasoner Stars 2025.4 Sea AI Lab / NUS Paper Custom
DrMAS Stars 2026.2 NTU Paper Custom
MarsRL Stars 2025.11 Academic Paper veRL
MrlX Stars 2025.10 Ant Group (AQ-MedAI) Paper Custom (SGLang + Megatron)
CoMAS Stars 2025.10 Shanghai AI Lab / CUHK / Oxford / NUS Paper Custom
CoMLRL Stars 2025.8 OpenMLRL Paper TRL
SPIRAL Stars 2025.6 NUS / A*STAR / Sea AI Lab Paper Oat
MARFT Stars 2025.4 SII / SJTU Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
PettingLLMs AT-GRPO Multi Both Multi Game/Code/Math/Planning Rule (verifiable) No
MASPRM PRM (trained from MCTS rollouts) Multi Process Multi Reasoning (GSM8K/MATH/MMLU) Learned PRM No
ARIA REINFORCE Both Process Multi Negotiation/Bargaining Other No
AMPO BC/AMPO (GRPO improvement) Multi Outcome Multi Social Interaction Model-based No
MAPoRL PPO Multi Outcome Multi Collaborative LLM Tasks Rule No
FlowReasoner GRPO Multi Outcome Multi Multi-agent Workflow Design Rule Yes
DrMAS GRPO (agent-wise) Multi Outcome Multi Multi-agent LLM Systems Rule No
MarsRL RLVR (agent-specific rewards) Multi Both Multi Math Reasoning (AIME/BeyondAIME) Rule (verifiable) No
MrlX M-GRPO (hierarchical) Multi Outcome Multi Deep Research (GAIA/XBench) Rule + Model Yes (Search)
CoMAS RL w/ LLM-Judge intrinsic reward Multi Process Multi Co-evolving Reasoning Model No
CoMLRL MAGRPO / MAREINFORCE / MARLOO Multi Outcome Multi Writing / Code / Minecraft Custom Minimal
SPIRAL Role-conditioned Advantage Estimation (RAE) Multi Outcome Multi Zero-sum Games (TicTacToe/Kuhn/Negotiation) Rule No
MARFT MARFT paradigm (action+token level) Multi Both Multi Research / Math Rule Yes

🧠 Memory

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MEM1 Stars 2025.7 MIT Paper veRL (based on Search-R1)
Memento Stars 2025.6 UCL, Huawei Paper Custom
MemAgent Stars 2025.6 Bytedance, Tsinghua-SIA Paper veRL
Mem-alpha Stars 2025.9 UCSD / USTC Paper veRL
M3-Agent Stars 2025.7 ByteDance Seed / Zhejiang University Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MEM1 PPO/GRPO Single Outcome Multi WebShop/GSM8K/QA Rule/Model Yes
Memento soft Q-Learning Single Outcome Multi Research/QA/Code/Web External/Rule Yes
MemAgent PPO, GRPO, DPO Multi Outcome Multi Long-context QA Rule/Model/External Yes
Mem-alpha GRPO Single Outcome Multi Long-context QA + Memory Construction Rule (downstream QA) Yes (memory tools)
M3-Agent RL-based Single Outcome Multi Long-video QA (M3-Bench) Rule/Model Yes (multimodal memory graph)

🦾 Embodied

Github Repo 🌟 Stars Date Org Paper Link RL Framework
Embodied-R1 Stars 2025.6 Tianjin University Paper veRL
STeCa Stars 2025.2 The Hong Kong Polytechnic University Paper FastChat/TRL
VIKI-R Stars 2025.6 MARS-EAI (NeurIPS 2025 D&B) Paper veRL + LLaMA-Factory
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Embodied-R1 GRPO Single Outcome Single Grounding/Waypoint Rule No
STeCa DPO (RFT) Single Both Multi Embodied/Household Rule/MC Environment Actions
VIKI-R GRPO (RFT after SFT) Multi Outcome Multi Embodied Multi-Robot Cooperation (VIKI-Bench) Rule + Model No

🏷️ Domain-Specific

Github Repo 🌟 Stars Date Org Paper Link RL Framework Domain
MedSAM-Agent Stars 2026.2 CUHK/Tencent Paper Custom Medical
OS-R1 Stars 2025.8 ISCAS Paper Custom OS/Systems
MMedAgent-RL Stars 2025.8 Unknown Paper Unknown Medical
DoctorAgent-RL Stars 2025.5 UCAS/CAS/USTC Paper RAGEN Medical
Biomni Stars 2025.3 Stanford University (SNAP) Paper Custom Biomedical
Doctor-R1 Stars 2025.12 Tsinghua (thu-unicorn) Paper veRL Medical
Alpha-R1 Stars 2025.12 SJTU / FinStep.AI / StepFun Paper Custom Financial
MedResearcher-R1 Stars 2025.8 Ant Group (AQ-MedAI) Paper Custom Medical
LegalDelta Stars 2025.8 Northeastern University (NEUIR) Paper Custom Legal
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MedSAM-Agent GRPO (via veRL) Single Both Multi Medical Image Segmentation Model (clinical fidelity) Yes (SAM/MedSAM2)
OS-R1 GRPO (via veRL) Single Outcome Multi Linux Kernel Tuning Rule Yes (LightRAG, kernel config)
MMedAgent-RL Unknown Multi Unknown Unknown Unknown Unknown Unknown
DoctorAgent-RL GRPO Multi Both Multi Consultation/Diagnosis Model/Rule No
Biomni TBD Single TBD Single scRNAseq/CRISPR/ADMET/Knowledge TBD Yes
Doctor-R1 Experiential Agentic RL Multi Both Multi Clinical inquiry & diagnosis Model + Rule + safety veto No
Alpha-R1 GRPO Single Outcome Multi Alpha factor screening (with real-time news) External (portfolio returns) + Model Yes
MedResearcher-R1 GRPO-based (SFT + Online RL) Single Outcome Multi Medical Deep Research (MedBrowseComp) Rule + Model Yes (Search/KG)
LegalDelta GRPO (CoT-guided info-gain) Single Process Multi Legal Reasoning Model + Rule No

🎯 Reward & Training Methodology

Github Repo 🌟 Stars Date Org Paper Link Focus
ToolPRMBench Stars 2026.1 Arizona State University Paper PRM Benchmark for Tool-Use
RLVR-World Stars 2025.5 THU ML Group Paper RLVR for World Models
AgentPRM Stars 2025.2 Cornell Paper Process Reward for Agents
Agentic-Reward-Modeling Stars 2025.2 THU-KEG Paper Agentic Reward Agent
AgentRM Stars 2025.2 THUNLP/Tsinghua Paper Generalizable Agent RM
AgentProg Stars 2025.5 MobileLLM Paper Progress Reward Model (ProgRM)
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ToolPRMBench N/A (Benchmark) Single Process Multi Tool-Use Rule/Model Yes
RLVR-World RLVR Single Outcome Multi World Modeling (Language/Video) Model (verifiable) No
AgentPRM PPO/DPO + PRM Single Process Multi ALFWorld/General Model (PRM) Yes
Agentic-Reward-Modeling DPO/Best-of-N Single Outcome Single General Instruction Model (Reward Agent) Yes (Verification)
AgentRM MCTS/RM-guided Single Outcome Multi 9 Agent Tasks Model (regression PRM) Yes
AgentProg Online RL w/ progress reward Single Process Multi GUI Agent Training Model (ProgRM) Yes
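
Every technical-details table in this list has an Outcome/Process Reward column. The small sketch below shows the difference. The helper names and the per-step PRM scores are made up for illustration. An outcome reward scores only the end of a trajectory. A process reward, such as the PRMs benchmarked in ToolPRMBench or trained in AgentPRM, scores each intermediate step. Converting both into per-step returns shows that a process reward gives earlier turns their own credit signal, rather than a discounted copy of the final score.

```python
from typing import List


def outcome_rewards(num_steps: int, final_reward: float) -> List[float]:
    """Outcome supervision: only the last step of the trajectory is scored."""
    if num_steps < 1:
        raise ValueError("trajectory must have at least one step")
    rewards = [0.0] * num_steps
    rewards[-1] = final_reward
    return rewards


def discounted_returns(step_rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        returns[t] = running
    return returns


# A 3-turn trajectory that ends in success.
outcome = outcome_rewards(3, 1.0)
# Hypothetical per-step PRM scores for the same trajectory.
process = [0.2, -0.1, 1.0]
print(discounted_returns(outcome, gamma=0.9))
print(discounted_returns(process, gamma=0.9))
```

With the outcome reward, the first two turns only see the discounted final score. With the process reward, the second turn is penalized for its own step before the final success is added back.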

🛡️ Safety

Github Repo 🌟 Stars Date Org Paper Link RL Framework
SafeSearch Stars 2025.11 Amazon Science Paper veRL
curiosity_redteam Stars 2024.2 MIT Paper Custom
RLbreaker Stars 2024.6 Purdue Paper Custom
xJailbreak Stars 2025.1 Academic Paper Custom
Auto-RT Stars 2025.1 ICIP-CAS Paper Custom
ToolSafe Stars 2026.1 Academic (MurrayTom) Paper veRL
TROJail Stars 2025.12 Academic (ACL 2026) Paper RAGEN + vLLM
Jailbreak-R1 Stars 2025.6 Academic (yuki-younai) Paper Custom
GuardReasoner-VL Stars 2025.5 NUS (yueliu1999) Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
SafeSearch PPO (GAE/GRPO) Single Both Multi Safe QA/Search Rule + Model Search
curiosity_redteam RL + Curiosity Single Outcome Multi Red Teaming Model Yes (iterative query)
RLbreaker Custom PPO Single Outcome Multi Jailbreaking Model Yes (mutator selection)
xJailbreak RL Single Outcome Multi Jailbreaking Model (embedding) Yes (iterative)
Auto-RT PPO Single Outcome Multi Red Teaming Model Yes (strategy exploration)
ToolSafe Multi-task GRPO Single Process Multi Tool-Invocation Safety Guardrail Rule + Model Yes (tool monitoring)
TROJail Multi-turn GRPO variant Single Both Multi Multi-turn Jailbreak Attack Model (harmfulness judge) + Rule Yes (target LLM)
Jailbreak-R1 GRPO (3-stage: imitation→warm-up→progressive) Single Both Multi Red-teaming Prompt Generation Model (judge) Yes (target LLM)
GuardReasoner-VL Online RL w/ rejection sampling Single Both Multi VLM Safety Guard (multimodal) Rule + Model No
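
SafeSearch is listed with "PPO (GAE/GRPO)". This presumably means PPO-style training with either a GAE or a GRPO advantage estimator, since veRL, which it builds on, offers both. For contrast with the critic-free GRPO normalization, the minimal sketch below implements Generalized Advantage Estimation for one finished episode. It assumes the critic values are already computed, and the function name `gae` is ours. Each step's TD error δ_t = r_t + γV(s_{t+1}) - V(s_t) is accumulated backwards with weight γλ.

```python
from typing import List


def gae(rewards: List[float], values: List[float],
        gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for one finished episode.

    values[t] is the critic's estimate V(s_t). The state after the last step
    is terminal, so its value is taken as 0.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = [0.0] * len(rewards)
    next_value, running = 0.0, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running              # exponentially weighted sum
        advantages[t] = running
        next_value = values[t]
    return advantages


print(gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0))  # → [0.5, 0.5, 0.5]
```

Setting `lam=1.0` recovers the Monte Carlo return minus the critic baseline. Setting `lam=0.0` uses only the one-step TD error. Values in between trade variance for bias.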

👁️ VLM Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
multimodal-search-r1 Stars 2025.6 ByteDance/NTU Paper Custom
DeepEyesV2 Stars 2025.11 Xiaohongshu Paper Custom
VDeepEyes Stars 2025.5 Xiaohongshu/XJTU Paper veRL
CoSo Stars 2025.5 NTU/Alibaba Paper Custom
RL4VLM Stars 2024.5 UC Berkeley Paper Custom
VSC-RL Stars 2025.2 Liverpool/Huawei/Tianjin/UCL Paper Custom
AlphaDrive Stars 2025.3 HUST/Horizon Robotics Paper Custom
Mini-o3 Stars 2025.9 Mini-o3 team Paper veRL
VisionThink Stars 2025.7 CUHK (dvlab-research) Paper veRL + EasyR1
AutoVLA Stars 2025.6 UCLA Mobility Lab Paper Custom
Pixel-Reasoner Stars 2025.5 University of Waterloo (TIGER-AI-Lab) Paper OpenRLHF
Visual-ARFT Stars 2025.5 Shanghai AI Lab / SJTU Paper Custom
VTool-R1 Stars 2025.5 UIUC Paper veRL + EasyR1
OpenThinkIMG Stars 2025.5 Academic (zhaochen0110) Paper OpenR1
Chain-of-Focus Stars 2025.5 Multi-institution Paper veRL
GRIT Stars 2025.5 UC Santa Cruz (eric-ai-lab) Paper trl
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
multimodal-search-r1 GRPO Single Outcome Multi Multimodal Search Rule Yes (Search)
DeepEyesV2 Outcome RL Single Outcome Multi Multimodal Reasoning Rule Yes (Code exec, Web search)
VDeepEyes PPO/GRPO Multi Process Multi VQA All Yes
CoSo Soft RL (counterfactual) Single Outcome Multi Android/Card/Embodied Rule Yes
RL4VLM PPO Single Outcome Multi GymCards/ALFWorld Rule Yes
VSC-RL Variational RL Single Outcome Multi Mobile Device Control Rule Yes
AlphaDrive GRPO Single Outcome Multi Autonomous Driving Rule (4 planning rewards) No
Mini-o3 GRPO Single Outcome Multi Visual Search (V*/HR-Bench) Rule Yes (image crop)
VisionThink GRPO w/ LLM-as-Judge Single Outcome Multi Efficient VQA Model (LLM-Judge) Yes (hi-res request)
AutoVLA GRPO (RFT after SFT) Single Outcome Multi Autonomous Driving (nuScenes/nuPlan/Waymo) Rule (PDMS) No
Pixel-Reasoner Curiosity-driven GRPO Single Both Multi Visual Reasoning (V*/TallyQA/Info-VQA) Rule + Model Yes (zoom/select-frame)
Visual-ARFT GRPO (agentic RFT) Single Outcome Multi Multimodal Agentic Tool Use (MAT-Search/Coding) Rule Yes (Search/Python)
VTool-R1 RFT (GRPO-based) Single Outcome Multi Chart/Table VQA Rule Yes (Python visual tools)
OpenThinkIMG V-ToolRL (GRPO) Single Outcome Multi Chart Reasoning Rule Yes (GroundingDINO/SAM/OCR/crop)
Chain-of-Focus AGAR (GRPO) Single Outcome Multi Visual Reasoning (V*) Rule (outcome+format) Yes (zoom-in)
GRIT GRPO-GR (Grounded Reasoning) Single Outcome Single Visual Reasoning (bbox) Rule Yes (bbox)
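
Several VLM rows list a rule-based reward, and Chain-of-Focus spells it out as "outcome+format". A common pattern for this kind of reward is to check the response structure with a regular expression and then compare the extracted answer against the ground truth. In the sketch below, the `<think>`/`<answer>` tag template, the exact-match scoring, and the 0.1 format weight are all hypothetical choices. None of them is taken from any listed repo.

```python
import re

# Hypothetical response template; real projects define their own tags.
FORMAT_RE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)


def rule_reward(response: str, ground_truth: str, format_weight: float = 0.1) -> float:
    """Rule-based reward = outcome term (exact match) + weighted format term."""
    match = FORMAT_RE.match(response.strip())
    if match is None:
        return 0.0  # malformed output earns nothing, not even the format bonus
    format_score = 1.0
    answer = match.group(1).strip().lower()
    outcome_score = 1.0 if answer == ground_truth.strip().lower() else 0.0
    return outcome_score + format_weight * format_score


print(rule_reward("<think>zoom in on the sign</think><answer>Red</answer>", "red"))
```

Keeping the format weight small relative to the outcome term is one way to stop the policy from learning to emit well-formed but wrong answers. The actual weighting varies by project.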

🔄 Self-Evolution

⚠️ Note: The definition of "Self-Evolution" in the context of RL for LLM agents is not yet well established. This category currently collects works whose paper titles explicitly contain "self-evolving" or "self-evolution" and in which the agent improves itself through RL-driven feedback loops.

Github Repo 🌟 Stars Date Org Paper Link RL Framework
AgentEvolver Stars 2025.11 Alibaba/Tongyi Lab Paper Custom
SEAgent Stars 2025.8 Shanghai AI Lab / CUHK Paper Custom
MemSkill Stars 2026.2 NTU/UIUC/UIC/Tsinghua Paper Custom
MemRL Stars 2026.1 SJTU/Xidian/NUS/USTC/MemTensor Paper Custom
RAGEN Stars 2025.1 RAGEN-AI Paper veRL
WebRL Stars 2024.11 Tsinghua/Zhipu AI Paper Custom
EvolveR Stars 2025.10 KnowledgeXLab / Shanghai AI Lab Paper veRL
R-Zero Stars 2025.8 Tencent AI Seattle Lab / WashU / UMD Paper EasyR1
Absolute-Zero-Reasoner Stars 2025.5 Tsinghua (LeapLabTHU) / BIGAI / PSU Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
AgentEvolver ADCA-GRPO Single Outcome Multi Social Game/Tool-use Rule Yes
SEAgent GRPO Single Outcome Multi Computer Use (OSWorld) Model Yes (Screenshot-based)
MemSkill PPO Single Process Multi QA/ALFWorld Model (learned skills) Yes
MemRL RL-based (Q-value) Single Process Multi HLE/BigCodeBench/ALFWorld Model (retrieval) Yes
RAGEN PPO/GRPO (StarPO) Single Both Multi TextGame All Yes
WebRL Actor-Critic RL + ORM Single Outcome Multi Web Navigation (WebArena) Model (ORM) Yes (Web browsing)
EvolveR GRPO (closed-loop online+offline) Single Outcome Multi Multi-hop QA (NQ/HotpotQA) Rule Yes (experience retrieval)
R-Zero GRPO (Challenger + Solver co-evolution) Multi Outcome Multi Math/SuperGPQA/MMLU-Pro/BBEH Rule (majority voting) No
Absolute-Zero-Reasoner TRR++ (Task-Relative REINFORCE++) Single Outcome Single Code/Math Reasoning (HumanEval/MBPP/LiveCodeBench) Rule + learnability Yes (Python exec)
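
R-Zero's Reward Type is "Rule (majority voting)". When no ground-truth label exists, one generic way to turn majority voting into a rule-based reward is to take the most frequent answer among several sampled responses as a pseudo-label and reward the samples that agree with it. The sketch below shows only that generic idea, using a hypothetical function name. It is not R-Zero's code, and R-Zero's Challenger/Solver co-evolution adds more on top.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote_rewards(answers: List[str]) -> Tuple[str, List[float]]:
    """Use the most common sampled answer as a pseudo-label and reward agreement."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


label, rewards = majority_vote_rewards(["42", "41", "42", "42", "7"])
print(label, rewards)  # → 42 [1.0, 0.0, 1.0, 1.0, 0.0]
```

The obvious risk is that a confidently wrong majority reinforces itself. This is one reason self-evolving setups usually pair such rewards with some mechanism for generating or filtering tasks.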

⛰️ Environment

Github Repo 🌟 Stars Date Org Task
OpenSandbox Stars 2026.3 Alibaba Code/GUI/Agent Eval
OpenEnv Stars 2026.3 Meta (PyTorch) Chess/Arcade/Finance
NeMo-Gym Stars 2026.1 NVIDIA Multi-step/Multi-turn
open-trajectory-gym Stars 2026.3 Individual CTF/Security
R2E-Gym Stars 2025.4 UC Berkeley/ANU SWE
LoCoBench-Agent 2025.11 Salesforce AI Research SWE
Simia-Agent-Training 2025.10 Microsoft ToolUse/API
PaperArena Stars 2025.9 University of Science and Technology of China ScientificLiteratureQA
enterprise-deep-research 2025.9 Salesforce AI Research DeepResearch
meta-agents-research-environments Stars 2025.9 Meta (FAIR) Gaia2 / Multi-universe
BrowseComp-Plus Stars 2025.8 University of Waterloo Deep Research Eval
MCP-Bench Stars 2025.8 Accenture MCP Tool-use (28 servers)
MCPVerse Stars 2025.8 Individual MCP Tools (550+)
CompassVerifier Stars 2025.7 Shanghai AI Lab Reasoning
tau2-bench Stars 2025.6 Sierra Research Tool-Agent-User
MCP-Universe Stars 2025.5 Salesforce AI Research MCP Tool-use
SWE-smith Stars 2025.4 Princeton/Stanford/SWE-bench SWE
SWE-Gym Stars 2024.12 UC Berkeley/UIUC/CMU/Apple SWE
Mind2Web-2 Stars 2025.6 Ohio State University Web
gem Stars 2025.5 Sea AI Lab Math/Code/Game/QA
MLE-Dojo Stars 2025.5 GIT, Stanford MLE
atropos Stars 2025.4 Nous Research Game/Code/Tool
InternBootcamp Stars 2025.4 InternBootcamp Coding/QA/Game
loong Stars 2025.3 CAMEL-AI.org RLVR
DataSciBench Stars 2025.2 Tsinghua Data Analysis
reasoning-gym Stars 2025.1 open-thought Math/Game
llmgym Stars 2025.1 tensorzero TextGame/Tool
debug-gym Stars 2024.11 Microsoft Research Debugging/Game/Code
gym-llm Stars 2024.8 Rodrigo Sánchez Molina Control/Game
AgentGym Stars 2024.6 Fudan Web/Game
tau-bench Stars 2024.6 Sierra Tool
appworld Stars 2024.6 Stony Brook University Phone Use
android_world Stars 2024.5 Google Research Phone Use
TheAgentCompany Stars 2024.3 CMU, Duke Coding
LlamaGym Stars 2024.3 Rohan Pandey Game
visualwebarena Stars 2024.1 CMU Web
LMRL-Gym Stars 2023.12 UC Berkeley Game
OSWorld Stars 2023.10 HKU, CMU, Salesforce, Waterloo Computer Use
webarena Stars 2023.7 CMU Web
AgentBench Stars 2023.7 Tsinghua University Game/Web/QA/Tool
WebShop Stars 2022.7 Princeton-NLP Web
ScienceWorld Stars 2022.3 AllenAI TextGame/ScienceQA
alfworld Stars 2020.10 Microsoft, CMU, UW Embodied
factorio-learning-environment Stars 2021.6 JackHopkins Game
jericho Stars 2018.10 Microsoft, GIT TextGame
TextWorld Stars 2018.6 Microsoft Research TextGame
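
Many of these environments follow, or can be wrapped in, a gym-style reset/step loop, which is what makes them usable for multi-turn agent RL. The sketch below is a toy illustration only. `GuessNumberEnv`, its `(observation, reward, done)` return tuple, and the scripted `binary_search_policy` standing in for an LLM are all hypothetical, and none of them follows any listed library's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GuessNumberEnv:
    """Hypothetical toy text environment with a gym-style reset/step interface."""
    target: int = 7
    max_turns: int = 5
    turns: int = field(default=0, init=False)

    def reset(self) -> str:
        self.turns = 0
        return "Guess an integer between 1 and 10."

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.turns += 1
        out_of_turns = self.turns >= self.max_turns
        try:
            guess = int(action.strip())
        except ValueError:
            return "Please answer with an integer.", 0.0, out_of_turns
        if guess == self.target:
            return "Correct!", 1.0, True  # outcome reward only on success
        hint = "higher" if guess < self.target else "lower"
        return f"Try {hint}.", 0.0, out_of_turns


def rollout(env: GuessNumberEnv,
            policy: Callable[[List[str]], str]) -> Tuple[List[str], float]:
    """Multi-turn loop: the policy sees the full transcript at every turn."""
    transcript = [env.reset()]
    total, done = 0.0, False
    while not done:
        action = policy(transcript)
        obs, reward, done = env.step(action)
        transcript += [action, obs]
        total += reward
    return transcript, total


def binary_search_policy(transcript: List[str]) -> str:
    """Scripted stand-in for an LLM: binary search using the hints so far."""
    lo, hi = 1, 10
    for guess, feedback in zip(transcript[1::2], transcript[2::2]):
        if feedback == "Try higher.":
            lo = int(guess) + 1
        elif feedback == "Try lower.":
            hi = int(guess) - 1
    return str((lo + hi) // 2)


transcript, total = rollout(GuessNumberEnv(), binary_search_policy)
print(total, transcript[1::2])  # → 1.0 ['5', '8', '6', '7']
```

Because the reward arrives only when the episode ends, this toy environment gives an outcome reward. A process-reward variant would also score the intermediate guesses.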

Under Review/Waiting for Open Source

Citation

If you find this repository useful, please consider citing it:

@misc{agentsMeetRL,
  title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
  author={{AgentsMeetRL Contributors}},
  year={2025},
  url={https://github.com/thinkwee/agentsMeetRL}
}

Made with ❤️ by the AgentsMeetRL community