Welcome to the agent-rl repository! This repo contains training scripts for reinforcement learning with GRPO (Group Relative Policy Optimization) using the ms-swift framework (v3), along with bleeding-edge versions of Hugging Face Transformers and vLLM.
## Requirements

See the guides below for the supported RL frameworks (ms-swift or EasyR1) and model families (Qwen 2.5 VL or Qwen 2.5).
For detailed instructions on setting up a RunPod instance with 1000 GB of storage, the SWIFT framework, and support for Qwen 2.5 VL models, please refer to: How to RunPod with Qwen 2.5 VL Models
For instructions on setting up and running EasyR1 (a reinforcement learning framework for LLMs) on a RunPod instance with Qwen 2.5 models, please refer to: How to Run EasyR1 with Qwen 2.5 Models on RunPod
Refer to the training scripts in the scripts_train/ directory for various configurations:
- Full Training with vLLM
- LoRA Training with/without vLLM
Each script sets critical training parameters such as batch sizes, number of generations, and reward functions. Check the comments within each script for further details.
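
To make the role of the number of generations and the reward functions more concrete, here is a minimal sketch of the group-relative advantage step that gives GRPO its name. It does not use ms-swift or EasyR1 and is not taken from the scripts in this repo. The names `format_reward` and `group_relative_advantages`, the `<answer>` tag check, the `eps` value, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def format_reward(completion: str) -> float:
    """Hypothetical toy reward: 1.0 if the completion contains an <answer> tag, else 0.0."""
    return 1.0 if "<answer>" in completion else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each completion is scored relative to its group: subtract the group mean
    and divide by the group standard deviation (plus eps to avoid division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (i.e. number of generations = 4).
completions = [
    "<answer>42</answer>",
    "I am not sure.",
    "<answer>7</answer>",
    "No tags here.",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"{reward:.1f}  {advantage:+.3f}  {completion}")
```

Completions that score above their group's mean get a positive advantage and the rest get a negative one. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the number of generations and the choice of reward functions matter.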
