# agent-rl Training Scripts

Welcome to the agent-rl repository! This repo contains training scripts for reinforcement learning with GRPO (Group Relative Policy Optimization), using the ms-swift framework (v3) together with bleeding-edge versions of Hugging Face Transformers and vLLM.

## Requirements

See the guides below for each RL framework (ms-swift or EasyR1) and model family (Qwen 2.5 VL or Qwen 2.5).

## Setup and Installation

For detailed instructions on setting up a RunPod instance with 1000 GB of storage, the SWIFT framework, and support for Qwen 2.5 VL models, see: How to RunPod with Qwen 2.5 VL Models

For instructions on setting up and running EasyR1 (a reinforcement learning framework for LLMs) on a RunPod instance with Qwen 2.5 models, see: How to Run EasyR1 with Qwen 2.5 Models on RunPod

## Usage

Refer to the training scripts in the scripts_train/ directory for various configurations:

- Full training with vLLM
- LoRA training with or without vLLM

Each script sets critical training parameters such as batch sizes, number of generations, and reward functions. Check the comments within each script for further details.
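To make the roles of "number of generations" and "reward functions" concrete, here is a minimal, framework-independent sketch of the group-relative step that gives GRPO its name: several completions are sampled for one prompt, each is scored by a reward function, and the scores are normalized within that group. This is not code from this repository or from ms-swift. The `format_reward` function, the `<answer>` tag convention, the `eps` value, and the use of the population standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the completion wraps its answer in <answer> tags."""
    has_tags = "<answer>" in completion and "</answer>" in completion
    return 1.0 if has_tags else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of generations for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is judged only relative to its siblings in the same group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt with four sampled completions (number of generations = 4).
completions = [
    "<answer>4</answer>",
    "The answer is 4",
    "<answer>5</answer>",
    "no idea",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)

for text, reward, adv in zip(completions, rewards, advantages):
    print(f"{reward:.1f}  {adv:+.3f}  {text}")
```

Completions that score above the group mean receive positive advantages and those below receive negative ones. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the number of generations per prompt matters.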

