We’re entering the era of experience, where large language models (LLMs) learn not just from static datasets, but from interactive experience gathered in complex, expressive environments.
As a step toward this, we introduce GEM — a General Experience Maker for LLMs — an open-source environment suite designed for training agentic LLMs via online reinforcement learning.
Like OpenAI Gym for traditional RL, GEM provides a standardized API and a growing collection of diverse environments. It is training framework-agnostic and supports seamless integration with six popular RL training frameworks including Oat and Tinker, offering:
- 🧩 Clean, composable environment APIs
- ⚙️ Async vectorized execution for high-throughput simulation
- 🔧 Tool integration & custom wrappers
- 🧠 Multi-environment training (see the sketch after this list)
- 🎈 Ready-to-use benchmark environments and algorithms
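Once GEM is installed (see the next section), several environments can be stepped concurrently with nothing more than the core API. The thread-pool sketch below is a plain-Python approximation for illustration only, not GEM's built-in async vectorized executor; it uses only the calls shown in the full example further down (`gem.make`, `reset`, `step`, `sample_random_action`).

```python
from concurrent.futures import ThreadPoolExecutor

import gem

# Create a handful of independent environment instances.
envs = [gem.make("game:GuessTheNumber-v0") for _ in range(4)]
observations = [env.reset()[0] for env in envs]

# Step all environments concurrently; swap the random actions for a batched
# policy call (e.g., one LLM generation per observation).
with ThreadPoolExecutor(max_workers=len(envs)) as pool:
    actions = [env.sample_random_action() for env in envs]
    results = list(pool.map(lambda pair: pair[0].step(pair[1]), zip(envs, actions)))

for next_obs, reward, terminated, truncated, info in results:
    print(reward, terminated or truncated)
```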
```bash
pip install -U gem-llm
```

Or install from source for the latest version:
```bash
git clone https://github.com/axon-rl/gem.git
cd gem
pip install -e .
```

Please check Getting Started for more setup details.
🔥 You can jump straight into the examples to quickly start your agentic RL training with GEM & your favorite training framework.
GEM's interface closely follows OpenAI Gym's API. Here's an example using the `game:GuessTheNumber-v0` environment:
```python
import gem

# List all supported environments
gem.print_envs()

# Initialize the environment
env = gem.make("game:GuessTheNumber-v0")

# Reset the environment to generate the first observation
observation, info = env.reset()

# Start the agent-environment loop
while True:
    action = env.sample_random_action()  # insert policy here, e.g.,
    # (pseudocode) action = llm.generate(observation)

    # Apply the action and receive the next observation, the reward,
    # and whether the episode has ended
    next_observation, reward, terminated, truncated, info = env.step(action)
    print("OBS", observation)
    print("ACT", action)

    # Update the policy (online) here, e.g.,
    # (pseudocode) policy = learn(policy, observation, action, reward, info)
    observation = next_observation

    # Exit when the episode terminates
    if terminated or truncated:
        break
```

- Environments consist of tasks and (optional) tools. Tool-calling is achieved via an environment wrapper, as demonstrated here.
- GEM is training framework-agnostic, and we demonstrate its integration with six popular RL training frameworks.
- We provide implementations and benchmarking results for different algorithms across a diverse set of environments.
| Category | Example Environments | Description |
|---|---|---|
| Games | `game:GuessTheNumber-v0-hard`, `game:Sudoku-v0-easy` | Classic language games |
| Math | `math:Math12K`, `math:DeepScaleR40K` | Mathematical reasoning |
| Code | `code:CodeContest`, `code:Taco8k` | Competitive coding |
| QA | `qa:NaturalQuestions`, `qa:HotpotQA` | Knowledge-intensive question answering |
| ReasoningGym | `rg:arc_1d`, `rg:letter_counting` | Diverse synthetic reasoning tasks |
| Tool | Description |
|---|---|
| Python | Python code executor that parses code blocks, executes them, and returns outputs |
| Search | Calls a search engine to retrieve documents for any query |
| MCP | Calls the general MCP API to train tool-use agents |
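As a rough illustration of the parse-execute-return pattern described for the Python tool above, here is a standalone sketch. It is not GEM's actual tool or wrapper code; the function name and structure are made up for the example.

```python
import re
import subprocess

FENCE = "`" * 3  # fenced-code delimiter
PATTERN = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def run_python_blocks(action_text: str, timeout: float = 5.0) -> str:
    """Extract fenced Python blocks from a model response, execute each one in a
    subprocess, and return the captured stdout/stderr as the tool observation."""
    outputs = []
    for code in PATTERN.findall(action_text):
        proc = subprocess.run(
            ["python", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        outputs.append(proc.stdout + proc.stderr)
    return "\n".join(outputs)
```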
| Framework | Description |
|---|---|
| Oat | vLLM + DeepSpeed, modular, no ray |
| Tinker | SDK provided by Thinking Machines, frees you from infra issues |
| Verl | Supports diverse backends, models, and algorithms |
| RL2 | SGLang + FSDP, no ray, easy to hack |
| ROLL | Supports diverse backends, models, and algorithms |
| OpenRLHF | Supports diverse backends, models, and algorithms |
Examples of training agents on GEM environments with all of the above frameworks can be found here!
| Algorithm | Description |
|---|---|
| REINFORCE | A general policy gradient algorithm that can be applied to single- and multi-turn environments |
| GRPO | Mainly for bandits (single-turn), using group advantage normalization |
| PPO | Learns a turn-level critic to compute generalized advantage estimation (GAE) |
| REINFORCE + ReBN | REINFORCE with return batch normalization as introduced in our paper |
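To make the group normalization used by GRPO and the return batch normalization used by REINFORCE + ReBN concrete, here is a small NumPy sketch of the two schemes: GRPO normalizes returns within each group of responses to the same prompt, while ReBN normalizes returns across the whole batch. This is a simplified illustration under those assumptions, not the exact estimators from the paper.

```python
import numpy as np

def grpo_advantages(returns: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group advantage normalization: `returns` has shape (num_prompts, group_size),
    one row per prompt, one column per sampled response."""
    mean = returns.mean(axis=1, keepdims=True)
    std = returns.std(axis=1, keepdims=True)
    return (returns - mean) / (std + eps)

def rebn_advantages(returns: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Return batch normalization: normalize returns across the entire batch."""
    return (returns - returns.mean()) / (returns.std() + eps)
```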
Please check out our paper for a more detailed description of each algorithm and empirical results showing their tradeoffs.
We welcome all forms of contribution, from adding new environments to integrating additional training frameworks. We're planning to write a community-driven technical report, and major contributors will be recognized with authorship. Join our Discord to discuss more!
- This work is supported by Sea AI Lab with computing resources.
- Our code learns from and builds on several awesome projects such as gym, rllm, TextArena, Search-R1, ReasoningGym.
- The training example code is built on Oat, Tinker, Verl, RL2, ROLL, OpenRLHF.
If you find our work useful for your research, please consider citing:
- GEM paper (please prioritize citing the paper unless you believe the blog is a better fit):

  ```bibtex
  @article{liu2025gem,
    title={GEM: A Gym for Agentic LLMs},
    author={Liu, Zichen and Sims, Anya and Duan, Keyu and Chen, Changyu and Yu, Simon and Zhou, Xiangxin and Xu, Haotian and Xiong, Shaopan and Liu, Bo and Tan, Chenmien and others},
    journal={arXiv preprint arXiv:2510.01051},
    year={2025}
  }
  ```

- GEM blog:

  ```bibtex
  @misc{liu2025gemblog,
    title={GEM: A Gym for Generalist LLMs},
    author={Liu, Zichen and Sims, Anya and Duan, Keyu and Chen, Changyu and Yang, Diyi and Lee, Wee Sun and Lin, Min},
    year={2025},
    howpublished={\url{https://axon-rl.notion.site/gem}},
    note={Notion Blog},
  }
  ```