# Gym-V: A Unified Vision Environment System for Agentic Vision Research
Installation • Quick Start • Environments • Key Findings
Gym-V is a unified platform of 179 procedurally generated visual environments across 10 domains with controllable difficulty, built on a Gymnasium-compatible API. It unifies interactive training, offline supervision, and benchmark evaluation under one interface — enabling controlled experiments on vision-language agents that were previously infeasible across fragmented toolkits.
- 179 environments spanning single-turn reasoning, multi-turn games, spatial navigation, and retro arcade games
- Gymnasium-compatible API with multi-agent support, composable wrappers, and tool integration
- Controllable difficulty via parametric generation with difficulty presets (levels 0, 1, 2)
- Evaluation-as-a-Service with a distributed reward server (Ray Serve) supporting heterogeneous backends
- Composable observation wrappers that make task representation an explicit experimental variable
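The composable observation wrappers in the list above treat task representation (captions, rules, history) as something you layer onto an environment rather than bake into it. The sketch below illustrates that pattern only; the `DummyEnv`, `RulesWrapper`, and `HistoryWrapper` classes are invented for illustration, and the real `Observation`/`Wrapper` signatures in `gym_v/core.py` may differ.

```python
from dataclasses import dataclass, field

# Minimal stand-ins for gym_v's Observation/Wrapper classes; names and
# signatures here are illustrative, not the real gym_v API.
@dataclass
class Observation:
    text: str
    metadata: dict = field(default_factory=dict)

class DummyEnv:
    def reset(self, seed=None):
        return {"agent_0": Observation(text="board state")}, {}

class ObservationWrapper:
    """Base wrapper: transform each agent's observation on reset."""
    def __init__(self, env):
        self.env = env
    def reset(self, seed=None):
        obs, info = self.env.reset(seed=seed)
        return {k: self.observation(v) for k, v in obs.items()}, info
    def observation(self, obs):
        return obs

class RulesWrapper(ObservationWrapper):
    """Prepend the game rules to every textual observation."""
    def observation(self, obs):
        return Observation(text="Rules: ...\n" + obs.text, metadata=obs.metadata)

class HistoryWrapper(ObservationWrapper):
    """Attach a running interaction history via metadata."""
    def __init__(self, env):
        super().__init__(env)
        self.history = []
    def observation(self, obs):
        self.history.append(obs.text)
        return Observation(text=obs.text,
                           metadata={**obs.metadata, "history": list(self.history)})

# Wrappers compose: each layer changes one aspect of the representation.
env = HistoryWrapper(RulesWrapper(DummyEnv()))
obs, info = env.reset(seed=0)
```

Because each wrapper is independent, ablating a representation choice (e.g. dropping the rules text) is a one-line change to the wrapper stack.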
## Installation

```bash
# Basic installation
pip install -e .

# With optional environment groups
pip install -e ".[games]"     # Board/card games (TextArena, PettingZoo)
pip install -e ".[spatial]"   # 2D/3D navigation (MiniGrid, MiniWorld)
pip install -e ".[temporal]"  # Retro games (stable-retro)
pip install -e ".[vlmeval]"   # VLM evaluation benchmarks

# All optional dependencies
pip install -e ".[games,spatial,temporal,vlmeval,reasoning-gym]"
```

## Quick Start

```python
import gym_v

# Single-turn: observe an image, give an answer, receive a reward
env = gym_v.make("Arc/ArcAgi-v0")
obs, info = env.reset(seed=42)
# obs = {"agent_0": Observation(image=PIL.Image, text="...", metadata={})}
obs, reward, terminated, truncated, info = env.step({"agent_0": "[[0,1],[1,0]]"})
env.close()

# Multi-turn: interact with the environment over multiple steps
env = gym_v.make("Games/Chess-v0")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step({"agent_0": "e2e4"})
# Continue stepping until terminated["__all__"] or truncated["__all__"]
env.close()
```

An interactive demo is also available:

```bash
python examples/demo.py --id "Games/TicTacToe-v0"
```

## Environments

```
gym_v/
├── core.py                 # Env, Observation, Wrapper base classes
├── envs/
│   ├── registration.py     # register() / make() system
│   ├── single_turn/        # 125 single-step reasoning environments
│   ├── multi_turn/         # 74 interactive environments
│   │   ├── games/          # Board, card & puzzle games
│   │   ├── spatial/        # 2D/3D navigation tasks
│   │   └── temporal/       # Retro arcade games (stable-retro)
│   ├── offline/            # Generic JSONL dataset loader
│   └── eval/               # VLMEval & GenEval integration
├── wrappers/               # Composable observation/action wrappers
├── tools/                  # Agent tool system (IPython, etc.)
└── utils/                  # Image, seeding, rendering utilities
```
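The Quick Start snippet notes that multi-turn episodes run until `terminated["__all__"]` or `truncated["__all__"]` is set. The loop below sketches that convention against a stub environment; `StubMultiTurnEnv` is invented for illustration, and only the dict-keyed `reset`/`step` signature mirrors the real API.

```python
class StubMultiTurnEnv:
    """Invented stand-in for a gym_v multi-turn environment (real ones are
    created via gym_v.make). Ends the episode after a fixed number of turns."""
    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turn = 0
    def reset(self, seed=None):
        self.turn = 0
        return {"agent_0": "initial observation"}, {}
    def step(self, actions):
        self.turn += 1
        done = self.turn >= self.max_turns
        obs = {"agent_0": f"observation after turn {self.turn}"}
        reward = {"agent_0": 1.0 if done else 0.0}
        # Per-agent flags plus the aggregate "__all__" key.
        terminated = {"agent_0": done, "__all__": done}
        truncated = {"agent_0": False, "__all__": False}
        return obs, reward, terminated, truncated, {}

env = StubMultiTurnEnv()
obs, info = env.reset(seed=0)
total = 0.0
# Step until every agent is done, as signalled by "__all__".
while True:
    obs, reward, terminated, truncated, info = env.step({"agent_0": "noop"})
    total += reward["agent_0"]
    if terminated["__all__"] or truncated["__all__"]:
        break
```

Keeping per-agent flags alongside the aggregate `"__all__"` key lets the same loop drive both single-agent tasks and multi-agent games without special cases.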
## Key Findings

Using Gym-V, our experiments reveal several insights for training vision-language agents:

- **Observation scaffolding > RL algorithm choice.** Captions, game rules, and interaction history determine whether learning succeeds at all — more so than the choice between GRPO, GSPO, or SAPO.
- **Diverse training generalizes; narrow training hurts.** Cross-domain curricula transfer broadly, while training on a single domain can cause negative transfer. Multi-turn interaction amplifies both effects.
- **RL closes the gap.** A 7B model trained with RL on Gym-V environments can surpass much larger models' zero-shot performance on several task categories.

For full results, see our paper.
This project is for research use. See LICENSE for details.