Harbor is a framework for running agent evaluations and creating and using RL environments.
Updated Apr 22, 2026 - Python
Official Implementation of "CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion"
Spoox CLI - Terminal Agent - SPlit lOOp eXand agent
Trajectories for running OpenHands on Terminal Bench
A reproducible Terminal-Bench task that evaluates a Bash script for parsing log files.
Multi-agent reasoning MCP server for Claude Code. Spawns parallel research agents to find knowledge LLMs don't have. +23.1% on Terminal Bench 2.0 SWE tasks.