Open harness for running, measuring, and visualizing agent benchmarks. Adapters for AutomationBench, τ-bench, LeRobot, WorkArena.
Updated May 3, 2026 - TypeScript
Snapshot your AI agent's world to OTel span events. Scrub the timeline, diff between runs.
Typed asset shapes + visual + headless views for AI agents. One asset definition. Three rendering targets (HTML / Markdown / Text).