This repository benchmarks models using our custom agent harness. The harness gives the agent a sandbox (a container) to work in, and the sandbox comes with several tools, such as a browser and a terminal.

We use uv to manage the project, so to add a new package, use uv commands such as uv add [package].
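
As a rough sketch of the uv workflow mentioned above, the commands below show how a dependency might be added and removed. The package names (requests, pytest) are placeholders chosen for illustration and are not known dependencies of this project; the sketch also assumes uv is installed and that you run it from the repository root.

```shell
# Assumption: run from the repository root, next to pyproject.toml.
# "requests" and "pytest" are placeholder package names, not project dependencies.

# Add a runtime dependency (recorded in pyproject.toml and uv.lock).
uv add requests

# Add a development-only dependency.
uv add --dev pytest

# Remove a dependency that is no longer needed.
uv remove requests

# Bring the local environment in line with the lockfile.
uv sync
```

Going through uv rather than installing packages by hand should keep pyproject.toml and the lockfile consistent with what is actually installed.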

Try to keep things simple while maintaining correctness.