Hi team, thanks for the awesome benchmark! Really enjoying working with it.
I had a few questions about how mock services are set up — just want to make sure I'm understanding the design correctly:
- Mock service source code visible to agents: I noticed that mock_services/server.py ends up in the agent's workspace after setup. We've seen some models read the source and try to start the services themselves, which causes port conflicts. Was this intentional, or would it make sense to hide the implementation from agents?
- Missing tmp/ in task_4 and task_6: in category3, the warmup scripts for these two tasks reference /tmp_workspace/tmp/messages.json, but there's no tmp/ directory in the workspace. Is this intentional or an oversight?
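
In case it helps with reproducing the second point, here is a minimal sketch of how the missing directory could be checked. The helper name `missing_parent_dirs` is hypothetical and not part of the benchmark. The path is the one quoted above, and I'm assuming the check runs inside the workspace after setup.

```python
from pathlib import Path


def missing_parent_dirs(paths):
    """Return the referenced paths whose parent directory does not exist."""
    return [p for p in paths if not Path(p).parent.is_dir()]


if __name__ == "__main__":
    # Path taken from the category3 warmup scripts for task_4 and task_6.
    referenced = ["/tmp_workspace/tmp/messages.json"]
    for path in missing_parent_dirs(referenced):
        print(f"missing directory: {Path(path).parent}")
```

If the directory is meant to exist, the script should print nothing once setup has finished.
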
Appreciate any clarification. Thanks again for the great work!