Motivation
The current sandbox pattern is sandbox as tool: the agent decides when to execute code in the sandbox. This addresses the case where the user asks the agent to compute something.
A complementary pattern is sandbox as wrapper: the sandbox sits in front of the agent and validates everything the agent tries to do. This addresses the case where the agent itself is untrusted (prompt injection, multi-agent delegation, arbitrary tool calling).
Threat models addressed
- Agent generates and runs arbitrary code as part of reasoning — wrapper validates before execution
- Agent calls external APIs — wrapper blocks exfiltration, enforces URL allowlists
- Multi-agent delegation — agent A delegates to agent B, wrapper sandboxes B's capabilities transparently
- Prompt injection — a document tricks the agent into malicious tool calls, wrapper catches them
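The URL allowlist mentioned for external API calls can be illustrated with a minimal sketch. The function name `is_url_allowed` and the hosts in `ALLOWED_HOSTS` are hypothetical and not part of any existing code; the sketch assumes the wrapper can see the target URL of an outbound call and that only HTTPS destinations are permitted.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from policy config.
ALLOWED_HOSTS = frozenset({"api.example.com", "internal.example.org"})


def is_url_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose exact hostname is allowlisted."""
    parts = urlsplit(url)
    # Assumption: plain HTTP and other schemes are rejected outright.
    if parts.scheme != "https":
        return False
    # Compare the parsed hostname, not a substring of the URL, so that
    # look-alike hosts or userinfo tricks do not pass the check.
    host = (parts.hostname or "").lower()
    return host in allowed_hosts
```

Matching on the parsed hostname rather than on a prefix of the raw string is what stops URLs such as `https://api.example.com.evil.net/` from slipping through.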
Proposed architecture
User → Gateway → Sandbox Wrapper → Agent → LLM
                 (validates)     → Agent → tool call (inspected)
                 (validates)     ← Agent ← response (inspected)
The wrapper is a reverse proxy that understands the OpenAI tool-calling protocol (SSE streaming). It intercepts tool_calls chunks, validates arguments against a configurable policy, and either forwards or blocks. It also inspects final responses for data exfiltration patterns.
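Because streamed tool calls arrive as fragments, the wrapper would likely need to buffer them before it can validate anything. The following is a rough sketch of that buffering step, assuming the OpenAI chat-completions streaming shape (`choices[].delta.tool_calls[]` entries keyed by `index`, with `function.arguments` split across chunks). The function name `collect_tool_calls` is hypothetical.

```python
import json


def collect_tool_calls(sse_lines):
    """Reassemble complete tool calls from an iterable of SSE text lines."""
    calls = {}  # tool-call index -> partially assembled call
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    tc["index"], {"id": None, "name": "", "arguments": ""}
                )
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function") or {}
                if fn.get("name"):
                    slot["name"] = fn["name"]
                # Arguments are a JSON string delivered in pieces; concatenate.
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

One consequence of this design is that tool-call chunks cannot be forwarded as they arrive: the argument string is only valid JSON once the last fragment is in, so the wrapper has to hold the call back until it has been assembled and checked.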
Design considerations
- Could be a separate mode in code-sandbox (MODE=wrapper vs MODE=executor)
- Policy configuration: YAML-based tool allowlists, argument pattern rules, blocked operations
- Should be transparent to the agent — no agent code changes required
- Relationship to BaseAgent's in-process ToolInspector: the wrapper is defense-in-depth for cases where in-process inspection could be bypassed
- Deployable as a sidecar (intercepts localhost traffic) or standalone proxy
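To make the policy bullet concrete, here is one possible shape for the policy and its evaluation. This is a sketch under stated assumptions: the `Policy` class, its field names, and the `evaluate` function are all hypothetical, and the policy is built as a plain Python object standing in for whatever the YAML file would deserialize to.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical in-memory form of the YAML policy."""
    allowed_tools: set
    # tool name -> {argument name: regex the whole value must match}
    argument_patterns: dict = field(default_factory=dict)
    # substrings that may never appear anywhere in the raw arguments
    blocked_operations: tuple = ()


def evaluate(policy, name, raw_arguments):
    """Return (allowed, reason) for one fully assembled tool call."""
    raw = raw_arguments or "{}"
    if name not in policy.allowed_tools:
        return False, f"tool {name!r} is not allowlisted"
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return False, "arguments are not valid JSON"
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    for arg, pattern in policy.argument_patterns.get(name, {}).items():
        value = args.get(arg)
        if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
            return False, f"argument {arg!r} violates its pattern rule"
    lowered = raw.lower()
    for op in policy.blocked_operations:
        if op.lower() in lowered:
            return False, f"blocked operation {op!r} found in arguments"
    return True, "ok"
```

The checks run from cheapest and strictest (tool allowlist) to broadest (substring scan), and the function fails closed: malformed or unexpected arguments are blocked rather than forwarded.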
Scope
This is a new capability, not a modification of the existing code execution sandbox. The current tool pattern continues to work as-is. The wrapper pattern is additive.
References
- Existing SecurityConfig.tool_inspection in BaseAgent (in-process, bypassable)
- DefenseClaw taxonomy (informed the original sandbox guardrail design)
- docs/sandbox-alternatives-evaluation.md for prior art analysis