feat: Sandbox-as-wrapper pattern — security proxy for untrusted agents #10

@rdwj

Motivation

The current sandbox pattern is sandbox as tool: the agent decides when to execute code in the sandbox. This addresses the case where the user asks the agent to compute something.

A complementary pattern is sandbox as wrapper: the sandbox sits in front of the agent and validates everything the agent tries to do. This addresses the case where the agent itself is untrusted (prompt injection, multi-agent delegation, arbitrary tool calling).

Threat models addressed

  • Agent generates and runs arbitrary code as part of reasoning — wrapper validates before execution
  • Agent calls external APIs — wrapper blocks exfiltration, enforces URL allowlists
  • Multi-agent delegation — agent A delegates to agent B, wrapper sandboxes B's capabilities transparently
  • Prompt injection — a document tricks the agent into malicious tool calls, wrapper catches them
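As an illustration of the URL allowlist item above, here is a minimal sketch of how a wrapper might decide whether an outbound URL is permitted. The function name `is_url_allowed` and the hosts in `ALLOWED_HOSTS` are hypothetical and not part of code-sandbox; a real wrapper would load them from its policy.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; real hosts would come from the wrapper's policy.
ALLOWED_HOSTS = {"api.example.com", "internal.example.org"}


def is_url_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # Compare the parsed hostname, not a substring of the raw URL, so that
    # "api.example.com.evil.net" or an allowed host hidden in a query
    # string does not slip through.
    host = (parsed.hostname or "").lower()
    return host in allowed_hosts
```

Matching on the exact parsed hostname is a deliberately strict choice for the sketch; wildcard subdomains or path-level rules would need additional policy fields.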

Proposed architecture

User → Gateway → Sandbox Wrapper → Agent → LLM
                 (validates)      → Agent → tool call (inspected)
                 (validates)      ← Agent ← response (inspected)

The wrapper is a reverse proxy that understands the OpenAI tool-calling protocol (SSE streaming). It intercepts tool_calls chunks, validates their arguments against a configurable policy, and either forwards or blocks the call. It also inspects final responses for data exfiltration patterns.
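The interception step could look roughly like the sketch below. It assumes OpenAI-style chat completion chunks, where each `data:` line carries JSON whose `choices[].delta.tool_calls` entries stream the function name and argument string in fragments keyed by `index`, and it assumes a single choice per response. The names `collect_tool_calls`, `gate`, and the `validate` callback are hypothetical, not an existing code-sandbox API.

```python
import json
from typing import Callable, Iterable


def collect_tool_calls(sse_lines: Iterable[str]) -> list[dict]:
    """Reassemble streamed tool calls from OpenAI-style SSE chunks."""
    calls: dict[int, dict] = {}
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"], {"id": None, "name": "", "arguments": ""}
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                # Name and arguments arrive as string fragments; concatenate.
                call["name"] += fn.get("name") or ""
                call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def gate(
    sse_lines: Iterable[str],
    validate: Callable[[str, dict], bool],
) -> tuple[bool, list[dict]]:
    """Buffer the stream; allow it only if every tool call passes validate."""
    calls = collect_tool_calls(sse_lines)
    for call in calls:
        try:
            args = json.loads(call["arguments"] or "{}")
        except json.JSONDecodeError:
            return False, calls  # malformed arguments are blocked
        if not validate(call["name"], args):
            return False, calls
    return True, calls
```

Buffering the tool-call fragments until the arguments are complete is what makes validation possible, since a partial JSON argument string cannot be checked reliably. The cost is added latency on tool-call turns; plain text deltas could still be forwarded as they arrive.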

Design considerations

  • Could be a separate mode in code-sandbox (MODE=wrapper vs MODE=executor)
  • Policy configuration: YAML-based tool allowlists, argument pattern rules, blocked operations
  • Should be transparent to the agent — no agent code changes required
  • Relationship to BaseAgent's in-process ToolInspector: the wrapper is defense-in-depth for cases where in-process inspection could be bypassed
  • Deployable as a sidecar (intercepts localhost traffic) or standalone proxy
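To make the policy bullet concrete, the following sketch shows one possible in-memory shape for a policy once the YAML has been loaded, together with a check against it. The schema (`allowed_tools`, `argument_patterns`, `blocked_operations`) and the `evaluate` function are assumptions made for illustration; the issue does not define field names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Hypothetical schema mirroring the three policy areas named above.
    allowed_tools: set[str] = field(default_factory=set)
    # tool name -> {argument name -> regex the value must fully match}
    argument_patterns: dict[str, dict[str, str]] = field(default_factory=dict)
    # substrings that must not appear in any string argument
    blocked_operations: list[str] = field(default_factory=list)


def evaluate(policy: Policy, tool: str, args: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a single tool call."""
    if tool not in policy.allowed_tools:
        return False, f"tool '{tool}' is not on the allowlist"
    for name, pattern in policy.argument_patterns.get(tool, {}).items():
        value = args.get(name)
        if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
            return False, f"argument '{name}' does not match policy"
    for value in args.values():
        if isinstance(value, str):
            for op in policy.blocked_operations:
                if op in value:
                    return False, f"blocked operation '{op}'"
    return True, "ok"


# Example policy: only read_file is allowed, and only under /workspace.
policy = Policy(
    allowed_tools={"read_file"},
    argument_patterns={"read_file": {"path": r"/workspace/[\w./-]+"}},
    blocked_operations=["..", "rm -rf"],
)
```

The sketch is default-deny: a tool that is not listed is rejected before any argument rules run, and the blocked-operation scan acts as a backstop for values that satisfy a loose pattern (here, a path containing `..`).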

Scope

This is a new capability, not a modification of the existing code execution sandbox. The current tool pattern continues to work as-is. The wrapper pattern is additive.

References

  • Existing SecurityConfig.tool_inspection in BaseAgent (in-process, bypassable)
  • DefenseClaw taxonomy (informed the original sandbox guardrail design)
  • docs/sandbox-alternatives-evaluation.md for prior art analysis
