feat: Sandbox-as-wrapper pattern — security proxy for untrusted agents

## Motivation

The current sandbox pattern is **sandbox as tool**: the agent decides when to execute code in the sandbox. This addresses the case where the user asks the agent to compute something.

A complementary pattern is **sandbox as wrapper**: the sandbox sits in front of the agent and validates everything the agent tries to do. This addresses the case where **the agent itself is untrusted** (prompt injection, multi-agent delegation, arbitrary tool calling).

## Threat models addressed

- Agent generates and runs arbitrary code as part of reasoning: the wrapper validates the code before execution
- Agent calls external APIs: the wrapper blocks exfiltration and enforces URL allowlists
- Multi-agent delegation: when agent A delegates to agent B, the wrapper transparently sandboxes B's capabilities
- Prompt injection: when a document tricks the agent into malicious tool calls, the wrapper catches them
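
As an illustration of the URL allowlist idea above, here is a minimal sketch of a host check the wrapper might apply to any URL found in a tool call. The function name `url_allowed` and the hosts in `ALLOWED_HOSTS` are hypothetical and not part of the existing code; a real policy would presumably be loaded from configuration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = frozenset({"api.internal.example", "docs.example.com"})


def url_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are rejected outright.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # Compare the parsed host, never a substring of the raw URL, so that
    # "https://docs.example.com@evil.test/" is judged by its real host.
    return host in allowed_hosts
```

Matching on the parsed hostname rather than on the raw string is the point of the sketch: substring or prefix checks are easy to defeat with userinfo (`user@host`) or query-string tricks.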

## Proposed architecture

```
User → Gateway → Sandbox Wrapper → Agent → LLM
                 (validates)      → Agent → tool call (inspected)
                 (validates)      ← Agent ← response (inspected)
```

The wrapper is a reverse proxy that understands the OpenAI tool-calling protocol, including SSE streaming. It intercepts `tool_calls` chunks, validates their arguments against a configurable policy, and either forwards or blocks each call. It also inspects final responses for data exfiltration patterns.
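
One consequence of streaming is that tool-call arguments arrive as string fragments spread over several chunks, so the wrapper would likely need to buffer a call until it is complete before validating it. The sketch below reassembles calls from `data:` lines and makes a forward/block decision. It assumes the usual chat-completions chunk shape (`choices[].delta.tool_calls[]` with `index`, `id`, and `function.name` / `function.arguments`); the helper names `collect_tool_calls` and `decide` are hypothetical.

```python
import json


def collect_tool_calls(sse_lines):
    """Reassemble complete tool calls from streamed chat-completion chunks."""
    calls = {}  # (choice index, tool-call index) -> accumulated call
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                key = (choice.get("index", 0), part["index"])
                call = calls.setdefault(key, {"id": None, "name": "", "arguments": ""})
                if part.get("id"):
                    call["id"] = part["id"]
                fn = part.get("function") or {}
                # Name and arguments are concatenated across fragments.
                call["name"] += fn.get("name") or ""
                call["arguments"] += fn.get("arguments") or ""
    return [calls[k] for k in sorted(calls)]


def decide(call, allowed_tools):
    """Return ("forward", parsed_args) or ("block", reason) for one call."""
    if call["name"] not in allowed_tools:
        return "block", f"tool {call['name']!r} not allowlisted"
    try:
        args = json.loads(call["arguments"] or "{}")
    except json.JSONDecodeError:
        return "block", "arguments are not valid JSON"
    return "forward", args
```

A real proxy would also have to decide what to send downstream when a call is blocked (for example, a synthetic error result); that choice is left open here.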

## Design considerations

- Could be a separate mode in code-sandbox (`MODE=wrapper` vs `MODE=executor`)
- Policy configuration: YAML-based tool allowlists, argument pattern rules, blocked operations
- Should be transparent to the agent — no agent code changes required
- Relationship to BaseAgent's in-process `ToolInspector`: the wrapper is defense-in-depth for cases where in-process inspection could be bypassed
- Deployable as a sidecar (intercepts localhost traffic) or standalone proxy
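
To make the policy bullet more concrete, here is a rough sketch of the in-memory structure a YAML policy file might be parsed into, covering the three rule kinds listed above (tool allowlist, argument pattern rules, blocked operations). All class and field names (`Policy`, `ToolRule`, `arg_patterns`, `blocked_patterns`) are assumptions made for illustration; the actual schema is still to be designed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToolRule:
    # Argument name -> regex that the (string) argument must fully match.
    arg_patterns: dict = field(default_factory=dict)


@dataclass
class Policy:
    # Tool name -> rule; any tool not listed here is rejected.
    allowed_tools: dict
    # Regexes that block a call if found anywhere in any string argument.
    blocked_patterns: list = field(default_factory=list)

    def check(self, tool: str, args: dict):
        """Return (allowed, reason) for a tool call with parsed arguments."""
        rule = self.allowed_tools.get(tool)
        if rule is None:
            return False, f"tool {tool!r} is not allowlisted"
        for key, pattern in rule.arg_patterns.items():
            value = args.get(key)
            if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
                return False, f"argument {key!r} violates its pattern"
        for value in args.values():
            if isinstance(value, str) and any(
                re.search(p, value) for p in self.blocked_patterns
            ):
                return False, "argument matches a blocked operation"
        return True, "ok"


# Hypothetical example policy.
policy = Policy(
    allowed_tools={"read_file": ToolRule({"path": r"/workspace/[\w./-]+"})},
    blocked_patterns=[r"\.\./"],
)
```

With this example policy, `policy.check("read_file", {"path": "/workspace/notes.txt"})` is allowed, while a path containing `../` or a call to an unlisted tool such as `shell` is rejected with a reason string that could be logged or returned to the gateway.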

## Scope

This is a new capability, not a modification of the existing code execution sandbox. The current tool pattern continues to work as-is. The wrapper pattern is additive.

## References

- Existing `SecurityConfig.tool_inspection` in BaseAgent (in-process, bypassable)
- DefenseClaw taxonomy (informed the original sandbox guardrail design)
- `docs/sandbox-alternatives-evaluation.md` for prior art analysis
