Security posture of python_tool(): default execution tier and unenforced CapabilityPolicy fields #1232

## Problem

`python_tool()` currently defaults to `tier="local_unsafe"`, which executes model-generated Python code as a direct subprocess with no container isolation, no filesystem restrictions, no network restrictions, and no memory/CPU limits. The `CapabilityPolicy` fields that could theoretically constrain this behaviour are almost entirely unenforced at runtime (`ENFORCED_filesystem_read_roots = False`, `ENFORCED_network_access = False`, etc.), so even the `local` tier (with a policy) provides minimal additional protection over `local_unsafe`.

Three concerns arise from this:

**1. Default tier**
Any developer who calls `python_tool()` without specifying a tier is running model-generated output as an unrestricted subprocess. The name `local_unsafe` is honest, but because it is the *default*, a developer following the examples may end up with unsafe execution without ever making a deliberate trust decision.

**2. Unenforced CapabilityPolicy fields**
`CapabilityPolicy` is documented as declarative-only for most fields, but this is easy to miss. A developer who passes `CapabilityPolicy(network_access=False)` gets no actual network restriction. This may create false confidence, particularly for developers moving from `local_unsafe` to `local` under the impression they are tightening security.
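One way the gap could be surfaced at construction time is sketched below. This is a stand-in dataclass, not the project's actual `CapabilityPolicy`: the class name `PolicySketch`, the field defaults, and the use of class-level `ENFORCED_*` attributes are assumptions modelled on the flags quoted above.

```python
import warnings
from dataclasses import dataclass, fields
from typing import ClassVar, Optional, Tuple


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in for CapabilityPolicy; not the real class."""

    # Declarative fields (names and defaults are assumptions for illustration).
    network_access: bool = True
    filesystem_read_roots: Optional[Tuple[str, ...]] = None

    # Enforcement flags, mirroring the ENFORCED_* naming quoted in this issue.
    ENFORCED_network_access: ClassVar[bool] = False
    ENFORCED_filesystem_read_roots: ClassVar[bool] = False

    def __post_init__(self) -> None:
        # Warn once per field that was set to a non-default value
        # but has no runtime enforcement behind it.
        for f in fields(self):
            enforced = getattr(type(self), f"ENFORCED_{f.name}", False)
            changed = getattr(self, f.name) != f.default
            if changed and not enforced:
                warnings.warn(
                    f"{f.name}={getattr(self, f.name)!r} is declarative only "
                    "and is not enforced at runtime",
                    UserWarning,
                    stacklevel=3,  # point at the caller that built the policy
                )
```

With this shape, `PolicySketch()` stays silent, while `PolicySketch(network_access=False)` emits a `UserWarning` naming the field, so the false-confidence case is flagged where the policy is built.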

**3. Example and documentation framing**
The `python_tool_example.py` docstring describes `local_unsafe` as chosen "so no Docker daemon is required", framing the tier as a deployment convenience rather than a trust decision. The tools README recommends `code_interpreter` for untrusted code, yet its quickstart example pairs `local_code_interpreter` with an LLM.

## Open questions

- Should `python_tool()` require an **explicit `tier` argument** (no default), forcing developers to make a conscious trust decision at the call site?
- Should falling back to `local_unsafe` as the default (i.e. when `tier` is not explicitly passed) emit a `UserWarning` via `warnings.warn()` to surface the risk at construction time?
- Should `CapabilityPolicy`'s unenforced fields be more prominently called out, either with a runtime warning on construction or with a prominent `.. warning::` block in the docstring?
- Should the example docs be reframed to make the trust decision explicit, independent of any API change?
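To make the first two options concrete, here is a minimal sketch of both. The functions `python_tool_sketch` and `python_tool_strict_sketch` are hypothetical stand-ins that only return the resolved tier. They do not reflect the real `python_tool()` signature beyond the `tier` keyword and the `"local_unsafe"` value mentioned above.

```python
import warnings
from typing import Any, Dict

# Sentinel that distinguishes "tier not passed" from any real tier value.
_TIER_UNSET: Any = object()


def python_tool_sketch(*, tier: Any = _TIER_UNSET) -> Dict[str, str]:
    """Option: keep the default, but warn when the caller relies on it."""
    if tier is _TIER_UNSET:
        warnings.warn(
            "tier not specified; falling back to 'local_unsafe', which runs "
            "model-generated code as an unrestricted subprocess",
            UserWarning,
            stacklevel=2,  # attribute the warning to the call site
        )
        tier = "local_unsafe"
    return {"tier": tier}


def python_tool_strict_sketch(*, tier: str) -> Dict[str, str]:
    """Option: no default, so omitting tier raises TypeError at the call site."""
    return {"tier": tier}
```

The sentinel matters for the warning variant: passing `tier="local_unsafe"` explicitly stays silent, so only the implicit fallback is flagged. The strict variant needs no runtime check, since Python itself rejects a call that omits a required keyword-only argument.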

## Context

Identified during a security audit of the codebase at HEAD d5bef51d. Both the tier system (#1171) and `python_tool()` (#1190) landed after the v0.6.0 release, so any change to the default would not affect published users and carries no backwards-compatibility obligation.
