Problem
`python_tool()` currently defaults to `tier="local_unsafe"`, which executes model-generated Python code as a direct subprocess with no container isolation, no filesystem restrictions, no network restrictions, and no memory/CPU limits. The `CapabilityPolicy` fields that could in principle constrain this behaviour are almost entirely unenforced at runtime (`ENFORCED_filesystem_read_roots = False`, `ENFORCED_network_access = False`, etc.), so even the `local` tier (with a policy) provides minimal additional protection over `local_unsafe`.
Three concerns arise from this:
1. Default tier
Any developer who calls `python_tool()` without specifying a tier is running model-generated output as an unrestricted subprocess. The name `local_unsafe` is honest, but because it is the default, a developer reading the examples would not necessarily recognise that they are opting into unsafe execution, and so never makes a deliberate trust decision.
2. Unenforced CapabilityPolicy fields
`CapabilityPolicy` is documented as declarative-only for most fields, but this is easy to miss. A developer who passes `CapabilityPolicy(network_access=False)` gets no actual network restriction. This may create false confidence, particularly for developers moving from `local_unsafe` to `local` under the impression they are tightening security.
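One way to surface the gap described above is a construction-time warning. The following is a minimal sketch only: `CapabilityPolicySketch` is a hypothetical stand-in, not the library's class. Its two fields and the `ENFORCED_*` class attributes mirror the names quoted in this issue, but the field types, the `None` defaults, and the `unenforced_fields_set()` helper are assumptions made for illustration.

```python
import warnings
from dataclasses import dataclass, fields
from typing import ClassVar, Optional, Tuple


@dataclass
class CapabilityPolicySketch:
    """Hypothetical stand-in for CapabilityPolicy; not the library's class."""

    # Declared fields. None means "not set by the caller" (an assumption).
    network_access: Optional[bool] = None
    filesystem_read_roots: Optional[Tuple[str, ...]] = None

    # Enforcement flags, following the ENFORCED_* pattern quoted in this issue.
    ENFORCED_network_access: ClassVar[bool] = False
    ENFORCED_filesystem_read_roots: ClassVar[bool] = False

    def unenforced_fields_set(self):
        """Return names of fields the caller set that nothing enforces."""
        names = []
        for f in fields(self):  # ClassVar attributes are not dataclass fields
            value = getattr(self, f.name)
            enforced = getattr(type(self), "ENFORCED_" + f.name, False)
            if value is not None and not enforced:
                names.append(f.name)
        return names

    def __post_init__(self):
        names = self.unenforced_fields_set()
        if names:
            warnings.warn(
                "CapabilityPolicy fields declared but not enforced at runtime: "
                + ", ".join(names),
                UserWarning,
                stacklevel=3,  # skip __post_init__ and the generated __init__
            )
```

With this shape, `CapabilityPolicySketch(network_access=False)` warns at the call site that the restriction is declarative only, while constructing a policy with no fields set stays silent.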
3. Example and documentation framing
The `python_tool_example.py` docstring describes `local_unsafe` as chosen "so no Docker daemon is required", framing the tier as a deployment convenience rather than a trust decision. The tools README recommends `code_interpreter` for untrusted code, yet its quickstart example uses `local_code_interpreter` with an LLM, which contradicts that recommendation.
Open questions
- Should `python_tool()` require an explicit `tier` argument (no default), forcing developers to make a conscious trust decision at the call site?
- Should falling back to `local_unsafe` (i.e. when `tier` is not explicitly passed) emit a `UserWarning` via `warnings.warn()` to surface the risk at construction time?
- Should `CapabilityPolicy`'s unenforced fields be more prominently called out — either with a runtime warning on construction, or a prominent `.. warning::` block in the docstring?
- Should the example docs be reframed to make the trust decision explicit, independent of any API change?
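The first two options above could look roughly like the sketch below. Both functions are hypothetical stand-ins that only resolve and return the tier name; they are not the real `python_tool()` signature, and the sentinel `_TIER_UNSET` and the warning text are assumptions for illustration.

```python
import warnings

# Sentinel that distinguishes "tier not passed" from any real tier value.
_TIER_UNSET = object()


def python_tool_sketch(tier=_TIER_UNSET):
    """Option A (hypothetical): keep the default but warn when it is used."""
    if tier is _TIER_UNSET:
        warnings.warn(
            "tier not specified; falling back to 'local_unsafe', which runs "
            "model-generated code as an unrestricted subprocess. "
            "Pass tier=... explicitly to silence this warning.",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        tier = "local_unsafe"
    return tier


def python_tool_strict_sketch(*, tier):
    """Option B (hypothetical): no default, so omitting tier is a TypeError."""
    return tier
```

Under option A, `python_tool_sketch()` warns once and still resolves to `local_unsafe`, whereas `python_tool_sketch(tier="local_unsafe")` is silent because the caller made the choice explicitly. Under option B, Python itself rejects the call at the call site when `tier` is missing.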
Context
Identified during a security audit of the codebase at HEAD d5bef51. Both the tier system (#1171) and `python_tool()` (#1190) landed after the v0.6.0 release, so any change to the default would not affect published users and carries no backwards-compatibility obligation.