routine: external-content-layer (2026-05-26) by jim4226 · Pull Request #15 · jim4226/CSIS

jim4226 · 2026-05-26T23:20:02Z

Summary

Adds ExternalContentScanner to csis/safety/ — a named call site for Layer 3 of the three-layer containment model, treating incoming tool results and connector payloads as adversarial before the agent processes them.

Source

URL: https://www.anthropic.com/engineering/how-we-contain-claude
Published: May 25, 2026
Key quote: "Data can exfiltrate through approved domains using attacker-controlled credentials, requiring 'defensive man-in-the-middle' inspection of API traffic within sandboxes."

Theme

Theme 3 — Constitutional / safety primitives (also Theme 6 — Substrate / capability boundaries). CSIS's safety stack had named modules for Layers 1 and 2 but not Layer 3. The article makes explicit that approved connectors remain an attack surface: an agent that only scans its own outputs can still be compromised through what it receives.

What changed

csis/safety/external_content.py (new, ~100 LOC): ExternalContentResult frozen dataclass + ExternalContentScanner class with check(), check_many(), any_dirty(). Uses scan_text_no_history so Layer 3 scans don't write to the operator-visible _fired_history deque (F11: don't leak the firing to the agent that caused the content to arrive).
tests/test_safety.py: 5 new tests — clean case, exfil detection, no-history-pollution invariant, check_many multi-source, any_dirty false negative guard.
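
The module body is not reproduced in this description, so the following is only a minimal sketch of the shape described above, not the code in csis/safety/external_content.py. The names ExternalContentResult, ExternalContentScanner, check(), check_many(), any_dirty(), scan_text_no_history and _fired_history come from this PR; everything else is an assumption made for illustration: the StubTripwires stand-in, its two regex patterns, the result fields (source, dirty, fired), and the method signatures.

```python
from __future__ import annotations

import re
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Mapping


class StubTripwires:
    """Hypothetical stand-in for the real Tripwires class (not CSIS code)."""

    # Illustrative patterns only; the real rule set is not shown in this PR.
    _PATTERNS = {
        "exfil_credential": re.compile(r"(?i)api[_-]?key\s*[=:]\s*\S+"),
        "prompt_injection": re.compile(r"(?i)ignore (all )?previous instructions"),
    }

    def __init__(self) -> None:
        # Operator-visible history; Layer 3 scans must leave it untouched.
        self._fired_history: deque[str] = deque(maxlen=100)

    def scan_text_no_history(self, text: str) -> list[str]:
        """Return the names of fired patterns without recording them."""
        return [name for name, pat in self._PATTERNS.items() if pat.search(text)]


@dataclass(frozen=True)
class ExternalContentResult:
    source: str             # assumed field: where the payload came from
    dirty: bool             # assumed field: True if any tripwire fired
    fired: tuple[str, ...]  # assumed field: names of the tripwires that fired


class ExternalContentScanner:
    def __init__(self, tripwires) -> None:
        self._tripwires = tripwires

    def check(self, source: str, text: str) -> ExternalContentResult:
        """Scan one incoming payload via the no-history path."""
        fired = tuple(self._tripwires.scan_text_no_history(text))
        return ExternalContentResult(source=source, dirty=bool(fired), fired=fired)

    def check_many(self, payloads: Mapping[str, str]) -> list[ExternalContentResult]:
        """Scan several payloads keyed by source name."""
        return [self.check(src, text) for src, text in payloads.items()]

    @staticmethod
    def any_dirty(results: Iterable[ExternalContentResult]) -> bool:
        """True if at least one result is dirty; False for an empty iterable."""
        return any(r.dirty for r in results)
```

Freezing the result keeps a scan verdict immutable once produced, which fits the stated goal of a named, auditable call site. The real module may differ in every detail marked as assumed.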

No cycle-9 chokepoints touched

ExternalContentScanner is a new additive module. It does not touch Coordinator.__init__, _BackendTracker, writer_iteration_id, or the promotion CAS. Wiring it into the Coordinator loop at the tool-result boundary is a Phase-1 step (P1.7 sandbox subprocess); this PR establishes the named abstraction so that wiring is a one-line call site addition.
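
As a rough illustration of what the deferred wiring might look like, the sketch below gates a tool result on a scanner check before the agent sees it. It is not part of this PR and does not touch the Coordinator. The function accept_tool_result, the rollback callback, the logger name, and the check(source, text) signature are all hypothetical.

```python
import logging
from typing import Callable, Optional

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("csis.safety.sketch")


def accept_tool_result(
    scanner,
    source: str,
    payload: str,
    rollback: Callable[[], None],
) -> Optional[str]:
    """Return the payload if it scans clean, else roll back and return None.

    `scanner` is anything with a check(source, text) method whose result
    has a boolean `dirty` attribute (an assumed interface).
    """
    result = scanner.check(source, payload)  # the single added call site
    if result.dirty:
        # Log for the operator; per F11 the agent learns only that a
        # rollback happened, not which tripwire fired.
        logger.warning("external content from %s tripped a Layer 3 scan", source)
        rollback()
        return None
    return payload
```

Returning None rather than the scan result is one way to keep firing details out of the agent's view; the actual Phase-1 wiring may handle the outcome differently.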

Test plan

python -m pytest tests/test_safety.py -v    # 11 passed (6 before + 5 new)
python -m pytest tests/ -q                  # 255 passed, 0 failed

Generated by Claude Code

Anthropic's engineering post "How we contain Claude across products" (May 25, 2026) formalises three containment layers. CSIS already implements Layer 1 (capability tiers in substrate/) and Layer 2 (tripwires + constitution in safety/). Layer 3 — treating incoming tool results, API responses, and connector payloads as adversarial by default — had no named call site. ExternalContentScanner wraps the existing Tripwires.scan_text_no_history path so Layer 3 scans don't inflate operator-visible history (which tracks agent-produced output, not external content). The F11 requirement (don't leak the firing to the agent) is preserved: callers roll back and log; the agent sees only the rollback outcome. Five regression tests confirm clean/dirty cases and the no-history invariant.

This was referenced May 26, 2026

routine log: 2026-05-26 #17 (Draft)
routine log: 2026-05-28 #21 (Draft)
routine log: 2026-05-29 #24 (Draft)
routine log: 2026-05-30 #25 (Draft)

jim4226 wants to merge 1 commit into main from claude/daily-2026-05-26-external-content-layer
