routine: external-content-layer (2026-05-26) #15
Draft
jim4226 wants to merge 1 commit into
Anthropic's engineering post "How we contain Claude across products" (May 25, 2026) formalises three containment layers. CSIS already implements Layer 1 (capability tiers in substrate/) and Layer 2 (tripwires + constitution in safety/). Layer 3 — treating incoming tool results, API responses, and connector payloads as adversarial by default — had no named call site. ExternalContentScanner wraps the existing Tripwires.scan_text_no_history path so Layer 3 scans don't inflate operator-visible history (which tracks agent-produced output, not external content). The F11 requirement (don't leak the firing to the agent) is preserved: callers roll back and log; the agent sees only the rollback outcome. Five regression tests confirm clean/dirty cases and the no-history invariant.
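A minimal sketch of the shape described above, not the code in this PR. The `Tripwires` stand-in, its single regex, the field names on `ExternalContentResult`, the method signatures, and the `deliver_tool_result` helper are all assumptions made for illustration. Only the class and method names and the no-history behaviour come from the description.

```python
import logging
import re
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

log = logging.getLogger("external_content_sketch")


class Tripwires:
    """Hypothetical stand-in; only the history / no-history split matters here."""

    # Illustrative pattern only, not the real rule set.
    _PATTERNS = {"exfil": re.compile(r"send .* to https?://", re.IGNORECASE)}

    def __init__(self) -> None:
        self._fired_history: deque = deque(maxlen=100)

    def scan_text_no_history(self, text: str) -> List[str]:
        # Pure scan: reports which tripwires match and records nothing.
        return [name for name, pat in self._PATTERNS.items() if pat.search(text)]

    def scan_text(self, text: str) -> List[str]:
        # History-writing path, meant for agent-produced output.
        fired = self.scan_text_no_history(text)
        self._fired_history.extend(fired)
        return fired


@dataclass(frozen=True)
class ExternalContentResult:
    source: str              # assumed field: where the payload came from
    clean: bool              # True when no tripwire matched
    fired: Tuple[str, ...]   # assumed field: names of matching tripwires


class ExternalContentScanner:
    def __init__(self, tripwires: Tripwires) -> None:
        self._tripwires = tripwires

    def check(self, text: str, source: str = "unknown") -> ExternalContentResult:
        fired = tuple(self._tripwires.scan_text_no_history(text))
        return ExternalContentResult(source=source, clean=not fired, fired=fired)

    def check_many(self, items: Iterable[Tuple[str, str]]) -> List[ExternalContentResult]:
        # items are (source, text) pairs; this ordering is an assumption.
        return [self.check(text, source) for source, text in items]

    def any_dirty(self, results: Iterable[ExternalContentResult]) -> bool:
        return any(not r.clean for r in results)


def deliver_tool_result(
    scanner: ExternalContentScanner,
    source: str,
    payload: str,
    rollback: Callable[[], None],
) -> Optional[str]:
    """Hypothetical caller: scan first, then either pass through or roll back."""
    result = scanner.check(payload, source)
    if result.clean:
        return payload
    # Firing details go to the operator log only; the agent never sees them.
    log.warning("Layer 3 tripwire fired: source=%s fired=%s", source, result.fired)
    rollback()
    return None
```

In this sketch a dirty payload leaves `_fired_history` empty and reaches the agent only as `None`, which plays the role of the rollback outcome.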
This was referenced May 26, 2026
Summary
Adds ExternalContentScanner to csis/safety/: a named call site for Layer 3 of the three-layer containment model, treating incoming tool results and connector payloads as adversarial before the agent processes them.
Source
Theme
Theme 3 — Constitutional / safety primitives (also Theme 6 — Substrate / capability boundaries). CSIS's safety stack had named modules for Layers 1 and 2 but not Layer 3. The article makes explicit that approved connectors remain an attack surface: an agent that only scans its own outputs can still be compromised through what it receives.
What changed
- csis/safety/external_content.py (new, ~100 LOC): ExternalContentResult frozen dataclass + ExternalContentScanner class with check(), check_many(), any_dirty(). Uses scan_text_no_history so Layer 3 scans don't write to the operator-visible _fired_history deque (F11: don't leak the firing to the agent that caused the content to arrive).
- tests/test_safety.py: 5 new tests: clean case, exfil detection, no-history-pollution invariant, check_many multi-source, any_dirty false-negative guard.
No cycle-9 chokepoints touched
ExternalContentScanner is a new additive module. It does not touch Coordinator.__init__, _BackendTracker, writer_iteration_id, or the promotion CAS. Wiring it into the Coordinator loop at the tool-result boundary is a Phase-1 step (P1.7 sandbox subprocess); this PR establishes the named abstraction so that wiring is a one-line call site addition.
Test plan
Generated by Claude Code