Skip to content

routine: external-content-layer (2026-05-26)#15

Draft
jim4226 wants to merge 1 commit into
mainfrom
claude/daily-2026-05-26-external-content-layer
Draft

routine: external-content-layer (2026-05-26)#15
jim4226 wants to merge 1 commit into
mainfrom
claude/daily-2026-05-26-external-content-layer

Conversation

@jim4226
Copy link
Copy Markdown
Owner

@jim4226 jim4226 commented May 26, 2026

Summary

Adds ExternalContentScanner to csis/safety/ — a named call site for Layer 3 of the three-layer containment model, treating incoming tool results and connector payloads as adversarial before the agent processes them.

Source

Theme

Theme 3 — Constitutional / safety primitives (also Theme 6 — Substrate / capability boundaries). CSIS's safety stack had named modules for Layers 1 and 2 but not Layer 3. The article makes explicit that approved connectors remain an attack surface: an agent that only scans its own outputs can still be compromised through what it receives.

What changed

  • csis/safety/external_content.py (new, ~100 LOC): ExternalContentResult frozen dataclass + ExternalContentScanner class with check(), check_many(), any_dirty(). Uses scan_text_no_history so Layer 3 scans don't write to the operator-visible _fired_history deque (F11: don't leak the firing to the agent that caused the content to arrive).
  • tests/test_safety.py: 5 new tests — clean case, exfil detection, no-history-pollution invariant, check_many multi-source, any_dirty false negative guard.

No cycle-9 chokepoints touched

ExternalContentScanner is a new additive module. It does not touch Coordinator.__init__, _BackendTracker, writer_iteration_id, or the promotion CAS. Wiring it into the Coordinator loop at the tool-result boundary is a Phase-1 step (P1.7 sandbox subprocess); this PR establishes the named abstraction so that wiring is a one-line call site addition.

Test plan

python -m pytest tests/test_safety.py -v    # 11 passed (6 before + 5 new)
python -m pytest tests/ -q                  # 255 passed, 0 failed

Generated by Claude Code

Anthropic's engineering post "How we contain Claude across products"
(May 25, 2026) formalises three containment layers. CSIS already
implements Layer 1 (capability tiers in substrate/) and Layer 2
(tripwires + constitution in safety/). Layer 3 — treating incoming
tool results, API responses, and connector payloads as adversarial
by default — had no named call site.

ExternalContentScanner wraps the existing Tripwires.scan_text_no_history
path so Layer 3 scans don't inflate operator-visible history (which
tracks agent-produced output, not external content). The F11 requirement
(don't leak the firing to the agent) is preserved: callers roll back and
log; the agent sees only the rollback outcome.

Five regression tests confirm clean/dirty cases and the no-history
invariant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants