security: documented sidebar security stack does not match shipped architecture; PTY-injection path bypasses every classifier layer #1370

@garagon

Summary

The "Sidebar security stack" section of CLAUDE.md describes a layered ML defense (L4 testsavant + L4b Haiku transcript + L4c DeBERTa) hosted in sidebar-agent.ts. The "Sidebar architecture" section of the same file, however, documents that sidebar-agent.ts was ripped out once the PTY path proved out: the chat-queue path is gone, and so are the /sidebar-command, /sidebar-chat, and /sidebar-agent/event endpoints.

The result is two surfaces that the table no longer matches:

  1. browse/src/security-classifier.ts is unreferenced from browse/src/. grep -rn "from.*security-classifier" browse/src/ returns zero hits. The file is loaded only by browse/test/*.test.ts. None of loadTestsavant, scanPageContent, scanPageContentDeberta, combineVerdict runs against any production data path on 7b4738b.

  2. The new window.gstackInjectToTerminal(text) PTY path runs no L1-L3 either. Per CLAUDE.md "Cross-pane PTY injection," the toolbar Cleanup button and the Inspector "Send to Code" action pipe text directly to the live claude REPL. That text is page-derived (i.e., influenced by whatever site the operator was inspecting). It does not pass through wrapUntrustedPageContent / markHiddenElements / the URL blocklist / the canary-injection step. The L1-L3 module (content-security.ts) still ships, but nothing on the PTY-injection path calls it.

browse/src/server.ts:1165 continues to surface security: getSecurityStatus() on /health, and getSecurityStatus() (security.ts:582-604) reports layers: { testsavant, transcript, canary } from ~/.gstack/security/session-state.json. With sidebar-agent gone, nothing writes that file in the new architecture, so the layers shown to whatever consumer reads /health reflect whatever the last sidebar-agent run wrote — possibly months ago, possibly empty, possibly 'off' for everything.
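One way to keep /health from echoing a leftover state file is to check the file's age before reporting its layers. The sketch below is only an illustration: the `SessionState` shape, the `reportSecurity` name, and the 60-second freshness window are assumptions, not the schema or API in security.ts.

```typescript
// Hypothetical shape; the real session-state.json schema is not shown in this issue.
interface SessionState {
  layers: Record<string, string>;
}

type ReportedSecurity =
  | { status: "live"; layers: Record<string, string>; ageMs: number }
  | { status: "stale" | "missing"; layers: null; ageMs: number | null };

// Decide what a health endpoint should report, given the parsed state file
// (or null if it does not exist) and its last-modified time in milliseconds.
function reportSecurity(
  state: SessionState | null,
  mtimeMs: number | null,
  nowMs: number,
  maxAgeMs: number = 60_000, // assumed freshness window, not a gstack constant
): ReportedSecurity {
  if (state === null || mtimeMs === null) {
    return { status: "missing", layers: null, ageMs: null };
  }
  const ageMs = Math.max(0, nowMs - mtimeMs);
  if (ageMs > maxAgeMs) {
    // Never surface layer values nobody has refreshed recently.
    return { status: "stale", layers: null, ageMs };
  }
  return { status: "live", layers: state.layers, ageMs };
}
```

With a check like this, a consumer of /health would see "stale" or "missing" rather than whatever the last sidebar-agent run happened to write.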

Concrete exploit chain (HIGH)

  1. Operator opens an attacker-controlled page in the gstack browser.
  2. Page contains a prompt-injection payload in DOM text, hidden elements, ARIA labels, or simply prose.
  3. Operator clicks Inspector → "Send to Code" or the toolbar Cleanup button.
  4. Page-derived text reaches the live claude REPL via gstackInjectToTerminal with no envelope wrap, no hidden-strip, no classifier scan, no canary check.
  5. The REPL executes the injected prompt as if user-typed.

The chain is simpler than the L1-L6 stack was supposed to allow. The documented mitigation is meant to be the load-bearing defense; the fact that none of it is wired into the PTY path is the bug.

Mechanical evidence

$ grep -rn "from.*security-classifier" browse/src/
$ grep -rn "import.*security-classifier" browse/src/
$ ls browse/src/sidebar-agent.ts
ls: browse/src/sidebar-agent.ts: No such file or directory
$ grep -rn "recordSkillUse" browse/src/ | grep -v "domain-skills.ts:"
$

(Last grep: zero callers of recordSkillUse(..., classifierFlagged: true) outside the module under test, confirming the L4 → flag_count → auto-promote chain is also broken — separately filed in #1369.)
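The grep checks above could also be turned into a regression guard that fails whenever a security module has no production importer. The following is a rough sketch, not something in the repo: `findImporters` is a hypothetical helper, and it takes an in-memory map of path to source text so that it stays independent of any particular file-walking API.

```typescript
// Returns the sorted paths of source files that import `moduleName`,
// via `import ... from "..."`, a bare `import "..."`, or a dynamic `import("...")`.
function findImporters(
  files: Record<string, string>, // path -> file contents
  moduleName: string,
): string[] {
  // Escape regex metacharacters in the module name before embedding it.
  const escaped = moduleName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `(?:from|import)\\s*\\(?\\s*["'][^"']*${escaped}(?:\\.ts|\\.js)?["']`,
  );
  return Object.entries(files)
    .filter(([, source]) => pattern.test(source))
    .map(([path]) => path)
    .sort();
}

// True if at least one importer lives under the production source prefix.
function hasProductionImporter(
  files: Record<string, string>,
  moduleName: string,
  productionPrefix: string,
): boolean {
  return findImporters(files, moduleName).some((p) => p.startsWith(productionPrefix));
}
```

A test could load browse/src/ and browse/test/ into the map and assert `hasProductionImporter(files, "security-classifier", "browse/src/")`, which would fail on the commit this issue describes.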

Two acceptable shapes for a fix

Option A — re-wire the classifier:

  • Move L1-L3 envelope wrap and L4 / L4b scan into sidepanel-terminal.js before gstackInjectToTerminal reaches the PTY (extension-side call back to a daemon endpoint for ML, since onnxruntime-node can't load from bun --compile's temp extract dir per CLAUDE.md note).
  • Re-add a caller for recordSkillUse(..., classifierFlagged: true) so the auto-promote gate (#1369, "security: gate domain-skill auto-promote on classifier_score > 0") re-opens once L4 returns.
  • Keep the security stack table in CLAUDE.md.
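As a rough illustration of Option A, the gate in front of gstackInjectToTerminal could be reduced to a pure decision function: wrap the text, consult the classifier result, and fail closed if the classifier is unreachable. Everything below is an assumption made for the sketch. The marker strings, `wrapUntrusted`, `ClassifierResult`, and `decideInjection` are hypothetical stand-ins, not the actual wrapUntrustedPageContent format or the daemon's response shape.

```typescript
// Hypothetical envelope markers; the real envelope format is not shown in this issue.
const UNTRUSTED_OPEN = "<<<UNTRUSTED_PAGE_CONTENT>>>";
const UNTRUSTED_CLOSE = "<<<END_UNTRUSTED_PAGE_CONTENT>>>";

// ok: false models "the classifier endpoint could not be reached".
type ClassifierResult = { ok: true; flagged: boolean } | { ok: false };

type GateDecision =
  | { action: "inject"; payload: string }
  | { action: "block"; reason: "flagged" | "classifier-unavailable" };

// Wrap page-derived text so the REPL can tell it apart from operator input.
// Embedded closing markers are replaced so the page cannot end the envelope early.
function wrapUntrusted(text: string): string {
  const safe = text.split(UNTRUSTED_CLOSE).join("[removed marker]");
  return `${UNTRUSTED_OPEN}\n${safe}\n${UNTRUSTED_CLOSE}`;
}

// Decide whether page-derived text may reach the PTY. Fails closed:
// an unreachable classifier blocks the injection instead of waving it through.
function decideInjection(text: string, result: ClassifierResult): GateDecision {
  if (!result.ok) {
    return { action: "block", reason: "classifier-unavailable" };
  }
  if (result.flagged) {
    return { action: "block", reason: "flagged" };
  }
  return { action: "inject", payload: wrapUntrusted(text) };
}
```

In this shape the extension would await the daemon's verdict, pass it to `decideInjection`, and call the PTY injection only when the action is "inject". Keeping the decision pure makes the fail-closed behavior easy to unit-test without a live REPL.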

Option B — delete + de-document:

  • Remove browse/src/security-classifier.ts.
  • Drop the testsavant / transcript / deberta layers from security.ts:getStatus() and from /health's security field.
  • Drop the "Sidebar security stack" table from CLAUDE.md and update "Sidebar architecture" / "Cross-pane PTY injection" to honestly describe the PTY path as the operator-trust surface it is.
  • Drop the GSTACK_SECURITY_OFF / GSTACK_SECURITY_ENSEMBLE env knobs and the model-cache paths from CLAUDE.md.

Either is fine. The current state — table claims defense, code provides none, /health reports defense as live — is what should not stand.

Related PRs and findings

Filed as an issue rather than a PR because the fix shape is a design call, not a one-line change.
