Summary
The "Sidebar security stack" documented in CLAUDE.md describes a layered ML defense (L4 testsavant + L4b Haiku transcript + L4c DeBERTa) hosted in sidebar-agent.ts. Elsewhere, CLAUDE.md "Sidebar architecture" documents that sidebar-agent.ts was ripped out when the PTY proved out (the chat-queue path is gone, and the /sidebar-command, /sidebar-chat, and /sidebar-agent/event endpoints are gone).
The result is two surfaces that the table no longer matches:
browse/src/security-classifier.ts is unreferenced from browse/src/. Running grep -rn "from.*security-classifier" browse/src/ returns zero hits. The file is loaded only by browse/test/*.test.ts. None of loadTestsavant, scanPageContent, scanPageContentDeberta, combineVerdict runs against any production data path on 7b4738b.
The new window.gstackInjectToTerminal(text) PTY path runs no L1-L3 either. Per CLAUDE.md "Cross-pane PTY injection," the toolbar Cleanup button and the Inspector "Send to Code" action pipe text directly to the live claude REPL. That text is page-derived (i.e., influenced by whatever site the operator was inspecting). It does not pass through wrapUntrustedPageContent / markHiddenElements / the URL blocklist / the canary-injection step. The L1-L3 module (content-security.ts) still ships, but nothing on the PTY-injection path calls it.
browse/src/server.ts:1165 continues to surface security: getSecurityStatus() on /health, and getSecurityStatus() (security.ts:582-604) reports layers: { testsavant, transcript, canary } from ~/.gstack/security/session-state.json. With sidebar-agent gone, nothing writes that file in the new architecture, so the layers shown to whatever consumer reads /health reflect whatever the last sidebar-agent run wrote — possibly months ago, possibly empty, possibly 'off' for everything.
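One way to keep /health from replaying a dead writer's state is to check the age of session-state.json before reporting its layers. The sketch below is illustrative only: classifyFreshness, reportLayers, and the "unknown" value are hypothetical names, not part of security.ts, and the caller is assumed to supply the file's modification time (for example from fs.statSync).

```typescript
// Hypothetical helpers: none of these names exist in gstack.
type LayerFreshness = "missing" | "stale" | "fresh";

interface StateFileInfo {
  // Last-modified time of session-state.json in epoch milliseconds,
  // or null when the file does not exist.
  mtimeMs: number | null;
}

// Classify the state file by age so a consumer can tell live state
// from values left behind by a process that no longer runs.
function classifyFreshness(
  info: StateFileInfo,
  nowMs: number,
  maxAgeMs: number,
): LayerFreshness {
  if (info.mtimeMs === null) return "missing";
  return nowMs - info.mtimeMs > maxAgeMs ? "stale" : "fresh";
}

// Pass layer values through only when the file is fresh;
// otherwise report every layer as "unknown".
function reportLayers(
  layers: Record<string, string>,
  freshness: LayerFreshness,
): Record<string, string> {
  if (freshness === "fresh") return { ...layers };
  const out: Record<string, string> = {};
  for (const name of Object.keys(layers)) out[name] = "unknown";
  return out;
}
```

This does not restore any defense. It only stops /health from claiming one on the basis of a file nobody writes anymore.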
Concrete exploit chain (HIGH)
Operator opens an attacker-controlled page in the gstack browser.
Page contains a prompt-injection payload in DOM text, hidden elements, ARIA labels, or simply prose.
Operator clicks Inspector → "Send to Code" or the toolbar Cleanup button.
Page-derived text reaches the live claude REPL via gstackInjectToTerminal with no envelope wrap, no hidden-strip, no classifier scan, no canary check.
The REPL executes the injected prompt as if user-typed.
The chain is simpler than the L1-L6 stack was supposed to make it. The documented mitigation is the load-bearing defense. The fact that none of it is wired is the bug.
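The step missing between "operator clicks" and "text reaches the REPL" can be sketched as a guard that scans, wraps, and only then injects. Everything here is a stand-in: guardedInject, GuardDeps, wrapInEnvelope, and the envelope tag are hypothetical, and the real signatures of wrapUntrustedPageContent and the classifier are not shown in this issue. The scan is synchronous to keep the sketch short. A real one would probably be an async call to a daemon endpoint.

```typescript
// All names here are hypothetical stand-ins, not gstack's real API.
type Verdict = "safe" | "block";

interface GuardDeps {
  // Stand-in for an envelope wrap such as wrapUntrustedPageContent.
  wrap: (text: string) => string;
  // Stand-in for a classifier scan (L4-style).
  scan: (text: string) => Verdict;
  // Stand-in for the call that writes to the PTY.
  inject: (text: string) => void;
}

const OPEN_TAG = "<untrusted-page-content>";
const CLOSE_TAG = "</untrusted-page-content>";

// Illustrative envelope: strip any embedded closing tag so page text
// cannot terminate the envelope early, then wrap it.
function wrapInEnvelope(text: string): string {
  const cleaned = text.split(CLOSE_TAG).join("");
  return `${OPEN_TAG}\n${cleaned}\n${CLOSE_TAG}`;
}

// Scan first, then wrap, then inject. Fails closed: if the scan throws
// or returns anything other than "safe", nothing reaches the PTY.
function guardedInject(text: string, deps: GuardDeps): boolean {
  let verdict: Verdict;
  try {
    verdict = deps.scan(text);
  } catch {
    return false;
  }
  if (verdict !== "safe") return false;
  deps.inject(deps.wrap(text));
  return true;
}
```

Failing closed is the design choice that matters here: a classifier that is unreachable should block the injection rather than wave it through.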
Mechanical evidence
$ grep -rn "from.*security-classifier" browse/src/
$ grep -rn "import.*security-classifier" browse/src/
$ ls browse/src/sidebar-agent.ts
ls: browse/src/sidebar-agent.ts: No such file or directory
$ grep -rn "recordSkillUse" browse/src/ | grep -v "domain-skills.ts:"
$
(Last grep: zero callers of recordSkillUse(..., classifierFlagged: true) outside the module under test, confirming the L4 → flag_count → auto-promote chain is also broken — separately filed in #1369.)
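The grep evidence could be turned into a regression check, so that a security module going unreferenced fails a test instead of needing a manual audit. A minimal sketch follows, assuming the caller has already read the source files into memory. findImporters is a hypothetical helper, and its regex is a rough heuristic rather than a real import parser.

```typescript
// Hypothetical regression check; not part of gstack.
// Given source files as { path: contents }, return the paths that import
// a module whose specifier ends with `moduleName`.
function findImporters(
  sources: Record<string, string>,
  moduleName: string,
): string[] {
  // Escape regex metacharacters in the module name.
  const escaped = moduleName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Matches `from "./x/name"`, `import "./name"`, and `import("./name")`,
  // with an optional .ts or .js suffix on the specifier.
  const pattern = new RegExp(
    `(?:from|import)\\s*\\(?\\s*["'][^"']*${escaped}(?:\\.[jt]s)?["']`,
  );
  return Object.keys(sources)
    .filter((path) => pattern.test(sources[path]))
    .sort();
}
```

A test could then assert that at least one returned path lies under browse/src/ rather than browse/test/, which is the condition that fails today.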
Two acceptable shapes for a fix
Option A (re-wire the classifier):
Move the L1-L3 envelope wrap and the L4 / L4b scan into sidepanel-terminal.js before gstackInjectToTerminal reaches the PTY (extension-side call back to a daemon endpoint for ML, since onnxruntime-node can't load from bun --compile's temp extract dir per the CLAUDE.md note).
Call recordSkillUse(..., classifierFlagged: true) so the auto-promote gate (#1369) re-opens once L4 returns.
Option B (delete + de-document):
Delete browse/src/security-classifier.ts.
Drop the testsavant / transcript / deberta layers from security.ts:getStatus() and from /health's security field.
Drop the "Sidebar security stack" table from CLAUDE.md and update "Sidebar architecture" / "Cross-pane PTY injection" to honestly describe the PTY path as the operator-trust surface it is.
Drop the GSTACK_SECURITY_OFF / GSTACK_SECURITY_ENSEMBLE env knobs and the model-cache paths from CLAUDE.md.
Either is fine. The current state (table claims defense, code provides none, /health reports defense as live) is what should not stand.
Related PRs and findings
security: pass cwd to git via execFileSync, not interpolation through /bin/sh: separate finding, lands cleanly in either of the above options.
#1369 (security: gate domain-skill auto-promote on classifier_score > 0): partial mitigation for one specific consequence of the dead classifier. Without the gate, three benign uses promote any quarantined skill, including one authored under the influence of a poisoned page, into prompt context. Lands cleanly in either option above. If Option A is chosen, the gate re-opens automatically once L4 is rewired.
#1153 (security: remove .svg from load-html extension allowlist, open since 2026-04-22): escalates with this issue, because an SVG payload in about:blank with no CSP is exactly the shape the L4 classifier was supposed to catch.
… security-classifier.ts itself. If Option B is chosen, both can be closed as obsolete.
Filed as an issue rather than a PR because the fix shape is a design call, not a one-line change.