security: documented sidebar security stack does not match shipped architecture; PTY-injection path bypasses every classifier layer

## Summary

The "Sidebar security stack" table in CLAUDE.md describes a layered ML defense (L4 testsavant + L4b Haiku transcript + L4c DeBERTa) hosted in `sidebar-agent.ts`. Elsewhere, CLAUDE.md "Sidebar architecture" documents that `sidebar-agent.ts` was ripped out once the PTY proved out: the chat-queue path is gone, and so are the `/sidebar-command`, `/sidebar-chat`, and `/sidebar-agent/event` endpoints.

The result is two surfaces that the table no longer matches:

1. **`browse/src/security-classifier.ts` is unreferenced from `browse/src/`.** `grep -rn "from.*security-classifier" browse/src/` returns zero hits. The file is loaded only by `browse/test/*.test.ts`. None of `loadTestsavant`, `scanPageContent`, `scanPageContentDeberta`, `combineVerdict` runs against any production data path on `7b4738b`.

2. **The new `window.gstackInjectToTerminal(text)` PTY path runs no L1-L3 either.** Per CLAUDE.md "Cross-pane PTY injection," the toolbar Cleanup button and the Inspector "Send to Code" action pipe text directly to the live `claude` REPL. That text is page-derived (i.e., influenced by whatever site the operator was inspecting). It does not pass through `wrapUntrustedPageContent` / `markHiddenElements` / the URL blocklist / the canary-injection step. The L1-L3 module (`content-security.ts`) still ships, but nothing on the PTY-injection path calls it.

`browse/src/server.ts:1165` continues to surface `security: getSecurityStatus()` on `/health`, and `getSecurityStatus()` (`security.ts:582-604`) reports `layers: { testsavant, transcript, canary }` from `~/.gstack/security/session-state.json`. With sidebar-agent gone, nothing writes that file in the new architecture, so the layers shown to whatever consumer reads `/health` reflect whatever the last sidebar-agent run wrote — possibly months ago, possibly empty, possibly `'off'` for everything.
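
One way to keep `/health` honest while the design call is pending is to report an old or missing state file as unknown instead of echoing it. The following is a minimal sketch, not the shipped `getSecurityStatus()`: the `LayerState` values, the `reportSecurity` name, and the age threshold are all assumptions, and the real `session-state.json` schema is not shown in this issue.

```typescript
// Hypothetical sketch: names, layer values, and schema are assumptions.
type LayerState = "ok" | "degraded" | "off" | "unknown";

interface LayerReport {
  testsavant: LayerState;
  transcript: LayerState;
  canary: LayerState;
}

interface HealthSecurity {
  layers: LayerReport;
  stale: boolean; // true when the report does not reflect a recent writer
}

function unknownLayers(): LayerReport {
  return { testsavant: "unknown", transcript: "unknown", canary: "unknown" };
}

// saved / savedAtMs: the parsed state file and its mtime, or null when the
// file is missing or unreadable. Both are passed in so the function stays
// pure and easy to test.
function reportSecurity(
  saved: LayerReport | null,
  savedAtMs: number | null,
  nowMs: number,
  maxAgeMs: number,
): HealthSecurity {
  if (saved === null || savedAtMs === null) {
    return { layers: unknownLayers(), stale: true };
  }
  const ageMs = nowMs - savedAtMs;
  if (ageMs < 0 || ageMs > maxAgeMs) {
    // Do not echo layer states written by a process that no longer exists.
    return { layers: unknownLayers(), stale: true };
  }
  return { layers: saved, stale: false };
}
```

With this shape, a consumer of `/health` can tell "layers are off" apart from "nobody has written this file recently", which is the distinction the current output hides.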

## Concrete exploit chain (HIGH)

1. Operator opens an attacker-controlled page in the gstack browser.
2. Page contains a prompt-injection payload in DOM text, hidden elements, ARIA labels, or simply prose.
3. Operator clicks Inspector → "Send to Code" or the toolbar Cleanup button.
4. Page-derived text reaches the live `claude` REPL via `gstackInjectToTerminal` with no envelope wrap, no hidden-strip, no classifier scan, no canary check.
5. The REPL executes the injected prompt as if user-typed.

The chain is far shorter than the L1-L6 stack was supposed to allow. The documented mitigations are the load-bearing defense for this path, and none of them is wired in. That is the bug.

## Mechanical evidence

```
$ grep -rn "from.*security-classifier" browse/src/
$ grep -rn "import.*security-classifier" browse/src/
$ ls browse/src/sidebar-agent.ts
ls: browse/src/sidebar-agent.ts: No such file or directory
$ grep -rn "recordSkillUse" browse/src/ | grep -v "domain-skills.ts:"
$
```

(The last grep shows zero callers of `recordSkillUse(..., classifierFlagged: true)` outside `domain-skills.ts` itself, confirming that the L4 → flag_count → auto-promote chain is also broken. That is filed separately in #1369.)
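
The grep checks above could also be pinned as a regression test, so the classifier cannot silently become unreferenced again. This is a rough sketch under stated assumptions: `findImporters` and `SourceFile` are hypothetical names, the caller is assumed to have read the files already, and the pattern only covers `import ... from`, bare `import`, and dynamic `import()` (not `require`).

```typescript
// Hypothetical helper; not part of the repo.
interface SourceFile {
  path: string;
  source: string;
}

// Return the paths under `root` whose source imports `moduleName`, where
// `moduleName` is the last segment of the import specifier.
function findImporters(files: SourceFile[], moduleName: string, root: string): string[] {
  // Escape regex metacharacters in the module name.
  const escaped = moduleName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `(?:from|import)\\s*\\(?\\s*['"](?:[^'"]*/)?${escaped}(?:\\.[jt]s)?['"]`,
  );
  return files
    .filter((f) => f.path.startsWith(root) && pattern.test(f.source))
    .map((f) => f.path)
    .sort();
}
```

A test would then assert that `findImporters(files, "security-classifier", "browse/src/")` is non-empty under Option A, or that the file no longer exists under Option B.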

## Two acceptable shapes for a fix

**Option A — re-wire the classifier:**
- Move the L1-L3 envelope wrap and the L4 / L4b scan into `sidepanel-terminal.js` so they run before `gstackInjectToTerminal` reaches the PTY. The ML scan needs an extension-side call back to a daemon endpoint, since `onnxruntime-node` can't load from `bun --compile`'s temp extract dir (per the CLAUDE.md note).
- Re-add a caller for `recordSkillUse(..., classifierFlagged: true)` so the auto-promote gate (#1369) re-opens once L4 returns.
- Keep the security stack table in CLAUDE.md.
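
The ordering Option A asks for might look roughly like the sketch below. It is illustrative only: `guardedInject` and `InjectDeps` are hypothetical names, the three callbacks are stand-ins whose signatures are assumed rather than taken from the repo, and the scan is modeled synchronously for brevity even though a real daemon round-trip would be async.

```typescript
type Verdict = "allow" | "block";

// Stand-ins for the real pieces. Signatures are assumed, not from the repo.
interface InjectDeps {
  wrap: (text: string) => string; // e.g. wrapUntrustedPageContent
  scan: (text: string) => Verdict; // e.g. a daemon call for L4 / L4b
  inject: (text: string) => void; // e.g. window.gstackInjectToTerminal
}

// Scan first, fail closed on scanner errors, and only ever hand wrapped
// text to the PTY.
function guardedInject(pageText: string, deps: InjectDeps): Verdict {
  let verdict: Verdict;
  try {
    verdict = deps.scan(pageText);
  } catch {
    verdict = "block"; // scanner unavailable: do not fall through to the REPL
  }
  if (verdict === "block") {
    return "block";
  }
  deps.inject(deps.wrap(pageText));
  return "allow";
}
```

The fail-closed branch is the design choice worth debating: if the daemon endpoint is down, this sketch refuses to inject rather than degrading to the current unguarded behavior.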

**Option B — delete + de-document:**
- Remove `browse/src/security-classifier.ts`.
- Drop the `testsavant` / `transcript` / `deberta` layers from `getSecurityStatus()` in `security.ts` and from `/health`'s `security` field.
- Drop the "Sidebar security stack" table from CLAUDE.md and update "Sidebar architecture" / "Cross-pane PTY injection" to honestly describe the PTY path as the operator-trust surface it is.
- Drop the `GSTACK_SECURITY_OFF` / `GSTACK_SECURITY_ENSEMBLE` env knobs and the model-cache paths from CLAUDE.md.

Either is fine. The current state — table claims defense, code provides none, `/health` reports defense as live — is what should not stand.

## Related PRs and findings

- #1368 — `security: pass cwd to git via execFileSync, not interpolation through /bin/sh` — separate finding, lands cleanly in either of the above options.
- #1369 — `security: gate domain-skill auto-promote on classifier_score > 0` — partial mitigation for one specific consequence of the dead classifier (without the gate, three benign uses promote any quarantined skill, including one authored under the influence of a poisoned page, into prompt context). Lands cleanly in either option above. If Option A is chosen, the gate re-opens automatically once L4 is rewired.
- #1153 (open since 2026-04-22) — `.svg` in `load-html` allowlist — escalates with this issue: an SVG payload in `about:blank` with no CSP is exactly the shape the L4 classifier was supposed to catch.
- #1155, #1157 (open since 2026-04-22 / 2026-04-23) — both target `security-classifier.ts` itself. If Option B is chosen, both can be closed as obsolete.

Filed as an issue rather than a PR because the fix shape is a design call, not a one-line change.

