v0.9.15: live escape hatch, self-explaining blocks, session-wedge FP fix #10

Closed: codehippie1 wants to merge 6 commits into release/v0.9.14 from release/v0.9.15

codehippie1 (Contributor) commented Jun 11, 2026

A follow-up shaped entirely by live dogfooding: one false-positive class could wedge a whole session, blocks didn't explain themselves well enough to judge, the escape hatch in the block message didn't actually work on a running daemon, and status surfaces showed misleading numbers in degraded states.

Fixed

  • A secret-shaped token in conversation history no longer wedges the session. Security data checks (credentials, cards, SSNs) now run only inside tool-call arguments — the agent action — never on prose or resent conversation history. Clients resend the full conversation every turn, so a key-shaped string merely quoted or discussed (e.g. an example key in a conversation summary) used to 403 every subsequent request until the session was abandoned. A credential inside a tool call — the real exfiltration vector — still blocks, on both the LLM and MCP paths.
  • Subscribers never see a notional dollar figure where the plan segment belongs. Stale readings (idle, or the proxy was briefly down) and fresh readings whose window reset behind the user both render last-known plan headroom marked ~ … idle instead of dropping to a session-cost figure that reads as real money. status frames a subscriber's spend as notional, not a budget breach.

Added

  • A live escape hatch: burnwall pause / resume / allow-once. A small auto-expiring state file the proxy checks per request, so protection flips on the running daemon — no daemon restart, no AI-tool restart, the agent session survives. allow-once lets exactly the next request through (atomic consume), then protection restores itself; pause [30s|5m|2h] is bounded (default 5m, capped 24h) and self-restores on expiry. The previous remediation — an environment variable plus a tool restart — never reached a backgrounded daemon and has been removed from all block messages, the init/enable-routing next-steps, and SECURITY.md.
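The bounded window that `pause [30s|5m|2h]` accepts can be sketched as below. This is a minimal illustration, not the project's parser: the function name `parse_pause_window` is hypothetical, and clamping an over-long request to 24h (rather than rejecting it) is an assumption about what "capped" means.

```rust
use std::time::Duration;

// Values taken from the description above: default 5m, capped at 24h.
const DEFAULT_PAUSE: Duration = Duration::from_secs(5 * 60);
const MAX_PAUSE: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical helper: turn an optional `30s` / `5m` / `2h` argument
/// into a bounded pause window.
fn parse_pause_window(arg: Option<&str>) -> Result<Duration, String> {
    // No argument means the default window.
    let Some(raw) = arg else {
        return Ok(DEFAULT_PAUSE);
    };
    let raw = raw.trim();
    // Require at least one digit plus a one-byte unit suffix.
    if raw.len() < 2 || !raw.is_ascii() {
        return Err(format!("invalid duration: {raw:?}"));
    }
    let (digits, unit) = raw.split_at(raw.len() - 1);
    let n: u64 = digits
        .parse()
        .map_err(|_| format!("invalid duration: {raw:?}"))?;
    let per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3600,
        _ => return Err(format!("unknown unit in {raw:?}")),
    };
    let secs = n
        .checked_mul(per_unit)
        .ok_or_else(|| format!("duration overflows: {raw:?}"))?;
    if secs == 0 {
        return Err("pause window must be positive".to_string());
    }
    // Assumption: an over-long request is clamped to the cap, not rejected.
    Ok(Duration::from_secs(secs).min(MAX_PAUSE))
}
```

Under these assumptions, `parse_pause_window(Some("48h"))` yields a 24-hour window and `parse_pause_window(None)` yields five minutes.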

Changed

  • Blocks explain themselves. A security block names the tool that tripped it, shows a masked recognisable preview of what matched (head/tail only, middle redacted — the raw value is never echoed or logged), and states why that rule class exists.
  • Degraded states look degraded. A pause renders a loud countdown chip on every surface for its whole window; a dead proxy drops the stale cost/plan/today segments and shows only the warning plus tool-reported gauges. status --json carries the pause state for the editor extension.

Validation

  • 648 Rust tests + extension tests pass; clippy clean across all targets (-D warnings).
  • New own-process integration suite covers pause/resume, allow-once single-use atomicity, and expired-state fail-closed; regression tests pin the history-secret pass/tool-call block split, masked previews, and every surface state.
  • Verified end-to-end against the running daemon: block → allow-once → retry passes → auto-restore re-blocks; pause/resume round-trip with live surface readouts.
  • Includes a one-time repository-wide rustfmt (2024 style edition) pass as a separate commit, keeping the feature commits readable and CI's format check green.

Base branch is the previous release line.

Formatting-only churn the current stable rustfmt applies under the 2024
style edition (import ordering, assert!/builder reflow). Committed
separately so the feature commits that follow stay readable. No
behavioral change.

…ining

A key-shaped token sitting in resent conversation history or prose
(system prompt, chat text, tool results, a /compact summary) no longer
403s the session. Clients resend the whole conversation every turn, so
a secret/DLP hit in settled text re-blocked every request until the
session was abandoned -- a live dogfooding wedge where an innocent
one-line question was rejected because the conversation merely
discussed an example AWS key. Data checks (secrets, cards, SSNs) now
follow the same latest-turn scoping as the command checks: they fire
only inside the in-flight tool round, the agent ACTION surface. A
credential inside a tool call -- the real exfiltration vector -- still
blocks, in both the LLM and MCP paths.
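A rough sketch of the scoping rule described above: data checks look only at tool-call arguments in the latest turn and ignore prose and resent history. All types and names here (`Block`, `Message`, `Violation`, `check_in_flight_tool_calls`) are hypothetical, the "latest turn" is simplified to the last message, and the AWS-key matcher is a deliberately crude stand-in for the real rule set.

```rust
/// Hypothetical content block within one chat message.
#[allow(dead_code)]
enum Block {
    Text(String),
    ToolCall { tool: String, arguments: String },
    ToolResult(String),
}

/// Hypothetical message shape: an ordered list of blocks.
struct Message {
    blocks: Vec<Block>,
}

/// Hypothetical violation record carrying the originating tool name.
struct Violation {
    tool: String,
}

/// Crude stand-in detector: "AKIA" followed by 16 uppercase letters or digits.
fn looks_like_aws_key(s: &str) -> bool {
    s.as_bytes().windows(20).any(|w| {
        w.starts_with(b"AKIA")
            && w[4..]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
    })
}

/// Scan only tool-call arguments in the most recent message.
/// Text and tool results, in any turn, are never inspected.
fn check_in_flight_tool_calls(history: &[Message]) -> Option<Violation> {
    let latest = history.last()?;
    latest.blocks.iter().find_map(|block| match block {
        Block::ToolCall { tool, arguments } if looks_like_aws_key(arguments) => {
            Some(Violation { tool: tool.clone() })
        }
        _ => None,
    })
}
```

With this shape, a conversation that merely quotes an example key passes, while the same key inside the arguments of the in-flight tool call produces a violation naming that tool.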

Blocks also explain themselves now: the violation carries the
originating tool name and, for secret/DLP hits, a masked recognisable
preview (AKIA...LKEY -- head/tail only, middle redacted). The raw value
is never echoed to logs or storage; the preview rides only in the 403
body to the local client. Each block states the one-line rationale for
its rule class, so a block reads as a reasoned decision instead of an
opaque refusal.
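The head/tail preview could look something like the following. This is a sketch under assumptions: the function name `masked_preview`, the four-character head and tail, and the rule for fully redacting short values are all illustrative choices, not taken from the change itself.

```rust
/// Hypothetical helper: keep only the first and last few characters of a
/// matched value so it is recognisable without being recoverable.
fn masked_preview(secret: &str) -> String {
    // Assumption: four characters at each end, as in the AKIA...LKEY example.
    const KEEP: usize = 4;
    let chars: Vec<char> = secret.chars().collect();
    // Assumption: values too short to mask meaningfully are fully redacted,
    // so the preview never reveals most of a short secret.
    if chars.len() < KEEP * 3 {
        return "[redacted]".to_string();
    }
    let head: String = chars[..KEEP].iter().collect();
    let tail: String = chars[chars.len() - KEEP..].iter().collect();
    format!("{head}...{tail}")
}
```

Working on `char`s rather than bytes keeps the slicing safe if a matched value ever contains non-ASCII text.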

The escape hatch is a small auto-expiring state file
(~/.burnwall/pause.json) that the proxy checks on every request, so
protection can be paused and restored on the RUNNING daemon: no daemon
restart, no AI-tool restart, and the agent session and its context
survive. The previous remediation (set an environment variable and
restart the AI tool) set the variable in the tool's shell, which a
backgrounded daemon never sees; it cost users their session to
discover that it did nothing.

- `burnwall allow-once`: exactly the next request relays unchecked
  (the file delete is the atomic claim, so concurrent requests cannot
  double-spend it), then protection restores itself. Unused, it
  expires after 10 minutes.
- `burnwall pause [30s|5m|2h]`: bounded relay window, default 5m,
  capped at 24h. `burnwall resume` restores early; expiry restores
  automatically and self-cleans the file. Garbage or expired state
  fails closed (protection on).
- Fast path cost: one stat() of an absent file per request.
- Block remedies across all five block types now point at the runtime
  toggles, escalating inspect -> allow-once -> narrow -> pause -> stop.
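The two properties in the list above, delete-as-claim and fail-closed state, can be illustrated with a small sketch. The function names are hypothetical, and the state encoding is an assumption: the real file is JSON with a schema not shown here, so `is_paused` stands in with a bare Unix timestamp. The 10-minute expiry of an unused allow-once token is left out.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: claim a one-shot token by deleting its file.
/// Removing a file is atomic on POSIX filesystems, so when several
/// requests race, exactly one `remove_file` call succeeds.
fn try_consume_allow_once(token: &Path) -> bool {
    // Any error (already claimed, missing, unreadable) means "not claimed".
    fs::remove_file(token).is_ok()
}

/// Hypothetical fail-closed check. `raw` is the state file's contents,
/// or `None` if the file is absent. Assumption: the deadline is a bare
/// Unix timestamp in seconds.
fn is_paused(raw: Option<&str>, now_secs: u64) -> bool {
    match raw.and_then(|s| s.trim().parse::<u64>().ok()) {
        // Paused only while the deadline is still in the future.
        Some(expires_at) => now_secs < expires_at,
        // Absent or garbage state: protection stays on.
        None => false,
    }
}
```

Using the delete itself as the claim avoids a read-then-delete window in which two concurrent requests could both observe the token and both pass.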

End-to-end tests live in their own test binary: the proxy_test binary
flips the process-global BURNWALL_BYPASS env var, which would race the
pause assertions in shared-process runs.

- A known subscriber never sees a notional dollar figure where the
  plan segment belongs. Once any plan snapshot exists, the status line
  and `watch` stay in plan mode: fresh readings show live headroom
  with a countdown; stale readings (idle >12h, or the proxy was down)
  and fresh readings whose binding window reset behind the user both
  render last-known headroom marked `~ ... idle` -- no live countdown,
  no throttle claim. `status` frames a subscriber dollar figure as
  notional spend instead of a budget breach.
- A runtime pause renders loudly on every surface for its whole
  window: the ribbon shows a `PAUSED (unprotected)` chip with a
  countdown, `status` overrides the green heartbeat with the paused
  warning and the resume command, and `status --json` carries
  protection_paused / pause_resumes_in_secs for the editor extension.
- A down proxy looks down: when routing points at a dead port, the
  ribbon drops the cost/plan/today/block-count segments (nothing is
  being captured, so all of them would be stale) and keeps only the
  warning plus the tool-reported token and context gauges.
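One way to picture the surface rules above is a function from proxy state to ribbon segments. Everything here is a hypothetical sketch: the `ProxyState` and `Readings` types, the `proxy down` warning text, and the countdown format are assumptions; only the `PAUSED (unprotected)` chip and the set of dropped segments come from the description.

```rust
/// Hypothetical proxy states a surface has to render.
enum ProxyState {
    Healthy,
    Paused { resumes_in_secs: u64 },
    Down,
}

/// Hypothetical pre-rendered ribbon segments.
struct Readings {
    cost: String,
    plan: String,
    today: String,
    blocks: String,
    tokens: String,
    context: String,
}

/// Assumed countdown format, e.g. 270 seconds renders as "4m30s".
fn countdown(secs: u64) -> String {
    format!("{}m{:02}s", secs / 60, secs % 60)
}

fn all_segments(r: &Readings) -> Vec<String> {
    vec![
        r.cost.clone(),
        r.plan.clone(),
        r.today.clone(),
        r.blocks.clone(),
        r.tokens.clone(),
        r.context.clone(),
    ]
}

fn ribbon(state: &ProxyState, r: &Readings) -> Vec<String> {
    match state {
        ProxyState::Healthy => all_segments(r),
        // A pause leads with the loud chip for its whole window.
        ProxyState::Paused { resumes_in_secs } => {
            let mut segs = vec![format!(
                "PAUSED (unprotected) {}",
                countdown(*resumes_in_secs)
            )];
            segs.extend(all_segments(r));
            segs
        }
        // Nothing is being captured, so the proxy-derived segments are
        // dropped; only the warning and tool-reported gauges remain.
        ProxyState::Down => vec![
            "proxy down".to_string(), // warning text is an assumption
            r.tokens.clone(),
            r.context.clone(),
        ],
    }
}
```

Modelling the states as an enum forces every surface to handle the degraded cases explicitly instead of falling back to stale numbers.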

Bump version to 0.9.15 across the crate, extension, and MCP manifest.
CHANGELOG entry covers the session-wedge fix, self-explaining blocks,
the live escape hatch, and the surface honesty pass. README gains a
False positives section documenting allow-once / pause / resume /
report-bug. Registers the pause_test integration target.

init / enable-routing next-steps and SECURITY.md still advised
BURNWALL_BYPASS=1, which never reaches a running daemon (env is frozen
at spawn). Point all three at the live escape hatch instead.

@codehippie1 (Contributor, Author)

Superseded: this release branch was reconciled onto main via #12 (merge commit 6b9e53e). Every commit from this branch is contained in main (verified by patch-id), so there is nothing left to merge here. Closing as landed-upstream.
