Merged
31 changes: 31 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,37 @@

All notable changes to Burnwall.

## [0.11.1] — 2026-06-18

A resilience release for the proxy lifecycle: stopping Burnwall can no longer
leave an already-running AI tool stranded on a dead port.

### Changed

- **`burnwall stop` no longer cuts off a tool that's still running.** By default
it now hands the proxy off to a pass-through relay and leaves the port serving,
so a tool mid-session keeps working instead of failing with a bare connection
error. The relay does no scanning, budget, or cost capture (protection is off)
and retires itself once traffic goes idle, freeing the port. Use
`burnwall stop --hard` to terminate immediately and free the port now.
- **Clearer recovery guidance.** When a shell is routed to a proxy that isn't
answering, `burnwall status` now points to `burnwall start` (revive — running
tools recover at once) and `burnwall recover` (go direct, then restart tools),
and shows a distinct "stopped (draining)" state instead of a misleading green.

### Added

- **Guard watchdog on by default.** `burnwall start --daemon` now also runs the
guard, which notices a silently-dead proxy within seconds and pauses shell
routing so new shells go direct (with a best-effort relaunch). Opt out with
`--no-guard`.

### Fixed

- The guard watched the configured port even when the proxy was started on a
different `--port`, so a non-default port could make it treat a healthy proxy
as dead. It now watches the proxy's actual port.

## [0.11.0] — 2026-06-18

A dashboard-polish release: clearer, more glanceable surfaces, plus two new
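The 0.11.1 entry says the pass-through relay "retires itself once traffic goes idle", but the idle-retire monitor itself is not part of this diff. The following is only a minimal sketch of how such an idle check could be modelled. `IdleTracker`, `note_traffic`, and `should_retire` are hypothetical names chosen for illustration, and the idle window is a caller-supplied assumption, not a value taken from Burnwall.

```rust
use std::time::{Duration, Instant};

/// Hypothetical idle tracker for a draining relay (not Burnwall's real type).
pub struct IdleTracker {
    last_traffic: Instant,
    idle_after: Duration,
}

impl IdleTracker {
    /// Start tracking at `now`; `idle_after` is an assumed, caller-chosen window.
    pub fn new(now: Instant, idle_after: Duration) -> Self {
        Self { last_traffic: now, idle_after }
    }

    /// Record that a request was relayed at `now`.
    pub fn note_traffic(&mut self, now: Instant) {
        self.last_traffic = now;
    }

    /// Retire only while draining, and only once the idle window has elapsed
    /// since the last relayed request.
    pub fn should_retire(&self, now: Instant, draining: bool) -> bool {
        draining && now.saturating_duration_since(self.last_traffic) >= self.idle_after
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the tracker, keeps the check deterministic and easy to unit test. A non-draining (fully protecting) proxy never retires in this sketch, however long it sits idle.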
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "burnwall"
-version = "0.11.0"
+version = "0.11.1"
edition = "2024"
rust-version = "1.87"
description = "Local proxy for AI coding tools (Claude Code, Codex CLI, Aider): cache-aware cost tracking, path/command security checks, daily budget enforcement. Zero telemetry."
2 changes: 1 addition & 1 deletion editor/vscode/package.json
@@ -2,7 +2,7 @@
"name": "burnwall",
"displayName": "Burnwall",
"description": "Cost + security for your AI coding agents, at a glance — reads your local Burnwall CLI.",
-"version": "0.11.0",
+"version": "0.11.1",
"publisher": "intbot",
"license": "FSL-1.1-MIT",
"repository": { "type": "git", "url": "https://github.com/intbot/burnwall" },
2 changes: 1 addition & 1 deletion packaging/mcp/server.json
@@ -6,7 +6,7 @@
"url": "https://github.com/intbot/burnwall",
"source": "github"
},
-"version": "0.11.0",
+"version": "0.11.1",
"packages": [
{
"registryType": "oci",
42 changes: 42 additions & 0 deletions src/bypass.rs
@@ -46,6 +46,12 @@ pub const DEFAULT_PAUSE_SECS: u64 = 5 * 60;
pub const MAX_PAUSE_SECS: u64 = 24 * 3600;
/// How long an unused allow-once stays armed before it expires.
pub const ALLOW_ONCE_TTL_SECS: u64 = 10 * 60;
/// Backstop expiry for a `Drain` (the relay a soft `burnwall stop` leaves
/// behind to keep already-running tools alive). The real teardown is the
/// proxy's idle-retire monitor; this is only a safety net so a drainer that
/// somehow never goes idle can't relay unchecked forever. A fresh `start`
/// also clears any stale drain on boot, so protection is never silently off.
pub const DRAIN_BACKSTOP_SECS: u64 = 12 * 3600;

/// On-disk shape. Tiny and stable: a mode tag plus an absolute expiry.
#[derive(Debug, Serialize, Deserialize)]
@@ -59,6 +65,11 @@ struct StateFile {
enum Mode {
    Pause,
    AllowOnce,
    /// Soft-`stop` drain: relay everything unchecked, like `Pause`, but with no
    /// auto-resume — the proxy is on its way out and only stays up to keep
    /// already-running tools off a dead port. The proxy's idle-retire monitor
    /// shuts it down once traffic stops; `DRAIN_BACKSTOP_SECS` is the safety net.
    Drain,
}

/// The live bypass state, as the proxy and status surfaces see it.
@@ -71,6 +82,10 @@ pub enum Bypass {
    /// The next request relays unchecked (consume-on-use), then protection
    /// restores. Expires unused after the TTL.
    AllowOnce { expires_in_secs: i64 },
    /// A soft `burnwall stop` left the proxy up as a pure relay so
    /// already-running tools don't hit a dead port. Relays unchecked; the
    /// proxy retires itself once traffic goes idle. No auto-resume.
    Draining,
}

/// Default state-file path (`<data dir>/pause.json`), `None` if no data dir
@@ -105,9 +120,17 @@ pub fn read_at(path: &Path, now: i64) -> Bypass {
        Mode::AllowOnce => Bypass::AllowOnce {
            expires_in_secs: remaining,
        },
        Mode::Drain => Bypass::Draining,
    }
}

/// True if a drain (soft-stop relay) is currently in effect at the default
/// path. Used by `start` (to retire a stale drainer and take over the port)
/// and by the proxy's idle-retire monitor.
pub fn is_draining(now: i64) -> bool {
    matches!(read(now), Bypass::Draining)
}

/// Read the bypass state at the default path.
pub fn read(now: i64) -> Bypass {
    match default_path() {
@@ -135,6 +158,13 @@ pub fn arm_allow_once(now: i64) -> std::io::Result<i64> {
    write_state(Mode::AllowOnce, now + ALLOW_ONCE_TTL_SECS as i64)
}

/// Enter drain (soft `burnwall stop`): the running proxy relays unchecked and
/// retires itself when idle. Backstopped at [`DRAIN_BACKSTOP_SECS`] so it can
/// never silently relay forever. Returns the expiry timestamp written.
pub fn drain(now: i64) -> std::io::Result<i64> {
    write_state(Mode::Drain, now + DRAIN_BACKSTOP_SECS as i64)
}

/// Clear any pause / armed allow-once. `Ok(true)` if a file was removed.
pub fn clear() -> std::io::Result<bool> {
    let Some(path) = default_path() else {
@@ -244,6 +274,18 @@ mod tests {
        assert_eq!(read_at(&p, 1000), Bypass::None);
    }

    #[test]
    fn drain_reads_as_draining_until_backstop() {
        let p = temp_path("drain-active.json");
        write_at(&p, Mode::Drain, 5000);
        assert_eq!(read_at(&p, 1000), Bypass::Draining);
        // Past the backstop it self-clears (protection restores) just like the
        // other modes — a drainer can never relay unchecked forever.
        write_at(&p, Mode::Drain, 1000);
        assert_eq!(read_at(&p, 1000), Bypass::None);
        assert!(!p.exists());
    }

    #[test]
    fn expired_allow_once_is_none() {
        let p = temp_path("allow-once-expired.json");
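The `bypass.rs` hunks above show only fragments of `read_at`, so the mapping from the on-disk mode and expiry to the live `Bypass` state has to be pieced together from the match arms and the new test. As a rough, self-contained model of that mapping (with no file I/O), the sketch below uses stand-in enums. The `Paused` variant name, the `resolve` function, and the "expired when remaining is zero or less" rule are assumptions inferred from the visible test, not code copied from the repository.

```rust
/// Stand-in for the on-disk mode tag shown in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Pause,
    AllowOnce,
    Drain,
}

/// Stand-in for the live bypass state. `Paused` is an assumed variant name;
/// the diff does not show how the pause state is spelled.
#[derive(Debug, PartialEq, Eq)]
pub enum Bypass {
    None,
    Paused { expires_in_secs: i64 },
    AllowOnce { expires_in_secs: i64 },
    Draining,
}

/// Hypothetical pure version of the state lookup: an expired entry of any
/// mode reads as `Bypass::None`, so even a drain cannot outlive its backstop.
pub fn resolve(mode: Mode, expires_at: i64, now: i64) -> Bypass {
    let remaining = expires_at - now;
    if remaining <= 0 {
        return Bypass::None;
    }
    match mode {
        Mode::Pause => Bypass::Paused { expires_in_secs: remaining },
        Mode::AllowOnce => Bypass::AllowOnce { expires_in_secs: remaining },
        // A drain carries no countdown: the expiry is only a safety net.
        Mode::Drain => Bypass::Draining,
    }
}
```

With the same numbers as `drain_reads_as_draining_until_backstop`, an expiry of 5000 read at 1000 resolves to `Draining`, and an expiry of 1000 read at 1000 resolves to `None`. The real `read_at` additionally removes the state file once it has expired, which this model leaves out.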
8 changes: 6 additions & 2 deletions src/cli/accuracy.rs
@@ -152,8 +152,12 @@ fn write_table(w: &mut impl Write, r: &AccuracyReport) -> std::io::Result<()> {
    let cards = [
        Card::new("On-wire", &format!("${:.2}", r.total_real), "cache-aware")
            .with_value_color(Color::Green),
-        Card::new("Naive tally", &format!("${:.2}", r.total_naive), "sticker rate")
-            .with_value_color(Color::Yellow),
+        Card::new(
+            "Naive tally",
+            &format!("${:.2}", r.total_naive),
+            "sticker rate",
+        )
+        .with_value_color(Color::Yellow),
        Card::new(
            "Overstated",
            &format!("{:.0}%", pct),
127 changes: 122 additions & 5 deletions src/cli/daemon.rs
@@ -105,6 +105,107 @@ pub fn running_pid() -> anyhow::Result<Option<u32>> {
    }
}

/// Decide whether a fresh `start` may proceed. Returns `Some(pid)` if a
/// fully-protecting proxy is already running — the caller must refuse to start a
/// second one. Returns `None` if the path is clear: either nothing was running,
/// or a DRAIN-only relay (left by a soft `burnwall stop` to keep already-running
/// tools alive) was retired here to free the port. Shared by the foreground
/// `start` and the `--daemon` launcher so `stop` → `start` re-arms protection
/// instead of failing "already running".
pub fn protecting_proxy_blocking_start() -> anyhow::Result<Option<u32>> {
    let Some(pid) = running_pid()? else {
        return Ok(None);
    };
    if !crate::bypass::is_draining(chrono::Utc::now().timestamp()) {
        return Ok(Some(pid)); // a real, protecting proxy — caller should bail
    }
    tracing::info!("retiring the draining proxy (PID {pid}) to start a protected one");
    let _ = request_graceful_shutdown(pid);
    let deadline = Instant::now() + Duration::from_secs(12);
    while process_is_alive(pid) && Instant::now() < deadline {
        std::thread::sleep(Duration::from_millis(100));
    }
    if process_is_alive(pid) {
        let _ = terminate_process(pid);
    }
    remove_pid_file().ok();
    clear_shutdown_file();
    Ok(None)
}

// ───────────────────────── guard watchdog lifecycle ─────────────────────────
//
// `start --daemon` spawns a `burnwall guard` watchdog alongside the proxy
// (unless `--no-guard`). It outlives a proxy crash and auto-pauses routing so a
// silently-dead proxy (the classic Windows AV-quarantine case) can't keep
// stranding new shells. Tracked by its own PID file so `stop` can retire it and
// a second `start` doesn't stack duplicates.

/// Absolute path to the guard watchdog's PID file
/// (`<data dir>/burnwall.guard.pid`).
pub fn guard_pid_file_path() -> anyhow::Result<PathBuf> {
    Ok(data_dir()
        .context("locating the Burnwall data directory")?
        .join("burnwall.guard.pid"))
}

/// PID of a live guard watchdog, if one is running. A file pointing at a dead
/// (or reused, non-burnwall) PID is stale — removed, and `None` returned.
pub fn running_guard_pid() -> anyhow::Result<Option<u32>> {
    let path = guard_pid_file_path()?;
    let contents = match fs::read_to_string(&path) {
        Ok(c) => c,
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e).with_context(|| format!("reading {}", path.display())),
    };
    match contents.trim().parse::<u32>() {
        Ok(pid) if pid > 0 && process_is_alive(pid) => Ok(Some(pid)),
        _ => {
            let _ = fs::remove_file(&path);
            Ok(None)
        }
    }
}

/// Spawn the guard watchdog as a detached process (if one isn't already
/// running) and record its PID. Best-effort restart of a crashed proxy is on
/// (`--restart`): the guard's primary action, pausing routing, always happens
/// first, so a quarantined binary fails the relaunch safely rather than
/// stranding shells. Returns the guard PID.
pub fn spawn_guard(port: u16) -> anyhow::Result<u32> {
    if let Some(pid) = running_guard_pid()? {
        return Ok(pid); // already watching
    }
    let exe = std::env::current_exe().context("locating the burnwall executable")?;
    let pid = spawn_detached(
        &exe,
        &[
            "guard".to_string(),
            "--port".to_string(),
            port.to_string(),
            "--restart".to_string(),
        ],
    )
    .context("spawning the guard watchdog")?;
    let path = guard_pid_file_path()?;
    if let Some(parent) = path.parent() {
        let _ = fs::create_dir_all(parent);
    }
    let _ = fs::write(&path, pid.to_string());
    Ok(pid)
}

/// Retire the guard watchdog (called by `stop`): terminate it if running and
/// clear its PID file. Best-effort — a stop must never fail on guard cleanup.
pub fn stop_guard() {
    if let Ok(Some(pid)) = running_guard_pid() {
        let _ = terminate_process(pid);
    }
    if let Ok(path) = guard_pid_file_path() {
        let _ = fs::remove_file(path);
    }
}

/// How the previous daemon run ended, inferred at the next start.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PriorExit {
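The decision inside `protecting_proxy_blocking_start` reduces to two inputs: whether a proxy PID is live, and whether the bypass state is a drain. The sketch below isolates that decision as a pure function, leaving out the shutdown request, the 12-second wait, and the PID-file cleanup. `StartDecision` and `decide_start` are hypothetical names introduced for illustration; the real function returns `anyhow::Result<Option<u32>>` and performs the retirement itself.

```rust
/// Hypothetical summary of what a fresh `start` should do.
#[derive(Debug, PartialEq, Eq)]
pub enum StartDecision {
    /// Nothing is running: start normally.
    Proceed,
    /// Only a soft-stop drain relay is running: retire it, then start.
    RetireDrainer(u32),
    /// A fully protecting proxy is running: refuse a second start.
    Refuse(u32),
}

/// Pure version of the branch structure in the diff (side effects omitted).
pub fn decide_start(running_pid: Option<u32>, draining: bool) -> StartDecision {
    match (running_pid, draining) {
        (None, _) => StartDecision::Proceed,
        (Some(pid), true) => StartDecision::RetireDrainer(pid),
        (Some(pid), false) => StartDecision::Refuse(pid),
    }
}
```

In the real code the first two cases both surface as `Ok(None)` to the caller, which is why `stop` followed by `start` re-arms protection instead of failing with "already running".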
@@ -170,7 +271,9 @@ pub fn note_clean_exit() {
/// Re-exec `burnwall start` (without `--daemon`) as a detached background
/// process, then wait for it to write its PID file before returning.
pub async fn spawn_background(args: &StartArgs) -> anyhow::Result<()> {
-    if let Some(pid) = running_pid()? {
+    // A fully-protecting proxy blocks a second start; a soft-stop drain relay is
+    // retired here so `stop` → `start --daemon` re-arms protection seamlessly.
+    if let Some(pid) = protecting_proxy_blocking_start()? {
        anyhow::bail!(
            "Burnwall is already running (PID {pid}). Use `burnwall stop` to stop it first."
        );
@@ -204,6 +307,17 @@ pub async fn spawn_background(args: &StartArgs) -> anyhow::Result<()> {
resolved_port(args)
));
}
// Guard watchdog (default): outlives a proxy crash and auto-pauses
// routing so a silently-dead proxy can't keep stranding new shells.
// Opt out with `--no-guard`.
if !args.no_guard {
match spawn_guard(resolved_port(args)) {
Ok(gpid) => println!(
" Watchdog: guard running (PID {gpid}) — auto-recovers routing if the proxy dies."
),
Err(e) => tracing::warn!("could not start the guard watchdog: {e}"),
}
}
// Name the log file so a later crash is diagnosable (L-H2) —
// before this, a dead daemon left nothing to look at.
if let Some(log) = resolved_child_log_path() {
@@ -542,13 +656,16 @@ pub fn process_is_alive(pid: u32) -> bool {
#[cfg(unix)]
fn process_is_burnwall(pid: u32) -> bool {
    // Linux: /proc/<pid>/exe symlink. macOS: no /proc — fall back to `ps`.
+    // Match against the FULL image path, not just the file name: the real
+    // binary's path always contains "burnwall" (its dir and/or file name), and
+    // this keeps the three platforms consistent — Windows checks the full image
+    // path and macOS's `ps -o comm=` returns the full path too. A bare file-name
+    // check diverged on Linux and read a binary launched from a burnwall checkout
+    // (e.g. the `daemon_test-*` integration runner) as "not burnwall".
    #[cfg(target_os = "linux")]
    {
        match std::fs::read_link(format!("/proc/{pid}/exe")) {
-            Ok(p) => p
-                .file_name()
-                .map(|n| n.to_string_lossy().contains("burnwall"))
-                .unwrap_or(true),
+            Ok(p) => p.to_string_lossy().contains("burnwall"),
            Err(_) => true,
        }
    }
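To make the Linux change in `process_is_burnwall` concrete, the sketch below places the old file-name check and the new full-path check side by side as pure functions over a path. The function names and the example paths in the usage note are hypothetical; only the two matching expressions are taken from the diff.

```rust
use std::path::Path;

/// Old behaviour: look only at the final path component.
pub fn matches_by_file_name(exe: &Path) -> bool {
    exe.file_name()
        .map(|n| n.to_string_lossy().contains("burnwall"))
        .unwrap_or(true)
}

/// New behaviour: look at the whole image path.
pub fn matches_by_full_path(exe: &Path) -> bool {
    exe.to_string_lossy().contains("burnwall")
}
```

For an illustrative test-runner path such as `/home/dev/burnwall/target/debug/deps/daemon_test-0a1b2c`, the file-name check returns `false` while the full-path check returns `true`, which is the divergence the comment in the diff describes. The trade-off is that the full-path check also accepts any unrelated binary that happens to live under a directory whose name contains "burnwall".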