daemon leaks pty masters — long-running daemon exhausts macOS pty pool ("out of pty devices")

## Summary

A long-running taskmux daemon accumulates pty master fds it never closes. On macOS the default pool is `kern.tty.ptmx_max = 511`; once the daemon's leaked masters, plus the ptys held by other apps, hit that cap, every `taskmux start <task>` fails with:

```
Error: out of pty devices
```

After that, no stopped task can be restarted in any registered project.

## Evidence (v0.9.10, macOS, Darwin 24.6.0)

Daemon up for a few hours, 18 projects registered, ~20 tasks actually running:

```
$ ls /dev/ttys* | wc -l
527
$ sysctl kern.tty.ptmx_max
kern.tty.ptmx_max: 511
$ lsof /dev/ptmx | awk '{print $1, $2}' | sort | uniq -c | sort -rn | head
 498 python3.1 76376     <- taskmux daemon
  59 iTerm2 27729
  10 iTermServ 27773
```

**498 pty masters held for ~20 live tasks.** Immediately after `sudo taskmux daemon restart`, the fresh daemon (same workload, all `auto_start` tasks back up) holds **23**, so roughly 475 of the old daemon's masters were leaked rather than in use:

```
  23 python3.1 90453     <- daemon after restart
```
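
To track the leak over time instead of eyeballing `lsof`, the per-process tally above can be reproduced in a few lines of Python. This is only a sketch: `count_ptmx_holders` is a hypothetical helper, not part of taskmux, and it assumes the default `lsof` column layout (command name first, PID second).

```python
from collections import Counter


def count_ptmx_holders(lsof_output: str) -> Counter:
    """Count open /dev/ptmx handles per (command, pid) in `lsof /dev/ptmx` text."""
    counts: Counter = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, whose second column is "PID".
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[(fields[0], int(fields[1]))] += 1
    return counts
```

Feeding it the text of `lsof /dev/ptmx` and calling `.most_common(3)` on the result should give the same ranking as the shell pipeline, which makes it easy to log the daemon's count periodically and compare it with the number of live tasks.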

## Likely cause

Master fds apparently aren't closed when a task exits, stops, or restarts. Workloads with churny tasks (auto-restart loops, crashing dev servers, periodic `restart`) would then leak one master per task start, so the leak grows with uptime × churn until the pool is exhausted. The supervisor's own restart cycles presumably leak too, which would explain reaching ~475 within hours.
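
For illustration, here is a minimal sketch of the lifecycle that would avoid the leak, assuming the daemon spawns tasks roughly like this. taskmux's actual spawn code is not shown in this report, so `run_task` is a hypothetical name; the point is only that the master fd is released in a `finally` on every path, including a failed spawn.

```python
import os
import pty
import subprocess


def run_task(argv):
    """Run argv on a fresh pty and always release the master fd afterwards."""
    master_fd, slave_fd = pty.openpty()
    try:
        try:
            proc = subprocess.Popen(
                argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
            )
        finally:
            # The child has its own copy; the parent never needs the slave.
            os.close(slave_fd)

        output = bytearray()
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                break  # Linux raises EIO once every slave fd is closed
            if not chunk:
                break  # macOS/BSD report a plain EOF instead
            output += chunk
        return proc.wait(), bytes(output)
    finally:
        # Without this close, each task start costs one pty master.
        os.close(master_fd)
```

A real supervisor would read from the master asynchronously rather than blocking, but the same rule applies: whichever code path reaps the child should also close the master.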

## Impact / workaround

- Hard failure of `taskmux start`/`restart` for **all** projects once the pool is exhausted.
- Recovery requires `sudo taskmux daemon restart` (root needed for :443/:80 + resolver), bouncing every registered project's tasks.

## Related observation (cosmetic but bit us)

While a task's upstream is down, the HTTPS proxy answers requests with its own plain-text body:

```
taskmux: no upstream for pagecog.localhost hint: run `taskmux start <task>` for the host '@' in project 'pagecog'.
```

Any app code that surfaces fetch-error or response text verbatim ends up showing that internal hint string in its UI (we shipped exactly that into a wizard error banner before sanitizing). Suggest serving it as a proper `502` with a `content-type: text/html` error page, and maybe an `x-taskmux: 1` header, so app-level error handling can distinguish proxy chrome from upstream responses.
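
Until something like that exists, app code can guard against the hint string itself. The sketch below is an assumption-laden example, not taskmux API: `is_proxy_error` and `user_facing_message` are hypothetical helpers, the `x-taskmux` header is only the one proposed above, and the body check relies on the `taskmux: no upstream` prefix seen in the current response.

```python
PROXY_BODY_PREFIX = "taskmux: no upstream"  # prefix of the proxy's current plain-text body


def is_proxy_error(headers: dict, body: str) -> bool:
    """Return True if a response looks like taskmux proxy chrome, not upstream output."""
    lowered = {str(k).lower(): str(v) for k, v in headers.items()}
    # Proposed marker header; it does not exist unless the suggestion is adopted.
    if lowered.get("x-taskmux") == "1":
        return True
    # Fallback for the current behaviour: match the known body prefix.
    return body.lstrip().startswith(PROXY_BODY_PREFIX)


def user_facing_message(headers: dict, body: str,
                        fallback: str = "Service temporarily unavailable.") -> str:
    """Replace proxy chrome with a generic message before it reaches the UI."""
    return fallback if is_proxy_error(headers, body) else body
```

With this in place the wizard banner would show the fallback text instead of the internal `run taskmux start` hint, while genuine upstream error bodies pass through unchanged.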
