perf: SSH connection churn + render/query inefficiencies (audited, batch fix)

## Summary

Audit of the app surfaced a cluster of efficiency issues. The biggest is **SSH connection churn**: `SSHKit.run/2` opens and closes a fresh SSH connection on *every* command (verified in `deps/sshkit/lib/sshkit.ex:313-318` — `connect → run → close`), and the wrapper docstring claiming per-process connection caching is **factually wrong**. `ConnectionCheck.probe/1` pays this 6x per server per tick.

We want to fix these in one clean slice, with a benchmark task so we can measure before/after.

---

## Verification harness (build this first)

Add a one-off mix task so we can measure without waiting on ticks:

```
mix mast.bench --server <host-or-id> [--runs N] [--worker connection_check|stats_collect|patch_scan|app_probe]
```

- Runs the chosen worker's probe path against one real server N times (default ~10).
- Prints **SSH call count** and **min / median / max wall time** per worker.
- Defaults to running all SSH-bound workers so one invocation gives a full picture.
- Lives under `lib/mix/tasks/mast.bench.ex`, dev/test env only, no Oban enqueue — call the probe functions directly so we measure the SSH path, not the queue.

Expected baseline (what we're fixing):
```
ConnectionCheck: 6 calls, median ~4s
StatsCollect:    1 call,  median ~0.9s   # already correct, use as the model
PatchScan:       2 calls
```

Acceptance: after the fixes below, `ConnectionCheck` reports **1 call** and a markedly lower median; rerun the task and paste output into the PR.

---

## Fixes (priority order)

### 1. ConnectionCheck: 6 SSH connections → 1 combined command  — HIGH
**`lib/mast/workers/connection_check.ex:56-62`**

`probe/1` issues 6 separate `SSH.run/2` calls (os-release, top, free, df, loadavg, nproc). Each is a full TCP+SSH handshake + DB key load + key decryption. `StatsCollect` already solves this: one command with `echo __MARKER__` delimiters, parsed by section (`lib/mast/workers/stats_collect.ex:29-36`). Mirror that pattern here.

- RED: stub-backed test asserting `probe/1` issues exactly **one** `SSH.run`. (Currently 6.)
- GREEN: combined heredoc command with markers; split output, route each section to the existing `OS.parse` / `Metrics.parse_*` calls.
- Keep the `with`-style error handling: a non-zero exit on the combined command still marks the server unreachable.

### 2. Correct the SSHKit docstring — HIGH (do alongside #1)
**`lib/mast/ssh/sshkit.ex:6-9`**

The "Connection caching: SSHKit opens a connection on first use within the calling process... yields connection reuse" claim is false. `SSHKit.run/2` connects and closes per call. Rewrite to state that each `run/2` is a fresh connection, so callers must batch multiple commands into a single command string (and point at StatsCollect as the reference). This misled the audit itself.

### 3. PatchScan: 2 connections → 1 — MED
**`lib/mast/workers/patch_scan.ex:88-89`**

`apt-get update` and `apt list --upgradable` run as two `SSH.run/2` calls. Collapse to `apt-get update && apt list --upgradable` (preserve `LANG=C` and the `sudo -n` prefix per the security guardrails). One connection.

### 4. DashboardLive: memoize derived data out of render/1 — HIGH (cheap)
**`lib/mast_web/live/dashboard_live.ex:156-157`**

`render/1` recomputes `filtered/2` + `fleet_stats/2` (full list scans) on every render, including diffs unrelated to the server list. Compute `:visible_servers` and `:stats` in assigns, refreshed only when `:servers` or `:filter` changes (mount + the relevant handle_* clauses).

### 5. ServerLive: stop over-fetching on PubSub — HIGH
**`lib/mast_web/live/server_live.ex:74-87, 132-133`**

`handle_info(:server_updated, ...)` reloads audit activity (`:132`) and re-runs `assign_series`/`Fleet.list_stats` (`:133`) on every status/metrics broadcast — neither of which a metrics update changes. Gate the stats refetch to range changes / stat-collection events, and only reload activity on explicit navigation or audit-affecting events.

### 6. ~~Release handle lookups: filter in SQL~~ → split to #28  — MOVED

This item needed a migration (a Postgres generated column + unique index) to be done correctly, and the original "replace the advisory `taken?/3` check" framing was wrong — `taken?/3` is kept as the UX layer, the DB constraint is added as the race backstop. Settled in a `/domain-plan` session (ADR 0008 amended, CONTEXT.md gains an **Effective Handle** term). Tracked separately in **#28**. It is out of this clean slice.

### 7. Apps.upsert_from_probe: duplicate query — MED
**`lib/mast/apps.ex:54, 94`**

Calls `list_for_release/1` twice per probe (diff, then return). Reuse the post-write in-memory list instead of re-querying.

---

## Lower priority (note here, fix opportunistically)

- `ReleaseLive` log_buffer (`lib/mast_web/live/release_live.ex:240-269`) accumulates streamed lines in a regular assign — diffed per line. Use `stream`/`temporary_assigns`. — MED
- `AuditLive` re-presents the full event list each render (`lib/mast_web/live/audit_live.ex:99`). — MED
- `StatsDownsample.process_transition/1` (`lib/mast/workers/stats_downsample.ex:46-57`) groups all source rows in memory; move to SQL aggregation as fleet grows. — MED
- Oban `unique: [period: 50-55]` is shorter than the trigger window in `connection_check.ex:22`, `stats_collect.ex:20`, `app_probe.ex:14` — a job running longer than the period can enqueue a duplicate. Set period ≥ window. — LOW
- Ticker default 30s (`config/dev.exs:25`) fans out into `checks: 5` queue (`config/config.exs:22`); fixing #1 (6→1 connections) ~6x's the headroom. Confirm prod has no 30s override. — LOW

---

## Done =

- [ ] `mix mast.bench` task exists and reports SSH call count + timing per worker
- [ ] ConnectionCheck issues 1 SSH call (test-asserted); benchmark shows the drop
- [ ] PatchScan issues 1 SSH call
- [ ] SSHKit docstring corrected
- [ ] DashboardLive + ServerLive render/refetch fixes
- [ ] Release SQL-filter items resolved or split out (see open question)
- [ ] `mix precommit` green



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: SSH connection churn + render/query inefficiencies (audited, batch fix) #27

Summary

Verification harness (build this first)

Fixes (priority order)

1. ConnectionCheck: 6 SSH connections → 1 combined command — HIGH

2. Correct the SSHKit docstring — HIGH (do alongside #1)

3. PatchScan: 2 connections → 1 — MED

4. DashboardLive: memoize derived data out of render/1 — HIGH (cheap)

5. ServerLive: stop over-fetching on PubSub — HIGH

6. Release handle lookups: filter in SQL → split to #28 — MOVED

7. Apps.upsert_from_probe: duplicate query — MED

Lower priority (note here, fix opportunistically)

Done =

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

perf: SSH connection churn + render/query inefficiencies (audited, batch fix) #27

Description

Summary

Verification harness (build this first)

Fixes (priority order)

1. ConnectionCheck: 6 SSH connections → 1 combined command — HIGH

2. Correct the SSHKit docstring — HIGH (do alongside #1)

3. PatchScan: 2 connections → 1 — MED

4. DashboardLive: memoize derived data out of render/1 — HIGH (cheap)

5. ServerLive: stop over-fetching on PubSub — HIGH

6. Release handle lookups: filter in SQL → split to #28 — MOVED

7. Apps.upsert_from_probe: duplicate query — MED

Lower priority (note here, fix opportunistically)

Done =

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions