theexperiential · dylanroscover · May 25, 2026 · May 22, 2026
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
@@ -4,6 +4,35 @@ Owlette is a cloud-connected Windows process management and remote deployment sy
 
 **Version**: 2.12.3 | **License**: FSL-1.1-Apache-2.0
 
+---
+
+## Critical Guardrails (non-negotiable — read first)
+
+**Files you must not touch:**
+- `firestore.rules` — don't modify without explicit request
+- `.tokens.enc` / credential files — never read, log, or commit
+- `owlette_installer.iss` — only modify if you understand the full build pipeline (see `.claude/skills/build-system.md`)
+
+**Agent landmines:**
+- **Never import `firebase_admin`** — we use a custom REST client
+- **Never log OAuth tokens** — not even in debug, not even partially
+- **Never modify the `firebase` section** of `config.json` during remote config updates — breaks agent registration
+- **Never use blocking operations** in the 10-second main service loop — stalls all monitoring
+- **Never spawn reconnection logic** outside `ConnectionManager` — it has circuit breaker and backoff
+
+**Web landmines:**
+- **Never call Firestore directly from components** — use hooks in `web/hooks/`
+- **Never hardcode colors** — use CSS variables / Tailwind theme tokens
+- **Never add icon libraries** beyond `lucide-react`
+
+**Workflow:**
+- **Don't push to `main` directly** — all work through `dev`, then PR
+- **Don't create new `docs/*.md` files** without being asked
+- **Don't install new npm/pip packages** without confirming first
+- **Don't modify `.claude/hooks/` or `.claude/settings.json`** without explicit request
+
+---
+
 ## In-Flight Major Initiative: roost (project distribution v2)
 
 A multi-quarter rewrite of project distribution into a content-addressed sync platform (Cloudflare R2, immutable manifests, atomic deploy, rollback). Branded as "roost" (always lowercase). Plan + tasks live at `dev/active/project-distribution-v2/`. Memory: `project_roost.md`.
@@ -67,6 +96,10 @@ Version files: `/VERSION`, `agent/VERSION`, `web/package.json`, `firestore.rules
 
 **Lint as you go — don't let errors accumulate.** After editing any web file, run `npx eslint <file>` on that file (or `npm run lint` for a broader change) and fix every error and warning you introduced before moving on. Never commit new lint errors, and never rationalise them as "pre-existing" if your edit touched the same file. The repo has historical lint debt — your job is to not add to it, and to clean up any issues in lines you modified. Same principle for TypeScript: if `tsc` / IDE diagnostics flag your change, fix it before the next edit, not at commit time.
 
+**E2E verification (two layers).** The `playwright e2e` GitHub Action ([.github/workflows/e2e.yml](../.github/workflows/e2e.yml)) gates pushes to `dev`/`main` that touch `web/**`, `firestore.rules`, or `firebase.json`.
+- **Proactive (preferred):** before pushing such changes, run `/preflight` — it runs lint, typecheck, unit tests, and the local e2e suite (the exact mirror of CI, ~45s steady-state). Fix reds locally; don't ship them to a branch that auto-deploys.
+- **Reactive (safety net):** after a `git push` to `dev`/`main` in e2e scope, the `post-push-e2e.mjs` hook reminds you to watch the triggered run with `gh run watch <id> --exit-status` (run it in the background). On failure: `gh run view <id> --log-failed`, diagnose, and **propose** a fix — never auto-fix-and-repush (`dev` auto-deploys, `main` is protected).
+
 ---
 
 ## Agent Authentication (Device Code Pairing)
@@ -99,98 +132,23 @@ Agents authenticate via a device code flow — no browser login on the target ma
 
 **Failover load balancer**: `owlette.app` is fronted by a Cloudflare LB (Railway primary, Vercel standby) defined as Terraform in `infra/cloudflare/`. Health probe is `/api/health`. Apply workflow, token scope, and the origin-hostname gotchas: `.claude/skills/cf-load-balancing.md`.
 
-**IMPORTANT: Always version up AND update the changelog BEFORE building the installer.** Bump with `node scripts/sync-versions.js X.Y.Z` and commit BEFORE running `build_installer_full.bat` — the installer bakes the version into the exe filename and binary.
-
-**IMPORTANT: `docs/changelog.md` MUST be updated before every installer build.** Add a new `## [X.Y.Z] - YYYY-MM-DD` section summarising all changes since the last release. Never build or upload an installer without a matching changelog entry.
+**IMPORTANT — installer release order (do not reorder):** bump the version (`node scripts/sync-versions.js X.Y.Z`) **and** add the `## [X.Y.Z] - YYYY-MM-DD` entry to `docs/changelog.md`, then commit — *before* building. `build_installer_full.bat` bakes the version into the exe filename and binary, and an installer must never ship without a matching changelog entry.
 
-**Agent Installer Release** (build + upload to Firebase):
-```bash
-# 1. Update changelog, bump version, commit, push
-# Edit docs/changelog.md → add [X.Y.Z] section
-node scripts/sync-versions.js X.Y.Z
-git add -A && git commit -m "chore: bump version to X.Y.Z" && git push origin dev
-
-# 2. Build installer (~5 min, non-interactive)
-# build_installer_full.bat ends with `pause` and has `pause` on every error
-# branch, so it MUST be run with stdin redirected from NUL or it will hang
-# the harness forever. Invoke by FULL PATH (cmd /c won't reliably cd via
-# PowerShell quote-stripping) and capture the log explicitly. Run in the
-# background — exit code 0 means the .exe is built; check the log on failure.
-#
-#   powershell (foreground/background):
-#     cmd /c "C:\Users\admin\Documents\Git\Owlette\agent\build_installer_full.bat < NUL > C:\Users\admin\AppData\Local\Temp\installer-build.log 2>&1"
-#
-#   bash:
-#     cd c:/Users/admin/Documents/Git/Owlette/agent && cmd //c "build_installer_full.bat" < /dev/null > /tmp/installer-build.log 2>&1
-#     # (if //c gets mangled by Git Bash, fall back to the powershell cmd /c form above)
-#
-# DO NOT use `cd agent && powershell -Command "& './build_installer_full.bat'"` —
-# the trailing pause will hang non-interactive shells indefinitely.
-# Output: agent/build/installer_output/Owlette-Installer-vX.Y.Z.exe
-
-# 3. Compute checksum
-sha256sum agent/build/installer_output/Owlette-Installer-vX.Y.Z.exe
-
-# 4. Upload via API (3-step: request URL → upload binary → finalize)
-# Endpoint is `/api/installer/upload` (api-sprint route — old `/api/admin/installer/upload` was removed).
-# Auth: api key with `installer=*:write` scope (superadmin-only at minting). `x-api-key` or `Authorization: Bearer owk_…` both work.
-# Idempotency-Key REQUIRED on both POST and PUT — the route is wrapped in `withIdempotency(..., { requireKey: true })`.
-API_KEY=$(grep OWLETTE_API_KEY .claude/.env.local | cut -d= -f2)
-BASE_URL="https://dev.owlette.app"  # or https://owlette.app for prod
-
-# Step 1: Get signed upload URL
-curl -s -X POST "$BASE_URL/api/installer/upload" \
-  -H "Content-Type: application/json" \
-  -H "x-api-key: $API_KEY" \
-  -H "Idempotency-Key: installer-upload-X.Y.Z-$(date +%s)" \
-  -d '{"version":"X.Y.Z","fileName":"Owlette-Installer-vX.Y.Z.exe","releaseNotes":"...","setAsLatest":true}'
-# → returns uploadUrl, uploadId, storagePath, expiresAt (15-min window)
-
-# Step 2: Upload binary to the signed GCS URL (no Idempotency-Key here — it's a direct GCS PUT)
-curl -X PUT "$UPLOAD_URL" -H "Content-Type: application/octet-stream" \
-  --data-binary @agent/build/installer_output/Owlette-Installer-vX.Y.Z.exe
-
-# Step 3: Finalize (verifies file in storage, computes/checks checksum, writes installer_metadata, sets as latest)
-curl -s -X PUT "$BASE_URL/api/installer/upload" \
-  -H "Content-Type: application/json" \
-  -H "x-api-key: $API_KEY" \
-  -H "Idempotency-Key: installer-finalize-X.Y.Z-$(date +%s)" \
-  -d '{"uploadId":"<from step 1>","checksum_sha256":"<sha256 from earlier>"}'
-# checksum_sha256 is optional — server computes it if omitted, but providing it gets a 412 `checksum_mismatch` on corruption.
-```
+**Full release recipe** — the non-interactive build invocation (the `pause`-hang gotcha) plus the 3-step signed-URL upload → finalize API flow — lives in `.claude/skills/build-system.md` → "Agent Installer Release". That skill auto-activates on installer/release/version work.
 
 ---
 
-## Don'ts / Guardrails
-
-### Files You Must Not Touch
-- `web/components/ui/*` — auto-generated by shadcn/ui
-- `firestore.rules` — don't modify without explicit request
-- `.tokens.enc` / credential files — never read, log, or commit
-- `owlette_installer.iss` — only modify if you understand the full build pipeline
-
-### Agent Landmines
-- **Never import `firebase_admin`** — we use a custom REST client
-- **Never log OAuth tokens** — not even in debug, not even partially
-- **Never modify the `firebase` section** of `config.json` during remote config updates — breaks agent registration
-- **Never use blocking operations** in the 10-second main service loop — stalls all monitoring
-- **Never spawn reconnection logic** outside `ConnectionManager` — it has circuit breaker and backoff
+## Conventions & Review Discipline
 
 ### UI Copy Style
 - **All user-facing copy is lowercase** — page titles, buttons, dialog headings, labels, descriptions, tooltips, placeholder text, empty-state copy, toasts. Match the voice of the rest of the UI.
 - Exceptions (keep normal casing): proper nouns/product names in external contexts, acronyms (`LLM`, `API`, `URL`, `GPU`, `OAuth`), code identifiers, machine IDs / site IDs / user-entered strings, and legal/compliance text where casing is load-bearing.
 - When adding new copy, default to lowercase. When editing existing strings, match the surrounding casing — don't mix sentence case into a lowercase screen or vice versa.
 
-### Web Landmines
-- **Never call Firestore directly from components** — use hooks in `web/hooks/`
-- **Never hardcode colors** — use CSS variables / Tailwind theme tokens
-- **Never add icon libraries** beyond `lucide-react`
-
-### General
-- **Don't push to `main` directly** — all work through `dev`, then PR
-- **Don't create new `docs/*.md` files** without being asked
-- **Don't install new npm/pip packages** without confirming first
-- **Don't modify `.claude/hooks/` or `.claude/settings.json`** without explicit request
+### Design System (shadcn/ui)
+- `web/components/ui/*` are shadcn primitives **copied into the repo — we own them.** Editing them for theming, variants, hover/focus states, and standardization is the *intended* shadcn workflow (they're scaffolding you customize, not auto-generated black boxes — they've been hand-tuned before).
+- Caveats when editing: changes are **app-wide**, so verify broadly; re-running `npx shadcn add <component>` **overwrites** that file, so port upstream fixes by hand; prefer CSS-variable tokens (`web/app/globals.css`) over hardcoded values.
+- **`button.tsx` variants are the single source of truth for button styling.** Standardize there — don't sprinkle per-instance `hover:*`/`bg-*` overrides on individual `<Button>`s (that's how the styling diverged).
 
 ### Review Discipline (code review, security review, audits)
 Reviews are judged on calibration, not volume. Three accurate findings are more valuable than twenty marginal ones, and inflated severities devalue every subsequent review on this codebase. Apply the following standard on every pass.
@@ -257,4 +215,4 @@ Be real, not flattering. If something was mid, say so. If it was genuinely great
 
 ---
 
-**Last Updated**: 2026-05-19
+**Last Updated**: 2026-05-24
diff --git a/.claude/commands/preflight.md b/.claude/commands/preflight.md
@@ -0,0 +1,55 @@
+---
+description: Pre-push gate — lint, typecheck, and run the playwright e2e suite locally before pushing web changes
+---
+
+Run the same checks CI runs, locally, *before* pushing — so a red "playwright e2e" run on dev/main becomes rare. The local e2e suite is the exact mirror of CI (same Playwright tests against the same Firebase emulators), so green here means green there for everything except cold-cache / ubuntu-vs-Windows quirks.
+
+Run this before any `git push` that touches `web/**`, `firestore.rules`, `firebase.json`, or `.github/workflows/e2e.yml`.
+
+## Process
+
+### Step 1: Scope check
+Determine whether the pending changes are in the e2e path filter (`web/**`, `firestore.rules`, `firebase.json`, `.github/workflows/e2e.yml`):
+```bash
+git status --porcelain
+git diff --name-only @{u}..HEAD 2>/dev/null   # committed-but-unpushed, if upstream exists
+```
+- If **none** of the changed files match the path filter → e2e won't run in CI. Run Step 2 (lint/typecheck) only, then report "no e2e needed".
+- Otherwise continue through all steps.
+
+### Step 2: Lint + typecheck (fast — always run)
+```bash
+cd web && npm run lint
+```
+```bash
+cd web && npx tsc --noEmit
+```
+Fix every error/warning your changes introduced before proceeding (per CLAUDE.md "lint as you go").
+
+### Step 3: Unit tests (fast)
+```bash
+cd web && npm test
+```
+
+### Step 4: E2E suite (the authoritative gate)
+This does the production build, spins up the emulators, and runs Playwright (~45s steady-state after the first build; the first run pays for `npm run build`). Run in the BACKGROUND with a generous timeout — don't block the session synchronously:
+```bash
+cd web && npm run e2e
+```
+Prereqs (already set up on this machine): JDK 21 on PATH, `firebase-tools@13` global, `npx playwright install chromium` done once. If the emulator ports (Auth :9099, Firestore :8080, Storage :9199) are busy, a stale emulator is the usual culprit — kill it by PID (never by name) and retry.
+
+### Step 5: Report + gate
+```
+## Preflight
+- Scope:      [e2e-relevant / web-only-no-e2e / out-of-scope]
+- Lint:       [PASS/FAIL]
+- Typecheck:  [PASS/FAIL]
+- Unit tests: [PASS/FAIL — X passed]
+- E2E:        [PASS/FAIL — X passed, Y failed] (or "skipped — no e2e-relevant changes")
+
+Verdict: [SAFE TO PUSH / DO NOT PUSH]
+```
+- **All green** → say it's safe to push (don't push unless the user asked).
+- **Any red** → DO NOT push. Show the failing output, diagnose the root cause, and propose a fix. The whole point of preflight is to fix it here, where the loop is 45s, instead of after a 6+ min CI round-trip on a branch that auto-deploys.
+
+On failure, the Playwright HTML report + traces land in `web/e2e/.output/` — open `web/e2e/.output/report/` for the post-mortem.
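The Step 5 gate in `preflight.md` can be sketched as a small pure function. This is only an illustration under stated assumptions: `formatPreflight` and its boolean inputs are hypothetical names, not code from this commit, and the "skipped" wording is simplified from the template.

```javascript
// Hypothetical helper, not part of the repo: renders the Step 5 report.
// Each check is true (pass) or false (fail); e2e may also be null (skipped).
function formatPreflight({ scope, lint, typecheck, unit, e2e }) {
  const label = (ok) => (ok ? 'PASS' : 'FAIL')
  const e2eText = e2e === null ? 'skipped (no e2e-relevant changes)' : label(e2e)
  // A skipped e2e run does not block the push; any explicit failure does.
  const safe = lint && typecheck && unit && e2e !== false
  return [
    '## Preflight',
    `- Scope:      ${scope}`,
    `- Lint:       ${label(lint)}`,
    `- Typecheck:  ${label(typecheck)}`,
    `- Unit tests: ${label(unit)}`,
    `- E2E:        ${e2eText}`,
    '',
    `Verdict: ${safe ? 'SAFE TO PUSH' : 'DO NOT PUSH'}`,
  ].join('\n')
}

console.log(
  formatPreflight({ scope: 'out-of-scope', lint: true, typecheck: true, unit: true, e2e: null })
)
```

The verdict is a plain conjunction: a single red check flips it to DO NOT PUSH, which matches the "any red" rule in the command text.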
diff --git a/.claude/hooks/post-push-e2e.mjs b/.claude/hooks/post-push-e2e.mjs
@@ -0,0 +1,123 @@
+/**
+ * PostToolUse Hook — Watch the Playwright E2E run after a push
+ *
+ * After a successful `git push` to dev/main that touches files in the
+ * .github/workflows/e2e.yml path filter (web/**, firestore.rules,
+ * firebase.json, the workflow itself), injects an instruction telling Claude
+ * to find the triggered "playwright e2e" run, watch it to completion via the
+ * gh CLI, and — on failure — pull the failing logs, diagnose the root cause,
+ * and PROPOSE a fix (no auto-repush; dev auto-deploys and main is protected).
+ *
+ * The hook does NOT poll CI itself — a 6–30 min `gh run watch` would hang the
+ * harness. It only detects the push and hands Claude a recipe, matching the
+ * sibling post-push-installer.mjs pattern.
+ */
+
+import { execSync } from 'child_process'
+
+// The e2e workflow's path filter. Keep in sync with .github/workflows/e2e.yml.
+const PATH_FILTER = [
+  (f) => f.startsWith('web/'),
+  (f) => f === 'firestore.rules',
+  (f) => f === 'firebase.json',
+  (f) => f === '.github/workflows/e2e.yml',
+]
+
+const matchesFilter = (file) => PATH_FILTER.some((m) => m(file))
+
+let input = ''
+for await (const chunk of process.stdin) {
+  input += chunk
+}
+
+try {
+  const data = JSON.parse(input)
+  const command = data.tool_input?.command || ''
+
+  // Only real pushes — skip non-push, dry-run, and ref deletions.
+  if (!/\bgit\s+push\b/.test(command)) process.exit(0)
+  if (/--dry-run\b/.test(command) || /--delete\b/.test(command) || /\s:\S/.test(command)) {
+    process.exit(0)
+  }
+
+  // Push must have succeeded for CI to have been triggered.
+  if (typeof data.tool_result?.exit_code === 'number' && data.tool_result.exit_code !== 0) {
+    process.exit(0)
+  }
+
+  // e2e's `push:` trigger only fires on dev/main. (PR branches trigger via
+  // pull_request — out of scope here; the dev/main path is the documented flow.)
+  const branch = getCurrentBranch()
+  if (branch !== 'dev' && branch !== 'main') process.exit(0)
+
+  // Best-effort: did this push touch the e2e path filter? If we can't tell,
+  // fail open (inject anyway) — better to over-verify than miss a red run.
+  const { files, known } = getPushedFiles(branch)
+  if (known && !files.some(matchesFilter)) {
+    // Push had changes but none in the e2e scope — CI won't run the suite.
+    process.exit(0)
+  }
+
+  const sha = getHeadSha()
+  const scopeNote = known
+    ? `Changed files in e2e scope: ${files.filter(matchesFilter).join(', ')}`
+    : `(Could not determine the pushed diff — verify whether a run was actually triggered.)`
+
+  const message = [
+    `Pushed to ${branch} (${sha.slice(0, 8)}). ${known ? 'This touched' : 'This may have touched'} the playwright e2e path filter, so the "playwright e2e" workflow (.github/workflows/e2e.yml) should run. Verify it succeeded:`,
+    '',
+    `1. Find the run for this push:`,
+    `   gh run list --workflow="playwright e2e" --branch ${branch} --limit 5 --json databaseId,headSha,status,conclusion,createdAt`,
+    `   Pick the run whose headSha starts with ${sha.slice(0, 8)}. If none has appeared yet, GitHub can lag a few seconds — wait ~15s and retry once.`,
+    `2. Watch it to completion (run in the BACKGROUND — cold CI can take up to ~30 min, target <6 min):`,
+    `   gh run watch <databaseId> --exit-status`,
+    `3. On SUCCESS: report green and stop.`,
+    `4. On FAILURE:`,
+    `   - gh run view <databaseId> --log-failed   # failing-step logs only`,
+    `   - if needed: gh run download <databaseId> -n playwright-report   # HTML report + traces`,
+    `   - diagnose the root cause, then PROPOSE a fix and wait for review. Do NOT auto-fix-and-repush.`,
+    '',
+    scopeNote,
+  ].join('\n')
+
+  process.stderr.write(`[post-push-e2e] ${branch} push in e2e scope — reminding Claude to watch the run\n`)
+  process.stdout.write(JSON.stringify({ message }))
+} catch (err) {
+  process.stderr.write(`[post-push-e2e] Error: ${err.message}\n`)
+}
+
+process.exit(0)
+
+function getCurrentBranch() {
+  try {
+    return execSync('git rev-parse --abbrev-ref HEAD', { encoding: 'utf-8', timeout: 5000 }).trim()
+  } catch {
+    return ''
+  }
+}
+
+function getHeadSha() {
+  try {
+    return execSync('git rev-parse HEAD', { encoding: 'utf-8', timeout: 5000 }).trim()
+  } catch {
+    return ''
+  }
+}
+
+/**
+ * Diff the just-pushed range using the remote-tracking ref's reflog
+ * (origin/<branch>@{1} = its value before this push). Returns { files, known }
+ * where known=false means we couldn't determine it and the caller should fail open.
+ */
+function getPushedFiles(branch) {
+  try {
+    const out = execSync(
+      `git diff --name-only "origin/${branch}@{1}..origin/${branch}"`,
+      { encoding: 'utf-8', timeout: 5000, stdio: ['pipe', 'pipe', 'pipe'] }
+    )
+    const files = out.split('\n').map((s) => s.trim()).filter(Boolean)
+    return { files, known: true }
+  } catch {
+    return { files: [], known: false }
+  }
+}
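The hook's decision logic can be restated as a pure function, which makes the skip conditions easier to check in isolation. This is a sketch, not code from the commit: `shouldWatchRun` and its parameter names are hypothetical, while the regexes and path filter are copied from the hook above.

```javascript
// Same path filter as the hook; keep in sync with .github/workflows/e2e.yml.
const PATH_FILTER = [
  (f) => f.startsWith('web/'),
  (f) => f === 'firestore.rules',
  (f) => f === 'firebase.json',
  (f) => f === '.github/workflows/e2e.yml',
]

// Hypothetical helper: returns true when the hook would inject its reminder.
function shouldWatchRun({ command, exitCode, branch, files, known }) {
  // Only real pushes: skip non-push commands, dry runs, and ref deletions.
  if (!/\bgit\s+push\b/.test(command)) return false
  if (/--dry-run\b/.test(command) || /--delete\b/.test(command) || /\s:\S/.test(command)) {
    return false
  }
  // A failed push cannot have triggered CI.
  if (typeof exitCode === 'number' && exitCode !== 0) return false
  // The workflow's push trigger only fires on dev/main.
  if (branch !== 'dev' && branch !== 'main') return false
  // Fail open: if the pushed diff is unknown, remind anyway.
  if (known && !files.some((f) => PATH_FILTER.some((m) => m(f)))) return false
  return true
}

console.log(
  shouldWatchRun({
    command: 'git push origin dev',
    exitCode: 0,
    branch: 'dev',
    files: ['web/app/page.tsx'],
    known: true,
  })
)
```

Note the asymmetry in the last check: an empty file list with `known: false` still returns true, which is the "better to over-verify than miss a red run" behavior described in the hook's comments.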
diff --git a/.claude/settings.json b/.claude/settings.json
@@ -28,6 +28,11 @@
             "type": "command",
             "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/post-push-installer.mjs\"",
             "timeout": 10
+          },
+          {
+            "type": "command",
+            "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/post-push-e2e.mjs\"",
+            "timeout": 10
           }
         ]
       }