fix: harden crew dispatch profile enforcement#159
Merged
Conversation
JTInventory
added a commit
to JTInventory/firstmate
that referenced
this pull request
Jun 30, 2026
* feat(x-mode): add X mention completion follow-ups (kunchenguid#113) * feat(x-mode): X-mention completion follow-up flow Acknowledge an actionable X mention first, do the work, then post one follow-up reply when it completes. - fm-x-reply.sh: add --followup mode posting to the relay's /connector/followup endpoint; reuses thread-split, payload shape, dry-run (with a self-describing endpoint marker), and never-inline safety. Answer path unchanged. - fm-x-link.sh: link a spawned task to its originating mention via x_request/x_request_ts in state/<id>.meta (atomic, preserves other lines). - fm-x-followup.sh: --check detection plus post-and-clear on terminal completion; honors the 24h window (skip+prune past it), keeps the link on a failed post for retry. - fm-x-lib.sh: shared meta link get/set/clear helpers. - Docs: fmx-respond reads as one ack-first -> act -> follow-up flow; AGENTS.md §14 + supervision pointer document the link, completion follow-up, and 24h public-safe window. - Tests: cover --followup endpoint/payload/dry-run, link, and the followup helper; shellcheck clean. * no-mistakes(review): Captain, fix atomic X meta rewrites * no-mistakes(document): Document X completion follow-ups * feat(x-mode): dismiss skipped X mentions through the relay (kunchenguid#120) * feat(x-mode): dismiss skipped mentions at the relay The relay now exposes POST /connector/dismiss: acknowledge a pending mention without replying - it drops the request, posts nothing, and stops re-offering it. Wire firstmate to use it on the skip path so a deliberately unanswered mention no longer churns every poll and times out to the relay's "offline" auto-reply. - bin/fm-x-dismiss.sh: new client modeled on fm-x-reply.sh. POSTs {request_id} (no body) to /connector/dismiss with the bearer; echoes the request_id on 2xx, exits non-zero on non-2xx/transport failure. Honors FMX_DRY_RUN (records the would-be POST to state/x-outbox/ with an endpoint:"dismiss" marker, posts nothing) and rejects unsafe request_ids. - fmx-respond skill: the skip path now calls bin/fm-x-dismiss.sh before clearing the inbox file; answer and follow-up paths unchanged. - AGENTS.md section 14: documents that a skipped mention is dismissed at the relay, not just locally cleared. - tests: dismiss posts {request_id} to /connector/dismiss with the bearer and echoes it; dry-run records and posts nothing; non-2xx and transport failures exit non-zero; unsafe id and bad args rejected. * chore(no-mistakes): run the bash suite directly as the test step The test step had no configured test command, so it delegated to an agent; that agent-driven run crashed the no-mistakes daemon mid-step on this repo. Configure commands.test to run the firstmate behavior suite deterministically instead, mirroring .github/workflows/ci.yml: iterate every tests/*.test.sh, run each, and fail the step if any exits non-zero. This removes the agent from the test step entirely (no crash) and makes the gate's test baseline match CI. Same pattern myfirstmate uses (commands.test: mix deps.get && mix test). * no-mistakes(review): Fix X dismiss docs and gate preflight * no-mistakes(document): Document X dismiss and gate tests * feat(watcher): absorb wakes only when the crew is provably working (kunchenguid#126) * feat(watcher): absorb wakes only when the crew is provably working The no-verb triage path (a bare turn-end, a working: note, a non-terminal stale) used to be benign by default and surfaced only on a captain-relevant status verb. A crew that finished but reported through interactive pane menus (no done: status) had its final turn-end absorbed, so firstmate was never woken and the finish was missed. Invert the rule: absorb a no-verb turn-end or non-terminal stale ONLY when the crew shows positive evidence it is still working - its no-mistakes run for its branch is in an actively-running step, or its pane shows the harness busy signature. Otherwise surface it so firstmate peeks (done, waiting, or wedged). - fm-classify-lib.sh: add crew_is_provably_working (reuses fm-crew-state.sh, no run-step duplication) and signal_crew_provably_working; FM_CREW_STATE_BIN override for tests. - fm-watch.sh: signal path surfaces a no-verb wake whose crew is not provably working (costly check runs only on the no-verb, non-afk path); non-terminal stale surfaces immediately when not provably working, else absorbs with the wedge timer (run-step read only on first sight of a stale hash). - afk path unchanged: the watcher stays one-shot and skips the provably-working read; the daemon keeps its bounded-latency stale backstop. - tests: cover every required semantic (mid-pipeline absorb, finished/parked surface, no-running-pipeline idle surface, busy absorb, captain-verb surface) as classifier unit tests and behavioral watcher runs; queue-safety test for the new immediate-surface stale path. - AGENTS.md section 8: document absorb-only-when-provably-working. * no-mistakes(document): Sync watcher documentation * feat: add grok crewmate harness support (kunchenguid#143) * feat(harness): add grok (Grok Build) as a verified crewmate adapter Empirically verified against grok 0.2.73 and encoded across the machinery: - fm-harness.sh: detect grok via GROK_AGENT=1 env marker (grok does not set CLAUDECODE) and `grok` command-name ancestry. - fm-spawn.sh: grok launch template (`grok --always-approve "$(cat BRIEF)"`, fully autonomous, no permission gate) and a turn-end Stop hook. grok only loads project hooks after a manual folder-trust grant, so the hook is a single firstmate-owned global hook (~/.grok/hooks/fm-turn-end.json, always trusted) that is a guarded no-op unless the workspace holds a per-task .fm-grok-turnend pointer; fm-spawn drops that gitignored pointer naming state/<id>.turn-ended. Hook stays outside the worktree, needs no trust grant. - fm-watch.sh + fm-tmux-lib.sh: grok busy signature `Ctrl+c:cancel` (the mid-turn cancel hint; ASCII, present iff a turn runs). - harness-adapters skill: grok facts section (busy, exit=Ctrl+Q x2, interrupt=Ctrl+C, skill invocation /<skill>, resume) and /no-mistakes form. Gating question confirmed: grok invokes /no-mistakes and drives a real no-mistakes axi run, so grok is usable for no-mistakes-mode tasks. End-to-end verified through fm-spawn: autonomous launch past the dir picker into the worktree, brief processed, busy->idle and turn-end signal detected, fm-send steer lands, clean Ctrl+Q exit and teardown. config/crew-harness is left unchanged; this only makes grok available as a verified option. * no-mistakes(review): Captain, harden Grok hook lifecycle * no-mistakes(review): Captain, make Grok harness test executable * no-mistakes(review): Captain, bound Grok pointer reads * no-mistakes(test): Captain, harden crew-state and watcher-lock timing * no-mistakes(document): Document Grok harness support * feat(harness): split secondmate harness configuration (kunchenguid#144) * feat(harness): split secondmate harness and inherit primary config into secondmate homes Add config/secondmate-harness so secondmates can run on a different adapter than crewmates. fm-harness.sh gains a `secondmate` mode resolving the chain config/secondmate-harness -> config/crew-harness -> own; `crew` mode is unchanged. fm-spawn resolves a --secondmate launch through that mode (durable: every respawn re-resolves), while an explicit per-spawn harness arg still wins and the unverified-adapter guard still holds. Add a generic, extensible inheritable-config mechanism (fm-config-inherit-lib.sh) that pushes the primary's declared LOCAL config into each secondmate home's config/ at secondmate spawn and on the bootstrap secondmate sweep. Exactly one item is wired today: config/crew-harness, so a secondmate's own crewmates use the primary's setting. Primary-authoritative (re-pushed every convergence, mirrors absence); config/secondmate-harness is deliberately not inherited since secondmates never spawn secondmates. config/ is gitignored, so this is a copy separate from the tracked-files fast-forward. Update AGENTS.md (layout, bootstrap, harness, spawn), the harness-adapters skill, docs/scripts.md, and .gitignore. New tests cover secondmate resolution and fallback, spawn/respawn honoring config/secondmate-harness, config propagation on spawn and sweep, the unverified-adapter guard, and backward compatibility. * no-mistakes(review): Surface inherited config propagation failures * no-mistakes(review): Harden inherited config propagation * no-mistakes(review): Document literal harness inheritance requirement * no-mistakes(document): Document secondmate harness config * feat(backlog): default backlog operations to tasks-axi (kunchenguid#145) * feat(backlog): default to tasks-axi backend * no-mistakes(document): Sync backlog backend docs * fix(spawn): set per-task GOTMPDIR so interrupted Go builds don't leak /tmp (kunchenguid#36) * fix(spawn): set per-task GOTMPDIR so interrupted Go builds don't leak /tmp Go's GOTMPDIR is unset, so every go build/test creates numbered /tmp/go-build* dirs. Go cleans them on a clean exit but LEAVES THEM when interrupted (signal, timeout, OOM, full disk), accumulating and filling the disk over time. Give each task its own temp root at /tmp/fm-<id>/ with Go's build temp nested at gotmp/. fm-spawn creates the dir (Go won't mkdir GOTMPDIR), exports GOTMPDIR into the crewmate pane so the agent and child processes inherit it, and records tasktmp= in meta. fm-teardown reads tasktmp= and removes the whole root on cleanup, deterministically. GOTMPDIR (not TMPDIR) is the targeted knob: TMPDIR is too broad (affects every program's temp). The nested root is extensible: other per-task temp can live under /tmp/fm-<id>/ later. Backward compat: tasks spawned before this change have no tasktmp= in meta; teardown tolerates the empty value as a no-op. The daily fm-disk-cleanup.sh cron remains a safety net for any pre-fix stray dirs. * fix(tests): silence SC2016 for literal grep -F patterns in fm-gotmp test The structural grep -F assertions deliberately match literal $TASK_TMP in the fm-spawn source; add per-line shellcheck disable=SC2016 (the codebase's existing pattern, e.g. bin/fm-spawn.sh) so CI lint passes. * no-mistakes(document): docs: document tasktmp= meta field for per-task GOTMPDIR --------- Co-authored-by: e-jung <8334081+e-jung@users.noreply.github.com> * fix: accept landed squash-merged PR heads (kunchenguid#149) * fix(teardown): accept landed squash-merge PR heads * no-mistakes(document): Document teardown landing behavior * no-mistakes: apply CI fixes * fix(test): pass explicit teardown git identity * feat(dispatch): add dynamic crew profiles (kunchenguid#154) * feat(dispatch): add dynamic crew profiles * no-mistakes(review): Captain, document dispatch profile inheritance * no-mistakes(review): Captain, guard stale dispatch inheritance * no-mistakes(document): Sync dispatch profile docs * no-mistakes: apply CI fixes * fix: harden crew dispatch profile enforcement (kunchenguid#159) * Harden crew dispatch profile enforcement * no-mistakes(document): Captain, synced crew dispatch docs * feat: add live secondmate config push (kunchenguid#161) * feat(config): add live secondmate config push * no-mistakes(document): Document config push behavior * no-mistakes(lint): Clean changed shell lint * no-mistakes: apply CI fixes * feat: support image attachments in X replies (kunchenguid#162) * feat(x): add image attachments to reply helpers * no-mistakes(review): Stream X image replies safely * no-mistakes(review): Captain, clean X reply temp tracking * no-mistakes(document): Document X reply image support * Harden cleanup and image payload limits * no-mistakes(review): Captain, validate spawn task IDs * no-mistakes(document): Document X image cap --------- Co-authored-by: Kun Chen <3233006+kunchenguid@users.noreply.github.com> Co-authored-by: e-jung <e-jung@users.noreply.github.com> Co-authored-by: e-jung <8334081+e-jung@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Intent
Harden dispatch-profile consultation so config/crew-dispatch.json cannot be silently bypassed. Bootstrap should print a concise active-rules block for valid crew-dispatch configs while preserving the invalid JSON/schema error path. fm-spawn should require an explicit resolved harness for crewmate and scout dispatches whenever that file exists, while preserving no-file fallback to config/crew-harness, preserving explicit --harness, positional harness, raw launch command, and batch behavior, and exempting --secondmate launches. Update AGENTS.md and focused behavior tests for those contracts.
What Changed
config/crew-dispatch.jsonrules during bootstrap while keeping malformed or invalid config diagnostics intact.Risk Assessment
✅ Low: Captain, the change is narrowly scoped to bootstrap reporting and spawn-time harness enforcement, and I found no material correctness or compatibility issues in the reviewed paths.
Testing
Captain, the configured full-suite baseline was already green per prompt; I reran the focused bootstrap and spawn dispatch-profile behavior tests, then captured CLI evidence showing the intended end-user contracts working in isolated homes with fake tmux and real git worktrees. All focused checks and manual evidence-producing scenarios passed.
Evidence: Manual CLI evidence transcript
Evidence: Focused bootstrap behavior test log
Evidence: Focused spawn dispatch-profile behavior test log
Pipeline
Updates from git push no-mistakes
✅ **intent** - passed
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
✅ **Review** - passed
✅ No issues found.
✅ **Test** - passed
✅ No issues found.
command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"Baseline already successful per prompt:command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"bash tests/fm-bootstrap.test.sh > /var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/fm-bootstrap-test.log 2>&1bash tests/fm-spawn-dispatch-profile.test.sh > /var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/fm-spawn-dispatch-profile-test.log 2>&1Manual isolated CLI verification saved todispatch-profile-cli-transcript.txt:fm-bootstrap.shvalid active rules, malformed JSON error path,fm-spawn.shno-harness refusal with no meta, explicit codex spawn with captured launch command, no-profileconfig/crew-harnessfallback, and--secondmateexemption.✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.