
fix: harden crew dispatch profile enforcement#159

Merged
kunchenguid merged 2 commits into
mainfrom
fm/crew-dispatch-harden-b6
Jun 30, 2026

Conversation

@kunchenguid

Owner

Intent

Harden dispatch-profile consultation so that config/crew-dispatch.json cannot be silently bypassed. Bootstrap should print a concise active-rules block for valid crew-dispatch configs while preserving the error path for invalid JSON or schema. Whenever that file exists, fm-spawn should require an explicit resolved harness for crewmate and scout dispatches. The no-file fallback to config/crew-harness must keep working, as must explicit --harness, positional harness, raw launch command, and batch behavior; --secondmate launches are exempt. Update AGENTS.md and the focused behavior tests to cover those contracts.
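
The spawn-time contract described above can be sketched as a small guard. This is only an illustration of the decision, not the code in bin/fm-spawn.sh: the function name `dispatch_guard`, its arguments, and the shortened error text are hypothetical, and "explicit harness" here stands for any of --harness, the positional harness, or a raw launch command.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical, not taken from bin/fm-spawn.sh.

# dispatch_guard CONFIG_DIR KIND EXPLICIT_HARNESS
#   CONFIG_DIR        directory that may contain crew-dispatch.json
#   KIND              ship | scout | secondmate
#   EXPLICIT_HARNESS  harness supplied by the caller, or empty if none
# Returns 0 when the spawn may proceed, 1 when it must be refused.
dispatch_guard() {
  local config_dir=$1 kind=$2 explicit_harness=$3

  # Secondmate launches are exempt from the crewmate dispatch backstop.
  [ "$kind" = "secondmate" ] && return 0

  # No dispatch file: keep the existing fallback to config/crew-harness.
  [ -f "$config_dir/crew-dispatch.json" ] || return 0

  # Dispatch file present: the caller must already have resolved a harness.
  if [ -z "$explicit_harness" ]; then
    echo "error: config/crew-dispatch.json is active - pass an explicit harness" >&2
    return 1
  fi
  return 0
}
```

Refusing before any state is written matches the evidence transcript, where the refused spawn leaves no meta file behind.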

What Changed

  • Surfaces valid config/crew-dispatch.json rules during bootstrap while keeping malformed or invalid config diagnostics intact.
  • Requires crewmate and scout spawns to pass an explicit resolved harness whenever dispatch profiles are present, while preserving explicit harness, positional harness, raw launch command, batch, fallback, and secondmate behavior.
  • Updates firstmate docs and focused shell tests around dispatch-profile enforcement and bootstrap reporting.
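
The bootstrap report in the first bullet might be rendered along these lines. This sketch covers formatting only and assumes the rules have already been parsed into `|`-separated rows; that row format, and the names `format_profile` and `print_dispatch_summary`, are hypothetical, since the crew-dispatch.json schema is not shown in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: the row format and function names are assumptions.

# format_profile HARNESS [MODEL] [EFFORT]
# Joins the non-empty parts with "/", e.g. "codex/gpt-5.5/high" or just "grok".
format_profile() {
  local out=$1 part
  shift
  for part in "$@"; do
    [ -n "$part" ] && out="$out/$part"
  done
  printf '%s\n' "$out"
}

# print_dispatch_summary reads pre-parsed rows on stdin, one per line:
#   rule|<description>|<harness>|<model>|<effort>
#   default||<harness>|<model>|<effort>
# A non-whitespace delimiter is used so that empty fields survive `read`.
print_dispatch_summary() {
  local type desc harness model effort
  echo "CREW_DISPATCH: active config/crew-dispatch.json"
  while IFS='|' read -r type desc harness model effort; do
    case $type in
      rule)
        printf '  rule: %s -> %s\n' "$desc" \
          "$(format_profile "$harness" "$model" "$effort")" ;;
      default)
        printf '  default: %s\n' \
          "$(format_profile "$harness" "$model" "$effort")" ;;
    esac
  done
}
```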

Risk Assessment

✅ Low: Captain, the change is narrowly scoped to bootstrap reporting and spawn-time harness enforcement, and I found no material correctness or compatibility issues in the reviewed paths.

Testing

Captain, the configured full-suite baseline was already green per prompt; I reran the focused bootstrap and spawn dispatch-profile behavior tests, then captured CLI evidence showing the intended end-user contracts working in isolated homes with fake tmux and real git worktrees. All focused checks and manual evidence-producing scenarios passed.

Evidence: Manual CLI evidence transcript
## Bootstrap surfaces active dispatch rules
$ FM_HOME=$BOOT_HOME FM_ROOT_OVERRIDE=$BOOT_HOME bin/fm-bootstrap.sh
CREW_DISPATCH: active config/crew-dispatch.json
  rule: fresh news -> grok
  rule: big feature -> codex/gpt-5.5/high
  default: claude/haiku/low

## Bootstrap preserves malformed JSON error path
$ FM_HOME=$BOOT_HOME FM_ROOT_OVERRIDE=$BOOT_HOME bin/fm-bootstrap.sh
CREW_DISPATCH: invalid config/crew-dispatch.json - malformed JSON

## fm-spawn refuses crewmate spawn when dispatch profile is active and no harness was passed
$ bin/fm-spawn.sh evidence-ship-z1 <project>
exit=1
error: config/crew-dispatch.json is active - pass an explicit harness resolved from the dispatch rules (the consultation backstop, so the rules are never silently skipped).
meta exists? no

## fm-spawn accepts explicit resolved harness and records launch details
$ bin/fm-spawn.sh evidence-explicit-z2 <project> --harness codex --model gpt-5 --effort high
exit=0
spawned evidence-explicit-z2 harness=codex kind=ship mode=no-mistakes yolo=off window=firstmate:fm-evidence-explicit-z2 worktree=/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/worktree
harness=codex
kind=ship
model=gpt-5
effort=high
captured launch command:
codex --model 'gpt-5' -c 'model_reasoning_effort="high"' --dangerously-bypass-approvals-and-sandbox -c "notify=[\"bash\",\"-c\",\"touch '/private/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/spawn-home/state/evidence-explicit-z2.turn-ended'\"]" "$(cat '/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/spawn-home/data/evidence-explicit-z2/brief.md')"

## fm-spawn preserves no-profile fallback to config/crew-harness
$ rm config/crew-dispatch.json; bin/fm-spawn.sh evidence-fallback-z3 <project>
exit=0
spawned evidence-fallback-z3 harness=codex kind=ship mode=no-mistakes yolo=off window=firstmate:fm-evidence-fallback-z3 worktree=/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/worktree
harness=codex
kind=ship
model=default
effort=default

## fm-spawn exempts secondmate launches from the crewmate dispatch backstop
$ bin/fm-spawn.sh evidence-secondmate-z4 <secondmate-home> --secondmate
exit=0
warning: secondmate evidence-secondmate-z4 sync skipped before launch: primary default-branch commit cannot be resolved
spawned evidence-secondmate-z4 harness=codex kind=secondmate mode=secondmate yolo=off window=firstmate:fm-evidence-secondmate-z4 worktree=/private/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/secondmate-home
harness=codex
kind=secondmate
model=default
effort=default
home=/private/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/secondmate-home
captured launch command:
FM_ROOT_OVERRIDE= FM_STATE_OVERRIDE= FM_DATA_OVERRIDE= FM_PROJECTS_OVERRIDE= FM_CONFIG_OVERRIDE= FM_HOME='/private/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/secondmate-home' codex --dangerously-bypass-approvals-and-sandbox "$(cat '/private/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/manual-cli/secondmate-home/data/charter.md')"

Evidence: Focused bootstrap behavior test log
ok - bootstrap reports treehouse lease + tasks-axi default/backend contracts
ok - bootstrap enforces no-mistakes minimum version
ok - bootstrap surfaces active crew-dispatch rules and default
ok - bootstrap validates crew-dispatch.json and reports malformed or unverified configs

Evidence: Focused spawn dispatch-profile behavior test log
ok - no --model/--effort records defaults and keeps the claude launch byte-identical
ok - active crew-dispatch profile requires an explicit harness for ship spawns
ok - active crew-dispatch profile requires an explicit harness for scout spawns
ok - active crew-dispatch profile allows an explicit resolved harness
ok - active crew-dispatch profile allows the legacy positional harness form
ok - active crew-dispatch profile allows the raw launch-command escape hatch
ok - claude receives --model and --effort profile flags
ok - codex receives --model and model_reasoning_effort profile flags
ok - codex omits unsupported max effort instead of passing a bad config value
ok - grok receives --model and --reasoning-effort profile flags
ok - grok omits unsupported max reasoning effort
ok - opencode receives --model and omits the unsupported effort axis
ok - pi threads model and omits unsupported max effort
ok - batch dispatch forwards shared --harness, --model, and --effort to every pair
ok - active crew-dispatch profile does not block secondmate launches
# all fm-spawn-dispatch-profile tests passed
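
The per-harness flag behavior named in the test log above can be summarized in one function. This is a hedged sketch, not the fm-spawn.sh implementation: `profile_flags` is a hypothetical name, the flag spellings are taken from the test names and the captured codex launch command, and pi is left out because its flags do not appear in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: profile_flags is a hypothetical helper.

# profile_flags HARNESS MODEL EFFORT
# Prints the extra CLI arguments for a harness, one per line.
profile_flags() {
  local harness=$1 model=$2 effort=$3

  # Every harness listed here accepts --model.
  [ -n "$model" ] && printf '%s\n' "--model" "$model"
  [ -z "$effort" ] && return 0

  case $harness in
    claude)
      printf '%s\n' "--effort" "$effort" ;;
    codex)
      # "max" is unsupported, so it is dropped rather than passed through.
      [ "$effort" = "max" ] ||
        printf '%s\n' "-c" "model_reasoning_effort=\"$effort\"" ;;
    grok)
      [ "$effort" = "max" ] ||
        printf '%s\n' "--reasoning-effort" "$effort" ;;
    opencode)
      : ;;  # no effort axis at all
  esac
  return 0
}
```

Emitting one argument per line keeps quoting out of the helper; a caller could collect the lines into an array before building the launch command.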

Pipeline

Updates from git push no-mistakes

✅ **Intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

✅ **Review** - passed

✅ No issues found.

✅ **Test** - passed

✅ No issues found.

  • command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"
  • Baseline already successful per prompt: command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"
  • bash tests/fm-bootstrap.test.sh > /var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/fm-bootstrap-test.log 2>&1
  • bash tests/fm-spawn-dispatch-profile.test.sh > /var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KWBNS83NX0YWR86G0RNFPEYP/fm-spawn-dispatch-profile-test.log 2>&1
  • Manual isolated CLI verification saved to dispatch-profile-cli-transcript.txt: fm-bootstrap.sh valid active rules, malformed JSON error path, fm-spawn.sh no-harness refusal with no meta, explicit codex spawn with captured launch command, no-profile config/crew-harness fallback, and --secondmate exemption.
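
For readability, the one-line suite command in the first bullet can be unrolled into two functions. The behavior is intended to be the same; the function names `require_tmux` and `run_suite` and the directory argument are additions of this sketch.

```shell
#!/usr/bin/env bash
# Unrolled form of the gate's test one-liner; function names are illustrative.

# require_tmux: fail early when tmux is missing, otherwise print its version.
require_tmux() {
  command -v tmux >/dev/null || {
    echo "tmux is required for e2e tests" >&2
    return 1
  }
  tmux -V
}

# run_suite DIR: run every DIR/*.test.sh and return 1 if any of them failed.
# Later tests still run after a failure, so one run reports every breakage.
run_suite() {
  local dir=$1 rc=0 t
  for t in "$dir"/*.test.sh; do
    echo "== $t =="
    bash "$t" || rc=1
  done
  return "$rc"
}

# Usage, equivalent to the one-liner:
#   require_tmux && run_suite tests
```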

✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

@kunchenguid kunchenguid merged commit 008d65c into main Jun 30, 2026
4 checks passed
@kunchenguid kunchenguid deleted the fm/crew-dispatch-harden-b6 branch June 30, 2026 07:34
JTInventory added a commit to JTInventory/firstmate that referenced this pull request Jun 30, 2026
* feat(x-mode): add X mention completion follow-ups (kunchenguid#113)

* feat(x-mode): X-mention completion follow-up flow

Acknowledge an actionable X mention first, do the work, then post one
follow-up reply when it completes.

- fm-x-reply.sh: add --followup mode posting to the relay's
  /connector/followup endpoint; reuses thread-split, payload shape,
  dry-run (with a self-describing endpoint marker), and never-inline
  safety. Answer path unchanged.
- fm-x-link.sh: link a spawned task to its originating mention via
  x_request/x_request_ts in state/<id>.meta (atomic, preserves other
  lines).
- fm-x-followup.sh: --check detection plus post-and-clear on terminal
  completion; honors the 24h window (skip+prune past it), keeps the link
  on a failed post for retry.
- fm-x-lib.sh: shared meta link get/set/clear helpers.
- Docs: fmx-respond reads as one ack-first -> act -> follow-up flow;
  AGENTS.md §14 + supervision pointer document the link, completion
  follow-up, and 24h public-safe window.
- Tests: cover --followup endpoint/payload/dry-run, link, and the
  followup helper; shellcheck clean.

* no-mistakes(review): Captain, fix atomic X meta rewrites

* no-mistakes(document): Document X completion follow-ups

* feat(x-mode): dismiss skipped X mentions through the relay (kunchenguid#120)

* feat(x-mode): dismiss skipped mentions at the relay

The relay now exposes POST /connector/dismiss: acknowledge a pending
mention without replying - it drops the request, posts nothing, and stops
re-offering it. Wire firstmate to use it on the skip path so a deliberately
unanswered mention no longer churns every poll and times out to the relay's
"offline" auto-reply.

- bin/fm-x-dismiss.sh: new client modeled on fm-x-reply.sh. POSTs
  {request_id} (no body) to /connector/dismiss with the bearer; echoes the
  request_id on 2xx, exits non-zero on non-2xx/transport failure. Honors
  FMX_DRY_RUN (records the would-be POST to state/x-outbox/ with an
  endpoint:"dismiss" marker, posts nothing) and rejects unsafe request_ids.
- fmx-respond skill: the skip path now calls bin/fm-x-dismiss.sh before
  clearing the inbox file; answer and follow-up paths unchanged.
- AGENTS.md section 14: documents that a skipped mention is dismissed at the
  relay, not just locally cleared.
- tests: dismiss posts {request_id} to /connector/dismiss with the bearer
  and echoes it; dry-run records and posts nothing; non-2xx and transport
  failures exit non-zero; unsafe id and bad args rejected.

* chore(no-mistakes): run the bash suite directly as the test step

The test step had no configured test command, so it delegated to an agent;
that agent-driven run crashed the no-mistakes daemon mid-step on this repo.
Configure commands.test to run the firstmate behavior suite deterministically
instead, mirroring .github/workflows/ci.yml: iterate every tests/*.test.sh,
run each, and fail the step if any exits non-zero. This removes the agent from
the test step entirely (no crash) and makes the gate's test baseline match CI.
Same pattern myfirstmate uses (commands.test: mix deps.get && mix test).

* no-mistakes(review): Fix X dismiss docs and gate preflight

* no-mistakes(document): Document X dismiss and gate tests

* feat(watcher): absorb wakes only when the crew is provably working (kunchenguid#126)

* feat(watcher): absorb wakes only when the crew is provably working

The no-verb triage path (a bare turn-end, a working: note, a non-terminal
stale) used to be benign by default and surfaced only on a captain-relevant
status verb. A crew that finished but reported through interactive pane menus
(no done: status) had its final turn-end absorbed, so firstmate was never
woken and the finish was missed.

Invert the rule: absorb a no-verb turn-end or non-terminal stale ONLY when the
crew shows positive evidence it is still working - its no-mistakes run for its
branch is in an actively-running step, or its pane shows the harness busy
signature. Otherwise surface it so firstmate peeks (done, waiting, or wedged).
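
The inverted rule reduces to a single decision, sketched here under the assumption that the two pieces of evidence have already been gathered as yes/no values. `triage_no_verb_wake` is a hypothetical name, not a function from fm-watch.sh or fm-classify-lib.sh.

```shell
#!/usr/bin/env bash
# Sketch only: triage_no_verb_wake is a hypothetical helper.

# triage_no_verb_wake PIPELINE_RUNNING PANE_BUSY
# Each argument is "yes" or "no". Prints "absorb" or "surface".
triage_no_verb_wake() {
  local pipeline_running=$1 pane_busy=$2
  # Absorb only on positive evidence of work; anything else is surfaced.
  if [ "$pipeline_running" = "yes" ] || [ "$pane_busy" = "yes" ]; then
    echo absorb
  else
    echo surface
  fi
}
```

Defaulting to "surface" means a missing or unreadable signal wakes firstmate rather than hiding a finished crew.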

- fm-classify-lib.sh: add crew_is_provably_working (reuses fm-crew-state.sh,
  no run-step duplication) and signal_crew_provably_working; FM_CREW_STATE_BIN
  override for tests.
- fm-watch.sh: signal path surfaces a no-verb wake whose crew is not provably
  working (costly check runs only on the no-verb, non-afk path); non-terminal
  stale surfaces immediately when not provably working, else absorbs with the
  wedge timer (run-step read only on first sight of a stale hash).
- afk path unchanged: the watcher stays one-shot and skips the provably-working
  read; the daemon keeps its bounded-latency stale backstop.
- tests: cover every required semantic (mid-pipeline absorb, finished/parked
  surface, no-running-pipeline idle surface, busy absorb, captain-verb surface)
  as classifier unit tests and behavioral watcher runs; queue-safety test for
  the new immediate-surface stale path.
- AGENTS.md section 8: document absorb-only-when-provably-working.

* no-mistakes(document): Sync watcher documentation

* feat: add grok crewmate harness support (kunchenguid#143)

* feat(harness): add grok (Grok Build) as a verified crewmate adapter

Empirically verified against grok 0.2.73 and encoded across the machinery:

- fm-harness.sh: detect grok via GROK_AGENT=1 env marker (grok does not set
  CLAUDECODE) and `grok` command-name ancestry.
- fm-spawn.sh: grok launch template (`grok --always-approve "$(cat BRIEF)"`,
  fully autonomous, no permission gate) and a turn-end Stop hook. grok only
  loads project hooks after a manual folder-trust grant, so the hook is a
  single firstmate-owned global hook (~/.grok/hooks/fm-turn-end.json, always
  trusted) that is a guarded no-op unless the workspace holds a per-task
  .fm-grok-turnend pointer; fm-spawn drops that gitignored pointer naming
  state/<id>.turn-ended. Hook stays outside the worktree, needs no trust grant.
- fm-watch.sh + fm-tmux-lib.sh: grok busy signature `Ctrl+c:cancel` (the
  mid-turn cancel hint; ASCII, present iff a turn runs).
- harness-adapters skill: grok facts section (busy, exit=Ctrl+Q x2,
  interrupt=Ctrl+C, skill invocation /<skill>, resume) and /no-mistakes form.

Gating question confirmed: grok invokes /no-mistakes and drives a real
no-mistakes axi run, so grok is usable for no-mistakes-mode tasks. End-to-end
verified through fm-spawn: autonomous launch past the dir picker into the
worktree, brief processed, busy->idle and turn-end signal detected, fm-send
steer lands, clean Ctrl+Q exit and teardown. config/crew-harness is left
unchanged; this only makes grok available as a verified option.

* no-mistakes(review): Captain, harden Grok hook lifecycle

* no-mistakes(review): Captain, make Grok harness test executable

* no-mistakes(review): Captain, bound Grok pointer reads

* no-mistakes(test): Captain, harden crew-state and watcher-lock timing

* no-mistakes(document): Document Grok harness support

* feat(harness): split secondmate harness configuration (kunchenguid#144)

* feat(harness): split secondmate harness and inherit primary config into secondmate homes

Add config/secondmate-harness so secondmates can run on a different adapter
than crewmates. fm-harness.sh gains a `secondmate` mode resolving the chain
config/secondmate-harness -> config/crew-harness -> own; `crew` mode is
unchanged. fm-spawn resolves a --secondmate launch through that mode (durable:
every respawn re-resolves), while an explicit per-spawn harness arg still wins
and the unverified-adapter guard still holds.
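
A minimal sketch of that resolution chain follows. `resolve_harness` and its arguments are hypothetical, and the crew chain (config/crew-harness, then own) is an assumption, since the commit only says `crew` mode is unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: not the fm-harness.sh implementation.

# resolve_harness MODE CONFIG_DIR OWN_HARNESS
#   MODE         "secondmate" or "crew"
#   CONFIG_DIR   directory holding secondmate-harness / crew-harness files
#   OWN_HARNESS  the caller's own harness, used when no file applies
resolve_harness() {
  local mode=$1 config_dir=$2 own=$3 name value
  local names="crew-harness"
  [ "$mode" = "secondmate" ] && names="secondmate-harness crew-harness"

  # The first non-empty config file in the chain wins.
  for name in $names; do
    if [ -s "$config_dir/$name" ]; then
      value=$(head -n 1 "$config_dir/$name")
      if [ -n "$value" ]; then
        printf '%s\n' "$value"
        return 0
      fi
    fi
  done

  # Nothing configured: fall back to the caller's own harness.
  printf '%s\n' "$own"
}
```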

Add a generic, extensible inheritable-config mechanism (fm-config-inherit-lib.sh)
that pushes the primary's declared LOCAL config into each secondmate home's
config/ at secondmate spawn and on the bootstrap secondmate sweep. Exactly one
item is wired today: config/crew-harness, so a secondmate's own crewmates use
the primary's setting. Primary-authoritative (re-pushed every convergence,
mirrors absence); config/secondmate-harness is deliberately not inherited since
secondmates never spawn secondmates. config/ is gitignored, so this is a copy
separate from the tracked-files fast-forward.

Update AGENTS.md (layout, bootstrap, harness, spawn), the harness-adapters
skill, docs/scripts.md, and .gitignore. New tests cover secondmate resolution
and fallback, spawn/respawn honoring config/secondmate-harness, config
propagation on spawn and sweep, the unverified-adapter guard, and backward
compatibility.

* no-mistakes(review): Surface inherited config propagation failures

* no-mistakes(review): Harden inherited config propagation

* no-mistakes(review): Document literal harness inheritance requirement

* no-mistakes(document): Document secondmate harness config

* feat(backlog): default backlog operations to tasks-axi (kunchenguid#145)

* feat(backlog): default to tasks-axi backend

* no-mistakes(document): Sync backlog backend docs

* fix(spawn): set per-task GOTMPDIR so interrupted Go builds don't leak /tmp (kunchenguid#36)

* fix(spawn): set per-task GOTMPDIR so interrupted Go builds don't leak /tmp

Go's GOTMPDIR is unset, so every go build/test creates numbered /tmp/go-build*
dirs. Go cleans them on a clean exit but LEAVES THEM when interrupted (signal,
timeout, OOM, full disk), accumulating and filling the disk over time.

Give each task its own temp root at /tmp/fm-<id>/ with Go's build temp nested at
gotmp/. fm-spawn creates the dir (Go won't mkdir GOTMPDIR), exports GOTMPDIR into
the crewmate pane so the agent and child processes inherit it, and records
tasktmp= in meta. fm-teardown reads tasktmp= and removes the whole root on
cleanup, deterministically.

GOTMPDIR (not TMPDIR) is the targeted knob: TMPDIR is too broad (affects every
program's temp). The nested root is extensible: other per-task temp can live
under /tmp/fm-<id>/ later.

Backward compat: tasks spawned before this change have no tasktmp= in meta;
teardown tolerates the empty value as a no-op. The daily fm-disk-cleanup.sh cron
remains a safety net for any pre-fix stray dirs.
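
The setup and teardown halves described above could look roughly like this. The function names, the optional BASE argument (added so the sketch can run outside /tmp), and the `fm-*` basename check before deletion are assumptions of this sketch, not details from fm-spawn or fm-teardown.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the BASE argument are hypothetical.

# task_tmp_setup ID META_FILE [BASE]
# Creates BASE/fm-ID/gotmp, exports GOTMPDIR, and records tasktmp= in meta.
task_tmp_setup() {
  local id=$1 meta=$2 base=${3:-/tmp}
  local root="$base/fm-$id"
  mkdir -p "$root/gotmp" || return 1   # Go will not create GOTMPDIR itself
  export GOTMPDIR="$root/gotmp"
  printf 'tasktmp=%s\n' "$root" >> "$meta"
}

# task_tmp_teardown META_FILE
# Removes the recorded temp root; a meta file without tasktmp= is a no-op.
task_tmp_teardown() {
  local meta=$1 root
  root=$(sed -n 's/^tasktmp=//p' "$meta" | tail -n 1)
  [ -n "$root" ] || return 0           # task spawned before this change
  # Extra precaution in this sketch: only delete directories named fm-*.
  case ${root##*/} in
    fm-*) rm -rf -- "$root" ;;
  esac
}
```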

* fix(tests): silence SC2016 for literal grep -F patterns in fm-gotmp test

The structural grep -F assertions deliberately match literal $TASK_TMP in the
fm-spawn source; add per-line shellcheck disable=SC2016 (the codebase's existing
pattern, e.g. bin/fm-spawn.sh) so CI lint passes.

* no-mistakes(document): docs: document tasktmp= meta field for per-task GOTMPDIR

---------

Co-authored-by: e-jung <8334081+e-jung@users.noreply.github.com>

* fix: accept landed squash-merged PR heads (kunchenguid#149)

* fix(teardown): accept landed squash-merge PR heads

* no-mistakes(document): Document teardown landing behavior

* no-mistakes: apply CI fixes

* fix(test): pass explicit teardown git identity

* feat(dispatch): add dynamic crew profiles (kunchenguid#154)

* feat(dispatch): add dynamic crew profiles

* no-mistakes(review): Captain, document dispatch profile inheritance

* no-mistakes(review): Captain, guard stale dispatch inheritance

* no-mistakes(document): Sync dispatch profile docs

* no-mistakes: apply CI fixes

* fix: harden crew dispatch profile enforcement (kunchenguid#159)

* Harden crew dispatch profile enforcement

* no-mistakes(document): Captain, synced crew dispatch docs

* feat: add live secondmate config push (kunchenguid#161)

* feat(config): add live secondmate config push

* no-mistakes(document): Document config push behavior

* no-mistakes(lint): Clean changed shell lint

* no-mistakes: apply CI fixes

* feat: support image attachments in X replies (kunchenguid#162)

* feat(x): add image attachments to reply helpers

* no-mistakes(review): Stream X image replies safely

* no-mistakes(review): Captain, clean X reply temp tracking

* no-mistakes(document): Document X reply image support

* Harden cleanup and image payload limits

* no-mistakes(review): Captain, validate spawn task IDs

* no-mistakes(document): Document X image cap

---------

Co-authored-by: Kun Chen <3233006+kunchenguid@users.noreply.github.com>
Co-authored-by: e-jung <e-jung@users.noreply.github.com>
Co-authored-by: e-jung <8334081+e-jung@users.noreply.github.com>
