Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 52 additions & 23 deletions .agents/skills/fmx-respond/SKILL.md

Large diffs are not rendered by default.

9 changes: 9 additions & 0 deletions .no-mistakes.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,13 @@
# Per-repo no-mistakes overrides.

# Run the firstmate bash behavior suite deterministically as the test-step
# baseline, instead of delegating to an agent (an agent-driven test step has
# crashed the daemon). Mirrors .github/workflows/ci.yml: iterate every
# tests/*.test.sh, run each, and fail the step if any one exits non-zero. The
# e2e tests need tmux on PATH, which the firstmate environment provides.
commands:
test: 'command -v tmux >/dev/null || { echo "tmux is required for e2e tests" >&2; exit 1; }; tmux -V; rc=0; for t in tests/*.test.sh; do echo "== $t =="; bash "$t" || rc=1; done; exit "$rc"'

# Keep test evidence out of this repo; it stays in a temp dir instead.
test:
evidence:
Expand Down
59 changes: 40 additions & 19 deletions AGENTS.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,14 +51,14 @@ Tracked changes to firstmate itself - `AGENTS.md`, `README.md`, `CONTRIBUTING.md
When supervising live crewmates, keep firstmate's own long validation or build commands in the background so watcher wakes can still be handled.
Crewmate validation follows the installed no-mistakes version's SKILL.md and live `axi` help instead of duplicating gate mechanics in firstmate docs.
Firstmate's wrapper still matters: `ask-user` findings route to the captain through firstmate, and crewmates avoid `--yes` because it silently resolves captain-owned decisions without escalation.
Local `.no-mistakes/` state and test evidence stay out of this repo; `.no-mistakes.yaml` keeps evidence in a temp directory instead.
Local `.no-mistakes/` state and test evidence stay out of this repo; `.no-mistakes.yaml` keeps evidence in a temp directory and pins the gate's test command to the same bash behavior suite as CI.

Check and test the toolbelt before pushing:

```sh
bash -n bin/*.sh # syntax-check the toolbelt
shellcheck bin/*.sh tests/*.sh # lint the toolbelt and behavior tests; CI enforces this
for test_script in tests/*.test.sh; do "$test_script"; done # behavior tests, matching CI
for test_script in tests/*.test.sh; do bash "$test_script"; done # behavior tests, matching CI and no-mistakes commands.test
tests/fm-wake-queue.test.sh # durable wake queue losslessness, catch-up, double-drain, duplicate-collapse, and drain liveness guard tests
tests/fm-watcher-lock.test.sh # watcher singleton, lock-race, watch-arm liveness, and guard-warning tests
tests/fm-watch-triage.test.sh # always-on watcher triage: benign absorb, actionable surface, stale wedge threshold, heartbeat backstop, and afk one-shot coherence
Expand All @@ -71,7 +71,7 @@ tests/fm-composer-ghost.test.sh # dim-ghost stripping, ghost-only comp
tests/fm-afk-inject-e2e.test.sh # private-socket end-to-end test of the afk injection path (partial-input deferral, swallowed-Enter retry)
tests/fm-bootstrap.test.sh # bootstrap dependency and feature-probe tests
tests/fm-fleet-sync.test.sh # project clone refresh: safe detached recovery, STUCK drift reports, benign skips, and bootstrap relay
tests/fm-x-mode.test.sh # X-mode poll, inbox context round-trip, reply threading, dry-run preview, and .env-presence activation tests
tests/fm-x-mode.test.sh # X-mode poll, inbox context round-trip, reply threading, dismiss, dry-run preview, and .env-presence activation tests
tests/fm-tangle-guard.test.sh # primary-checkout tangle detection and spawn/brief isolation tests
tests/fm-spawn-batch.test.sh # batch dispatch and FM_HOME project-path scoping tests
tests/fm-update.test.sh # fast-forward-only self-update, reread, nudge, dedup, and skip-safety tests
Expand Down
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ This is.. a directory that turns any agent into your firstmate, and you the capt
- **Explicit project modes** - each project ships via `no-mistakes`, `direct-PR`, or `local-only`, with an optional `+yolo` autonomy flag.
- **Optional secondmates** - opt in to persistent domain supervisors that run from isolated firstmate homes with their own `FM_HOME`, state, projects, and session lock, kept on the primary firstmate version by guarded local fast-forwards.
- **Event-driven, zero-token supervision** - a bash watcher sleeps on the fleet and wakes the first mate only when something needs you.
- **Optional X mode** - opt in with one local `.env` token so firstmate can answer your public `@myfirstmate` mentions, act on normal reversible mention requests through the same lifecycle as chat requests, and report public-safe outcomes without changing non-X behavior; dry-run preview records would-be replies locally before go-live.
- **Optional X mode** - opt in with one local `.env` token so firstmate can answer your public `@myfirstmate` mentions, act on normal reversible mention requests through the same lifecycle as chat requests, acknowledge spawned work, and post one public-safe completion follow-up without changing non-X behavior; dry-run preview records would-be replies and dismissals locally before go-live.
- **Guarded by construction** - the first mate is read-only over your projects outside guarded clone refreshes, safe branch pruning, and approved `local-only` fast-forward merges; crewmates make every project change behind your merge approval.
- **Restart-proof** - all state lives on disk and in tmux; kill the session anytime and the next one reconciles and carries on.

Expand Down Expand Up @@ -115,7 +115,9 @@ A presence-gated sub-supervisor (`/afk`) can self-handle routine events and batc
An opt-in X mode can also use the watcher check path to answer your public `@myfirstmate` mentions and act on normal reversible mention requests from the current fleet state, with `FMX_DRY_RUN` available to test the poll -> compose -> would-post loop without publishing.
The relay routes only the owner's own mentions to that owner's firstmate home; parent-thread context may still include other public accounts.
The token is standing authorization for those autonomous replies and eligible lifecycle actions; destructive, irreversible, or security-sensitive asks are flagged for trusted-channel confirmation instead of being executed from a public mention.
It preserves parent-tweet context for follow-ups and skips pure acknowledgments without posting.
Requests that finish immediately get one public-safe outcome reply.
Requests that spawn longer-running work get an acknowledgement first, a task link in local state, and one completion follow-up within the relay's 24h window when that task lands, reports, or fails.
It preserves parent-tweet context for conversational replies and dismisses pure acknowledgments at the relay without posting.
Long replies stay text-only: the reply client splits them into bounded numbered threads when needed.
When firstmate works on itself, spawn-time isolation checks and a primary-checkout tangle alarm keep the operating checkout on its default branch and stop a crewmate that did not land in a separate worktree.

Expand Down
114 changes: 95 additions & 19 deletions bin/fm-classify-lib.sh
Original file line number Diff line number Diff line change
@@ -1,19 +1,39 @@
#!/usr/bin/env bash
# Shared wake classifier: the single source of truth for deciding whether a
# watcher wake is captain-relevant (must reach firstmate's LLM) or benign
# (absorbed in bash). Sourced by BOTH the always-on watcher (bin/fm-watch.sh)
# and the away-mode daemon (bin/fm-supervise-daemon.sh) so the triage policy
# lives in one place instead of two copies that can drift apart.
# Shared wake classifier: the common source of truth for captain-relevant status
# tests and, for the always-on watcher, the provably-working predicate that makes
# no-verb wakes safe to absorb. Sourced by BOTH the always-on watcher
# (bin/fm-watch.sh) and the away-mode daemon (bin/fm-supervise-daemon.sh) so the
# overlapping triage policy lives in one place instead of two copies that can
# drift apart.
#
# Every function is a pure, side-effect-free read of status files: it takes what
# it needs as arguments and touches no globals beyond the optional FM_CAPTAIN_RE
# override. Consumers layer their own dedup/marker state on top (the daemon keeps
# its escalation-digest seen-markers; the watcher keeps its .seen-* signatures).
# Most functions are pure, side-effect-free reads of status files: each takes
# what it needs as arguments and touches no globals beyond the optional
# FM_CAPTAIN_RE override. Consumers layer their own dedup/marker state on top (the
# daemon keeps its escalation-digest seen-markers; the watcher keeps its .seen-*
# signatures).
#
# The one exception is the "provably working" predicate (crew_is_provably_working
# and its signal-path wrapper). It is NOT a pure status-file read: it reuses
# bin/fm-crew-state.sh, which may make a bounded no-mistakes call, to decide
# whether a crew that just stopped its turn shows positive evidence it is still
# working. Callers run it ONLY on the no-verb (turn-end / non-terminal stale)
# path, never on every wake, so the per-wake triage stays cheap.

# Directory of this library, used to locate the sibling fm-crew-state.sh reader.
# Resolved at source time from BASH_SOURCE so it works whether sourced by a
# bin/ script (which sets its own SCRIPT_DIR) or directly by a test.
_FM_CLASSIFY_LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd 2>/dev/null)" || _FM_CLASSIFY_LIB_DIR="."

# The crew current-state reader used for the "provably working" decision.
# Overridable so tests can stub the run-step/pane verdict without a real worktree
# or no-mistakes install; absent, it points at the real sibling script.
FM_CREW_STATE_BIN="${FM_CREW_STATE_BIN:-$_FM_CLASSIFY_LIB_DIR/fm-crew-state.sh}"

# Captain-relevant status verbs. A status line carrying any of these is work
# firstmate must see; everything else (working: notes, bare turn-ended) is
# benign. FM_CAPTAIN_RE overrides the whole set when a home needs a custom verb
# vocabulary; absent, this default applies.
# firstmate must see. Lines without these verbs are no-verb signals: the watcher
# absorbs them only with positive provably-working evidence, while the daemon uses
# its away-mode classification. FM_CAPTAIN_RE overrides the whole set when a home
# needs a custom verb vocabulary; absent, this default applies.
FM_CLASSIFY_CAPTAIN_RE_DEFAULT='done:|needs-decision:|blocked:|failed:|PR ready|checks green|ready in branch|merged'

# Return the last non-blank line of a status file (empty if missing/blank).
Expand All @@ -37,10 +57,11 @@ window_to_task() {
}

# 0 (actionable) if ANY status file listed in a "signal:" wake carries a
# captain-relevant last line; 1 (benign) otherwise. Pass the space-separated file
# list that follows the "signal:" prefix. Non-.status arguments (e.g. .turn-ended
# markers, which never carry a verb) are skipped, so a bare turn-end wake is
# benign.
# captain-relevant last line; 1 otherwise. Pass the space-separated file list that
# follows the "signal:" prefix. Non-.status arguments (e.g. .turn-ended markers,
# which never carry a verb) are skipped. A 1 here is NOT "benign" on its own: a
# no-verb signal (a bare turn-end, a working: note) is only benign when the crew is
# also provably working (signal_crew_provably_working below); otherwise it surfaces.
signal_reason_is_actionable() { # <file> ...
local f last
for f in "$@"; do
Expand All @@ -53,10 +74,65 @@ signal_reason_is_actionable() { # <file> ...
return 1
}

# 0 if crew <id> shows POSITIVE evidence it is still working; 1 otherwise. This is
# the "provably working" predicate at the heart of absorb-only-when-provably-working:
# a no-verb turn-end or non-terminal stale wake is absorbed ONLY when this returns
# 0, and SURFACED otherwise (the crew may be done, waiting on a decision, or wedged).
#
# It reuses bin/fm-crew-state.sh rather than duplicating its run-step logic, and
# treats the crew as provably working in exactly two cases, both read straight from
# that helper's one canonical line ("state: <s> · source: <src> · <detail>"):
# (a) state working from source run-step - the crew's no-mistakes run for its
# branch is in an actively-running step (running/fixing/ci), NOT terminal,
# parked, passed, or failed; OR
# (b) state working from source pane - the pane shows the harness busy
# signature.
# Everything else - a terminal/parked/failed run, an idle pane that fell back to a
# stale "working:" status-log line (source status-log), a torn-down or unknown
# crew, or an unreadable verdict - is NOT provably working, so the wake surfaces.
# NOT a pure read: fm-crew-state.sh may make a bounded no-mistakes call, so this
# runs only on the no-verb path. FM_CREW_STATE_BIN lets tests stub the verdict.
crew_is_provably_working() { # <id>
local id=$1 line state src
[ -n "$id" ] || return 1
line=$("$FM_CREW_STATE_BIN" "$id" 2>/dev/null) || true
case "$line" in state:*) ;; *) return 1 ;; esac
state=${line#state: }; state=${state%% *}
[ "$state" = working ] || return 1
src=${line#*source: }; src=${src%% *}
case "$src" in
run-step|pane) return 0 ;;
*) return 1 ;;
esac
}

# 0 (benign/absorb) if EVERY task referenced by a no-verb "signal:" wake is provably
# working; 1 (actionable/surface) if any is not, or no task can be resolved. Pass the
# same space-separated file list as signal_reason_is_actionable. Files are mapped to
# task ids by stripping the .status / .turn-ended suffix; a no-verb wake with nothing
# provably working must surface, so an empty/unresolvable list returns 1.
signal_crew_provably_working() { # <file> ...
local f base task seen=""
for f in "$@"; do
base=${f##*/}
case "$base" in
*.status) task=${base%.status} ;;
*.turn-ended) task=${base%.turn-ended} ;;
*) continue ;;
esac
[ -n "$task" ] || continue
case " $seen " in *" $task "*) continue ;; esac
seen="$seen $task"
crew_is_provably_working "$task" || return 1
done
[ -n "$seen" ] || return 1
return 0
}

# 0 (terminal/actionable) if a stale window's last status line is
# captain-relevant; 1 (non-terminal/benign) otherwise, including the no-status
# case. A non-terminal stale is a crew gone quiet mid-work: benign on first sight,
# but the caller bounds it with an idle-time escalation threshold.
# captain-relevant; 1 otherwise, including the no-status case. A 1 only means
# "non-terminal"; the always-on watcher then applies crew_is_provably_working,
# while the away-mode daemon applies its persistence recheck.
stale_is_terminal() { # <window> <state>
local win=$1 state=$2 last
last=$(last_status_line "$state/$(window_to_task "$win").status")
Expand Down
85 changes: 85 additions & 0 deletions bin/fm-cognee-lookup.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
#!/usr/bin/env bash
# Local dry-run wrapper for future Cognee lookup integration.
#
# This script deliberately does not call Cognee. It accepts a local answer
# fixture, treats it as an untrusted hint, and asks the local manifest checker to
# prove whether any cited source can be reopened and checksum-verified.
set -eu

usage() {
cat >&2 <<'USAGE'
usage: fm-cognee-lookup.sh --dry-run --query <text> [--manifest <manifest.tsv> --answer-file <answer.txt>]

No live mode exists yet. Without --dry-run this command fails closed before any
network, environment, MCP, or config access can happen.
USAGE
}

die() {
echo "error: $*" >&2
exit 1
}

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DRY_RUN=false
QUERY=
MANIFEST=
ANSWER_FILE=

while [ $# -gt 0 ]; do
case "$1" in
--dry-run)
DRY_RUN=true
shift
;;
--query)
QUERY=${2:-}
[ -n "$QUERY" ] || die "--query requires text"
shift 2
;;
--manifest)
MANIFEST=${2:-}
[ -n "$MANIFEST" ] || die "--manifest requires a path"
shift 2
;;
--answer-file)
ANSWER_FILE=${2:-}
[ -n "$ANSWER_FILE" ] || die "--answer-file requires a path"
shift 2
;;
--help|-h)
usage
exit 0
;;
*)
usage
exit 1
;;
esac
done

if ! "$DRY_RUN"; then
echo "label=blocked_missing_proof reason=live_cognee_lookup_not_implemented external_action_authorized=false" >&2
exit 2
fi

[ -n "$QUERY" ] || die "--query is required in dry-run mode"

query_hash=$(printf '%s' "$QUERY" | sha256sum | awk '{print $1}')
query_bytes=$(printf '%s' "$QUERY" | wc -c | tr -d ' ')

echo "mode=dry-run"
echo "query_sha256=$query_hash"
echo "query_bytes=$query_bytes"
echo "cognee_answer_status=hint_only"
echo "external_action_authorized=false"

if [ -z "$MANIFEST" ] && [ -z "$ANSWER_FILE" ]; then
echo "label=hint_only reason=no_manifest_or_answer_fixture external_action_authorized=false"
exit 0
fi

[ -n "$MANIFEST" ] || die "--manifest is required when --answer-file is used"
[ -n "$ANSWER_FILE" ] || die "--answer-file is required when --manifest is used"

"$SCRIPT_DIR/fm-cognee-manifest-check.sh" --manifest "$MANIFEST" --answer-file "$ANSWER_FILE"
Loading