4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- run: shellcheck bin/*.sh tests/*.sh
- run: shellcheck -x -P SCRIPTDIR bin/*.sh tests/*.sh

tests:
name: Behavior tests
@@ -36,7 +36,7 @@ jobs:
- run: |
set -eu
for test_script in tests/*.test.sh; do
"$test_script"
bash "$test_script"
done

invariants:
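The CI loop above now runs each test as `bash "$test_script"` instead of executing the file directly. The diff does not say why; one plausible reason is that direct execution needs the executable bit, while `bash script` only needs read permission. A minimal sketch of that difference, using a hypothetical throwaway test file in a temporary directory:

```bash
#!/usr/bin/env bash
# Sketch: direct execution vs. running a test file through bash.
# demo.test.sh is a hypothetical file created only for this illustration.
tmp_dir=$(mktemp -d)
trap 'rm -rf "$tmp_dir"' EXIT

demo_test="$tmp_dir/demo.test.sh"
printf '#!/usr/bin/env bash\necho ok\n' > "$demo_test"
chmod 644 "$demo_test"              # readable, but no executable bit

direct_rc=0
"$demo_test" >/dev/null 2>&1 || direct_rc=$?   # exec is refused without an x bit
via_bash=$(bash "$demo_test")                   # bash only needs to read the file

echo "direct exit status: $direct_rc"
echo "via bash: $via_bash"
```

A side effect of this choice is that the shebang line is ignored in CI, so every `tests/*.test.sh` must be valid bash regardless of what its first line says.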
3 changes: 2 additions & 1 deletion AGENTS.md
@@ -511,7 +511,7 @@ On wake, in order of cheapness:
5. `heartbeat:` a heartbeat wake now reaches you only when the watcher's bash fleet-scan caught a captain-relevant status the per-wake path missed (no-change heartbeats are absorbed in bash, never surfaced), so treat it as "something turned up" and review the whole fleet: read each crewmate's current state with `bin/fm-crew-state.sh <id>` (the cheap first read - it reconciles the authoritative run-step over a possibly-stale status-log line, so a crewmate whose gate you already resolved no longer reads as still parked), peek panes that look off, check PR-ready tasks for merge, reconcile data/backlog.md, then re-arm the watcher.
Do not report that the fleet is unchanged.

When the picture is unclear or a display surface needs the shared decision model, run `bin/fm-supervise.sh` for a read-only checklist or `bin/fm-supervise.sh --json` for the `firstmate.supervision.v1` model. The command may report watcher proof as `unknown` when the current sandbox cannot see the watcher process; prove liveness with `bin/fm-watch-arm.sh` or `bin/fm-watch-session.sh --status` before treating that as an actual down watcher.
When the picture is unclear or a display surface needs the shared decision model, run `bin/fm-supervise.sh` for a read-only checklist or `bin/fm-supervise.sh --json` for the `firstmate.supervision.v1` model. For PRs, its `ci_state` combines GitHub commit status and check-runs; failing, cancelled, timed-out, action-required, startup-failure, or stale check-runs are not green. The command may report watcher proof as `unknown` when the current sandbox cannot see the watcher process; prove liveness with `bin/fm-watch-arm.sh` or `bin/fm-watch-session.sh --status` before treating that as an actual down watcher.

Heartbeats back off exponentially while they are the only wakes firing (600s doubling to a 2h cap - an idle fleet stops burning turns); any signal, stale, or check wake resets the cadence to the base interval.
Due per-task checks run before signal scanning so chatty crewmate status updates cannot starve slow polls like merge detection.
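The updated AGENTS.md line says `ci_state` combines GitHub commit status and check-runs, and that failing, cancelled, timed-out, action-required, startup-failure, or stale check-runs are not green. The function below is a hypothetical sketch of such a combination, not `fm-supervise.sh`'s code. The state names (green, pending, failed, absent, unknown) come from the README hunk; treating `neutral` and `skipped` as passing, and mapping every non-green conclusion to `failed`, are assumptions. Conclusions use GitHub's snake_case spellings.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; not the fm-supervise.sh implementation.
# $1    combined commit-status state (success|pending|failure|error),
#       or "" when the commit has no statuses at all.
# $2... check-run conclusions; "" marks a check-run that has not completed.
ci_state_sketch() {
  local status_state=$1 conclusion pending=false seen=false
  shift
  case "$status_state" in
    "") ;;                                  # no commit statuses reported
    success) seen=true ;;
    pending) seen=true; pending=true ;;
    failure|error) echo failed; return 0 ;;
    *) echo unknown; return 0 ;;
  esac
  for conclusion in "$@"; do
    seen=true
    case "$conclusion" in
      success|neutral|skipped) ;;           # assumed to count as passing
      "") pending=true ;;                   # still running
      failure|cancelled|timed_out|action_required|startup_failure|stale)
        echo failed; return 0 ;;            # never green
      *) echo unknown; return 0 ;;
    esac
  done
  if ! "$seen"; then
    echo absent
  elif "$pending"; then
    echo pending
  else
    echo green
  fi
}
```

Usage: `ci_state_sketch success success stale` prints `failed`, because one stale check-run is enough to block green even when the commit status itself succeeded.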
@@ -632,6 +632,7 @@ Map firstmate's real backlog operations to the approved commands:
- Manage dependencies: `tasks-axi block <id> --by <other>` and `tasks-axi unblock <id> --by <other>`, then `tasks-axi ready` to list queued work with no unresolved blockers.
This is a dependency check only; future-dated items still stay queued until their date arrives.
- Read an item's full notes: `tasks-axi show <id> --full`.
- Do not invent undocumented flags such as `tasks-axi list --json` or `tasks-axi ready --json`; use each command's `--help` before adding flags, because not every verb supports JSON output.
- Hand a task off to a secondmate home: keep using `bin/fm-backlog-handoff.sh <secondmate-id> <item-key>...`; do not call bare `tasks-axi mv` for this path, because the helper resolves and validates the secondmate home before moving anything.
- Normalize the file: `tasks-axi render` rewrites every id'd task in canonical form and leaves free-form lines untouched.

17 changes: 15 additions & 2 deletions CONTRIBUTING.md
@@ -41,7 +41,7 @@ See the [no-mistakes quick start](https://kunchenguid.github.io/no-mistakes/star
- Helper scripts in `bin/` are plain bash.
Each starts with a usage header comment; keep it accurate when you change behavior.
Test scripts and helpers in `tests/` are plain bash too.
`shellcheck bin/*.sh tests/*.sh` must pass, and CI enforces it.
`shellcheck -x -P SCRIPTDIR bin/*.sh tests/*.sh` must pass, and CI enforces it.
- Changes to harness adapters (launch templates in `bin/fm-spawn.sh`, facts in `.agents/skills/harness-adapters/SKILL.md`) must be verified empirically against the real harness, never written from documentation alone.
- In Markdown, put each full sentence on its own line.

@@ -57,7 +57,7 @@ Check and test the toolbelt before pushing:

```sh
bash -n bin/*.sh # syntax-check the toolbelt
shellcheck bin/*.sh tests/*.sh # lint the toolbelt and behavior tests; CI enforces this
shellcheck -x -P SCRIPTDIR bin/*.sh tests/*.sh # lint the toolbelt and behavior tests; CI enforces this
for test_script in tests/*.test.sh; do "$test_script"; done # behavior tests, matching CI
tests/fm-wake-queue.test.sh # durable wake queue losslessness, catch-up, double-drain, duplicate-collapse, and drain liveness guard tests
tests/fm-watcher-lock.test.sh # watcher singleton, lock-race, watch-arm liveness, and guard-warning tests
@@ -71,15 +71,28 @@ tests/fm-composer-ghost.test.sh # dim-ghost stripping, ghost-only comp
tests/fm-afk-inject-e2e.test.sh # private-socket end-to-end test of the afk injection path (partial-input deferral, swallowed-Enter retry)
tests/fm-bootstrap.test.sh # bootstrap dependency and feature-probe tests
tests/fm-fleet-sync.test.sh # project clone refresh: safe detached recovery, STUCK drift reports, benign skips, and bootstrap relay
tests/fm-backlog-audit.test.sh # read-only backlog/state drift audit findings and no-change contract
tests/fm-route.test.sh # deterministic route profiles, overrides, risk flags, and downgrade handling
tests/fm-x-mode.test.sh # X-mode poll, inbox context round-trip, reply threading, dry-run preview, and .env-presence activation tests
tests/fm-memory-lookup.test.sh # manual Cognee memory lookup fallback, source-path verification, and optional brief append
tests/fm-cognee-lookup-gate.test.sh # fail-closed Cognee automatic/manual gate markers and unsafe-evidence rejection
tests/fm-cognee-lookup.test.sh # Cognee dry-run/live lookup wrapper, redacted telemetry, retry, and source verification behavior
tests/fm-cognee-session-cost-probe.test.sh # disabled Cognee session/cost probe planner, endpoint allowlist, and redacted JSONL output
tests/fm-cognee-source-verify.test.sh # Cognee answer reference parsing, manifest matching, local source reopen, and telemetry
tests/fm-cognee-telemetry.test.sh # secret-safe Cognee telemetry schema, redaction flags, IDs, and env-file loading
tests/fm-cognee-brief-rules.test.sh # generated briefs include the trial-only, hint-only Cognee memory rules
tests/fm-tangle-guard.test.sh # primary-checkout tangle detection and spawn/brief isolation tests
tests/fm-spawn-batch.test.sh # batch dispatch and FM_HOME project-path scoping tests
tests/fm-spawn-route.test.sh # spawn records route profile/model/effort metadata without changing launch behavior
tests/fm-update.test.sh # fast-forward-only self-update, reread, nudge, dedup, and skip-safety tests
tests/fm-secondmate-sync.test.sh # local-HEAD secondmate sync, no-fetch, bootstrap nudge gating, and spawn hook tests
tests/fm-secondmate-lifecycle-e2e.test.sh # persistent secondmate routing, seeding, backlog handoff, spawn, recovery, teardown, and FM_HOME flow tests
tests/fm-secondmate-safety.test.sh # secondmate home safety, idle charter, handoff validation, and teardown boundary tests
tests/fm-teardown.test.sh # fm-teardown.sh landed-work safety and reminder checks: fork-remote allow, squash/content landings, dirty and unlanded refusals, PR-head metadata, tasks-axi reminder, --force override
tests/fm-crew-state.test.sh # fm-crew-state.sh current-state reconciliation: run-step authority including closed panes, stale needs-decision/blocked superseded by a resumed run, genuine-parked, cross-branch attribution, pane/status-log fallback, scout skip, torn-down/missing-meta graceful
tests/fm-task-identity.test.sh # task branch/meta identity guard for PR check, diff review, and teardown helpers
tests/fm-watch-session.test.sh # durable home-scoped watcher tmux runner start, status, stop, restart, and AFK behavior
tests/fm-supervision-model.test.sh # read-only supervision checklist and `firstmate.supervision.v1` JSON/schema output
[ "$(readlink CLAUDE.md)" = "AGENTS.md" ]
[ "$(readlink .claude/skills)" = "../.agents/skills" ]
tmp=$(mktemp -d) && printf 'done: smoke\n' > "$tmp/smoke.status" && FM_STATE_OVERRIDE="$tmp" FM_SIGNAL_GRACE=1 FM_POLL=1 FM_HEARTBEAT=999999 bin/fm-watch-arm.sh # watcher re-arm smoke test (prints arm status, then an actionable signal)
2 changes: 2 additions & 0 deletions README.md
@@ -111,6 +111,7 @@ Outside tmux, crewmates land in a detached `firstmate` session you can attach to
You chat with the first mate.
It routes each request to a crewmate in its own tmux window and git worktree, supervises the fleet with a zero-token event-driven watcher, and brings you finished PRs, approved local merges, or investigation reports.
When the current fleet state is unclear, `bin/fm-supervise.sh` gives a passive read-only checklist, and `bin/fm-supervise.sh --json` exposes the same shared model for display tools such as Radar.
For PRs, that model combines GitHub commit status and check-runs before deciding whether CI is green, pending, failed, absent, or unknown.
Persistent secondmate homes are linked firstmate worktrees; startup syncs live ones and secondmate launch syncs the target home to the primary default-branch commit without fetching from origin when it is safe.
When a routed request goes to a secondmate, firstmate marks it so the answer returns through status or a document pointer; direct typing into that secondmate window stays conversational.
A presence-gated sub-supervisor (`/afk`) can self-handle routine events and batch only what matters while you step away.
@@ -139,6 +140,7 @@ Agent-only reference skills live under `.agents/skills/` and are loaded by first

- [docs/architecture.md](docs/architecture.md) - how the crew, supervision, worktrees, secondmates, and project modes work.
- [docs/configuration.md](docs/configuration.md) - environment variables, `FM_HOME`, optional X mode, the files you set, and harness support.
- [docs/cognee-policy.md](docs/cognee-policy.md) - the trial-only, hint-only Cognee memory policy and production gates.
- [docs/scripts.md](docs/scripts.md) - the `bin/` toolbelt reference.
- [`AGENTS.md`](AGENTS.md) - firstmate's full operating manual for the orchestrator agent.
- [CONTRIBUTING.md](CONTRIBUTING.md) - how to contribute, including the dev/test commands.
1 change: 1 addition & 0 deletions bin/fm-brief.sh
@@ -64,6 +64,7 @@ STATUS_FILE=$(shell_quote "$STATE/$ID.status")
COGNEE_BRIEF_RULES=$(cat <<'EOF'
# Cognee memory hints
Cognee is memory/context only. It is not proof, source of truth, durable archive, or action authority.
Official docs expose raw readback and session/model cost surfaces, but Firstmate still treats raw retention/source-authority guarantees and per-wrapper-call cost correlation as unproven.
Do not run automatic Cognee lookup for every task.
Use a Cognee hint only when this brief says all of these are true:
- Firstmate manually performed the lookup.
5 changes: 5 additions & 0 deletions bin/fm-cognee-lookup-gate.sh
@@ -25,6 +25,11 @@ Evidence reports must contain these exact gate markers:

Trial-only evidence is recognized but still blocks automatic lookup:
FM_COGNEE_GATE_COST_USAGE_EVIDENCE=session_window_only

Official Cognee docs expose raw readback and session/model cost surfaces. Those
surfaces are not enough for automatic lookup unless the local evidence set also
proves production raw source-authority guarantees and safe per-wrapper-call
cost correlation.
EOF
}

22 changes: 20 additions & 2 deletions bin/fm-cognee-lookup.sh
@@ -12,11 +12,13 @@ usage() {
usage: fm-cognee-lookup.sh [--dry-run] --query <text> [--manifest <manifest.tsv|manifest.jsonl> --answer-file <answer.txt>]
fm-cognee-lookup.sh <query text>

Live mode uses only already-exported environment variables:
Live mode uses already-exported environment variables, plus allowlisted names
from FM_COGNEE_ENV_FILE when set:
COGNEE_BASE_URL
COGNEE_API_KEY
COGNEE_DATASET_ID or FM_COGNEE_DATASET_ALIAS
FM_COGNEE_MANIFEST or --manifest
FM_COGNEE_TIMEOUT_MS defaults to 30000 and sets connect/request timeouts

It can be used through:
FM_COGNEE_LOOKUP_CMD=/absolute/path/to/bin/fm-cognee-lookup.sh
@@ -60,6 +62,17 @@ dataset_id_hash() {
fi
}

fm_cognee_timeout_ms() {
local value=${FM_COGNEE_TIMEOUT_MS:-30000}
case "$value" in ''|*[!0-9]*) value=30000 ;; esac
[ "$value" -ge 1 ] || value=30000
printf '%s' "$value"
}

fm_cognee_timeout_seconds() {
awk -v ms="$(fm_cognee_timeout_ms)" 'BEGIN { printf "%.3f", ms / 1000 }'
}

has_live_dataset_selector() {
if [ -n "${COGNEE_DATASET_ID:-}" ] && is_uuid "$COGNEE_DATASET_ID"; then
return 0
@@ -91,7 +104,7 @@ live_telemetry_log() {
fm_cognee_telemetry_log_api_attempt \
search POST /api/v1/search false "$status" "$error_class" "$http_status" "$retryable" \
"$attempt_number" "${FM_COGNEE_MAX_ATTEMPTS:-3}" "$is_retry" "$retry_reason" \
"$latency_ms" "${FM_COGNEE_TIMEOUT_MS:-}" "$verification_outcome" true "$parsed_source_count" \
"$latency_ms" "$(fm_cognee_timeout_ms)" "$verification_outcome" true "$parsed_source_count" \
"" unknown missing_vendor_metadata false "$FM_COGNEE_RUN_ID" "$request_id" "$FM_COGNEE_LOGICAL_SEARCH_ID" \
"$dataset_alias_value" "$dataset_id_hash_value" "${FM_COGNEE_SEARCH_TYPE:-RAG_COMPLETION}" "$top_k" true \
"$request_body_bytes" "$response_body_bytes" "$parsed_source_count" "$final_attempt"
@@ -281,6 +294,7 @@ if ! "$DRY_RUN"; then
fi

TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/fm-cognee-live.XXXXXX")
# shellcheck disable=SC2317 # Invoked by trap.
cleanup_live() { rm -rf "$TMP_DIR"; }
trap cleanup_live EXIT
PAYLOAD="$TMP_DIR/search.json"
@@ -300,6 +314,7 @@ if ! "$DRY_RUN"; then
http_status=0
retryable=false
curl_rc=0
timeout_seconds=$(fm_cognee_timeout_seconds)
request_body_bytes=$(wc -c < "$PAYLOAD" | tr -d ' ')
response_body_bytes=0
attempt_latency=0
@@ -311,6 +326,8 @@ if ! "$DRY_RUN"; then
attempt_start_ms=$(fm_cognee_telemetry_now_ms)
set +e
http_status=$(curl -sS -o "$BODY" -w '%{http_code}' \
--connect-timeout "$timeout_seconds" \
--max-time "$timeout_seconds" \
-X POST "$endpoint" \
-H "X-Api-Key: $COGNEE_API_KEY" \
-H "Content-Type: application/json" \
@@ -405,6 +422,7 @@ fi
[ -n "$ANSWER_FILE" ] || die "--answer-file is required when --manifest is used"

TMP_OUT=$(mktemp "${TMPDIR:-/tmp}/fm-cognee-lookup.XXXXXX")
# shellcheck disable=SC2317 # Invoked by trap.
cleanup_lookup() { rm -f "$TMP_OUT"; }
trap cleanup_lookup EXIT

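The lookup hunks normalize `FM_COGNEE_TIMEOUT_MS` once and reuse it for both curl timeouts and telemetry. A standalone sketch of that normalization, with the helpers renamed (`*_sketch`) so they are clearly illustrations rather than the script's own functions, shows how raw values map to the milliseconds and seconds that end up in use:

```bash
#!/usr/bin/env bash
# Standalone copy of the diff's validation, renamed for illustration.
# Empty, non-numeric, or zero values fall back to 30000 ms.
timeout_ms_sketch() {
  local value=${FM_COGNEE_TIMEOUT_MS:-30000}
  case "$value" in ''|*[!0-9]*) value=30000 ;; esac   # reject non-digits
  [ "$value" -ge 1 ] || value=30000                      # reject zero
  printf '%s' "$value"
}

timeout_seconds_sketch() {
  # curl's --connect-timeout and --max-time both accept fractional seconds
  awk -v ms="$(timeout_ms_sketch)" 'BEGIN { printf "%.3f", ms / 1000 }'
}

for raw in "" abc 0 1500 250; do
  FM_COGNEE_TIMEOUT_MS=$raw
  printf '%-7s -> %s ms = %s s\n' "${raw:-(empty)}" "$(timeout_ms_sketch)" "$(timeout_seconds_sketch)"
done
```

Because the same seconds string goes to both `--connect-timeout` and `--max-time`, one setting bounds the connect phase and the whole request, and telemetry now records the normalized value instead of the raw environment variable.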
4 changes: 2 additions & 2 deletions bin/fm-cognee-manifest-check.sh
@@ -112,7 +112,7 @@ for field in $required_fields; do
done

field_value() {
local name=$1 idx=${IDX[$1]}
local idx=${IDX[$1]}
printf '%s' "${COLS[$idx]:-}"
}

@@ -176,7 +176,7 @@ validate_current_row() {
source_path=$(field_value source_path)
source_path_lc=$(printf '%s' "$source_path" | tr '[:upper:]' '[:lower:]')
case "$source_path_lc" in
*secret*|*token*|*api_key*|*password*|*credential*|*auth*|*bearer*|*cookie*|*private_key*|*.env*|*session*|*oauth*|*signed*)
*secret*|*token*|*api_key*|*password*|*credential*|*auth*|*bearer*|*cookie*|*private_key*|*.env*|*session*|*signed*)
TELEMETRY_STATUS=blocked
TELEMETRY_ERROR_CLASS=path_risk_scan_failed
TELEMETRY_SOURCE_OUTCOME=path_risk_scan_failed
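Dropping `*oauth*` from the path-risk pattern does not loosen the scan: the lowercased path is matched by substring, and `*auth*` already covers every path that contains `oauth`. A minimal sketch with the post-change pattern list and some hypothetical paths:

```bash
#!/usr/bin/env bash
# Sketch of the manifest path-risk scan after this diff; paths are hypothetical.
path_is_risky() {
  local path_lc
  path_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$path_lc" in
    *secret*|*token*|*api_key*|*password*|*credential*|*auth*|*bearer*|*cookie*|*private_key*|*.env*|*session*|*signed*)
      return 0 ;;
  esac
  return 1
}

for p in docs/OAuth-setup.md docs/architecture.md config/.env.local notes/Author-list.md; do
  if path_is_risky "$p"; then echo "blocked: $p"; else echo "allowed: $p"; fi
done
```

The same substring matching also blocks paths such as `notes/Author-list.md`, so the scan errs toward false positives, which fits a check that fails closed.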
6 changes: 4 additions & 2 deletions bin/fm-cognee-session-cost-probe.sh
@@ -114,7 +114,9 @@ done
[ -n "$WINDOW_START_UTC" ] || die missing_required_args
[ -n "$WINDOW_END_UTC" ] || die missing_required_args
[ -n "$OUTPUT_JSONL" ] || die missing_required_args
[ -r "$TELEMETRY" ] && [ ! -d "$TELEMETRY" ] || die telemetry_unreadable
if [ ! -r "$TELEMETRY" ] || [ -d "$TELEMETRY" ]; then
die telemetry_unreadable
fi

case "$MAX_SESSIONS" in
''|*[!0-9]*) die invalid_max_sessions ;;
@@ -151,7 +153,7 @@ validate_endpoint() {
fi
[ "$method" = GET ] || return 1
case "$path" in
/health|/openapi.json|/api/v1/sessions|/api/v1/sessions/{session_id}|/api/v1/sessions/cost-by-model)
/health|/openapi.json|/api/v1/sessions|"/api/v1/sessions/{session_id}"|/api/v1/sessions/cost-by-model)
;;
*)
return 1
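Both probe hunks are lint-driven rewrites that keep behavior. `[ A ] && [ B ] || die` happens to be correct here, but ShellCheck flags the `A && B || C` shape (SC2015) because it is not if/else in general; the explicit `if` states the intent. Quoting `"/api/v1/sessions/{session_id}"` does not change matching either, since `case` patterns never undergo brace expansion; it marks the braces as literal for readers and ShellCheck (SC1083). A standalone sketch with illustrative function names:

```bash
#!/usr/bin/env bash
# Illustrative standalone versions of the two probe checks.

# Fail when the telemetry path is unreadable or is a directory.
telemetry_readable() {
  if [ ! -r "$1" ] || [ -d "$1" ]; then
    echo telemetry_unreadable >&2
    return 1
  fi
}

# GET-path allowlist; "{session_id}" is a literal template placeholder.
endpoint_allowed() {
  case "$1" in
    /health|/openapi.json|/api/v1/sessions|"/api/v1/sessions/{session_id}"|/api/v1/sessions/cost-by-model)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Note that the allowlist compares against the literal template string, so a concrete path such as `/api/v1/sessions/abc123` does not match it.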
15 changes: 11 additions & 4 deletions bin/fm-cognee-telemetry-lib.sh
@@ -1,9 +1,11 @@
#!/usr/bin/env bash
# Secret-safe local JSONL telemetry helpers for Cognee wrapper operations.
#
# Callers pass only labels, counters, timings, and cost classifications. This
# helper never receives prompt text, answer bodies, source bodies, auth headers,
# API keys, cookies, signed URLs, bearer tokens, or secret values.
# Telemetry callers pass only labels, counters, timings, and cost classifications.
# This helper's safe env-file loader may read allowlisted Cognee connection names,
# but telemetry events never receive or write prompt text, answer bodies, source
# bodies, auth headers, API keys, cookies, signed URLs, bearer tokens, base URLs,
# or secret values.

fm_cognee_env_trim() {
local value=$1
@@ -63,7 +65,9 @@ fm_cognee_load_env_file() {
value=$(fm_cognee_env_trim "$value")
case "$key" in
''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
# shellcheck disable=SC2034 # Read by callers after fm_cognee_load_env_file returns.
FM_COGNEE_ENV_FILE_LOAD_ERROR=env_file_malformed
# shellcheck disable=SC2034 # Read by callers after fm_cognee_load_env_file returns.
FM_COGNEE_ENV_FILE_LOAD_LINE=$line_no
return 1
;;
@@ -75,8 +79,11 @@ fm_cognee_load_env_file() {
last=${value#"${value%?}"}
if [ "$first" = "'" ] || [ "$first" = '"' ]; then
if [ "$last" != "$first" ] || [ "${#value}" -lt 2 ]; then
# shellcheck disable=SC2034 # Read by callers after fm_cognee_load_env_file returns.
FM_COGNEE_ENV_FILE_LOAD_ERROR=env_file_malformed
# shellcheck disable=SC2034 # Read by callers after fm_cognee_load_env_file returns.
FM_COGNEE_ENV_FILE_LOAD_LINE=$line_no
# shellcheck disable=SC2034 # Read by callers after fm_cognee_load_env_file returns.
FM_COGNEE_ENV_FILE_LOAD_KEY=$key
return 1
fi
@@ -87,7 +94,7 @@ fm_cognee_load_env_file() {

if [ -z "${!key+x}" ] || [ -z "${!key}" ]; then
printf -v "$key" '%s' "$value"
export "$key"
export "${key?}"
fi
done < "$env_file"
return 0
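The telemetry-lib hunks show the core of the env-file loader: dynamic assignment with `printf -v`, `export "${key?}"` (the `${key?}` form tells ShellCheck the exported name is intentionally dynamic, SC2163), matching-quote checks, and "already-set, non-empty variables win". The sketch below is a hypothetical, simplified loader in that spirit, not `fm_cognee_load_env_file` itself. The allowlist is assumed from the names in the lookup usage text, and the real helper's whitespace trimming and error variables are omitted.

```bash
#!/usr/bin/env bash
# Hypothetical simplified loader; the allowlist and parsing rules are assumptions.
load_env_sketch() {
  local env_file=$1 line key value first last
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in ''|'#'*) continue ;; esac      # skip blanks and comments
    case "$line" in *=*) ;; *) return 1 ;; esac    # malformed: no '='
    key=${line%%=*}
    value=${line#*=}
    case "$key" in
      COGNEE_BASE_URL|COGNEE_API_KEY|COGNEE_DATASET_ID) ;;   # assumed allowlist
      *) continue ;;                                          # other names ignored
    esac
    first=${value%"${value#?}"}   # first character
    last=${value#"${value%?}"}    # last character
    if [ "$first" = "'" ] || [ "$first" = '"' ]; then
      # a quoted value must close with the same quote character
      if [ "$last" != "$first" ] || [ "${#value}" -lt 2 ]; then
        return 1
      fi
      value=${value:1:${#value}-2}
    fi
    # variables that are already set and non-empty win over the file
    if [ -z "${!key+x}" ] || [ -z "${!key}" ]; then
      printf -v "$key" '%s' "$value"
      export "${key?}"
    fi
  done < "$env_file"
}
```

Usage: with `COGNEE_API_KEY` already set in the caller, `load_env_sketch path/to/cognee.env` fills in only the missing allowlisted names and silently skips everything else in the file.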
3 changes: 2 additions & 1 deletion bin/fm-cognee-verify-source.sh
@@ -9,7 +9,8 @@ set -eu
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$SCRIPT_DIR/fm-cognee-telemetry-lib.sh"
export FM_COGNEE_TELEMETRY_FILE="${FM_COGNEE_TELEMETRY_FILE:-$(fm_cognee_telemetry_default_path)}"
export FM_COGNEE_TELEMETRY_START_MS="$(fm_cognee_telemetry_now_ms)"
FM_COGNEE_TELEMETRY_START_MS=$(fm_cognee_telemetry_now_ms)
export FM_COGNEE_TELEMETRY_START_MS

usage() {
echo "usage: fm-cognee-verify-source.sh --manifest <manifest.jsonl> --answer <answer.txt>" >&2
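The verify-source hunk splits `export VAR="$(cmd)"` into an assignment followed by `export`. In the combined form the status left behind is `export`'s own, so a failing command substitution is hidden, including from `set -eu` (ShellCheck SC2155). A minimal sketch with a hypothetical failing helper standing in for the clock function:

```bash
#!/usr/bin/env bash
# failing_now_ms is a hypothetical stand-in for a helper that fails.
failing_now_ms() { return 3; }

# Combined form: $? is export's status, so the failure is masked.
# (ShellCheck reports SC2155 on this line; that is the point of the demo.)
export COMBINED=$(failing_now_ms); combined_rc=$?

# Split form: a plain assignment keeps the substitution's status.
SPLIT=$(failing_now_ms); split_rc=$?
export SPLIT

echo "combined form status: $combined_rc"   # 0: failure masked
echo "split form status: $split_rc"         # 3: failure visible
```

Under `set -e`, as in `fm-cognee-verify-source.sh`, the split form would stop the script at the failing assignment instead of continuing with an empty start time.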
2 changes: 1 addition & 1 deletion bin/fm-memory-lookup.sh
@@ -12,7 +12,6 @@
set -eu

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
FM_ROOT="${FM_ROOT_OVERRIDE:-$(cd "$SCRIPT_DIR/.." && pwd)}"
. "$SCRIPT_DIR/fm-cognee-telemetry-lib.sh"
TELEMETRY_START_MS=$(fm_cognee_telemetry_now_ms)

@@ -150,6 +149,7 @@ append_brief_section() {
}

TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/fm-memory-lookup.XXXXXX")
# shellcheck disable=SC2317 # Invoked by trap.
cleanup() { rm -rf "$TMP_DIR"; }
trap cleanup EXIT

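Several hunks add `# shellcheck disable=SC2317 # Invoked by trap.` above cleanup functions: ShellCheck does not treat `trap cleanup EXIT` as a call, so it can report the function body as unreachable. A minimal sketch of the scratch-directory pattern, using a hypothetical `run_with_scratch_dir` wrapper and `fm-sketch` prefix, run in a subshell so the trap fires as soon as the work finishes:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper showing the mktemp + EXIT-trap cleanup pattern.
run_with_scratch_dir() {
  (
    TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/fm-sketch.XXXXXX")
    # shellcheck disable=SC2317 # Invoked by trap.
    cleanup() { rm -rf "$TMP_DIR"; }
    trap cleanup EXIT
    printf 'scratch\n' > "$TMP_DIR/work.txt"
    printf '%s\n' "$TMP_DIR"   # report the path so the caller can check it is gone
  )
}

scratch=$(run_with_scratch_dir)
if [ -d "$scratch" ]; then
  echo "still present: $scratch"
else
  echo "cleaned up: $scratch"
fi
```

Registering the trap immediately after `mktemp` means the directory is removed on normal exit as well as on an early `die` under `set -eu`.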
2 changes: 1 addition & 1 deletion bin/fm-route.sh
@@ -77,7 +77,7 @@ contains_git_danger() {
}

join_reasons() {
local result= part
local result="" part
for part in "$@"; do
[ -n "$part" ] || continue
if [ -z "$result" ]; then
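The fm-route.sh hunk only changes how `join_reasons` initializes `result`: `local result="" part` spells out the empty starting value, where `local result= part` reads like the common `var= value` typo. The diff cuts off the rest of the function, so the completion below is hypothetical; in particular the `; ` separator is an assumption, not the script's actual choice.

```bash
#!/usr/bin/env bash
# Hypothetical completion of join_reasons; the "; " separator is assumed.
join_reasons_sketch() {
  local result="" part
  for part in "$@"; do
    [ -n "$part" ] || continue      # skip empty reasons
    if [ -z "$result" ]; then
      result=$part
    else
      result="$result; $part"
    fi
  done
  printf '%s' "$result"
}
```

Usage: `join_reasons_sketch "" "risk: git" "" "override"` prints `risk: git; override`, and an all-empty argument list prints nothing.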