Multi-harness agent governance — Phase 0 + 1 (dogfooded)#1
…T0010, T0012)

Phase 0+1 batch 1 of the harness-governance plan (docs/plans/harness-governance.md A.1/A.3). Additive only:
- registry: capability_tags, cost_tier, autonomy_class, auth_requirement, health_probe, drift_pins appended to HARNESS_REGISTRY_FIELDS and populated for all 12 harnesses from the A.3 profile matrix; empty-safe (unset -> ''); frozen probe/field/oneshot verbs and the existing 8 fields untouched
- engine config: HarnessPolicy dataclass (allow/deny, cost_ceiling, autonomy_cap, role_tag_map) on EngineConfig as engine.harness_policy; empty policy == today's pass-through, merged via the existing _merge recursion which ignores unknown keys

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… 1 @ b7cac92) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…0011, T0013)

Phase 0+1 batch 2 of the harness-governance plan. Additive only; frozen probe/field/oneshot verbs and the status contract untouched:
- registry: 'roster [--json]' (every harness + governance fields + present flag, contract_version 1) and 'health <name>' (single word ok|missing|unauthenticated|unhealthy; no declared probe = PATH check = today). Sourced helpers harness_present/harness_health.
- fake registry answers roster/health (unstubbed = empty roster / ok)
- gate: pure classify_harness(action, config, roster=None) above the shape rules per plan A.2 (deny/unknown -> blocked; missing/sick/over cost-ceiling/over autonomy-cap/high-drift+unattended+high-risk -> destructive), plus govern_add_lanes role-default rewrite emitting governance events. roster=None or empty policy = exact pass-through.
- policy: role_defaults + high_risk_roles (default [infra]) fields
- loop: per-cycle roster snapshot via substrate.harness_roster(), only resolved when a non-empty policy is configured (call profile and behavior byte-identical for the empty policy); roster failure degrades to pass-through with an event

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
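The gate rules listed in this commit message can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the function takes a harness name directly instead of the real (action, config, roster) signature, the roster is assumed to be a dict keyed by harness name with hypothetical "present" and "health" keys, the tier orderings and the "pass" label are invented, and the allow list and the high-drift+unattended+high-risk rule are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical orderings; the real tier names are not shown in this PR.
COST_ORDER = ["low", "medium", "high"]
AUTONOMY_ORDER = ["supervised", "semi", "unattended"]


@dataclass
class Policy:
    """Illustrative stand-in for HarnessPolicy (subset of its fields)."""
    deny: list = field(default_factory=list)
    cost_ceiling: str = ""
    autonomy_cap: str = ""

    def is_empty(self) -> bool:
        return not (self.deny or self.cost_ceiling or self.autonomy_cap)


def over(value: str, limit: str, order: list) -> bool:
    """True when a limit is set, both values are known, and value ranks above limit."""
    return bool(limit) and value in order and limit in order and order.index(value) > order.index(limit)


def classify(harness: str, policy: Policy, roster: Optional[dict] = None) -> str:
    # roster=None or an empty policy: exact pass-through.
    if roster is None or policy.is_empty():
        return "pass"
    # deny / unknown -> blocked
    if harness in policy.deny or harness not in roster:
        return "blocked"
    entry = roster[harness]
    # missing / sick -> destructive
    if not entry.get("present") or entry.get("health", "ok") != "ok":
        return "destructive"
    # over cost ceiling / over autonomy cap -> destructive
    if over(entry.get("cost_tier") or "", policy.cost_ceiling, COST_ORDER):
        return "destructive"
    if over(entry.get("autonomy_class") or "", policy.autonomy_cap, AUTONOMY_ORDER):
        return "destructive"
    return "pass"
```

Keeping the function pure (no I/O, roster passed in) is what lets the empty-policy path stay byte-identical: the caller simply never resolves a roster.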
Coord acceptance of batch 2 (T0011 roster/health verbs, T0013 classify_harness gate), committed at 99c6596. Gate verified green: 384 passed / 0 failed at the T0011+T0013 committed state (393/0 including the uncommitted T0014 work tree). A flip alone fails loop-task-lint.sh, so both files move to tasks/archive/ (same pattern as 56e58f7 for T0010+T0012). Coord ruling — batch-2 judgment call 1 (empty-policy semantics): HarnessPolicy() defaults are PASS-THROUGH by design; typo/unknown-to-roster blocking is opt-in via an explicit non-default policy field. The behavior built in 99c6596 is correct as-is — the proposed one-line change was NOT applied. Invariant recorded in the T0013 task notes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
STEP 2 of the batch-2 acceptance sequence. F3 was already re-logged at 65b3175; this adds the missing F2 entry. F2's content is unrecoverable — an exhaustive search (git history, loop page, checkpoint, log, coord lane, all processed mailbox messages) found only "F2 lost/unconfirmed" references, never its substance — so the gap is recorded explicitly per the coord directive. Also brings the sprint status table in line with the tasks/ source of truth: T0010 and T0012 (accepted at b7cac92/56e58f7) and T0011/T0013 (accepted at 99c6596, flipped at 15b8650) are done; T0014 remains open pending acceptance. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 1, last task of the multi-harness governance dogfood. ADDITIVE: empty/default policy = today's behavior, byte-identical prompt.

(1) Boot-time fail-fast (plan A.2): validate_boot_config() checks the brain harness (and the ingest harness when ingest runs headless) against HarnessPolicy.brain_allow (new field; empty list = unrestricted) AND against a non-empty registry oneshot_template, so a misconfigured brain (e.g. pi, which has no one-shot) aborts at boot with a clear message instead of raising on the first cycle. cmd_once/cmd_watch/cmd_restart return exit 2 on failure. An env override (LOOP_ENGINE_BRAIN_CMD/INGEST_CMD) replaces the registry one-shot, so the template check is skipped for that role.

(2) Brain-prompt roster + rubric (plan A.4): _assemble_prompt now appends the allowed+present+healthy harness roster and the condensed "when we choose X" selection rubric, but only when a non-empty harness_policy is configured, so the empty policy keeps the prompt and per-cycle call profile unchanged. The per-cycle roster snapshot is resolved once before the prompt and shared with the gate (previously resolved post-decision for the gate alone).

Frozen surfaces untouched (status word, registry probe/field/oneshot verbs). Gate green: make check 14/14, ruff check + format clean, pytest 393 passed / 0 failed (384 baseline + 9 new T0014 tests; all pre-existing tests unchanged). No reinstall, no daemon touch, no push.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
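A minimal sketch of the boot-time check described in (1), under stated assumptions: the helper name validate_boot, its parameters, and the failure messages are hypothetical, and the one-shot templates are passed in as a plain dict rather than read from the registry. It is meant to show the order of the checks, not the actual validate_boot_config() code.

```python
import os


def validate_boot(checks, brain_allow, oneshot_templates, env=None):
    """Return a list of failure messages; an empty list means boot may proceed.

    checks: iterable of (label, harness, override_var) tuples.
    brain_allow: allowed harness names; an empty list means unrestricted.
    oneshot_templates: harness name -> one-shot template ('' when none exists).
    """
    env = os.environ if env is None else env
    failures = []
    for label, harness, override_var in checks:
        if not harness:
            failures.append(f"{label}.harness is empty or not specified")
            continue
        if brain_allow and harness not in brain_allow:
            failures.append(f"{label}.harness {harness!r} is not in brain_allow")
            continue
        if env.get(override_var):
            # An env override replaces the registry one-shot, so skip the template check.
            continue
        if not oneshot_templates.get(harness):
            failures.append(f"{label}.harness {harness!r} has no one-shot template")
    return failures
```

A caller in the spirit of cmd_once could print the failures and return exit code 2 when the list is non-empty.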
Coord accepted T0014 (boot validation + brain-prompt roster rubric) this cycle, pending gate green. Gate verified on HEAD b864d0d: make check-all → bash 14/14, ruff clean, pytest 393 passed / 0 failed. Frontmatter flipped open -> done with accepted: 2026-06-13; moved to tasks/archive/ (a done file in tasks/ fails loop-task-lint.sh's status<->location invariant — same move as T0010–T0013). This completes Phase 0+1: T0010–T0014 all accepted and archived. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review
This pull request implements Phase 0 and Phase 1 of the Multi-Harness Agent Governance plan, adding per-harness governance fields, policy configurations, gate classification, boot-time validation, and a selection rubric in the brain prompt. The review feedback highlights several robustness improvements, including preventing a boot-time crash when a harness is empty, avoiding string formatting issues where explicit null values could format as "None", and escaping double quotes in the bash-based roster CLI to prevent malformed JSON generation.
    for label, harness, override_var in checks:
        if allow and harness not in allow:
If harness is configured as None or an empty string in the YAML configuration, passing it directly to substrate.harness_field will result in a TypeError inside subprocess.run (since None is not a valid argument type). This TypeError is not caught by the except SubstrateError block, causing the engine to crash at boot. Adding a check for falsy/empty values prevents this crash.
Suggested change:
    for label, harness, override_var in checks:
        if not harness:
            failures.append(f"{label}.harness is empty or not specified")
            continue
        if allow and harness not in allow:

    if policy.allow and harness not in policy.allow:
        return False
    if role and role in policy.role_tag_map:
        tags = set(str(entry.get("capability_tags", "")).split(","))
Using str(entry.get("capability_tags", "")) converts an explicit None (parsed from JSON null) into the string "None". This can lead to unexpected behavior when matching tags. Using entry.get("capability_tags") or "" is safer and more robust.
Suggested change:
    - tags = set(str(entry.get("capability_tags", "")).split(","))
    + tags = set((entry.get("capability_tags") or "").split(","))
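The difference between the two forms is easy to reproduce in isolation. The snippet below is a standalone illustration (the entry dict is made up for the demo): dict.get only falls back to its default when the key is absent, so a key that is present with a JSON null value still yields None.

```python
import json

# A roster entry whose field is an explicit JSON null (key present, value None).
entry = json.loads('{"capability_tags": null}')

# The default '' is not used because the key exists, so str() sees None.
buggy = set(str(entry.get("capability_tags", "")).split(","))

# 'or ""' replaces any falsy value, including None, before splitting.
fixed = set((entry.get("capability_tags") or "").split(","))

print(buggy)  # {'None'}
print(fixed)  # {''}
```

Note that the fixed form still yields a set containing one empty string rather than an empty set, which only matters if the caller tests the set for truthiness.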
    lines.append(
        f"{name} tags={entry.get('capability_tags', '')} "
        f"cost={entry.get('cost_tier', '')} autonomy={entry.get('autonomy_class', '')} "
        f"drift={entry.get('drift_pins', '')}"
    )
If any of the registry fields (like capability_tags, cost_tier, etc.) are explicitly None (parsed from JSON null), entry.get(..., '') will return None, which then formats as "None" in the brain prompt. Using or '' instead of , '' ensures they default to empty strings.
Suggested change:
    - lines.append(
    -     f"{name} tags={entry.get('capability_tags', '')} "
    -     f"cost={entry.get('cost_tier', '')} autonomy={entry.get('autonomy_class', '')} "
    -     f"drift={entry.get('drift_pins', '')}"
    - )
    + lines.append(
    +     f"{name} tags={entry.get('capability_tags') or ''} "
    +     f"cost={entry.get('cost_tier') or ''} autonomy={entry.get('autonomy_class') or ''} "
    +     f"drift={entry.get('drift_pins') or ''}"
    + )
    printf ' {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
        "$n" "$present" \
        "$(harness_field "$n" capability_tags)" \
        "$(harness_field "$n" cost_tier)" \
        "$(harness_field "$n" autonomy_class)" \
        "$(harness_field "$n" auth_requirement)" \
        "$(harness_field "$n" health_probe)" \
        "$(harness_field "$n" drift_pins)"
The JSON roster generation uses plain printf interpolation. While most fields are static identifiers, health_probe is a shell command and is highly likely to contain double quotes (e.g., pgrep -f "process" or curl -H "Auth..."). If it does, the generated JSON will be malformed. Escaping double quotes in health_probe prevents this.
Suggested change:
    - printf ' {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
    -     "$n" "$present" \
    -     "$(harness_field "$n" capability_tags)" \
    -     "$(harness_field "$n" cost_tier)" \
    -     "$(harness_field "$n" autonomy_class)" \
    -     "$(harness_field "$n" auth_requirement)" \
    -     "$(harness_field "$n" health_probe)" \
    -     "$(harness_field "$n" drift_pins)"
    + local probe
    + probe="$(harness_field "$n" health_probe)"
    + printf ' {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
    +     "$n" "$present" \
    +     "$(harness_field "$n" capability_tags)" \
    +     "$(harness_field "$n" cost_tier)" \
    +     "$(harness_field "$n" autonomy_class)" \
    +     "$(harness_field "$n" auth_requirement)" \
    +     "${probe//\"/\\\"}" \
    +     "$(harness_field "$n" drift_pins)"
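The failure mode is independent of bash and can be illustrated in Python on the consumer side. This is a sketch, not code from the PR: it uses the probe string from the review comment as sample data, mimics the printf interpolation with plain string formatting, and compares it with quote-only escaping and with a full JSON encoder.

```python
import json

probe = 'pgrep -f "process"'  # sample probe taken from the review comment

# Plain interpolation, as in the printf-based roster: the inner quotes end the string early.
naive = '{"health_probe": "%s"}' % probe
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Escaping only double quotes, mirroring the ${probe//\"/\\\"} substitution in the suggestion.
quoted = '{"health_probe": "%s"}' % probe.replace('"', '\\"')
roundtrip = json.loads(quoted)["health_probe"]

# A full encoder additionally escapes backslashes and control characters.
encoded = json.dumps({"health_probe": probe})

print(naive_ok)            # False
print(roundtrip == probe)  # True
```

Quote-only escaping covers the examples given in the comment; a probe containing a backslash or a newline would still need the fuller treatment that a real JSON encoder provides.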
Summary
Phase 0 + 1 of multi-harness agent governance: making harness selection a governed, enforceable decision instead of declarative-inert + dynamic-unchecked. Design spec: docs/plans/harness-governance.md. All additive (empty policy = today's behavior exactly), gate green (make check + ruff + 393 pytest, +48 over baseline).

What shipped
- Registry governance fields (capability_tags, cost_tier, autonomy_class, auth_requirement, health_probe, drift markers), empty-safe; frozen probe/field/oneshot verbs untouched.
- harness-registry roster + health CLI verbs (additive; the test fake learns them).
- HarnessPolicy on EngineConfig (allow/deny, cost ceiling, autonomy cap, role↔tag map); defaults reproduce today.
- classify_harness gate pass (pure; roster=None default = no change): the same defense-in-depth shape as the security gate, applied to harness choice.
- brain/ingest harness validation + the harness roster & selection rubric in the brain prompt.

Dogfooded: built by loop-orchestrator orchestrating itself
This was built by a self-hosted governloop session (claude brain, isolated worktree so it couldn't touch the live daemons), through a live dual-vendor model outage (Claude Fable 5 + codex both went unavailable mid-build). It failed over to a sonnet brain + opus build lane and lost zero project state, validating the stateless-boot continuity model under real duress.

Findings the dogfood surfaced (in docs/board/harness-governance.md): Phase 2 inputs
- add_lane harness choice.
- model-unavailable should be its own failure_kind; registry should support per-harness model failover; availability is per-(harness, model).

Deferred (needs sign-off; not in this PR)
Phase 2 (per-harness readiness/health contract — the linchpin, touches the frozen status contract) and Phases 3–5 (deck, conditional worktree isolation, lane handoff). Build the governance now; buy the parallelism machinery only when concurrency > 1.
🤖 Generated with Claude Code