Multi-harness agent governance — Phase 0 + 1 (dogfooded) by apple-techie · Pull Request #1 · apple-techie/loop-orchestrator

apple-techie · 2026-06-13T21:14:08Z

Summary

Phase 0 + 1 of multi-harness agent governance — making harness selection a governed, enforceable decision instead of declarative-inert + dynamic-unchecked. Design spec: docs/plans/harness-governance.md. All additive (empty policy = today's behavior exactly), gate green (make check + ruff + 393 pytest, +48 over baseline).

What shipped

T0010 — per-harness governance fields in the registry (capability_tags, cost_tier, autonomy_class, auth_requirement, health_probe, drift markers), empty-safe; frozen probe/field/oneshot verbs untouched.
T0011 — harness-registry roster + health CLI verbs (additive; the test fake learns them).
T0012 — HarnessPolicy on EngineConfig (allow/deny, cost ceiling, autonomy cap, role↔tag map); defaults reproduce today.
T0013 — classify_harness gate pass (pure; roster=None default = no change) — the same defense-in-depth shape as the security gate, applied to harness choice.
T0014 — boot-time brain/ingest harness validation + the harness roster & selection rubric in the brain prompt.

Dogfooded — built by loop-orchestrator orchestrating itself

This was built by a self-hosted govern loop session (claude brain, in an isolated worktree so it couldn't touch the live daemons) during a live dual-vendor model outage: Claude Fable 5 and codex both became unavailable mid-build. The session failed over to a sonnet brain + opus build lane and lost zero project state, validating the stateless-boot continuity model under real duress.

Findings the dogfood surfaced (in `docs/board/harness-governance.md`) — Phase 2 inputs

F1 — governance must validate dispatch/steer targets (agent vs shell lane), not only add_lane harness choice.
F2 — post-relaunch in-session context loss.
F3 — model-unavailable should be its own failure_kind; registry should support per-harness model failover; availability is per-(harness, model).
F4 — the brain self-authorized a deferred/sign-off-gated phase; objective-fences should make crossing into a deferred phase an escalate/block, not a brain judgment call.

Deferred (needs sign-off — not in this PR)

Phase 2 (per-harness readiness/health contract — the linchpin, touches the frozen status contract) and Phases 3–5 (deck, conditional worktree isolation, lane handoff). Build the governance now; buy the parallelism machinery only when concurrency > 1.

🤖 Generated with Claude Code

…T0010, T0012)

Phase 0+1 batch 1 of the harness-governance plan (docs/plans/harness-governance.md A.1/A.3). Additive only:

- registry: capability_tags, cost_tier, autonomy_class, auth_requirement, health_probe, drift_pins appended to HARNESS_REGISTRY_FIELDS and populated for all 12 harnesses from the A.3 profile matrix; empty-safe (unset -> ''); frozen probe/field/oneshot verbs and the existing 8 fields untouched
- engine config: HarnessPolicy dataclass (allow/deny, cost_ceiling, autonomy_cap, role_tag_map) on EngineConfig as engine.harness_policy; empty policy == today's pass-through, merged via the existing _merge recursion, which ignores unknown keys

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
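A minimal sketch of the empty-safe policy shape this commit describes. The field names (allow, deny, cost_ceiling, autonomy_cap, role_tag_map) come from the commit message; the field types, the `is_empty` helper, and `policy_from_mapping` are hypothetical and only illustrate "empty policy == pass-through" and "unknown keys are ignored".

```python
from dataclasses import dataclass, field, fields
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class HarnessPolicy:
    """Illustrative policy shape; names follow the commit message, types are assumed."""

    allow: Tuple[str, ...] = ()
    deny: Tuple[str, ...] = ()
    cost_ceiling: str = ""  # "" means no ceiling (assumed encoding)
    autonomy_cap: str = ""  # "" means no cap (assumed encoding)
    role_tag_map: Mapping[str, str] = field(default_factory=dict)

    def is_empty(self) -> bool:
        """Hypothetical helper: True when every field still holds its default."""
        return self == HarnessPolicy()


def policy_from_mapping(raw: Mapping[str, Any]) -> HarnessPolicy:
    """Build a policy from parsed config, dropping keys the dataclass does not know."""
    known = {f.name for f in fields(HarnessPolicy)}
    kwargs = {k: v for k, v in raw.items() if k in known and v is not None}
    for key in ("allow", "deny"):
        if key in kwargs:
            kwargs[key] = tuple(kwargs[key])
    return HarnessPolicy(**kwargs)
```

With this shape, a caller can branch once on `policy.is_empty()` and skip all governance work when nothing is configured, which is one way to keep the default path unchanged.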


…0011, T0013)

Phase 0+1 batch 2 of the harness-governance plan. Additive only; frozen probe/field/oneshot verbs and the status contract untouched:

- registry: 'roster [--json]' (every harness + governance fields + present flag, contract_version 1) and 'health <name>' (single word ok|missing|unauthenticated|unhealthy; no declared probe = PATH check = today). Sourced helpers harness_present/harness_health.
- fake registry answers roster/health (unstubbed = empty roster / ok)
- gate: pure classify_harness(action, config, roster=None) above the shape rules per plan A.2 (deny/unknown -> blocked; missing/sick/over cost-ceiling/over autonomy-cap/high-drift+unattended+high-risk -> destructive), plus govern_add_lanes role-default rewrite emitting governance events. roster=None or empty policy = exact pass-through.
- policy: role_defaults + high_risk_roles (default [infra]) fields
- loop: per-cycle roster snapshot via substrate.harness_roster(), only resolved when a non-empty policy is configured (call profile and behavior byte-identical for the empty policy); roster failure degrades to pass-through with an event

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
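The gate rules above can be sketched as a small pure function. This is a simplified illustration, not the repository's implementation: the real signature is classify_harness(action, config, roster=None), while this sketch takes the harness name and a plain policy mapping directly. The `COST_ORDER` tier names, the `health` roster key, and treating an allow-list miss as "blocked" are assumptions, and the autonomy-cap and drift rules are omitted for brevity.

```python
from typing import Any, Mapping, Optional

# Assumed ordering of cost tiers; the actual tier names are not given in the PR.
COST_ORDER = {"low": 0, "medium": 1, "high": 2}


def classify_harness(
    harness: str,
    policy: Mapping[str, Any],
    roster: Optional[Mapping[str, Mapping[str, Any]]] = None,
) -> Optional[str]:
    """Return None (pass-through), "blocked", or "destructive" for a harness choice.

    Pure: no I/O, so the verdict depends only on the arguments.
    """
    # No roster snapshot or an empty policy: behave as if the gate did not exist.
    if roster is None or not any(policy.values()):
        return None

    deny = policy.get("deny") or ()
    allow = policy.get("allow") or ()
    if harness in deny or (allow and harness not in allow):
        return "blocked"  # explicitly denied (allow-list miss assumed equivalent)

    entry = roster.get(harness)
    if entry is None:
        return "blocked"  # unknown to the roster

    if not entry.get("present") or (entry.get("health") or "ok") != "ok":
        return "destructive"  # missing or unhealthy

    ceiling = policy.get("cost_ceiling") or ""
    tier = entry.get("cost_tier") or ""
    if ceiling and COST_ORDER.get(tier, 0) > COST_ORDER.get(ceiling, 0):
        return "destructive"  # over the configured cost ceiling

    return None
```

Keeping the function free of I/O means the roster is resolved once by the caller and passed in, which matches the per-cycle snapshot the commit describes and makes each rule testable in isolation.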

Coord acceptance of batch 2 (T0011 roster/health verbs, T0013 classify_harness gate), committed at 99c6596. Gate verified green: 384 passed / 0 failed at the T0011+T0013 committed state (393/0 including the uncommitted T0014 work tree). A status flip alone fails loop-task-lint.sh, so both files move to tasks/archive/ (same pattern as 56e58f7 for T0010+T0012).

Coord ruling on batch-2 judgment call 1 (empty-policy semantics): HarnessPolicy() defaults are PASS-THROUGH by design; typo/unknown-to-roster blocking is opt-in via an explicit non-default policy field. The behavior built in 99c6596 is correct as-is, so the proposed one-line change was NOT applied. Invariant recorded in the T0013 task notes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

STEP 2 of the batch-2 acceptance sequence. F3 was already re-logged at 65b3175; this adds the missing F2 entry. F2's content is unrecoverable: an exhaustive search (git history, loop page, checkpoint, log, coord lane, all processed mailbox messages) found only "F2 lost/unconfirmed" references, never its substance, so the gap is recorded explicitly per the coord directive.

Also brings the sprint status table in line with the tasks/ source of truth: T0010 and T0012 (accepted at b7cac92/56e58f7) and T0011/T0013 (accepted at 99c6596, flipped at 15b8650) are done; T0014 remains open pending acceptance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Phase 1, last task of the multi-harness governance dogfood. ADDITIVE: empty/default policy = today's behavior, byte-identical prompt.

(1) Boot-time fail-fast (plan A.2): validate_boot_config() checks the brain harness (and the ingest harness when ingest runs headless) against HarnessPolicy.brain_allow (new field; empty list = unrestricted) AND against a non-empty registry oneshot_template, so a misconfigured brain (e.g. pi, which has no one-shot) aborts at boot with a clear message instead of raising on the first cycle. cmd_once/cmd_watch/cmd_restart return exit 2 on failure. An env override (LOOP_ENGINE_BRAIN_CMD/INGEST_CMD) replaces the registry one-shot, so the template check is skipped for that role.

(2) Brain-prompt roster + rubric (plan A.4): _assemble_prompt now appends the allowed+present+healthy harness roster and the condensed "when we choose X" selection rubric, but only when a non-empty harness_policy is configured, so the empty policy keeps the prompt and per-cycle call profile unchanged. The per-cycle roster snapshot is resolved once before the prompt and shared with the gate (previously resolved post-decision for the gate alone).

Frozen surfaces untouched (status word, registry probe/field/oneshot verbs). Gate green: make check 14/14, ruff check + format clean, pytest 393 passed / 0 failed (384 baseline + 9 new T0014 tests; all pre-existing tests unchanged). No reinstall, no daemon touch, no push.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
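A hedged sketch of the boot-time fail-fast check described in (1). The (label, harness, override_var) triples mirror the loop quoted in the review comments on this PR; everything else is an assumption for illustration: the function here takes its inputs as plain arguments, `oneshot_template` stands in for the registry lookup, and the failure messages are invented.

```python
from typing import Callable, List, Mapping, Optional, Sequence, Tuple


def validate_boot_config(
    checks: Sequence[Tuple[str, Optional[str], str]],
    brain_allow: Sequence[str],
    oneshot_template: Callable[[str], str],
    env: Mapping[str, str],
) -> List[str]:
    """Collect human-readable failures; an empty list means boot may proceed."""
    failures: List[str] = []
    for label, harness, override_var in checks:
        # Guard against None/"" before any lookup touches the value.
        if not harness:
            failures.append(f"{label}.harness is empty or not specified")
            continue
        # An empty brain_allow means unrestricted.
        if brain_allow and harness not in brain_allow:
            failures.append(f"{label}.harness {harness!r} is not in brain_allow")
            continue
        # An env override replaces the registry one-shot, so skip the template check.
        if env.get(override_var):
            continue
        if not oneshot_template(harness):
            failures.append(f"{label}.harness {harness!r} has no one-shot template")
    return failures
```

A caller could print the collected failures and return exit code 2 when the list is non-empty, which is the behavior the commit message gives for cmd_once/cmd_watch/cmd_restart. Collecting every failure before aborting, rather than stopping at the first, is a design choice of this sketch.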

Coord accepted T0014 (boot validation + brain-prompt roster rubric) this cycle, pending gate green. Gate verified on HEAD b864d0d: make check-all → bash 14/14, ruff clean, pytest 393 passed / 0 failed. Frontmatter flipped open -> done with accepted: 2026-06-13; moved to tasks/archive/ (a done file in tasks/ fails loop-task-lint.sh's status<->location invariant; same move as T0010–T0013). This completes Phase 0+1: T0010–T0014 all accepted and archived.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request implements Phase 0 and Phase 1 of the Multi-Harness Agent Governance plan, adding per-harness governance fields, policy configurations, gate classification, boot-time validation, and a selection rubric in the brain prompt. The review feedback highlights several robustness improvements, including preventing a boot-time crash when a harness is empty, avoiding string formatting issues where explicit null values could format as "None", and escaping double quotes in the bash-based roster CLI to prevent malformed JSON generation.


gemini-code-assist · 2026-06-13T21:20:32Z

+    for label, harness, override_var in checks:
+        if allow and harness not in allow:


If harness is configured as None or an empty string in the YAML configuration, passing it directly to substrate.harness_field will result in a TypeError inside subprocess.run (since None is not a valid argument type). This TypeError is not caught by the except SubstrateError block, causing the engine to crash at boot. Adding a check for falsy/empty values prevents this crash.

    for label, harness, override_var in checks:
        if not harness:
            failures.append(f"{label}.harness is empty or not specified")
            continue
        if allow and harness not in allow:

gemini-code-assist · 2026-06-13T21:20:32Z

+    if policy.allow and harness not in policy.allow:
+        return False
+    if role and role in policy.role_tag_map:
+        tags = set(str(entry.get("capability_tags", "")).split(","))


Using str(entry.get("capability_tags", "")) converts an explicit None (parsed from JSON null) into the string "None". This can lead to unexpected behavior when matching tags. Using entry.get("capability_tags") or "" is safer and more robust.

Suggested change

-        tags = set(str(entry.get("capability_tags", "")).split(","))
+        tags = set((entry.get("capability_tags") or "").split(","))
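The difference between the two forms is easy to reproduce in isolation. The snippet below is a standalone illustration, not code from the PR; it parses a JSON null and compares what each expression yields.

```python
import json

# An explicit JSON null becomes Python None; the key is present, not missing.
entry = json.loads('{"capability_tags": null}')

# dict.get only falls back to its default when the key is absent,
# so None survives and str() turns it into the text "None".
before = set(str(entry.get("capability_tags", "")).split(","))

# `or ""` replaces None (and any other falsy value) with an empty string.
after = set((entry.get("capability_tags") or "").split(","))

print(before)  # {'None'}
print(after)   # {''}
```

Note that the corrected form still yields a set containing one empty string rather than an empty set, so a tag matcher may also want to discard empty entries.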

gemini-code-assist · 2026-06-13T21:20:32Z

+        lines.append(
+            f"{name} tags={entry.get('capability_tags', '')} "
+            f"cost={entry.get('cost_tier', '')} autonomy={entry.get('autonomy_class', '')} "
+            f"drift={entry.get('drift_pins', '')}"
+        )


If any of the registry fields (like capability_tags, cost_tier, etc.) are explicitly None (parsed from JSON null), entry.get(..., '') will return None, which then formats as "None" in the brain prompt. Using or '' instead of , '' ensures they default to empty strings.

Suggested change

-        lines.append(
-            f"{name} tags={entry.get('capability_tags', '')} "
-            f"cost={entry.get('cost_tier', '')} autonomy={entry.get('autonomy_class', '')} "
-            f"drift={entry.get('drift_pins', '')}"
-        )
+        lines.append(
+            f"{name} tags={entry.get('capability_tags') or ''} "
+            f"cost={entry.get('cost_tier') or ''} autonomy={entry.get('autonomy_class') or ''} "
+            f"drift={entry.get('drift_pins') or ''}"
+        )

gemini-code-assist · 2026-06-13T21:20:32Z

+          printf '    {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
+            "$n" "$present" \
+            "$(harness_field "$n" capability_tags)" \
+            "$(harness_field "$n" cost_tier)" \
+            "$(harness_field "$n" autonomy_class)" \
+            "$(harness_field "$n" auth_requirement)" \
+            "$(harness_field "$n" health_probe)" \
+            "$(harness_field "$n" drift_pins)"


The JSON roster generation uses plain printf interpolation. While most fields are static identifiers, health_probe is a shell command and is highly likely to contain double quotes (e.g., pgrep -f "process" or curl -H "Auth..."). If it does, the generated JSON will be malformed. Escaping double quotes in health_probe prevents this.
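The failure mode is independent of the shell. As a small illustration in Python (not code from the PR), interpolating a probe string that contains double quotes produces text that a JSON parser rejects, while a serializer escapes it correctly. The probe value is the example from this comment.

```python
import json

probe = 'pgrep -f "process"'  # example probe containing double quotes

# Naive interpolation, equivalent in spirit to printf '"health_probe": "%s"'.
naive = '{"health_probe": "%s"}' % probe

try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# A real serializer escapes quotes, backslashes, and control characters.
safe = json.dumps({"health_probe": probe})
round_tripped = json.loads(safe)["health_probe"]

print(naive_ok)                # False
print(round_tripped == probe)  # True
```

Escaping only double quotes covers the case named here; a backslash or newline in any field would still produce invalid JSON, so a full JSON encoder would be the more complete fix where one is available.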

Suggested change

-          printf '    {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
-            "$n" "$present" \
-            "$(harness_field "$n" capability_tags)" \
-            "$(harness_field "$n" cost_tier)" \
-            "$(harness_field "$n" autonomy_class)" \
-            "$(harness_field "$n" auth_requirement)" \
-            "$(harness_field "$n" health_probe)" \
-            "$(harness_field "$n" drift_pins)"
+          local probe
+          probe="$(harness_field "$n" health_probe)"
+          printf '    {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
+            "$n" "$present" \
+            "$(harness_field "$n" capability_tags)" \
+            "$(harness_field "$n" cost_tier)" \
+            "$(harness_field "$n" autonomy_class)" \
+            "$(harness_field "$n" auth_requirement)" \
+            "${probe//\"/\\\"}" \
+            "$(harness_field "$n" drift_pins)"

apple-techie and others added 12 commits June 12, 2026 17:31

docs(plan): multi-harness agent governance design (dogfood spec)

709958a

docs(board): local Jira-equivalent board for the governance sprint

c4ee04b

chore(tasks): T0010 + T0012 status -> done (coord acceptance of batch…

56e58f7

… 1 @ b7cac92) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

docs(board): F1 — dispatch-target governance gap (surfaced live by th…

321f68e

…e dogfood)

docs(board): F3 — model-unavailable failover (Fable 5 outage, live)

65b3175

docs(board): F4 — brain self-authorized deferred Phase 2 (live)

e710d08

gemini-code-assist Bot reviewed Jun 13, 2026


apple-techie merged commit e710d08 into main Jun 13, 2026
3 checks passed


apple-techie merged 12 commits into main from feature/harness-governance
