Multi-harness agent governance — Phase 0 + 1 (dogfooded)#1

Merged: apple-techie merged 12 commits into main from feature/harness-governance on Jun 13, 2026

@apple-techie (Owner)

Summary

Phase 0 + 1 of multi-harness agent governance: harness selection becomes a governed, enforceable decision rather than a declared-but-inert setting plus unchecked dynamic choices. Design spec: docs/plans/harness-governance.md. All changes are additive (an empty policy reproduces today's behavior exactly). Gate is green: make check, ruff, and 393 pytest tests (+48 over baseline).

What shipped

  • T0010: per-harness governance fields in the registry (capability_tags, cost_tier, autonomy_class, auth_requirement, health_probe, drift markers), empty-safe; frozen probe/field/oneshot verbs untouched.
  • T0011: harness-registry roster + health CLI verbs (additive; the test fake learns them).
  • T0012: HarnessPolicy on EngineConfig (allow/deny, cost ceiling, autonomy cap, role↔tag map); defaults reproduce today's behavior.
  • T0013: classify_harness gate pass (pure; roster=None default = no change), the same defense-in-depth shape as the security gate, applied to harness choice.
  • T0014: boot-time brain/ingest harness validation, plus the harness roster and selection rubric in the brain prompt.
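As a rough illustration of the gate described in the T0013 bullet, the sketch below orders the checks the way the commit messages describe them (deny or unknown is blocked; missing or over the cost ceiling is destructive; no roster or an empty policy is a pass-through). It is not the PR's implementation: the simplified signature, the dict-based policy, the `COST_ORDER` tiers, and the treatment of an allow-list miss are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical tier ordering; the real cost_tier vocabulary is not given in this PR.
COST_ORDER = ["low", "medium", "high"]


def classify_harness_sketch(
    harness: str, policy: dict, roster: Optional[list] = None
) -> Optional[str]:
    """Return None (pass-through), "blocked", or "destructive"."""
    # No roster snapshot, or a policy with nothing set: behave exactly as before.
    if roster is None or not any(policy.values()):
        return None
    by_name = {entry["name"]: entry for entry in roster}
    # Denied harnesses and harnesses unknown to the roster are blocked.
    if harness in (policy.get("deny") or []) or harness not in by_name:
        return "blocked"
    # Assumption: an allow-list miss is treated the same way as a deny.
    allow = policy.get("allow") or []
    if allow and harness not in allow:
        return "blocked"
    entry = by_name[harness]
    # A harness that is not present on this machine is escalated, not blocked.
    if not entry.get("present"):
        return "destructive"
    ceiling = policy.get("cost_ceiling") or ""
    tier = entry.get("cost_tier") or ""
    if ceiling in COST_ORDER and tier in COST_ORDER:
        if COST_ORDER.index(tier) > COST_ORDER.index(ceiling):
            return "destructive"
    return None
```

The function is pure: it reads its arguments and returns a label, which is what makes the "roster=None means no change" property easy to test.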

Dogfooded — built by loop-orchestrator orchestrating itself

This was built by a self-hosted govern loop session (claude brain, in an isolated worktree so it could not touch the live daemons). The build ran through a live dual-vendor model outage: Claude Fable 5 and codex both became unavailable mid-build. The session failed over to a sonnet brain plus an opus build lane and lost zero project state, which validates the stateless-boot continuity model under real duress.

Findings the dogfood surfaced (in docs/board/harness-governance.md) — Phase 2 inputs

  • F1: governance must validate dispatch/steer targets (agent vs shell lane), not only add_lane harness choice.
  • F2: post-relaunch in-session context loss.
  • F3: model-unavailable should be its own failure_kind; the registry should support per-harness model failover; availability is per-(harness, model).
  • F4: the brain self-authorized a deferred/sign-off-gated phase; objective-fences should make crossing into a deferred phase an escalate/block, not a brain judgment call.

Deferred (needs sign-off — not in this PR)

Phase 2 (per-harness readiness/health contract — the linchpin, touches the frozen status contract) and Phases 3–5 (deck, conditional worktree isolation, lane handoff). Build the governance now; buy the parallelism machinery only when concurrency > 1.

🤖 Generated with Claude Code

apple-techie and others added 12 commits June 12, 2026 17:31
…T0010, T0012)

Phase 0+1 batch 1 of the harness-governance plan (docs/plans/
harness-governance.md A.1/A.3). Additive only:

- registry: capability_tags, cost_tier, autonomy_class, auth_requirement,
  health_probe, drift_pins appended to HARNESS_REGISTRY_FIELDS and populated
  for all 12 harnesses from the A.3 profile matrix; empty-safe (unset -> '');
  frozen probe/field/oneshot verbs and the existing 8 fields untouched
- engine config: HarnessPolicy dataclass (allow/deny, cost_ceiling,
  autonomy_cap, role_tag_map) on EngineConfig as engine.harness_policy;
  empty policy == today's pass-through, merged via the existing _merge
  recursion which ignores unknown keys

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
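To make the "empty policy == pass-through" and "unknown keys are ignored" properties from the commit message above concrete, here is a minimal sketch. The field names come from the commit message; the field types, the `is_empty` helper, and `policy_from_mapping` (a stand-in for the `_merge` recursion, whose real code is not shown in this PR page) are assumptions.

```python
from dataclasses import dataclass, field, fields


@dataclass
class HarnessPolicy:
    # Field names follow the commit message; the types are assumed.
    allow: list = field(default_factory=list)
    deny: list = field(default_factory=list)
    cost_ceiling: str = ""
    autonomy_cap: str = ""
    role_tag_map: dict = field(default_factory=dict)

    def is_empty(self) -> bool:
        # Hypothetical helper: True when no field restricts anything.
        return not any(getattr(self, f.name) for f in fields(self))


def policy_from_mapping(raw: dict) -> HarnessPolicy:
    # Stand-in for the merge step: known keys are applied, unknown keys dropped.
    known = {f.name for f in fields(HarnessPolicy)}
    return HarnessPolicy(**{k: v for k, v in raw.items() if k in known})
```

With this shape, `HarnessPolicy()` is empty by construction, so any code path guarded by `is_empty()` stays identical to the pre-governance behavior until a field is set explicitly.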
… 1 @ b7cac92)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…0011, T0013)

Phase 0+1 batch 2 of the harness-governance plan. Additive only; frozen
probe/field/oneshot verbs and the status contract untouched:

- registry: 'roster [--json]' (every harness + governance fields +
  present flag, contract_version 1) and 'health <name>' (single word
  ok|missing|unauthenticated|unhealthy; no declared probe = PATH check =
  today). Sourced helpers harness_present/harness_health.
- fake registry answers roster/health (unstubbed = empty roster / ok)
- gate: pure classify_harness(action, config, roster=None) above the
  shape rules per plan A.2 (deny/unknown -> blocked; missing/sick/over
  cost-ceiling/over autonomy-cap/high-drift+unattended+high-risk ->
  destructive), plus govern_add_lanes role-default rewrite emitting
  governance events. roster=None or empty policy = exact pass-through.
- policy: role_defaults + high_risk_roles (default [infra]) fields
- loop: per-cycle roster snapshot via substrate.harness_roster(), only
  resolved when a non-empty policy is configured (call profile and
  behavior byte-identical for the empty policy); roster failure degrades
  to pass-through with an event

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Coord acceptance of batch 2 (T0011 roster/health verbs, T0013 classify_harness
gate), committed at 99c6596. Gate verified green: 384 passed / 0 failed at the
T0011+T0013 committed state (393/0 including the uncommitted T0014 work tree).
A flip alone fails loop-task-lint.sh, so both files move to tasks/archive/
(same pattern as 56e58f7 for T0010+T0012).

Coord ruling — batch-2 judgment call 1 (empty-policy semantics): HarnessPolicy()
defaults are PASS-THROUGH by design; typo/unknown-to-roster blocking is opt-in
via an explicit non-default policy field. The behavior built in 99c6596 is
correct as-is — the proposed one-line change was NOT applied. Invariant recorded
in the T0013 task notes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

STEP 2 of the batch-2 acceptance sequence. F3 was already re-logged at 65b3175;
this adds the missing F2 entry. F2's content is unrecoverable — an exhaustive
search (git history, loop page, checkpoint, log, coord lane, all processed
mailbox messages) found only "F2 lost/unconfirmed" references, never its
substance — so the gap is recorded explicitly per the coord directive. Also
brings the sprint status table in line with the tasks/ source of truth: T0010
and T0012 (accepted at b7cac92/56e58f7) and T0011/T0013 (accepted at 99c6596,
flipped at 15b8650) are done; T0014 remains open pending acceptance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Phase 1, last task of the multi-harness governance dogfood. ADDITIVE:
empty/default policy = today's behavior, byte-identical prompt.

(1) Boot-time fail-fast (plan A.2): validate_boot_config() checks the brain
harness — and the ingest harness when ingest runs headless — against
HarnessPolicy.brain_allow (new field; empty list = unrestricted) AND against a
non-empty registry oneshot_template, so a misconfigured brain (e.g. pi, which
has no one-shot) aborts at boot with a clear message instead of raising on the
first cycle. cmd_once/cmd_watch/cmd_restart return exit 2 on failure. An env
override (LOOP_ENGINE_BRAIN_CMD/INGEST_CMD) replaces the registry one-shot, so
the template check is skipped for that role.

(2) Brain-prompt roster + rubric (plan A.4): _assemble_prompt now appends the
allowed+present+healthy harness roster and the condensed "when we choose X"
selection rubric — but only when a non-empty harness_policy is configured, so
the empty policy keeps the prompt and per-cycle call profile unchanged. The
per-cycle roster snapshot is resolved once before the prompt and shared with
the gate (previously resolved post-decision for the gate alone).

Frozen surfaces untouched (status word, registry probe/field/oneshot verbs).
Gate green: make check 14/14, ruff check + format clean, pytest 393 passed /
0 failed (384 baseline + 9 new T0014 tests; all pre-existing tests unchanged).
No reinstall, no daemon touch, no push.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Coord accepted T0014 (boot validation + brain-prompt roster rubric) this cycle,
pending gate green. Gate verified on HEAD b864d0d: make check-all → bash 14/14,
ruff clean, pytest 393 passed / 0 failed. Frontmatter flipped open -> done with
accepted: 2026-06-13; moved to tasks/archive/ (a done file in tasks/ fails
loop-task-lint.sh's status<->location invariant — same move as T0010–T0013).

This completes Phase 0+1: T0010–T0014 all accepted and archived.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements Phase 0 and Phase 1 of the Multi-Harness Agent Governance plan, adding per-harness governance fields, policy configurations, gate classification, boot-time validation, and a selection rubric in the brain prompt. The review feedback highlights several robustness improvements, including preventing a boot-time crash when a harness is empty, avoiding string formatting issues where explicit null values could format as "None", and escaping double quotes in the bash-based roster CLI to prevent malformed JSON generation.

Comment on lines +169 to +170

    for label, harness, override_var in checks:
        if allow and harness not in allow:

Priority: high

If harness is configured as None or an empty string in the YAML configuration, passing it directly to substrate.harness_field will result in a TypeError inside subprocess.run (since None is not a valid argument type). This TypeError is not caught by the except SubstrateError block, causing the engine to crash at boot. Adding a check for falsy/empty values prevents this crash.

    for label, harness, override_var in checks:
        if not harness:
            failures.append(f"{label}.harness is empty or not specified")
            continue
        if allow and harness not in allow:
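The guard suggested above can be exercised in isolation. The sketch below is a standalone approximation of the boot check, not the PR's `validate_boot_config()`: the function name, the `lookup_template` callable (standing in for the registry one-shot lookup), the `env` mapping, and the failure messages other than the first are assumptions.

```python
def validate_harness_checks(checks, allow, lookup_template, env):
    """Collect every boot-time failure instead of raising on the first one."""
    failures = []
    for label, harness, override_var in checks:
        # Guard first: None or "" must never reach the registry lookup.
        if not harness:
            failures.append(f"{label}.harness is empty or not specified")
            continue
        if allow and harness not in allow:
            failures.append(f"{label}.harness {harness!r} is not allowed")
            continue
        # An env override replaces the registry one-shot, so skip the template check.
        if env.get(override_var):
            continue
        if not lookup_template(harness):
            failures.append(f"{label}.harness {harness!r} has no one-shot template")
    return failures
```

Returning a list lets the caller print all problems at once and then exit with a single non-zero status, rather than forcing the operator to fix one misconfiguration per boot attempt.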

    if policy.allow and harness not in policy.allow:
        return False
    if role and role in policy.role_tag_map:
        tags = set(str(entry.get("capability_tags", "")).split(","))

Priority: medium

Using str(entry.get("capability_tags", "")) converts an explicit None (parsed from JSON null) into the string "None". This can lead to unexpected behavior when matching tags. Using entry.get("capability_tags") or "" is safer and more robust.

Suggested change

    - tags = set(str(entry.get("capability_tags", "")).split(","))
    + tags = set((entry.get("capability_tags") or "").split(","))
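A small sketch of the null-handling point follows. It goes one step beyond the suggested line: `"".split(",")` returns `[""]`, so an empty field would still yield a set containing the empty string, and the helper below filters that out. The `parse_tags` name and the whitespace stripping are assumptions, not part of the PR.

```python
def parse_tags(entry: dict) -> set:
    """Split a comma-separated capability_tags value, tolerating None and blanks."""
    # `or ""` covers both a missing key and an explicit None (JSON null).
    raw = entry.get("capability_tags") or ""
    # Drop empty fragments so "" and "a,,b" do not produce an empty tag.
    return {tag.strip() for tag in raw.split(",") if tag.strip()}
```

For comparison, `str(None)` evaluates to the four-character string `"None"`, which is why the original line could produce a spurious `None` tag when the roster JSON carried a null.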

Comment on lines +115 to +119

    lines.append(
        f"{name} tags={entry.get('capability_tags', '')} "
        f"cost={entry.get('cost_tier', '')} autonomy={entry.get('autonomy_class', '')} "
        f"drift={entry.get('drift_pins', '')}"
    )

Priority: medium

If any of the registry fields (like capability_tags, cost_tier, etc.) are explicitly None (parsed from JSON null), entry.get(..., '') will return None, which then formats as "None" in the brain prompt. Using or '' instead of , '' ensures they default to empty strings.

Suggested change

    - lines.append(
    -     f"{name} tags={entry.get('capability_tags', '')} "
    -     f"cost={entry.get('cost_tier', '')} autonomy={entry.get('autonomy_class', '')} "
    -     f"drift={entry.get('drift_pins', '')}"
    - )
    + lines.append(
    +     f"{name} tags={entry.get('capability_tags') or ''} "
    +     f"cost={entry.get('cost_tier') or ''} autonomy={entry.get('autonomy_class') or ''} "
    +     f"drift={entry.get('drift_pins') or ''}"
    + )

File: lib/harness-registry.sh
Comment on lines +517 to +524

    printf ' {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
        "$n" "$present" \
        "$(harness_field "$n" capability_tags)" \
        "$(harness_field "$n" cost_tier)" \
        "$(harness_field "$n" autonomy_class)" \
        "$(harness_field "$n" auth_requirement)" \
        "$(harness_field "$n" health_probe)" \
        "$(harness_field "$n" drift_pins)"

Priority: medium

The JSON roster generation uses plain printf interpolation. While most fields are static identifiers, health_probe is a shell command and is highly likely to contain double quotes (e.g., pgrep -f "process" or curl -H "Auth..."). If it does, the generated JSON will be malformed. Escaping double quotes in health_probe prevents this.
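The failure mode can be reproduced outside the shell. The sketch below uses Python purely for illustration (the roster CLI itself is bash) and compares three ways of building one roster entry: raw interpolation, escaping double quotes only, and a real JSON encoder. The function names and the sample probe strings are hypothetical.

```python
import json


def naive_entry(name: str, probe: str) -> str:
    # Mirrors printf-style interpolation: values are pasted between quotes as-is.
    return '{"name": "%s", "health_probe": "%s"}' % (name, probe)


def quote_only_entry(name: str, probe: str) -> str:
    # Escapes double quotes and nothing else.
    return '{"name": "%s", "health_probe": "%s"}' % (name, probe.replace('"', '\\"'))


def safe_entry(name: str, probe: str) -> str:
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps({"name": name, "health_probe": probe})


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

A probe containing double quotes breaks `naive_entry` and is repaired by `quote_only_entry`, but a probe that also contains a backslash (for example a regex passed to grep) still yields invalid JSON under quote-only escaping. Only the encoder-based variant round-trips both, which suggests that a quotes-only fix is a partial mitigation if probes may contain backslashes.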

Suggested change

    - printf ' {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
    -     "$n" "$present" \
    -     "$(harness_field "$n" capability_tags)" \
    -     "$(harness_field "$n" cost_tier)" \
    -     "$(harness_field "$n" autonomy_class)" \
    -     "$(harness_field "$n" auth_requirement)" \
    -     "$(harness_field "$n" health_probe)" \
    -     "$(harness_field "$n" drift_pins)"
    + local probe
    + probe="$(harness_field "$n" health_probe)"
    + printf ' {"name": "%s", "present": %s, "capability_tags": "%s", "cost_tier": "%s", "autonomy_class": "%s", "auth_requirement": "%s", "health_probe": "%s", "drift_pins": "%s"}' \
    +     "$n" "$present" \
    +     "$(harness_field "$n" capability_tags)" \
    +     "$(harness_field "$n" cost_tier)" \
    +     "$(harness_field "$n" autonomy_class)" \
    +     "$(harness_field "$n" auth_requirement)" \
    +     "${probe//\"/\\\"}" \
    +     "$(harness_field "$n" drift_pins)"

@apple-techie apple-techie merged commit e710d08 into main Jun 13, 2026
3 checks passed