feat: deterministic Stop hook for executor completion (#129) by sriumcp · Pull Request #137 · AI-native-Systems-Research/agentic-strategy-evolution

sriumcp · 2026-05-24T11:58:45Z

Summary

New bin/nous-execute-stop Python entrypoint suitable for use as a Claude Code Stop hook.
Allows the executor session to terminate only when both principle_updates.json exists and nous validate execution passes — schema-driven, no LLM judgment.
On block, writes a structured reason to stderr; Claude Code feeds that back into the agent's conversation so it fixes the artifact instead of restarting.

Why deterministic over Haiku

The /goal evaluator (#124) is right for fuzzy success criteria, but execution completion is a schema check — a shell-out that runs validate_execution is cheaper, faster, and immune to evaluator drift. The two coexist.

Wire-up

The orchestrator exports NOUS_ITER_DIR before launching the executor session; the per-campaign .claude/settings.json (lands in #135) registers this script under hooks.Stop. This PR ships just the script — installation today is manual via that settings file.

Behavioral tests

Five cases in tests/test_execute_stop_hook.py:

Scenario	Expected
Valid iter dir + principle_updates.json present	exit 0, no stderr
principle_updates.json missing	exit 2, stderr names the file
findings.json missing required field	exit 2, stderr has schema diff
NOUS_ITER_DIR points at non-existent dir	exit 2, reason given
NOUS_ITER_DIR unset	exit 2, config-error reason

Tests use StubDispatcher to populate a known-passing iter_dir, then mutate it to simulate failure modes. Assertions describe what the hook emits (exit code + stderr substrings) — never which functions it called or how it organized internal work.

Test plan

pytest tests/test_execute_stop_hook.py — 5/5 pass
pytest (full suite) — 343/343 pass (was 338; 5 new)
Wire into a real .claude/settings.json after security: per-campaign permission policy template (.claude/settings.json) #135 lands and confirm an executor session terminates only when validation passes.

Closes #129.
Refs #120.

🤖 Generated with Claude Code

…Systems-Research#129) Ship bin/nous-execute-stop, a Python entrypoint suitable for use as a Claude Code Stop hook. It tells the harness whether the executor agent is allowed to terminate, based on objective evidence on disk: * exit 0 (allow stop) when: - principle_updates.json exists in $NOUS_ITER_DIR - `nous validate execution --dir $NOUS_ITER_DIR` returns pass * exit 2 (block stop) otherwise, with a structured reason on stderr so Claude Code feeds it back into the agent's conversation and the next turn fixes the artifact rather than restarting. Why deterministic over probabilistic: the existing /goal evaluator (Haiku post-turn) is right for fuzzy success criteria, but execution completion is a schema check — cheaper, faster, and immune to evaluator drift to have a deterministic shell-out. The two coexist; AI-native-Systems-Research#124 wires /goal for fuzzy gating, this hook handles the schema gate. Wire-up: the orchestrator exports NOUS_ITER_DIR before launching the executor session, and the per-campaign .claude/settings.json (which lands in AI-native-Systems-Research#135) registers this script under hooks.Stop. This PR ships just the script so it can be installed manually today. Behavioral tests (5): * pass case: valid iter dir + principle_updates.json -> exit 0, no stderr * block: principle_updates.json missing -> exit 2, stderr names the file * block: corrupted findings.json -> exit 2, stderr includes the schema diff * block: NOUS_ITER_DIR points at non-existent dir -> exit 2 with reason * block: NOUS_ITER_DIR unset -> exit 2 with config-error reason Tests use StubDispatcher to populate a known-passing iter dir, then mutate it to simulate failure modes. Assertions describe what the hook emits (exit code + stderr substrings) — never which functions it called. Test suite: 338 baseline + 5 new = 343 passing. Closes AI-native-Systems-Research#129. Refs AI-native-Systems-Research#120. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sriumcp · 2026-05-24T23:44:44Z

Superseded by #153 — the consolidated tracking-120 PR carrying all 17 commits in merge order. Closing this in favor of that single PR per project owner's request.

sriumcp mentioned this pull request May 24, 2026

Tracking: Claude-Code-native uplift for Nous (UX, quality, speed, token budget) #120

Open

15 tasks

sriumcp closed this May 24, 2026

sriumcp deleted the feat/129-execute-stop-hook branch May 25, 2026 00:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: deterministic Stop hook for executor completion (#129)#137

feat: deterministic Stop hook for executor completion (#129)#137
sriumcp wants to merge 1 commit into
AI-native-Systems-Research:reflectivefrom
sriumcp:feat/129-execute-stop-hook

sriumcp commented May 24, 2026

Uh oh!

sriumcp commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sriumcp commented May 24, 2026

Summary

Why deterministic over Haiku

Wire-up

Behavioral tests

Test plan

Uh oh!

sriumcp commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant