feat: deterministic Stop hook for executor completion (#129)#137
Closed
sriumcp wants to merge 1 commit into
Closed
feat: deterministic Stop hook for executor completion (#129)#137sriumcp wants to merge 1 commit into
sriumcp wants to merge 1 commit into
Conversation
…Systems-Research#129) Ship bin/nous-execute-stop, a Python entrypoint suitable for use as a Claude Code Stop hook. It tells the harness whether the executor agent is allowed to terminate, based on objective evidence on disk: * exit 0 (allow stop) when: - principle_updates.json exists in $NOUS_ITER_DIR - `nous validate execution --dir $NOUS_ITER_DIR` returns pass * exit 2 (block stop) otherwise, with a structured reason on stderr so Claude Code feeds it back into the agent's conversation and the next turn fixes the artifact rather than restarting. Why deterministic over probabilistic: the existing /goal evaluator (Haiku post-turn) is right for fuzzy success criteria, but execution completion is a schema check — cheaper, faster, and immune to evaluator drift to have a deterministic shell-out. The two coexist; AI-native-Systems-Research#124 wires /goal for fuzzy gating, this hook handles the schema gate. Wire-up: the orchestrator exports NOUS_ITER_DIR before launching the executor session, and the per-campaign .claude/settings.json (which lands in AI-native-Systems-Research#135) registers this script under hooks.Stop. This PR ships just the script so it can be installed manually today. Behavioral tests (5): * pass case: valid iter dir + principle_updates.json -> exit 0, no stderr * block: principle_updates.json missing -> exit 2, stderr names the file * block: corrupted findings.json -> exit 2, stderr includes the schema diff * block: NOUS_ITER_DIR points at non-existent dir -> exit 2 with reason * block: NOUS_ITER_DIR unset -> exit 2 with config-error reason Tests use StubDispatcher to populate a known-passing iter dir, then mutate it to simulate failure modes. Assertions describe what the hook emits (exit code + stderr substrings) — never which functions it called. Test suite: 338 baseline + 5 new = 343 passing. Closes AI-native-Systems-Research#129. Refs AI-native-Systems-Research#120. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 tasks
Collaborator
Author
|
Superseded by #153 — the consolidated tracking-120 PR carrying all 17 commits in merge order. Closing this in favor of that single PR per project owner's request. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
bin/nous-execute-stopPython entrypoint suitable for use as a Claude Code Stop hook.principle_updates.jsonexists andnous validate executionpasses — schema-driven, no LLM judgment.Why deterministic over Haiku
The
/goalevaluator (#124) is right for fuzzy success criteria, but execution completion is a schema check — a shell-out that runsvalidate_executionis cheaper, faster, and immune to evaluator drift. The two coexist.Wire-up
The orchestrator exports
NOUS_ITER_DIRbefore launching the executor session; the per-campaign.claude/settings.json(lands in #135) registers this script underhooks.Stop. This PR ships just the script — installation today is manual via that settings file.Behavioral tests
Five cases in
tests/test_execute_stop_hook.py:Tests use
StubDispatcherto populate a known-passing iter_dir, then mutate it to simulate failure modes. Assertions describe what the hook emits (exit code + stderr substrings) — never which functions it called or how it organized internal work.Test plan
pytest tests/test_execute_stop_hook.py— 5/5 passpytest(full suite) — 343/343 pass (was 338; 5 new).claude/settings.jsonafter security: per-campaign permission policy template (.claude/settings.json) #135 lands and confirm an executor session terminates only when validation passes.Closes #129.
Refs #120.
🤖 Generated with Claude Code