feat: PreToolUse plan-enforcer hook (#128)#139
Closed
sriumcp wants to merge 2 commits into
Closed
Conversation
…I-native-Systems-Research#135) Replace --dangerously-skip-permissions with a fine-grained, per-campaign permission policy generated at init. The orchestrator's pure renderer (orchestrator/settings_template.py) takes work_dir, repo_path, and an optional experiment_plan, and returns a dict suitable for serialization as .claude/settings.json. The contents: - permissions.allowOnly: campaign work-dir and target repo path. Anything else is denied by default. - permissions.allow: Bash command allowlist — conservative defaults plus any binaries pulled out of experiment_plan.yaml arm conditions, plus caller-provided extras. - permissions.deny: hard blocks for outbound https (curl/wget) and catastrophic shell commands (rm -rf /). - hooks.Stop: registered when bin/nous-execute-stop is present (AI-native-Systems-Research#129 integration). - hooks.PreToolUse: registered when caller provides the path (AI-native-Systems-Research#128 hook). setup_work_dir() now writes the rendered settings file at init time, idempotently (won't clobber a hand-edited file). CLIDispatcher auto-detects work_dir/.claude/settings.json on construction, and when present passes --settings <path> to claude -p instead of --dangerously-skip-permissions. SDKDispatcher already accepted settings_path in AI-native-Systems-Research#121 — wire-up matches. Behavioral tests (tests/test_settings_template.py): 14 cases. Renderer contract: - allowOnly contains work_dir - allowOnly contains repo_path when provided - default bin allowlist contains python, git, grep - plan binaries (./blis, /usr/local/bin/sim) are added by basename - extra_bin_allowlist extends defaults - deny blocks outbound https - hooks section absent unless hook paths provided - Stop hook registered with absolute path - PreToolUse hook registered with Bash matcher Disk write contract: - write_campaign_settings creates parent dir + writes JSON - settings_path_for returns .claude/settings.json under work_dir Init wiring contract: - setup_work_dir writes the file when fresh - setup_work_dir does NOT overwrite a user-customized settings file Replacement invariant (the security property): - rendered settings impose non-empty allowOnly AND non-empty deny (otherwise the file is functionally equivalent to --dangerously and the swap is a regression). Out of scope: the "out-of-worktree write is denied" criterion is an integration test against a live claude session and is verified manually. docs/security.md describes the model end-to-end. Test suite: 338 baseline + 14 new = 352 passing. Closes AI-native-Systems-Research#135. Refs AI-native-Systems-Research#120. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ship bin/nous-plan-enforcer, a Python entrypoint for use as a Claude Code
PreToolUse hook. It intercepts proposed Bash tool calls during the
executor session and decides whether to allow them based on the
iteration's experiment_plan.yaml.
Decision protocol:
* NOUS_PLAN_ENFORCEMENT=strict: exit 2 (block) if the proposed
command's head binary is not the head binary of any planned
condition. Stderr explains the violation; the agent reads it and
is expected to either revise the command or annotate
"# nous: ad-hoc" to opt out for one call.
* NOUS_PLAN_ENFORCEMENT=warn (default): always exit 0 (allow), but
record violations to <iter_dir>/plan_violations.jsonl with
timestamp, kind, command, and best-effort arm attribution.
* Escape hatch: a command containing the literal "# nous: ad-hoc"
is allowed in BOTH modes and logged as kind:"ad-hoc" so reviewers
can audit how often it's used.
Why this exists: 5/18 mech-design-enforcement showed two executor
processes racing on the same iter dir, partly because nothing inside
the agent enforced the plan. Hooks intercept tool calls deterministically
before the LLM acts — defense in depth on top of AI-native-Systems-Research#135's permission
policy.
Wire-up: setup_work_dir registers the hook automatically when
bin/nous-plan-enforcer exists, alongside the Stop hook from AI-native-Systems-Research#129. The
.claude/settings.json template (AI-native-Systems-Research#135) already supports
pre_tool_use_hook_path; this PR connects the wire.
Behavioral tests (8 in tests/test_plan_enforcer_hook.py):
Strict mode:
- allows a planned binary's command (different args still match by head)
- blocks an unplanned binary with stderr naming the violation
- allows ad-hoc-marked commands AND logs them distinctly
Warn mode:
- allows unplanned and logs to plan_violations.jsonl
- does NOT log planned commands
No false positives: parametric over four representative plan shapes
(single-arm/condition; multi-condition; multi-arm; absolute path) —
every planned command is allowed in strict mode.
Edge cases:
- missing NOUS_ITER_DIR: fail open (cannot enforce what we can't
compare against)
- non-Bash tool calls (Read, Write, etc.): pass through, no log
Stacked on AI-native-Systems-Research#135 (security/135-permission-policy). Rebase onto
reflective once that lands.
Test suite: 352 (post-AI-native-Systems-Research#135) + 8 new = 360 passing.
Closes AI-native-Systems-Research#128.
Refs AI-native-Systems-Research#120.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 tasks
Collaborator
Author
|
Superseded by #153 — the consolidated tracking-120 PR carrying all 17 commits in merge order. Closing this in favor of that single PR per project owner's request. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why
5/18 mech-design-enforcement showed two executor processes racing on the same iter dir partly because nothing inside the agent enforced the plan. Hooks intercept tool calls deterministically before the LLM acts — defense in depth on top of #135's permission policy.
Wire-up
`setup_work_dir` (#135 changes) auto-registers this hook in the rendered `.claude/settings.json` when `bin/nous-plan-enforcer` exists, alongside the Stop hook from #129.
Behavioral tests (8)
All assertions describe externally-visible behavior: exit code, stderr substrings, JSON rows in `plan_violations.jsonl`. None inspect internal helpers.
Test plan
.claude/settings.json) #135 stack) — 360/360 passCloses #128.
Refs #120, #135.
🤖 Generated with Claude Code