Skip to content

feat: PreToolUse plan-enforcer hook (#128)#139

Closed
sriumcp wants to merge 2 commits into
AI-native-Systems-Research:reflectivefrom
sriumcp:feat/128-pretooluse-enforcer
Closed

feat: PreToolUse plan-enforcer hook (#128)#139
sriumcp wants to merge 2 commits into
AI-native-Systems-Research:reflectivefrom
sriumcp:feat/128-pretooluse-enforcer

Conversation

@sriumcp
Copy link
Copy Markdown
Collaborator

@sriumcp sriumcp commented May 24, 2026

⚠️ Stacked on #138 (security/135-permission-policy). This PR's diff includes #135's commits. Once #138 lands on `reflective`, this branch should be rebased and the diff will shrink to only #128's changes.

Summary

  • New `bin/nous-plan-enforcer` Python entrypoint suitable for use as a Claude Code PreToolUse hook.
  • Intercepts proposed Bash tool calls and compares the proposed command's head binary against the head binaries in the iteration's `experiment_plan.yaml`.
  • Two modes: strict (block unplanned commands; exit 2 with reason) and warn (default; log to `plan_violations.jsonl` and allow).
  • Escape hatch: `# nous: ad-hoc` annotation allows one-off out-of-plan commands and logs them distinctly.

Why

5/18 mech-design-enforcement showed two executor processes racing on the same iter dir partly because nothing inside the agent enforced the plan. Hooks intercept tool calls deterministically before the LLM acts — defense in depth on top of #135's permission policy.

Wire-up

`setup_work_dir` (#135 changes) auto-registers this hook in the rendered `.claude/settings.json` when `bin/nous-plan-enforcer` exists, alongside the Stop hook from #129.

Behavioral tests (8)

Class Cases
Strict mode planned-binary allowed; unplanned blocked with stderr; ad-hoc escape allowed + logged
Warn mode unplanned allowed + logged; planned not logged
No false positives parametric over 4 representative plan shapes (single-arm, multi-condition, multi-arm, absolute path)
Edge cases missing iter_dir fails open; non-Bash tool calls pass through

All assertions describe externally-visible behavior: exit code, stderr substrings, JSON rows in `plan_violations.jsonl`. None inspect internal helpers.

Test plan

Closes #128.
Refs #120, #135.

🤖 Generated with Claude Code

sriumcp and others added 2 commits May 24, 2026 08:03
…I-native-Systems-Research#135)

Replace --dangerously-skip-permissions with a fine-grained, per-campaign
permission policy generated at init.

The orchestrator's pure renderer (orchestrator/settings_template.py) takes
work_dir, repo_path, and an optional experiment_plan, and returns a dict
suitable for serialization as .claude/settings.json. The contents:

  - permissions.allowOnly: campaign work-dir and target repo path. Anything
    else is denied by default.
  - permissions.allow: Bash command allowlist — conservative defaults plus
    any binaries pulled out of experiment_plan.yaml arm conditions, plus
    caller-provided extras.
  - permissions.deny: hard blocks for outbound https (curl/wget) and
    catastrophic shell commands (rm -rf /).
  - hooks.Stop: registered when bin/nous-execute-stop is present (AI-native-Systems-Research#129
    integration).
  - hooks.PreToolUse: registered when caller provides the path (AI-native-Systems-Research#128 hook).

setup_work_dir() now writes the rendered settings file at init time,
idempotently (won't clobber a hand-edited file). CLIDispatcher
auto-detects work_dir/.claude/settings.json on construction, and when
present passes --settings <path> to claude -p instead of
--dangerously-skip-permissions. SDKDispatcher already accepted
settings_path in AI-native-Systems-Research#121 — wire-up matches.

Behavioral tests (tests/test_settings_template.py): 14 cases.

Renderer contract:
  - allowOnly contains work_dir
  - allowOnly contains repo_path when provided
  - default bin allowlist contains python, git, grep
  - plan binaries (./blis, /usr/local/bin/sim) are added by basename
  - extra_bin_allowlist extends defaults
  - deny blocks outbound https
  - hooks section absent unless hook paths provided
  - Stop hook registered with absolute path
  - PreToolUse hook registered with Bash matcher

Disk write contract:
  - write_campaign_settings creates parent dir + writes JSON
  - settings_path_for returns .claude/settings.json under work_dir

Init wiring contract:
  - setup_work_dir writes the file when fresh
  - setup_work_dir does NOT overwrite a user-customized settings file

Replacement invariant (the security property):
  - rendered settings impose non-empty allowOnly AND non-empty deny
    (otherwise the file is functionally equivalent to --dangerously
    and the swap is a regression).

Out of scope: the "out-of-worktree write is denied" criterion is an
integration test against a live claude session and is verified manually.

docs/security.md describes the model end-to-end.

Test suite: 338 baseline + 14 new = 352 passing.

Closes AI-native-Systems-Research#135.
Refs AI-native-Systems-Research#120.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ship bin/nous-plan-enforcer, a Python entrypoint for use as a Claude Code
PreToolUse hook. It intercepts proposed Bash tool calls during the
executor session and decides whether to allow them based on the
iteration's experiment_plan.yaml.

Decision protocol:

  * NOUS_PLAN_ENFORCEMENT=strict: exit 2 (block) if the proposed
    command's head binary is not the head binary of any planned
    condition. Stderr explains the violation; the agent reads it and
    is expected to either revise the command or annotate
    "# nous: ad-hoc" to opt out for one call.

  * NOUS_PLAN_ENFORCEMENT=warn (default): always exit 0 (allow), but
    record violations to <iter_dir>/plan_violations.jsonl with
    timestamp, kind, command, and best-effort arm attribution.

  * Escape hatch: a command containing the literal "# nous: ad-hoc"
    is allowed in BOTH modes and logged as kind:"ad-hoc" so reviewers
    can audit how often it's used.

Why this exists: 5/18 mech-design-enforcement showed two executor
processes racing on the same iter dir, partly because nothing inside
the agent enforced the plan. Hooks intercept tool calls deterministically
before the LLM acts — defense in depth on top of AI-native-Systems-Research#135's permission
policy.

Wire-up: setup_work_dir registers the hook automatically when
bin/nous-plan-enforcer exists, alongside the Stop hook from AI-native-Systems-Research#129. The
.claude/settings.json template (AI-native-Systems-Research#135) already supports
pre_tool_use_hook_path; this PR connects the wire.

Behavioral tests (8 in tests/test_plan_enforcer_hook.py):

Strict mode:
  - allows a planned binary's command (different args still match by head)
  - blocks an unplanned binary with stderr naming the violation
  - allows ad-hoc-marked commands AND logs them distinctly

Warn mode:
  - allows unplanned and logs to plan_violations.jsonl
  - does NOT log planned commands

No false positives: parametric over four representative plan shapes
(single-arm/condition; multi-condition; multi-arm; absolute path) —
every planned command is allowed in strict mode.

Edge cases:
  - missing NOUS_ITER_DIR: fail open (cannot enforce what we can't
    compare against)
  - non-Bash tool calls (Read, Write, etc.): pass through, no log

Stacked on AI-native-Systems-Research#135 (security/135-permission-policy). Rebase onto
reflective once that lands.

Test suite: 352 (post-AI-native-Systems-Research#135) + 8 new = 360 passing.

Closes AI-native-Systems-Research#128.
Refs AI-native-Systems-Research#120.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sriumcp
Copy link
Copy Markdown
Collaborator Author

sriumcp commented May 24, 2026

Superseded by #153 — the consolidated tracking-120 PR carrying all 17 commits in merge order. Closing this in favor of that single PR per project owner's request.

@sriumcp sriumcp closed this May 24, 2026
@sriumcp sriumcp deleted the feat/128-pretooluse-enforcer branch May 25, 2026 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant