feat(policy): add agentic approval loop#1528

Draft
zredlined wants to merge 9 commits into main from 1097-agentic-policy-approval-loop

Conversation

@zredlined
Collaborator

@zredlined zredlined commented May 22, 2026

Summary

Ships the agentic policy approval loop end-to-end. When the sandbox denies a network request, an agent inside the sandbox can propose a narrow policy refinement; the gateway runs a formal prover against the merged-policy delta; safe proposals (no new findings) auto-approve in ~1s; risky ones land in pending with structured evidence the reviewer can act on. The agent waits on a socket — zero LLM tokens burn during human review.

This is the loop the platform has been building toward: agents do the narrowing work, the prover catches changes the operator should know about, and the audit trail makes every approval reconstructable.

Closes #1097
Refs #1062
Refs #1532

What this PR ships

The loop. Sandbox denial → agent reads /etc/openshell/skills/policy_advisor.md → agent POSTs a narrow proposal to policy.local → gateway runs the prover → either auto-approve (empty delta) or pending (any finding) → on approval, sandbox hot-reloads → agent retries.

Prover wired in as the auto-approval referee. Every proposal (mechanistic and agent-authored alike) runs through openshell-prover. The prover answers four categorical questions about the proposed change — see What the prover decides. The gateway computes the delta vs the baseline policy and the auto-approval gate fires only when the delta is empty.

Providers-v2 in the loop. The prover validates against the effective policy — provider profiles composed in via providers-v2 are part of the model the prover reasons over. Agent-authored chunks for endpoints a provider profile covers land as their own rules (Fix A in merge.rs) instead of getting silently absorbed into the provider rule, so the prover sees the agent's narrow contribution honestly.

Default-deny posture preserved. Auto-approval is opt-in via the proposal_approval_mode runtime setting, which can be set at gateway scope (openshell settings set --global proposal_approval_mode auto) or at sandbox scope (openshell settings set <name> proposal_approval_mode auto); gateway scope wins when both are set. The default ("manual", which also applies when no setting is present) routes every proposal to human review regardless of the prover verdict. The CLI exposes a shorthand at create time: openshell sandbox create --approval-mode <manual|auto>, which writes the sandbox-scoped setting post-create. The audit event carries resolved_from=<gateway|sandbox|default> so operators can see why a given approval was auto vs manual.
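The precedence described above could be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the `ApprovalMode` and `ResolvedFrom` types and the `resolve_mode` helper are hypothetical names, and reporting `Gateway` as the source even when the gateway-scoped value is unrecognized is an assumption.

```rust
/// Effective approval mode for a sandbox.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Manual,
    Auto,
}

/// Which scope supplied the value (mirrors the `resolved_from` audit field).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolvedFrom {
    Gateway,
    Sandbox,
    Default,
}

/// Hypothetical resolver: gateway scope wins over sandbox scope, and anything
/// other than the exact string "auto" fails closed to manual review.
pub fn resolve_mode(gateway: Option<&str>, sandbox: Option<&str>) -> (ApprovalMode, ResolvedFrom) {
    let (raw, from) = match (gateway, sandbox) {
        (Some(g), _) => (Some(g), ResolvedFrom::Gateway),
        (None, Some(s)) => (Some(s), ResolvedFrom::Sandbox),
        (None, None) => (None, ResolvedFrom::Default),
    };
    let mode = match raw {
        Some("auto") => ApprovalMode::Auto,
        _ => ApprovalMode::Manual,
    };
    (mode, from)
}
```

Under these assumptions, a gateway-scoped "manual" pins every sandbox to manual review even if a sandbox-scoped "auto" exists.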

Demo that walks the full loop. examples/agent-driven-policy-management/demo.sh runs a Codex agent through a two-path flow against a local gateway: one un-credentialed action auto-approves silently; one credentialed action escalates with a categorical finding, demo.sh approves on the reviewer's behalf, the agent retries, and the file lands in GitHub. The run completes end-to-end in ~50–110s with one human-visible escalation, exactly the kind the prover cannot decide unilaterally.

Reconstructable audit. Every auto-approval emits a CONFIG:APPROVED OCSF event with unmapped fields auto=true, source=<mechanistic|agent_authored>, prover_delta=empty, and resolved_from=<gateway|sandbox|default>. The chunk's persisted validation_result carries the categorical finding lines for human-reviewed approvals.
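As a rough picture of the audit payload, the unmapped fields listed above could be assembled like this. The `auto_approval_unmapped` function and the string-map representation are assumptions made for illustration; the real event type in the OCSF layer may look quite different.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: collects the unmapped fields that accompany an
/// auto-approved CONFIG:APPROVED event. `source` is expected to be
/// "mechanistic" or "agent_authored"; `resolved_from` is expected to be
/// "gateway", "sandbox", or "default".
pub fn auto_approval_unmapped(source: &str, resolved_from: &str) -> BTreeMap<&'static str, String> {
    let mut fields = BTreeMap::new();
    fields.insert("auto", "true".to_string());
    fields.insert("source", source.to_string());
    // Auto-approval only fires on an empty delta, so this value is constant.
    fields.insert("prover_delta", "empty".to_string());
    fields.insert("resolved_from", resolved_from.to_string());
    fields
}
```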

Provider profile tightening. providers/github.yaml defaults api.github.com from read-write to read-only. Writes (gh / git via REST) now flow through the agentic loop — the loop becomes the on-ramp to write access, and the prover audits each capability change.

What the prover decides

The prover answers four formal questions about each proposed change. Each "yes" is its own categorical finding — no severity grade. Any finding blocks auto-approval; empty delta means the change is provably safe under the model.

| Category | The prover detects |
| --- | --- |
| link_local_reach | Reach to a host in 169.254.0.0/16 or fe80::/10 (cloud-metadata range, serves credentials). |
| l7_bypass_credentialed | A binary using a wire protocol the L7 proxy cannot inspect (git-remote-https, ssh, nc) reaches a host where a credential is in scope. |
| credential_reach_expansion | A binary gains credentialed reach to a (host, port) it could not reach before. |
| capability_expansion | On a (binary, host, port) that already had credentialed reach, the policy adds a new HTTP method. Finding cites the specific method. |

Detail in crates/openshell-prover/README.md.
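To make the categorical model concrete, here is a minimal sketch of the four categories, the binary gate, and the link-local range check. All type and function names (`Category`, `Finding`, `Decision`, `decide`, `is_link_local`) are hypothetical and chosen for illustration; the prover's real types are documented in its README, and this sketch does not attempt to handle IPv4-mapped IPv6 addresses.

```rust
use std::net::IpAddr;

/// The four categorical findings; there is deliberately no severity field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    LinkLocalReach,
    L7BypassCredentialed,
    CredentialReachExpansion,
    CapabilityExpansion,
}

impl Category {
    pub fn as_str(self) -> &'static str {
        match self {
            Category::LinkLocalReach => "link_local_reach",
            Category::L7BypassCredentialed => "l7_bypass_credentialed",
            Category::CredentialReachExpansion => "credential_reach_expansion",
            Category::CapabilityExpansion => "capability_expansion",
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub category: Category,
    pub binary: String,
    pub host: String,
    pub port: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    AutoApproved,
    Pending,
}

/// Binary gate: auto-approve only in auto mode and only on an empty delta.
pub fn decide(auto_mode: bool, delta: &[Finding]) -> Decision {
    if auto_mode && delta.is_empty() {
        Decision::AutoApproved
    } else {
        Decision::Pending
    }
}

/// True for 169.254.0.0/16 and fe80::/10.
pub fn is_link_local(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.octets()[0] == 169 && v4.octets()[1] == 254,
        IpAddr::V6(v6) => (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}
```

Because the gate takes only the delta's emptiness into account, adding a fifth category in this sketch would not change `decide` at all.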

What the demo shows

==> Step 1 — un-credentialed reach (auto-approves)
   curl GET raw.githubusercontent.com/.../api.github.com.json
   prover: no findings (no credential in scope for the host)
   gateway: auto-approved in ~1s
   audit: "auto-approved: no new prover findings (source=agent_authored)"

==> Step 2 — credentialed capability change (escalates)
   curl PUT api.github.com/.../specific.md
   prover: credential_reach_expansion (or capability_expansion) on api.github.com:443
   gateway: pending — human review required
   demo.sh approves on behalf → agent retries → file lands in github

Acceptance criteria (deterministic, in tests)

  1. Un-credentialed reach auto-approves under auto-mode (zero findings, terminal status approved).
  2. Credentialed reach expansion lands in pending with credential_reach_expansion in validation_result.
  3. Capability expansion on an already-reached credentialed host lands in pending with capability_expansion citing the new method.
  4. Link-local reach lands in pending unconditionally with link_local_reach.
  5. L7-bypass binary with credential lands in pending with l7_bypass_credentialed.
  6. Implicit supersede works in both directions on (host, port, binary) overlap.
  7. Default approval mode is manual: an empty delta does NOT auto-approve when the proposal_approval_mode setting is unset at both scopes, set to "manual", or set to any unrecognized value.
  8. Approval mode resolves through settings: gateway scope wins over sandbox scope, and CLI --approval-mode auto writes the sandbox-scoped setting after create.
  9. Auto-approval audit carries auto=true, source=<mode>, prover_delta=empty, and resolved_from=<gateway|sandbox|default> as unmapped OCSF fields.
  10. Agent-submitted rule names using the reserved _provider_ prefix are rejected at submit time.
  11. Categorical findings (no severity tiers) appear in validation_result.

All covered by unit and integration tests in crates/openshell-server/src/grpc/policy.rs::tests.
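Criterion 10 is simple enough to sketch directly. The constant and function names below are hypothetical, and the error type is a plain `String` for brevity; the server presumably returns a gRPC status instead.

```rust
/// Namespace reserved for provider-profile-synthesized rules.
pub const RESERVED_RULE_PREFIX: &str = "_provider_";

/// Hypothetical submit-time check: agent-authored rule names must not
/// start with the reserved prefix.
pub fn validate_agent_rule_name(rule_name: &str) -> Result<(), String> {
    if rule_name.starts_with(RESERVED_RULE_PREFIX) {
        return Err(format!(
            "rule name '{}' uses the reserved '{}' prefix",
            rule_name, RESERVED_RULE_PREFIX
        ));
    }
    Ok(())
}
```

Only a leading match is rejected in this sketch, so a name such as `my_provider_rule` would still be accepted.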

Testing

  • cargo test --workspace --lib — 534 gateway tests, all 16 crates green.
  • cargo clippy -p openshell-server -p openshell-cli -p openshell-core --all-targets -- -D warnings — clean.
  • cargo fmt --check — clean.
  • ./examples/agent-driven-policy-management/demo.sh runs end-to-end against the local Docker gateway and writes the demo file to GitHub.

Explicitly deferred (follow-up PRs)

  • LLM-based contextual review layered on top of the deterministic gate.
  • Intent files / per-sandbox config of "which findings auto-reject vs. escalate."
  • Credential scope modeling (read-only vs write-scoped tokens).
  • MCP as a third L7 surface (REST + GraphQL + MCP).
  • Per-binary credential isolation (binaries see only the credentials their policy authorizes).
  • L7 watch mode for L4 grants (record HTTP requests through approved L4 tunnels for later L4→L7 conversion).
  • Trust tiers per sandbox class (production sandboxes get tighter defaults).
  • Dedicated CONFIG:AUTO_APPROVED OCSF event class (today reuses CONFIG:APPROVED with auto=true unmapped).
  • User-facing docs page under docs/ for the agentic loop.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@copy-pr-bot

copy-pr-bot Bot commented May 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zredlined zredlined self-assigned this May 23, 2026
@zredlined zredlined added the topic:l7 Application-layer policy and inspection work label May 23, 2026
@zredlined zredlined force-pushed the 1097-agentic-policy-approval-loop branch from e135a1c to a68b370 Compare May 25, 2026 21:06
@copy-pr-bot

copy-pr-bot Bot commented May 25, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@zredlined
Collaborator Author

/ok to test a68b370

@johntmyers
Collaborator

proposal_approval_mode is currently registered as a free-form string setting, so openshell settings set ... proposal_approval_mode autom is accepted and then silently resolves as manual. The runtime fail-closed behavior is good and should stay, but we should reject invalid values at configure time for operator UX.

Can we add key-specific validation for proposal_approval_mode, accepting only manual and auto?

Places I’d expect coverage:

  • Server-side UpdateConfig / proto_setting_to_stored, so all API callers get consistent validation.
  • CLI settings set, unless it can rely fully on the server error.
  • TUI setting edits, either by reusing shared CLI/core validation plumbing or by surfacing the server-side rejection clearly.

I’d still keep resolve_proposal_approval_mode fail-closed for unknown persisted values, so stale/corrupt/future values never enable auto-approval on older gateways.

zredlined added 8 commits May 28, 2026 09:42
Signed-off-by: Alexander Watson <zredlined@gmail.com>
…roval

Run the prover on every proposal regardless of analysis_mode. Auto-approve
proposals whose merged-policy delta is empty (proposer-agnostic, with the
global-policy gate respected). Calibrate prover findings to a single HIGH
severity emitted on link-local hosts, L4+credential-in-scope, and
bypass-L7-binary+credential-in-scope. Add implicit supersede on
(host, port, binary): newer submissions auto-reject older pending chunks,
and incoming mechanistic chunks auto-reject when an approved agent_authored
chunk already covers the same endpoint.

Audit auto-approvals via CONFIG:APPROVED OCSF events carrying auto=true,
source=<mode>, prover_delta=empty as unmapped fields, with message text
"auto-approved: no new prover findings". Build credential set from
sandbox-attached providers (presence only — no scope modeling in v1).

Signed-off-by: Alexander Watson <zredlined@gmail.com>
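One possible reading of the implicit-supersede rules in the commit message above, as a sketch. The `Chunk`, `Source`, and `Status` types and the `apply_supersede` function are hypothetical, and treating the endpoint as an exact (host, port, binary) match rather than a broader overlap test is a simplifying assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Mechanistic,
    AgentAuthored,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Approved,
    Rejected,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    pub id: u32,
    pub source: Source,
    pub status: Status,
    /// (host, port, binary)
    pub endpoint: (String, u16, String),
}

/// Returns the initial status for `incoming` and rejects any older pending
/// chunks on the same endpoint that it supersedes.
pub fn apply_supersede(existing: &mut [Chunk], incoming: &Chunk) -> Status {
    // A mechanistic chunk is redundant if an approved agent-authored chunk
    // already covers the same endpoint.
    let already_covered = incoming.source == Source::Mechanistic
        && existing.iter().any(|c| {
            c.endpoint == incoming.endpoint
                && c.source == Source::AgentAuthored
                && c.status == Status::Approved
        });
    if already_covered {
        return Status::Rejected;
    }
    // Otherwise the newer submission wins over older pending ones.
    for c in existing.iter_mut() {
        if c.status == Status::Pending && c.endpoint == incoming.endpoint {
            c.status = Status::Rejected;
        }
    }
    Status::Pending
}
```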
Signed-off-by: Alexander Watson <zredlined@gmail.com>
The prover now answers four formal questions about a proposed policy
change and emits one finding per "yes" answer:

  - link_local_reach
  - l7_bypass_credentialed
  - credential_reach_expansion
  - capability_expansion

There is no severity grade. The category name is the signal; the
per-path evidence carries the structured detail. The auto-approval
gate is binary — empty delta or not. This removes the previous
HIGH/MEDIUM/CRITICAL severity tiers and the narrowness classifier
that was inconsistent across the access-shorthand / explicit-rules
boundary.

Gateway-side finding_delta gains category suppression:
capability_expansion paths whose (binary, host, port) appears in the
credential_reach_expansion delta are suppressed, so a brand-new
credentialed reach surfaces as one finding rather than one reach plus
N method findings.

The github provider profile now defaults api.github.com to read-only
(was: read-write). Writes flow through the agentic loop — the prover
audits each capability change rather than treating broad write access
as the default.

Demo, sandbox skill, and architecture docs updated to describe the
four-category model. Prover gains a README.md documenting the formal
queries, evidence shape, and how to add a new category.

Signed-off-by: Alexander Watson <zredlined@gmail.com>
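The category suppression described in the commit message above might look roughly like this. `DeltaFinding` and `suppress_redundant` are hypothetical names, and categories are plain strings here only to keep the sketch short.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeltaFinding {
    pub category: &'static str,
    pub binary: String,
    pub host: String,
    pub port: u16,
}

/// Drops capability_expansion findings whose (binary, host, port) already
/// appears as a credential_reach_expansion in the same delta, so a brand-new
/// credentialed reach surfaces as a single finding.
pub fn suppress_redundant(delta: Vec<DeltaFinding>) -> Vec<DeltaFinding> {
    let new_reach: HashSet<(String, String, u16)> = delta
        .iter()
        .filter(|f| f.category == "credential_reach_expansion")
        .map(|f| (f.binary.clone(), f.host.clone(), f.port))
        .collect();
    delta
        .into_iter()
        .filter(|f| {
            f.category != "capability_expansion"
                || !new_reach.contains(&(f.binary.clone(), f.host.clone(), f.port))
        })
        .collect()
}
```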
Signed-off-by: Alexander Watson <zredlined@gmail.com>
…iasing

Move proposal_approval_mode out of SandboxSpec and into the existing runtime-mutable settings model so it can be flipped on a running sandbox and pinned fleet-wide via gateway scope. Precedence matches the rest of the settings model: gateway wins over sandbox, default is manual. The CLI's --approval-mode flag on `sandbox create` is now a shorthand that writes the sandbox-scoped setting post-create. Auto-approval audit events carry resolved_from=<gateway|sandbox|default>.

Reject agent proposals whose rule_name starts with `_provider_`. That namespace is reserved for provider-profile-synthesized rules; allowing agents to address them by name would bypass the merge guard that splits agent contributions into their own rule so the prover sees them honestly.

Refs #1097

Signed-off-by: Alexander Watson <zredlined@gmail.com>
Signed-off-by: Alexander Watson <zredlined@gmail.com>
Previously the setting was a free-form string, so `openshell settings set
... proposal_approval_mode autom` was accepted and silently resolved to
manual at runtime. Operators got no signal that they had fat-fingered
the value.

Extend RegisteredSetting with an optional allowed_string_values whitelist
and apply it at every operator entry point:

- Server-side proto_setting_to_stored rejects out-of-whitelist values
  with Status::invalid_argument listing the allowed set, so all gRPC
  callers get consistent validation.
- CLI parse_cli_setting_value rejects client-side before the round-trip.
- TUI global and sandbox setting editors surface the same error inline.

Runtime resolve_proposal_approval_mode is intentionally unchanged: it
still treats any value other than exact "auto" as manual, so stale
storage or future-mode values never enable auto-approval on older
gateways.

Also documents the approval-mode loop in docs/sandboxes/policy-advisor.mdx
with new Approval Modes and What Auto-Approval Checks sections covering
mode precedence, the --approval-mode create shorthand, the audit-event
fields, and the four categorical prover findings.

Refs #1528

Signed-off-by: Alexander Watson <zredlined@gmail.com>
@zredlined zredlined force-pushed the 1097-agentic-policy-approval-loop branch from 8984c37 to 716d436 Compare May 28, 2026 16:48
@zredlined
Collaborator Author

zredlined commented May 28, 2026

Thanks for catching this; addressed in 716d436.

Validation now lands at every operator entry point you called out:

  • Server-side proto_setting_to_stored (crates/openshell-server/src/grpc/policy.rs) rejects out-of-whitelist values with Status::invalid_argument listing the allowed set, so all gRPC callers get consistent validation. Added an RPC-level test (update_config_global_rejects_invalid_proposal_approval_mode) so a future refactor can't accidentally route writes around the chokepoint.
  • CLI parse_cli_setting_value (crates/openshell-cli/src/run.rs) rejects client-side before the round-trip.
  • TUI editors (crates/openshell-tui/src/app.rs) — both handle_setting_edit_key and handle_sandbox_setting_edit_key consult the shared registry via settings::setting_for_key(&entry.key).validate_string_value(...) and surface the same error inline. Adding the next constrained string setting only needs the registry entry.

Mechanism: RegisteredSetting gains allowed_string_values: Option<&'static [&'static str]>; validate_string_value returns the allowed slice on rejection so each caller formats its own diagnostic. Set ["manual", "auto"] for proposal_approval_mode.

Runtime fail-closed contract preserved: resolve_proposal_approval_mode still does the exact value == "auto" check, so stale or corrupt values from older gateways never enable auto-approval on this code path. The new whitelist is operator-UX-only.
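For readers who have not opened the diff, the mechanism can be pictured with the sketch below. The names `RegisteredSetting`, `allowed_string_values`, and `validate_string_value` come from the description above; the struct's other fields, the `PROPOSAL_APPROVAL_MODE` constant, the `describe_rejection` helper, and the message wording are assumptions for illustration only.

```rust
pub struct RegisteredSetting {
    pub key: &'static str,
    /// `None` means any string is accepted.
    pub allowed_string_values: Option<&'static [&'static str]>,
}

impl RegisteredSetting {
    /// On rejection, returns the allowed slice so each caller can format
    /// its own diagnostic.
    pub fn validate_string_value(&self, value: &str) -> Result<(), &'static [&'static str]> {
        match self.allowed_string_values {
            Some(allowed) if !allowed.iter().any(|a| *a == value) => Err(allowed),
            _ => Ok(()),
        }
    }
}

/// Hypothetical registry entry for the approval-mode setting.
pub const PROPOSAL_APPROVAL_MODE: RegisteredSetting = RegisteredSetting {
    key: "proposal_approval_mode",
    allowed_string_values: Some(&["manual", "auto"]),
};

/// Hypothetical caller-side formatting, e.g. for a CLI or TUI error line.
pub fn describe_rejection(setting: &RegisteredSetting, value: &str) -> Option<String> {
    setting.validate_string_value(value).err().map(|allowed| {
        format!(
            "invalid value '{}' for {}: expected one of {}",
            value,
            setting.key,
            allowed.join(", ")
        )
    })
}
```

Returning the allowed slice rather than a preformatted message keeps the server, CLI, and TUI free to phrase the rejection in their own style.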

The opening claim, the loop description, and the Review Proposals section
all predated auto-approval mode and read as if a developer always sat in
the loop. Update them to reflect the prover-gated auto-approval path:

- Opening: preserve the default-deny framing but acknowledge opt-in auto
  mode lets the gateway approve empty-delta proposals.
- Loop: now seven steps. Step 5 mentions the prover. Step 6 splits
  manual vs auto behavior. Step 7 covers the agent wait/retry path.
- Review Proposals: note that under auto mode, only flagged proposals
  show as pending; empty-delta ones are visible under --status approved
  with the audit fields documented in Approval Modes.

Refs #1528 #1480

Signed-off-by: Alexander Watson <zredlined@gmail.com>
@zredlined
Collaborator Author

Output of running ./examples/agent-driven-policy-management/demo.sh:

===== Run agent-driven policy management demo =====
==> [t+0.9s] Preflight
  gateway:  connected · 0.0.52-dev.9+gd851129e
  github:   zredlined/openshell-policy-demo @ main (8e8fbed)
  providers created (codex, github) — credentials injected as env vars only
==> [t+2.7s] Run summary
  repo:     zredlined/openshell-policy-demo
  branch:   main
  target:   openshell-policy-advisor-demo/20260528-102008.md
  sandbox:  policy-demo-20260528-102008
==> [t+2.7s] Launching sandbox; agent will hit a policy block and draft a proposal
  policy:   raw GitHub schema path denied; GitHub writes denied
  approval: auto for no new findings; review for credential risk
  target:   PUT /repos/zredlined/openshell-policy-demo/contents/openshell-policy-advisor-demo/20260528-102008.md
  log:      /var/folders/m8/pftdbmpd6ls8_cwwxqfjb6r40000gp/T//openshell-policy-demo.AnxlJO/agent.log
==> [t+2.7s] Waiting for the agent to draft a policy proposal
  Loop: deny → propose → validate → decide → retry
    auto:   scoped requests with no new findings continue
    review: credentialed or risky requests pause here
  
  Watching for review requests...
  
  approval requested
  Request 1: chunk 9ca53957
    Binary:     /usr/bin/curl
    Reason:     Allow /usr/bin/curl to create or update one repository file in zredlined/openshell-policy-demo as part of the requested sandbox task.
    Prover:    1 new finding
    Finding:   credential_reach_expansion: api.github.com:443 via /usr/bin/curl
    Finding:   credential_reach_expansion: api.github.com:443 via /usr/lib/node_modules/@openai/codex/node_modules/@openai/codex-linux-arm64/vendor/aarch64-unknown-linux-musl/codex/codex
    Access:     api.github.com:443 [L7 rest, allow PUT /repos/zredlined/openshell-policy-demo/contents/openshell-policy-advisor-demo/20260528-102008.md]
  
==> [t+60.5s] Approving for demo
  OK 1 chunk(s) approved, 0 skipped. Policy version: 4
  agent exited after 1 review approval(s)
==> [t+72.9s] Verifying GitHub write
  file: openshell-policy-advisor-demo/20260528-102008.md
  url:  https://github.com/zredlined/openshell-policy-demo/blob/main/openshell-policy-advisor-demo/20260528-102008.md
==> [t+73.5s] Decision trace
  [1779988831.972] [sandbox] [OCSF ] [ocsf] CONFIG:LOADED [INFO] Policy reloaded successfully [policy_hash:a1fdcf311165cc04c7306bbc82d7d6d2d318c54c07651d7f3becd141d33e9df3] [hash:a1fdcf311165cc04c7306bbc82d7d6d2d318c54c07651d7f3becd141d33e9df3]
  [1779988841.339] [sandbox] [OCSF ] [ocsf] CONFIG:PROPOSED [INFO] agent_authored proposal chunk:12a2b3f8-fa89-4130-9f12-8064acd83926 on raw.githubusercontent.com:443 GET /github/rest-api-description/main/descriptions/api.github.com/api.github.com.json by /usr/bin/curl
  [1779988842.027] [sandbox] [OCSF ] [ocsf] CONFIG:LOADED [INFO] Policy reloaded successfully [policy_hash:a6b5f55c58518b3479a5b14e0a6948b44ebb22035f60867985af6d27103c894f] [hash:a6b5f55c58518b3479a5b14e0a6948b44ebb22035f60867985af6d27103c894f]
  [1779988847.126] [sandbox] [OCSF ] [ocsf] CONFIG:PROPOSED [INFO] agent_authored proposal chunk:bc6d0b02-27ae-4fe5-8129-b4f915444486 on raw.githubusercontent.com:443 GET /github/rest-api-description/main/descriptions/api.github.com/api.github.com.json by /usr/bin/curl
  [1779988861.498] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED PUT http://api.github.com:443/repos/zredlined/openshell-policy-demo/contents/openshell-policy-advisor-demo/20260528-102008.md [policy:github_api_readonly engine:l7] [reason:L7_REQUEST deny PUT api.github.com:443/repos/zredlined/openshell-policy-demo/con...]
  [1779988865.613] [sandbox] [OCSF ] [ocsf] CONFIG:PROPOSED [INFO] agent_authored proposal chunk:9ca53957-d3a9-42ab-a05c-727cc7c414da on api.github.com:443 PUT /repos/zredlined/openshell-policy-demo/contents/openshell-policy-advisor-demo/20260528-102008.md by /usr/bin/curl
  [1779988872.085] [sandbox] [OCSF ] [ocsf] CONFIG:LOADED [INFO] Policy reloaded successfully [policy_hash:81efb11137a1fd5d83ccee21c12b86d16fc82a61ee08ca808073f7ae18961a08] [hash:81efb11137a1fd5d83ccee21c12b86d16fc82a61ee08ca808073f7ae18961a08]
  [1779988875.929] [sandbox] [OCSF ] [ocsf] HTTP:PUT [INFO] ALLOWED PUT http://api.github.com:443/repos/zredlined/openshell-policy-demo/contents/openshell-policy-advisor-demo/20260528-102008.md [policy:github_api_readonly engine:l7]
✓ Demo complete.
===== Stop gateway =====
Stopped Docker gateway session with Ctrl-C at Thu May 28 10:21:50 PDT 2026
Gateway log: tmp/e2e-demo-logs/gateway-docker-running-20260528-101910.log

@zredlined
Collaborator Author

/ok to test d851129

Development

Successfully merging this pull request may close these issues.

feat(gateway): persist and validate agent policy proposal operations

2 participants