Version: 1.0 | Status: Active | Last Updated: 2026-03-26
`policy-contract` is the canonical policy scope model and resolver for Phenotype's multi-agent AgentOps platform. It defines a six-level hierarchical scope stack (system -> user -> repo -> harness -> task-domain -> task-instance), a Python resolver that merges those scopes into a deterministic effective policy, host-specific artifact generators for Codex, Cursor, Claude, and Factory-Droid, and governance tooling (schema validation, snapshot drift detection, and version enforcement).
The resolved policy determines, for each AI agent harness, which shell commands are allowed, denied, or require a confirmation request at runtime. It also carries conditional rules that are evaluated by cross-language wrapper evaluators (Go, Rust, Zig).
| User | Context |
|---|---|
| Multi-agent platform operator | Defines system-wide and user-level hard constraints |
| Repository owner | Maintains the `repo.yaml` contract baseline for all agents in the repo |
| Harness integrator (Codex, Cursor, Claude, Factory-Droid) | Consumes generated host artifacts and wrapper bundles |
| CI/CD pipeline | Resolves the effective policy on each run and validates schema and snapshots |
| Cross-language evaluator author (Go/Rust/Zig) | Consumes `policy-wrapper-rules.json` for runtime conditional decisions |
As a multi-agent operator, I define policies at system, user, repo, harness, task-domain, and task-instance scopes, and the resolver merges them with deterministic precedence, so that more-specific scopes override less-specific ones without ambiguity.
Acceptance Criteria:
- Six scope levels are resolved in order: system, user, repo, harness, task-domain, task-instance.
- Each scope contributes its `command_rules` and policy fields; later scopes override earlier scopes on key collision.
- Within each discovery scope directory, files are deduplicated by stem using extension precedence (`.yaml` > `.yml` > `.json`).
- The resolver emits a `policy_hash` (SHA-256 of the final merged policy) for audit trails.
- The `scopes` chain in the output lists every scope file that contributed to the resolution, in order.
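The merge, deduplication, and hashing rules above can be sketched in Python, the resolver's own language. This is a minimal sketch under stated assumptions, not the actual `resolve.py` code: scopes arrive as already-parsed dicts in precedence order, "override on key collision" is read as a recursive merge (nested mappings merge, any other value from the later scope replaces the earlier one), and `policy_hash` is taken over canonical JSON with sorted keys and compact separators. `dedupe_by_stem`, `merge`, and `resolve` are hypothetical names.

```python
import hashlib
import json
from pathlib import PurePath

# Extension precedence within one discovery scope directory: lower index wins.
EXT_PRECEDENCE = (".yaml", ".yml", ".json")


def dedupe_by_stem(filenames):
    """Keep one file per stem in a single directory, preferring .yaml > .yml > .json."""
    chosen = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix not in EXT_PRECEDENCE:
            continue  # not a policy scope file
        rank = EXT_PRECEDENCE.index(path.suffix)
        current = chosen.get(path.stem)
        if current is None or rank < EXT_PRECEDENCE.index(current.suffix):
            chosen[path.stem] = path
    return sorted(str(p) for p in chosen.values())


def merge(base, override):
    """Recursive merge (assumed semantics): the later scope wins on key collision."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def resolve(scopes):
    """scopes: list of (scope_file, policy_dict), ordered system -> task-instance."""
    policy = {}
    chain = []
    for scope_file, data in scopes:
        policy = merge(policy, data)
        chain.append(scope_file)
    # Canonical JSON so the hash does not depend on key insertion order (assumption).
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"policy_hash": digest, "scopes": chain, "policy": policy}
```

Under these assumptions (and with an illustrative dict-shaped `command_rules`), a harness scope that sets `git push` to `allow` overrides a repo scope that set it to `request`, while keys contributed only by earlier scopes, such as a system-level `deny`, survive into the effective policy.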
As a CI pipeline, I resolve the effective policy to JSON via `resolve.py` with harness and task-domain parameters, so that downstream tools and agents receive a stable, auditable policy artifact.
Acceptance Criteria:
- `resolve.py --root <dir> --harness <name> --task-domain <name> --emit <path>` writes a JSON file containing `policy_hash`, `scopes`, and `policy`.
- `--task-instance <path>` optionally injects a per-task-instance override file as the highest-precedence scope.
- `--emit-host-rules` triggers host artifact generation inline.
- `--include-conditional` includes conditional rule payloads in the output.
- The script exits non-zero on any resolution or validation error and writes error details to stderr.
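The documented flags map naturally onto `argparse`. The sketch below assumes the flags shown in the synopsis (`--root`, `--harness`, `--task-domain`, `--emit`) are required; `resolve_fn` is a hypothetical stand-in for the actual resolution step, and the error message format is illustrative.

```python
import argparse
import json
import sys


def build_parser():
    """Argument parser mirroring the documented resolve.py flags."""
    parser = argparse.ArgumentParser(prog="resolve.py")
    parser.add_argument("--root", required=True, help="policy root directory")
    parser.add_argument("--harness", required=True)
    parser.add_argument("--task-domain", required=True)
    parser.add_argument("--emit", required=True, help="path for the resolved JSON")
    parser.add_argument("--task-instance", help="highest-precedence override file")
    parser.add_argument("--emit-host-rules", action="store_true")
    parser.add_argument("--include-conditional", action="store_true")
    return parser


def emit(result, path):
    """Write only the three documented top-level keys."""
    payload = {key: result[key] for key in ("policy_hash", "scopes", "policy")}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2, sort_keys=True)
        fh.write("\n")


def main(argv, resolve_fn):
    """Return a process exit code; resolve_fn(args) is a hypothetical resolver hook."""
    args = build_parser().parse_args(argv)  # argparse exits with 2 on bad flags
    try:
        emit(resolve_fn(args), args.emit)
    except Exception as exc:  # any resolution or validation error
        print(f"resolve.py: error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A real entry point would end with something like `sys.exit(main(sys.argv[1:], resolve_fn))`, so every failure path produces a non-zero exit and a message on stderr.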
As a policy author, I define command rules with nested `all`/`any` condition groups that gate allow/deny/request decisions on runtime predicates, so that commands like `git checkout` are only allowed when workspace safety conditions hold.
Acceptance Criteria:
- Command rules support `conditions` with nested `all`/`any` groups.
- Built-in git predicates: `git_is_worktree`, `git_clean_worktree`, `git_synced_to_upstream`.
- Each condition entry supports a `required` flag (default: `true`).
- `on_mismatch` specifies the fallback action (`allow`, `deny`, `request`) when conditions are not met.
- Rules without conditions are treated as unconditional.
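To make the rule semantics concrete, the sketch below walks a rule's condition tree against a dict of precomputed predicate results. It is illustrative only: predicates are evaluated at runtime by the Go/Rust/Zig wrappers or host hooks, not by Python. The field names `command`, `action`, and `predicate`, and the reading of `required: false` as an advisory leaf whose failure is left out of its group's decision, are assumptions; unknown predicates are treated as not holding.

```python
def eval_node(node, facts):
    """Evaluate a condition group or leaf against precomputed predicate results.

    Returns True/False, or None for a failing advisory leaf (required: false),
    which is left out of its parent group's decision (assumed semantics).
    """
    for op in ("all", "any"):
        if op in node:
            results = [eval_node(child, facts) for child in node[op]]
            counted = [r for r in results if r is not None]
            if not counted:
                return True  # empty group, or only failed advisory leaves
            return all(counted) if op == "all" else any(counted)
    # Leaf: a predicate missing from facts is treated as not holding (assumption).
    if facts.get(node["predicate"], False):
        return True
    return False if node.get("required", True) else None


def decide(rule, facts):
    """Return the rule's action, or its on_mismatch fallback if conditions fail."""
    conditions = rule.get("conditions")
    if not conditions:
        return rule["action"]  # no conditions: unconditional rule
    return rule["action"] if eval_node(conditions, facts) else rule["on_mismatch"]
```

For example, a `git checkout` rule with action `allow`, `on_mismatch: request`, and `all: [git_is_worktree, git_clean_worktree]` yields `allow` in a clean worktree and `request` once the worktree is dirty.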
As a host hook layer, I consume the wrapper bundle for cross-language evaluators (Go/Rust/Zig) to make runtime conditional decisions, so that hosts without native conditional evaluation still enforce complex policies.
Acceptance Criteria:
- Resolved output includes a `policy_wrapper` block with `schema_version`, required conditions, and normalized command entries.
- A `policy-wrapper-dispatch.manifest.json` is generated alongside the wrapper bundle, containing the dispatch script path, bundle path, and `required_conditions` list.
- The manifest's `dispatch_command` template includes `{command}` and `{cwd}` template variables.
- Evaluators respond with `allow`, `request`, or `deny`; for unmatched commands, the host falls back to `missing_policy_default`.
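A dispatch manifest and its command template might be assembled and filled in as follows. This is a sketch under assumptions: the `dispatch_script` and `bundle_path` keys and the dispatch script's `--bundle`, `--command`, and `--cwd` arguments are hypothetical; only `required_conditions`, `dispatch_command`, the `{command}`/`{cwd}` placeholders, the three verdicts, and `missing_policy_default` come from the criteria above.

```python
import re
import shlex

_PLACEHOLDER = re.compile(r"\{(command|cwd)\}")
VERDICTS = {"allow", "request", "deny"}


def build_manifest(dispatch_script, bundle_path, required_conditions):
    """Assemble a dispatch manifest (hypothetical key and argument names)."""
    template = (
        f"{shlex.quote(dispatch_script)} --bundle {shlex.quote(bundle_path)} "
        "--command {command} --cwd {cwd}"  # plain string: placeholders stay literal
    )
    return {
        "dispatch_script": dispatch_script,
        "bundle_path": bundle_path,
        "required_conditions": sorted(required_conditions),
        "dispatch_command": template,
    }


def render_dispatch(manifest, command, cwd):
    """Fill {command} and {cwd} in one pass with shell-quoted values."""
    values = {"command": shlex.quote(command), "cwd": shlex.quote(cwd)}
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], manifest["dispatch_command"])


def effective_verdict(verdict, missing_policy_default):
    """verdict is None when no policy entry matched the command."""
    if verdict is None:
        return missing_policy_default
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected evaluator verdict: {verdict!r}")
    return verdict
```

Substituting both placeholders in a single regex pass, with each value passed through `shlex.quote`, keeps a command that happens to contain the text `{cwd}` from being rewritten and keeps arbitrary command strings inside a single shell argument.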
As a platform operator, I generate host-specific policy artifacts for Codex, Cursor, Claude, and Factory-Droid from a single resolved policy, so that each harness enforces the correct rules without manual configuration.
Acceptance Criteria:
- `scripts/sync_host_rules.py --policy-json <path> --out-dir <dir>` generates:
  - `codex.rules` -- Codex `prefix_rule(...)` entries (unconditional rules only).
  - `cursor.cli-config.json` -- Cursor allow/deny shell rules.
  - `claude.settings.json` -- Claude allow/deny/ask shell rules.
  - `factory-droid.settings.json` -- Factory-Droid allow/request/deny command lists.
  - `policy-wrapper-rules.json` -- machine schema for conditional evaluators.
  - `policy-wrapper-dispatch.manifest.json` -- runtime wiring manifest.
- `--apply` writes artifacts directly to live host config file locations.
- `--json` emits structured JSON output with rule counts and output paths.
- Unconditional rules go into host fragments; conditional rules route into wrapper payloads.
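The routing rule in the last criterion can be sketched as a partition of the resolved rules, followed by per-host bucketing and counting. This assumes a flat list of rule dicts with hypothetical `command`, `action`, and `conditions` fields; the `allow`/`deny`/`ask` buckets mirror the Claude artifact description but are a simplified shape, not the exact Claude settings schema, and the count field names are hypothetical.

```python
from collections import Counter


def route_rules(rules):
    """Split resolved rules: unconditional -> host fragments, conditional -> wrapper."""
    host, wrapper = [], []
    for rule in rules:
        (wrapper if rule.get("conditions") else host).append(rule)
    return host, wrapper


def claude_like_buckets(host_rules):
    """Group unconditional rules by action; 'request' maps to an 'ask' bucket."""
    buckets = {"allow": [], "deny": [], "ask": []}
    for rule in host_rules:
        key = "ask" if rule["action"] == "request" else rule["action"]
        buckets[key].append(rule["command"])
    return buckets


def rule_counts(host_rules, wrapper_rules):
    """Counts of the kind a --json summary could report (hypothetical field names)."""
    return {
        "host_rules": dict(Counter(rule["action"] for rule in host_rules)),
        "conditional_rules": len(wrapper_rules),
    }
```

Keeping the partition as its own step means every host generator sees the same unconditional subset, and only the wrapper payload ever carries condition trees.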
As a Factory-Droid operator, I receive explicit allow/request/deny command lists so that Factory-Droid can surface user confirmation prompts for request-tier commands.
Acceptance Criteria:
- `factory-droid.settings.json` contains `commandAllowlist`, `commandRequestlist`, and `commandDenylist` keys.
- Request-action rules from the resolved policy are placed in `commandRequestlist`.
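Building the Factory-Droid lists is a direct mapping from action to the three documented keys. The sketch assumes it receives only unconditional rules (conditional ones are routed to the wrapper payload), uses hypothetical `command` and `action` rule fields, and sorts each list for diff-friendly output, which is a choice rather than a documented requirement.

```python
def factory_droid_settings(host_rules):
    """Build the documented Factory-Droid allow/request/deny lists."""
    keys = {
        "allow": "commandAllowlist",
        "request": "commandRequestlist",  # surfaced to the user for confirmation
        "deny": "commandDenylist",
    }
    settings = {key: [] for key in keys.values()}
    for rule in host_rules:
        settings[keys[rule["action"]]].append(rule["command"])  # KeyError on unknown action
    for key in settings:
        settings[key].sort()  # stable, diff-friendly output
    return settings
```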
As a policy author, I validate all policy scope files against the canonical JSON schema so that malformed policy files are caught before deployment.
Acceptance Criteria:
- `scripts/validate_policy_contract.py --root <dir>` validates all scope files against `agent-scope/policy_contract.schema.json`.
- Exit code 0 means all files pass; non-zero means at least one failure.
- The `--json` flag emits a structured JSON summary with `checked`, `missing`, and `invalid` counts.
- Governance failures list each failing file and the specific schema violations.
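The summary and exit-code logic of the validator can be sketched independently of the schema engine. Here `validate` is a callable returning violation messages; in practice it could wrap a JSON Schema library such as the third-party `jsonschema` package, as the docstring shows. File reading (including YAML parsing) is left out, `checked` is assumed to count files that were present and validated, and a missing file is assumed to count as a failure.

```python
def summarize_validation(documents, validate):
    """Summarize schema validation of already-parsed scope files.

    documents: {path: parsed document, or None if the file is missing}
    validate:  callable returning a list of violation messages (empty = valid).
    For example, with the third-party jsonschema package:
        validator = jsonschema.Draft202012Validator(schema)
        validate = lambda doc: [e.message for e in validator.iter_errors(doc)]
    """
    failures = []
    missing = invalid = 0
    for path in sorted(documents):  # deterministic report order
        doc = documents[path]
        if doc is None:
            missing += 1
            failures.append({"file": path, "errors": ["file missing"]})
            continue
        errors = list(validate(doc))
        if errors:
            invalid += 1
            failures.append({"file": path, "errors": errors})
    summary = {
        "checked": len(documents) - missing,
        "missing": missing,
        "invalid": invalid,
        "failures": failures,  # each failing file with its violations
    }
    return summary, (1 if failures else 0)
```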
As a CI pipeline, I compare the current resolved policy against a committed canonical snapshot so that unintended policy changes are detected before merge.
Acceptance Criteria:
- `scripts/generate_policy_snapshot.py --check-existing` compares the freshly resolved policy against the committed snapshot file and exits non-zero on mismatch.
- `--write-canonical` regenerates and commits all canonical snapshots.
- `--validate-canonical` validates that all canonical snapshots exist and match the current resolution.
- `--json` emits a structured result with `status`, `policy_hash`, and mismatch details.
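A drift check of this kind reduces to comparing canonical hashes and, on mismatch, reporting which top-level keys differ. The `status` values (`ok`, `drift`, `missing`), the `mismatches` field, and the canonical-JSON hashing are assumptions for illustration, not the script's documented output.

```python
import hashlib
import json


def canonical_hash(policy):
    """SHA-256 over canonical JSON (assumed serialization)."""
    blob = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def check_snapshot(fresh_policy, committed_policy):
    """Compare a freshly resolved policy with the committed snapshot.

    Returns (result, exit_code); committed_policy is None if no snapshot exists.
    """
    fresh_hash = canonical_hash(fresh_policy)
    if committed_policy is None:
        return {"status": "missing", "policy_hash": fresh_hash, "mismatches": []}, 1
    if canonical_hash(committed_policy) == fresh_hash:
        return {"status": "ok", "policy_hash": fresh_hash, "mismatches": []}, 0
    keys = sorted(set(fresh_policy) | set(committed_policy))
    mismatches = [k for k in keys if fresh_policy.get(k) != committed_policy.get(k)]
    return {"status": "drift", "policy_hash": fresh_hash, "mismatches": mismatches}, 1
```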
As a governance enforcer, I verify that all policy scope files declare an allowed version so that version drift is caught early.
Acceptance Criteria:
- `scripts/check_policy_versions.py --root <dir>` checks that every scope file has a `version` field whose value is in the `allowed_versions` list.
- Exit code 0 means all files pass; non-zero on any missing or invalid version.
- `--json` emits `allowed_versions`, `observed_versions`, and `missing_required` counts.
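The version gate can be sketched as follows, assuming scope files are already parsed into dicts. Versions are normalized with `str()` because a YAML loader may turn an unquoted `1.0` into a float; `allowed_versions` is assumed to hold strings, `observed_versions` is rendered as a count per version, and the `invalid` list is a hypothetical extra field.

```python
def check_versions(scope_files, allowed_versions):
    """scope_files: {path: parsed dict}. Returns (summary, exit_code)."""
    observed = {}
    missing, invalid = [], []
    for path in sorted(scope_files):
        version = scope_files[path].get("version")
        if version is None:
            missing.append(path)
            continue
        version = str(version)  # YAML may load an unquoted 1.0 as a float
        observed[version] = observed.get(version, 0) + 1
        if version not in allowed_versions:
            invalid.append(path)
    summary = {
        "allowed_versions": sorted(allowed_versions),
        "observed_versions": observed,
        "missing_required": len(missing),
        "invalid": invalid,  # hypothetical extra field
    }
    return summary, (1 if missing or invalid else 0)
```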
Non-Goals:
- Runtime predicate evaluation in Python (predicates are evaluated by Go/Rust/Zig wrappers or host hooks; Python only produces the bundle).
- Policy enforcement for non-agent human operators.
- Network-based policy distribution or remote policy fetch.
- GUI configuration of policy files.