The mental model behind Agents Shipgate, the deterministic merge gate for AI-generated agent capability changes — a local-first, static Tool-Use Readiness review.
For the product-level definition of a Tool-Use Readiness release gate, see
category.md. For the agent-facing
walkthrough, see AGENTS.md.
Tool-use readiness is the static check that an agent's tool surface is ready for promotion. It is not "did the tool call succeed" (a runtime concern) or "did the model pick the right tool" (an eval concern). It is the question a release reviewer answers at PR time:
Given the tool surface declared in this PR, do we have explicit approval policies, scope coverage, idempotency evidence, and review readiness for every action — before promotion?
Tool-use readiness has seven dimensions. agents-shipgate produces findings against each one.
| Dimension | What it asks | Evidence in the manifest |
|---|---|---|
| Inventory | What tools can the agent call? | A complete, named list — no wildcards, no "whatever this MCP server returns" |
| Schema | What inputs does each tool accept? | Strict JSON schema — additionalProperties: false, complete required, bounded numeric fields |
| Auth | What scopes does each tool need? | Declared per-tool or in permissions.scopes — narrower than the service account's actual scopes |
| Approval | Who reviews destructive actions before they fire? | policies.require_approval_for_tools: [...] for every write/destructive/financial action |
| Side effects | What does this tool change in the world? | Risk tags on the tool: write, destructive, external_write, financial_action, customer_communication |
| Idempotency | Can it be retried safely? | Idempotency key in the schema, documented retry policy, or explicit "do not retry" |
| Blast radius | If this tool fires unexpectedly, how bad is it? | Owner declared, prohibited actions enumerated, scope of resources bounded |
The tool surface is the set of named, schemaed actions an agent can invoke at runtime. It is declared via:
- Model Context Protocol (MCP) exports
- OpenAPI specs
- Framework-specific code (OpenAI Agents SDK Python, Google ADK, LangChain/LangGraph, CrewAI)
- API-specific artifacts (Anthropic Messages API tools.json, OpenAI Agents API function schemas)
The tool surface is a release artifact in the same sense as a service deployment's binary or an API contract: it's a checked-in, diff-able statement of what the agent can do, and it should be reviewed on every PR.
agents-shipgate is manifest-first: the canonical claim about an
agent's surface lives in a single shipgate.yaml checked into the
repo. Every tool source the manifest references is reviewed at scan
time. There is one place to look for "what does this agent ship with."
This is intentional. Implicit configurations (e.g. "use whatever the MCP registry returns") fail the inventory dimension above. The manifest is what makes the release gate reviewable.
agents-shipgate is static. It does not run the agent, invoke the model, call MCP servers, or make any network calls by default. Every finding is derived from the artifact diff alone.
Static analysis covers the Tool-Use Readiness release slice. Dynamic concerns — behavior under unusual inputs, runtime tool routing, latency, hallucination — belong in evals, observability, and runtime guardrails. agents-shipgate is additive to those, not a replacement.
| Guard | When it runs | What it catches |
|---|---|---|
| Tests | CI on every PR | Code paths in the agent's code |
| Evals | On a schedule or per release | Model behavior on curated inputs |
| agents-shipgate | CI on every PR | Tool surface, scopes, policies, prompt/surface alignment |
| Runtime guardrails / gateway | At call time | Per-call policy enforcement |
| Observability | Runtime | What actually happened in production |
Each catches something the others can't. Removing any of them is a regression.
category.md— the product-level "what is an agent release gate"checks.md— every check the scanner runsmanifest-v0.1.md— full manifest schematrust-model.md— local-only guarantees and disclosure processglossary.md— category vocabulary