Concepts

The mental model behind Agents Shipgate, the deterministic merge gate for AI-generated agent capability changes — a local-first, static Tool-Use Readiness review.

For the product-level definition of a Tool-Use Readiness release gate, see category.md. For the agent-facing walkthrough, see AGENTS.md.

Tool-use readiness

Tool-use readiness is the static check that an agent's tool surface is ready for promotion. It is not "did the tool call succeed" (a runtime concern) or "did the model pick the right tool" (an eval concern). It is the question a release reviewer answers at PR time:

Given the tool surface declared in this PR, do we have explicit approval policies, scope coverage, idempotency evidence, and review readiness for every action — before promotion?

Tool-use readiness has seven dimensions. agents-shipgate produces findings against each one.

| Dimension | What it asks | Evidence in the manifest |
| --- | --- | --- |
| Inventory | What tools can the agent call? | A complete, named list: no wildcards, no "whatever this MCP server returns" |
| Schema | What inputs does each tool accept? | Strict JSON schema: `additionalProperties: false`, complete `required`, bounded numeric fields |
| Auth | What scopes does each tool need? | Declared per-tool or in `permissions.scopes`, and narrower than the service account's actual scopes |
| Approval | Who reviews destructive actions before they fire? | `policies.require_approval_for_tools: [...]` for every write/destructive/financial action |
| Side effects | What does this tool change in the world? | Risk tags on the tool: `write`, `destructive`, `external_write`, `financial_action`, `customer_communication` |
| Idempotency | Can it be retried safely? | Idempotency key in the schema, documented retry policy, or explicit "do not retry" |
| Blast radius | If this tool fires unexpectedly, how bad is it? | Owner declared, prohibited actions enumerated, scope of resources bounded |
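
To make the Schema row concrete, here is a minimal Python sketch of what a strictness check over one tool's input schema could look like. It is an illustration only, not agents-shipgate's implementation: the function name `schema_findings`, the example tool `issue_refund`, and the finding wording are all hypothetical. It also assumes that "complete `required`" means every declared property is listed in `required`, and that "bounded" means both a lower and an upper limit are present.

```python
from typing import Any

NUMERIC_TYPES = {"integer", "number"}


def schema_findings(tool_name: str, schema: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one tool's input schema (hypothetical helper)."""
    findings = []
    properties = schema.get("properties", {})

    # An open schema accepts arbitrary extra keys, so the reviewed
    # inputs are not the full set of inputs the tool can receive.
    if schema.get("additionalProperties") is not False:
        findings.append(f"{tool_name}: additionalProperties is not false")

    # Assumption: "complete required" means every declared property is required.
    missing = sorted(set(properties) - set(schema.get("required", [])))
    if missing:
        findings.append(f"{tool_name}: not listed in required: {', '.join(missing)}")

    # Assumption: a numeric field is bounded only if it has both limits.
    for name, spec in sorted(properties.items()):
        if spec.get("type") in NUMERIC_TYPES:
            has_min = "minimum" in spec or "exclusiveMinimum" in spec
            has_max = "maximum" in spec or "exclusiveMaximum" in spec
            if not (has_min and has_max):
                findings.append(f"{tool_name}: numeric field '{name}' is unbounded")
    return findings


# Hypothetical tool schema with three problems: open object,
# incomplete `required`, and an amount with no upper limit.
refund_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount_cents": {"type": "integer", "minimum": 1},
    },
    "required": ["order_id"],
}

for finding in schema_findings("issue_refund", refund_schema):
    print(finding)
```

Each finding here depends only on the declared schema, which is why a check like this can run at PR time without calling the tool.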

Tool surface

The tool surface is the set of named, schemaed actions an agent can invoke at runtime. It is declared via:

- Model Context Protocol (MCP) exports
- OpenAPI specs
- Framework-specific code (OpenAI Agents SDK Python, Google ADK, LangChain/LangGraph, CrewAI)
- API-specific artifacts (Anthropic Messages API tools.json, OpenAI Agents API function schemas)

The tool surface is a release artifact in the same sense as a service deployment's binary or an API contract: it's a checked-in, diff-able statement of what the agent can do, and it should be reviewed on every PR.
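
Treating the tool surface as a diff-able artifact can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not how agents-shipgate computes its diff: it assumes each surface has already been normalized into a dict keyed by tool name, and the names `fingerprint`, `diff_surface`, and the example tools are hypothetical.

```python
import json


def fingerprint(tool: dict) -> str:
    """Canonical JSON, so that key order alone never counts as a change."""
    return json.dumps(tool, sort_keys=True)


def diff_surface(base: dict[str, dict], head: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two tool surfaces keyed by tool name (hypothetical shape)."""
    return {
        "added": sorted(head.keys() - base.keys()),
        "removed": sorted(base.keys() - head.keys()),
        "changed": sorted(
            name
            for name in base.keys() & head.keys()
            if fingerprint(base[name]) != fingerprint(head[name])
        ),
    }


# Surface on the base branch.
base = {
    "lookup_order": {"risk": [], "input": {"order_id": "string"}},
    "issue_refund": {"risk": ["financial_action"], "input": {"order_id": "string"}},
}
# Surface in the PR: one tool reordered, one widened, one new.
head = {
    "lookup_order": {"input": {"order_id": "string"}, "risk": []},
    "issue_refund": {
        "risk": ["financial_action"],
        "input": {"order_id": "string", "amount_cents": "integer"},
    },
    "delete_customer": {"risk": ["destructive"], "input": {"customer_id": "string"}},
}

print(diff_surface(base, head))
# → {'added': ['delete_customer'], 'removed': [], 'changed': ['issue_refund']}
```

A reviewer reading this diff sees a new destructive tool and a widened financial tool, while the reordered `lookup_order` entry produces no noise.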

Manifest-first

agents-shipgate is manifest-first: the canonical claim about an agent's surface lives in a single shipgate.yaml checked into the repo. Every tool source the manifest references is reviewed at scan time. There is one place to look for "what does this agent ship with."

This is intentional. Implicit configurations (e.g. "use whatever the MCP registry returns") fail the inventory dimension above. The manifest is what makes the release gate reviewable.
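
As a rough illustration of why an explicit manifest is reviewable, the Python sketch below checks a manifest-shaped dict for wildcard inventory entries and for risky tools that lack an approval policy. Only the `policies.require_approval_for_tools` key and the risk tag names come from this page; the `tools` list layout, the `manifest_findings` function, and the choice of which tags require approval are assumptions made for the example, not the real shipgate.yaml schema.

```python
# Assumption: these are the tags that require approval, following the
# "write/destructive/financial action" wording of the Approval dimension.
APPROVAL_TAGS = {"write", "destructive", "financial_action"}


def manifest_findings(manifest: dict) -> list[str]:
    """Flag wildcard tools and risky tools without an approval policy."""
    findings = []
    policies = manifest.get("policies", {})
    approved = set(policies.get("require_approval_for_tools", []))

    for tool in manifest.get("tools", []):
        name = tool["name"]
        # A wildcard stands for an unknown set of tools, so there is
        # nothing concrete to review.
        if "*" in name:
            findings.append(f"inventory: '{name}' is a wildcard, not a named tool")
            continue
        if APPROVAL_TAGS & set(tool.get("risk", [])) and name not in approved:
            findings.append(f"approval: '{name}' has no approval policy")
    return findings


# Hypothetical manifest content, already parsed into a dict.
manifest = {
    "tools": [
        {"name": "lookup_order", "risk": []},
        {"name": "issue_refund", "risk": ["financial_action"]},
        {"name": "delete_customer", "risk": ["destructive"]},
        {"name": "crm.*", "risk": ["write"]},
    ],
    "policies": {"require_approval_for_tools": ["issue_refund"]},
}

for finding in manifest_findings(manifest):
    print(finding)
```

Because everything the check needs is in one parsed document, the same input always yields the same findings, which is the property a merge gate depends on.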

Static vs dynamic

agents-shipgate is static. It does not run the agent, invoke the model, call MCP servers, or make any network calls by default. Every finding is derived from the artifact diff alone.

Static analysis covers the Tool-Use Readiness release slice. Dynamic concerns — behavior under unusual inputs, runtime tool routing, latency, hallucination — belong in evals, observability, and runtime guardrails. agents-shipgate is additive to those, not a replacement.

Where this fits in the wider stack

| Guard | When it runs | What it catches |
| --- | --- | --- |
| Tests | CI on every PR | Code paths in the agent's code |
| Evals | On a schedule or per release | Model behavior on curated inputs |
| agents-shipgate | CI on every PR | Tool surface, scopes, policies, prompt/surface alignment |
| Runtime guardrails / gateway | At call time | Per-call policy enforcement |
| Observability | Runtime | What actually happened in production |

Each catches something the others can't. Removing any of them is a regression.
