Concepts

Vocabulary used throughout the rest of the wiki. Every term has a stable meaning in the JSON report and CLI output.

Manifest

A shipgate.yaml file. The single source of truth for one agent release. Every scan reads exactly one manifest and produces exactly one report. Schema is versioned (version: "0.1") and validated strictly — typos fail the scan with a suggested fix. Full grammar in Manifest Reference.

A manifest declares:

project / agent identity — names used in run IDs, fingerprints, finding evidence
declared purpose — short prose used to detect scope contradictions (e.g. read-only purpose + DELETE tool → finding)
environment.target — local | staging | production_like | production. Production targets fire stricter inventory checks
tool_sources — pointers to the supported tool surfaces: MCP exports, OpenAPI specs, OpenAI Agents SDK, Anthropic Messages API, Google ADK, LangChain/LangGraph, CrewAI, OpenAI API, Codex plugin packages, and n8n workflows
permissions / policies / risk_overrides — declared expectations against which checks fire
ci — advisory/strict mode and which severities fail
checks.ignore — explicit suppressions with required reasons
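
To illustrate what "validated strictly, with a suggested fix" could look like, here is a minimal sketch of top-level key validation. The key list is taken from the bullets above (the key for the declared purpose is omitted because its exact name is not stated here), and `validate_top_level_keys` is a hypothetical helper, not the Shipgate validator.

```python
import difflib

# Top-level keys named on this page; the real schema may define more.
KNOWN_KEYS = [
    "version", "project", "agent", "environment", "tool_sources",
    "permissions", "policies", "risk_overrides", "ci", "checks",
]

def validate_top_level_keys(manifest: dict) -> list[str]:
    """Return one error per unknown key, with a suggestion when one is close."""
    errors = []
    for key in manifest:
        if key in KNOWN_KEYS:
            continue
        close = difflib.get_close_matches(key, KNOWN_KEYS, n=1)
        hint = f" (did you mean '{close[0]}'?)" if close else ""
        errors.append(f"unknown key '{key}'{hint}")
    return errors

# A manifest with a typo: "tool_source" instead of "tool_sources".
print(validate_top_level_keys({"version": "0.1", "tool_source": [], "ci": {}}))
```

A strict validator of this kind rejects the manifest instead of silently ignoring the misspelled key, which is why a typo surfaces as a config error and not as a scan with missing tools.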

Tool surface

The complete set of tools an agent can call, as resolved by the scan. Built by:

Loading every tool_sources[] entry and any openai_api artifacts
Flattening into a single list keyed by tool name
Resolving duplicates by source priority (openai_api > openapi > mcp > sdk_function); losers emit a Duplicate tool name warning
Enriching each tool with risk hints

The flattened list is what the check catalog operates on. It's also surfaced verbatim in report.json under tool_inventory.
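
The flatten-and-resolve steps above can be sketched as follows. The tool dictionaries (`name`, `source`) and the `flatten_tools` helper are assumptions made for illustration; only the priority order and the idea of a duplicate warning come from this page.

```python
# Priority order from the text: earlier entries win on a name clash.
SOURCE_PRIORITY = ["openai_api", "openapi", "mcp", "sdk_function"]

def flatten_tools(tools: list[dict]) -> tuple[dict[str, dict], list[str]]:
    """Merge tools by name; return (surface keyed by name, duplicate warnings)."""
    rank = {source: i for i, source in enumerate(SOURCE_PRIORITY)}
    surface: dict[str, dict] = {}
    warnings: list[str] = []
    for tool in tools:
        name = tool["name"]
        current = surface.get(name)
        if current is None:
            surface[name] = tool
            continue
        # Keep whichever tool comes from the higher-priority source.
        winner, loser = sorted((current, tool), key=lambda t: rank[t["source"]])
        surface[name] = winner
        warnings.append(
            f"Duplicate tool name '{name}': kept {winner['source']}, "
            f"dropped {loser['source']}"
        )
    return surface, warnings

surface, warnings = flatten_tools([
    {"name": "refund", "source": "mcp"},
    {"name": "refund", "source": "openapi"},
    {"name": "lookup", "source": "sdk_function"},
])
print(sorted(surface), warnings)
```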

Risk hint

A (tag, source, confidence, evidence) record attached to a tool. Tags include read_only, write, destructive, external_write, financial_action, customer_communication, sensitive_data_access, infrastructure_change, code_execution. Sources include openapi_method, mcp_annotation, sdk_keyword, manual (from risk_overrides). Confidence is low | medium | high.

Hints are inputs to checks, not findings themselves. Most checks demand min_confidence="medium" to fire. You can promote, demote, or remove hints per tool via risk_overrides.tools.
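
As a rough illustration of how a check might consume hints with a confidence threshold, the sketch below models the record as a dataclass. The `RiskHint` class and `has_hint` helper are hypothetical; only the four fields, the tag names, and the low/medium/high ordering come from the description above.

```python
from dataclasses import dataclass

CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}

@dataclass(frozen=True)
class RiskHint:
    tag: str
    source: str
    confidence: str
    evidence: str

def has_hint(hints: list[RiskHint], tag: str, min_confidence: str = "medium") -> bool:
    """True if any hint carries `tag` at or above `min_confidence`."""
    threshold = CONFIDENCE_ORDER[min_confidence]
    return any(
        h.tag == tag and CONFIDENCE_ORDER[h.confidence] >= threshold
        for h in hints
    )

hints = [
    RiskHint("destructive", "openapi_method", "high", "DELETE /users/{id}"),
    RiskHint("financial_action", "sdk_keyword", "low", "token 'refund' in name"),
]
print(has_hint(hints, "destructive"), has_hint(hints, "financial_action"))
```

Under this model a low-confidence keyword match stays visible in the inventory but does not trigger a check that requires medium confidence.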

Internally hints are produced by core/risk_hints.py:_add_automatic_hints plus your manual overrides. The keyword classifier is fully tokenized — "deploy" matches the standalone token deploy but not the substring inside deployments.
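
The difference between token matching and substring matching can be shown with a small sketch. This is not the code in core/risk_hints.py; the splitting rule (lowercase, split on non-alphanumeric characters) is an assumption, and the real classifier may tokenize differently, for example on camelCase boundaries.

```python
import re

def tokens(name: str) -> set[str]:
    """Split a tool name into lowercase tokens on non-alphanumeric characters."""
    return {t for t in re.split(r"[^a-z0-9]+", name.lower()) if t}

def matches_keyword(name: str, keyword: str) -> bool:
    """Match only whole tokens, never substrings inside a longer token."""
    return keyword in tokens(name)

print(matches_keyword("deploy_service", "deploy"))    # whole token
print(matches_keyword("list_deployments", "deploy"))  # substring only
```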

Check

A pure function (ScanContext) -> list[Finding]. The 80+ built-in checks are listed in the Check Catalog and live under src/agents_shipgate/checks/. Each has:

a stable check ID (e.g. SHIP-POLICY-APPROVAL-MISSING)
a default severity (critical | high | medium | low | info)
a category — one of ~19, including inventory, schema, auth, scope, policy, side_effects, evidence, security, manifest, baseline, documentation, action_surface, verify, plus per-source families (api, adk, langchain, crewai, codex_plugin, n8n)

You can override the severity via checks.severity_overrides and add custom checks via Plugin Authoring.
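
To make the "pure function" shape concrete, here is a minimal sketch of a check. The `Tool`, `ScanContext`, and `Finding` classes below are simplified stand-ins invented for this example, and the check ID is hypothetical; the real types and the plugin interface are described in Plugin Authoring.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    tags: frozenset[str] = frozenset()

@dataclass(frozen=True)
class ScanContext:
    # Stand-in for "the declared purpose says read-only".
    read_only_purpose: bool
    tools: tuple[Tool, ...]

@dataclass(frozen=True)
class Finding:
    check_id: str
    severity: str
    tool_name: str
    title: str

def check_destructive_in_read_only_agent(ctx: ScanContext) -> list[Finding]:
    """Flag destructive tools on an agent whose purpose is read-only."""
    if not ctx.read_only_purpose:
        return []
    return [
        Finding(
            check_id="EXAMPLE-SCOPE-CONTRADICTION",  # hypothetical ID
            severity="high",
            tool_name=tool.name,
            title="Destructive tool on a read-only agent",
        )
        for tool in ctx.tools
        if "destructive" in tool.tags
    ]
```

Because the function reads only its argument and returns a new list, the same context always yields the same findings, which is what makes fingerprints and baselines reproducible.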

Finding

A single scan output. Every finding has:

id — the fingerprint plus a content-derived discriminator on collision
fingerprint — a stable sha256(check_id | tool_name | canonical evidence)[:16], prefixed fp_
check_id, severity, category, title
tool_name / tool_id (or agent_id for agent-level findings)
evidence — structured payload describing why the check fired
recommendation — actionable next step
suppressed, suppression_reason
baseline_status — new, matched, or resolved when a baseline is applied

Fingerprints are content-addressed and stable across runs. They are the identity primitive used by suppressions and baselines.
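
A fingerprint of the shape described above can be sketched as follows. The canonicalization (JSON with sorted keys) and the `|` separator are assumptions for illustration; the actual v1 algorithm may canonicalize evidence differently, so values produced here should not be expected to match real reports.

```python
import hashlib
import json

def fingerprint(check_id: str, tool_name: str, evidence: dict) -> str:
    """Content-addressed ID: fp_ + first 16 hex chars of a SHA-256 digest."""
    # Sorting keys makes the result independent of dict insertion order.
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    payload = f"{check_id}|{tool_name}|{canonical}".encode("utf-8")
    return "fp_" + hashlib.sha256(payload).hexdigest()[:16]

a = fingerprint("SHIP-POLICY-APPROVAL-MISSING", "refund", {"method": "POST", "path": "/refunds"})
b = fingerprint("SHIP-POLICY-APPROVAL-MISSING", "refund", {"path": "/refunds", "method": "POST"})
print(a == b)
```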

Release decision

The release gate. report.json.release_decision.decision is one of blocked, review_required, insufficient_evidence, or passed, derived from the active findings, declared policies, and any baseline. It is baseline-aware — a baseline-matched critical lands in review_items (accepted debt), not blockers. Read this field for gating, not the legacy summary.status (kept baseline-blind for v0.7 callers). Treat unknown future enum values as review_required.
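
A downstream script that gates on this field might look like the sketch below. It assumes only the `release_decision.decision` path named above; treating a missing field the same as an unknown value is an additional assumption of this example.

```python
import json

KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}

def read_decision(report_text: str) -> str:
    """Return the release decision, mapping unknown values to review_required."""
    report = json.loads(report_text)
    decision = report.get("release_decision", {}).get("decision")
    # Unknown future enum values (and, by assumption, a missing field)
    # fall back to the conservative "review_required".
    return decision if decision in KNOWN_DECISIONS else "review_required"

print(read_decision('{"release_decision": {"decision": "passed"}}'))
print(read_decision('{"release_decision": {"decision": "some_future_value"}}'))
```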

Capability change

The diff-derived delta in what an agent can do — tools, scopes, schemas, or policies added, modified, or removed between a base and a head ref. The verifier computes it so a reviewer sees exactly how a PR changed the agent's reach, not just the absolute surface.
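
The idea of a delta between two surfaces can be illustrated with a simple set comparison. This sketch covers tools only and compares whole tool definitions for equality; the `capability_change` function and its input shape are assumptions, not the verifier's actual algorithm.

```python
def capability_change(base: dict[str, dict], head: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two tool surfaces keyed by tool name."""
    added = sorted(head.keys() - base.keys())
    removed = sorted(base.keys() - head.keys())
    modified = sorted(
        name for name in base.keys() & head.keys() if base[name] != head[name]
    )
    return {"added": added, "modified": modified, "removed": removed}

base = {"get_user": {"scopes": ["read"]}, "list_orders": {"scopes": ["read"]}}
head = {"get_user": {"scopes": ["read", "write"]}, "delete_user": {"scopes": ["admin"]}}
print(capability_change(base, head))
```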

Merge verdict & the verifier

agents-shipgate verify --base <ref> --head <ref> runs the gate on a PR diff and returns a merge verdict — a deterministic projection of release_decision.decision for the ongoing-PR flow. It writes agents-shipgate-reports/verifier.json with the trigger and base-scan orchestration status; that file is not a second verdict — the gate remains report.json.release_decision.decision. Trust-root edits (weakening policies, baselines, waivers, CI, or agent instructions) surface as SHIP-VERIFY-* findings routed to human review, so a change can't quietly disable its own gate.

Suppression

A manifest entry under checks.ignore that marks matching findings with suppressed: true and a required reason. Suppressed findings still appear in the JSON report (audit trail) but do not count toward severity totals or trigger CI failure. Stale suppressions (referencing missing checks or tools) emit SHIP-MANIFEST-STALE-SUPPRESSION.

Baseline

A snapshot of currently-active findings stored at .agents-shipgate/baseline.json. After saving a baseline, future scans tag each finding as matched (already in baseline), new, or resolved. Strict CI with --baseline fails only on new findings. See Baseline Workflow for the full pattern.

The fingerprint algorithm is v1 and intentionally excludes severity overrides, baseline status, source paths, timestamps, and the default_severity audit-evidence key — so a baseline survives manifest tweaks that don't change actual finding identity.
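
Since fingerprints are the identity primitive, baseline tagging reduces to set membership. The following sketch uses a hypothetical `apply_baseline` helper over plain sets of fingerprints; the real baseline file carries more than fingerprints.

```python
def apply_baseline(baseline_fps: set[str], current_fps: set[str]) -> dict[str, str]:
    """Map each fingerprint to its baseline status."""
    status = {
        fp: ("matched" if fp in baseline_fps else "new") for fp in current_fps
    }
    # Findings in the baseline that no longer appear are resolved.
    status.update({fp: "resolved" for fp in baseline_fps - current_fps})
    return status

print(apply_baseline({"fp_aaa", "fp_bbb"}, {"fp_bbb", "fp_ccc"}))
```

Because a severity override does not feed into the fingerprint, changing one leaves a finding `matched` and does not turn it into `new`.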

CI mode

advisory (default) or strict. Strict mode exits with code 20 on any unsuppressed finding whose severity is in ci.fail_on (default [critical]; configurable). Advisory mode never fails on findings. See CI Recipes for usage patterns.
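
The gating rule can be summarized in a few lines. This is a simplified sketch: the `gate_exit_code` helper and the finding dictionaries are assumptions, and baseline handling (failing only on new findings) is left out.

```python
def gate_exit_code(
    findings: list[dict],
    mode: str = "advisory",
    fail_on: tuple[str, ...] = ("critical",),
) -> int:
    """Return 20 when strict mode sees an unsuppressed finding in fail_on, else 0."""
    if mode != "strict":
        return 0
    failing = [
        f for f in findings
        if not f.get("suppressed", False) and f["severity"] in fail_on
    ]
    return 20 if failing else 0

findings = [
    {"severity": "critical", "suppressed": True},
    {"severity": "high", "suppressed": False},
]
print(gate_exit_code(findings, mode="strict"))
print(gate_exit_code(findings, mode="strict", fail_on=("critical", "high")))
```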

Trust model

Shipgate runs as a static analyzer. By default it does not import user code, run agents, call tools, invoke LLMs, connect to MCP servers, make network calls, or collect telemetry. The only exception to this guarantee is third-party check plugins, which are opt-in: they are enabled with AGENTS_SHIPGATE_ENABLE_PLUGINS=1 and can be disabled per scan with --no-plugins. See Trust Model for details.

Exit codes

| Code | Meaning |
| --- | --- |
| `0` | Scan completed; advisory mode or strict-mode pass |
| `2` | Manifest config error (typo, missing field) |
| `3` | Input parse error (malformed YAML/JSON, path traversal blocked, file too large) |
| `4` | Other Agents Shipgate error |
| `6` | Baseline integrity failure |
| `20` | Strict-mode gate failure (findings exist at `ci.fail_on` severity) |

A nonzero exit is always either a real finding (20) or a real error (2/3/4/6). Check the stderr message to disambiguate. See Troubleshooting.
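
A CI wrapper that wants to distinguish a gate failure from a tool error could classify the exit code as in the sketch below. The codes and meanings are copied from the table above; `classify_exit` is a hypothetical helper, and treating any unlisted nonzero code as an error is an assumption.

```python
EXIT_MEANINGS = {
    0: "Scan completed; advisory mode or strict-mode pass",
    2: "Manifest config error",
    3: "Input parse error",
    4: "Other Agents Shipgate error",
    6: "Baseline integrity failure",
    20: "Strict-mode gate failure",
}

def classify_exit(code: int) -> str:
    """Bucket an exit code into ok, gate_failure, or error."""
    if code == 0:
        return "ok"
    if code == 20:
        return "gate_failure"
    # Listed error codes and, by assumption, any unlisted nonzero code.
    return "error"

for code in (0, 3, 20):
    print(code, classify_exit(code), "-", EXIT_MEANINGS[code])
```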

Agents Shipgate · Apache-2.0 · maintained by Three Moons Lab · Report a false positive
