Architecture

This page is for contributors and integrators who want to understand the internals. It reflects agents-shipgate v0.10.0 (report schema v0.22). Internal module paths are not a stable contract and may change; only the JSON wire contract and the CLI surface listed in STABILITY.md are promised across 0.x.

Two execution shapes share one engine:

scan — the one-shot path: load → enrich → check → decide → report.
verify — the verify-first merge gate for a PR. It evaluates the published trigger catalog against the local diff, optionally scans a locally available base tree into an isolated temp dir, then runs exactly one authoritative head scan. It projects a merge_verdict from the head scan's release_decision and writes verifier.json as a trigger/base orchestration artifact — not a second verdict. report.json.release_decision.decision remains the only release gate. verify never fetches; the base ref must be made available beforehand.

High-level data flow

(verify only) trigger eval over diff  +  optional isolated base scan
    │
    ▼
shipgate.yaml  +  tool sources (10 adapter families)
    │
    ▼
┌─────────────────────┐
│ config loader       │  YAML → manifest (Pydantic, extra="forbid", strict)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ inputs/ + adapters  │  MCP · OpenAPI · OpenAI Agents SDK · Anthropic ·
│                     │  Google ADK · LangChain · CrewAI · OpenAI API ·
│                     │  Codex plugin · n8n  → list[LoadedToolSource]
│                     │  (+ third-party adapters via entry points)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ flatten + dedup     │  priority-based merge → list[Tool]
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/risk_hints     │  HTTP method, MCP annotations, tokenized keyword
│                     │  classifier, manual overrides → Tools w/ risk_hints
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ action surface      │  action_surface_facts (one fact per loaded tool);
│                     │  optional action_surface_diff vs base via --diff-from
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ checks/registry     │  run_checks (80+ built-ins + opt-in plugins/policy packs)
│                     │  → list[Finding] (incl. SHIP-ACTION-* / SHIP-VERIFY-*)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/findings       │  fingerprint + collision discriminator,
│                     │  severity overrides, suppressions
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/baseline       │  apply_baseline (matched / new / resolved)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ release_decision    │  baseline-aware gate → decision ∈
│                     │  {blocked, review_required,
│                     │   insufficient_evidence, passed}
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ report + packet     │  ReadinessReport → report.{md,json,sarif}
│                     │  + Release Evidence Packet packet.{md,json,html,pdf}
│                     │  + GitHub step summary
│ ci/exit_policy      │  exit code → 0 / 2 / 3 / 4 / 20
│ (verify) verifier   │  project merge_verdict, write verifier.json + pr-comment.md
└─────────────────────┘
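The baseline stage's matched / new / resolved split can be pictured as set arithmetic over finding fingerprints. The following is a minimal sketch that assumes findings are compared purely by fingerprint string. `split_against_baseline` and `BaselineSplit` are hypothetical names, and the real `apply_baseline` also carries full finding records and the legacy pre-v0.18 fingerprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineSplit:
    """Hypothetical result shape; not the real apply_baseline return type."""

    matched: frozenset  # fingerprints present now and in the baseline
    new: frozenset  # fingerprints present now but absent from the baseline
    resolved: frozenset  # baseline fingerprints that no longer fire


def split_against_baseline(current: set, baseline: set) -> BaselineSplit:
    # Plain set arithmetic over fingerprint strings.
    return BaselineSplit(
        matched=frozenset(current & baseline),
        new=frozenset(current - baseline),
        resolved=frozenset(baseline - current),
    )


split = split_against_baseline({"fp_a", "fp_b"}, {"fp_b", "fp_c"})
print(sorted(split.new), sorted(split.matched), sorted(split.resolved))
# → ['fp_a'] ['fp_b'] ['fp_c']
```

Only `new` findings are candidates to block; `matched` ones were accepted earlier, which is why the release decision downstream is baseline-aware.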

Module layout

Illustrative (internal paths are not a stable contract):

src/agents_shipgate/
├── __main__.py          # python -m agents_shipgate entry point
├── triggers.py          # trigger-catalog evaluation (the one git-shelling module)
├── fixtures.py          # bundled fixture resolution
├── cli/
│   ├── main.py          # Typer app + per-command registrars (_register_*.py)
│   ├── scan/            # scan orchestration: load → enrich → check → decide → report
│   ├── verify/          # verify merge-gate orchestrator + cli/verify/git.py probe
│   ├── trigger.py       # `trigger` subcommand
│   ├── bootstrap.py     # detect → init → scan → apply-patches chain
│   ├── detect.py        # workspace classifier
│   ├── discovery/       # workspace scan, manifest template, agent-instruction renderers
│   ├── apply_patches.py # file-grouped, containment-checked patch applier
│   ├── evidence_packet.py / explain_finding.py / findings.py / scenario.py
│   └── self_check.py / diagnostics.py / agent_mode.py
├── config/              # YAML loader + typo suggester
├── schemas/             # PUBLIC wire-contract models (the stable API surface)
│   ├── manifest/        # manifest v0.1 Pydantic models (strict, extra="forbid")
│   ├── report.py        # ReadinessReport (report_schema_version "0.22")
│   ├── packet.py        # EvidencePacket (packet_schema_version "0.6")
│   ├── baseline.py      # baseline v0.5
│   ├── verifier.py / verification.py   # verifier.json + VerificationContext
│   ├── capability_change.py / surfaces.py   # capability delta + tool/action surfaces
│   └── contract.py      # `contract --json` payload shape
├── inputs/              # one module per adapter family + protocol.py / adapter_validation.py
│   ├── mcp.py · openapi.py · openai_sdk_static.py · openai_api.py · anthropic_api.py
│   ├── google_adk.py · langchain.py · crewai.py · codex_plugin.py · n8n/
│   ├── policy_packs.py · validation.py · traces.py
│   └── common.py        # resolve_input_path, load_structured_file, schema_to_parameters
├── core/
│   ├── context.py       # ScanContext + VerificationContext (frozen)
│   ├── risk_hints.py    # tokenized keyword classifier + manual overrides
│   ├── findings/        # fingerprinting, suppressions, release-decision gate
│   ├── lenses/          # reviewer/agent/action-surface projections (incl. diffs)
│   ├── baseline.py · baseline_audit.py   # save/load/apply + tamper-evident audit log
│   ├── severity_overrides.py · dynamic_defaults.py · privacy.py · globbing.py
│   └── errors.py · logging.py
├── checks/
│   ├── registry.py      # BUILTIN_CHECKS list + plugin/adapter loader
│   ├── base.py          # tool_finding / agent_finding helpers (set provenance_kind)
│   ├── inventory · documentation · schema · auth · manifest_scope ·
│   │   manifest_consistency · policy · side_effects · evidence · security
│   ├── api.py           # OpenAI API checks
│   ├── adk · langchain · crewai · codex_plugin · n8n   # per-framework checks
│   ├── verify*.py       # SHIP-VERIFY-* trust-root checks (policy, ci_gate,
│   │                    # baseline/waiver, agent instructions, trigger drift)
│   └── baseline_integrity.py · plugin_validation.py
├── report/              # json_report.py · markdown.py · sarif.py · summary_text.py
├── packet/              # builder.py + md/html/pdf/json + evidence_matrix.py
└── ci/                  # exit_policy.py (exit codes) + github_summary.py

Key types

Public wire-contract models (Finding, ReadinessReport, CheckMetadata, EvidencePacket, etc.) live under agents_shipgate.schemas.* — that is the stable consumer API. Internal scan/domain containers live under agents_shipgate.core.* and are not a stable import surface.

`ScanContext` (`core/context.py`)

Frozen value object passed to every check function. It holds the loaded manifest, the agent, the flattened and enriched tools, the config path, and the loaded API/framework artifacts. Checks must not mutate it. A sibling VerificationContext (populated only by verify or scan --changed-files) carries the diff/base signals that the SHIP-VERIFY-* trust-root checks read; a plain scan leaves it absent, so those checks emit nothing.
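The frozen-context contract and the "absent on a plain scan" behaviour can be sketched with frozen dataclasses. This is an illustration under stated assumptions: the field names are hypothetical stand-ins for the real contents, and the sketch nests the verification context as an optional field, whereas the real code may pass it alongside `ScanContext` instead.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class VerificationContext:
    changed_files: tuple[str, ...] = ()  # hypothetical diff signal


@dataclass(frozen=True)
class ScanContext:
    tool_names: tuple[str, ...] = ()  # hypothetical subset of the real contents
    verification: Optional[VerificationContext] = None  # absent on a plain scan


def verify_style_check(ctx: ScanContext) -> list[str]:
    """A SHIP-VERIFY-*-style check: returns nothing without verify signals."""
    if ctx.verification is None:
        return []
    return [f"changed: {path}" for path in ctx.verification.changed_files]


plain = ScanContext(tool_names=("send_email",))
pr = ScanContext(verification=VerificationContext(changed_files=("shipgate.yaml",)))
print(verify_style_check(plain))  # → []
print(verify_style_check(pr))  # → ['changed: shipgate.yaml']

try:
    plain.tool_names = ()  # checks must not mutate the context
except FrozenInstanceError:
    print("ScanContext is frozen")
```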

`Tool` (`schemas`)

Carries the union of fields a check might inspect: name, description, source_type, schemas, parameters, annotations, auth scopes, risk_hints, owner, extraction confidence. Source-specific fields (HTTP method, MCP annotation hints) live under annotations.

`Finding` (`schemas`)

Required fields: check_id, title, severity, category, recommendation, provenance_kind. Optional: tool_id, tool_name, agent_id, evidence (free-form dict), confidence, source (SourceReference). Populated after creation: id, fingerprint, suppressed, suppression_reason, baseline_status, blocks_release. provenance_kind (static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack) is a reviewer-triage signal only and never gates release. blocks_release (v0.16+) is the explicit policy-blocking bit, set by Action Surface Diff and policy-pack rules.

`CheckMetadata` (`schemas`)

Used by list-checks and explain. Plugins register catalog entries by attaching a CheckMetadata (or a compatible dict) to their run function as run.AGENTS_SHIPGATE_METADATA. It carries floor_severity (a hard severity floor that no override can cross) and dynamic_default (whether the emitted severity depends on manifest values; plugins may not set it).
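The floor_severity rule amounts to clamping: an override may raise or lower a severity, but never below the floor. A minimal sketch follows. The five-level ladder is an assumption (this page names only critical and high), and `effective_severity` is a hypothetical helper, not the project's override code.

```python
from typing import Optional

# Assumed severity ladder, lowest to highest (not read from the project source).
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def effective_severity(default: str, override: Optional[str], floor: Optional[str]) -> str:
    """Apply a severity override, but never let the result drop below floor_severity."""
    chosen = override or default
    if floor is not None and SEVERITY_ORDER.index(chosen) < SEVERITY_ORDER.index(floor):
        return floor
    return chosen


print(effective_severity("critical", override="low", floor="high"))  # → high
print(effective_severity("high", override="low", floor=None))  # → low
print(effective_severity("medium", override=None, floor="low"))  # → medium
```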

The release decision (`core/findings`)

The baseline-aware gate that turns the active finding set into release_decision.decision ∈ {blocked, review_required, insufficient_evidence, passed} plus blockers[] / review_items[] and a per-finding contribution_rules[] audit. This is the only release gate. summary.status is the legacy baseline-blind signal preserved for v0.7 callers — a baseline-matched critical produces summary.status = "release_blockers_detected" yet release_decision.decision = "review_required" (intentional divergence). New consumers read release_decision.decision; merge_verdict in verifier.json is a deterministic projection of it.
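The intentional divergence between the two signals can be reproduced with a toy gate. This is not the real rule set: the actual gate also produces insufficient_evidence and records contribution_rules[]. `ToyFinding`, `baseline_blind_blockers`, and `toy_decision` are hypothetical, and the only behaviour they are meant to mirror is the documented case above, where a baseline-matched critical trips summary.status but yields review_required.

```python
from dataclasses import dataclass


@dataclass
class ToyFinding:
    severity: str
    baseline_status: str  # "matched" or "new" in this sketch
    blocks_release: bool = False


def baseline_blind_blockers(findings: list) -> bool:
    # Mirrors the legacy summary.status: any critical counts, matched or not.
    return any(f.severity == "critical" for f in findings)


def toy_decision(findings: list) -> str:
    # Baseline-aware: an explicit blocks_release bit or a *new* critical blocks.
    if any(
        f.blocks_release or (f.severity == "critical" and f.baseline_status == "new")
        for f in findings
    ):
        return "blocked"
    # An accepted (baseline-matched) critical still needs a human look.
    if any(f.severity == "critical" for f in findings):
        return "review_required"
    return "passed"


accepted = [ToyFinding(severity="critical", baseline_status="matched")]
print(baseline_blind_blockers(accepted), toy_decision(accepted))  # → True review_required
```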

Risk-hint classifier (`core/risk_hints.py`)

The most heuristic-laden module. Critical implementation notes:

Tokenized keyword matching. Names, descriptions, and scopes are split into word tokens (re.findall(r"[a-z]+", text.lower())), then intersected with module-level keyword sets. This avoids substring false positives: "deploy" matches the standalone token but not the substring inside "deployments".
Source-typed gating. The keyword classifier runs only for openai_api and sdk_function source types. OpenAPI-derived tools get read_only / write directly from HTTP method.
SDK preview safety net. SDK functions whose tokens include preview and have no HTTP method get read_only at HIGH confidence and are exempted from the keyword classifier — this is what protects fixture tools like send_email_preview from being tagged as external_write.
GET → read_only at HIGH. Any GET endpoint with no write hint gets read_only at HIGH confidence, so is_effectively_read_only short-circuits policy/scope checks. The exception is a GET that picks up a destructive tag from its operationId tokens (e.g. *_destroy_with_associated_resources); such endpoints are not short-circuited and still flow through those checks.
Manual overrides win. risk_overrides.tools.{tool}.tags add hints at HIGH manual confidence; remove_tags removes by tag regardless of source.
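The tokenization rule and the preview safety net can be seen in a few lines. This is a minimal sketch: `WRITE_KEYWORDS` is an illustrative set rather than the real keyword sets in risk_hints.py, and the preview branch omits the source-type and HTTP-method conditions described above.

```python
import re

# Illustrative keyword set; the real sets live near the top of risk_hints.py.
WRITE_KEYWORDS = {"send", "deploy", "delete"}


def tokenize(text: str) -> set:
    # Lowercase, then keep runs of a-z: "send_email_preview" -> {"send", "email", "preview"}.
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_hints(name: str) -> set:
    tokens = tokenize(name)
    if "preview" in tokens:
        # Simplified stand-in for the SDK preview safety net (no source-type or HTTP-method check).
        return set()
    return tokens & WRITE_KEYWORDS


for name in ["deploy_service", "list_deployments", "send_email_preview", "send_email"]:
    print(name, sorted(keyword_hints(name)))
# deploy_service ['deploy']
# list_deployments []
# send_email_preview []
# send_email ['send']
```

Because matching is set intersection on whole tokens, "deployments" never equals "deploy", which is exactly the substring false positive the classifier avoids.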

The full keyword sets live near the top of risk_hints.py and are documented in Check Catalog § Risk-hint reference.

Fingerprint algorithm (`core/findings`)

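import hashlib
import json

# Excerpt: Finding and _canonicalize_for_fingerprint come from the surrounding package.
def finding_fingerprint(finding: Finding) -> str:
    # Identity covers only check_id, tool_name, and canonicalized evidence.
    identity = {
        "check_id": finding.check_id,
        "tool_name": finding.tool_name,
        "evidence": _canonicalize_for_fingerprint(finding.evidence),
    }
    # Keep the first 16 hex chars of the SHA-256 digest of the sorted JSON.
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:16]
    return f"fp_{digest}"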

_canonicalize_for_fingerprint recursively sorts dict keys, sorts list items by their JSON representation, and excludes two evidence keys: default_severity (the pre-override severity audit field, excluded so severity_overrides can be applied before or after assign_finding_ids) and source_provenance (excluded so adding local HITL provenance does not rotate existing baselines or suppressions). Since v0.18 the public findings[].fingerprint is computed from redacted evidence, so a finding whose identity evidence contains a recognized secret pattern gets a different public fingerprint than it did before v0.18. To keep old baselines matching, --baseline scans also check the legacy raw fingerprint in memory until you re-run baseline save.
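A self-contained sketch of the canonicalization plus hash step follows, useful for checking why reordered lists or a changed default_severity leave the fingerprint unchanged. Assumptions: the two keys are excluded at the top level of evidence only, secret redaction (v0.18+) is omitted, and `canonicalize`, `fingerprint`, and the check ID `SHIP-EXAMPLE` are illustrative names.

```python
import hashlib
import json
from typing import Any, Optional

# Evidence keys excluded from identity, per the description above (top level only here).
EXCLUDED_EVIDENCE_KEYS = {"default_severity", "source_provenance"}


def canonicalize(value: Any) -> Any:
    """Recursively sort dict keys and sort list items by their JSON form."""
    if isinstance(value, dict):
        return {k: canonicalize(value[k]) for k in sorted(value)}
    if isinstance(value, (list, tuple)):
        items = [canonicalize(v) for v in value]
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True, default=str))
    return value


def fingerprint(check_id: str, tool_name: Optional[str], evidence: dict) -> str:
    trimmed = {k: v for k, v in evidence.items() if k not in EXCLUDED_EVIDENCE_KEYS}
    identity = {"check_id": check_id, "tool_name": tool_name, "evidence": canonicalize(trimmed)}
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:16]
    return f"fp_{digest}"


a = fingerprint("SHIP-EXAMPLE", "t", {"scopes": ["b", "a"], "default_severity": "high"})
b = fingerprint("SHIP-EXAMPLE", "t", {"scopes": ["a", "b"], "default_severity": "low"})
print(a == b)  # → True
```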

When two findings collide (same fingerprint), assign_finding_ids adds an 8-char content-derived discriminator built from agent_id, category, confidence, recommendation, source, title, tool_id, tool_name. The result is order-independent — running the same checks in a different order produces the same id for each finding.
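Order independence follows from deriving the discriminator only from a finding's own fields, never from its position. A minimal sketch under stated assumptions: findings are plain dicts, only colliding fingerprints receive a suffix, and the `<fingerprint>-<discriminator>` id shape is hypothetical, as are `discriminator` and `assign_ids`.

```python
import hashlib
import json
from collections import defaultdict

# Fields the discriminator is built from, per the description above.
DISCRIMINATOR_FIELDS = (
    "agent_id", "category", "confidence", "recommendation",
    "source", "title", "tool_id", "tool_name",
)


def discriminator(finding: dict) -> str:
    payload = {k: finding.get(k) for k in DISCRIMINATOR_FIELDS}
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:8]


def assign_ids(findings: list) -> list:
    """One id per finding; colliding fingerprints get a content-derived suffix."""
    counts = defaultdict(int)
    for f in findings:
        counts[f["fingerprint"]] += 1
    ids = []
    for f in findings:
        fp = f["fingerprint"]
        # The "<fingerprint>-<discriminator>" shape is hypothetical.
        ids.append(fp if counts[fp] == 1 else f"{fp}-{discriminator(f)}")
    return ids


f1 = {"fingerprint": "fp_x", "title": "A"}
f2 = {"fingerprint": "fp_x", "title": "B"}
f3 = {"fingerprint": "fp_y", "title": "C"}
print(assign_ids([f1, f2, f3]) == list(reversed(assign_ids([f3, f2, f1]))))  # → True
```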

Plugin loader (`checks/registry.py`)

Plugins load only when AGENTS_SHIPGATE_ENABLE_PLUGINS=1 is set in the environment and --no-plugins is not passed on the command line. The loader:

Calls entry_points(group="agents_shipgate.checks").
Skips entry points where dist.metadata["Name"] (normalized) equals "agents-shipgate" — protects against builtin spoofing.
Falls back to a check on the entry point's value prefix when dist is None (rare; entry points discovered from pip-installed distributions normally carry dist).
Collects each plugin's metadata into loaded_plugins[] for the report.
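The gating and skip logic above can be sketched with the standard library (Python 3.10+ for `entry_points(group=...)`). This is an illustration, not the registry code: the helper names are hypothetical, PEP 503 normalization is assumed for "normalized", and the fallback assumes the builtin value prefix is the top-level `agents_shipgate` package.

```python
import os
import re
from collections.abc import Mapping
from importlib.metadata import entry_points
from typing import Optional

ENTRY_POINT_GROUP = "agents_shipgate.checks"
BUILTIN_DIST = "agents-shipgate"
BUILTIN_PACKAGE = "agents_shipgate"  # assumed value prefix for builtin entry points


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of -, _, . become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def plugins_enabled(env: Mapping[str, str], no_plugins: bool) -> bool:
    # Both gates must agree: env opt-in and no --no-plugins flag.
    return env.get("AGENTS_SHIPGATE_ENABLE_PLUGINS") == "1" and not no_plugins


def should_skip(dist_name: Optional[str], value: str) -> bool:
    if dist_name is not None:
        return normalize(dist_name) == BUILTIN_DIST
    # Fallback when dist is None: compare the top-level package in "module:attr".
    return value.split(":", 1)[0].split(".", 1)[0] == BUILTIN_PACKAGE


def load_check_plugins(no_plugins: bool = False) -> list:
    if not plugins_enabled(os.environ, no_plugins):
        return []
    loaded = []
    for ep in entry_points(group=ENTRY_POINT_GROUP):
        dist = getattr(ep, "dist", None)
        dist_name = dist.metadata["Name"] if dist is not None else None
        if should_skip(dist_name, ep.value):
            continue  # entries claiming the builtin distribution are never loaded
        loaded.append(ep.load())
    return loaded
```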

See Plugin Authoring for the public-facing contract.

Trust-posture invariants

These are enforced by the test suite and grep-able from the source:

No subprocess, os.system, or popen in scanner code (triggers.py is the single module that shells out to git)
No HTTP client (requests, urllib, httpx, aiohttp) in scanner code
YAML uses yaml.safe_load; !!python/object/... rejected
Path resolution rejects `..` escapes from the manifest directory (tests/test_inputs.py::test_mcp_loader_rejects_path_traversal)
Plugin builtin spoof rejected (tests/test_plugins.py::test_builtin_distribution_entry_points_are_skipped)
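Invariants like these are easier to enforce with an AST walk than with a text grep, because aliases and comments do not fool it. A minimal sketch: the forbidden lists mirror the bullets above, but the matching is deliberately coarse (it also flags a harmless `urllib.parse` import and misses `from os import system`), and `violations` is a hypothetical helper, not the project's lint.

```python
import ast

FORBIDDEN_MODULES = {"subprocess", "requests", "urllib", "httpx", "aiohttp"}
FORBIDDEN_CALLS = {("os", "system"), ("os", "popen")}


def violations(source: str) -> list:
    """Report forbidden imports and os.system/os.popen calls in Python source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    found.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in FORBIDDEN_MODULES:
                found.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            target = node.func.value
            if isinstance(target, ast.Name) and (target.id, node.func.attr) in FORBIDDEN_CALLS:
                found.append(f"line {node.lineno}: {target.id}.{node.func.attr}()")
    return found


src = "import os\nimport subprocess\nos.system('ls')\nimport json\n"
print(violations(src))  # → ['line 2: import subprocess', 'line 3: os.system()']
```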

See Trust Model § Verifying these claims.

Testing

git clone https://github.com/ThreeMoonsLab/agents-shipgate.git
cd agents-shipgate
python -m pip install -e ".[dev]"
python -m pytest                                  # full suite
python -m pytest tests/test_risk_hints.py         # tokenization invariants
python -m pytest tests/test_adapter_static_only.py  # trust-model invariant lint
python -m pytest tests/test_fixture_no_import.py  # per-adapter no-import proof
python -m ruff check .                            # lint

CI runs the Trust-model invariant lint (test_adapter_static_only.py) as a dedicated step before the main suite, so any regression in the no-execute / no-import property is visible at the top of the logs. CI also pins coverage, Ruff lint, pip-audit for dependency vulnerabilities, and cyclonedx-py for SBOM generation. Releases are signed with sigstore and published via PyPI Trusted Publishing (.github/workflows/release.yml).

Where to add new code

| You're adding… | File / pattern |
| --- | --- |
| A new check | Code in `src/agents_shipgate/checks/<category>.py` plus a `BUILTIN_CHECKS` entry in `checks/registry.py`. Add metadata to `docs/checks/<category>.yaml` (loaded into `CHECK_METADATA` at registry-import time by `checks._metadata_loader`), a test under `tests/`, and a row in `docs/checks.md`. Then regenerate `docs/checks.json` with `python scripts/generate_schemas.py`. Never change a check ID after it ships; only add new ones. Use the `tool_finding`/`agent_finding` helpers so `provenance_kind` is set. |
| A new risk-hint heuristic | Extend the automatic-hint pass in `risk_hints.py`. Add tests in `tests/test_risk_hints.py` covering both true positives and the edge case that motivated it. |
| A new input loader / adapter | `src/agents_shipgate/inputs/{name}.py` returning a loaded source; for third-party distribution, implement the `ToolSourceAdapter` Protocol and register under the `agents_shipgate.adapters` entry-point group. Use `resolve_input_path` for paths. Start from `docs/framework-adapter-checklist.md`; adapters must be static-by-default (no user-code import, no network, no agent execution). |
| A new manifest field | `src/agents_shipgate/schemas/manifest/`. The typo suggester picks it up automatically (no list update needed). Bump the manifest schema version only for a breaking grammar change. |
| A new CLI command | Register it in `cli/main.py` (per-command registrars live in `cli/_register_*.py`). Errors → `ConfigError` (exit 2), `InputParseError` (exit 3), `AgentsShipgateError` (exit 4). Add a `--json` form and a `next_action` payload for agent mode. |
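When mapping errors to exit codes, the order of the checks matters if the specific errors share a base class. The sketch below uses local stand-in classes that only borrow the names from the table (they are not imported from agents_shipgate), and it assumes `ConfigError` and `InputParseError` subclass `AgentsShipgateError`; `exit_code_for` is a hypothetical helper.

```python
class AgentsShipgateError(Exception):
    """Stand-in base error (exit 4)."""


# Assumption: the two specific errors subclass the base error.
class ConfigError(AgentsShipgateError):
    """Stand-in manifest/config error (exit 2)."""


class InputParseError(AgentsShipgateError):
    """Stand-in tool-source parse error (exit 3)."""


# Most specific first, so a subclass is never swallowed by its base class.
EXIT_CODES = [(ConfigError, 2), (InputParseError, 3), (AgentsShipgateError, 4)]


def exit_code_for(exc: BaseException) -> int:
    for exc_type, code in EXIT_CODES:
        if isinstance(exc, exc_type):
            return code
    raise exc  # not a known CLI error: let it propagate


print(exit_code_for(ConfigError("bad manifest")))  # → 2
```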

Roadmap and current debts

See ROADMAP.md for the official direction. Known internal debts that contributors are welcome to take on:

Split SHIP-API-OPERATIONAL-READINESS into atomic check IDs (currently bundles retry, timeout, test cases, output schemas, traces).
By default, strict mode fails only on critical findings; whether [critical, high] should be the implicit default is still under discussion.
Baselines include created_at and aren't byte-idempotent across runs — a content-only mode would improve git diffs.

(The legacy top-level check_severity_overrides alias was removed in v0.4 — overrides live under checks.severity_overrides only.) Open issues with the architecture label discuss these in detail.
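For the content-only baseline debt, one possible shape is a serializer that drops run-specific keys and sorts everything else. This is a hypothetical sketch: only created_at is named as volatile in the debt list, the `fingerprints` key in the example is invented, and the baseline v0.5 layout is not described on this page.

```python
import json

# Keys that vary between runs; only created_at is named in the debt list.
VOLATILE_KEYS = {"created_at"}


def content_only(baseline: dict) -> str:
    """Serialize a baseline deterministically, dropping run-specific keys."""
    stable = {k: v for k, v in baseline.items() if k not in VOLATILE_KEYS}
    return json.dumps(stable, sort_keys=True, indent=2) + "\n"


first = {"created_at": "2024-01-01T00:00:00Z", "fingerprints": ["fp_a"]}
second = {"fingerprints": ["fp_a"], "created_at": "2024-06-01T00:00:00Z"}
print(content_only(first) == content_only(second))  # → True
```

Byte-identical output for unchanged content means a re-saved baseline produces an empty git diff.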

Agents Shipgate · Apache-2.0 · maintained by Three Moons Lab · Report a false positive
