Architecture
For contributors and integrators who want to understand the internals. This page reflects agents-shipgate v0.10.0 (report schema v0.22). Internal module paths are not a stable contract and may change; only the JSON wire contract and CLI surface described in STABILITY.md are promised across 0.x.
Two execution shapes share one engine:

- `scan`: the one-shot path (load → enrich → check → decide → report).
- `verify`: the verify-first merge gate for a PR. It evaluates the published trigger catalog against the local diff, optionally scans a locally available base tree into an isolated temp dir, and then runs exactly one authoritative head scan. It projects a `merge_verdict` from the head scan's `release_decision` and writes `verifier.json` as a trigger/base orchestration artifact, not a second verdict. `report.json.release_decision.decision` remains the only release gate. `verify` never fetches; the base ref must be made available beforehand.
```text
(verify only) trigger eval over diff + optional isolated base scan
│
▼
shipgate.yaml + tool sources (10 adapter families)
│
▼
┌─────────────────────┐
│ config loader │ YAML → manifest (Pydantic, extra="forbid", strict)
└─────────────────────┘
│
▼
┌─────────────────────┐
│ inputs/ + adapters │ MCP · OpenAPI · OpenAI Agents SDK · Anthropic ·
│ │ Google ADK · LangChain · CrewAI · OpenAI API ·
│ │ Codex plugin · n8n → list[LoadedToolSource]
│ │ (+ third-party adapters via entry points)
└─────────────────────┘
│
▼
┌─────────────────────┐
│ flatten + dedup │ priority-based merge → list[Tool]
└─────────────────────┘
│
▼
┌─────────────────────┐
│ core/risk_hints │ HTTP method, MCP annotations, tokenized keyword
│ │ classifier, manual overrides → Tools w/ risk_hints
└─────────────────────┘
│
▼
┌─────────────────────┐
│ action surface │ action_surface_facts (one fact per loaded tool);
│ │ optional action_surface_diff vs base via --diff-from
└─────────────────────┘
│
▼
┌─────────────────────┐
│ checks/registry │ run_checks (80+ built-ins + opt-in plugins/policy packs)
│ │ → list[Finding] (incl. SHIP-ACTION-* / SHIP-VERIFY-*)
└─────────────────────┘
│
▼
┌─────────────────────┐
│ core/findings │ fingerprint + collision discriminator,
│ │ severity overrides, suppressions
└─────────────────────┘
│
▼
┌─────────────────────┐
│ core/baseline │ apply_baseline (matched / new / resolved)
└─────────────────────┘
│
▼
┌─────────────────────┐
│ release_decision │ baseline-aware gate → decision ∈
│ │ {blocked, review_required,
│ │ insufficient_evidence, passed}
└─────────────────────┘
│
▼
┌─────────────────────┐
│ report + packet │ ReadinessReport → report.{md,json,sarif}
│ │ + Release Evidence Packet packet.{md,json,html,pdf}
│ │ + GitHub step summary
│ ci/exit_policy │ exit code → 0 / 2 / 3 / 4 / 20
│ (verify) verifier │ project merge_verdict, write verifier.json + pr-comment.md
└─────────────────────┘
```
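The flatten + dedup stage is described only as a "priority-based merge". The sketch below shows one way such a merge can work, assuming each loaded source carries a numeric priority and tools are keyed by name. Both assumptions are illustrative, and the `*Sketch` classes are hypothetical stand-ins for the real `LoadedToolSource` and `Tool` models.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSketch:
    # Hypothetical stand-in for agents_shipgate's Tool model.
    name: str
    source_type: str


@dataclass
class LoadedSourceSketch:
    # Hypothetical stand-in for LoadedToolSource; `priority` is an assumed field.
    priority: int
    tools: list[ToolSketch] = field(default_factory=list)


def flatten_and_dedup(sources: list[LoadedSourceSketch]) -> list[ToolSketch]:
    """Flatten all sources; on a name collision keep the higher-priority tool."""
    best: dict[str, tuple[int, ToolSketch]] = {}
    for source in sources:
        for tool in source.tools:
            current = best.get(tool.name)
            # Strictly greater: on a priority tie the first-seen tool is kept.
            if current is None or source.priority > current[0]:
                best[tool.name] = (source.priority, tool)
    # Sort by name so the output ordering is stable.
    return [tool for _, tool in sorted(best.values(), key=lambda pair: pair[1].name)]
```

With an MCP source at priority 10 and an OpenAPI source at priority 5 that both expose `search`, the MCP definition wins regardless of which source was loaded first.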
Illustrative (internal paths are not a stable contract):

```text
src/agents_shipgate/
├── __main__.py # python -m agents_shipgate entry point
├── triggers.py # trigger-catalog evaluation (the one git-shelling module)
├── fixtures.py # bundled fixture resolution
├── cli/
│ ├── main.py # Typer app + per-command registrars (_register_*.py)
│ ├── scan/ # scan orchestration: load → enrich → check → decide → report
│ ├── verify/ # verify merge-gate orchestrator + cli/verify/git.py probe
│ ├── trigger.py # `trigger` subcommand
│ ├── bootstrap.py # detect → init → scan → apply-patches chain
│ ├── detect.py # workspace classifier
│ ├── discovery/ # workspace scan, manifest template, agent-instruction renderers
│ ├── apply_patches.py # file-grouped, containment-checked patch applier
│ ├── evidence_packet.py / explain_finding.py / findings.py / scenario.py
│ └── self_check.py / diagnostics.py / agent_mode.py
├── config/ # YAML loader + typo suggester
├── schemas/ # PUBLIC wire-contract models (the stable API surface)
│ ├── manifest/ # manifest v0.1 Pydantic models (strict, extra="forbid")
│ ├── report.py # ReadinessReport (report_schema_version "0.22")
│ ├── packet.py # EvidencePacket (packet_schema_version "0.6")
│ ├── baseline.py # baseline v0.5
│ ├── verifier.py / verification.py # verifier.json + VerificationContext
│ ├── capability_change.py / surfaces.py # capability delta + tool/action surfaces
│ └── contract.py # `contract --json` payload shape
├── inputs/ # one module per adapter family + protocol.py / adapter_validation.py
│ ├── mcp.py · openapi.py · openai_sdk_static.py · openai_api.py · anthropic_api.py
│ ├── google_adk.py · langchain.py · crewai.py · codex_plugin.py · n8n/
│ ├── policy_packs.py · validation.py · traces.py
│ └── common.py # resolve_input_path, load_structured_file, schema_to_parameters
├── core/
│ ├── context.py # ScanContext + VerificationContext (frozen)
│ ├── risk_hints.py # tokenized keyword classifier + manual overrides
│ ├── findings/ # fingerprinting, suppressions, release-decision gate
│ ├── lenses/ # reviewer/agent/action-surface projections (incl. diffs)
│ ├── baseline.py · baseline_audit.py # save/load/apply + tamper-evident audit log
│ ├── severity_overrides.py · dynamic_defaults.py · privacy.py · globbing.py
│ └── errors.py · logging.py
├── checks/
│ ├── registry.py # BUILTIN_CHECKS list + plugin/adapter loader
│ ├── base.py # tool_finding / agent_finding helpers (set provenance_kind)
│ ├── inventory · documentation · schema · auth · manifest_scope ·
│ │ manifest_consistency · policy · side_effects · evidence · security
│ ├── api.py # OpenAI API checks
│ ├── adk · langchain · crewai · codex_plugin · n8n # per-framework checks
│ ├── verify*.py # SHIP-VERIFY-* trust-root checks (policy, ci_gate,
│ │ # baseline/waiver, agent instructions, trigger drift)
│ └── baseline_integrity.py · plugin_validation.py
├── report/ # json_report.py · markdown.py · sarif.py · summary_text.py
├── packet/ # builder.py + md/html/pdf/json + evidence_matrix.py
└── ci/ # exit_policy.py (exit codes) + github_summary.py
```
Public wire-contract models (`Finding`, `ReadinessReport`, `CheckMetadata`, `EvidencePacket`, etc.) live under `agents_shipgate.schemas.*`; that is the stable consumer API. Internal scan/domain containers live under `agents_shipgate.core.*` and are not a stable import surface.
`ScanContext` is the frozen value object passed to every check function: the loaded manifest, the agent, the flattened and enriched tools, the config path, and the loaded API/framework artifacts. Checks must not mutate it. A sibling `VerificationContext` (populated only by `verify` or `scan --changed-files`) carries the diff/base signals that the SHIP-VERIFY-* trust-root checks read; a plain `scan` leaves it absent, so those checks emit nothing.
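A minimal sketch of the "frozen context, optional sibling" contract, using standard-library frozen dataclasses. The class names, field names, and the check function are hypothetical; the real `ScanContext` and `VerificationContext` in `core/context.py` may be shaped differently.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class VerificationContextSketch:
    # Hypothetical field: the diff signal a SHIP-VERIFY-style check reads.
    changed_files: tuple[str, ...]


@dataclass(frozen=True)
class ScanContextSketch:
    # Hypothetical field names mirroring the prose above.
    manifest: Any
    agent: Any
    tools: tuple[Any, ...]  # a tuple, so the collection itself cannot be appended to
    config_path: Path
    artifacts: tuple[Any, ...] = ()


def verify_style_check(
    ctx: ScanContextSketch, verification: Optional[VerificationContextSketch]
) -> list[str]:
    """Hypothetical trust-root check: emits nothing when no verification context exists."""
    if verification is None:
        return []
    return [f"changed: {path}" for path in verification.changed_files]
```

Note that `frozen=True` only blocks attribute reassignment; nested mutable objects (a manifest dict, a list of tools) could still be mutated in place, which is why the sketch stores collections as tuples.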
`Tool` carries the union of fields a check might inspect: name, description, source_type, schemas, parameters, annotations, auth scopes, risk_hints, owner, and extraction confidence. Source-specific fields (HTTP method, MCP annotation hints) live under `annotations`.
`Finding` required fields: `check_id`, `title`, `severity`, `category`, `recommendation`, `provenance_kind`. Optional: `tool_id`, `tool_name`, `agent_id`, `evidence` (free-form dict), `confidence`, `source` (`SourceReference`). Set after creation: `id`, `fingerprint`, `suppressed`, `suppression_reason`, `baseline_status`, `blocks_release`. `provenance_kind` (`static_declaration`, `ast_extraction`, `keyword_heuristic`, `regex_heuristic`, or `policy_pack`) is a reviewer-triage signal only and never gates release. `blocks_release` (v0.16+) is the explicit policy-blocking bit set by Action Surface Diff and policy-pack rules.
`CheckMetadata` is used by `list-checks` / `explain`. Plugins attach a `CheckMetadata` (or a compatible dict) as `run.AGENTS_SHIPGATE_METADATA` to register catalog entries. It carries `floor_severity` (a hard severity floor that no override can cross) and `dynamic_default` (whether the emitted severity depends on manifest values; plugins may not set it).
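A minimal sketch of how a severity floor could constrain overrides. It assumes the ordering info < low < medium < high < critical (only `critical` and `high` are named on this page) and uses a hypothetical `apply_override` helper; it is not the project's `severity_overrides.py`.

```python
from typing import Optional

SEVERITY_ORDER = ("info", "low", "medium", "high", "critical")  # assumed ordering


def rank(severity: str) -> int:
    return SEVERITY_ORDER.index(severity)


def apply_override(default: str, override: Optional[str], floor: Optional[str]) -> str:
    """Return the effective severity: the override if present, never below the floor."""
    effective = override if override is not None else default
    if floor is not None and rank(effective) < rank(floor):
        return floor  # an override may raise severity freely but cannot cross the floor
    return effective
```

For example, overriding a `high` check down to `low` when its floor is `medium` yields `medium`, while overriding it up to `critical` is unaffected by the floor.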
`release_decision` is the baseline-aware gate that turns the active finding set into `release_decision.decision` ∈ {`blocked`, `review_required`, `insufficient_evidence`, `passed`}, plus `blockers[]` / `review_items[]` and a per-finding `contribution_rules[]` audit. This is the only release gate.

`summary.status` is the legacy baseline-blind signal preserved for v0.7 callers: a baseline-matched critical produces `summary.status = "release_blockers_detected"` yet `release_decision.decision = "review_required"` (an intentional divergence). New consumers should read `release_decision.decision`; `merge_verdict` in `verifier.json` is a deterministic projection of it.
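The divergence between the two signals is easiest to see side by side. The toy below is a hypothetical simplification, not the real gate: its rules (new critical or `blocks_release` blocks; any other active critical/high needs review) are assumptions chosen only to reproduce the documented baseline-matched-critical case, and it omits `insufficient_evidence`, `blockers[]`, `review_items[]`, and `contribution_rules[]`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FindingSketch:
    # Hypothetical subset of Finding fields.
    severity: str
    baseline_status: str  # "new" or "matched"; resolved findings are no longer active
    suppressed: bool = False
    blocks_release: bool = False


def legacy_summary_status(findings: list[FindingSketch]) -> Optional[str]:
    # Baseline-blind: any active critical counts, whether matched or new.
    active = [f for f in findings if not f.suppressed]
    if any(f.severity == "critical" for f in active):
        return "release_blockers_detected"
    return None  # other legacy status values are not modeled here


def toy_release_decision(findings: list[FindingSketch]) -> str:
    # Baseline-aware, with simplified hypothetical rules.
    active = [f for f in findings if not f.suppressed]
    if any(f.blocks_release or (f.severity == "critical" and f.baseline_status == "new")
           for f in active):
        return "blocked"
    if any(f.severity in ("critical", "high") for f in active):
        return "review_required"
    return "passed"
```

A single baseline-matched critical makes `legacy_summary_status` report `"release_blockers_detected"` while `toy_release_decision` returns `"review_required"`, mirroring the intentional divergence described above.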
`core/risk_hints.py` is the most heuristic-laden module. Critical implementation notes:

- Tokenized keyword matching. Names, descriptions, and scopes are split into word tokens (`re.findall(r"[a-z]+", text.lower())`), then intersected with module-level keyword sets. This avoids substring false positives: `"deploy"` matches the standalone token but not the substring inside `"deployments"`.
- Source-typed gating. The keyword classifier runs only for the `openai_api` and `sdk_function` source types. OpenAPI-derived tools get `read_only`/`write` directly from the HTTP method.
- SDK preview safety net. SDK functions whose tokens include `preview` and that have no HTTP method get `read_only` at HIGH confidence and are exempted from the keyword classifier. This is what protects fixture tools like `send_email_preview` from being tagged as `external_write`.
- GET → read_only at HIGH. Any GET endpoint with no write hint gets `read_only` at HIGH confidence, so `is_effectively_read_only` short-circuits policy/scope checks. The exception is GETs that pick up a `destructive` tag from operationId tokens (e.g. `*_destroy_with_associated_resources`); those still flow through.
- Manual overrides win. `risk_overrides.tools.{tool}.tags` adds hints at HIGH manual confidence; `remove_tags` removes by tag regardless of source.
The full keyword sets live near the top of risk_hints.py and are documented in Check Catalog § Risk-hint reference.
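The tokenization and preview rules above can be sketched in a few lines. The tokenizer uses exactly the regex quoted above; the hint names and keyword sets in `KEYWORD_HINTS` are hypothetical placeholders, not the module's real sets.

```python
import re
from typing import Optional

# Hypothetical hint names and keyword sets; the real ones live in risk_hints.py.
KEYWORD_HINTS = {
    "deploy": frozenset({"deploy"}),
    "external_write": frozenset({"send", "email", "post"}),
    "destructive": frozenset({"delete", "destroy", "drop"}),
}


def tokenize(text: str) -> set[str]:
    """Lowercase, then keep runs of a-z as tokens (underscores and digits split)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_hints(name: str, description: str = "", http_method: Optional[str] = None) -> set[str]:
    toks = tokenize(name) | tokenize(description)
    # Preview safety net: preview-named SDK functions with no HTTP method are
    # read-only and never reach the keyword classifier.
    if http_method is None and "preview" in toks:
        return {"read_only"}
    # Whole-token set intersection, so "deployments" does not match "deploy".
    return {hint for hint, words in KEYWORD_HINTS.items() if toks & words}
```

With this regex, a camelCase name such as `deployService` lowercases to the single token `deployservice` and matches nothing; whether the real module splits camelCase before tokenizing is not stated on this page.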
```python
import hashlib
import json

# Excerpt: Finding and _canonicalize_for_fingerprint are defined elsewhere in the package.


def finding_fingerprint(finding: "Finding") -> str:
    identity = {
        "check_id": finding.check_id,
        "tool_name": finding.tool_name,
        "evidence": _canonicalize_for_fingerprint(finding.evidence),
    }
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:16]
    return f"fp_{digest}"
```

`_canonicalize_for_fingerprint` recursively sorts dict keys, sorts list items by their JSON representation, and excludes two evidence keys: `default_severity` (the pre-override severity audit field, so severity overrides are safe to apply before or after `assign_finding_ids`) and `source_provenance` (so adding local HITL provenance does not rotate existing baselines or suppressions). Since v0.18 the public `findings[].fingerprint` is computed from redacted evidence, so a finding whose identity evidence contains a recognized secret pattern gets a different public fingerprint than it did before v0.18. `--baseline` scans also check the legacy raw fingerprint in memory, so old baselines keep matching until you re-run `baseline save`.
When two findings collide (same fingerprint), assign_finding_ids adds an 8-char content-derived discriminator built from agent_id, category, confidence, recommendation, source, title, tool_id, tool_name. The result is order-independent — running the same checks in a different order produces the same id for each finding.
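A sketch of a content-derived collision discriminator over the fields listed above. The id format (`<fingerprint>_<8 hex chars>`, suffix only on collision) and the dict-based findings are assumptions for illustration; `assign_finding_ids` itself may differ.

```python
import hashlib
import json
from collections import defaultdict
from typing import Any

# Fields named above as inputs to the discriminator.
DISCRIMINATOR_FIELDS = ("agent_id", "category", "confidence", "recommendation",
                        "source", "title", "tool_id", "tool_name")


def discriminator(finding: dict[str, Any]) -> str:
    payload = {name: finding.get(name) for name in DISCRIMINATOR_FIELDS}
    blob = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:8]


def assign_ids(findings: list[dict[str, Any]]) -> list[str]:
    """Return one id per finding; colliding fingerprints get a content-derived suffix."""
    counts: dict[str, int] = defaultdict(int)
    for finding in findings:
        counts[finding["fingerprint"]] += 1
    ids = []
    for finding in findings:
        fp = finding["fingerprint"]
        # Hypothetical id format: bare fingerprint, or fingerprint + "_" + 8 hex chars.
        ids.append(fp if counts[fp] == 1 else f"{fp}_{discriminator(finding)}")
    return ids
```

Each id depends only on the finding's own content and on how many findings share its fingerprint, never on position, which is what makes the assignment order-independent. Two colliding findings that also agree on every discriminator field would still receive the same id; this sketch does not handle that case.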
Plugins load only when `AGENTS_SHIPGATE_ENABLE_PLUGINS=1` is set in the environment and `--no-plugins` is not passed on the CLI. The loader:

- Calls `entry_points(group="agents_shipgate.checks")`.
- Skips entry points whose `dist.metadata["Name"]` (normalized) equals `"agents-shipgate"`; this protects against builtin spoofing.
- Falls back to a value-prefix check when `dist` is `None` (rare; pip installs normally populate it).
- Collects each plugin's metadata into `loaded_plugins[]` for the report.
See Plugin Authoring for the public-facing contract.
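A sketch of the gate and spoof filter using the standard `importlib.metadata` API (`entry_points(group=...)` and `EntryPoint.dist` need Python 3.10+). The helper names are hypothetical, the name normalization follows the PEP 503 rule, and the no-`dist` fallback shown here (comparing the top-level package of the entry-point value) is one possible form of the value-prefix check, not the registry's actual code.

```python
import os
import re
from importlib.metadata import EntryPoint, entry_points
from typing import Mapping, Optional

BUILTIN_DIST = "agents-shipgate"


def plugins_enabled(env: Mapping[str, str], no_plugins_flag: bool) -> bool:
    # Both conditions must hold: env opt-in AND no --no-plugins on the CLI.
    return env.get("AGENTS_SHIPGATE_ENABLE_PLUGINS") == "1" and not no_plugins_flag


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_", "." collapse to "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def is_builtin_spoof(dist_name: Optional[str], value: str) -> bool:
    if dist_name is not None:
        return normalize(dist_name) == BUILTIN_DIST
    # Fallback without a distribution: inspect the entry point's top-level package.
    return value.split(":", 1)[0].split(".", 1)[0] == "agents_shipgate"


def candidate_entry_points(no_plugins_flag: bool = False) -> list[EntryPoint]:
    if not plugins_enabled(os.environ, no_plugins_flag):
        return []
    selected = []
    for ep in entry_points(group="agents_shipgate.checks"):
        dist = getattr(ep, "dist", None)
        dist_name = dist.metadata["Name"] if dist is not None else None
        if not is_builtin_spoof(dist_name, ep.value):
            selected.append(ep)
    return selected
```

Normalizing before comparing means a third-party package published as `Agents_Shipgate` or `agents.shipgate` is still recognized as claiming the builtin distribution name.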
These are enforced by the test suite and grep-able from the source:

- No `subprocess`, `os.system`, or `popen` in scanner code (the `verify` trigger evaluation in `triggers.py` is the documented git-shelling exception)
- No HTTP client (`requests`, `urllib`, `httpx`, `aiohttp`) in scanner code
- YAML uses `yaml.safe_load`; `!!python/object/...` tags are rejected
- Path resolution rejects `..` escapes from the manifest dir (`tests/test_inputs.py::test_mcp_loader_rejects_path_traversal`)
- Plugin builtin spoofing is rejected (`tests/test_plugins.py::test_builtin_distribution_entry_points_are_skipped`)
See Trust Model § Verifying these claims.
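The path-traversal invariant reduces to "resolve, then check containment". A minimal sketch, assuming a hypothetical `resolve_within` helper and `PathEscapeError` (the real entry point is `resolve_input_path` in `inputs/common.py`, whose signature is not shown here):

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Hypothetical error for inputs that resolve outside the manifest directory."""


def resolve_within(manifest_dir: Path, relative: str) -> Path:
    base = manifest_dir.resolve()
    # resolve() collapses ".." and follows symlinks, so both tricks are caught.
    candidate = (base / relative).resolve()
    # is_relative_to (Python 3.9+) compares path components, so "/a/bc" is not
    # treated as inside "/a/b" the way a naive string-prefix check would.
    if not candidate.is_relative_to(base):
        raise PathEscapeError(f"{relative!r} escapes {base}")
    return candidate
```

Absolute inputs are covered too: joining `base / "/etc/passwd"` yields `/etc/passwd`, which fails the containment check.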
```bash
git clone https://github.com/ThreeMoonsLab/agents-shipgate.git
cd agents-shipgate
python -m pip install -e ".[dev]"
python -m pytest # full suite
python -m pytest tests/test_risk_hints.py # tokenization invariants
python -m pytest tests/test_adapter_static_only.py # trust-model invariant lint
python -m pytest tests/test_fixture_no_import.py # per-adapter no-import proof
python -m ruff check . # lint
```

CI runs the trust-model invariant lint (`test_adapter_static_only.py`) as a dedicated step before the main suite, so any regression in the no-execute / no-import property is visible at the top of the logs. CI also pins coverage and runs Ruff lint, `pip-audit` for dependency vulnerabilities, and `cyclonedx-py` for SBOM generation. Releases are signed with sigstore and published via PyPI Trusted Publishing (`.github/workflows/release.yml`).
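An invariant lint of this kind can be built on Python's standard `ast` module. The sketch below is a hypothetical illustration of the technique, not the contents of `test_adapter_static_only.py`: it flags imports of the banned modules and `os.system` / `os.popen` attribute access in one source string.

```python
import ast

BANNED_MODULES = frozenset({"subprocess", "requests", "urllib", "httpx", "aiohttp"})
BANNED_ATTRS = frozenset({("os", "system"), ("os", "popen")})


def banned_uses(source: str) -> list[str]:
    """Return violations in one Python source string, ordered by line number."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_MODULES:
                    found.append((node.lineno, f"line {node.lineno}: import {alias.name}"))
        elif isinstance(node, ast.ImportFrom) and node.module:  # skip "from . import x"
            if node.module.split(".")[0] in BANNED_MODULES:
                found.append((node.lineno, f"line {node.lineno}: from {node.module} import ..."))
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if (node.value.id, node.attr) in BANNED_ATTRS:
                found.append((node.lineno, f"line {node.lineno}: {node.value.id}.{node.attr}"))
    # ast.walk is breadth-first, so sort to report violations in source order.
    return [message for _, message in sorted(found)]
```

Being purely syntactic, this sketch over-approximates (it also flags `urllib.parse`) and misses aliasing such as `from os import system`; a production lint would need rules for both.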
| You're adding… | File / pattern |
|---|---|
| A new check | Code in src/agents_shipgate/checks/<category>.py plus a BUILTIN_CHECKS entry in checks/registry.py. Add metadata to docs/checks/<category>.yaml (loaded into CHECK_METADATA at registry-import time by checks._metadata_loader), a test under tests/, and a row in docs/checks.md. Then regenerate docs/checks.json with python scripts/generate_schemas.py. Never change a check ID after it ships — only add new ones. Use the tool_finding/agent_finding helpers so provenance_kind is set. |
| A new risk-hint heuristic | Extend the automatic-hint pass in risk_hints.py. Add tests in tests/test_risk_hints.py covering both true positives and the edge case that motivated it. |
| A new input loader / adapter | src/agents_shipgate/inputs/{name}.py returning a loaded source; for third-party distribution, implement the ToolSourceAdapter Protocol and register under the agents_shipgate.adapters entry-point group. Use resolve_input_path for paths. Start from docs/framework-adapter-checklist.md; adapters must be static-by-default (no user-code import, no network, no agent execution). |
| A new manifest field | src/agents_shipgate/schemas/manifest/. The typo suggester picks it up automatically (no list update needed). Bump the manifest schema version only for a breaking grammar change. |
| A new CLI command | Register it in cli/main.py (per-command registrars live in cli/_register_*.py). Errors → ConfigError (exit 2), InputParseError (exit 3), AgentsShipgateError (exit 4). Add a --json form and a next_action payload for agent mode. |
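The error-to-exit-code rule for new commands can be sketched as an ordered mapping. The three classes below are stand-in definitions of the names in the table; that `ConfigError` and `InputParseError` subclass `AgentsShipgateError` is an assumption, and the release-decision exit codes produced by `ci/exit_policy` (such as 20) are not modeled.

```python
import sys
from typing import Callable


class AgentsShipgateError(Exception):
    """Stand-in base error (exit 4). Assumption: the two below subclass it."""


class ConfigError(AgentsShipgateError):
    """Stand-in for manifest/config problems (exit 2)."""


class InputParseError(AgentsShipgateError):
    """Stand-in for unparseable tool sources (exit 3)."""


# Most specific classes first, so a subclass is not swallowed by its base class.
EXIT_CODES = ((ConfigError, 2), (InputParseError, 3), (AgentsShipgateError, 4))


def exit_code_for(error: BaseException) -> int:
    for error_type, code in EXIT_CODES:
        if isinstance(error, error_type):
            return code
    raise error  # unexpected errors propagate instead of being mapped


def run_command(action: Callable[[], None]) -> int:
    try:
        action()
    except AgentsShipgateError as error:
        print(f"error: {error}", file=sys.stderr)
        return exit_code_for(error)
    return 0  # success path only; release-decision exit codes are out of scope here
```

Keeping the mapping as an ordered tuple rather than a dict makes the "most specific first" rule explicit, which matters once the error classes form a hierarchy.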
See ROADMAP.md for the official direction. Known internal debts that contributors are welcome to take on:

- Split `SHIP-API-OPERATIONAL-READINESS` into atomic check IDs (it currently bundles retry, timeout, test cases, output schemas, and traces).
- Strict mode fails only on `critical` by default; discussion is ongoing about whether `[critical, high]` should be the implicit default.
- Baselines include `created_at` and aren't byte-idempotent across runs; a content-only mode would produce cleaner git diffs.
(The legacy top-level check_severity_overrides alias was removed in v0.4 —
overrides live under checks.severity_overrides only.) Open issues with the
architecture label discuss these in detail.
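One possible shape for the content-only baseline mode mentioned above: drop volatile fields such as `created_at`, sort entries, and serialize with sorted keys so identical finding sets produce byte-identical files. The baseline layout here (a top-level `findings` list of dicts keyed by `fingerprint`) is hypothetical; the baseline v0.5 schema is not shown on this page.

```python
import json
from typing import Any

VOLATILE_KEYS = frozenset({"created_at"})  # assumption: other run-specific keys would go here


def content_only_dump(baseline: dict[str, Any]) -> str:
    """Serialize a baseline so that equal content always yields equal bytes."""
    stable = {k: v for k, v in baseline.items() if k not in VOLATILE_KEYS}
    # Hypothetical layout: a "findings" list of dicts, each with a "fingerprint" key.
    stable["findings"] = sorted(stable.get("findings", []), key=lambda entry: entry["fingerprint"])
    return json.dumps(stable, sort_keys=True, indent=2) + "\n"
```

Two runs that differ only in timestamp and entry order then serialize identically, so a git diff shows only real finding changes.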
Agents Shipgate · Apache-2.0 · maintained by Three Moons Lab · Report a false positive