From 69421a12830d53f096540e93c13b6bd066142065 Mon Sep 17 00:00:00 2001
From: Pengfei Hu <pengfei@threemoonslab.com>
Date: Mon, 1 Jun 2026 11:06:32 -0700
Subject: [PATCH] Fix packet low-confidence residuals

---
 CHANGELOG.md                                  |   4 +-
 .../support_refund_agent/expected/packet.html |   2 +-
 .../support_refund_agent/expected/packet.json |   4 +-
 .../support_refund_agent/expected/packet.md   |   2 +-
 src/agents_shipgate/packet/builder.py         |   2 +-
 tests/test_evidence_packet.py                 | 156 +++++++++++++++++-
 6 files changed, 163 insertions(+), 7 deletions(-)
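
Reviewer note: the removed `packet.html` line in this patch contradicts itself. Its §9 lists the scenario "Verify low-confidence tool extractions" ("One or more tools were extracted with low confidence"), but its §10 residuals say "Low-confidence tool extractions: none". That appears to be the mismatch the subject line refers to. Below is a minimal sketch of a consistency check that works only on the rendered HTML strings in the fixtures. `section_text` and `low_confidence_residuals_consistent` are hypothetical helpers; they are not functions in `agents_shipgate` or `tests/test_evidence_packet.py`.

```python
# Hypothetical helpers (not part of agents_shipgate): flag a rendered evidence
# packet whose §9 asks reviewers to verify low-confidence extractions while
# its §10 residuals report that there are none.
SCENARIO_TITLE = "Verify low-confidence tool extractions"
NONE_RESIDUAL = "Low-confidence tool extractions: none"


def section_text(html: str, marker: str) -> str:
    """Return the HTML from the <h2> that starts with `marker` up to the next <h2>."""
    start = html.find("<h2>" + marker)
    if start == -1:
        return ""
    end = html.find("<h2>", start + len("<h2>"))
    return html[start:] if end == -1 else html[start:end]


def low_confidence_residuals_consistent(html: str) -> bool:
    """True unless §9 requests low-confidence review while §10 reports none."""
    wants_review = SCENARIO_TITLE in section_text(html, "§9")
    says_none = NONE_RESIDUAL in section_text(html, "§10")
    return not (wants_review and says_none)
```

Applied to the removed `-` line of `packet.html`, this check should return `False`. Whether the regenerated `+` line passes depends on how its §10 lists the residual.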
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e21bebf4..ccda50db 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,8 @@
 
 ## Unreleased
 
+## 0.11.0 - 2026-05-31
+
 - **Verifier adoption-loop release prep.** Public docs and discovery metadata now
   lead with the verify-first adoption path, pinned `v0.11.0` snippets, verifier
   artifacts, merge verdicts, `fix_task`, and explicit Action merge-policy
@@ -10,8 +12,6 @@
   `agents-shipgate feedback export` command plus
   `docs/feedback-schema.v0.1.json` for redacted design-partner feedback loops.
 
-## 0.11.0 - 2026-05-31
-
 - **Verifier PR comment v2 + additive Action outputs.** The GitHub Action now
   defaults to the verifier workflow (`verify_mode: verify`) and the
   capability-review PR comment (`pr_comment_style: capability-review`) for the
diff --git a/samples/support_refund_agent/expected/packet.html b/samples/support_refund_agent/expected/packet.html
index 098e9f04..9d95f946 100644
--- a/samples/support_refund_agent/expected/packet.html
+++ b/samples/support_refund_agent/expected/packet.html
@@ -26,4 +26,4 @@
 .status-missing { color: #7f1d1d; }
 .status-informational { color: #555; }
 .meta { color: #555; font-size: 0.92rem; }
-</style></head><body><h1>Release Evidence Packet</h1><p class="meta">Project: <strong>support-refund-agent</strong> · Agent: <strong>refund-assistant</strong> · Environment: <strong>production_like</strong><br>Run id: <code>agents_shipgate_ebb71d7248235cc3</code> · Generated at: 2026-01-01T00:00:00+00:00 · Packet schema: 0.6</p><p>This packet is a reviewer-shaped synthesis of a static Agents Shipgate scan. See §10 for what the packet does <em>not</em> prove.</p><h2>§1 Release decision — <span class="verdict verdict-blocked">BLOCKED</span></h2><ul><li>Decision: <code>blocked</code></li><li>Reason: 2 active findings block release.</li><li>Blockers: 2</li><li>Review items: 16</li></ul><h3>CI gate behavior (informational)</h3><ul><li>ci_mode: <code>advisory</code>, would_fail_ci: <code>false</code>, exit code: <code>0</code></li><li>Note: CI behavior is metadata about the run gate, not the verdict. The verdict above derives from <code>release_decision.decision</code>.</li></ul><h3>Blockers</h3><ul><li><code>SHIP-POLICY-APPROVAL-MISSING</code> (critical): stripe.create_refund lacks a declared approval policy</li><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (critical): stripe.create_refund lacks idempotency evidence</li></ul><h3>Review items</h3><ul><li><code>SHIP-INVENTORY-WILDCARD-TOOLS</code> (high): Wildcard tool exposure declared</li><li><code>SHIP-SCHEMA-MISSING-BOUNDS</code> (high): stripe.create_refund.amount has no maximum bound</li><li><code>SHIP-SCHEMA-BROAD-FREE-TEXT</code> (high): zendesk.update_ticket accepts broad free-form action input</li><li><code>SHIP-SCHEMA-BROAD-FREE-TEXT</code> (high): gmail.send_customer_email accepts broad free-form action input</li><li><code>SHIP-SCHEMA-FREEFORM-OUTPUT</code> (medium): send_email_preview returns free-form text output</li><li><code>SHIP-AUTH-MANIFEST-BROAD-SCOPE</code> (high): Manifest declares broad permission scopes</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): shopify.cancel_order requires 
scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): support.search_kb requires scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): gmail.send_customer_email requires scopes not declared in the manifest</li><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code> (high): stripe.create_refund appears to overlap with a prohibited action</li><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code> (high): gmail.send_customer_email appears to overlap with a prohibited action</li><li><code>SHIP-POLICY-CONFIRMATION-MISSING</code> (high): stripe.create_refund lacks a declared confirmation policy</li><li><code>SHIP-POLICY-CONFIRMATION-MISSING</code> (high): gmail.send_customer_email lacks a declared confirmation policy</li><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (high): gmail.send_customer_email lacks idempotency evidence</li><li><code>SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING</code> (high): shopify.cancel_order is high-risk but has no owner</li><li><code>SHIP-MANIFEST-UNUSED-SCOPE</code> (medium): Manifest declares unused permission scope zendesk:tickets:read</li></ul><h2>§1A Evidence matrix — compact review summary</h2><ul><li>Evidence Matrix Light is derived from public report.json only. Release decisions, CI exit behavior, and baseline semantics remain owned by release_decision. 
Domain rows intentionally overlap; a single finding can appear in multiple rows when it is relevant to each review lens.</li></ul><table><thead><tr><th>Domain</th><th>Evidence present</th><th>Evidence source</th><th>Confidence</th><th>Missing controls</th><th>Blocking findings</th><th>Review items</th></tr></thead><tbody><tr><td>Inventory</td><td>partial</td><td>tool_inventory; tool_surface; +2 more</td><td>high</td><td>SHIP-INVENTORY-WILDCARD-TOOLS on wildcard_mcp_tools.*: Wildcard tool exposure declared</td><td>—</td><td>SHIP-INVENTORY-WILDCARD-TOOLS (high)</td></tr><tr><td>Schema</td><td>partial</td><td>tool_surface_facts.tools[].hashes; findings[]</td><td>mixed</td><td>SHIP-SCHEMA-MISSING-BOUNDS on stripe.create_refund: stripe.create_refund.amount has no maximum bound; SHIP-SCHEMA-BROAD-FREE-TEXT on zendesk.update_ticket: zendesk.update_ticket accepts broad free-form action input; +2 more</td><td>—</td><td>SHIP-SCHEMA-MISSING-BOUNDS (high); SHIP-SCHEMA-BROAD-FREE-TEXT (high); +2 more</td></tr><tr><td>Auth</td><td>partial</td><td>tool_surface_facts.scopes; tool_inventory[].auth_scopes; +1 more</td><td>mixed</td><td>SHIP-AUTH-MANIFEST-BROAD-SCOPE: Manifest declares broad permission scopes; SHIP-AUTH-SCOPE-COVERAGE-MISSING on shopify.cancel_order: shopify.cancel_order requires scopes not declared in the manifest; +3 more</td><td>—</td><td>SHIP-AUTH-MANIFEST-BROAD-SCOPE (high); SHIP-AUTH-SCOPE-COVERAGE-MISSING (high); +3 more</td></tr><tr><td>Approval</td><td>partial</td><td>tool_surface_facts.controls[kind=approval_policy]; findings[]</td><td>high</td><td>SHIP-POLICY-APPROVAL-MISSING on stripe.create_refund: stripe.create_refund lacks a declared approval policy</td><td>SHIP-POLICY-APPROVAL-MISSING (critical)</td><td>—</td></tr><tr><td>Confirmation</td><td>partial</td><td>tool_surface_facts.controls[kind=confirmation_policy]; findings[]</td><td>high</td><td>SHIP-POLICY-CONFIRMATION-MISSING on stripe.create_refund: stripe.create_refund lacks a declared confirmation 
policy; SHIP-POLICY-CONFIRMATION-MISSING on gmail.send_customer_email: gmail.send_customer_email lacks a declared confirmation policy</td><td>—</td><td>SHIP-POLICY-CONFIRMATION-MISSING (high); SHIP-POLICY-CONFIRMATION-MISSING (high)</td></tr><tr><td>Idempotency</td><td>partial</td><td>tool_surface_facts.controls[kind=idempotency_evidence]; action_surface_facts.actions[].safeguards.idempotency; +1 more</td><td>mixed</td><td>SHIP-SIDEFX-IDEMPOTENCY-MISSING on stripe.create_refund: stripe.create_refund lacks idempotency evidence; SHIP-SIDEFX-IDEMPOTENCY-MISSING on gmail.send_customer_email: gmail.send_customer_email lacks idempotency evidence</td><td>SHIP-SIDEFX-IDEMPOTENCY-MISSING (critical)</td><td>SHIP-SIDEFX-IDEMPOTENCY-MISSING (high)</td></tr><tr><td>Side effects</td><td>partial</td><td>tool_inventory[].risk_tags; action_surface_facts.actions[].effect; +1 more</td><td>mixed</td><td>SHIP-SCHEMA-BROAD-FREE-TEXT on zendesk.update_ticket: zendesk.update_ticket accepts broad free-form action input; SHIP-SCHEMA-BROAD-FREE-TEXT on gmail.send_customer_email: gmail.send_customer_email accepts broad free-form action input; +5 more</td><td>SHIP-POLICY-APPROVAL-MISSING (critical); SHIP-SIDEFX-IDEMPOTENCY-MISSING (critical)</td><td>SHIP-SCHEMA-BROAD-FREE-TEXT (high); SHIP-SCHEMA-BROAD-FREE-TEXT (high); +3 more</td></tr><tr><td>Memory isolation</td><td>not_declared</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Human-in-the-loop evidence</td><td>not_declared</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Prompt/scope alignment</td><td>partial</td><td>declared_intentions; misalignments; +2 more</td><td>medium</td><td>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT on stripe.create_refund: stripe.create_refund appears to overlap with a prohibited action; SHIP-SCOPE-PROHIBITED-TOOL-PRESENT on gmail.send_customer_email: gmail.send_customer_email appears to overlap with a prohibited action</td><td>—</td><td>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT 
(high); SHIP-SCOPE-PROHIBITED-TOOL-PRESENT (high)</td></tr><tr><td>Retry/timeout</td><td>not_declared</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Baseline debt</td><td>informational</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Action-surface policy</td><td>covered</td><td>action_surface_facts.actions</td><td>medium</td><td>—</td><td>—</td><td>—</td></tr></tbody></table><h2>§2 Capability ↔ Intent diff — <span class="status-missing">missing</span></h2><h3>Declared</h3><ul><li>Purpose: answer refund policy questions</li><li>Purpose: prepare refund requests for human review</li><li>Purpose: update support ticket notes</li><li>Prohibited: issue refund without approval</li><li>Prohibited: cancel order without explicit confirmation</li><li>Prohibited: send external email without preview</li></ul><h3>Observed tools</h3><ul><li><code>gmail.send_customer_email</code></li><li><code>refund_status_lookup</code></li><li><code>send_email_preview</code></li><li><code>shopify.cancel_order</code></li><li><code>stripe.create_refund</code></li><li><code>support.search_kb</code></li><li><code>wildcard_mcp_tools.*</code></li><li><code>zendesk.update_ticket</code></li></ul><h3>Divergences</h3><ul><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code>: stripe.create_refund appears to overlap with a prohibited action</li><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code>: gmail.send_customer_email appears to overlap with a prohibited action</li></ul><h2>§3 High-risk tool surface — <span class="status-partial">partial</span></h2><p class="meta">Total tools: 8 · High-risk: 3</p><table><thead><tr><th>Tool</th><th>Source</th><th>Risk tags</th><th>Approval</th><th>Idempotency</th></tr></thead><tbody><tr><td><code>gmail.send_customer_email</code></td><td>mcp</td><td>customer_communication, external_write</td><td>no</td><td>no</td></tr><tr><td><code>shopify.cancel_order</code></td><td>openapi</td><td>destructive, 
write</td><td>yes</td><td>yes</td></tr><tr><td><code>stripe.create_refund</code></td><td>openapi</td><td>external_write, financial_action, write</td><td>no</td><td>no</td></tr></tbody></table><h2>§3A Tool-surface diff — <span class="status-not_declared">not declared</span></h2><p>Status: disabled — No --diff-from report or v0.3 baseline snapshot was provided.<br>Base: <code>none</code></p><h2>§3B Action-surface diff — <span class="status-not_declared">not declared</span></h2><p>Status: disabled — No action-surface comparison source was provided.<br>Base: <code>none</code></p><h2>§4 Approval policy coverage — <span class="status-partial">partial</span></h2><table><thead><tr><th>Tool</th><th>Declared</th><th>Source</th><th>Gap finding(s)</th></tr></thead><tbody><tr><td><code>shopify.cancel_order</code></td><td>yes</td><td>policies</td><td>—</td></tr><tr><td><code>stripe.create_refund</code></td><td>no</td><td>—</td><td>fp_f092940f62fbb012</td></tr></tbody></table><h3>Gap findings</h3><ul><li><code>SHIP-POLICY-APPROVAL-MISSING</code> (critical): stripe.create_refund lacks a declared approval policy</li></ul><h2>§5 Idempotency / retry risk — <span class="status-partial">partial</span></h2><p>Retry policy: <strong>not declared</strong></p><table><thead><tr><th>Tool</th><th>Declared</th><th>Source</th><th>Gap finding(s)</th></tr></thead><tbody><tr><td><code>gmail.send_customer_email</code></td><td>no</td><td>—</td><td>fp_0f8aaa912d589cf0</td></tr><tr><td><code>shopify.cancel_order</code></td><td>yes</td><td>policies</td><td>—</td></tr><tr><td><code>stripe.create_refund</code></td><td>no</td><td>—</td><td>fp_dac8011e14c53777</td></tr></tbody></table><h3>Gap findings</h3><ul><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (critical): stripe.create_refund lacks idempotency evidence</li><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (high): gmail.send_customer_email lacks idempotency evidence</li></ul><h2>§6 Scope coverage — <span 
class="status-missing">missing</span></h2><h3>Declared scopes</h3><ul><li><code>zendesk:tickets:read</code></li><li><code>zendesk:tickets:write</code></li><li><code>stripe:*</code></li></ul><table><thead><tr><th>Scope</th><th>Declared</th><th>Used by tools</th></tr></thead><tbody><tr><td><code>gmail:send</code></td><td>no</td><td><code>gmail.send_customer_email</code></td></tr><tr><td><code>shopify:orders:write</code></td><td>no</td><td><code>shopify.cancel_order</code></td></tr><tr><td><code>stripe:*</code></td><td>yes</td><td>—</td></tr><tr><td><code>stripe:refunds:write</code></td><td>yes</td><td><code>stripe.create_refund</code></td></tr><tr><td><code>support:kb:read</code></td><td>no</td><td><code>support.search_kb</code></td></tr><tr><td><code>zendesk:tickets:read</code></td><td>yes</td><td>—</td></tr><tr><td><code>zendesk:tickets:write</code></td><td>yes</td><td><code>zendesk.update_ticket</code></td></tr></tbody></table><h3>Unused declared scopes</h3><ul><li><code>zendesk:tickets:read</code></li></ul><h3>Used by tools but not declared</h3><ul><li><code>gmail:send</code></li><li><code>shopify:orders:write</code></li><li><code>support:kb:read</code></li></ul><h3>Gap findings</h3><ul><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): shopify.cancel_order requires scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): support.search_kb requires scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): gmail.send_customer_email requires scopes not declared in the manifest</li><li><code>SHIP-MANIFEST-UNUSED-SCOPE</code> (medium): Manifest declares unused permission scope zendesk:tickets:read</li></ul><h2>§7 Memory isolation — <span class="status-not_declared">not declared</span></h2><p>Manifest does not declare a memory isolation policy. The current manifest schema (v0.1) has no agent.memory field. 
See §10 for the residual review item.</p><h2>§8 Human-in-the-loop evidence — <span class="status-covered">covered</span></h2><ul><li>Configured: yes</li><li>Human review recommended: yes</li><li>Provenance mode: <code>fresh_scan</code></li><li>HITL evidence is local review evidence only. Missing local evidence does not prove a runtime control is absent, and present local evidence does not certify runtime enforcement.</li></ul><h3>Approval-required tools</h3><ul><li><code>shopify.cancel_order</code></li></ul><h3>Confirmation-required tools</h3><ul><li><code>shopify.cancel_order</code></li></ul><h2>§9 Required dynamic scenarios — <span class="status-partial">partial</span></h2><ul><li><strong>Manual review for SHIP-AUTH-MANIFEST-BROAD-SCOPE</strong> — Replace broad manifest permission scopes with the narrowest scopes needed for this release.<br><span class="meta">Related finding(s): fp_d27325cbdbbf5483</span></li><li><strong>Manual review for SHIP-AUTH-SCOPE-COVERAGE-MISSING</strong> — Add the required scopes for shopify.cancel_order to permissions.scopes or narrow the tool&#x27;s declared auth requirements.<br><span class="meta">Related finding(s): fp_1f6cfd6b7daa9b7c, fp_83852fbd6b440524, fp_d8e6d1865dae97cc</span></li><li><strong>Manual review for SHIP-INVENTORY-WILDCARD-TOOLS</strong> — Replace wildcard tool exposure with an explicit tool allowlist before release review.<br><span class="meta">Related finding(s): fp_fc02d8ecd30f2578</span></li><li><strong>Manual review for SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING</strong> — Declare an owner for each high-risk production tool in risk_overrides.tools.<br><span class="meta">Related finding(s): fp_fd2577850cef1f87</span></li><li><strong>Manual review for SHIP-MANIFEST-UNUSED-SCOPE</strong> — Remove unused manifest scopes or add tool metadata showing why they are required.<br><span class="meta">Related finding(s): fp_39b9ae878f343d1b</span></li><li><strong>Manual review for SHIP-POLICY-APPROVAL-MISSING</strong> — Declare 
an approval policy for stripe.create_refund or remove this tool from the release.<br><span class="meta">Related finding(s): fp_f092940f62fbb012</span></li><li><strong>Manual review for SHIP-POLICY-CONFIRMATION-MISSING</strong> — Declare a user confirmation policy for stripe.create_refund or remove this action from the release.<br><span class="meta">Related finding(s): fp_8e08a4fe6b0917f6, fp_a62ca2fd9a68a1d1</span></li><li><strong>Manual review for SHIP-SCHEMA-BROAD-FREE-TEXT</strong> — Constrain zendesk.update_ticket.updates with an enum, structured schema, or narrower field-specific parameters.<br><span class="meta">Related finding(s): fp_acd63b899d49aa1c, fp_ff2f028953d1c220</span></li><li><strong>Manual review for SHIP-SCHEMA-FREEFORM-OUTPUT</strong> — Prefer a structured output schema for send_email_preview, especially when output is later passed back into model context.<br><span class="meta">Related finding(s): fp_85f8513ad72cd9ea</span></li><li><strong>Manual review for SHIP-SCHEMA-MISSING-BOUNDS</strong> — Add a maximum bound to stripe.create_refund.amount or document an equivalent limit in the tool policy.<br><span class="meta">Related finding(s): fp_ab60b01cb53cfcbe</span></li><li><strong>Manual review for SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</strong> — Remove stripe.create_refund, narrow its policy, or revise prohibited_actions so the manifest and tool surface do not contradict each other.<br><span class="meta">Related finding(s): fp_12985c36a06026de, fp_e090c62e390e70ab</span></li><li><strong>Manual review for SHIP-SIDEFX-IDEMPOTENCY-MISSING</strong> — Add an idempotency key, idempotent annotation, or declared idempotency policy for stripe.create_refund.<br><span class="meta">Related finding(s): fp_0f8aaa912d589cf0, fp_dac8011e14c53777</span></li><li><strong>Re-run scan after resolving source warnings</strong> — Source loaders emitted warnings; some tool surfaces may have been parsed with reduced confidence.</li><li><strong>Verify low-confidence tool 
extractions</strong> — One or more tools were extracted with low confidence; confirm against the upstream source before release.</li></ul><h2>§10 What this packet did NOT prove</h2><p>Agents Shipgate is an advisory tool: the deterministic merge gate for AI-generated agent capability changes, run as a local-first, static Tool-Use Readiness review. The packet below is derived from a scan; it does not, by itself, prove the following properties:</p><ul><li><strong>Prompt robustness.</strong> Whether the agent&#x27;s prompt holds up under jailbreaks, persona drift, indirect prompt injection, or adversarial inputs.</li><li><strong>Runtime behavior.</strong> Whether the agent actually invokes only the declared tools, respects approval gates at runtime, or follows policy under load. Static config is not runtime evidence.</li><li><strong>Model correctness.</strong> Whether the underlying model produces correct outputs, calls the right tools, or stays within the declared scope. The packet does not benchmark the model.</li><li><strong>Adversarial resistance.</strong> Whether the agent withstands red-team or penetration testing. The packet does not run scenarios; it organizes evidence.</li></ul><h3>Per-run residuals</h3><ul><li>Source warnings:<ul><li>MCP source declares wildcard tool exposure</li></ul></li><li>Low-confidence tool extractions: none</li><li>Suppressed findings in effect: none</li><li>Memory isolation is not modeled by the v0.1 manifest schema; no static evidence is available.</li><li>6 active finding(s) came from heuristic provenance (keyword_heuristic=6, regex_heuristic=0); review the finding evidence before acting.</li></ul></body></html>
+</style></head><body><h1>Release Evidence Packet</h1><p class="meta">Project: <strong>support-refund-agent</strong> · Agent: <strong>refund-assistant</strong> · Environment: <strong>production_like</strong><br>Run id: <code>agents_shipgate_ebb71d7248235cc3</code> · Generated at: 2026-01-01T00:00:00+00:00 · Packet schema: 0.6</p><p>This packet is a reviewer-shaped synthesis of a static Agents Shipgate scan. See §10 for what the packet does <em>not</em> prove.</p><h2>§1 Release decision — <span class="verdict verdict-blocked">BLOCKED</span></h2><ul><li>Decision: <code>blocked</code></li><li>Reason: 2 active findings block release.</li><li>Blockers: 2</li><li>Review items: 16</li></ul><h3>CI gate behavior (informational)</h3><ul><li>ci_mode: <code>advisory</code>, would_fail_ci: <code>false</code>, exit code: <code>0</code></li><li>Note: CI behavior is metadata about the run gate, not the verdict. The verdict above derives from <code>release_decision.decision</code>.</li></ul><h3>Blockers</h3><ul><li><code>SHIP-POLICY-APPROVAL-MISSING</code> (critical): stripe.create_refund lacks a declared approval policy</li><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (critical): stripe.create_refund lacks idempotency evidence</li></ul><h3>Review items</h3><ul><li><code>SHIP-INVENTORY-WILDCARD-TOOLS</code> (high): Wildcard tool exposure declared</li><li><code>SHIP-SCHEMA-MISSING-BOUNDS</code> (high): stripe.create_refund.amount has no maximum bound</li><li><code>SHIP-SCHEMA-BROAD-FREE-TEXT</code> (high): zendesk.update_ticket accepts broad free-form action input</li><li><code>SHIP-SCHEMA-BROAD-FREE-TEXT</code> (high): gmail.send_customer_email accepts broad free-form action input</li><li><code>SHIP-SCHEMA-FREEFORM-OUTPUT</code> (medium): send_email_preview returns free-form text output</li><li><code>SHIP-AUTH-MANIFEST-BROAD-SCOPE</code> (high): Manifest declares broad permission scopes</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): shopify.cancel_order requires 
scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): support.search_kb requires scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): gmail.send_customer_email requires scopes not declared in the manifest</li><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code> (high): stripe.create_refund appears to overlap with a prohibited action</li><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code> (high): gmail.send_customer_email appears to overlap with a prohibited action</li><li><code>SHIP-POLICY-CONFIRMATION-MISSING</code> (high): stripe.create_refund lacks a declared confirmation policy</li><li><code>SHIP-POLICY-CONFIRMATION-MISSING</code> (high): gmail.send_customer_email lacks a declared confirmation policy</li><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (high): gmail.send_customer_email lacks idempotency evidence</li><li><code>SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING</code> (high): shopify.cancel_order is high-risk but has no owner</li><li><code>SHIP-MANIFEST-UNUSED-SCOPE</code> (medium): Manifest declares unused permission scope zendesk:tickets:read</li></ul><h2>§1A Evidence matrix — compact review summary</h2><ul><li>Evidence Matrix Light is derived from public report.json only. Release decisions, CI exit behavior, and baseline semantics remain owned by release_decision. 
Domain rows intentionally overlap; a single finding can appear in multiple rows when it is relevant to each review lens.</li></ul><table><thead><tr><th>Domain</th><th>Evidence present</th><th>Evidence source</th><th>Confidence</th><th>Missing controls</th><th>Blocking findings</th><th>Review items</th></tr></thead><tbody><tr><td>Inventory</td><td>partial</td><td>tool_inventory; tool_surface; +2 more</td><td>high</td><td>SHIP-INVENTORY-WILDCARD-TOOLS on wildcard_mcp_tools.*: Wildcard tool exposure declared</td><td>—</td><td>SHIP-INVENTORY-WILDCARD-TOOLS (high)</td></tr><tr><td>Schema</td><td>partial</td><td>tool_surface_facts.tools[].hashes; findings[]</td><td>mixed</td><td>SHIP-SCHEMA-MISSING-BOUNDS on stripe.create_refund: stripe.create_refund.amount has no maximum bound; SHIP-SCHEMA-BROAD-FREE-TEXT on zendesk.update_ticket: zendesk.update_ticket accepts broad free-form action input; +2 more</td><td>—</td><td>SHIP-SCHEMA-MISSING-BOUNDS (high); SHIP-SCHEMA-BROAD-FREE-TEXT (high); +2 more</td></tr><tr><td>Auth</td><td>partial</td><td>tool_surface_facts.scopes; tool_inventory[].auth_scopes; +1 more</td><td>mixed</td><td>SHIP-AUTH-MANIFEST-BROAD-SCOPE: Manifest declares broad permission scopes; SHIP-AUTH-SCOPE-COVERAGE-MISSING on shopify.cancel_order: shopify.cancel_order requires scopes not declared in the manifest; +3 more</td><td>—</td><td>SHIP-AUTH-MANIFEST-BROAD-SCOPE (high); SHIP-AUTH-SCOPE-COVERAGE-MISSING (high); +3 more</td></tr><tr><td>Approval</td><td>partial</td><td>tool_surface_facts.controls[kind=approval_policy]; findings[]</td><td>high</td><td>SHIP-POLICY-APPROVAL-MISSING on stripe.create_refund: stripe.create_refund lacks a declared approval policy</td><td>SHIP-POLICY-APPROVAL-MISSING (critical)</td><td>—</td></tr><tr><td>Confirmation</td><td>partial</td><td>tool_surface_facts.controls[kind=confirmation_policy]; findings[]</td><td>high</td><td>SHIP-POLICY-CONFIRMATION-MISSING on stripe.create_refund: stripe.create_refund lacks a declared confirmation 
policy; SHIP-POLICY-CONFIRMATION-MISSING on gmail.send_customer_email: gmail.send_customer_email lacks a declared confirmation policy</td><td>—</td><td>SHIP-POLICY-CONFIRMATION-MISSING (high); SHIP-POLICY-CONFIRMATION-MISSING (high)</td></tr><tr><td>Idempotency</td><td>partial</td><td>tool_surface_facts.controls[kind=idempotency_evidence]; action_surface_facts.actions[].safeguards.idempotency; +1 more</td><td>mixed</td><td>SHIP-SIDEFX-IDEMPOTENCY-MISSING on stripe.create_refund: stripe.create_refund lacks idempotency evidence; SHIP-SIDEFX-IDEMPOTENCY-MISSING on gmail.send_customer_email: gmail.send_customer_email lacks idempotency evidence</td><td>SHIP-SIDEFX-IDEMPOTENCY-MISSING (critical)</td><td>SHIP-SIDEFX-IDEMPOTENCY-MISSING (high)</td></tr><tr><td>Side effects</td><td>partial</td><td>tool_inventory[].risk_tags; action_surface_facts.actions[].effect; +1 more</td><td>mixed</td><td>SHIP-SCHEMA-BROAD-FREE-TEXT on zendesk.update_ticket: zendesk.update_ticket accepts broad free-form action input; SHIP-SCHEMA-BROAD-FREE-TEXT on gmail.send_customer_email: gmail.send_customer_email accepts broad free-form action input; +5 more</td><td>SHIP-POLICY-APPROVAL-MISSING (critical); SHIP-SIDEFX-IDEMPOTENCY-MISSING (critical)</td><td>SHIP-SCHEMA-BROAD-FREE-TEXT (high); SHIP-SCHEMA-BROAD-FREE-TEXT (high); +3 more</td></tr><tr><td>Memory isolation</td><td>not_declared</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Human-in-the-loop evidence</td><td>not_declared</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Prompt/scope alignment</td><td>partial</td><td>declared_intentions; misalignments; +2 more</td><td>medium</td><td>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT on stripe.create_refund: stripe.create_refund appears to overlap with a prohibited action; SHIP-SCOPE-PROHIBITED-TOOL-PRESENT on gmail.send_customer_email: gmail.send_customer_email appears to overlap with a prohibited action</td><td>—</td><td>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT 
(high); SHIP-SCOPE-PROHIBITED-TOOL-PRESENT (high)</td></tr><tr><td>Retry/timeout</td><td>not_declared</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Baseline debt</td><td>informational</td><td>—</td><td>unknown</td><td>—</td><td>—</td><td>—</td></tr><tr><td>Action-surface policy</td><td>covered</td><td>action_surface_facts.actions</td><td>medium</td><td>—</td><td>—</td><td>—</td></tr></tbody></table><h2>§2 Capability ↔ Intent diff — <span class="status-missing">missing</span></h2><h3>Declared</h3><ul><li>Purpose: answer refund policy questions</li><li>Purpose: prepare refund requests for human review</li><li>Purpose: update support ticket notes</li><li>Prohibited: issue refund without approval</li><li>Prohibited: cancel order without explicit confirmation</li><li>Prohibited: send external email without preview</li></ul><h3>Observed tools</h3><ul><li><code>gmail.send_customer_email</code></li><li><code>refund_status_lookup</code></li><li><code>send_email_preview</code></li><li><code>shopify.cancel_order</code></li><li><code>stripe.create_refund</code></li><li><code>support.search_kb</code></li><li><code>wildcard_mcp_tools.*</code></li><li><code>zendesk.update_ticket</code></li></ul><h3>Divergences</h3><ul><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code>: stripe.create_refund appears to overlap with a prohibited action</li><li><code>SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</code>: gmail.send_customer_email appears to overlap with a prohibited action</li></ul><h2>§3 High-risk tool surface — <span class="status-partial">partial</span></h2><p class="meta">Total tools: 8 · High-risk: 3</p><table><thead><tr><th>Tool</th><th>Source</th><th>Risk tags</th><th>Approval</th><th>Idempotency</th></tr></thead><tbody><tr><td><code>gmail.send_customer_email</code></td><td>mcp</td><td>customer_communication, external_write</td><td>no</td><td>no</td></tr><tr><td><code>shopify.cancel_order</code></td><td>openapi</td><td>destructive, 
write</td><td>yes</td><td>yes</td></tr><tr><td><code>stripe.create_refund</code></td><td>openapi</td><td>external_write, financial_action, write</td><td>no</td><td>no</td></tr></tbody></table><h2>§3A Tool-surface diff — <span class="status-not_declared">not declared</span></h2><p>Status: disabled — No --diff-from report or v0.3 baseline snapshot was provided.<br>Base: <code>none</code></p><h2>§3B Action-surface diff — <span class="status-not_declared">not declared</span></h2><p>Status: disabled — No action-surface comparison source was provided.<br>Base: <code>none</code></p><h2>§4 Approval policy coverage — <span class="status-partial">partial</span></h2><table><thead><tr><th>Tool</th><th>Declared</th><th>Source</th><th>Gap finding(s)</th></tr></thead><tbody><tr><td><code>shopify.cancel_order</code></td><td>yes</td><td>policies</td><td>—</td></tr><tr><td><code>stripe.create_refund</code></td><td>no</td><td>—</td><td>fp_f092940f62fbb012</td></tr></tbody></table><h3>Gap findings</h3><ul><li><code>SHIP-POLICY-APPROVAL-MISSING</code> (critical): stripe.create_refund lacks a declared approval policy</li></ul><h2>§5 Idempotency / retry risk — <span class="status-partial">partial</span></h2><p>Retry policy: <strong>not declared</strong></p><table><thead><tr><th>Tool</th><th>Declared</th><th>Source</th><th>Gap finding(s)</th></tr></thead><tbody><tr><td><code>gmail.send_customer_email</code></td><td>no</td><td>—</td><td>fp_0f8aaa912d589cf0</td></tr><tr><td><code>shopify.cancel_order</code></td><td>yes</td><td>policies</td><td>—</td></tr><tr><td><code>stripe.create_refund</code></td><td>no</td><td>—</td><td>fp_dac8011e14c53777</td></tr></tbody></table><h3>Gap findings</h3><ul><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (critical): stripe.create_refund lacks idempotency evidence</li><li><code>SHIP-SIDEFX-IDEMPOTENCY-MISSING</code> (high): gmail.send_customer_email lacks idempotency evidence</li></ul><h2>§6 Scope coverage — <span 
class="status-missing">missing</span></h2><h3>Declared scopes</h3><ul><li><code>zendesk:tickets:read</code></li><li><code>zendesk:tickets:write</code></li><li><code>stripe:*</code></li></ul><table><thead><tr><th>Scope</th><th>Declared</th><th>Used by tools</th></tr></thead><tbody><tr><td><code>gmail:send</code></td><td>no</td><td><code>gmail.send_customer_email</code></td></tr><tr><td><code>shopify:orders:write</code></td><td>no</td><td><code>shopify.cancel_order</code></td></tr><tr><td><code>stripe:*</code></td><td>yes</td><td>—</td></tr><tr><td><code>stripe:refunds:write</code></td><td>yes</td><td><code>stripe.create_refund</code></td></tr><tr><td><code>support:kb:read</code></td><td>no</td><td><code>support.search_kb</code></td></tr><tr><td><code>zendesk:tickets:read</code></td><td>yes</td><td>—</td></tr><tr><td><code>zendesk:tickets:write</code></td><td>yes</td><td><code>zendesk.update_ticket</code></td></tr></tbody></table><h3>Unused declared scopes</h3><ul><li><code>zendesk:tickets:read</code></li></ul><h3>Used by tools but not declared</h3><ul><li><code>gmail:send</code></li><li><code>shopify:orders:write</code></li><li><code>support:kb:read</code></li></ul><h3>Gap findings</h3><ul><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): shopify.cancel_order requires scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): support.search_kb requires scopes not declared in the manifest</li><li><code>SHIP-AUTH-SCOPE-COVERAGE-MISSING</code> (high): gmail.send_customer_email requires scopes not declared in the manifest</li><li><code>SHIP-MANIFEST-UNUSED-SCOPE</code> (medium): Manifest declares unused permission scope zendesk:tickets:read</li></ul><h2>§7 Memory isolation — <span class="status-not_declared">not declared</span></h2><p>Manifest does not declare a memory isolation policy. The current manifest schema (v0.1) has no agent.memory field. 
See §10 for the residual review item.</p><h2>§8 Human-in-the-loop evidence — <span class="status-covered">covered</span></h2><ul><li>Configured: yes</li><li>Human review recommended: yes</li><li>Provenance mode: <code>fresh_scan</code></li><li>HITL evidence is local review evidence only. Missing local evidence does not prove a runtime control is absent, and present local evidence does not certify runtime enforcement.</li></ul><h3>Approval-required tools</h3><ul><li><code>shopify.cancel_order</code></li></ul><h3>Confirmation-required tools</h3><ul><li><code>shopify.cancel_order</code></li></ul><h2>§9 Required dynamic scenarios — <span class="status-partial">partial</span></h2><ul><li><strong>Manual review for SHIP-AUTH-MANIFEST-BROAD-SCOPE</strong> — Replace broad manifest permission scopes with the narrowest scopes needed for this release.<br><span class="meta">Related finding(s): fp_d27325cbdbbf5483</span></li><li><strong>Manual review for SHIP-AUTH-SCOPE-COVERAGE-MISSING</strong> — Add the required scopes for shopify.cancel_order to permissions.scopes or narrow the tool&#x27;s declared auth requirements.<br><span class="meta">Related finding(s): fp_1f6cfd6b7daa9b7c, fp_83852fbd6b440524, fp_d8e6d1865dae97cc</span></li><li><strong>Manual review for SHIP-INVENTORY-WILDCARD-TOOLS</strong> — Replace wildcard tool exposure with an explicit tool allowlist before release review.<br><span class="meta">Related finding(s): fp_fc02d8ecd30f2578</span></li><li><strong>Manual review for SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING</strong> — Declare an owner for each high-risk production tool in risk_overrides.tools.<br><span class="meta">Related finding(s): fp_fd2577850cef1f87</span></li><li><strong>Manual review for SHIP-MANIFEST-UNUSED-SCOPE</strong> — Remove unused manifest scopes or add tool metadata showing why they are required.<br><span class="meta">Related finding(s): fp_39b9ae878f343d1b</span></li><li><strong>Manual review for SHIP-POLICY-APPROVAL-MISSING</strong> — Declare 
an approval policy for stripe.create_refund or remove this tool from the release.<br><span class="meta">Related finding(s): fp_f092940f62fbb012</span></li><li><strong>Manual review for SHIP-POLICY-CONFIRMATION-MISSING</strong> — Declare a user confirmation policy for stripe.create_refund or remove this action from the release.<br><span class="meta">Related finding(s): fp_8e08a4fe6b0917f6, fp_a62ca2fd9a68a1d1</span></li><li><strong>Manual review for SHIP-SCHEMA-BROAD-FREE-TEXT</strong> — Constrain zendesk.update_ticket.updates with an enum, structured schema, or narrower field-specific parameters.<br><span class="meta">Related finding(s): fp_acd63b899d49aa1c, fp_ff2f028953d1c220</span></li><li><strong>Manual review for SHIP-SCHEMA-FREEFORM-OUTPUT</strong> — Prefer a structured output schema for send_email_preview, especially when output is later passed back into model context.<br><span class="meta">Related finding(s): fp_85f8513ad72cd9ea</span></li><li><strong>Manual review for SHIP-SCHEMA-MISSING-BOUNDS</strong> — Add a maximum bound to stripe.create_refund.amount or document an equivalent limit in the tool policy.<br><span class="meta">Related finding(s): fp_ab60b01cb53cfcbe</span></li><li><strong>Manual review for SHIP-SCOPE-PROHIBITED-TOOL-PRESENT</strong> — Remove stripe.create_refund, narrow its policy, or revise prohibited_actions so the manifest and tool surface do not contradict each other.<br><span class="meta">Related finding(s): fp_12985c36a06026de, fp_e090c62e390e70ab</span></li><li><strong>Manual review for SHIP-SIDEFX-IDEMPOTENCY-MISSING</strong> — Add an idempotency key, idempotent annotation, or declared idempotency policy for stripe.create_refund.<br><span class="meta">Related finding(s): fp_0f8aaa912d589cf0, fp_dac8011e14c53777</span></li><li><strong>Re-run scan after resolving source warnings</strong> — Source loaders emitted warnings; some tool surfaces may have been parsed with reduced confidence.</li><li><strong>Verify low-confidence tool 
extractions</strong> — One or more tools were extracted with low confidence; confirm against the upstream source before release.</li></ul><h2>§10 What this packet did NOT prove</h2><p>Agents Shipgate is an advisory tool: the deterministic merge gate for AI-generated agent capability changes, run as a local-first, static Tool-Use Readiness review. The packet below is derived from a scan; it does not, by itself, prove the following properties:</p><ul><li><strong>Prompt robustness.</strong> Whether the agent&#x27;s prompt holds up under jailbreaks, persona drift, indirect prompt injection, or adversarial inputs.</li><li><strong>Runtime behavior.</strong> Whether the agent actually invokes only the declared tools, respects approval gates at runtime, or follows policy under load. Static config is not runtime evidence.</li><li><strong>Model correctness.</strong> Whether the underlying model produces correct outputs, calls the right tools, or stays within the declared scope. The packet does not benchmark the model.</li><li><strong>Adversarial resistance.</strong> Whether the agent withstands red-team or penetration testing. The packet does not run scenarios; it organizes evidence.</li></ul><h3>Per-run residuals</h3><ul><li>Source warnings:<ul><li>MCP source declares wildcard tool exposure</li></ul></li><li>Low-confidence tool extractions: <code>send_email_preview</code></li><li>Suppressed findings in effect: none</li><li>Memory isolation is not modeled by the v0.1 manifest schema; no static evidence is available.</li><li>6 active finding(s) came from heuristic provenance (keyword_heuristic=6, regex_heuristic=0); review the finding evidence before acting.</li></ul></body></html>
diff --git a/samples/support_refund_agent/expected/packet.json b/samples/support_refund_agent/expected/packet.json
index a9557958..19c8cd70 100644
--- a/samples/support_refund_agent/expected/packet.json
+++ b/samples/support_refund_agent/expected/packet.json
@@ -1293,7 +1293,9 @@
       "6 active finding(s) came from heuristic provenance (keyword_heuristic=6, regex_heuristic=0); review the finding evidence before acting."
     ],
     "headline": "Agents Shipgate is an advisory tool: the deterministic merge gate for AI-generated agent capability changes, run as a local-first, static Tool-Use Readiness review. The packet below is derived from a scan; it does not, by itself, prove the following properties:",
-    "low_confidence_tools": [],
+    "low_confidence_tools": [
+      "send_email_preview"
+    ],
     "source_warnings": [
       "MCP source declares wildcard tool exposure"
     ],
diff --git a/samples/support_refund_agent/expected/packet.md b/samples/support_refund_agent/expected/packet.md
index 07665195..aeedcf6f 100644
--- a/samples/support_refund_agent/expected/packet.md
+++ b/samples/support_refund_agent/expected/packet.md
@@ -234,7 +234,7 @@ Agents Shipgate is an advisory tool: the deterministic merge gate for AI-generat
 
 - Source warnings:
   - MCP source declares wildcard tool exposure
-- Low-confidence tool extractions: none
+- Low-confidence tool extractions: `send\_email\_preview`
 - Suppressed findings in effect: none
 - Memory isolation is not modeled by the v0.1 manifest schema; no static evidence is available.
 - 6 active finding\(s\) came from heuristic provenance \(keyword\_heuristic=6, regex\_heuristic=0\); review the finding evidence before acting.
diff --git a/src/agents_shipgate/packet/builder.py b/src/agents_shipgate/packet/builder.py
index c52712f0..37215a9a 100644
--- a/src/agents_shipgate/packet/builder.py
+++ b/src/agents_shipgate/packet/builder.py
@@ -1080,7 +1080,7 @@ def _build_not_proven(
 ) -> NotProvenSection:
     suppressed_ids = sorted(f.id for f in findings if f.suppressed and f.id)
     low_confidence_tools = sorted(
-        tool.name for tool in tools if tool.extraction_confidence == "low"
+        tool.name for tool in tools if tool.extraction_confidence != "high"
     )
     additional = [
         "Memory isolation is not modeled by the v0.1 manifest schema; "
diff --git a/tests/test_evidence_packet.py b/tests/test_evidence_packet.py
index 9a6113c3..9f820075 100644
--- a/tests/test_evidence_packet.py
+++ b/tests/test_evidence_packet.py
@@ -21,9 +21,11 @@
 import pytest
 from typer.testing import CliRunner
 
+from agents_shipgate.ci.release_decision import build_release_decision
 from agents_shipgate.cli.main import app
 from agents_shipgate.cli.scan import run_scan
 from agents_shipgate.core.disclaimers import HITL_RUNTIME_CONTROL_DISCLAIMER
+from agents_shipgate.core.domain import Tool
 from agents_shipgate.packet import (
     EvidencePacket,
     PacketSchemaError,
@@ -38,7 +40,12 @@
     PACKET_NON_PROOF_HEADLINE,
 )
 from agents_shipgate.packet.evidence_matrix import build_evidence_matrix
-from agents_shipgate.schemas.report import Finding
+from agents_shipgate.schemas.report import (
+    Finding,
+    ReadinessReport,
+    ReportSummary,
+    ToolSurfaceSummary,
+)
 
 SAMPLE_CONFIG = Path("samples/support_refund_agent/shipgate.yaml")
 EXPECTED_DIR = Path("samples/support_refund_agent/expected")
@@ -49,6 +56,78 @@
 GENERATED_AT = "2026-01-01T00:00:00+00:00"
 
 
+def _minimal_packet_with_not_proven(
+    section,
+    *,
+    low_confidence_tool_count: int = 0,
+) -> EvidencePacket:
+    from agents_shipgate.schemas.packet import (
+        ApprovalCoverageSection,
+        CapabilityIntentDiff,
+        DynamicScenariosSection,
+        HighRiskSurfaceSection,
+        HumanInTheLoopEvidence,
+        IdempotencyRiskSection,
+        MemoryIsolationStatus,
+        ReleaseDecisionSection,
+        ScopeCoverageSection,
+    )
+    from agents_shipgate.schemas.report import (
+        BaselineDelta,
+        EvidenceCoverageDecision,
+        FailPolicy,
+    )
+
+    decision = ReleaseDecisionSection(
+        decision="insufficient_evidence" if low_confidence_tool_count else "passed",
+        verdict="INSUFFICIENT EVIDENCE" if low_confidence_tool_count else "PASSED",
+        reason="Evidence coverage below threshold.",
+        evidence_coverage=EvidenceCoverageDecision(
+            level="static",
+            human_review_recommended=low_confidence_tool_count > 0,
+            source_warning_count=0,
+            low_confidence_tool_count=low_confidence_tool_count,
+        ),
+        baseline_delta=BaselineDelta(enabled=False),
+        fail_policy=FailPolicy(
+            ci_mode="advisory",
+            fail_on=[],
+            new_findings_only=False,
+            would_fail_ci=False,
+            exit_code=0,
+        ),
+    )
+    return EvidencePacket(
+        generated_at=GENERATED_AT,
+        run_id="r",
+        project={"name": "p"},
+        agent={"name": "a"},
+        environment={"target": "local"},
+        release_decision=decision,
+        capability_intent=CapabilityIntentDiff(
+            status="not_declared",
+            declared_purpose=[],
+            prohibited_actions=[],
+            observed_tools=[],
+            rows=[],
+            divergence_findings=[],
+        ),
+        high_risk_surface=HighRiskSurfaceSection(
+            status="informational",
+            total_tools=0,
+            high_risk_count=0,
+            tools=[],
+        ),
+        approval_coverage=ApprovalCoverageSection(status="informational"),
+        idempotency_risk=IdempotencyRiskSection(status="informational"),
+        scope_coverage=ScopeCoverageSection(status="informational"),
+        memory_isolation=MemoryIsolationStatus(),
+        human_in_the_loop=HumanInTheLoopEvidence(status="not_declared"),
+        dynamic_scenarios=DynamicScenariosSection(status="informational"),
+        not_proven=section,
+    )
+
+
 def _scan_with_packet(tmp_path: Path) -> tuple[Path, EvidencePacket]:
     """Run scan against the support_refund_agent fixture and return
     ``(out_dir, parsed_packet)``."""
@@ -182,6 +261,81 @@ def test_not_proven_residuals_include_non_static_provenance():
     assert "external policy packs" in residuals
 
 
+def test_not_proven_low_confidence_residuals_match_release_decision_count():
+    tools = [
+        Tool(
+            id="high",
+            name="high_confidence_inventory",
+            source_type="mcp",
+            extraction_confidence="high",
+        ),
+        Tool(
+            id="medium",
+            name="medium_confidence_sdk",
+            source_type="sdk_function",
+            extraction_confidence="medium",
+        ),
+        Tool(
+            id="low",
+            name="low_confidence_sdk",
+            source_type="sdk_function",
+            extraction_confidence="low",
+        ),
+    ]
+
+    section = _build_not_proven([], source_warnings=[], tools=tools)
+
+    assert section.low_confidence_tools == [
+        "low_confidence_sdk",
+        "medium_confidence_sdk",
+    ]
+    report = ReadinessReport(
+        run_id="r",
+        project={"name": "p"},
+        agent={"name": "a"},
+        environment={"target": "local"},
+        summary=ReportSummary(
+            status="human_review_recommended",
+            critical_count=0,
+            high_count=0,
+            medium_count=0,
+            human_review_recommended=True,
+            evidence_coverage="static",
+        ),
+        tool_surface=ToolSurfaceSummary(
+            total_tools=len(tools),
+            high_risk_tools=0,
+        ),
+        findings=[],
+        source_warnings=[],
+    )
+    decision = build_release_decision(
+        report=report,
+        tools=tools,
+        ci_mode="advisory",
+        fail_on=None,
+        new_findings_only=False,
+    )
+
+    assert decision.evidence_coverage.low_confidence_tool_count == len(
+        section.low_confidence_tools
+    )
+
+    packet = _minimal_packet_with_not_proven(
+        section,
+        low_confidence_tool_count=decision.evidence_coverage.low_confidence_tool_count,
+    )
+    md = render_packet_markdown(packet)
+    html = render_packet_html(packet)
+
+    assert "Low-confidence tool extractions: none" not in md
+    assert "Low-confidence tool extractions: none" not in html
+    assert "`medium\\_confidence\\_sdk`" in md
+    assert "<code>medium_confidence_sdk</code>" in html
+    assert "high_confidence_inventory" not in md
+    assert "high_confidence_inventory" not in html
+
+
 def test_evidence_matrix_uses_release_decision_only_for_blocking_and_review():
     payload = {
         "release_decision": {

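The invariant this patch restores can be shown in isolation: the §10 residual list and the release decision's `low_confidence_tool_count` have to agree. This is a minimal sketch, not the `agents_shipgate` implementation. `ToolStub`, `low_confidence_names`, and `render_residual_line` are hypothetical stand-ins. The sketch assumes, as the new test asserts, that the release decision counts every tool whose `extraction_confidence` is not `"high"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStub:
    # Hypothetical stand-in for agents_shipgate.core.domain.Tool; only the
    # two fields this sketch needs are modeled.
    name: str
    extraction_confidence: str  # e.g. "high", "medium", "low"


def low_confidence_names(tools: list[ToolStub]) -> list[str]:
    # Mirrors the patched filter in _build_not_proven: any tool that is not
    # explicitly "high" confidence is reported as a residual.
    return sorted(t.name for t in tools if t.extraction_confidence != "high")


def render_residual_line(names: list[str]) -> str:
    # Hypothetical renderer for the "Per-run residuals" bullet. The real
    # Markdown renderer additionally escapes underscores.
    if not names:
        return "Low-confidence tool extractions: none"
    return "Low-confidence tool extractions: " + ", ".join(f"`{n}`" for n in names)


tools = [
    ToolStub("high_confidence_inventory", "high"),
    ToolStub("medium_confidence_sdk", "medium"),
    ToolStub("low_confidence_sdk", "low"),
]

names = low_confidence_names(tools)
# Assumed release-decision rule (per the new test): count every non-"high" tool.
release_decision_count = sum(1 for t in tools if t.extraction_confidence != "high")
assert len(names) == release_decision_count
print(render_residual_line(names))
```

With the pre-patch filter (`extraction_confidence == "low"`), the same three tools yield only `low_confidence_sdk` while the count stays at 2. That mismatch appears to be why a non-`"high"` tool such as `send_email_preview` in the sample fixture was previously reported as `none` in §10.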