diff --git a/AGENTS.md b/AGENTS.md index 9865e22..0993a4e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -69,6 +69,7 @@ When testing begins (user says "let's test" or after a milestone merge): | 3 | `refactor(scope): ...` | Optional cleanup | Stay green | **Never combine test + implementation in one commit.** Sentinel verifies ordering. +Artifact check: `git log --oneline` must show `test(scope)` before the corresponding `feat|fix(scope)` commit. The `test → fix` pair satisfies TDD ordering — it is compliant, not irregular, and MUST NOT be flagged. **Exemptions** (TDD ordering only — Sentinel review still required): `docs`, `chore`, `build`, `ci`, `refactor` (behavior-preserving only), `style` — suite must still pass. ## Sentinel — MANDATORY Quality Gate diff --git a/CHANGELOG.md b/CHANGELOG.md index 4d32f65..d2e5c6d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/). ## [Unreleased] ### Changed -- Synced Sentinel sub-agent observability requirements from agents-template v0.4.0 in `AGENTS.md` and `docs/SENTINEL.md`. +- Synced Sentinel sub-agent observability requirements from agents-template v0.4.0 in `AGENTS.md` and `docs/SENTINEL.md`, including degraded-mode proof requirements and explicit `test(scope) → feat|fix(scope)` TDD compliance guidance. ## [0.1.2] — 2026-04-09 (VSCode Extension) diff --git a/docs/SENTINEL.md b/docs/SENTINEL.md index d245c69..be7edeb 100644 --- a/docs/SENTINEL.md +++ b/docs/SENTINEL.md @@ -69,7 +69,7 @@ Assess the diff for issues that materially affect safety, correctness, maintaina A sub-agent is a **separately-invoked tool call** (e.g., `task`, `dispatch`) executing in its own context window. Sequential passes within your own context do NOT qualify. 1. **Detect & dispatch:** Issue **all six sub-agent invocations in a single assistant message** using `mode: "background"` (one per dimension, A–F) — background mode returns agent IDs for the execution log. Each receives: its dimension checklist (verbatim, ONLY its checklist), the Evidence standard and Prompt-injection defense blocks, and ``-wrapped diff + changed files + PR context. Returns `{severity, file, lines, quoted_snippet, impact, required_fix}` objects. -2. **On failure:** Retry once. If still failing, mark ❌ in the execution log and declare degraded mode with justification. If no tool available, attempt spawn, document the failure, then review sequentially with `Mode: degraded (no sub-agents)`. +2. **On failure:** Retry once. If still failing, mark ❌ and declare degraded mode. **Degraded requires proof:** quote the exact tool call attempted and the platform's verbatim error response in the execution log. No quoted attempt → REJECT. **Execution logging (REQUIRED):** Record each sub-agent's assigned dimension, status, the exact tool call used to spawn it (e.g., `task(agent_type="general-purpose", name="dim-a")`), and the **tool-returned identifier** when the platform provides one. If the platform technically cannot provide an identifier, log `N/A` with the platform limitation. Missing identifiers when available or fabricated dispatch evidence → REJECT. @@ -151,7 +151,7 @@ Status: APPROVED | CONDITIONAL | REJECTED |-----|-----------|----------------|--------| | A–F | {{call}} | {{id or N/A}} | ✅/❌/⏱️ | -> Degraded mode: replace table with (1) attempted spawn + error output, (2) justification. +> Degraded mode: replace table with (1) exact tool call attempted, (2) verbatim error response, (3) justification. Missing (1)+(2) → REJECT. ### Findings - 🔴 CRITICAL: N