netdata · ktsaou · Jun 14, 2026 · Jun 7, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/.agents/skills/project-delegation/SKILL.md b/.agents/skills/project-delegation/SKILL.md
@@ -7,14 +7,23 @@ description: Orchestration patterns for ai-viewer — when and how to spawn suba
 
 ## The Hard Rule
 
-The master assistant is the **orchestrator**, **QA lead**, **integrator**, and **reviewer**. The master assistant does **not** write production code. Code is produced by spawned subagents working from a written spec and failing tests.
+The master assistant (CTO) is the **orchestrator**, **QA lead**, **integrator**, and the **only role that runs reviewers**. The master assistant does **not** write production code. Code is produced by spawned subagents working from a written spec and failing tests.
 
-This rule exists because:
+### The implementer is `minimax`
+
+Per the Production-Grade Loop (see `AGENTS.md`), the CTO delegates all code production to **`minimax`** — the current stable minimax variant on litellm (default `llm-netdata-cloud/minimax-m3-coder`). The CTO is the only role that knows the project context; the implementer is a fresh-context subagent that receives a self-contained prompt (spec excerpt, failing tests, constraints, deliverable).
+
+- The implementer (`minimax`) is **not** the same instance as the reviewer-minimax pass. Two different invocations, two different contexts, two different jobs. The implementer writes code; the reviewer reads code.
+- If `minimax` is down or degraded, the CTO rotates the implementer role to the next-most-capable member of the reviewer set (default order: `qwen` → `mimo` → `deepseek` → `glm`). The rotated model is **removed from the 5-reviewer cycle** for that PR, so it never reviews its own work; the CTO fills the vacated reviewer slot with a model of the CTO's choosing from the ad-hoc set (`codex`, `gemini`, `claude`, `kimi`). The CTO logs both the rotation and the substitution in the SOW under `## Implementer Rotation`.
+- The CTO pins to the current stable model at the time of work, per the project's "always pin to latest stable" policy. Major-version upgrades require a brief SOW; the CTO applies minor and patch upgrades without one.
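The rotation rule above can be sketched as two shell helpers. This is a minimal illustration, not project tooling: `pick_implementer`, `reviewer_set`, and `ROTATION_ORDER` are hypothetical names, and the sketch assumes the CTO already knows which models are healthy and which ad-hoc substitute it wants.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the implementer rotation rule; not project tooling.
# Assumption: the caller passes the names of currently healthy models.

# minimax implements by default; the rest is the documented fallback order.
ROTATION_ORDER="minimax qwen mimo deepseek glm"

# pick_implementer "<healthy models>": prints the first healthy model in
# ROTATION_ORDER, or returns 1 if none is healthy (surface to the operator).
pick_implementer() {
  local healthy=" $1 " m
  for m in $ROTATION_ORDER; do
    case "$healthy" in
      *" $m "*) echo "$m"; return 0 ;;
    esac
  done
  return 1
}

# reviewer_set <implementer> <ad-hoc substitute>: prints the five reviewers.
# A rotated implementer leaves the review cycle and the substitute takes its
# slot; minimax keeps its separate fresh-context review pass.
reviewer_set() {
  local impl=$1 sub=$2 r out=""
  for r in glm mimo minimax qwen deepseek; do
    if [ "$r" = "$impl" ] && [ "$impl" != minimax ]; then
      out="$out $sub"
    else
      out="$out $r"
    fi
  done
  echo "${out# }"
}
```

For example, with `minimax` down, `pick_implementer "qwen mimo deepseek glm"` prints `qwen`, and `reviewer_set qwen kimi` prints `glm mimo minimax kimi deepseek`. The sketch covers only the implementer rotation; the `minimax` review slot in that set still goes through the unavailable-reviewer rule in `project-second-opinions`.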
+
+### Why this rule exists
 
 - The master assistant's context is finite. Code-writing fills it with raw output that displaces decision history.
 - Subagent output gets independently verified by the master before being trusted. Master-written code skips that verification step.
 - Compaction destroys the master's working memory; subagents start with a fresh, self-contained context every time.
 - Parallel subagents finish faster than serial master-context editing.
+- Splitting the writer and reviewer roles across separate invocations, with most reviewers drawn from model families other than the implementer's, makes review less self-confirming.
 
 If the master assistant ever finds itself about to call `Edit` or `Write` on a production source file, stop and delegate.
 
@@ -186,7 +195,7 @@ Conversely, do not spawn a subagent for a one-line typo fix. Trivial verified ed
 
 ## Cross-References
 
-- Contract: `AGENTS.md` (Delegation Protocol section)
+- Contract: `AGENTS.md` "Production-Grade Loop" section (the single source of truth).
 - Workflow: `.agents/skills/project-workflow/SKILL.md`
 - Coding rules: `.agents/skills/project-coding/SKILL.md`
 - Gates: `.agents/skills/project-quality-gates/SKILL.md`

diff --git a/.agents/skills/project-second-opinions/SKILL.md b/.agents/skills/project-second-opinions/SKILL.md
@@ -1,15 +1,17 @@
 ---
 name: project-second-opinions
-description: Invoke external LLMs (codex, gemini, glm, kimi, mimo, minimax, qwen, deepseek) for code review, SOW review, design validation, and second opinions on ai-viewer work. Use before marking any non-trivial SOW completed and after major architectural changes.
+description: Invoke the 5-reviewer Production-Grade Loop (glm, mimo, minimax, qwen, deepseek) for code review, SOW review, design validation, and second opinions on ai-viewer work. The CTO runs reviewers; the implementer never does. Use before marking any non-trivial SOW completed and after major architectural changes.
 ---
 
-# Second Opinions
+# Second Opinions — the 5-Reviewer Production-Grade Loop
+
+This skill is the runtime enforcement of the **Production-Grade Loop** defined in `AGENTS.md`. The contract lives in `AGENTS.md`; this file is the implementation. If the two ever disagree, `AGENTS.md` wins.
 
 ## When To Run
 
 External second-opinion review is **mandatory** — not "encouraged" — for any non-trivial work. The assistant does not trust itself; review converges before "done" is uttered.
 
-**The orchestrator (master) runs review — exactly once per iteration, on the final integrated state — never the implementation subagent.** Review is the master's QA gate on code it did not author; an implementation subagent running reviewers on its own work both duplicates the master's mandatory round (the master will run it again → 2× the slow, costly review of identical code) and collapses the author/reviewer separation. Because spawned subagents inherit `AGENTS.md` (which mandates this review), the master MUST explicitly forbid reviewers in every implementation delegation prompt — see `project-delegation` skill, the `[FORBIDDEN]` block. If a subagent reports it "ran reviewers," that round does not substitute for the master's: treat the subagent's findings as a useful head start, then run the one official round on the final state and do not re-run beyond convergence.
+**The orchestrator (CTO / master assistant) runs review — exactly once per iteration, on the final integrated state — never the implementation subagent.** Review is the master's QA gate on code it did not author; an implementation subagent running reviewers on its own work both duplicates the master's mandatory round (the master will run it again → 2× the slow, costly review of identical code) and collapses the author/reviewer separation. Because spawned subagents inherit `AGENTS.md` (which mandates this review), the master MUST explicitly forbid reviewers in every implementation delegation prompt — see `project-delegation` skill, the `[FORBIDDEN]` block. If a subagent reports it "ran reviewers," that round does not substitute for the master's: treat the subagent's findings as a useful head start, then run the one official round on the final state and do not re-run beyond convergence.
 
 Mandatory before marking any of these SOWs `completed`:
 
@@ -21,20 +23,57 @@ Mandatory before marking any of these SOWs `completed`:
 - Any SOW spanning > 3 files of non-trivial logic.
 - Any SOW the operator flags as important.
 
-Mandatory minimum standard:
+### The 5-reviewer set (CTO only)
+
+The CTO runs **exactly these five reviewers in parallel** on every non-trivial code-producing PR:
+
+| # | Reviewer | Invocation |
+|---|---|---|
+| 1 | `glm` | `timeout 1800 opencode run -m "llm-netdata-cloud/glm-5.1" --agent code-reviewer "PROMPT"` |
+| 2 | `mimo` | `timeout 1800 opencode run -m "llm-netdata-cloud/mimo-v2.5-pro" --agent code-reviewer "PROMPT"` |
+| 3 | `minimax` (fresh-context review pass; **never** the implementer instance) | `timeout 1800 opencode run -m "llm-netdata-cloud/minimax-m3-coder" --agent code-reviewer "PROMPT"` |
+| 4 | `qwen` | `timeout 1800 opencode run -m "llm-netdata-cloud/qwen3.7-plus" --agent code-reviewer "PROMPT"` |
+| 5 | `deepseek` | `timeout 1800 opencode run -m "llm-netdata-cloud/deepseek-v4-pro" --agent code-reviewer "PROMPT"` |
+
+All five run in parallel: one Bash invocation per reviewer, batched in a single assistant turn. Each runs in the foreground (no `&`, no `run_in_background`) under `timeout 1800` (30 minutes max wait). The CTO is the only role that runs them.
+
+`codex` and `gemini`, part of the previous default set, are **deprecated** for production-grade review on this project; they remain in the ad-hoc invocation table below for SOW/spec review only.
+
+### PRODUCTION GRADE vote
+
+Each reviewer responds with one of two outcomes:
+
+- `PRODUCTION GRADE` — ship it, no actionable findings.
+- `NEEDS WORK` — one or more findings, each with file:line, severity (P0–P3), and a concrete fix proposal.
+
+The CTO does not merge until the vote is 5/5 PRODUCTION GRADE, **or** until only P3 noise remains and the CTO has recorded each P3 finding in the SOW under `## Reviews` with a disposition. P0/P1 findings always block. P2 findings always block unless the CTO explicitly waives them with a documented reason (rare).
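Tallying the votes into the `X/5 PG` form used elsewhere in this skill could look like the sketch below. It assumes each reviewer's output was saved to its own file and that the vote appears at the start of a line; `vote_of` and `pg_tally` are hypothetical helpers, and an output with no recognizable vote counts as not PRODUCTION GRADE.

```shell
#!/usr/bin/env bash
# Hypothetical vote tally; assumes one saved output file per reviewer.

# vote_of <file>: prints NW if any line starts with "NEEDS WORK", else PG if a
# line starts with "PRODUCTION GRADE", else UNKNOWN. NEEDS WORK wins when both
# appear, so an ambiguous response never counts as a pass.
vote_of() {
  if grep -qE '^[[:space:]]*NEEDS WORK' "$1"; then
    echo NW
  elif grep -qE '^[[:space:]]*PRODUCTION GRADE' "$1"; then
    echo PG
  else
    echo UNKNOWN
  fi
}

# pg_tally <file>...: prints "<PG votes>/<reviewers> PG", e.g. "4/5 PG".
pg_tally() {
  local f pg=0
  for f in "$@"; do
    if [ "$(vote_of "$f")" = PG ]; then
      pg=$((pg + 1))
    fi
  done
  echo "$pg/$# PG"
}
```

With the five outputs saved under a hypothetical `reviews/` directory, `pg_tally reviews/*.txt` yields the count to record in the SOW.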
 
-- **At least three reviewers in parallel** for code review (default set: codex + gemini + glm + qwen).
-- **Same prompt across iterations**; never narrow scope on follow-up rounds.
-- **Iterate until reviewers converge** with no new actionable findings.
-- **Record every round in the SOW** under `## Reviews` with reviewer attribution and resolution.
+### Stop conditions (P0/P1/P2/P3)
 
-Skip only:
+- **5/5 PRODUCTION GRADE, gates green, CI green** → CTO merges.
+- **Any P0/P1 NEEDS WORK** → fix, push, re-trigger full 5-reviewer cycle. Iterate.
+- **P2 NEEDS WORK** → fix in the same PR, re-trigger the full 5-reviewer cycle; merge only when 5/5 PG or only P3 noise remains.
+- **P3 NEEDS WORK** → fix in the same PR, document in SOW `## Reviews`, merge with note when gates green and CI green.
+- **Hard stall: 5+ cycles with new P0/P1 each round** → CTO writes a `## Regression` section in the SOW, opens a follow-up SOW in `.agents/sow/pending/`, and surfaces to the operator with a business-level recommendation. Do not loop forever.
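The stop conditions reduce to a small decision function. This is a sketch under stated assumptions: the caller has already verified and counted findings by severity, tracks how many consecutive cycles produced new P0/P1 findings, and handles P2 waivers separately; `next_step` and its output labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical encoding of the stop conditions; not project tooling.
# next_step <pg_votes> <p0> <p1> <p2> <p3> <gates_green> <ci_green> <p0p1_streak>
#   p0..p3       counts of verified findings at each severity
#   gates_green  "yes" when local quality gates pass
#   ci_green     "yes" when CI passes
#   p0p1_streak  consecutive cycles (this one included) that produced new P0/P1
next_step() {
  local pg=$1 p0=$2 p1=$3 p2=$4 p3=$5 gates=$6 ci=$7 streak=$8
  if [ $((p0 + p1)) -gt 0 ]; then
    # Hard stall after 5+ cycles that each surfaced new P0/P1 findings.
    if [ "$streak" -ge 5 ]; then echo STALL; else echo ITERATE; fi
    return
  fi
  # P2 blocks (a documented CTO waiver is handled outside this sketch).
  if [ "$p2" -gt 0 ]; then echo ITERATE; return; fi
  if [ "$gates" != yes ] || [ "$ci" != yes ]; then echo WAIT_FOR_GATES; return; fi
  if [ "$pg" -eq 5 ]; then echo MERGE; return; fi
  # Only P3 left: handle per the P3 rule, then merge with a note in ## Reviews.
  if [ "$p3" -gt 0 ]; then echo MERGE_WITH_NOTE; return; fi
  # Fewer than 5 PG votes but no counted findings (e.g. a missing vote).
  echo RERUN
}
```

Checking P0/P1 first means a contradictory input (5 PG votes plus a P1) still iterates rather than merges.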
+
+### Claim verification (CRITICAL — CTO's job)
+
+Reviewer findings are **claims, not facts**. The CTO verifies every claim before acting on it. Verification steps:
+
+1. **Read the file:line the reviewer cited.** Does the code actually do what the reviewer said?
+2. **Run the repro.** If the reviewer says "this race fires under X", construct X and run it.
+3. **Cross-check with the spec.** If the reviewer says "violates SPEC Y §3.2", open the spec and confirm.
+4. **Decide**: real bug (fix), false positive (reject with evidence in the SOW `## Reviews`), disputed (escalate).
+
+Acting on unverified claims causes two failure modes: (a) implementing phantom bugs that don't exist, (b) ignoring real bugs because the reviewer "sounded uncertain". Verify, then act. The CTO is the only one who decides.
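Step 1 is cheap to mechanize as a first filter for phantom findings: if the cited file or line does not exist, the claim fails before any deeper analysis. The helper below is a hypothetical sketch (`show_citation` is not project tooling) that assumes citations use the `path:line` form requested in the review prompt; it does not replace reading the code.

```shell
#!/usr/bin/env bash
# Hypothetical helper for claim-verification step 1; not project tooling.
# show_citation <path:line> [context]: prints the cited line (marked ">") with
# `context` lines around it, or returns 1 if the citation does not resolve.
show_citation() {
  local path=${1%:*} line=${1##*:} ctx=${2:-3} total start
  case "$line" in
    '' | *[!0-9]*) echo "not a path:line citation: $1" >&2; return 1 ;;
  esac
  [ -f "$path" ] || { echo "no such file: $path" >&2; return 1; }
  total=$(awk 'END { print NR }' "$path")
  if [ "$line" -lt 1 ] || [ "$line" -gt "$total" ]; then
    echo "line $line out of range ($path has $total lines)" >&2
    return 1
  fi
  start=$(( line > ctx ? line - ctx : 1 ))
  awk -v s="$start" -v e="$((line + ctx))" -v t="$line" \
    'NR >= s && NR <= e { printf "%s%6d  %s\n", (NR == t ? ">" : " "), NR, $0 }' "$path"
}
```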
+
+Skip the 5-reviewer cycle only for:
 
 - Typo / format-only changes the assistant has visually verified.
 - Mechanical renames with no behavior change.
 - Doc-only updates with no spec/runtime impact.
 
-The bar to skip is high. When in doubt, run reviewers.
+The bar to skip is high. When in doubt, run the cycle.
 
 ## Safety Rule (CRITICAL)
 
@@ -48,33 +87,31 @@ Detection signals (any of these → do not run reviewers):
 
 When detected, complete the review work directly and return.
 
-## Invocation Patterns
+**This rule applies to the `minimax` review pass too**: when running as a reviewer, `minimax` is a fresh-context read-only session. If the prompt contains any of the above signals (it always does — the review prompt is "YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION"), the reviewer MUST NOT spawn further reviewers. The CTO is the only role that may spawn the 5-reviewer cycle.
 
-All commands use `timeout 1800` (30 minutes max wait). Run multiple reviewers in parallel (one Bash invocation per reviewer, batched in one assistant turn). Run in foreground (no `&`, no `run_in_background`).
+## Invocation Patterns (legacy / ad-hoc)
+
+The Production-Grade Loop supersedes the previous default set. The 5 reviewers above are mandatory for any non-trivial code-producing PR. The reviewers below remain available for **ad-hoc SOW/spec review** (one-off, off the production loop), where the CTO may pick a smaller subset:
 
 | Reviewer | Command |
 |---|---|
-| codex | `timeout 1800 codex exec "PROMPT" --skip-git-repo-check` |
-| gemini | `timeout 1800 gemini -p "PROMPT"` |
-| claude (Anthropic) | `CLAUDECODE="" timeout 1800 claude -p "PROMPT"` |
-| glm | `timeout 1800 opencode run -m "llm-netdata-cloud/glm-5.1" --agent code-reviewer "PROMPT"` |
-| kimi | `timeout 1800 opencode run -m "llm-netdata-cloud/kimi-k2.6" --agent code-reviewer "PROMPT"` |
-| mimo | `timeout 1800 opencode run -m "llm-netdata-cloud/mimo-v2.5-pro" --agent code-reviewer "PROMPT"` |
-| qwen | `timeout 1800 opencode run -m "llm-netdata-cloud/qwen3.6-plus" --agent code-reviewer "PROMPT"` |
-| minimax | `timeout 1800 opencode run -m "llm-netdata-cloud/minimax-m2.7-coder" --agent code-reviewer "PROMPT"` |
-| deepseek | `timeout 1800 opencode run -m "deepseek/deepseek-v4-pro" --agent code-reviewer "PROMPT"` |
+| codex (ad-hoc only) | `timeout 1800 codex exec "PROMPT" --skip-git-repo-check` |
+| gemini (ad-hoc only) | `timeout 1800 gemini -p "PROMPT"` |
+| claude (Anthropic; ad-hoc only) | `CLAUDECODE="" timeout 1800 claude -p "PROMPT"` |
+| kimi (ad-hoc only) | `timeout 1800 opencode run -m "llm-netdata-cloud/kimi-k2.6" --agent code-reviewer "PROMPT"` |
+
+The five production reviewers (`glm`, `mimo`, `minimax` in a fresh context, `qwen`, `deepseek`) are mandatory on every non-trivial PR. For ad-hoc SOW/spec pre-review, the CTO may invoke any reviewer, including the five production reviewers, at its discretion; ad-hoc rounds are independent of the production-loop run. If a production reviewer is unavailable for a cycle (litellm error, deprecated model, timeout), the CTO retries it once, then substitutes a reviewer from the ad-hoc set (`codex`, `gemini`, `claude`, `kimi`) and logs the substitution and its reason in the SOW `## Reviews`. If two or more production reviewers are unavailable at the same time, the CTO surfaces it to the operator as a hard stall.
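The retry-then-substitute rule can be sketched as follows. `retry_once`, `plan_reviewers`, and `PRODUCTION_REVIEWERS` are hypothetical names; the sketch assumes a reviewer counts as unavailable when its command exits non-zero (a `timeout` expiry included) on both attempts, and that the CTO names the ad-hoc substitute.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the retry-then-substitute rule; not project tooling.
PRODUCTION_REVIEWERS="glm mimo minimax qwen deepseek"

# retry_once <command...>: runs the command and, if it exits non-zero
# (a timeout expiry included), runs it exactly once more.
retry_once() {
  "$@" && return 0
  echo "retrying once: $1" >&2
  "$@"
}

# plan_reviewers "<reviewers still down after the retry>" <ad-hoc substitute>:
# prints the reviewer set for this cycle, or STALL (return 1) when two or more
# production reviewers are down at once.
plan_reviewers() {
  local down=" $1 " sub=$2 r n=0 out=""
  for r in $PRODUCTION_REVIEWERS; do
    case "$down" in *" $r "*) n=$((n + 1)) ;; esac
  done
  if [ "$n" -ge 2 ]; then echo STALL; return 1; fi
  if [ "$n" -eq 1 ] && [ -z "$sub" ]; then
    echo "one reviewer is down: the CTO must name an ad-hoc substitute" >&2
    return 1
  fi
  for r in $PRODUCTION_REVIEWERS; do
    case "$down" in
      *" $r "*) out="$out $sub" ;;  # the unavailable reviewer's slot goes to the substitute
      *) out="$out $r" ;;
    esac
  done
  echo "${out# }"
}
```

Each production command from the table would be wrapped the same way, for example `retry_once timeout 1800 opencode run -m "llm-netdata-cloud/glm-5.1" --agent code-reviewer "PROMPT"`. If only `qwen` is still down, `plan_reviewers qwen codex` prints `glm mimo minimax codex deepseek`.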
 
 `cd` into the project root before running. Use relative paths (some reviewers stumble on arbitrary absolute paths).
 
-## Default Reviewer Set
+## Ad-hoc Reviewer Set (off the production loop)
 
-- **Code review** (PR-style): codex + gemini + glm + qwen in parallel. Four reviewers triangulate well; more becomes noise.
 - **SOW / design review**: codex + gemini + mimo in parallel.
-- **Security-focused review**: add minimax + deepseek.
+- **Security-focused review**: minimax + deepseek.
 
 ## Prompt Templates
 
-### Code review
+### Code review (Production-Grade Loop)
 
 ```
 YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION:
@@ -83,6 +120,10 @@ Please review the following change in this repository:
 
 <diff or files to review, with file paths>
 
+Vote ONE of:
+- PRODUCTION GRADE — ship it, no actionable findings.
+- NEEDS WORK — list findings below, each with file:line, severity (P0/P1/P2/P3), and a concrete fix proposal.
+
 Look for:
 - Correctness bugs
 - Race conditions
@@ -101,11 +142,15 @@ MANDATORY RULES (FOLLOW ALWAYS):
 THIS IS A READ-ONLY REQUEST. PROVIDE YOUR REVIEW.
 ```
 
-### SOW review
+### SOW review (ad-hoc)
 
 ```
 YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION:
 
+Vote ONE of:
+- PRODUCTION GRADE — ship it, no actionable findings.
+- NEEDS WORK — list findings below, each with file:line, severity (P0/P1/P2/P3), and a concrete fix proposal.
+
 Please review SOW file: .agents/sow/<pending|current>/<name>.md
 
 Check:
@@ -124,7 +169,7 @@ MANDATORY RULES:
 THIS IS A READ-ONLY REQUEST.
 ```
 
-### Spec/design review
+### Spec/design review (ad-hoc)
 
 ```
 YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION:
@@ -146,32 +191,34 @@ THIS IS A READ-ONLY REQUEST.
 
 ## Workflow
 
-1. Decide the review type (code / SOW / spec).
-2. Pick the reviewer set.
+1. Decide the review type (production PR cycle / ad-hoc SOW / ad-hoc spec).
+2. Pick the reviewer set: all 5 production reviewers for a production PR; a smaller subset for ad-hoc review.
 3. Compose the prompt (use a template; keep neutral, no embedded conclusions).
-4. **Show the prompt to the user before running.**
-5. Run all reviewers in parallel (multiple Bash tool calls in one turn).
-6. Wait for all to return.
-7. Synthesize findings:
+4. Run all reviewers in parallel (multiple Bash tool calls in one turn).
+5. Wait for all to return.
+6. Synthesize findings:
    - List each unique finding once, attributed to which reviewer flagged it.
-   - Classify: bug, design concern, style, false positive.
-   - Decide which to act on.
-8. Address findings (code changes + new tests where applicable).
-9. **Re-run the same reviewers with the same scope** plus a short note of fixes applied.
-10. Repeat until no actionable findings remain.
-11. Record the review history in the SOW under `## Reviews`.
+   - Classify: P0/P1/P2/P3.
+   - **Verify every claim** (read the file:line, run the repro, cross-check spec).
+7. Address findings (code changes + new tests where applicable). Reject false positives with evidence.
+8. **Re-run the same reviewers with the same scope** plus a short note of fixes applied.
+9. Repeat until no actionable P0/P1 findings remain and every P2 finding is fixed in the PR; document any remaining P3 findings in the SOW.
+10. Record the review history in the SOW under `## Reviews` with reviewer attribution, the CTO's claim-verification verdict, the fix applied (or "rejected — false positive" with evidence), and the final PRODUCTION GRADE count (e.g. `5/5 PG after fix`).
+11. The CTO merges via `gh pr merge <num> --merge --delete-branch` once gates and CI are green and the vote is 5/5 PG (or only P3 noise remains).
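Step 6's de-duplication can be sketched with `awk`. The sketch assumes each reported finding has been written as one tab-separated line (`reviewer`, `file:line`, `severity`, `summary`) and treats findings at the same `file:line` as one finding; `merge_findings` is a hypothetical helper, not project tooling.

```shell
#!/usr/bin/env bash
# Hypothetical de-duplication helper; not project tooling.
# Input (stdin or files): reviewer<TAB>file:line<TAB>severity<TAB>summary
# Output: one line per file:line, with the most severe rating any reviewer gave,
# the reviewers who flagged it, and the first reviewer's summary.
merge_findings() {
  awk -F'\t' '
    {
      key = $2
      if (!(key in sev)) {
        order[++n] = key; sev[key] = $3; who[key] = $1; what[key] = $4
      } else {
        if ($3 < sev[key]) sev[key] = $3   # "P0" < "P1" < "P2" < "P3" as strings
        if (index("," who[key] ",", "," $1 ",") == 0) who[key] = who[key] "," $1
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        printf "%s %s [%s] %s\n", sev[k], k, who[k], what[k]
      }
    }
  ' "$@"
}
```

Keeping the most severe rating errs toward blocking; the CTO's claim verification still decides the final severity.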
 
 ## Anti-Patterns
 
 - **Narrowing scope on follow-up reviews.** Leaves the rest unreviewed. Always use the same prompt.
-- **One reviewer only for important work.** Single-reviewer blind spots are real. Minimum three for code-producing SOWs.
+- **Fewer than 5 reviewers on a production PR.** Single-reviewer or 3-reviewer blind spots are real. The 5-reviewer cycle is mandatory.
 - **Editing the prompt to be less neutral after a reviewer disagreed.** The disagreement is data, not something to argue with.
 - **Running reviewers in background and forgetting.** Use foreground. The harness handles parallelism.
 - **Pre-screening: "skip review because I'm confident".** That's exactly when you need the review.
-- **Reporting work "done" before review convergence.** The honest phrasing while review is pending is "code written, gates green, review pending".
+- **Acting on unverified claims.** Always run the verification steps before implementing a fix.
+- **Reporting work "done" before review convergence.** The honest phrasing while review is pending is "code written, gates green, review pending (X/5 PG)".
 
 ## Cross-References
 
+- Contract: `AGENTS.md` "Production-Grade Loop" section (the single source of truth).
 - Workflow: `.agents/skills/project-workflow/SKILL.md`
 - Coding: `.agents/skills/project-coding/SKILL.md`
 - Delegation: `.agents/skills/project-delegation/SKILL.md`