Merged
15 changes: 12 additions & 3 deletions .agents/skills/project-delegation/SKILL.md
@@ -7,14 +7,23 @@ description: Orchestration patterns for ai-viewer — when and how to spawn suba

## The Hard Rule

The master assistant is the **orchestrator**, **QA lead**, **integrator**, and **reviewer**. The master assistant does **not** write production code. Code is produced by spawned subagents working from a written spec and failing tests.
The master assistant (CTO) is the **orchestrator**, **QA lead**, **integrator**, and the **only role that runs reviewers**. The master assistant does **not** write production code. Code is produced by spawned subagents working from a written spec and failing tests.

This rule exists because:
### The implementer is `minimax`

Per the Production-Grade Loop (see `AGENTS.md`), the CTO delegates all code production to **`minimax`** — the current stable minimax variant on litellm (default `llm-netdata-cloud/minimax-m3-coder`). The CTO is the only role that knows the project context; the implementer is a fresh-context subagent that receives a self-contained prompt (spec excerpt, failing tests, constraints, deliverable).

- The implementer (`minimax`) is **not** the same instance as the reviewer-minimax pass. Two different invocations, two different contexts, two different jobs. The implementer writes code; the reviewer reads code.
- If `minimax` is down/degraded, the CTO rotates the implementer role to the next-most-capable member of the reviewer set (default order: `qwen` → `mimo` → `deepseek` → `glm`). The rotated reviewer is **removed from the 5-reviewer cycle** for that PR (so the implementer is not also reviewing their own work); the 5-reviewer set is filled by substituting a reviewer from the ad-hoc set (`codex`, `gemini`, `claude`, `kimi`) chosen by the CTO. The CTO logs the rotation AND the substitution in the SOW under `## Implementer Rotation`.
- The CTO pins to the current stable model at time of work, per the project's "always pin to latest stable" policy. Major-version upgrades require a brief SOW; minor/patch upgrades are autonomous.
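
The rotation rule above can be sketched in Python. This is a minimal illustration, not the project's tooling: `rotate_implementer` and its parameters are hypothetical names, and only the rotation order and the two reviewer sets come from the text. It assumes the CTO has already chosen the ad-hoc substitute, and it does not cover what happens to the `minimax` review pass itself when `minimax` is down.

```python
# Orders and sets copied from the bullets above; helper names are hypothetical.
ROTATION_ORDER = ["qwen", "mimo", "deepseek", "glm"]
PRODUCTION_SET = ["glm", "mimo", "minimax", "qwen", "deepseek"]
AD_HOC_SET = ["codex", "gemini", "claude", "kimi"]


def rotate_implementer(unavailable, substitute):
    """Return (implementer, reviewers) for one PR.

    unavailable: set of model names that are down or degraded.
    substitute: ad-hoc reviewer the CTO picked to fill a vacated slot.
    """
    if "minimax" not in unavailable:
        return "minimax", list(PRODUCTION_SET)
    if substitute not in AD_HOC_SET:
        raise ValueError(f"{substitute!r} is not in the ad-hoc set")
    for candidate in ROTATION_ORDER:
        if candidate not in unavailable:
            # The rotated model leaves the reviewer cycle so it never reviews
            # its own work; the ad-hoc substitute takes its slot.
            reviewers = [substitute if r == candidate else r for r in PRODUCTION_SET]
            return candidate, reviewers
    raise RuntimeError("no implementer available; surface to the operator")
```

For example, `rotate_implementer({"minimax"}, "kimi")` hands the implementer role to `qwen` and puts `kimi` in `qwen`'s reviewer slot. Both the rotation and the substitution would still need to be logged in the SOW under `## Implementer Rotation`.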

### Why this rule exists

- The master assistant's context is finite. Code-writing fills it with raw output that displaces decision history.
- Subagent output gets independently verified by the master before being trusted. Master-written code skips that verification step.
- Compaction destroys the master's working memory; subagents start with a fresh, self-contained context every time.
- Parallel subagents finish faster than serial master-context editing.
- Splitting "writer" and "reviewer" across different model families produces a more honest, less self-confirming codebase.

If the master assistant ever finds itself about to call `Edit` or `Write` on a production source file, stop and delegate.

@@ -186,7 +195,7 @@ Conversely, do not spawn a subagent for a one-line typo fix. Trivial verified ed

## Cross-References

- Contract: `AGENTS.md` (Delegation Protocol section)
- Contract: `AGENTS.md` "Production-Grade Loop" section (the single source of truth).
- Workflow: `.agents/skills/project-workflow/SKILL.md`
- Coding rules: `.agents/skills/project-coding/SKILL.md`
- Gates: `.agents/skills/project-quality-gates/SKILL.md`
129 changes: 88 additions & 41 deletions .agents/skills/project-second-opinions/SKILL.md
@@ -1,15 +1,17 @@
---
name: project-second-opinions
description: Invoke external LLMs (codex, gemini, glm, kimi, mimo, minimax, qwen, deepseek) for code review, SOW review, design validation, and second opinions on ai-viewer work. Use before marking any non-trivial SOW completed and after major architectural changes.
description: Invoke the 5-reviewer Production-Grade Loop (glm, mimo, minimax, qwen, deepseek) for code review, SOW review, design validation, and second opinions on ai-viewer work. The CTO runs reviewers; the implementer never does. Use before marking any non-trivial SOW completed and after major architectural changes.
---

# Second Opinions
# Second Opinions — the 5-Reviewer Production-Grade Loop

This skill is the runtime enforcement of the **Production-Grade Loop** defined in `AGENTS.md`. The contract lives in `AGENTS.md`; this file is the implementation. If the two ever disagree, `AGENTS.md` wins.

## When To Run

External second-opinion review is **mandatory** — not "encouraged" — for any non-trivial work. The assistant does not trust itself; review converges before "done" is uttered.

**The orchestrator (master) runs review — exactly once per iteration, on the final integrated state — never the implementation subagent.** Review is the master's QA gate on code it did not author; an implementation subagent running reviewers on its own work both duplicates the master's mandatory round (the master will run it again → 2× the slow, costly review of identical code) and collapses the author/reviewer separation. Because spawned subagents inherit `AGENTS.md` (which mandates this review), the master MUST explicitly forbid reviewers in every implementation delegation prompt — see `project-delegation` skill, the `[FORBIDDEN]` block. If a subagent reports it "ran reviewers," that round does not substitute for the master's: treat the subagent's findings as a useful head start, then run the one official round on the final state and do not re-run beyond convergence.
**The orchestrator (CTO / master assistant) runs review — exactly once per iteration, on the final integrated state — never the implementation subagent.** Review is the master's QA gate on code it did not author; an implementation subagent running reviewers on its own work both duplicates the master's mandatory round (the master will run it again → 2× the slow, costly review of identical code) and collapses the author/reviewer separation. Because spawned subagents inherit `AGENTS.md` (which mandates this review), the master MUST explicitly forbid reviewers in every implementation delegation prompt — see `project-delegation` skill, the `[FORBIDDEN]` block. If a subagent reports it "ran reviewers," that round does not substitute for the master's: treat the subagent's findings as a useful head start, then run the one official round on the final state and do not re-run beyond convergence.

Mandatory before marking any of these SOWs `completed`:

@@ -21,20 +23,57 @@ Mandatory before marking any of these SOWs `completed`:
- Any SOW spanning > 3 files of non-trivial logic.
- Any SOW the operator flags as important.

Mandatory minimum standard:
### The 5-reviewer set (CTO only)

The CTO runs **exactly these five reviewers in parallel** on every non-trivial code-producing PR:

| # | Reviewer | Invocation |
|---|---|---|
| 1 | `glm` | `timeout 1800 opencode run -m "llm-netdata-cloud/glm-5.1" --agent code-reviewer "PROMPT"` |
| 2 | `mimo` | `timeout 1800 opencode run -m "llm-netdata-cloud/mimo-v2.5-pro" --agent code-reviewer "PROMPT"` |
| 3 | `minimax` (fresh-context review pass; **never** the implementer instance) | `timeout 1800 opencode run -m "llm-netdata-cloud/minimax-m3-coder" --agent code-reviewer "PROMPT"` |
| 4 | `qwen` | `timeout 1800 opencode run -m "llm-netdata-cloud/qwen3.7-plus" --agent code-reviewer "PROMPT"` |
| 5 | `deepseek` | `timeout 1800 opencode run -m "llm-netdata-cloud/deepseek-v4-pro" --agent code-reviewer "PROMPT"` |

All five run in parallel (one Bash invocation each, batched in a single assistant turn). Foreground, with `timeout 1800`. The CTO is the only role that runs them.
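
As a rough sketch of how the five command lines in the table relate to one shared prompt, the Python below only builds the strings; it does not execute anything, since each reviewer is meant to run as its own foreground Bash invocation. `REVIEWER_MODELS` and `reviewer_command` are hypothetical names; the model identifiers and flags are copied from the table.

```python
import shlex

# Model identifiers copied from the table above.
REVIEWER_MODELS = {
    "glm": "llm-netdata-cloud/glm-5.1",
    "mimo": "llm-netdata-cloud/mimo-v2.5-pro",
    "minimax": "llm-netdata-cloud/minimax-m3-coder",
    "qwen": "llm-netdata-cloud/qwen3.7-plus",
    "deepseek": "llm-netdata-cloud/deepseek-v4-pro",
}


def reviewer_command(name, prompt, timeout_s=1800):
    """Build the shell command line for one reviewer (hypothetical helper)."""
    argv = [
        "timeout", str(timeout_s),
        "opencode", "run", "-m", REVIEWER_MODELS[name],
        "--agent", "code-reviewer",
        prompt,
    ]
    # shlex.join quotes only the arguments that need it (here, the prompt).
    return shlex.join(argv)


def cycle_commands(prompt):
    """One command per production reviewer, all sharing the same prompt."""
    return {name: reviewer_command(name, prompt) for name in REVIEWER_MODELS}
```

`shlex.join` leaves the model identifier unquoted because it contains no shell-special characters; the shell treats that the same as the double-quoted form shown in the table.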

`codex` and `gemini` from the previous default set are **deprecated** for production-grade review on this project; they are kept in the invocation table for ad-hoc SOW/spec review only.

### PRODUCTION GRADE vote

Each reviewer responds with one of two outcomes:

- `PRODUCTION GRADE` — ship it, no actionable findings.
- `NEEDS WORK` — one or more findings, each with file:line, severity (P0–P3), and a concrete fix proposal.

The CTO does not merge until 5/5 PRODUCTION GRADE, **or** until only P3 noise remains AND the CTO has recorded the P3 findings in the SOW under `## Reviews` with a disposition. P0/P1 always block. P2 always blocks unless explicitly waived by the CTO with a documented reason (rare).

- **At least three reviewers in parallel** for code review (default set: codex + gemini + glm + qwen).
- **Same prompt across iterations**; never narrow scope on follow-up rounds.
- **Iterate until reviewers converge** with no new actionable findings.
- **Record every round in the SOW** under `## Reviews` with reviewer attribution and resolution.
### Stop conditions (P0/P1/P2/P3)

Skip only:
- **5/5 PRODUCTION GRADE, gates green, CI green** → CTO merges.
- **Any P0/P1 NEEDS WORK** → fix, push, re-trigger full 5-reviewer cycle. Iterate.
- **P2 NEEDS WORK** → fix in the same PR, re-trigger the full 5-reviewer cycle; merge only when 5/5 PG or only P3 noise remains.
- **P3 NEEDS WORK** → fix in the same PR, document in SOW `## Reviews`, merge with note when gates green and CI green.
- **Hard stall: 5+ cycles with new P0/P1 each round** → CTO writes a `## Regression` section in the SOW, opens a follow-up SOW in `.agents/sow/pending/`, and surfaces to the operator with a business-level recommendation. Do not loop forever.
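
The vote and stop conditions above reduce to a small decision function. The following is a sketch under stated assumptions, not the project's gate: `Vote` and `merge_decision` are hypothetical names, the rare CTO waiver for P2 is left out, the hard-stall counter is not modelled, and a `NEEDS WORK` vote that lists no findings is treated as blocking.

```python
from dataclasses import dataclass, field


@dataclass
class Vote:
    reviewer: str
    production_grade: bool
    severities: list = field(default_factory=list)  # e.g. ["P1", "P3"]


def merge_decision(votes, gates_green, ci_green, p3_recorded=False):
    """Return "merge", "merge-with-note", or "iterate" for one review cycle."""
    if len(votes) != 5:
        raise ValueError("the cycle requires exactly five reviewers")
    if not (gates_green and ci_green):
        return "iterate"
    needs_work = [v for v in votes if not v.production_grade]
    if not needs_work:
        return "merge"  # 5/5 PRODUCTION GRADE
    if any(not v.severities for v in needs_work):
        return "iterate"  # malformed vote: assume it blocks
    open_severities = {s for v in needs_work for s in v.severities}
    # Only P3 noise remains and it is recorded in the SOW under "## Reviews".
    if open_severities <= {"P3"} and p3_recorded:
        return "merge-with-note"
    return "iterate"  # P0, P1, or P2 present: fix and re-run the full cycle
```

Keeping the gate and CI checks ahead of the vote count mirrors the first stop condition, where all three must hold before the CTO merges.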

### Claim verification (CRITICAL — CTO's job)

Reviewer output is a set of **claims, not verified findings**. The CTO verifies every claim before acting on it. Verification steps:

1. **Read the file:line the reviewer cited.** Does the code actually do what the reviewer said?
2. **Run the repro.** If the reviewer says "this race fires under X", construct X and run it.
3. **Cross-check with the spec.** If the reviewer says "violates SPEC Y §3.2", open the spec and confirm.
4. **Decide**: real bug (fix), false positive (reject with evidence in the SOW `## Reviews`), disputed (escalate).

Acting on unverified claims causes two failure modes: (a) implementing fixes for phantom bugs that don't exist, (b) ignoring real bugs because the reviewer "sounded uncertain". Verify, then act. The CTO is the only one who decides.
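
One way to keep the verification steps honest is to record their outcome per claim and refuse to produce a verdict until at least the cited code has been read. The sketch below is illustrative only: `Claim`, `Verdict`, and `decide` are hypothetical names, and the mapping from evidence to verdict (all checks agree means real bug, all disagree means false positive, mixed means disputed) is an assumption, not a rule stated in this skill.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    REAL_BUG = "fix"
    FALSE_POSITIVE = "reject with evidence in the SOW"
    DISPUTED = "escalate"


@dataclass
class Claim:
    reviewer: str
    location: str                              # the file:line the reviewer cited
    severity: str                              # "P0" .. "P3"
    code_matches_claim: Optional[bool] = None  # step 1: read the cited code
    repro_confirms: Optional[bool] = None      # step 2: None if no repro applies
    spec_confirms: Optional[bool] = None       # step 3: None if no spec was cited


def decide(claim):
    """Map recorded evidence to a verdict (assumed policy, see lead-in)."""
    if claim.code_matches_claim is None:
        raise ValueError("claim not verified yet: read the cited file:line first")
    evidence = [claim.code_matches_claim, claim.repro_confirms, claim.spec_confirms]
    applicable = [e for e in evidence if e is not None]
    if all(applicable):
        return Verdict.REAL_BUG
    if not any(applicable):
        return Verdict.FALSE_POSITIVE
    return Verdict.DISPUTED
```

Raising on an unread claim makes the "verify, then act" order explicit: no verdict exists until step 1 has been recorded.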

Skip the 5-reviewer cycle only for:

- Typo / format-only changes the assistant has visually verified.
- Mechanical renames with no behavior change.
- Doc-only updates with no spec/runtime impact.

The bar to skip is high. When in doubt, run reviewers.
The bar to skip is high. When in doubt, run the cycle.

## Safety Rule (CRITICAL)

@@ -48,33 +87,31 @@ Detection signals (any of these → do not run reviewers):

When detected, complete the review work directly and return.

## Invocation Patterns
**This rule applies to the `minimax` review pass too**: when running as a reviewer, `minimax` is a fresh-context read-only session. If the prompt contains any of the above signals (it always does — the review prompt is "YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION"), the reviewer MUST NOT spawn further reviewers. The CTO is the only role that may spawn the 5-reviewer cycle.

All commands use `timeout 1800` (30 minutes max wait). Run multiple reviewers in parallel (one Bash invocation per reviewer, batched in one assistant turn). Run in foreground (no `&`, no `run_in_background`).
## Invocation Patterns (legacy / ad-hoc)

The Production-Grade Loop supersedes the previous default set. The 5 reviewers above are mandatory for any non-trivial code-producing PR. The reviewers below remain available for **ad-hoc SOW/spec review** (one-off, off the production loop), where the CTO may pick a smaller subset:

| Reviewer | Command |
|---|---|
| codex | `timeout 1800 codex exec "PROMPT" --skip-git-repo-check` |
| gemini | `timeout 1800 gemini -p "PROMPT"` |
| claude (Anthropic) | `CLAUDECODE="" timeout 1800 claude -p "PROMPT"` |
| glm | `timeout 1800 opencode run -m "llm-netdata-cloud/glm-5.1" --agent code-reviewer "PROMPT"` |
| kimi | `timeout 1800 opencode run -m "llm-netdata-cloud/kimi-k2.6" --agent code-reviewer "PROMPT"` |
| mimo | `timeout 1800 opencode run -m "llm-netdata-cloud/mimo-v2.5-pro" --agent code-reviewer "PROMPT"` |
| qwen | `timeout 1800 opencode run -m "llm-netdata-cloud/qwen3.6-plus" --agent code-reviewer "PROMPT"` |
| minimax | `timeout 1800 opencode run -m "llm-netdata-cloud/minimax-m2.7-coder" --agent code-reviewer "PROMPT"` |
| deepseek | `timeout 1800 opencode run -m "deepseek/deepseek-v4-pro" --agent code-reviewer "PROMPT"` |
| codex (ad-hoc only) | `timeout 1800 codex exec "PROMPT" --skip-git-repo-check` |
| gemini (ad-hoc only) | `timeout 1800 gemini -p "PROMPT"` |
| claude (Anthropic) (ad-hoc only) | `CLAUDECODE="" timeout 1800 claude -p "PROMPT"` |
| kimi (ad-hoc only) | `timeout 1800 opencode run -m "llm-netdata-cloud/kimi-k2.6" --agent code-reviewer "PROMPT"` |

The five production reviewers (`glm`, `mimo`, `minimax`-fresh, `qwen`, `deepseek`) run mandatorily on every non-trivial PR. For ad-hoc SOW/spec pre-review, the CTO may invoke any reviewer (including the five production reviewers) at its discretion. Ad-hoc rounds are independent of the production-loop run. If a production reviewer is unavailable for a cycle (litellm error, model deprecated, timeout), the CTO retries once, then substitutes from the ad-hoc set (`codex`, `gemini`, `claude`, `kimi`) and logs the substitution in the SOW `## Reviews` with the reason. If two or more production reviewers are unavailable at the same time, the CTO surfaces this to the operator as a hard stall.
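
The retry-once-then-substitute rule for a single unavailable reviewer might look like the sketch below. All names are hypothetical: `run` stands for whatever actually invokes a reviewer and raises on failure, `pick_substitute` stands for the CTO's choice from the ad-hoc set, and `log` stands for the entries later written to the SOW `## Reviews`. The hard-stall case with two or more reviewers down at once is not handled here.

```python
def run_with_fallback(name, run, pick_substitute, log):
    """Run one reviewer; retry once on failure, then substitute and log.

    Returns (reviewer_that_answered, review_text).
    """
    last_error = None
    for _attempt in (1, 2):  # the first try plus exactly one retry
        try:
            return name, run(name)
        except Exception as exc:  # litellm error, deprecated model, timeout, ...
            last_error = exc
    substitute = pick_substitute(name)
    # The reason is recorded alongside the substitution.
    log.append(f"{name} unavailable ({last_error}); substituted by {substitute}")
    return substitute, run(substitute)
```

If the substitute also fails, the exception propagates to the caller, which matches the intent that repeated unavailability is escalated, not silently absorbed.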

`cd` into the project root before running. Use relative paths (some reviewers stumble on arbitrary absolute paths).

## Default Reviewer Set
## Ad-hoc Reviewer Set (off the production loop)

- **Code review** (PR-style): codex + gemini + glm + qwen in parallel. Four reviewers triangulate well; more becomes noise.
- **SOW / design review**: codex + gemini + mimo in parallel.
- **Security-focused review**: add minimax + deepseek.
- **Security-focused review**: minimax + deepseek.

## Prompt Templates

### Code review
### Code review (Production-Grade Loop)

```
YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION:
@@ -83,6 +120,10 @@ Please review the following change in this repository:

<diff or files to review, with file paths>

Vote ONE of:
- PRODUCTION GRADE — ship it, no actionable findings.
- NEEDS WORK — list findings below, each with file:line, severity (P0/P1/P2/P3), and a concrete fix proposal.

Look for:
- Correctness bugs
- Race conditions
@@ -101,11 +142,15 @@ MANDATORY RULES (FOLLOW ALWAYS):
THIS IS A READ-ONLY REQUEST. PROVIDE YOUR REVIEW.
```

### SOW review
### SOW review (ad-hoc)

```
YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION:

Vote ONE of:
- PRODUCTION GRADE — ship it, no actionable findings.
- NEEDS WORK — list findings below, each with file:line, severity (P0/P1/P2/P3), and a concrete fix proposal.

Please review SOW file: .agents/sow/<pending|current>/<name>.md

Check:
@@ -124,7 +169,7 @@ MANDATORY RULES:
THIS IS A READ-ONLY REQUEST.
```

### Spec/design review
### Spec/design review (ad-hoc)

```
YOU ARE RUNNING BY ANOTHER ASSISTANT, FOR A SECOND OPINION:
@@ -146,32 +191,34 @@ THIS IS A READ-ONLY REQUEST.
```

## Workflow

1. Decide the review type (code / SOW / spec).
2. Pick the reviewer set.
1. Decide the review type (production PR cycle / ad-hoc SOW / ad-hoc spec).
2. Pick the reviewer set: 5 reviewers for production PR; smaller subset for ad-hoc.
3. Compose the prompt (use a template; keep neutral, no embedded conclusions).
4. **Show the prompt to the user before running.**
5. Run all reviewers in parallel (multiple Bash tool calls in one turn).
6. Wait for all to return.
7. Synthesize findings:
4. Run all reviewers in parallel (multiple Bash tool calls in one turn).
5. Wait for all to return.
6. Synthesize findings:
- List each unique finding once, attributed to which reviewer flagged it.
- Classify: bug, design concern, style, false positive.
- Decide which to act on.
8. Address findings (code changes + new tests where applicable).
9. **Re-run the same reviewers with the same scope** plus a short note of fixes applied.
10. Repeat until no actionable findings remain.
11. Record the review history in the SOW under `## Reviews`.
- Classify: P0/P1/P2/P3.
- **Verify every claim** (read the file:line, run the repro, cross-check spec).
7. Address findings (code changes + new tests where applicable). Reject false positives with evidence.
8. **Re-run the same reviewers with the same scope** plus a short note of fixes applied.
9. Repeat until no actionable P0/P1 findings remain. P2 fixed in PR. P3 documented in SOW.
10. Record the review history in the SOW under `## Reviews` with reviewer attribution, the CTO's claim-verification verdict, the fix applied (or "rejected — false positive" with evidence), and the final PRODUCTION GRADE count (e.g. `5/5 PG after fix`).
11. CTO merges via `gh pr merge <num> --merge --delete-branch` once gates green, CI green, and 5/5 PG (or only P3 noise remains).

## Anti-Patterns

- **Narrowing scope on follow-up reviews.** Leaves the rest unreviewed. Always use the same prompt.
- **One reviewer only for important work.** Single-reviewer blind spots are real. Minimum three for code-producing SOWs.
- **Fewer than 5 reviewers on a production PR.** Single-reviewer or 3-reviewer blind spots are real. The 5-reviewer cycle is mandatory.
- **Editing the prompt to be less neutral after a reviewer disagreed.** The disagreement is data, not something to argue with.
- **Running reviewers in background and forgetting.** Use foreground. The harness handles parallelism.
- **Pre-screening: "skip review because I'm confident".** That's exactly when you need the review.
- **Reporting work "done" before review convergence.** The honest phrasing while review is pending is "code written, gates green, review pending".
- **Acting on unverified claims.** Always run the verification steps before implementing a fix.
- **Reporting work "done" before review convergence.** The honest phrasing while review is pending is "code written, gates green, review pending (X/5 PG)".

## Cross-References

- Contract: `AGENTS.md` "Production-Grade Loop" section (the single source of truth).
- Workflow: `.agents/skills/project-workflow/SKILL.md`
- Coding: `.agents/skills/project-coding/SKILL.md`
- Delegation: `.agents/skills/project-delegation/SKILL.md`