Proposal: Support Opencode for adversarial review absent Codex #13

@charles-dyfis-net

Motivating example output from a real-world session (doing design for an automated audiobook QA tool), using Gemma 4 26b:


Review Synthesis: Opus vs. Gemma 4

Opus reviewer: Approved with 3 advisory recommendations.
Gemma 4 reviewer: Issues Found — 12 blocking, 20 warnings.

Genuinely valuable findings from Gemma that Opus missed

  1. ReviewedDiscrepancy type is undefined (Rendering RN-5). This is the sharpest catch. The rendering and CLI kits both reference reviewed_discrepancies and ReviewedDiscrepancy, but no kit defines this type or specifies who creates it. The multimodal-review kit defines ReviewClassification only. An implementer would have to invent the wrapper. Fix: add a requirement in the multimodal-review kit.
  2. --chapter-filter pattern syntax unspecified (CLI COMPLETE-1). Is it glob, regex, or substring? Current code does substring. The kit should say so.
  3. Default model path undefined (CLI CLARITY-5). "Conventional" is meaningless to an implementer.
  4. ChapterError → exit code mapping unspecified (CONSISTENCY-3). R8 says chapter errors are caught and processing continues, but R10 doesn't clarify whether a partial-failure run (some chapters succeeded, one errored) exits 0 or 3.
  5. Context ±5 words: normalized or original? (Discrepancy CLARITY-3). Worth specifying.
  6. start_text disambiguation when multiple matches after start_page (Segmentation COMPLETE-5). Current code takes the first match (forward scan), but the kit doesn't state this.
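
To make items 2 and 6 concrete, here is a minimal sketch of the behavior the synthesis attributes to the current code: substring matching for --chapter-filter and a first-match forward scan for start_text. The function names, signatures, and case-sensitive comparison are my assumptions for illustration, not the tool's actual API.

```python
from typing import Optional, Sequence, Tuple


def chapter_matches(title: str, chapter_filter: Optional[str]) -> bool:
    """Hypothetical helper: plain substring match, no glob or regex handling.

    Case sensitivity is an assumption; the kit would need to state it.
    """
    if chapter_filter is None:
        return True  # no filter given: every chapter is selected
    return chapter_filter in title


def find_start(
    pages: Sequence[str], start_page: int, start_text: str
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: scan forward from start_page, return first match.

    Returns (page_index, char_offset), or None if start_text never appears
    on start_page or any later page.
    """
    for page_index in range(start_page, len(pages)):
        offset = pages[page_index].find(start_text)
        if offset != -1:
            return page_index, offset
    return None
```

Writing the rule down this way shows what the kit would be committing to: a filter such as `Ch*` matches nothing unless a title literally contains an asterisk, and a start_text that repeats later in the book always resolves to its earliest occurrence at or after start_page.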

Findings I'd push back on

  • Alignment algorithm unnamed (CLARITY-1): The kit says "deterministic" and cross-references cavekit-pdf-ingestion's text utilities. For brownfield specs, leaving algorithm choice to the build phase (which will keep difflib) is fine. Not blocking.
  • GAP markers are "implementation leakage" (GAP-1 through GAP-13): This is a methodological disagreement. In Cavekit's brownfield mode, [GAP] markers are standard — they're the delta between "what the code does now" and "what the spec requires." They tell the build phase what needs to change. Gemma's right that pure specs shouldn't describe current state, but brownfield specs intentionally do. I'd keep the GAP markers.
  • Source Traceability / Cleanup Items (IMPLEAK-1, -2, -3): Same brownfield rationale. These sections exist because we ran --from-code.
  • Overview dependency graph not R-level specific (DEP-1, DEP-2, CROSS-1): The overview is intentionally domain-level; individual kits carry the R-level dependencies.

Both reviewers caught

  • Cache key missing prompt/model version (Opus advisory + Gemma didn't flag this one directly, but the concern is related to R9).
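
For readers unfamiliar with the cache-key concern, a minimal sketch follows. It assumes review results are cached per input and folds the model and prompt versions into the key so that changing either one invalidates old entries. The function name, parameters, and key layout are hypothetical and not taken from the kit.

```python
import hashlib
import json


def review_cache_key(content: bytes, model_version: str, prompt_version: str) -> str:
    """Hypothetical key derivation: hash the content together with both versions."""
    payload = json.dumps(
        {
            "content_sha256": hashlib.sha256(content).hexdigest(),
            "model_version": model_version,
            "prompt_version": prompt_version,
        },
        sort_keys=True,  # stable field order so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key like this, bumping the prompt version produces a different key for the same content, so stale classifications are not reused.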

Assessment of opencode/Gemma 4 as adversarial reviewer

Verdict: useful. It found a real type-definition gap (ReviewedDiscrepancy) that Opus approved past, plus several clarity issues worth fixing. Its false-positive rate is higher than I'd want — about half the "blocking" issues are either methodological disagreements or non-issues when you understand the brownfield context. But the signal-to-noise ratio is still good enough to be worth running.

My take is that this is surprisingly strong output.

If there's interest in accepting first-class upstream support for using Gemma-via-Opencode in place of Codex (and I realize that cavekit is opinionated and doesn't want to be all things to all people), I'd be willing to take a shot at an implementation when the opportunity presents itself.
