| Dimension | Question | Where it bites first |
|---|---|---|
| Correctness | Does the artifact match its inputs? | Implementation, Spec |
| Completeness | Are all required sections present? | Requirements, Design |
| Consistency | Do artifacts agree with each other? | Spec ↔ Tasks ↔ Tests |
| Testability | Can each requirement be verified? | Requirements (EARS), Test plan |
| Maintainability | Can a stranger pick this up cold? | Implementation, Design |
| Traceability | Does every output link to its inputs? | All stages |
- Validate early. A defect found at Requirements is 100× cheaper than the same defect found at Review.
- Validate continuously. Don't batch validation to the end of a stage.
- Prefer explicit checks over assumptions. "Should be fine" is never a quality argument.
- Two layers per gate:
- Deterministic checks first — schema validation, linters, tests, ID uniqueness, cross-ref resolution. Fast, free, reliable.
- Critic-agent review second — judgment calls: "Is this requirement actually testable?", "Does this design honour the constitution?". Returns pass/fail + rationale.
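The first layer can be a plain script. A minimal sketch of two such checks — ID uniqueness and cross-reference resolution — assuming requirement IDs follow the `REQ-<AREA>-NNN` pattern used in this document (the inputs are illustrative strings, not a real file format):

```python
import re

REQ_ID = re.compile(r"REQ-[A-Z]+-\d{3}")

def check_ids(requirements_text: str, tasks_text: str) -> list[str]:
    """Layer-one deterministic checks: requirement IDs are unique,
    and every requirement reference in the tasks artifact resolves."""
    findings = []
    req_ids = REQ_ID.findall(requirements_text)
    duplicates = sorted({i for i in req_ids if req_ids.count(i) > 1})
    if duplicates:
        findings.append(f"duplicate requirement IDs: {duplicates}")
    for ref in REQ_ID.findall(tasks_text):
        if ref not in req_ids:
            findings.append(f"unresolved cross-reference: {ref}")
    return findings

# Illustrative inputs: the task list references a requirement that does not exist.
print(check_ids(
    "REQ-AUTH-001 REQ-AUTH-002",
    "T-AUTH-001 covers REQ-AUTH-001\nT-AUTH-002 covers REQ-AUTH-003",
))
```

Only after checks like these pass is it worth spending a critic agent's judgment on the same artifact.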
Use the deterministic metrics script when a user asks for current project quality status, workflow KPI reporting, or information-system health signals:

```
npm run quality:metrics
```

Claude users can invoke the same report through the workflow command:

```
/quality:status [--feature <feature-slug>] [--compare] [--save]
```

The report summarizes stage-aware workflow health, evidence-backed maturity, lifecycle deliverable progress, artifact presence, required frontmatter coverage, requirement downstream coverage, test coverage, EARS usage, QA checklist gaps, blockers, and open clarifications. Scope to one feature with `npm run quality:metrics -- --feature <feature-slug>`, emit machine-readable output with `npm run quality:metrics -- --json`, persist a baseline with `npm run quality:metrics -- --save`, or compare against the latest saved baseline with `npm run quality:metrics -- --compare`.
Agent handoffs should treat the KPI snapshot as deterministic evidence:

- `orchestrator` recommends it when next-step readiness is unclear.
- `qa` uses feature-scoped gaps as test-plan and test-report inputs.
- `reviewer` uses JSON metrics as evidence before writing a verdict.
- `release-manager` uses feature-scoped comparison as release-readiness context.
- `retrospective` saves the post-learning baseline for future trend comparison.
- project, roadmap, and portfolio agents consume JSON snapshots in status reports when supplied.
Interpret metric meaning, decision use, and misuse warnings using `docs/quality-metrics.md`.
The KPI snapshot is evidence for a quality review, not a replacement for the stage quality gate or critic-agent review.
- Problem statement is one paragraph and understandable to a non-expert.
- Target users named.
- Desired outcome stated.
- Constraints listed.
- Open questions captured (these become research items).
- Scope is bounded — no "boil the ocean" framing.
- Each research question from the idea is addressed.
- Sources cited (URLs or internal docs).
- ≥ 2 alternative approaches considered.
- User needs validated (data, interviews, or stated assumptions if neither).
- Technical considerations noted.
- Risks listed with severity (low/med/high).
- Goals and non-goals explicit.
- Personas / stakeholders named.
- Jobs to be done captured.
- All functional requirements use EARS notation (`docs/ears-notation.md`).
- Each requirement has a stable ID (`REQ-<AREA>-NNN`).
- Non-functional requirements (NFRs) listed: performance, security, accessibility, etc.
- Acceptance criteria are testable.
- Success metrics defined.
- `/spec:clarify` returned no open questions.
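Part of this gate is mechanically checkable before any critic review: each functional requirement line should carry a stable ID and open with an EARS trigger. A sketch, assuming a `REQ-<AREA>-NNN: <clause>` line format and a simplified keyword list (see `docs/ears-notation.md` for the authoritative patterns):

```python
import re

# Simplified EARS openers — an assumption for illustration, not the full notation.
EARS_KEYWORDS = ("When", "While", "Where", "If", "The system shall")
REQ_LINE = re.compile(r"^(REQ-[A-Z]+-\d{3}):\s*(.+)$")

def lint_requirement(line: str) -> list[str]:
    """Return findings for one requirement line; empty list means it passes."""
    m = REQ_LINE.match(line)
    if not m:
        return [f"no stable ID: {line!r}"]
    req_id, body = m.group(1), m.group(2)
    if not body.startswith(EARS_KEYWORDS):
        return [f"{req_id}: does not open with an EARS keyword"]
    return []

print(lint_requirement("REQ-AUTH-001: When the session expires, the system shall redirect to login."))
print(lint_requirement("Login should mostly work."))
```

The second line is exactly the kind of untestable phrasing this checklist exists to catch.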
- UX: primary flows mapped; information architecture clear.
- UI: key screens / states identified; design system referenced.
- Architecture: components and responsibilities listed; data flow shown; integration points named.
- Alternatives considered and rejected with rationale.
- Irreversible architectural decisions have ADRs.
- Risks have mitigations.
- Cross-stage check: every requirement is addressed somewhere in the design.
- Interfaces defined (schemas, signatures, API contracts).
- Data structures defined.
- Commands / behaviours enumerated.
- State transitions modelled (where relevant).
- Validation rules explicit.
- Edge cases listed.
- Test scenarios derivable from the spec alone.
- Each spec item has an ID and traces to ≥ 1 requirement.
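The last item is a deterministic check: build the trace map and flag spec items that trace to nothing, or to an unknown requirement. A sketch, assuming spec-to-requirement traces have already been parsed into a dict (the ID shapes are illustrative):

```python
def orphan_spec_items(traces: dict[str, list[str]],
                      known_reqs: set[str]) -> dict[str, list[str]]:
    """Return spec items with no trace, or with dangling requirement references."""
    problems = {}
    for spec_id, reqs in traces.items():
        dangling = [r for r in reqs if r not in known_reqs]
        if not reqs or dangling:
            problems[spec_id] = dangling
    return problems

known = {"REQ-PAY-001", "REQ-PAY-002"}
traces = {
    "SPEC-PAY-001": ["REQ-PAY-001"],   # traces cleanly
    "SPEC-PAY-002": [],                # orphan: no requirement behind it
    "SPEC-PAY-003": ["REQ-PAY-009"],   # dangling reference
}
print(orphan_spec_items(traces, known))
```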
- Each task ≤ ~½ day.
- Each task has a stable ID (`T-<AREA>-NNN`).
- Each task references ≥ 1 requirement ID.
- Dependencies between tasks explicit.
- Each task has a Definition of Done.
- TDD ordering: test tasks precede implementation tasks for the same requirement.
- Owner assigned (agent or human).
- Implementation matches the spec (no silent deviations).
- Any deviations documented in `implementation-log.md` with rationale.
- Lint clean.
- Type checks pass.
- Unit tests pass for changed surface.
- No unrelated changes ("scope creep") in the same task.
- Commit messages reference task IDs.
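The commit-message item is a one-regex check. A sketch, assuming task IDs follow the `T-<AREA>-NNN` pattern above:

```python
import re

TASK_ID = re.compile(r"\bT-[A-Z]+-\d{3}\b")

def commits_missing_task_ids(messages: list[str]) -> list[str]:
    """Return commit messages that reference no task ID."""
    return [m for m in messages if not TASK_ID.search(m)]

print(commits_missing_task_ids([
    "T-AUTH-001: add failing session-expiry test",
    "fix stuff",
]))
```

This runs cheaply as a pre-push hook or CI step, long before human review.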
- Every EARS clause has ≥ 1 test (`TEST-<AREA>-NNN`).
- Critical paths covered (happy + key edge cases).
- Coverage threshold met (project-defined).
- Failures reproducible from the report.
- Gaps acknowledged (not hidden).
- Non-functional checks run where relevant (perf, security, a11y).
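The first item — every EARS clause has at least one test — reduces to a set difference once test-to-requirement traces are parsed. A sketch (the dict shape is an assumption for illustration):

```python
def uncovered_requirements(req_ids: set[str],
                           test_traces: dict[str, list[str]]) -> set[str]:
    """Requirements with no test tracing to them — a gap to acknowledge, not hide."""
    covered = {r for reqs in test_traces.values() for r in reqs}
    return req_ids - covered

reqs = {"REQ-PAY-001", "REQ-PAY-002"}
tests = {"TEST-PAY-001": ["REQ-PAY-001"]}
print(sorted(uncovered_requirements(reqs, tests)))
```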
- Requirements satisfied (verified against RTM).
- Design honoured (no off-design implementation).
- No critical findings open.
- Risk assessment current.
- Traceability matrix complete and consistent.
- Constitution check passes.
- Brand review — when the diff touches `sites/`, `.claude/skills/specorator-design/`, or any HTML/CSS/JSX producing user-visible UI, the `brand-reviewer` agent posts a PASS line or structured findings. Blocking findings (token literal, emoji, icon import, gradient/texture, white page background) must be resolved before merge. See `templates/brand-review-checklist.md` and `.claude/agents/brand-reviewer.md`.
- Summary of changes written.
- User-visible impact stated.
- Release readiness guide created when the increment has multi-stakeholder, operational, security/privacy/compliance, commercial, or communication risk; otherwise explicitly marked not used.
- Product perspectives and stakeholder requirements have an owner, evidence, status, and accepted go/no-go decision when the guide is used.
- Known limitations disclosed.
- Verification steps documented.
- Public product page updated or explicitly marked unaffected when user-visible capabilities, positioning, onboarding, or CTAs change.
- Rollback plan documented.
- Observability (logs, metrics, alerts) in place.
- Communication plan (if user-facing) ready.
- What worked / what didn't / what to change — each with an owner.
- Spec-adherence assessed (did we drift?).
- Lessons fed back as proposed amendments to templates, agents, or the constitution.
Trivial work (typo fix, dependency bump, copy change) may skip stages. Set the artifact's status to the bare `skipped` enum in `workflow-state.md` frontmatter, and put the reason in the body's "Skips" section so the status remains machine-parseable:

```
artifacts:
  idea.md: skipped
  research.md: skipped
  ...
```

```
## Skips
- `idea.md`, `research.md` — trivial: typo fix
```

A retrospective is never skipped, even for trivial work — though it may then be a single sentence.
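Keeping status and reason in sync is itself checkable. A sketch, assuming the frontmatter has already been parsed into a dict and the Skips section names artifacts in backticks as shown in the example (both assumptions for illustration):

```python
def unexplained_skips(artifacts: dict[str, str], skips_section: str) -> list[str]:
    """Artifacts marked 'skipped' in frontmatter but absent from the body's Skips list."""
    return [name for name, status in artifacts.items()
            if status == "skipped" and f"`{name}`" not in skips_section]

artifacts = {"idea.md": "skipped", "research.md": "skipped", "prd.md": "complete"}
skips = "## Skips\n- `idea.md` — trivial: typo fix"
print(unexplained_skips(artifacts, skips))
```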
Used in test reports, review findings, and incident triage:
- S1 — Critical. Data loss, security breach, full outage, regulatory exposure. Drop everything.
- S2 — High. Critical user flow broken, no acceptable workaround. Fix this sprint.
- S3 — Medium. Non-critical flow broken with workaround, or quality regression. Backlog with priority.
- S4 — Low. Cosmetic, polish, or low-frequency edge case. Backlog.
`/spec:review` and `/spec:test` use these labels. A finding's severity is the user impact, not the engineering effort to fix.
- Spec drift — implementation diverges from spec without updating it. Catch in Review.
- Test theatre — tests exist but don't actually exercise the EARS clauses. Catch via RTM.
- Decision evaporation — choices made in implementation without ADRs. Catch in Review.
- Premature abstraction — speculative generality added during implementation. Catch in Review.
- Quality-gate erosion — gates softened to "unblock" delivery. Surface in Retrospective; fix the gate or fix the upstream stage.