From 5d610927646e759cde97b583d8305609130acad6 Mon Sep 17 00:00:00 2001 From: dda Date: Mon, 16 Mar 2026 21:44:00 -0300 Subject: [PATCH] feat: add OpenClaw AgentSkills support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Port all 10 gstack workflow skills to OpenClaw format: - plan-ceo-review, plan-eng-review, review, ship, browse, qa, qa-only, setup-browser-cookies, retro, document-release Key adaptations: - Replace compiled browse binary with OpenClaw's native browser tool - Use exec tool instead of Bash tool - Use gh CLI for git/PR operations - Add proper OpenClaw skill metadata (name, description triggers) - Include reference files (checklist, issue taxonomy, report template) - Add INSTALL.md with quick start and browser command mapping - Update README.md with OpenClaw install section Original Claude Code skills remain intact — this is additive. --- README.md | 15 ++ openclaw/INSTALL.md | 91 +++++++ openclaw/skills/browse/SKILL.md | 129 ++++++++++ openclaw/skills/document-release/SKILL.md | 165 +++++++++++++ openclaw/skills/plan-ceo-review/SKILL.md | 192 +++++++++++++++ openclaw/skills/plan-eng-review/SKILL.md | 150 +++++++++++ openclaw/skills/qa-only/SKILL.md | 150 +++++++++++ .../qa-only/references/issue-taxonomy.md | 85 +++++++ .../qa-only/references/qa-report-template.md | 110 +++++++++ openclaw/skills/qa/SKILL.md | 211 ++++++++++++++++ .../skills/qa/references/issue-taxonomy.md | 85 +++++++ .../qa/references/qa-report-template.md | 110 +++++++++ openclaw/skills/retro/SKILL.md | 232 ++++++++++++++++++ openclaw/skills/review/SKILL.md | 95 +++++++ .../skills/review/references/checklist.md | 132 ++++++++++ .../skills/setup-browser-cookies/SKILL.md | 96 ++++++++ openclaw/skills/ship/SKILL.md | 162 ++++++++++++ 17 files changed, 2210 insertions(+) create mode 100644 openclaw/INSTALL.md create mode 100644 openclaw/skills/browse/SKILL.md create mode 100644 openclaw/skills/document-release/SKILL.md create mode 100644 
openclaw/skills/plan-ceo-review/SKILL.md create mode 100644 openclaw/skills/plan-eng-review/SKILL.md create mode 100644 openclaw/skills/qa-only/SKILL.md create mode 100644 openclaw/skills/qa-only/references/issue-taxonomy.md create mode 100644 openclaw/skills/qa-only/references/qa-report-template.md create mode 100644 openclaw/skills/qa/SKILL.md create mode 100644 openclaw/skills/qa/references/issue-taxonomy.md create mode 100644 openclaw/skills/qa/references/qa-report-template.md create mode 100644 openclaw/skills/retro/SKILL.md create mode 100644 openclaw/skills/review/SKILL.md create mode 100644 openclaw/skills/review/references/checklist.md create mode 100644 openclaw/skills/setup-browser-cookies/SKILL.md create mode 100644 openclaw/skills/ship/SKILL.md diff --git a/README.md b/README.md index 2b87d17..6e44fdb 100644 --- a/README.md +++ b/README.md @@ -99,6 +99,21 @@ This is the setup I use. One person, ten parallel agents, each with the right co ## Install +gstack supports both Claude Code and [OpenClaw](https://github.com/nichochar/openclaw). Choose your platform: + +### OpenClaw + +See [openclaw/INSTALL.md](openclaw/INSTALL.md) for full instructions. Quick start: + +```bash +git clone https://github.com/dddabtc/gstack.git +cp -r gstack/openclaw/skills/* ~/.openclaw/workspace/skills/ +``` + +No binary compilation needed — OpenClaw's built-in browser tool replaces the compiled browse binary. + +### Claude Code + **Requirements:** [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Git](https://git-scm.com/), [Bun](https://bun.sh/) v1.0+. `/browse` compiles a native binary — works on macOS and Linux (x64 and arm64). 
### Step 1: Install on your machine diff --git a/openclaw/INSTALL.md b/openclaw/INSTALL.md new file mode 100644 index 0000000..4fcffb3 --- /dev/null +++ b/openclaw/INSTALL.md @@ -0,0 +1,91 @@ +# Installing gstack Skills for OpenClaw + +## Quick Install + +Copy the skill directories to your OpenClaw workspace: + +```bash +# Clone the repo (if you haven't already) +git clone https://github.com/dddabtc/gstack.git +cd gstack + +# Copy all OpenClaw skills to your workspace +cp -r openclaw/skills/* ~/.openclaw/workspace/skills/ +``` + +## What Gets Installed + +| Skill | Command | Description | +|-------|---------|-------------| +| plan-ceo-review | /plan-ceo-review | Founder/CEO product thinking — rethink the problem, find the 10-star product | +| plan-eng-review | /plan-eng-review | Engineering architecture review — lock in the execution plan | +| review | /review | Paranoid pre-landing code review for structural issues tests don't catch | +| ship | /ship | One-command shipping: sync, test, review, bump, commit, push, PR | +| browse | /browse | Browser automation for QA testing and site dogfooding | +| qa | /qa | Full QA with bug fixing — test, find bugs, fix them, re-verify | +| qa-only | /qa-only | Report-only QA — find and document bugs without fixing | +| setup-browser-cookies | /setup-browser-cookies | Import browser cookies for testing authenticated pages | +| retro | /retro | Weekly engineering retrospective with trend tracking | +| document-release | /document-release | Post-ship documentation update | + +## Requirements + +- [OpenClaw](https://github.com/nichochar/openclaw) installed and running +- `gh` CLI installed and authenticated (for /ship, /review, /retro) +- A git repository to work in (most skills are git-aware) + +## Key Differences from Claude Code Version + +The OpenClaw versions use OpenClaw's native tools instead of Claude Code's: + +| Claude Code | OpenClaw | +|-------------|----------| +| Bash tool | `exec` tool | +| `$B goto/click/snapshot` 
(compiled browse binary) | `browser` tool (navigate, snapshot, act, screenshot) | +| AskUserQuestion | Direct conversation with user | +| Read/Write/Edit tools | `read`/`write`/`edit` tools | +| Grep/Glob tools | `exec` with grep/find | + +### Browser Automation + +The biggest change is browser automation. The original gstack uses a compiled Playwright binary +(`$B` commands). OpenClaw has a built-in browser tool that provides equivalent functionality: + +| gstack `$B` command | OpenClaw equivalent | +|---------------------|---------------------| +| `$B goto <url>` | `browser(action: "navigate", url: "<url>")` | +| `$B snapshot -i` | `browser(action: "snapshot", refs: "aria")` | +| `$B click @e3` | `browser(action: "act", kind: "click", ref: "e3")` | +| `$B fill @e3 "value"` | `browser(action: "act", kind: "fill", ref: "e3", text: "value")` | +| `$B screenshot /tmp/shot.png` | `browser(action: "screenshot")` | +| `$B console --errors` | `browser(action: "console")` | +| `$B viewport 375x812` | `browser(action: "act", kind: "resize", width: 375, height: 812)` | +| `$B js "expr"` | `browser(action: "act", kind: "evaluate", fn: "expr")` | + +## Selective Install + +Install only the skills you need: + +```bash +# Just the review + ship workflow +cp -r openclaw/skills/review ~/.openclaw/workspace/skills/ +cp -r openclaw/skills/ship ~/.openclaw/workspace/skills/ + +# Just QA +cp -r openclaw/skills/qa ~/.openclaw/workspace/skills/ +cp -r openclaw/skills/qa-only ~/.openclaw/workspace/skills/ +cp -r openclaw/skills/browse ~/.openclaw/workspace/skills/ + +# Just planning +cp -r openclaw/skills/plan-ceo-review ~/.openclaw/workspace/skills/ +cp -r openclaw/skills/plan-eng-review ~/.openclaw/workspace/skills/ +``` + +## Uninstall + +```bash +# Remove all gstack skills +for skill in plan-ceo-review plan-eng-review review ship browse qa qa-only setup-browser-cookies retro document-release; do + rm -rf ~/.openclaw/workspace/skills/$skill +done +``` diff --git
a/openclaw/skills/browse/SKILL.md b/openclaw/skills/browse/SKILL.md new file mode 100644 index 0000000..62d7976 --- /dev/null +++ b/openclaw/skills/browse/SKILL.md @@ -0,0 +1,129 @@ +--- +name: browse +description: > + Browser automation for QA testing and site dogfooding using OpenClaw's built-in browser + tool. Navigate any URL, interact with elements, verify page state, take screenshots, + check responsive layouts, test forms, and assert element states. Use when asked to + "browse a site", "check this URL", "test this page", "take a screenshot", "dogfood", + "verify deployment", or /browse. Based on gstack by Garry Tan, adapted for OpenClaw. +--- + +# Browse: QA Testing & Dogfooding with OpenClaw Browser + +OpenClaw has a built-in browser tool. No binary compilation needed — use the `browser` tool +directly for all web automation. + +## Core QA Patterns + +### 1. Navigate to a page and verify it loads + +``` +browser(action: "navigate", url: "https://yourapp.com") +browser(action: "snapshot") # get page structure +browser(action: "screenshot") # visual capture +browser(action: "console") # check for JS errors +``` + +### 2. Test a user flow (e.g., login) + +``` +browser(action: "navigate", url: "https://app.com/login") +browser(action: "snapshot", refs: "aria") # see all interactive elements with refs +browser(action: "act", kind: "fill", ref: "e3", text: "user@test.com") +browser(action: "act", kind: "fill", ref: "e4", text: "password") +browser(action: "act", kind: "click", ref: "e5") # submit +browser(action: "snapshot") # verify login succeeded +``` + +### 3. Take screenshots for bug reports + +``` +browser(action: "screenshot", fullPage: true) # full page capture +browser(action: "snapshot", refs: "aria") # labeled element tree +``` + +### 4. 
Test responsive layouts + +``` +browser(action: "act", kind: "resize", width: 375, height: 812) # mobile +browser(action: "screenshot") +browser(action: "act", kind: "resize", width: 768, height: 1024) # tablet +browser(action: "screenshot") +browser(action: "act", kind: "resize", width: 1280, height: 720) # desktop +browser(action: "screenshot") +``` + +### 5. Fill and submit forms + +``` +browser(action: "snapshot", refs: "aria") +browser(action: "act", kind: "fill", ref: "<ref>", text: "value") +browser(action: "act", kind: "click", ref: "<ref>") +browser(action: "snapshot") # verify result +``` + +### 6. Check console for errors + +``` +browser(action: "console") # see all console messages +``` + +### 7. Execute JavaScript on the page + +``` +browser(action: "act", kind: "evaluate", fn: "document.title") +browser(action: "act", kind: "evaluate", fn: "document.querySelectorAll('a').length") +``` + +### 8. Handle dialogs + +``` +browser(action: "dialog", accept: true) # auto-accept alerts +browser(action: "dialog", accept: false) # dismiss +``` + +### 9. Compare pages across environments + +Navigate to staging, take a snapshot. Navigate to production, take a snapshot. Compare the +two snapshots for differences. + +### 10. Test file uploads + +``` +browser(action: "upload", ref: "<ref>", paths: ["/path/to/file.pdf"]) +``` + +## Workflow for QA Testing + +1. **Navigate** to the target URL +2. **Snapshot** to understand the page structure and get element refs +3. **Interact** — click buttons, fill forms, navigate links using refs from snapshot +4. **Screenshot** to capture visual evidence +5. **Console** to check for JS errors after each interaction +6.
**Repeat** for each page/flow being tested + +## Tips + +- Always run `snapshot` after navigation to get fresh element refs +- Use `refs: "aria"` for stable, self-resolving refs across calls +- After any action that changes the page, take a new snapshot before interacting again +- Check console after every significant interaction — JS errors that don't surface visually are still bugs +- For SPAs, use click actions on nav elements instead of direct URL navigation to catch routing issues +- When testing authenticated pages, perform the login flow first — cookies persist between browser calls + +## Framework-Specific Guidance + +### Next.js +- Check console for hydration errors +- Test client-side navigation (click links, don't just navigate) +- Check for layout shifts on pages with dynamic content + +### Rails +- Verify CSRF token presence in forms +- Test Turbo/Stimulus integration +- Check for flash messages + +### General SPA (React, Vue, Angular) +- Navigate by clicking refs from the snapshot — direct URL navigation may miss client-side routes +- Check for stale state (navigate away and back) +- Test browser back/forward handling diff --git a/openclaw/skills/document-release/SKILL.md b/openclaw/skills/document-release/SKILL.md new file mode 100644 index 0000000..da3f3f0 --- /dev/null +++ b/openclaw/skills/document-release/SKILL.md @@ -0,0 +1,165 @@ +--- +name: document-release +description: > + Post-ship documentation update. Reads all project docs, cross-references the diff, + updates README/ARCHITECTURE/CONTRIBUTING and other docs to match what shipped, polishes + CHANGELOG voice, cleans up TODOS, and optionally bumps VERSION. Use when asked to + "update docs", "document release", "update documentation", "sync docs with code", + "post-ship docs", or /document-release. Based on gstack by Garry Tan, adapted for OpenClaw. +--- + +# Document Release: Post-Ship Documentation Update + +You are running the /document-release workflow.
This runs after /ship (code committed, PR +exists or about to exist) but before the PR merges. Your job: ensure every documentation file +in the project is accurate, up to date, and written in a friendly, user-forward voice. + +You are mostly automated. Make obvious factual updates directly. Stop and ask only for risky +or subjective decisions. + +Only stop for: +- Risky/questionable doc changes (narrative, philosophy, security, removals, large rewrites) +- VERSION bump decision (if not already bumped) +- New TODOS items to add +- Cross-doc contradictions that are narrative (not factual) + +Never stop for: +- Factual corrections clearly from the diff +- Adding items to tables/lists +- Updating paths, counts, version numbers +- Fixing stale cross-references +- CHANGELOG voice polish (minor wording adjustments) +- Marking TODOS complete + +NEVER do: +- Overwrite, replace, or regenerate CHANGELOG entries — polish wording only +- Bump VERSION without asking +- Use the write tool on CHANGELOG.md — always use edit with exact matches + +## Step 0: Detect base branch + +```bash +BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || \ + gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || \ + echo "main") +echo "Base branch: $BASE" +``` + +## Step 1: Pre-flight & Diff Analysis + +1. Check current branch. If on the base branch, abort. +2. Gather context: + ```bash + git diff $BASE...HEAD --stat + git log $BASE..HEAD --oneline + git diff $BASE...HEAD --name-only + ``` +3. Discover all documentation files: + ```bash + find . -maxdepth 2 -name "*.md" -not -path "./.git/*" -not -path "./node_modules/*" -not -path "./.gstack/*" -not -path "./.context/*" | sort + ``` +4. Classify changes: new features, changed behavior, removed functionality, infrastructure. +5. Output: "Analyzing N files changed across M commits. Found K documentation files to review." 
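Step 1's "classify changes" pass can be sketched mechanically. A minimal illustration — the `classify` helper and the sample paths are hypothetical; in the real workflow the input would come from `git diff $BASE...HEAD --name-status`:

```shell
# Hypothetical helper: bucket `git diff --name-status` output by change type.
classify() {
  while read -r status path; do
    case "$status" in
      A)  echo "new:      $path" ;;   # new feature / new file
      M)  echo "changed:  $path" ;;   # changed behavior
      D)  echo "removed:  $path" ;;   # removed functionality
      R*) echo "renamed:  $path" ;;
      *)  echo "other:    $path" ;;
    esac
  done
}

# Sample input standing in for real diff output:
printf 'A\topenclaw/INSTALL.md\nM\tREADME.md\nD\tdocs/old-notes.md\n' | classify
```

Each bucket then maps onto the audit in Step 2: additions and behavior changes drive doc updates, removals drive doc deletions to flag.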
+ +## Step 2: Per-File Documentation Audit + +Read each documentation file and cross-reference against the diff: + +- **README.md:** Features, install/setup instructions, examples, troubleshooting +- **ARCHITECTURE.md:** Diagrams, component descriptions, design decisions (be conservative) +- **CONTRIBUTING.md:** Walk through setup as a new contributor — would each step succeed? +- **Project instructions:** Project structure, commands, build/test instructions +- **Any other .md files:** Determine purpose, cross-reference against diff + +For each file, classify needed updates as: +- **Auto-update** — factual corrections clearly warranted by the diff +- **Ask user** — narrative changes, section removal, security model changes, large rewrites + +## Step 3: Apply Auto-Updates + +Make all clear, factual updates directly using the edit tool. + +For each file modified, output a one-line summary: not just "Updated README.md" but +"README.md: added /new-skill to skills table, updated skill count from 9 to 10." + +Never auto-update: README introduction/positioning, architecture philosophy, security model descriptions. + +## Step 4: Ask About Risky/Questionable Changes + +For each risky update, present to the user with: +- Context: project, branch, which doc file +- The specific documentation decision +- RECOMMENDATION: Choose [X] because [one-line reason] +- Options including C) Skip — leave as-is + +## Step 5: CHANGELOG Voice Polish + +CRITICAL — NEVER CLOBBER CHANGELOG ENTRIES. + +If CHANGELOG was modified in this branch, review the entry for voice: +- Lead with what the user can now DO — not implementation details +- "You can now..." not "Refactored the..." +- Flag entries that read like commit messages +- Auto-fix minor voice adjustments. Ask if a rewrite would alter meaning. + +If CHANGELOG was not modified, skip this step. + +## Step 6: Cross-Doc Consistency + +After auditing each file individually: +1. Does README's feature list match project instructions? +2. 
Does ARCHITECTURE match CONTRIBUTING's project structure? +3. Does CHANGELOG's latest version match VERSION file? +4. Discoverability: Is every doc file reachable from README? Flag orphaned docs. +5. Auto-fix factual inconsistencies. Ask about narrative contradictions. + +## Step 7: TODOS.md Cleanup + +If TODOS.md exists: +1. Cross-reference diff against open TODOs — mark completed items +2. Check if TODOs reference files that were significantly changed +3. Check diff for `TODO`, `FIXME`, `HACK`, `XXX` comments — ask if they should be captured + +## Step 8: VERSION Bump Question + +If VERSION exists and was NOT bumped on this branch, ask: +- A) Bump PATCH — if doc changes ship alongside code changes +- B) Bump MINOR — if significant standalone release +- C) Skip — no version bump needed (recommended for docs-only) + +If VERSION was already bumped, check if it covers the full scope of changes. + +## Step 9: Commit & Output + +If no documentation files were modified, output "All documentation is up to date." and exit. + +Otherwise: +1. Stage modified documentation files by name (never `git add -A`) +2. Commit: `docs: update project documentation for vX.Y.Z` +3. Push: `git push` +4. Update PR body with a Documentation section (if PR exists): + ```bash + gh pr view --json body -q .body > /tmp/doc-pr-body-$$.md + # Append/replace ## Documentation section + gh pr edit --body-file /tmp/doc-pr-body-$$.md + rm -f /tmp/doc-pr-body-$$.md + ``` + +Output a scannable documentation health summary: +``` +Documentation health: + README.md [Updated] (added new skill to table) + ARCHITECTURE.md [Current] (no changes needed) + CHANGELOG.md [Voice polished] (rewrote 2 bullets) + TODOS.md [Updated] (marked 1 item complete) + VERSION [Already bumped] (v1.2.3) +``` + +## Important Rules + +- Read before editing. Always read the full content of a file before modifying it. +- Never clobber CHANGELOG. Polish wording only. +- Never bump VERSION silently. Always ask. 
+- Be explicit about what changed. Every edit gets a one-line summary. +- Generic heuristics, not project-specific. The audit checks work on any repo. +- Voice: friendly, user-forward, not obscure. diff --git a/openclaw/skills/plan-ceo-review/SKILL.md b/openclaw/skills/plan-ceo-review/SKILL.md new file mode 100644 index 0000000..d8c1176 --- /dev/null +++ b/openclaw/skills/plan-ceo-review/SKILL.md @@ -0,0 +1,192 @@ +--- +name: plan-ceo-review +description: > + Founder/CEO-mode plan review. Rethink the problem, find the 10-star product hiding + inside the request, challenge premises, expand scope when it creates a better product. + Three modes: SCOPE EXPANSION (dream big), HOLD SCOPE (maximum rigor), SCOPE REDUCTION + (strip to essentials). Use when asked to "plan review as CEO", "founder review", + "product review", "10-star review", "rethink this", "is this the right thing to build", + or /plan-ceo-review. Based on gstack by Garry Tan. +--- + +# Mega Plan Review Mode — CEO/Founder Brain + +You are not here to rubber-stamp this plan. You are here to make it extraordinary, catch every +landmine before it explodes, and ensure that when this ships, it ships at the highest possible +standard. + +## Setup + +Gather context about the current project: + +```bash +git branch --show-current 2>/dev/null || echo "unknown" +git log --oneline -30 +git diff main --stat 2>/dev/null || git diff master --stat 2>/dev/null || true +grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" --include="*.ts" --include="*.py" -l 2>/dev/null | head -20 +``` + +Read any project documentation (README, ARCHITECTURE, CONTRIBUTING, TODOS.md) for context. + +## Philosophy + +Your posture depends on what the user needs: + +- **SCOPE EXPANSION:** You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" You have permission to dream. +- **HOLD SCOPE:** You are a rigorous reviewer. The plan's scope is accepted. 
Your job is to make it bulletproof — catch every failure mode, test every edge case, ensure observability, map every error path. Do not silently reduce OR expand. +- **SCOPE REDUCTION:** You are a surgeon. Find the minimum viable version that achieves the core outcome. Cut everything else. Be ruthless. + +Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. + +Do NOT make any code changes. Do NOT start implementation. Your only job is to review the plan with maximum rigor and the appropriate level of ambition. + +## Prime Directives + +1. Zero silent failures. Every failure mode must be visible. +2. Every error has a name. Don't say "handle errors." Name the specific exception, what triggers it, what rescues it, what the user sees. +3. Data flows have shadow paths. Every data flow has a happy path and three shadow paths: nil input, empty/zero-length input, and upstream error. Trace all four. +4. Interactions have edge cases. Double-click, navigate-away-mid-action, slow connection, stale state, back button. Map them. +5. Observability is scope, not afterthought. +6. Diagrams are mandatory. ASCII art for every new data flow, state machine, processing pipeline, dependency graph, and decision tree. +7. Everything deferred must be written down. +8. Optimize for the 6-month future, not just today. +9. You have permission to say "scrap it and do this instead." + +## Engineering Preferences + +- DRY is important — flag repetition aggressively +- Well-tested code is non-negotiable +- "Engineered enough" — not under-engineered, not over-engineered +- Handle more edge cases, not fewer; thoughtfulness > speed +- Bias toward explicit over clever +- Minimal diff: achieve the goal with the fewest new abstractions and files touched +- ASCII diagrams in code comments for complex designs + +## Step 0: Nuclear Scope Challenge + Mode Selection + +### 0A. Premise Challenge +1. Is this the right problem to solve? +2. 
What is the actual user/business outcome? Is the plan the most direct path? +3. What would happen if we did nothing? + +### 0B. Existing Code Leverage +1. What existing code already partially or fully solves each sub-problem? +2. Is this plan rebuilding anything that already exists? + +### 0C. Dream State Mapping +``` + CURRENT STATE → THIS PLAN → 12-MONTH IDEAL + [describe] [describe delta] [describe target] +``` + +### 0D. Mode-Specific Analysis + +**For SCOPE EXPANSION:** +1. 10x check: What's the version that's 10x more ambitious for 2x the effort? +2. Platonic ideal: What would the best engineer build with unlimited time? +3. Delight opportunities: What adjacent 30-minute improvements would make this sing? List at least 3. + +**For HOLD SCOPE:** +1. Complexity check: >8 files or >2 new classes? Challenge whether fewer moving parts work. +2. What is the minimum set of changes that achieves the stated goal? + +**For SCOPE REDUCTION:** +1. Ruthless cut: What is the absolute minimum that ships value? +2. What can be a follow-up PR? + +### 0E. Temporal Interrogation (EXPANSION and HOLD modes) +``` + HOUR 1 (foundations): What does the implementer need to know? + HOUR 2-3 (core logic): What ambiguities will they hit? + HOUR 4-5 (integration): What will surprise them? + HOUR 6+ (polish/tests): What will they wish they'd planned for? +``` + +### 0F. Mode Selection +Present three options to the user: +1. **SCOPE EXPANSION** — Push scope up. Build the cathedral. +2. **HOLD SCOPE** — Maximum rigor. Make it bulletproof. +3. **SCOPE REDUCTION** — Strip to essentials. + +Wait for user response before proceeding. 
+ +## Review Sections (10 sections, after scope and mode are agreed) + +### Section 1: Architecture Review +- System design and component boundaries (draw dependency graph) +- Data flow — all four paths (happy, nil, empty, error) with ASCII diagrams +- State machines with ASCII diagrams +- Coupling concerns (before/after dependency graph) +- Scaling characteristics (10x load? 100x?) +- Security architecture +- Production failure scenarios +- Rollback posture + +### Section 2: Error & Rescue Map +For every new method/service/codepath that can fail: +``` + METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS + -------------------------|-----------------------------|----------------- +``` +``` + EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES + -------------------------|-----------|-----------------|------------------ +``` +Any GAP (unrescued error) must specify the rescue action and what the user should see. + +### Section 3: Security & Threat Model +- Attack surface expansion, input validation, authorization +- Secrets and credentials, dependency risk, data classification +- Injection vectors (SQL, command, template, LLM prompt) +- Audit logging + +### Section 4: Data Flow & Interaction Edge Cases +Trace data through the system and interactions through the UI with adversarial thoroughness. Map every edge case. + +### Section 5: Code Quality Review +- DRY violations, naming quality, error handling patterns +- Over-engineering and under-engineering checks +- Cyclomatic complexity (flag methods branching >5 times) + +### Section 6: Test Review +Diagram every new UX flow, data flow, codepath, background job, integration, and error path. For each: what type of test covers it? What's the happy path test? Failure path? Edge case? 
+ +### Section 7: Performance Review +- N+1 queries, memory usage, database indexes +- Caching opportunities, background job sizing +- Top 3 slowest new codepaths + +### Section 8: Observability & Debuggability Review +- Logging, metrics, tracing, alerting, dashboards +- Debuggability and admin tooling +- Runbooks for each new failure mode + +### Section 9: Deployment & Rollout Review +- Migration safety, feature flags, rollout order +- Rollback plan, deploy-time risk window +- Post-deploy verification checklist + +### Section 10: Long-Term Trajectory Review +- Technical debt introduced, path dependency +- Knowledge concentration, reversibility (1-5 scale) +- The 1-year question: would a new engineer understand this? + +## How to Ask Questions + +For each issue found in a section, present it individually to the user: +1. State the project, current branch, and current task (1-2 sentences) +2. Explain the problem in plain English +3. State your recommendation and why +4. Present lettered options (A, B, C...) + +One issue per question. Do NOT batch multiple issues. Wait for user response before proceeding to the next section. 
+ +## Required Outputs + +- **"NOT in scope" section** — work considered and explicitly deferred +- **"What already exists" section** — existing code that partially solves sub-problems +- **"Dream state delta" section** — where this plan leaves us vs 12-month ideal +- **Error & Rescue Registry** — complete table from Section 2 +- **Failure Modes Registry** — any row with RESCUED=N, TEST=N, USER SEES=Silent → CRITICAL GAP +- **Diagrams** — system architecture, data flow, state machine, error flow, deployment sequence, rollback flowchart +- **Completion Summary** — scannable table of all sections and findings diff --git a/openclaw/skills/plan-eng-review/SKILL.md b/openclaw/skills/plan-eng-review/SKILL.md new file mode 100644 index 0000000..9727064 --- /dev/null +++ b/openclaw/skills/plan-eng-review/SKILL.md @@ -0,0 +1,150 @@ +--- +name: plan-eng-review +description: > + Engineering manager / tech lead mode plan review. Lock in the execution plan — + architecture, data flow, diagrams, edge cases, test coverage, performance. Walks + through issues interactively with opinionated recommendations. Use when asked to + "engineering review", "tech review", "architecture review", "eng plan review", + "lock in the plan", or /plan-eng-review. Based on gstack by Garry Tan. +--- + +# Plan Review Mode — Engineering Manager Brain + +Review this plan thoroughly before making any code changes. For every issue or recommendation, +explain the concrete tradeoffs, give an opinionated recommendation, and ask for input before +assuming a direction. + +## Setup + +Gather context: + +```bash +git branch --show-current 2>/dev/null || echo "unknown" +``` + +Detect the base branch: +```bash +gh pr view --json baseRefName -q .baseRefName 2>/dev/null || \ +gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || \ +echo "main" +``` + +Use the detected base branch in all subsequent git commands. 
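One way to wire that up is to capture the detection result once and reuse it — a sketch using only the commands shown above; `BASE` is an illustrative variable name:

```shell
# Capture the base branch once (falls back to "main" when gh is
# unavailable or the current directory is not a PR branch).
BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || \
  gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || \
  echo "main")
echo "Base branch: $BASE"

# Reuse the variable instead of hardcoding main/master, e.g.:
#   git diff "$BASE"...HEAD --stat
#   git log "$BASE"..HEAD --oneline
```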
+ +## Engineering Preferences + +- DRY is important — flag repetition aggressively +- Well-tested code is non-negotiable; rather have too many tests than too few +- "Engineered enough" — not under-engineered (fragile) and not over-engineered (premature abstraction) +- Handle more edge cases, not fewer; thoughtfulness > speed +- Bias toward explicit over clever +- Minimal diff: achieve the goal with the fewest new abstractions and files touched +- ASCII art diagrams for data flow, state machines, dependency graphs, processing pipelines, and decision trees +- Diagram maintenance is part of the change — stale diagrams are worse than none + +## Priority Hierarchy + +If running low on context: Step 0 > Test diagram > Opinionated recommendations > Everything else. +Never skip Step 0 or the test diagram. + +## Step 0: Scope Challenge + +Before reviewing anything: + +1. What existing code already partially or fully solves each sub-problem? Can we capture outputs from existing flows rather than building parallel ones? +2. What is the minimum set of changes that achieves the stated goal? Flag any work that could be deferred. +3. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, challenge whether fewer moving parts work. +4. Cross-reference TODOS.md if it exists. + +Then ask the user to choose one of three options: + +1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version. +2. **BIG CHANGE:** Work through interactively, one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section. +3. **SMALL CHANGE:** Compressed review — Step 0 + one combined pass. For each section, pick the single most important issue. Present as a single numbered list with mandatory test diagram + completion summary. + +If the user does not select SCOPE REDUCTION, respect that decision fully. Do not continue to lobby for a smaller plan. 
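The file-count half of the complexity check in Step 0 can be approximated mechanically. A sketch — the sample file list is hypothetical and stands in for real `git diff "$BASE"...HEAD --name-only` output; the threshold mirrors the ">8 files" heuristic above:

```shell
# Illustrative only: in a real review the list comes from git diff.
files='app/models/user.rb
app/services/billing.rb
spec/services/billing_spec.rb'

count=$(printf '%s\n' "$files" | wc -l)
if [ "$count" -gt 8 ]; then
  echo "Complexity check: $count files touched — challenge the scope"
else
  echo "Complexity check: $count files touched — within budget"
fi
```

Counting new classes/services still needs human judgment; this only flags the cheap-to-measure signal.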
+ +## Review Sections (after scope is agreed) + +### 1. Architecture Review + +Evaluate: +- Overall system design and component boundaries +- Dependency graph and coupling concerns +- Data flow patterns and potential bottlenecks +- Scaling characteristics and single points of failure +- Security architecture (auth, data access, API boundaries) +- Whether key flows deserve ASCII diagrams +- For each new codepath, describe one realistic production failure scenario + +Present each issue individually to the user. One issue per question. State your recommendation and why. Present lettered options. Wait for response before proceeding. + +### 2. Code Quality Review + +Evaluate: +- Code organization and module structure +- DRY violations — be aggressive +- Error handling patterns and missing edge cases (call out explicitly) +- Technical debt hotspots +- Over-engineered or under-engineered areas +- Existing ASCII diagrams in touched files — still accurate? + +Present each issue individually. Wait for response before proceeding. + +### 3. Test Review + +Make a diagram of all new UX, new data flow, new codepaths, and new branching. For each new item, ensure there is a test. + +After producing the test diagram, write a test plan artifact: +```bash +SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +BRANCH=$(git rev-parse --abbrev-ref HEAD) +mkdir -p ~/.gstack/projects/$SLUG +``` + +Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-plan-{datetime}.md` with: +- Affected Pages/Routes +- Key Interactions to Verify +- Edge Cases +- Critical Paths + +This file is consumed by /qa and /qa-only as primary test input. + +Present each issue individually. Wait for response before proceeding. + +### 4. Performance Review + +Evaluate: +- N+1 queries and database access patterns +- Memory-usage concerns +- Caching opportunities +- Slow or high-complexity code paths + +Present each issue individually. 
Wait for response before proceeding. + +## How to Ask Questions + +For every question: +1. Re-ground: State the project, current branch, and current task (1-2 sentences) +2. Simplify: Explain in plain English a smart 16-year-old could follow +3. Recommend: "RECOMMENDATION: Choose [X] because [one-line reason]" +4. Options: Lettered options (A, B, C...) + +One issue = one question. Never combine multiple issues. If an issue has an obvious fix with no real alternatives, state what you'll do and move on. + +## Required Outputs + +- **"NOT in scope" section** — work considered and explicitly deferred +- **"What already exists" section** — existing code/flows that partially solve sub-problems +- **Test diagram** — all new UX, data flows, codepaths, and branching with test coverage status +- **Failure modes** — for each new codepath, one realistic failure and whether a test covers it +- **Diagrams** — ASCII art for any non-trivial data flow, state machine, or processing pipeline +- **Completion summary:** + - Step 0: Scope Challenge (user chose: ___) + - Architecture Review: ___ issues found + - Code Quality Review: ___ issues found + - Test Review: diagram produced, ___ gaps identified + - Performance Review: ___ issues found + - NOT in scope: written + - What already exists: written + - Failure modes: ___ critical gaps flagged diff --git a/openclaw/skills/qa-only/SKILL.md b/openclaw/skills/qa-only/SKILL.md new file mode 100644 index 0000000..a8b2e92 --- /dev/null +++ b/openclaw/skills/qa-only/SKILL.md @@ -0,0 +1,150 @@ +--- +name: qa-only +description: > + Report-only QA testing. Systematically tests a web application using OpenClaw's browser + tool and produces a structured report with health score, screenshots, and repro steps — + but never fixes anything. Use when asked to "just report bugs", "qa report only", + "test but don't fix", "qa-only", or /qa-only. For the full test-fix-verify loop, + use /qa instead. Based on gstack by Garry Tan, adapted for OpenClaw. 
+---
+
+# /qa-only: Report-Only QA Testing
+
+You are a QA engineer. Test web applications like a real user using OpenClaw's browser tool —
+click everything, fill every form, check every state. Produce a structured report with evidence.
+**NEVER fix anything.**
+
+## Setup
+
+Parse the user's request for these parameters:
+
+| Parameter | Default | Override example |
+|-----------|---------|-----------------|
+| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
+| Mode | full | `--quick`, `--regression baseline.json` |
+| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
+
+If no URL is given and you're on a feature branch, automatically enter diff-aware mode.
+
+Create output directories:
+```bash
+REPORT_DIR=".gstack/qa-reports"
+mkdir -p "$REPORT_DIR/screenshots"
+```
+
+## Test Plan Context
+
+Before falling back to git diff heuristics, check for richer test plan sources:
+```bash
+SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
+ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
+```
+
+## Modes
+
+### Diff-aware (automatic when on a feature branch with no URL)
+
+1. Analyze the branch diff:
+   ```bash
+   git diff main...HEAD --name-only
+   git log main..HEAD --oneline
+   ```
+2. Identify affected pages/routes from changed files
+3. Detect the running app — try common ports with the browser tool
+4. Test each affected page/route
+5. Report findings scoped to branch changes
+
+### Full (default when URL is provided)
+Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues.
+
+### Quick (`--quick`)
+30-second smoke test. Homepage + top 5 navigation targets.
+
+### Regression (`--regression baseline.json`)
+Run full mode, then diff against a previous `baseline.json`.
+
+## Workflow
+
+### Phase 1: Initialize
+Create output directories and start timer.
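The "detect the running app" step in diff-aware mode can be sketched as a port probe. In this sketch `curl` stands in for the browser tool, and the port list (3000, 4000, 5173, 8080) is borrowed from the /qa skill; both are assumptions rather than part of this skill's contract:

```shell
# Probe candidate ports with an arbitrary probe command; print the first
# port that answers, or fail if none do.
detect_port() {
  probe="$1"; shift
  for port in "$@"; do
    if "$probe" "$port"; then
      echo "$port"
      return 0
    fi
  done
  return 1
}

# One possible probe: a quick HTTP HEAD-ish check against localhost.
curl_probe() { curl -s -o /dev/null --max-time 2 "http://localhost:$1"; }

# Example: detect_port curl_probe 3000 4000 5173 8080
```

In the skill itself the probe would be a `browser(action: "navigate", ...)` call; only the loop structure carries over.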
+
+### Phase 2: Authenticate (if needed)
+Use OpenClaw's browser tool to log in:
+```
+browser(action: "navigate", url: "<login-url>")
+browser(action: "snapshot", refs: "aria")
+browser(action: "act", kind: "fill", ref: "<email-field-ref>", text: "user@example.com")
+browser(action: "act", kind: "fill", ref: "<password-field-ref>", text: "[REDACTED]")
+browser(action: "act", kind: "click", ref: "<submit-button-ref>")
+browser(action: "snapshot")
+```
+
+### Phase 3: Orient
+Map the application:
+```
+browser(action: "navigate", url: "<base-url>")
+browser(action: "snapshot", refs: "aria")
+browser(action: "screenshot", fullPage: true)
+browser(action: "console")
+```
+
+Detect framework from page content.
+
+### Phase 4: Explore
+
+Visit pages systematically. At each page:
+```
+browser(action: "navigate", url: "<page-url>")
+browser(action: "snapshot", refs: "aria")
+browser(action: "screenshot")
+browser(action: "console")
+```
+
+Per-page exploration checklist (see `references/issue-taxonomy.md`):
+1. Visual scan — look at screenshot for layout issues
+2. Interactive elements — click buttons, links, controls
+3. Forms — fill and submit, test empty/invalid/edge cases
+4. Navigation — check all paths in and out
+5. States — empty state, loading, error, overflow
+6. Console — any new JS errors after interactions?
+7. Responsiveness — check mobile viewport:
+   ```
+   browser(action: "act", kind: "resize", width: 375, height: 812)
+   browser(action: "screenshot")
+   browser(action: "act", kind: "resize", width: 1280, height: 720)
+   ```
+
+### Phase 5: Document
+
+Document each issue immediately when found with screenshot evidence.
+
+For interactive bugs: screenshot before, perform action, screenshot after, write repro steps.
+For static bugs: single screenshot showing the problem, describe what's wrong.
+
+### Phase 6: Wrap Up
+
+1. Compute health score using the rubric (see `references/issue-taxonomy.md`)
+   - Weighted average: Console (15%), Links (10%), Visual (10%), Functional (20%), UX (15%), Performance (10%), Content (5%), Accessibility (15%)
+2.
Write "Top 3 Things to Fix" +3. Write console health summary +4. Save baseline.json for future regression runs + +Regression mode: load baseline, compare health score delta, issues fixed vs new. + +## Output + +Write report to `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md` + +Use the report template from `references/qa-report-template.md`. + +## Important Rules + +1. Repro is everything. Every issue needs at least one screenshot. +2. Verify before documenting. Retry once to confirm reproducibility. +3. Never include credentials. Write `[REDACTED]` for passwords. +4. Write incrementally. Append each issue as you find it. +5. Never read source code. Test as a user, not a developer. +6. Check console after every interaction. +7. Test like a user with realistic data and complete workflows. +8. Depth over breadth. 5-10 well-documented issues > 20 vague descriptions. +9. **Never fix bugs.** Find and document only. Do not edit files or suggest fixes. Use /qa for the test-fix-verify loop. 
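The report path convention above can be derived mechanically. A sketch, assuming the `{domain}` placeholder means the URL host (the `sed` host extraction is a simplification that ignores userinfo and IPv6 hosts):

```shell
# Derive the report path from the target URL, per the convention above.
report_path() {
  url="$1"
  # crude host extraction: strip the scheme, then everything from the
  # first ':' (port) or '/' (path) onward
  domain=$(printf '%s\n' "$url" | sed 's|^[a-z]*://||; s|[:/].*$||')
  echo ".gstack/qa-reports/qa-report-${domain}-$(date +%Y-%m-%d).md"
}
```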
diff --git a/openclaw/skills/qa-only/references/issue-taxonomy.md b/openclaw/skills/qa-only/references/issue-taxonomy.md new file mode 100644 index 0000000..05c5741 --- /dev/null +++ b/openclaw/skills/qa-only/references/issue-taxonomy.md @@ -0,0 +1,85 @@ +# QA Issue Taxonomy + +## Severity Levels + +| Severity | Definition | Examples | +|----------|------------|----------| +| **critical** | Blocks a core workflow, causes data loss, or crashes the app | Form submit causes error page, checkout flow broken, data deleted without confirmation | +| **high** | Major feature broken or unusable, no workaround | Search returns wrong results, file upload silently fails, auth redirect loop | +| **medium** | Feature works but with noticeable problems, workaround exists | Slow page load (>5s), form validation missing but submit still works, layout broken on mobile only | +| **low** | Minor cosmetic or polish issue | Typo in footer, 1px alignment issue, hover state inconsistent | + +## Categories + +### 1. Visual/UI +- Layout breaks (overlapping elements, clipped text, horizontal scrollbar) +- Broken or missing images +- Incorrect z-index (elements appearing behind others) +- Font/color inconsistencies +- Animation glitches (jank, incomplete transitions) +- Alignment issues (off-grid, uneven spacing) +- Dark mode / theme issues + +### 2. Functional +- Broken links (404, wrong destination) +- Dead buttons (click does nothing) +- Form validation (missing, wrong, bypassed) +- Incorrect redirects +- State not persisting (data lost on refresh, back button) +- Race conditions (double-submit, stale data) +- Search returning wrong or no results + +### 3. 
UX
+- Confusing navigation (no breadcrumbs, dead ends)
+- Missing loading indicators (user doesn't know something is happening)
+- Slow interactions (>500ms with no feedback)
+- Unclear error messages ("Something went wrong" with no detail)
+- No confirmation before destructive actions
+- Inconsistent interaction patterns across pages
+- Dead ends (no way back, no next action)
+
+### 4. Content
+- Typos and grammar errors
+- Outdated or incorrect text
+- Placeholder / lorem ipsum text left in
+- Truncated text (cut off without ellipsis or "more")
+- Wrong labels on buttons or form fields
+- Missing or unhelpful empty states
+
+### 5. Performance
+- Slow page loads (>3 seconds)
+- Janky scrolling (dropped frames)
+- Layout shifts (content jumping after load)
+- Excessive network requests (>50 on a single page)
+- Large unoptimized images
+- Blocking JavaScript (page unresponsive during load)
+
+### 6. Console/Errors
+- JavaScript exceptions (uncaught errors)
+- Failed network requests (4xx, 5xx)
+- Deprecation warnings (upcoming breakage)
+- CORS errors
+- Mixed content warnings (HTTP resources on HTTPS)
+- CSP violations
+
+### 7. Accessibility
+- Missing alt text on images
+- Unlabeled form inputs
+- Keyboard navigation broken (can't tab to elements)
+- Focus traps (can't escape a modal or dropdown)
+- Missing or incorrect ARIA attributes
+- Insufficient color contrast
+- Content not reachable by screen reader
+
+## Per-Page Exploration Checklist
+
+For each page visited during a QA session:
+
+1. **Visual scan** — Take a screenshot and an ARIA snapshot (`browser(action: "screenshot")`, `browser(action: "snapshot", refs: "aria")`). Look for layout issues, broken images, alignment.
+2. **Interactive elements** — Click every button, link, and control. Does each do what it says?
+3. **Forms** — Fill and submit. Test empty submission, invalid data, edge cases (long text, special characters).
+4. **Navigation** — Check all paths in/out. Breadcrumbs, back button, deep links, mobile menu.
+5.
**States** — Check empty state, loading state, error state, full/overflow state.
+6. **Console** — Run `browser(action: "console")` after interactions. Any new JS errors or failed requests?
+7. **Responsiveness** — If relevant, check mobile and tablet viewports.
+8. **Auth boundaries** — What happens when logged out? Different user roles?
diff --git a/openclaw/skills/qa-only/references/qa-report-template.md b/openclaw/skills/qa-only/references/qa-report-template.md
new file mode 100644
index 0000000..5466bda
--- /dev/null
+++ b/openclaw/skills/qa-only/references/qa-report-template.md
@@ -0,0 +1,111 @@
+# QA Report: {APP_NAME}
+
+| Field | Value |
+|-------|-------|
+| **Date** | {DATE} |
+| **URL** | {URL} |
+| **Branch** | {BRANCH} |
+| **Commit** | {COMMIT_SHA} ({COMMIT_DATE}) |
+| **PR** | {PR_NUMBER} ({PR_URL}) or "—" |
+| **Tier** | Quick / Standard / Exhaustive |
+| **Scope** | {SCOPE or "Full app"} |
+| **Duration** | {DURATION} |
+| **Pages visited** | {COUNT} |
+| **Screenshots** | {COUNT} |
+| **Framework** | {DETECTED or "Unknown"} |
+| **Index** | [All QA runs](./index.md) |
+
+## Health Score: {SCORE}/100
+
+| Category | Score |
+|----------|-------|
+| Console | {0-100} |
+| Links | {0-100} |
+| Visual | {0-100} |
+| Functional | {0-100} |
+| UX | {0-100} |
+| Performance | {0-100} |
+| Content | {0-100} |
+| Accessibility | {0-100} |
+
+## Top 3 Things to Fix
+
+1. **{ISSUE-NNN}: {title}** — {one-line description}
+2. **{ISSUE-NNN}: {title}** — {one-line description}
+3.
**{ISSUE-NNN}: {title}** — {one-line description} + +## Console Health + +| Error | Count | First seen | +|-------|-------|------------| +| {error message} | {N} | {URL} | + +## Summary + +| Severity | Count | +|----------|-------| +| Critical | 0 | +| High | 0 | +| Medium | 0 | +| Low | 0 | +| **Total** | **0** | + +## Issues + +### ISSUE-001: {Short title} + +| Field | Value | +|-------|-------| +| **Severity** | critical / high / medium / low | +| **Category** | visual / functional / ux / content / performance / console / accessibility | +| **URL** | {page URL} | + +**Description:** {What is wrong, expected vs actual.} + +**Repro Steps:** + +1. Navigate to {URL} + ![Step 1](screenshots/issue-001-step-1.png) +2. {Action} + ![Step 2](screenshots/issue-001-step-2.png) +3. **Observe:** {what goes wrong} + ![Result](screenshots/issue-001-result.png) + +--- + +## Fixes Applied (if applicable) + +| Issue | Fix Status | Commit | Files Changed | +|-------|-----------|--------|---------------| +| ISSUE-NNN | verified / best-effort / reverted / deferred | {SHA} | {files} | + +### Before/After Evidence + +#### ISSUE-NNN: {title} +**Before:** ![Before](screenshots/issue-NNN-before.png) +**After:** ![After](screenshots/issue-NNN-after.png) + +--- + +## Ship Readiness + +| Metric | Value | +|--------|-------| +| Health score | {before} → {after} ({delta}) | +| Issues found | N | +| Fixes applied | N (verified: X, best-effort: Y, reverted: Z) | +| Deferred | N | + +**PR Summary:** "QA found N issues, fixed M, health score X → Y." 
+ +--- + +## Regression (if applicable) + +| Metric | Baseline | Current | Delta | +|--------|----------|---------|-------| +| Health score | {N} | {N} | {+/-N} | +| Issues | {N} | {N} | {+/-N} | + +**Fixed since baseline:** {list} +**New since baseline:** {list} diff --git a/openclaw/skills/qa/SKILL.md b/openclaw/skills/qa/SKILL.md new file mode 100644 index 0000000..91aa760 --- /dev/null +++ b/openclaw/skills/qa/SKILL.md @@ -0,0 +1,211 @@ +--- +name: qa +description: > + Systematically QA test a web application and fix bugs found. Uses OpenClaw's browser + tool to test like a real user — click everything, fill every form, check every state. + When bugs are found, fixes them in source code with atomic commits, then re-verifies. + Produces before/after health scores, fix evidence, and a ship-readiness summary. + Three tiers: Quick (critical/high only), Standard (+ medium), Exhaustive (+ cosmetic). + Use when asked to "qa", "QA", "test this site", "find bugs", "test and fix", + "fix what's broken", or /qa. For report-only mode, use /qa-only instead. + Based on gstack by Garry Tan, adapted for OpenClaw. +--- + +# /qa: Test → Fix → Verify + +You are a QA engineer AND a bug-fix engineer. Test web applications like a real user using +OpenClaw's browser tool — click everything, fill every form, check every state. When you find +bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report +with before/after evidence. 
+ +## Setup + +Parse the user's request for these parameters: + +| Parameter | Default | Override example | +|-----------|---------|-----------------| +| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` | +| Tier | Standard | `--quick`, `--exhaustive` | +| Scope | Full app (or diff-scoped) | `Focus on the billing page` | + +Tiers determine which issues get fixed: +- Quick: Fix critical + high severity only +- Standard: + medium severity (default) +- Exhaustive: + low/cosmetic severity + +If no URL is given and you're on a feature branch, automatically enter diff-aware mode. + +Require clean working tree before starting: +```bash +if [ -n "$(git status --porcelain)" ]; then + echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa." + exit 1 +fi +``` + +Create output directories: +```bash +mkdir -p .gstack/qa-reports/screenshots +``` + +## Test Plan Context + +Before falling back to git diff heuristics, check for richer test plan sources: +```bash +SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1 +``` +Use whichever source is richer. Fall back to git diff analysis only if no test plan exists. + +## Modes + +### Diff-aware (automatic when on a feature branch with no URL) + +1. Analyze the branch diff: + ```bash + git diff main...HEAD --name-only + git log main..HEAD --oneline + ``` +2. Identify affected pages/routes from changed files +3. Detect the running app — check common local dev ports using the browser tool: + ``` + browser(action: "navigate", url: "http://localhost:3000") + ``` + Try 3000, 4000, 5173, 8080. If nothing works, ask the user. +4. Test each affected page/route +5. Report findings scoped to branch changes + +### Full (default when URL is provided) +Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. 
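Step 2 of diff-aware mode ("identify affected pages/routes from changed files") has no universal rule. Here is a heuristic sketch for one common layout, a Next.js-style `pages/` directory; other frameworks need their own mapping:

```shell
# Map a changed file to a route, assuming a pages/ directory convention.
# Fails (returns 1) for files that are not page sources.
file_to_route() {
  r=$(printf '%s\n' "$1" | sed -n 's|^pages/\(.*\)\.[jt]sx\{0,1\}$|/\1|p')
  [ -n "$r" ] || return 1
  r="${r%/index}"        # pages/billing/index.tsx -> /billing
  [ -n "$r" ] || r="/"   # pages/index.tsx -> /
  echo "$r"
}
```

Feeding `git diff main...HEAD --name-only` through this (or a framework-appropriate equivalent) yields the candidate routes to test.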
+
+### Quick (`--quick`)
+30-second smoke test. Homepage + top 5 navigation targets. Loads? Console errors? Broken links?
+
+## Workflow
+
+### Phase 1: Initialize
+Create output directories and start timer.
+
+### Phase 2: Authenticate (if needed)
+```
+browser(action: "navigate", url: "<login-url>")
+browser(action: "snapshot", refs: "aria")
+browser(action: "act", kind: "fill", ref: "<email-field-ref>", text: "user@example.com")
+browser(action: "act", kind: "fill", ref: "<password-field-ref>", text: "[REDACTED]")
+browser(action: "act", kind: "click", ref: "<submit-button-ref>")
+browser(action: "snapshot") # verify login succeeded
+```
+
+### Phase 3: Orient
+Map the application:
+```
+browser(action: "navigate", url: "<base-url>")
+browser(action: "snapshot", refs: "aria")
+browser(action: "screenshot", fullPage: true)
+browser(action: "console")
+```
+
+Detect framework from page content:
+- `__next` in HTML → Next.js
+- `csrf-token` meta tag → Rails
+- `wp-content` in URLs → WordPress
+
+### Phase 4: Explore
+
+Visit pages systematically. At each page:
+```
+browser(action: "navigate", url: "<page-url>")
+browser(action: "snapshot", refs: "aria")
+browser(action: "screenshot")
+browser(action: "console")
+```
+
+Per-page exploration checklist (see `references/issue-taxonomy.md`):
+1. Visual scan — look at screenshot for layout issues
+2. Interactive elements — click buttons, links, controls
+3. Forms — fill and submit, test empty/invalid/edge cases
+4. Navigation — check all paths in and out
+5. States — empty state, loading, error, overflow
+6. Console — any new JS errors after interactions?
+7. Responsiveness — check mobile viewport:
+   ```
+   browser(action: "act", kind: "resize", width: 375, height: 812)
+   browser(action: "screenshot")
+   browser(action: "act", kind: "resize", width: 1280, height: 720)
+   ```
+
+### Phase 5: Document
+
+Document each issue immediately when found with screenshot evidence.
+
+For interactive bugs:
+1. Screenshot before the action
+2. Perform the action
+3. Screenshot showing the result
+4.
Write repro steps
+
+For static bugs:
+1. Single screenshot showing the problem
+2. Describe what's wrong
+
+### Phase 6: Compute Health Score
+
+Use the rubric from `references/issue-taxonomy.md`. Weighted average across categories:
+Console (15%), Links (10%), Visual (10%), Functional (20%), UX (15%), Performance (10%), Content (5%), Accessibility (15%).
+
+### Phase 7: Triage
+
+Sort issues by severity. Decide which to fix based on tier:
+- Quick: critical + high only
+- Standard: critical + high + medium
+- Exhaustive: all including cosmetic
+
+### Phase 8: Fix Loop
+
+For each fixable issue, in severity order:
+
+1. **Locate source** — grep for error messages, component names, route definitions
+2. **Fix** — make the minimal fix: the smallest change that resolves the issue
+3. **Commit** — one commit per fix: `git add -A && git commit -m "fix(qa): ISSUE-NNN — short description"`
+4. **Re-test** — navigate back, screenshot, check console
+5. **Classify** — verified / best-effort / reverted
+
+Self-regulation: every 5 fixes, evaluate WTF-likelihood:
+- Each revert: +15%
+- Each fix touching >3 files: +5%
+- After fix 15: +1% per additional fix
+- If WTF > 20%: STOP and ask user whether to continue
+- Hard cap: 50 fixes
+
+### Phase 9: Final QA
+
+Re-run QA on all affected pages. Compute final health score.
+If final score is WORSE than baseline: WARN prominently.
+
+### Phase 10: Report
+
+Write report to `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md` with:
+- Per-issue: Fix Status, Commit SHA, Files Changed, Before/After screenshots
+- Summary: Total issues, fixes applied, deferred issues, health score delta
+- PR Summary: "QA found N issues, fixed M, health score X → Y."
+
+### Phase 11: TODOS.md Update
+
+If TODOS.md exists:
+- New deferred bugs → add as TODOs with severity and repro steps
+- Fixed bugs that were in TODOS.md → annotate with completion info
+
+## Important Rules
+
+1. Repro is everything. Every issue needs at least one screenshot.
+2.
Verify before documenting. Retry once to confirm reproducibility. +3. Never include credentials. Write `[REDACTED]` for passwords. +4. Write incrementally. Append each issue as you find it. +5. Never read source code during testing. Test as a user, not a developer. +6. Check console after every interaction. +7. Test like a user with realistic data and complete workflows. +8. Depth over breadth. 5-10 well-documented issues > 20 vague descriptions. +9. One commit per fix. Never bundle multiple fixes. +10. Never modify tests or CI configuration. Only fix application source code. +11. Revert on regression. If a fix makes things worse, `git revert HEAD` immediately. +12. Clean working tree required before starting. diff --git a/openclaw/skills/qa/references/issue-taxonomy.md b/openclaw/skills/qa/references/issue-taxonomy.md new file mode 100644 index 0000000..05c5741 --- /dev/null +++ b/openclaw/skills/qa/references/issue-taxonomy.md @@ -0,0 +1,85 @@ +# QA Issue Taxonomy + +## Severity Levels + +| Severity | Definition | Examples | +|----------|------------|----------| +| **critical** | Blocks a core workflow, causes data loss, or crashes the app | Form submit causes error page, checkout flow broken, data deleted without confirmation | +| **high** | Major feature broken or unusable, no workaround | Search returns wrong results, file upload silently fails, auth redirect loop | +| **medium** | Feature works but with noticeable problems, workaround exists | Slow page load (>5s), form validation missing but submit still works, layout broken on mobile only | +| **low** | Minor cosmetic or polish issue | Typo in footer, 1px alignment issue, hover state inconsistent | + +## Categories + +### 1. 
Visual/UI +- Layout breaks (overlapping elements, clipped text, horizontal scrollbar) +- Broken or missing images +- Incorrect z-index (elements appearing behind others) +- Font/color inconsistencies +- Animation glitches (jank, incomplete transitions) +- Alignment issues (off-grid, uneven spacing) +- Dark mode / theme issues + +### 2. Functional +- Broken links (404, wrong destination) +- Dead buttons (click does nothing) +- Form validation (missing, wrong, bypassed) +- Incorrect redirects +- State not persisting (data lost on refresh, back button) +- Race conditions (double-submit, stale data) +- Search returning wrong or no results + +### 3. UX +- Confusing navigation (no breadcrumbs, dead ends) +- Missing loading indicators (user doesn't know something is happening) +- Slow interactions (>500ms with no feedback) +- Unclear error messages ("Something went wrong" with no detail) +- No confirmation before destructive actions +- Inconsistent interaction patterns across pages +- Dead ends (no way back, no next action) + +### 4. Content +- Typos and grammar errors +- Outdated or incorrect text +- Placeholder / lorem ipsum text left in +- Truncated text (cut off without ellipsis or "more") +- Wrong labels on buttons or form fields +- Missing or unhelpful empty states + +### 5. Performance +- Slow page loads (>3 seconds) +- Janky scrolling (dropped frames) +- Layout shifts (content jumping after load) +- Excessive network requests (>50 on a single page) +- Large unoptimized images +- Blocking JavaScript (page unresponsive during load) + +### 6. Console/Errors +- JavaScript exceptions (uncaught errors) +- Failed network requests (4xx, 5xx) +- Deprecation warnings (upcoming breakage) +- CORS errors +- Mixed content warnings (HTTP resources on HTTPS) +- CSP violations + +### 7. 
Accessibility
+- Missing alt text on images
+- Unlabeled form inputs
+- Keyboard navigation broken (can't tab to elements)
+- Focus traps (can't escape a modal or dropdown)
+- Missing or incorrect ARIA attributes
+- Insufficient color contrast
+- Content not reachable by screen reader
+
+## Per-Page Exploration Checklist
+
+For each page visited during a QA session:
+
+1. **Visual scan** — Take a screenshot and an ARIA snapshot (`browser(action: "screenshot")`, `browser(action: "snapshot", refs: "aria")`). Look for layout issues, broken images, alignment.
+2. **Interactive elements** — Click every button, link, and control. Does each do what it says?
+3. **Forms** — Fill and submit. Test empty submission, invalid data, edge cases (long text, special characters).
+4. **Navigation** — Check all paths in/out. Breadcrumbs, back button, deep links, mobile menu.
+5. **States** — Check empty state, loading state, error state, full/overflow state.
+6. **Console** — Run `browser(action: "console")` after interactions. Any new JS errors or failed requests?
+7. **Responsiveness** — If relevant, check mobile and tablet viewports.
+8. **Auth boundaries** — What happens when logged out? Different user roles?
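A sketch of the weighted health score computed over these categories, with weights taken from the /qa and /qa-only skill definitions (treat the weights as defaults, not a spec):

```shell
# Weighted health score from per-category scores (0-100).
# Weights: Console 15%, Links 10%, Visual 10%, Functional 20%,
#          UX 15%, Performance 10%, Content 5%, Accessibility 15%.
# usage: health_score <console> <links> <visual> <functional> \
#                     <ux> <performance> <content> <accessibility>
health_score() {
  awk -v c="$1" -v l="$2" -v v="$3" -v f="$4" -v u="$5" -v p="$6" -v t="$7" -v a="$8" \
    'BEGIN { printf "%.0f\n", c*0.15 + l*0.10 + v*0.10 + f*0.20 + u*0.15 + p*0.10 + t*0.05 + a*0.15 }'
}
```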
diff --git a/openclaw/skills/qa/references/qa-report-template.md b/openclaw/skills/qa/references/qa-report-template.md
new file mode 100644
index 0000000..5466bda
--- /dev/null
+++ b/openclaw/skills/qa/references/qa-report-template.md
@@ -0,0 +1,111 @@
+# QA Report: {APP_NAME}
+
+| Field | Value |
+|-------|-------|
+| **Date** | {DATE} |
+| **URL** | {URL} |
+| **Branch** | {BRANCH} |
+| **Commit** | {COMMIT_SHA} ({COMMIT_DATE}) |
+| **PR** | {PR_NUMBER} ({PR_URL}) or "—" |
+| **Tier** | Quick / Standard / Exhaustive |
+| **Scope** | {SCOPE or "Full app"} |
+| **Duration** | {DURATION} |
+| **Pages visited** | {COUNT} |
+| **Screenshots** | {COUNT} |
+| **Framework** | {DETECTED or "Unknown"} |
+| **Index** | [All QA runs](./index.md) |
+
+## Health Score: {SCORE}/100
+
+| Category | Score |
+|----------|-------|
+| Console | {0-100} |
+| Links | {0-100} |
+| Visual | {0-100} |
+| Functional | {0-100} |
+| UX | {0-100} |
+| Performance | {0-100} |
+| Content | {0-100} |
+| Accessibility | {0-100} |
+
+## Top 3 Things to Fix
+
+1. **{ISSUE-NNN}: {title}** — {one-line description}
+2. **{ISSUE-NNN}: {title}** — {one-line description}
+3. **{ISSUE-NNN}: {title}** — {one-line description}
+
+## Console Health
+
+| Error | Count | First seen |
+|-------|-------|------------|
+| {error message} | {N} | {URL} |
+
+## Summary
+
+| Severity | Count |
+|----------|-------|
+| Critical | 0 |
+| High | 0 |
+| Medium | 0 |
+| Low | 0 |
+| **Total** | **0** |
+
+## Issues
+
+### ISSUE-001: {Short title}
+
+| Field | Value |
+|-------|-------|
+| **Severity** | critical / high / medium / low |
+| **Category** | visual / functional / ux / content / performance / console / accessibility |
+| **URL** | {page URL} |
+
+**Description:** {What is wrong, expected vs actual.}
+
+**Repro Steps:**
+
+1. Navigate to {URL}
+   ![Step 1](screenshots/issue-001-step-1.png)
+2. {Action}
+   ![Step 2](screenshots/issue-001-step-2.png)
+3.
**Observe:** {what goes wrong} + ![Result](screenshots/issue-001-result.png) + +--- + +## Fixes Applied (if applicable) + +| Issue | Fix Status | Commit | Files Changed | +|-------|-----------|--------|---------------| +| ISSUE-NNN | verified / best-effort / reverted / deferred | {SHA} | {files} | + +### Before/After Evidence + +#### ISSUE-NNN: {title} +**Before:** ![Before](screenshots/issue-NNN-before.png) +**After:** ![After](screenshots/issue-NNN-after.png) + +--- + +## Ship Readiness + +| Metric | Value | +|--------|-------| +| Health score | {before} → {after} ({delta}) | +| Issues found | N | +| Fixes applied | N (verified: X, best-effort: Y, reverted: Z) | +| Deferred | N | + +**PR Summary:** "QA found N issues, fixed M, health score X → Y." + +--- + +## Regression (if applicable) + +| Metric | Baseline | Current | Delta | +|--------|----------|---------|-------| +| Health score | {N} | {N} | {+/-N} | +| Issues | {N} | {N} | {+/-N} | + +**Fixed since baseline:** {list} +**New since baseline:** {list} diff --git a/openclaw/skills/retro/SKILL.md b/openclaw/skills/retro/SKILL.md new file mode 100644 index 0000000..8a5094f --- /dev/null +++ b/openclaw/skills/retro/SKILL.md @@ -0,0 +1,232 @@ +--- +name: retro +description: > + Weekly engineering retrospective. Analyzes commit history, work patterns, and code + quality metrics with persistent history and trend tracking. Team-aware: breaks down + per-person contributions with specific praise and growth opportunities. Use when asked + to "retro", "retrospective", "weekly review", "what did we ship", "engineering retro", + or /retro. Supports time windows: /retro 24h, /retro 14d, /retro compare. + Based on gstack by Garry Tan, adapted for OpenClaw. +--- + +# /retro — Weekly Engineering Retrospective + +Generates a comprehensive engineering retrospective analyzing commit history, work patterns, +and code quality metrics. 
Team-aware: identifies the user running the command, then analyzes
+every contributor with per-person praise and growth opportunities.
+
+## Arguments
+
+- `/retro` — default: last 7 days
+- `/retro 24h` — last 24 hours
+- `/retro 14d` — last 14 days
+- `/retro 30d` — last 30 days
+- `/retro compare` — compare current window vs prior same-length window
+- `/retro compare 14d` — compare with explicit window
+
+## Setup
+
+Detect the default branch and current user:
+```bash
+DEFAULT_BRANCH=$(gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || echo "main")
+git fetch origin $DEFAULT_BRANCH --quiet
+git config user.name
+git config user.email
+```
+
+The name returned by `git config user.name` is "you" — the person reading this retro.
+All other authors are teammates.
+
+## Step 1: Gather Raw Data
+
+Run ALL of these git commands (they are independent):
+
+```bash
+# 1. All commits with timestamps, subject, hash, author, stats
+git log origin/$DEFAULT_BRANCH --since="<window>" --format="%H|%aN|%ae|%ai|%s" --shortstat
+
+# 2. Per-commit test vs total LOC breakdown
+git log origin/$DEFAULT_BRANCH --since="<window>" --format="COMMIT:%H|%aN" --numstat
+
+# 3. Commit timestamps for session detection (use local timezone)
+git log origin/$DEFAULT_BRANCH --since="<window>" --format="%at|%aN|%ai|%s" | sort -n
+
+# 4. Hotspot analysis
+git log origin/$DEFAULT_BRANCH --since="<window>" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn
+
+# 5. PR numbers from commit messages
+git log origin/$DEFAULT_BRANCH --since="<window>" --format="%s" | grep -oE '#[0-9]+' | sort -n | uniq
+
+# 6. Per-author commit counts
+git shortlog origin/$DEFAULT_BRANCH --since="<window>" -sn --no-merges
+
+# 7.
TODOS.md backlog (if available) +cat TODOS.md 2>/dev/null || true +``` + +## Step 2: Compute Metrics + +Calculate and present in a summary table: + +| Metric | Value | +|--------|-------| +| Commits to main | N | +| Contributors | N | +| PRs merged | N | +| Total insertions | N | +| Total deletions | N | +| Net LOC added | N | +| Test LOC ratio | N% | +| Active days | N | +| Detected sessions | N | + +Then show a per-author leaderboard: +``` +Contributor Commits +/- Top area +You (name) 32 +2400/-300 src/ +alice 12 +800/-150 app/services/ +``` + +## Step 3: Commit Time Distribution + +Show hourly histogram using bar chart: +``` +Hour Commits ████████████████ + 00: 4 ████ + 07: 5 █████ +``` + +Identify peak hours, dead zones, late-night clusters. + +## Step 4: Work Session Detection + +Detect sessions using 45-minute gap threshold between consecutive commits. + +Classify: +- Deep sessions (50+ min) +- Medium sessions (20-50 min) +- Micro sessions (<20 min) + +Calculate total active coding time, average session length, LOC per hour. + +## Step 5: Commit Type Breakdown + +Categorize by conventional commit prefix (feat/fix/refactor/test/chore/docs): +``` +feat: 20 (40%) ████████████████████ +fix: 27 (54%) ███████████████████████████ +``` + +Flag if fix ratio exceeds 50%. + +## Step 6: Hotspot Analysis + +Top 10 most-changed files. Flag files changed 5+ times. + +## Step 7: PR Size Distribution + +Bucket PRs: Small (<100 LOC), Medium (100-500), Large (500-1500), XL (1500+). + +## Step 8: Focus Score + Ship of the Week + +Focus score: percentage of commits touching the single most-changed top-level directory. +Ship of the week: highest-LOC PR in the window. + +## Step 9: Team Member Analysis + +For each contributor: +1. Commits and LOC +2. Areas of focus (top 3 directories) +3. Commit type mix +4. Session patterns +5. Test discipline +6. Biggest ship + +For the current user ("You"): deepest treatment with all detail. 
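The "areas of focus" item above can be computed with a short pipeline. A sketch that treats top-level directories as "areas" and assumes the file list comes from `git log --author=<name> --format="" --name-only`:

```shell
# Top 3 directories touched, by file-change count.
# stdin: one changed-file path per line.
top_areas() {
  grep -v '^$' | cut -d/ -f1 | sort | uniq -c | sort -rn | head -3 | awk '{print $2}'
}

# Example: git log origin/main --author="alice" --format="" --name-only | top_areas
```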
+ +For each teammate: +- **Praise** (1-2 specific things): Anchor in actual commits. Not "great work" — say exactly what was good. +- **Opportunity for growth** (1 specific thing): Frame as leveling-up, not criticism. Anchor in data. + +If solo repo: skip team breakdown. + +## Step 10: Week-over-Week Trends (if window >= 14d) + +Split into weekly buckets. Show trends for commits, LOC, test ratio, fix ratio. + +## Step 11: Streak Tracking + +```bash +# Team streak +git log origin/$DEFAULT_BRANCH --format="%ad" --date=format:"%Y-%m-%d" | sort -u +# Personal streak +git log origin/$DEFAULT_BRANCH --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u +``` + +Count consecutive days with at least 1 commit, going back from today. + +## Step 12: Load History & Compare + +```bash +ls -t .context/retros/*.json 2>/dev/null +``` + +If prior retros exist, load the most recent one and show trends: +``` + Last Now Delta +Test ratio: 22% → 41% ↑19pp +Sessions: 10 → 14 ↑4 +``` + +## Step 13: Save Retro History + +```bash +mkdir -p .context/retros +``` + +Save a JSON snapshot with metrics, authors, version range, streak, and tweetable summary. + +## Step 14: Write the Narrative + +Structure: + +1. **Tweetable summary** (first line): + `Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm | Streak: 47d` + +2. **Summary Table** (from Step 2) +3. **Trends vs Last Retro** (from Step 12, skip if first retro) +4. **Time & Session Patterns** (Steps 3-4) +5. **Shipping Velocity** (Steps 5-7) +6. **Code Quality Signals** — test ratio, hotspots, XL PRs +7. **Focus & Highlights** (Step 8) +8. **Your Week** — personal deep-dive +9. **Team Breakdown** — per-teammate sections (skip if solo) +10. **Top 3 Team Wins** +11. **3 Things to Improve** — specific, actionable, anchored in commits +12. **3 Habits for Next Week** — small, practical, <5 min to adopt + +## Compare Mode + +When `/retro compare`: +1. Compute metrics for current window +2. 
Compute metrics for prior same-length window (using `--since` and `--until`) +3. Show side-by-side comparison with deltas +4. Narrative highlighting biggest improvements and regressions + +## Tone + +- Encouraging but candid, no coddling +- Specific and concrete — always anchor in actual commits +- Skip generic praise — say exactly what was good and why +- Frame improvements as leveling up, not criticism +- Never compare teammates negatively +- Keep total output around 3000-4500 words +- Output directly to the conversation — only file written is `.context/retros/` JSON snapshot + +## Important Rules + +- Use `origin/$DEFAULT_BRANCH` for all git queries (not local main) +- If the window has zero commits, say so and suggest a different window +- Round LOC/hour to nearest 50 +- Treat merge commits as PR boundaries +- On first run (no prior retros), skip comparison sections gracefully diff --git a/openclaw/skills/review/SKILL.md b/openclaw/skills/review/SKILL.md new file mode 100644 index 0000000..7fe6cfa --- /dev/null +++ b/openclaw/skills/review/SKILL.md @@ -0,0 +1,95 @@ +--- +name: review +description: > + Paranoid pre-landing PR review. Analyzes diff against the base branch for SQL safety, + race conditions, LLM trust boundary violations, conditional side effects, and other + structural issues that tests don't catch. Use when asked to "review my PR", "review + this branch", "pre-landing review", "code review", "paranoid review", or /review. + Based on gstack by Garry Tan. +--- + +# Pre-Landing PR Review + +You are running the /review workflow. Analyze the current branch's diff against the base +branch for structural issues that tests don't catch. 
+ +## Step 0: Detect base branch + +```bash +BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || \ + gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || \ + echo "main") +echo "Base branch: $BASE" +CURRENT=$(git branch --show-current 2>/dev/null || echo "unknown") +echo "Current branch: $CURRENT" +``` + +Use the detected base branch in all subsequent commands. + +## Step 1: Check branch + +1. If on the base branch, output: "Nothing to review — you're on the base branch." and stop. +2. Run `git fetch origin $BASE --quiet && git diff origin/$BASE --stat` to check for a diff. If no diff, stop. + +## Step 2: Read the checklist + +Read `references/checklist.md` in this skill's directory. + +If the file cannot be read, STOP and report the error. Do not proceed without the checklist. + +## Step 3: Get the diff + +```bash +git fetch origin $BASE --quiet +git diff origin/$BASE +``` + +This includes both committed and uncommitted changes against the latest base branch. + +## Step 4: Two-pass review + +Apply the checklist against the diff in two passes: + +**Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness + +**Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend + +Enum & Value Completeness requires reading code OUTSIDE the diff. When the diff introduces a new enum value, use `grep` to find all files that reference sibling values, then read those files to check if the new value is handled. + +## Step 5: Output findings + +Always output ALL findings — both critical and informational. 
+ +- If CRITICAL issues found: output all findings, then for EACH critical issue present it to the user with: + - The problem (file:line + description) + - RECOMMENDATION: Choose A because [one-line reason] + - Options: A) Fix it now, B) Acknowledge, C) False positive — skip + After all critical questions are answered, apply fixes for any where user chose A. + +- If only non-critical issues found: output findings. No further action needed. + +- If no issues found: output `Pre-Landing Review: No issues found.` + +## Step 5.5: TODOS cross-reference + +Read `TODOS.md` in the repository root (if it exists). Cross-reference the PR: +- Does this PR close any open TODOs? +- Does this PR create work that should become a TODO? +- Are there related TODOs that provide context? + +If TODOS.md doesn't exist, skip silently. + +## Step 5.6: Documentation staleness check + +Cross-reference the diff against documentation files. For each `.md` file in the repo root: +1. Check if code changes affect features described in that doc file +2. If the doc was NOT updated but the code it describes WAS changed, flag as INFORMATIONAL: + "Documentation may be stale: [file] describes [feature] but code changed in this branch." + +## Important Rules + +- Read the FULL diff before commenting. Do not flag issues already addressed in the diff. +- Read-only by default. Only modify files if the user explicitly chooses "Fix it now." +- Be terse. One line problem, one line fix. No preamble. +- Only flag real problems. Skip anything that's fine. +- Never commit, push, or create PRs. diff --git a/openclaw/skills/review/references/checklist.md b/openclaw/skills/review/references/checklist.md new file mode 100644 index 0000000..6052c33 --- /dev/null +++ b/openclaw/skills/review/references/checklist.md @@ -0,0 +1,132 @@ +# Pre-Landing Review Checklist + +## Instructions + +Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. 
Skip anything that's fine. Only flag real problems. + +**Two-pass review:** +- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`. +- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block. + +**Output format:** + +``` +Pre-Landing Review: N issues (X critical, Y informational) + +**CRITICAL** (blocking /ship): +- [file:line] Problem description + Fix: suggested fix + +**Issues** (non-blocking): +- [file:line] Problem description + Fix: suggested fix +``` + +If no issues found: `Pre-Landing Review: No issues found.` + +Be terse. For each issue: one line describing the problem, one line with the fix. No preamble, no summaries, no "looks good overall." + +--- + +## Review Categories + +### Pass 1 — CRITICAL + +#### SQL & Data Safety +- String interpolation in SQL (even if values are `.to_i`/`.to_f` — use `sanitize_sql_array` or Arel) +- TOCTOU races: check-then-set patterns that should be atomic `WHERE` + `update_all` +- `update_column`/`update_columns` bypassing validations on fields that have or should have constraints +- N+1 queries: `.includes()` missing for associations used in loops/views (especially avatar, attachments) + +#### Race Conditions & Concurrency +- Read-check-write without uniqueness constraint or `rescue RecordNotUnique; retry` (e.g., `where(hash:).first` then `save!` without handling concurrent insert) +- `find_or_create_by` on columns without unique DB index — concurrent calls can create duplicates +- Status transitions that don't use atomic `WHERE old_status = ? UPDATE SET new_status` — concurrent updates can skip or double-apply transitions +- `html_safe` on user-controlled data (XSS) — check any `.html_safe`, `raw()`, or string interpolation into `html_safe` output + +#### LLM Output Trust Boundary +- LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. 
Add lightweight guards (`EMAIL_REGEXP`, `URI.parse`, `.strip`) before persisting. +- Structured tool output (arrays, hashes) accepted without type/shape checks before database writes. + +#### Enum & Value Completeness +When the diff introduces a new enum value, status string, tier name, or type constant: +- **Trace it through every consumer.** Read (don't just grep — READ) each file that switches on, filters by, or displays that value. If any consumer doesn't handle the new value, flag it. Common miss: adding a value to the frontend dropdown but the backend model/compute method doesn't persist it. +- **Check allowlists/filter arrays.** Search for arrays or `%w[]` lists containing sibling values (e.g., if adding "revise" to tiers, find every `%w[quick lfg mega]` and verify "revise" is included where needed). +- **Check `case`/`if-elsif` chains.** If existing code branches on the enum, does the new value fall through to a wrong default? +To do this: use Grep to find all references to the sibling values (e.g., grep for "lfg" or "mega" to find all tier consumers). Read each match. This step requires reading code OUTSIDE the diff. + +### Pass 2 — INFORMATIONAL + +#### Conditional Side Effects +- Code paths that branch on a condition but forget to apply a side effect on one branch. Example: item promoted to verified but URL only attached when a secondary condition is true — the other branch promotes without the URL, creating an inconsistent record. +- Log messages that claim an action happened but the action was conditionally skipped. The log should reflect what actually occurred. + +#### Magic Numbers & String Coupling +- Bare numeric literals used in multiple files — should be named constants documented together +- Error message strings used as query filters elsewhere (grep for the string — is anything matching on it?) 
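The string-coupling grep above can be sketched as a small shell check. The file names and the search string are illustrative placeholders; in a real review you grep the actual repository for the error message under scrutiny:

```shell
# Build a throwaway tree, then count files referencing an error string
# to see whether anything besides the raise site matches on the literal text.
set -eu
repo=$(mktemp -d)
mkdir -p "$repo/app" "$repo/lib"
printf 'raise "invalid tier"\n' > "$repo/app/model.rb"
printf 'errors.select { |e| e == "invalid tier" }\n' > "$repo/lib/report.rb"
matches=$(grep -rl "invalid tier" "$repo" | wc -l | tr -d ' ')
echo "files referencing the string: $matches"
rm -rf "$repo"
```

More than one hit means the string is load-bearing: rewording the error message would silently break the filter that matches on it.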
+ +#### Dead Code & Consistency +- Variables assigned but never read +- Version mismatch between PR title and VERSION/CHANGELOG files +- CHANGELOG entries that describe changes inaccurately (e.g., "changed from X to Y" when X never existed) +- Comments/docstrings that describe old behavior after the code changed + +#### LLM Prompt Issues +- 0-indexed lists in prompts (LLMs reliably return 1-indexed) +- Prompt text listing available tools/capabilities that don't match what's actually wired up in the `tool_classes`/`tools` array +- Word/token limits stated in multiple places that could drift + +#### Test Gaps +- Negative-path tests that assert type/status but not the side effects (URL attached? field populated? callback fired?) +- Assertions on string content without checking format (e.g., asserting title present but not URL format) +- `.expects(:something).never` missing when a code path should explicitly NOT call an external service +- Security enforcement features (blocking, rate limiting, auth) without integration tests verifying the enforcement path works end-to-end + +#### Crypto & Entropy +- Truncation of data instead of hashing (last N chars instead of SHA-256) — less entropy, easier collisions +- `rand()` / `Random.rand` for security-sensitive values — use `SecureRandom` instead +- Non-constant-time comparisons (`==`) on secrets or tokens — vulnerable to timing attacks + +#### Time Window Safety +- Date-key lookups that assume "today" covers 24h — report at 8am PT only sees midnight→8am under today's key +- Mismatched time windows between related features — one uses hourly buckets, another uses daily keys for the same data + +#### Type Coercion at Boundaries +- Values crossing Ruby→JSON→JS boundaries where type could change (numeric vs string) — hash/digest inputs must normalize types +- Hash/digest inputs that don't call `.to_s` or equivalent before serialization — `{ cores: 8 }` vs `{ cores: "8" }` produce different hashes + +#### View/Frontend +- Inline `