diff --git a/BROWSER.md b/BROWSER.md index 2d828eb..df4a6d1 100644 --- a/BROWSER.md +++ b/BROWSER.md @@ -127,6 +127,18 @@ The `console`, `network`, and `dialog` commands read from the in-memory buffers, Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept ` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken. +### JavaScript execution (`js` and `eval`) + +`js` runs a single expression, `eval` runs a JS file. Both support `await` — expressions containing `await` are automatically wrapped in an async context: + +```bash +$B js "await fetch('/api/data').then(r => r.json())" # works +$B js "document.title" # also works (no wrapping needed) +$B eval my-script.js # file with await works too +``` + +For `eval` files, single-line files return the expression value directly. Multi-line files need explicit `return` when using `await`. Comments containing "await" don't trigger wrapping. + ### Multi-workspace support Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in `.gstack/` inside the project root (detected via `git rev-parse --show-toplevel`). diff --git a/CHANGELOG.md b/CHANGELOG.md index 3d67a91..9b4e93f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,8 @@ ## 0.4.2 — 2026-03-16 +- **`$B js "await fetch(...)"` now just works.** Any `await` expression in `$B js` or `$B eval` is automatically wrapped in an async context. No more `SyntaxError: await is only valid in async functions`. Single-line eval files return values directly; multi-line files use explicit `return`. +- **Contributor mode now reflects, not just reacts.** Instead of only filing reports when something breaks, contributor mode now prompts periodic reflection: "Rate your gstack experience 0-10. Not a 10? Think about why." Catches quality-of-life issues and friction that passive detection misses. Reports now include a 0-10 rating and "What would make this a 10" to focus on actionable improvements. - **Skills now respect your branch target.** `/ship`, `/review`, `/qa`, and `/plan-ceo-review` detect which branch your PR actually targets instead of assuming `main`. Stacked branches, Conductor workspaces targeting feature branches, and repos using `master` all just work now. - **`/retro` works on any default branch.** Repos using `master`, `develop`, or other default branch names are detected automatically — no more empty retros because the branch name was wrong. - **New `{{BASE_BRANCH_DETECT}}` placeholder** for skill authors — drop it into any template and get 3-step branch detection (PR base → repo default → fallback) for free. @@ -9,6 +11,10 @@ ### For contributors +- Added `hasAwait()` helper with comment-stripping to avoid false positives on `// await` in eval files. +- Smart eval wrapping: single-line → expression `(...)`, multi-line → block `{...}` with explicit `return`. +- 6 new async wrapping unit tests, 40 new contributor mode preamble validation tests. +- Calibration example framed as historical ("used to fail") to avoid implying a live bug post-fix. - Added "Writing SKILL templates" section to CLAUDE.md — rules for natural language over bash-isms, dynamic branch detection, self-contained code blocks. - Hardcoded-main regression test scans all `.tmpl` files for git commands with hardcoded `main`. - QA template cleaned up: removed `REPORT_DIR` shell variable, simplified port detection to prose. diff --git a/CLAUDE.md b/CLAUDE.md index bc21f60..6f12dea 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -118,6 +118,21 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n - No jargon: say "every question now tells you which project and branch you're in" not "AskUserQuestion format standardized across skill templates via preamble resolver." +## E2E eval failure blame protocol + +When an E2E eval fails during `/ship` or any other workflow, **never claim "not +related to our changes" without proving it.** These systems have invisible couplings — +a preamble text change affects agent behavior, a new helper changes timing, a +regenerated SKILL.md shifts prompt context. + +**Required before attributing a failure to "pre-existing":** +1. Run the same eval on main (or base branch) and show it fails there too +2. If it passes on main but fails on the branch — it IS your change. Trace the blame. +3. If you can't run on main, say "unverified — may or may not be related" and flag it + as a risk in the PR body + +"Pre-existing" without receipts is a lazy claim. Prove it or don't say it. + ## Deploying to the active skill The active skill lives at `~/.claude/skills/gstack/`. After making changes: diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b06b837..4af2e88 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -22,9 +22,11 @@ bin/dev-teardown # deactivate — back to your global install ## Contributor mode -Contributor mode is for people who want to fix gstack when it annoys them. Enable it -and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you -work — what you were doing, what went wrong, repro steps, raw output. +Contributor mode turns gstack into a self-improving tool. Enable it and Claude Code +will periodically reflect on its gstack experience — rating it 0-10 at the end of +each major workflow step. When something isn't a 10, it thinks about why and files +a report to `~/.gstack/contributor-logs/` with what happened, repro steps, and what +would make it better. ```bash ~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true @@ -36,7 +38,7 @@ the issue, fix it, and open a PR. ### The contributor workflow -1. **Hit friction while using gstack** — contributor mode logs it automatically +1. **Use gstack normally** — contributor mode reflects and logs issues automatically 2. **Check your logs:** `ls ~/.gstack/contributor-logs/` 3. **Fork and clone gstack** (if you haven't already) 4. **Symlink your fork into the project where you hit the bug:** diff --git a/SKILL.md b/SKILL.md index b362e82..2239a91 100644 --- a/SKILL.md +++ b/SKILL.md @@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" # gstack browse: QA Testing & Dogfooding diff --git a/browse/SKILL.md b/browse/SKILL.md index 28e976d..c0d7a4e 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" # browse: QA Testing & Dogfooding diff --git a/browse/src/read-commands.ts b/browse/src/read-commands.ts index 53efec8..a7d7635 100644 --- a/browse/src/read-commands.ts +++ b/browse/src/read-commands.ts @@ -11,6 +11,12 @@ import type { Page } from 'playwright'; import * as fs from 'fs'; import * as path from 'path'; +/** Detect await keyword, ignoring comments. Accepted risk: await in string literals triggers wrapping (harmless). */ +function hasAwait(code: string): boolean { + const stripped = code.replace(/\/\/.*$/gm, '').replace(/\/\*[\s\S]*?\*\//g, ''); + return /\bawait\b/.test(stripped); +} + // Security: Path validation to prevent path traversal attacks const SAFE_DIRECTORIES = ['/tmp', process.cwd()]; @@ -118,7 +124,8 @@ export async function handleReadCommand( case 'js': { const expr = args[0]; if (!expr) throw new Error('Usage: browse js '); - const result = await page.evaluate(expr); + const wrapped = hasAwait(expr) ? `(async()=>(${expr}))()` : expr; + const result = await page.evaluate(wrapped); return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? ''); } @@ -128,6 +135,13 @@ export async function handleReadCommand( validateReadPath(filePath); if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`); const code = fs.readFileSync(filePath, 'utf-8'); + if (hasAwait(code)) { + const trimmed = code.trim(); + const isSingleExpr = trimmed.split('\n').length === 1; + const wrapped = isSingleExpr ? `(async()=>(${trimmed}))()` : `(async()=>{\n${code}\n})()`; + const result = await page.evaluate(wrapped); + return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? ''); + } const result = await page.evaluate(code); return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? ''); } diff --git a/browse/test/commands.test.ts b/browse/test/commands.test.ts index a3e201d..d8aaeab 100644 --- a/browse/test/commands.test.ts +++ b/browse/test/commands.test.ts @@ -144,6 +144,60 @@ describe('Inspection', () => { expect(obj.b).toBe(2); }); + test('js supports await expressions', async () => { + const result = await handleReadCommand('js', ['await Promise.resolve(42)'], bm); + expect(result).toBe('42'); + }); + + test('js does not false-positive on await substring', async () => { + const result = await handleReadCommand('js', ['(() => { const awaitable = 5; return awaitable })()'], bm); + expect(result).toBe('5'); + }); + + test('eval supports await in single-line file', async () => { + const tmp = '/tmp/eval-await-test.js'; + fs.writeFileSync(tmp, 'await Promise.resolve("hello from eval")'); + try { + const result = await handleReadCommand('eval', [tmp], bm); + expect(result).toBe('hello from eval'); + } finally { + fs.unlinkSync(tmp); + } + }); + + test('eval does not wrap when await is only in a comment', async () => { + const tmp = '/tmp/eval-comment-test.js'; + fs.writeFileSync(tmp, '// no need to await this\ndocument.title'); + try { + const result = await handleReadCommand('eval', [tmp], bm); + expect(result).toBe('Test Page - Basic'); + } finally { + fs.unlinkSync(tmp); + } + }); + + test('eval multi-line with await and explicit return', async () => { + const tmp = '/tmp/eval-multiline-await.js'; + fs.writeFileSync(tmp, 'const data = await Promise.resolve("multi");\nreturn data;'); + try { + const result = await handleReadCommand('eval', [tmp], bm); + expect(result).toBe('multi'); + } finally { + fs.unlinkSync(tmp); + } + }); + + test('eval multi-line with await but no return gives empty string', async () => { + const tmp = '/tmp/eval-multiline-no-return.js'; + fs.writeFileSync(tmp, 'const data = await Promise.resolve("lost");\ndata;'); + try { + const result = await handleReadCommand('eval', [tmp], bm); + expect(result).toBe(''); + } finally { + fs.unlinkSync(tmp); + } + }); + test('css returns computed property', async () => { const result = await handleReadCommand('css', ['h1', 'color'], bm); // Navy color diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 77ca643..0783099 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" ## Step 0: Detect base branch diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 819ef07..ad2baca 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" # Plan Review Mode diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index 438b782..27d939b 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -43,12 +43,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -57,20 +60,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" # /qa-only: Report-Only QA Testing diff --git a/qa/SKILL.md b/qa/SKILL.md index 5ea4643..938bf10 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -48,12 +48,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -62,20 +65,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" ## Step 0: Detect base branch diff --git a/retro/SKILL.md b/retro/SKILL.md index ca44f49..39b7ee1 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -43,12 +43,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -57,20 +60,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" ## Detect default branch diff --git a/review/SKILL.md b/review/SKILL.md index 949a0c6..b94f8a3 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" ## Step 0: Detect base branch diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 9c81e96..f3d93db 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -123,12 +123,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If \`_CONTRIB\` is \`true\`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If \`_CONTRIB\` is \`true\`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write \`~/.gstack/contributor-logs/{slug}.md\` with this structure: +**Calibration — this is the bar:** For example, \`$B js "await fetch(...)"\` used to fail with \`SyntaxError: await is only valid in async functions\` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write \`~/.gstack/contributor-logs/{slug}.md\` with **all sections below** (do not truncate — include every section through the Date/Version footer): \`\`\` # {Title} @@ -137,20 +140,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +\`\`\` +{paste the actual error or unexpected output here} +\`\`\` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} \`\`\` -Then run: \`mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md\` - -Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-snapshot-ref-gap\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"`; +Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-js-no-await\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"`; } function generateBrowseSetup(): string { diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md index 0623024..d522b27 100644 --- a/setup-browser-cookies/SKILL.md +++ b/setup-browser-cookies/SKILL.md @@ -41,12 +41,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -55,20 +58,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" # Setup Browser Cookies diff --git a/ship/SKILL.md b/ship/SKILL.md index 47a9da1..7791f4b 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -43,12 +43,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli ## Contributor Mode -If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. -**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. -**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! -**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): ``` # {Title} @@ -57,20 +60,23 @@ Hey gstack team — ran into this while using /{skill-name}: **What I was trying to do:** {what the user/agent was attempting} **What happened instead:** {what actually happened} -**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} ## Steps to reproduce 1. {step} ## Raw output -(wrap any error messages or unexpected output in a markdown code block) +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} ``` -Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` - -Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" ## Step 0: Detect base branch diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts index 4978ce5..aa50a97 100644 --- a/test/skill-e2e.test.ts +++ b/test/skill-e2e.test.ts @@ -13,6 +13,11 @@ import * as os from 'os'; const ROOT = path.resolve(import.meta.dir, '..'); // Skip unless EVALS=1. Session runner strips CLAUDE* env vars to avoid nested session issues. +// +// BLAME PROTOCOL: When an eval fails, do NOT claim "pre-existing" or "not related +// to our changes" without proof. Run the same eval on main to verify. These tests +// have invisible couplings — preamble text, SKILL.md content, and timing all affect +// agent behavior. See CLAUDE.md "E2E eval failure blame protocol" for details. const evalsEnabled = !!process.env.EVALS; const describeE2E = evalsEnabled ? describe : describe.skip; @@ -322,10 +327,16 @@ File a contributor report about this issue. Then tell me what you filed.`, const logFiles = fs.readdirSync(logsDir).filter(f => f.endsWith('.md')); expect(logFiles.length).toBeGreaterThan(0); + // Verify new reflection-based format const logContent = fs.readFileSync(path.join(logsDir, logFiles[0]), 'utf-8'); expect(logContent).toContain('Hey gstack team'); expect(logContent).toContain('What I was trying to do'); expect(logContent).toContain('What happened instead'); + expect(logContent).toMatch(/rating/i); + // Verify report has repro steps (agent may use "Steps to reproduce", "Repro Steps", etc.) + expect(logContent).toMatch(/repro|steps to reproduce|how to reproduce/i); + // Verify report has date/version footer (agent may format differently) + expect(logContent).toMatch(/date.*2026|2026.*date/i); // Clean up try { fs.rmSync(contribDir, { recursive: true, force: true }); } catch {} @@ -424,16 +435,20 @@ describeE2E('QA skill E2E', () => { test('/qa quick completes without browse errors', async () => { const result = await runSkillTest({ - prompt: `You have a browse binary at ${browseBin}. Assign it to B variable like: B="${browseBin}" + prompt: `B="${browseBin}" + +The test server is already running at: ${testServer.url} +Target page: ${testServer.url}/basic.html Read the file qa/SKILL.md for the QA workflow instructions. Run a Quick-depth QA test on ${testServer.url}/basic.html Do NOT use AskUserQuestion — run Quick tier directly. +Do NOT try to start a server or discover ports — the URL above is ready. Write your report to ${qaDir}/qa-reports/qa-report.md`, workingDirectory: qaDir, maxTurns: 35, - timeout: 180_000, + timeout: 240_000, testName: 'qa-quick', runId, }); @@ -448,7 +463,7 @@ Write your report to ${qaDir}/qa-reports/qa-report.md`, } // Accept error_max_turns — the agent doing thorough QA work is not a failure expect(['success', 'error_max_turns']).toContain(result.exitReason); - }, 240_000); + }, 300_000); }); // --- B5: Review skill E2E --- diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 2a947b1..3ff5c35 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -496,6 +496,44 @@ describe('v0.4.1 preamble features', () => { } }); +// --- Contributor mode preamble structure validation --- + +describe('Contributor mode preamble structure', () => { + const skillsWithPreamble = [ + 'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md', + 'qa-only/SKILL.md', + 'setup-browser-cookies/SKILL.md', + 'ship/SKILL.md', 'review/SKILL.md', + 'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md', + 'retro/SKILL.md', + ]; + + for (const skill of skillsWithPreamble) { + test(`${skill} has 0-10 rating in contributor mode`, () => { + const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8'); + expect(content).toContain('0 to 10'); + expect(content).toContain('My rating'); + }); + + test(`${skill} has calibration example`, () => { + const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8'); + expect(content).toContain('Calibration'); + expect(content).toContain('the bar'); + }); + + test(`${skill} has "what would make this a 10" field`, () => { + const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8'); + expect(content).toContain('What would make this a 10'); + }); + + test(`${skill} uses periodic reflection (not per-command)`, () => { + const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8'); + expect(content).toContain('workflow step'); + expect(content).not.toContain('After you use gstack-provided CLIs'); + }); + } +}); + describe('Enum & Value Completeness in review checklist', () => { const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8');