12 changes: 12 additions & 0 deletions BROWSER.md
@@ -127,6 +127,18 @@ The `console`, `network`, and `dialog` commands read from the in-memory buffers,

Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.

### JavaScript execution (`js` and `eval`)

`js` runs a single expression; `eval` runs a JS file. Both support `await`: expressions containing `await` are automatically wrapped in an async context:

```bash
$B js "await fetch('/api/data').then(r => r.json())" # works
$B js "document.title" # also works (no wrapping needed)
$B eval my-script.js # file with await works too
```

For `eval` files, a single-line file returns the value of its expression directly. A multi-line file that uses `await` needs an explicit `return`; without one the result is an empty string. Comments containing "await" don't trigger wrapping.
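
As a rough sketch of the wrapping rules above: `hasAwait` mirrors the helper added to `browse/src/read-commands.ts` in this change, while `wrapForEval` is a hypothetical name used only for illustration. It shows the source string that would be handed to the page, not the actual command handler.

```typescript
/** True when `await` appears outside `//` and block comments. */
function hasAwait(code: string): boolean {
  const stripped = code.replace(/\/\/.*$/gm, '').replace(/\/\*[\s\S]*?\*\//g, '');
  return /\bawait\b/.test(stripped);
}

/** Hypothetical helper: returns the (possibly wrapped) source to evaluate. */
function wrapForEval(code: string): string {
  if (!hasAwait(code)) return code; // no await: evaluate as-is
  const trimmed = code.trim();
  const isSingleLine = trimmed.split('\n').length === 1;
  return isSingleLine
    ? `(async()=>(${trimmed}))()`   // expression form: value is returned implicitly
    : `(async()=>{\n${code}\n})()`; // block form: needs an explicit `return`
}

console.log(wrapForEval('await Promise.resolve(42)'));
// (async()=>(await Promise.resolve(42)))()
```

One consequence of stripping `//` comments with a regex: a `//` inside a string literal (for example a URL) hides everything after it on that line, so an `await` that follows it on the same line would go undetected.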

### Multi-workspace support

Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in `.gstack/` inside the project root (detected via `git rev-parse --show-toplevel`).
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -2,13 +2,19 @@

## 0.4.2 - 2026-03-16

- **`$B js "await fetch(...)"` now just works.** Any `await` expression in `$B js` or `$B eval` is automatically wrapped in an async context. No more `SyntaxError: await is only valid in async functions`. Single-line eval files return values directly; multi-line files use explicit `return`.
- **Contributor mode now reflects, not just reacts.** Instead of only filing reports when something breaks, contributor mode now prompts periodic reflection: "Rate your gstack experience 0-10. Not a 10? Think about why." Catches quality-of-life issues and friction that passive detection misses. Reports now include a 0-10 rating and "What would make this a 10" to focus on actionable improvements.
- **Skills now respect your branch target.** `/ship`, `/review`, `/qa`, and `/plan-ceo-review` detect which branch your PR actually targets instead of assuming `main`. Stacked branches, Conductor workspaces targeting feature branches, and repos using `master` all just work now.
- **`/retro` works on any default branch.** Repos using `master`, `develop`, or other default branch names are detected automatically, so no more empty retros because the branch name was wrong.
- **New `{{BASE_BRANCH_DETECT}}` placeholder** for skill authors: drop it into any template and get 3-step branch detection (PR base → repo default → fallback) for free.
- **3 new E2E smoke tests** validate base branch detection works end-to-end across ship, review, and retro skills.
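
The 3-step order (PR base, then repo default, then fallback) can be sketched as a small function. This is an illustration under assumptions, not the placeholder's actual expansion: `detectBaseBranch` and the `Lookup` type are hypothetical names, the two lookups are stubbed as callbacks, and the `'main'` fallback is assumed rather than taken from the template.

```typescript
// Each lookup returns a branch name, or undefined/empty when it has no answer.
type Lookup = () => string | undefined;

/** Try each source in priority order; fall back when none yields a name. */
function detectBaseBranch(prBase: Lookup, repoDefault: Lookup, fallback = 'main'): string {
  for (const lookup of [prBase, repoDefault]) {
    try {
      const name = lookup()?.trim();
      if (name) return name; // first non-empty answer wins
    } catch {
      // A failing lookup (e.g. no PR exists yet) falls through to the next step.
    }
  }
  return fallback;
}

console.log(detectBaseBranch(() => 'feature/parent', () => 'master')); // feature/parent
console.log(detectBaseBranch(() => undefined, () => 'master'));        // master
```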

### For contributors

- Added `hasAwait()` helper with comment-stripping to avoid false positives on `// await` in eval files.
- Smart eval wrapping: single-line → expression `(...)`, multi-line → block `{...}` with explicit `return`.
- 6 new async wrapping unit tests, 40 new contributor mode preamble validation tests.
- Calibration example framed as historical ("used to fail") to avoid implying a live bug post-fix.
- Added "Writing SKILL templates" section to CLAUDE.md: rules for natural language over bash-isms, dynamic branch detection, self-contained code blocks.
- Hardcoded-main regression test scans all `.tmpl` files for git commands with hardcoded `main`.
- QA template cleaned up: removed `REPORT_DIR` shell variable, simplified port detection to prose.
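
For a sense of what a hardcoded-`main` scan over template text might look like, here is a minimal sketch. The function name and the regular expression are assumptions made for illustration; the regression test in the repository may match differently.

```typescript
/** Returns 1-based line numbers that look like a git command naming `main` literally. */
function findHardcodedMain(template: string): number[] {
  const hits: number[] = [];
  template.split('\n').forEach((line, i) => {
    // `git ...` followed somewhere on the same line by the whole word `main`
    if (/\bgit\s.*\bmain\b/.test(line)) hits.push(i + 1);
  });
  return hits;
}

const sample = [
  'git diff origin/main...HEAD',   // flagged
  'git diff origin/<base>...HEAD', // fine: no literal main
  'the main idea of this skill',   // fine: not a git command
].join('\n');

console.log(findHardcodedMain(sample)); // [ 1 ]
```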
15 changes: 15 additions & 0 deletions CLAUDE.md
@@ -118,6 +118,21 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."

## E2E eval failure blame protocol

When an E2E eval fails during `/ship` or any other workflow, **never claim "not
related to our changes" without proving it.** These systems have invisible couplings:
a preamble text change affects agent behavior, a new helper changes timing, a
regenerated SKILL.md shifts prompt context.

**Required before attributing a failure to "pre-existing":**
1. Run the same eval on main (or base branch) and show it fails there too
2. If it passes on main but fails on the branch, it IS your change. Trace the blame.
3. If you can't run on main, say "unverified - may or may not be related" and flag it
as a risk in the PR body

"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.

## Deploying to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After making changes:
10 changes: 6 additions & 4 deletions CONTRIBUTING.md
@@ -22,9 +22,11 @@ bin/dev-teardown # deactivate - back to your global install

## Contributor mode

Contributor mode is for people who want to fix gstack when it annoys them. Enable it
and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you
work: what you were doing, what went wrong, repro steps, raw output.
Contributor mode turns gstack into a self-improving tool. Enable it and Claude Code
will periodically reflect on its gstack experience, rating it 0-10 at the end of
each major workflow step. When something isn't a 10, it thinks about why and files
a report to `~/.gstack/contributor-logs/` with what happened, repro steps, and what
would make it better.

```bash
~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true
@@ -36,7 +38,7 @@ the issue, fix it, and open a PR.

### The contributor workflow

1. **Hit friction while using gstack**: contributor mode logs it automatically
1. **Use gstack normally**: contributor mode reflects and logs issues automatically
2. **Check your logs:** `ls ~/.gstack/contributor-logs/`
3. **Fork and clone gstack** (if you haven't already)
4. **Symlink your fork into the project where you hit the bug:**
24 changes: 15 additions & 9 deletions SKILL.md
@@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli

## Contributor Mode

If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.

**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance, even minor stuff.
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown, file a field report. Maybe our contributor will help make us better!

**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
**Calibration (this is the bar):** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it; that's the kind of thing worth filing. Things less consequential than this, ignore.

**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.

**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate; include every section through the Date/Version footer):

```
# {Title}
@@ -58,20 +61,23 @@ Hey gstack team - ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
**My rating:** {0-10} - {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
(wrap any error messages or unexpected output in a markdown code block)
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```

Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell user: "Filed gstack field report: {title}"
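
The slug rules above (lowercase, hyphens, at most 60 characters) could be applied with something like the following. `toSlug` is a hypothetical helper, not part of gstack; the skill text leaves slug construction to the agent.

```typescript
/** Lowercase, collapse non-alphanumerics to single hyphens, cap the length. */
function toSlug(title: string, maxLen = 60): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '')     // no leading or trailing hyphens
    .slice(0, maxLen)
    .replace(/-+$/, '');         // truncation may leave a trailing hyphen
}

console.log(toSlug('Browse JS: no await')); // browse-js-no-await
```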

# gstack browse: QA Testing & Dogfooding

24 changes: 15 additions & 9 deletions browse/SKILL.md
@@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli

## Contributor Mode

If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.

**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance, even minor stuff.
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown, file a field report. Maybe our contributor will help make us better!

**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
**Calibration (this is the bar):** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it; that's the kind of thing worth filing. Things less consequential than this, ignore.

**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.

**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate; include every section through the Date/Version footer):

```
# {Title}
@@ -58,20 +61,23 @@ Hey gstack team - ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
**My rating:** {0-10} - {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
(wrap any error messages or unexpected output in a markdown code block)
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```

Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell user: "Filed gstack field report: {title}"

# browse: QA Testing & Dogfooding

16 changes: 15 additions & 1 deletion browse/src/read-commands.ts
@@ -11,6 +11,12 @@ import type { Page } from 'playwright';
import * as fs from 'fs';
import * as path from 'path';

/** Detect await keyword, ignoring comments. Accepted risk: await in string literals triggers wrapping (harmless). */
function hasAwait(code: string): boolean {
  const stripped = code.replace(/\/\/.*$/gm, '').replace(/\/\*[\s\S]*?\*\//g, '');
  return /\bawait\b/.test(stripped);
}

// Security: Path validation to prevent path traversal attacks
const SAFE_DIRECTORIES = ['/tmp', process.cwd()];

@@ -118,7 +124,8 @@ export async function handleReadCommand(
    case 'js': {
      const expr = args[0];
      if (!expr) throw new Error('Usage: browse js <expression>');
      const result = await page.evaluate(expr);
      const wrapped = hasAwait(expr) ? `(async()=>(${expr}))()` : expr;
      const result = await page.evaluate(wrapped);
      return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
    }

@@ -128,6 +135,13 @@ export async function handleReadCommand(
      validateReadPath(filePath);
      if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
      const code = fs.readFileSync(filePath, 'utf-8');
      if (hasAwait(code)) {
        const trimmed = code.trim();
        const isSingleExpr = trimmed.split('\n').length === 1;
        const wrapped = isSingleExpr ? `(async()=>(${trimmed}))()` : `(async()=>{\n${code}\n})()`;
        const result = await page.evaluate(wrapped);
        return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
      }
      const result = await page.evaluate(code);
      return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
    }
54 changes: 54 additions & 0 deletions browse/test/commands.test.ts
@@ -144,6 +144,60 @@ describe('Inspection', () => {
    expect(obj.b).toBe(2);
  });

  test('js supports await expressions', async () => {
    const result = await handleReadCommand('js', ['await Promise.resolve(42)'], bm);
    expect(result).toBe('42');
  });

  test('js does not false-positive on await substring', async () => {
    const result = await handleReadCommand('js', ['(() => { const awaitable = 5; return awaitable })()'], bm);
    expect(result).toBe('5');
  });

  test('eval supports await in single-line file', async () => {
    const tmp = '/tmp/eval-await-test.js';
    fs.writeFileSync(tmp, 'await Promise.resolve("hello from eval")');
    try {
      const result = await handleReadCommand('eval', [tmp], bm);
      expect(result).toBe('hello from eval');
    } finally {
      fs.unlinkSync(tmp);
    }
  });

  test('eval does not wrap when await is only in a comment', async () => {
    const tmp = '/tmp/eval-comment-test.js';
    fs.writeFileSync(tmp, '// no need to await this\ndocument.title');
    try {
      const result = await handleReadCommand('eval', [tmp], bm);
      expect(result).toBe('Test Page - Basic');
    } finally {
      fs.unlinkSync(tmp);
    }
  });

  test('eval multi-line with await and explicit return', async () => {
    const tmp = '/tmp/eval-multiline-await.js';
    fs.writeFileSync(tmp, 'const data = await Promise.resolve("multi");\nreturn data;');
    try {
      const result = await handleReadCommand('eval', [tmp], bm);
      expect(result).toBe('multi');
    } finally {
      fs.unlinkSync(tmp);
    }
  });

  test('eval multi-line with await but no return gives empty string', async () => {
    const tmp = '/tmp/eval-multiline-no-return.js';
    fs.writeFileSync(tmp, 'const data = await Promise.resolve("lost");\ndata;');
    try {
      const result = await handleReadCommand('eval', [tmp], bm);
      expect(result).toBe('');
    } finally {
      fs.unlinkSync(tmp);
    }
  });

  test('css returns computed property', async () => {
    const result = await handleReadCommand('css', ['h1', 'color'], bm);
    // Navy color
24 changes: 15 additions & 9 deletions plan-ceo-review/SKILL.md
@@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli

## Contributor Mode

If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.

**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance, even minor stuff.
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown, file a field report. Maybe our contributor will help make us better!

**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
**Calibration (this is the bar):** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it; that's the kind of thing worth filing. Things less consequential than this, ignore.

**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.

**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate; include every section through the Date/Version footer):

```
# {Title}
@@ -58,20 +61,23 @@ Hey gstack team - ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
**My rating:** {0-10} - {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
(wrap any error messages or unexpected output in a markdown code block)
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```

Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell user: "Filed gstack field report: {title}"

## Step 0: Detect base branch
