feat: await support in browse js/eval + contributor mode v2 #104

Merged
garrytan merged 9 commits into main from garrytan/await-contrib-reflect-clean
Mar 16, 2026

@garrytan
Owner

Summary

  • $B js "await fetch(...)" now works: expressions containing await are auto-wrapped in an async IIFE, and comments are stripped before detection to avoid false positives
  • Smart eval wrapping: single-line files are wrapped as an expression via (...) and return their value directly; multi-line files are wrapped in a {...} block and need an explicit return
  • Contributor mode redesigned from passive error detection to active periodic reflection with a 0-10 rating scale
  • Reports now include a historical calibration example and a "What would make this a 10" field for actionable improvements
  • E2E eval blame protocol: never claim failures are "pre-existing" without running on main first

Pre-Landing Review

No issues found.

Test plan

  • 167 browse command tests pass (including 6 new async wrapping tests)
  • 114 skill validation tests pass (including 40 new contributor mode tests)
  • E2E contributor mode eval passes with new flexible assertions
  • E2E qa-quick eval passes with improved prompt + timeout

🤖 Generated with Claude Code

garrytan and others added 9 commits March 16, 2026 11:17
Auto-wrap await expressions in async IIFE context so
$B js "await fetch(...)" works without SyntaxError.

- hasAwait() strips comments before detection
- js: expression wrapping (async()=>(expr))()
- eval: smart wrapping — single-line=expression, multi-line=block
- 6 new unit tests covering async, false-positive, and return semantics
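
The wrapping described above can be sketched roughly as follows. This is an illustration, not the code from this PR: hasAwait is named in the commit message, but stripComments, wrapJs, and wrapEval are hypothetical names, the regex-based comment stripping is an assumption, and passing code through unchanged when no await is found is also an assumption.

```typescript
// Hypothetical sketch of the await auto-wrapping; not the PR's actual source.

/** Remove block and line comments so a commented-out "await" is ignored.
 *  Limitation (assumed approach): regexes do not understand string literals,
 *  so "//" inside a string such as a URL is treated as a comment start. */
function stripComments(code: string): string {
  return code
    .replace(/\/\*[\s\S]*?\*\//g, "") // block comments
    .replace(/\/\/.*$/gm, ""); // line comments
}

/** True when the code uses the await keyword outside of comments. */
function hasAwait(code: string): boolean {
  return /\bawait\b/.test(stripComments(code));
}

/** js command: wrap a single expression so its value is returned. */
function wrapJs(expr: string): string {
  return hasAwait(expr) ? `(async()=>(${expr}))()` : expr;
}

/** eval command: single-line input is wrapped as an expression,
 *  multi-line input as a block that needs an explicit return. */
function wrapEval(source: string): string {
  if (!hasAwait(source)) return source;
  const trimmed = source.trim();
  return trimmed.includes("\n")
    ? `(async()=>{\n${trimmed}\n})()`
    : `(async()=>(${trimmed}))()`;
}
```

Under these assumptions, wrapJs("await p") produces (async()=>(await p))(), while "// await later\n1 + 1" is left untouched because the only await sits inside a comment.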
Replace passive "report when things break" with active reflection:
- Rate gstack experience 0-10 at workflow step boundaries
- Historical calibration example (await bug) anchors the reporting bar
- "What would make this a 10" field focuses on actionable improvements
- Removed category lists in favor of judgment-based assessment

40 new skill-validation tests (4 checks × 10 skills) verify:
- 0-10 rating scale present
- Calibration example present
- "What would make this a 10" field present
- Periodic reflection (not per-command)
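
One way to picture the 4 checks × 10 skills structure is a small table of patterns applied to each skill's text. The sketch below is illustrative only: CHECKS, failedChecks, totalCases, and every pattern are assumptions and are not taken from the repository's test suite.

```typescript
// Hypothetical check table; the patterns are assumptions, not the real assertions.
const CHECKS: Array<{ name: string; pattern: RegExp }> = [
  { name: "0-10 rating scale", pattern: /0-10/ },
  { name: "calibration example", pattern: /calibration/i },
  { name: "what would make this a 10", pattern: /what would make this a 10/i },
  { name: "periodic reflection", pattern: /reflect/i },
];

/** Returns the names of the checks that a skill's text fails. */
function failedChecks(skillText: string): string[] {
  return CHECKS.filter(({ pattern }) => !pattern.test(skillText)).map(
    ({ name }) => name,
  );
}

/** Number of (skill, check) pairs, e.g. 10 skills x 4 checks = 40. */
function totalCases(skillCount: number): number {
  return skillCount * CHECKS.length;
}
```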

Update existing E2E contributor eval for new report format.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Contributor mode:
- Add "do not truncate" directive to template — agent was stopping
  after "My rating" without completing Steps/Raw output/What would
  make this a 10 sections
- Restore assertions for Steps to reproduce and Date footer

QA quick:
- Make test server URL prominent: top of prompt, explicit "already
  running" and "do NOT discover ports" instructions
- Bump session timeout 180s→240s and test timeout 240s→300s
- Set B= at top of prompt (was buried in prose)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agent writes thorough reports with creative section names
("Repro Steps" vs "Steps to reproduce"). Match intent not formatting:
- /repro|steps to reproduce/ for reproduction steps
- /date.*2026/ for date footer presence
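
A minimal sketch of how the two patterns above might be applied to a report. The i flag is an assumption: it is not shown in the commit message, but a heading like "Repro Steps" only matches /repro/ if matching is case-insensitive or the text is lowercased first. reportLooksComplete is a hypothetical helper name.

```typescript
// Patterns from the commit message, with an assumed case-insensitive flag.
const REPRO = /repro|steps to reproduce/i;
const DATE_FOOTER = /date.*2026/i;

/** True when the report has a reproduction section and a dated footer,
 *  regardless of the exact heading wording the agent chose. */
function reportLooksComplete(report: string): boolean {
  return REPRO.test(report) && DATE_FOOTER.test(report);
}
```

Both "## Repro Steps" and "## Steps to reproduce" satisfy the first pattern, so the assertion checks intent, not formatting.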

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

"Not related to our changes" is an extraordinary claim that requires
extraordinary proof. When evals fail during /ship:

1. Run the same eval on main — prove it fails there too
2. If it passes on main, it IS your change — trace the blame
3. If you can't verify, say "unverified" not "pre-existing"
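
The three rules above amount to a small decision table. The sketch below encodes it under the assumption that the eval has already failed on the branch; EvalOutcome, Blame, and classifyFailure are hypothetical names, not part of the repository.

```typescript
// Hypothetical encoding of the blame protocol; names are illustrative only.
type EvalOutcome = "pass" | "fail" | "not-run";
type Blame = "pre-existing" | "caused-by-branch" | "unverified";

/** Classify an eval that failed on the branch, given its result on main. */
function classifyFailure(onMain: EvalOutcome): Blame {
  switch (onMain) {
    case "fail":
      return "pre-existing"; // rule 1: proven to fail on main too
    case "pass":
      return "caused-by-branch"; // rule 2: passes on main, so the change is to blame
    case "not-run":
      return "unverified"; // rule 3: no evidence either way
  }
}
```

The point of the third case is that "pre-existing" is only reachable through an actual run on main.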

Added to CLAUDE.md and as a comment in skill-e2e.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CONTRIBUTING.md: update contributor mode description — now describes
periodic 0-10 reflection loop instead of passive friction detection.

BROWSER.md: add js/eval async documentation — await expressions are
auto-wrapped in async context, single-line eval returns values directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The base branch detection entries from main were dropped when resolving
the CHANGELOG conflict — should have merged both sets, not replaced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

garrytan merged commit 78e519e into main on Mar 16, 2026