Skip to content

v1.31.1.0 fix wave: 3 community PRs (careful BSD sed, codex Step 0 rename, make-pdf setup ordering)#1413

Merged
garrytan merged 5 commits intomainfrom
pr-wave-32
May 10, 2026
Merged

v1.31.1.0 fix wave: 3 community PRs (careful BSD sed, codex Step 0 rename, make-pdf setup ordering)#1413
garrytan merged 5 commits intomainfrom
pr-wave-32

Conversation

@garrytan
Copy link
Copy Markdown
Owner

@garrytan garrytan commented May 10, 2026

Summary

Three small, well-scoped community fixes:

  • Cross-platform fixes: /careful works on macOS again (BSD sed compatibility — \s[[:space:]])
  • Skill template hygiene: Codex skill Step 0 renamed to Step 0.4 to avoid collision with the platform-detect prelude
  • /make-pdf setup ordering: SETUP block now runs immediately after the Preamble Bash instead of after the Telemetry footer

5 commits / 8 files / +64/-62 source lines (excluding generated SKILL.md regen).

Wave assembly

Triaged ~140 open community PRs. Final wave is 3 of an initial 14 candidates. Cuts:

Test Coverage

All 3 wave PRs ship with their own regression tests:

Pre-Landing Review

Codex review on the trimmed wave (post-#1171-drop): clean, no regressions.

"I did not find any concrete regressions in the diff against the specified merge base. The behavioral change in check-careful.sh fixes the macOS-safe rm exception path as intended, and the remaining edits are documentation/generator ordering updates without a clear functional break."

The earlier full codex review on the 4-commit wave (with #1171) flagged 1 P1 critical on browse/src/token-registry.ts:164-167:

"If a caller sends a bearer token whose JS string length matches rootToken but whose UTF-8 byte length differs (for example \"é\".repeat(36)), Buffer.from(token) and Buffer.from(rootToken) end up with different sizes and crypto.timingSafeEqual throws instead of returning false. Because isRootToken() is used in both the tunnel 403 path and normal token validation, those requests now error out rather than being cleanly rejected as unauthorized."

This is what drove the #1171 drop. Per D8-B strict cherry-pick rule: contributor PR lands as-is or gets rejected; we don't rewrite contributor work and re-attribute it.

Eval Results

Full paid eval suite (test:evals:all + test:e2e:all, ~$25) ran on the 4-commit collector:

  • 145 pass, 39 skip, 9 fail in ~150 minutes
  • All 9 failures in subsystems disjoint from this wave (autoplan, browser-skills, context-save, office-hours-builder/forcing, sidebar-css/url/navigate, multi-provider gemini benchmark) — almost certainly pre-existing on main, none touch the wave's 8 changed files
  • Codex review pass on the fix(token-registry): constant-time root-token comparison #1171 file is what caught the bug; that's a value moment for the eval suite

Pre-existing test failures (not introduced by this wave)

3 golden-file tests fail on bun test:

  • golden-file regression > Claude ship skill matches golden baseline
  • golden-file regression > Codex ship skill matches golden baseline
  • golden-file regression > Factory ship skill matches golden baseline

These regressed when v1.31.0.0 (#1390) updated the preamble template's plan-mode-handshake prose but didn't regenerate the per-host golden baselines. Verified pre-existing on origin/main. Separate fix needed; not blocking this wave.

TODOS

No TODO items completed in this PR.

Test plan

🤖 Generated with Claude Code


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

ToraDady and others added 5 commits May 10, 2026 01:58
…acOS

The sed regex in check-careful.sh uses \s+, which is a GNU sed
extension not supported by BSD sed (macOS default). On macOS, this
causes the RM_ARGS strip to fail silently, making rm -rf of safe
exceptions (node_modules, .next, dist, etc.) trigger the destructive
warning instead of being permitted as designed.

Fix: replace \s+ with POSIX [[:space:]]+, which works on both GNU sed
(Linux) and BSD sed (macOS).

The existing test/hook-scripts.test.ts already documented this
limitation via a detectSafeRmWorks() helper and a platform-conditional
assertion ("if GNU sed: expect undefined, else: expect ask"). Now that
the regex works on both platforms, this dead path is removed and the
safe-exception tests assert the same expectation on every OS.

Note: the grep regex in the same file also uses \s+, but BSD grep -E
on macOS does support \s (verified via bash -x trace), so only the
sed expression needs the fix.

Discovered while translating the careful skill for a Japanese
derivative project (uzustack). Reference:
uzumaki-inc/uzustack@bc67c8d
…elude

The codex skill template had its own '## Step 0: Check codex binary'
heading (line 42), which after gen-skill-docs collided with the
platform-detection prelude '## Step 0: Detect platform and base branch'
(injected by scripts/resolvers/utility.ts). The generated codex/SKILL.md
ended up with two H2 headings labeled Step 0, which is ambiguous to an
agent reading the skill in order.

Renamed the local heading to Step 0.4, slotting it between the prelude
(Step 0) and the existing Step 0.5 / Step 0.6 sections. No renumbering
of downstream steps needed.

Closes #1388

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

E2E Evals: ✅ PASS

6/6 tests passed | $1.80 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 1/1 $0.04
e2e-design 1/1 $0.3
e2e-plan 1/1 $0.61
e2e-review 1/1 $0.53
llm-judge 1/1 $0.02
e2e-design 1/1 $0.3

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit 49cc4ff into main May 10, 2026
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants