Skip to content

v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs)#1478

Open
garrytan wants to merge 5 commits into
mainfrom
garrytan/macau-v2
Open

v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs)#1478
garrytan wants to merge 5 commits into
mainfrom
garrytan/macau-v2

Conversation

@garrytan
Copy link
Copy Markdown
Owner

@garrytan garrytan commented May 13, 2026

Summary

Three filed bugs land together. All three were silent-or-loud regressions on user flows; each ships with a regression test that locks the fix in.

Fixes:

Test infrastructure (0ef4ff80): raised timeouts for two pre-existing slow integration tests (gstack-artifacts-init × 16, gstack-next-version × 1). Both were timing out at the default 5s on developer machines due to subprocess fork+exec overhead; same flake reproduced on origin/main before this branch. Matches the brain-sync.test.ts pattern already in the repo (alias test to a 30s-timeout wrapper).

Triage: GitHub issue #1412 (PM skill proposal) got a polite decline comment explaining the proposed shape is enterprise-PM coordination ceremony that gstack is explicitly trying to compress. Suggested the proposer publish it as a separate skill collection.

Test Coverage

[+] bin/gstack-learnings-log               [★★★ TESTED]  test/learnings.test.ts:106-115 (round-trip + caller-contract)
[+] lib/gstack-memory-helpers.ts           [★★★ TESTED]  test/gstack-memory-helpers.test.ts:316-330 (schema_v2 fallback)
[+] codex/SKILL.md.tmpl Step 2A            [★★★ TESTED]  test/codex-hardening.test.ts:368-426 (template + generated SKILL.md)
[+] test/gstack-artifacts-init.test.ts     [INFRA]       12-line timeout-wrapper, no production code touched
[+] test/gstack-next-version.test.ts       [INFRA]       single-test timeout=30000, no production code touched

COVERAGE: 3/3 production-bug paths tested (100%)  |  QUALITY: ★★★:3

bun test exit 0, all green (full suite, including the previously-timing-out integration tests).

Pre-Landing Review

No structural issues. Diff has no SQL, no LLM trust boundary, no auth surface. Plan already cleared /plan-eng-review (CLEAR) and /codex plan-review (12 findings, all 12 incorporated).

Plan Completion

Plan at ~/.claude/plans/system-instruction-you-are-working-floating-mccarthy.md. All planned items DONE: Fix 1 (codex template, both bare + exec paths), Fix 2 (learnings allowlist), Fix 3 (gbrain detect with full codex-review hardening: execFileSync, GBRAIN_HOME, error logging), 3 regression tests, defensive -c system_prompt probe (confirmed Codex 0.130 doesn't expose the key, fell through to documented bare-path-without-boundary). PM proposal #1412 triaged separately as planned (comment posted).

Verification Results

All three manual smoke-tests passed using isolated temp HOME/GSTACK_HOME/GBRAIN_HOME (the destructive verification steps from the original plan were rewritten to never touch the developer's real state — codex review finding #11).

TODOS

No items completed in this PR. TODOS.md unchanged.

Test plan

  • bun test passes (exit 0, 0 failures)
  • codex review --base main argv-parses without error: the argument '[PROMPT]' cannot be used with '--base <BRANCH>'
  • gstack-learnings-log accepts type:"investigation" and writes to learnings file
  • detectEngineTier() resolves to engine:"supabase" against synthetic ~/.gbrain/config.json with no gbrain on PATH (proves the config.json fallback path)
  • Codex plan-review with all 12 findings incorporated (see ## GSTACK REVIEW REPORT in plan file)

🤖 Generated with Claude Code


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

garrytan and others added 5 commits May 13, 2026 11:21
The /investigate skill instructed agents to log learnings with type:"investigation",
but bin/gstack-learnings-log:22 rejected anything not in
[pattern, pitfall, preference, architecture, tool, operational]. Every
investigation run exited 1 to stderr and the learning was dropped, silently
to the user.

Fix: add 'investigation' to ALLOWED_TYPES.

Regression test: round-trips a learning with type:"investigation" and asserts
exit 0 + file write; second test reads investigate/SKILL.md.tmpl and asserts
it emits the literal type:"investigation" string, guarding the
template/validator contract at both ends.

Fixes #1423. Reported by diogolealassis.
… doctor exit

freshDetectEngineTier() in lib/gstack-memory-helpers.ts returned engine:
"unknown" for every Supabase user on gbrain ≥0.25. Two stacking bugs:

1. execSync("gbrain doctor --json --fast 2>/dev/null") threw on non-zero
   exit. gbrain doctor exits 1 whenever health_score < 100, which is
   essentially every fresh install due to resolver_health warnings. The
   JSON output never reached the parser.
2. gbrain ≥0.25 shipped schema_version:2 doctor output that dropped the
   top-level 'engine' field entirely.

Result: every /sync-gbrain on Supabase logged 'engine=unknown' and skipped
all sync stages silently.

Fix:
- Replace execSync with execFileSync (no shell, no bash-specific 2>/dev/null
  redirect; portable to Windows).
- Recover stdout from the thrown error object so non-zero exits still parse.
- Fall back to reading gbrain's config.json (respecting GBRAIN_HOME env var,
  defaulting to ~/.gbrain/config.json) when doctor output doesn't surface
  an engine field.
- Add logGbrainError() helper that appends one-line JSONL to
  ~/.gstack/.gbrain-errors.jsonl on parse failure, so future regressions
  leave a forensic trail.

The "supabase" tier here means "remote postgres" in practice — gbrain
config uses engine:"postgres" for both real Supabase and any other
remote postgres (e.g. local-postgres-for-testing). Downstream sync code
treats them identically, so the label compression is intentional and
documented inline.

Regression test: existing detectEngineTier suite now isolates HOME +
GBRAIN_HOME + PATH to temp dirs (closes a flake source where the prior
tests would read whatever was on the reviewer's machine). New test
forces gbrain off PATH, writes a synthetic config.json with
engine:"postgres", asserts detectEngineTier() returns
engine:"supabase".

Fixes #1415. Patch shape contributed by Shiv @shivasymbl (tested on
gstack v1.31.0.0 + gbrain v0.31.3 + Supabase).
Codex CLI 0.130.0 made [PROMPT] and --base <BRANCH> mutually exclusive at
argv level. Step 2A of codex/SKILL.md.tmpl had always passed both (the
filesystem boundary prefix as the prompt argument + the base branch), so
every /codex review call died with:

  error: the argument '[PROMPT]' cannot be used with '--base <BRANCH>'

Fix: split Step 2A into two paths.

Default (no custom user instructions): bare 'codex review --base <base>'.
Codex's review prompt is internally diff-scoped, so the model focuses on
the changes against base. The filesystem boundary prefix is dropped here
because Codex 0.130 has no documented system-prompt config key
(probed -c 'system_prompt="..."' against 0.130 — the flag is silently
accepted but the value isn't applied). Skill files under .claude/ and
agents/ are public, so this is a token-efficiency concern, not a safety
one.

Custom instructions (/codex review <focus>): route through codex exec
with the diff written to a tempfile, inlined into the prompt between
explicit DIFF_START / DIFF_END markers. The boundary is preserved here
because codex exec isn't auto-scoped to the diff. The DIFF_START/END
delimiters tell the model where data ends and instructions resume, which
materially reduces prompt-injection hijack rates when the diff contains
adversarial content.

Note on bash semantics: codex's earlier review flagged the exec route as
"command injection via $_DIFF interpolation." That framing is wrong —
bash parameter expansion does not re-evaluate $(...) or backticks inside
the expanded value, so a diff containing $(rm -rf /) is plain string
data to codex exec. The real risk is prompt injection (model-side, not
shell-side), which the DIFF_START/END pattern mitigates.

Regression tests in test/codex-hardening.test.ts assert across BOTH
codex/SKILL.md.tmpl AND the generated codex/SKILL.md:
1. No 'codex review' invocation line combines a quoted-string OR variable
   positional argument with --base.
2. Step 2A still contains either bare 'codex review --base' OR 'codex
   exec' (guards against accidental deletion of both fix paths).

Fixes #1428. Reported by Stashub.
Two test files were timing out at the default 5s on developer machines,
both pre-existing on origin/main but unrelated to this branch's bug fixes:

- test/gstack-artifacts-init.test.ts: 13 tests spawning real subprocesses
  via fake gh/glab/git shims in PATH. bun's fork+exec overhead pushed
  these past 5s consistently. Added a local test-wrapper that aliases
  test() with a 30s timeout (matches the brain-sync.test.ts pattern
  already in the repo).
- test/gstack-next-version.test.ts: one integration smoke test that
  spawns 'bun run ./bin/gstack-next-version' and parses the resulting
  JSON. The subprocess does a 'gh pr list' against the live GitHub API
  to enumerate claimed version slots. Network latency makes 5s tight;
  raised this single test to 30s.

No production code changed. The tests already passed deterministically
once given enough wall-clock time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

E2E Evals: ✅ PASS

3/3 tests passed | $1.20 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 1/1 $0.04
e2e-review 1/1 $0.58
e2e-review 1/1 $0.58

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant