fix(codex): surface non-zero exits so wrappers stop reading as silent stalls by genisis0x · Pull Request #1467 · garrytan/gstack

genisis0x · 2026-05-13T09:08:08Z

Summary

The `/codex` Review, Challenge, Consult, and Consult-resume wrappers all check `_CODEX_EXIT` only for the 124 timeout sentinel. Any other non-zero exit gets swallowed: a parse error caused by a breaking argument change in the Codex CLI (exit 2), an auth failure whose message doesn't match the existing auth-text regex, or a network drop that returns a partial frame. The calling agent sees empty stdout, which is indistinguishable from a model API stall, and goes on to misdiagnose the problem for 30-60 minutes before someone checks `$TMPERR` by hand.

The reporter's CLAUDE.md #54f (from claude-teams-bot) records the same observation across ~5 hits in one week: "Don't trust silent-stall framing... it's exit non-zero with empty stdout, swallowed by a wrapper that only handles the timeout exit code."
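A minimal way to reproduce this failure mode without Codex is sketched below. `fake_codex` and `old_wrapper_check` are hypothetical stand-ins, not code from `codex/SKILL.md.tmpl`: the first fails the way an arg-shape break would (exit 2, empty stdout, an error on stderr), and the second only tests for 124, which is the status GNU `timeout` returns when it kills a command and presumably where the sentinel comes from.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a Codex CLI call that rejects its arguments:
# nothing on stdout, an error message on stderr, exit status 2.
fake_codex() {
  echo "error: unexpected argument '--foo'" >&2
  return 2
}

# Hypothetical wrapper that, like the pre-fix wrappers, only checks for 124.
old_wrapper_check() {
  local out
  TMPERR=$(mktemp)
  out=$(fake_codex 2>"$TMPERR")
  _CODEX_EXIT=$?
  if [ "$_CODEX_EXIT" = "124" ]; then
    echo "[codex timeout]"
  fi
  # Any other non-zero status falls through silently; the caller sees only $out.
  printf '%s' "$out"
}

old_wrapper_check   # prints nothing, yet $TMPERR holds the real error
```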

Approach

Add an `elif != "0"` branch to each of the four wrappers (Review, Challenge, Consult new-session, Consult-resume). On any non-zero exit other than the 124 timeout, the wrapper now:

- Prints `[codex exit <code>] <first stderr line>` so the calling agent gets an immediate, structured diagnostic line.
- Prints the first 20 lines of `$TMPERR`, indented two spaces, so the full parse error or arg-shape rejection appears right in the conversation instead of staying hidden in a temp file.
- Logs `codex_nonzero_exit:<mode>:<exit_code>` via the existing telemetry helper, so these failures become queryable as an event distinct from `codex_timeout`.
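A sketch of what the new branch could look like in one wrapper, assuming `_CODEX_EXIT` holds the Codex exit status and `$TMPERR` its captured stderr, as described above. The `log_telemetry` stub and the `[codex timeout]` message are hypothetical placeholders; the PR uses its existing telemetry helper and timeout handling, which aren't reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical telemetry stub; the real skill uses its existing helper.
log_telemetry() {
  echo "telemetry: $1" >&2
}

# Sketch of the timeout / non-zero chain for one wrapper.
# Assumes _CODEX_EXIT holds Codex's exit status and TMPERR its stderr file.
handle_codex_exit() {
  local mode=$1
  if [ "$_CODEX_EXIT" = "124" ]; then
    echo "[codex timeout]"              # placeholder for the existing branch
    log_telemetry "codex_timeout"
  elif [ "$_CODEX_EXIT" != "0" ]; then
    # One structured line the calling agent can key on.
    echo "[codex exit $_CODEX_EXIT] $(head -n 1 "$TMPERR")"
    # Up to 20 lines of stderr, indented two spaces.
    head -n 20 "$TMPERR" | sed 's/^/  /'
    log_telemetry "codex_nonzero_exit:${mode}:${_CODEX_EXIT}"
  fi
  # Exit 0 falls through untouched.
}
```

For example, after a run that exited 2 with `error: unexpected argument` as its first stderr line, `handle_codex_exit review` prints `[codex exit 2] error: unexpected argument`, then the indented stderr excerpt, and sends `codex_nonzero_exit:review:2` to the telemetry stub.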


Behavior on success (exit 0) and timeout (exit 124) is unchanged — the existing branches still fire first. Auth-error detection in the Challenge wrapper (`grep -qiE "auth|login|unauthorized" "$TMPERR"`) also still fires unchanged; the new branch sits between the timeout branch and the auth branch and doesn't shadow either.
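One way to arrange the checks so that neither branch shadows the other is sketched below. It assumes the auth grep runs as a separate check after the exit-code chain rather than as another `elif`; `classify_codex_run` is a hypothetical helper that returns tags instead of printing diagnostics, which makes the branch order easy to test.

```bash
#!/usr/bin/env bash
# Hypothetical classifier mirroring the branch order described above:
# timeout first, then any other non-zero exit, then an independent auth grep.
classify_codex_run() {
  local exit_code=$1 errfile=$2
  local -a tags=()
  if [ "$exit_code" = "124" ]; then
    tags+=(timeout)
  elif [ "$exit_code" != "0" ]; then
    tags+=("nonzero:$exit_code")
  fi
  # Assumed to be a separate check, so the elif above cannot shadow it.
  if grep -qiE "auth|login|unauthorized" "$errfile"; then
    tags+=(auth)
  fi
  # Success with no auth text yields "ok".
  echo "${tags[*]:-ok}"
}
```

With this arrangement a non-zero auth failure yields both tags (for example `nonzero:1 auth`), while a clean success still yields `ok`.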
Defensive, not curative
This is the layer the reporter explicitly framed as separate from #1209's curative fix for the specific parse-error trigger (#1196): even after #1209 lands, future Codex CLI arg-shape breaks will keep masquerading as silent stalls until someone manually inspects the captured stderr. The two fixes are complementary; this one is the cheaper of the two, and it surfaces the next regression as soon as it happens instead of leaving it for someone to dig out of a temp file.
Files

- `codex/SKILL.md.tmpl`: canonical source. Four `elif != "0"` branches added next to the existing `if = "124"` checks (Review at line 178, Challenge at 336, Consult at 489, Consult-resume at 517).
- `codex/SKILL.md`: generated mirror, with the same four edits applied so the PR is self-consistent and the reviewer doesn't need to run `bun run gen:skill-docs` to inspect the output.

Both files are kept in sync per the `SKILL.md workflow` section of CLAUDE.md.
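To confirm that a hand-edited mirror matches what the generator would produce, something like the sketch below could be run from the repo root. `check_mirror_in_sync` is a hypothetical helper, and it assumes `bun run gen:skill-docs` rewrites `codex/SKILL.md` in place; note that it leaves the regenerated file on disk.

```bash
#!/usr/bin/env bash
# Hypothetical helper: regenerate a mirrored file and report whether the
# existing copy already matched.
# Usage: check_mirror_in_sync <generated-file> <generator command...>
check_mirror_in_sync() {
  local generated=$1; shift
  local snapshot status
  snapshot=$(mktemp) || return 2
  cp "$generated" "$snapshot"             # keep the hand-edited version
  if ! "$@" >/dev/null 2>&1; then
    echo "generator failed: $*" >&2
    rm -f "$snapshot"
    return 2
  fi
  if cmp -s "$snapshot" "$generated"; then
    status=0
  else
    # Show what regeneration changed, i.e. where the hand edit drifted.
    diff -u "$snapshot" "$generated" | head -n 40
    status=1
  fi
  rm -f "$snapshot"
  return "$status"
}
```

Run as `check_mirror_in_sync codex/SKILL.md bun run gen:skill-docs`, it returns 0 if nothing changed, 1 (with a diff excerpt) if the mirror drifted, and 2 if the generator itself failed.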
Validation

- `bash -n` on the embedded blocks: clean. The wrappers live in markdown code fences, but each block is a self-contained bash snippet that runs fine when extracted.
- No `bun test` runner was available locally. The change is additive (a new `elif` branch only; no existing branch touched), so the existing skill-validation suite should remain green.
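The `bash -n` check on the embedded blocks can be scripted roughly as follows. `check_bash_fences` is a hypothetical helper; it assumes each wrapper sits in a triple-backtick fence tagged `bash` and closed by a bare triple-backtick line, writes each block to its own file, and syntax-checks them one by one. Since `bash -n` only parses, it catches syntax errors but not a wrong exit-code comparison.

```bash
#!/usr/bin/env bash
# Hypothetical helper: syntax-check every bash-tagged fence in a markdown file.
check_bash_fences() {
  local md=$1 dir f n=0 failed=0
  dir=$(mktemp -d) || return 2
  # Write each fenced bash block to its own numbered file.
  awk -v dir="$dir" '
    /^[ \t]*[`][`][`]bash/ { k++; out = sprintf("%s/block_%03d.sh", dir, k); inblock = 1; next }
    /^[ \t]*[`][`][`]/ && inblock { inblock = 0; close(out); next }
    inblock { print > out }
  ' "$md"
  for f in "$dir"/block_*.sh; do
    [ -e "$f" ] || continue          # no blocks: the glob stays unexpanded
    n=$((n + 1))
    if ! bash -n "$f"; then          # parse only; nothing is executed
      echo "syntax error in $(basename "$f")" >&2
      failed=1
    fi
  done
  rm -rf "$dir"
  echo "checked $n block(s)"
  return "$failed"
}
```

Pointed at a markdown file, it prints `checked N block(s)` and returns non-zero if any block fails to parse.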

Fixes #1327






genisis0x wants to merge 1 commit into garrytan:main from genisis0x:fix/codex-exit-surface-1327
