
fix(autoname): make Haiku auto-rename reliable and diagnosable#806

Merged
drn merged 2 commits into master from argus/Recent-task-renames-aren-t on Jun 24, 2026

@drn drn commented Jun 24, 2026

Recent task auto-renames failed inconsistently, leaving tasks on their
regex slug. Root causes (confirmed against live ~/.argus logs):

  • claude -p writes RUNTIME errors (budget exceeded / usage-rate limit /
    overload) to STDOUT and exits non-zero with EMPTY stderr. The prior fix
    folded only ExitError.Stderr, so every real failure logged as a bare,
    undiagnosable "exit status 1".
  • The --max-budget-usd 0.01 cap was tuned to a stale ~$0.0002/call
    estimate; a live call measures ~1235 input + ~111 output tokens
    (~$0.0034), leaving only ~3x headroom.

Changes (internal/llm/namegen.go):

  • wrapRunError folds stdout first (the runtime-error channel), then
    stderr when both present, scrubbing control chars so an untrusted
    reason cannot forge a second log line (log-injection defense).
  • --max-budget-usd 0.01 -> 0.05 (~15x measured cost); comments corrected.
  • Retry the CLI once on a non-zero exit; validation failures terminal;
    keeps the first (richest) reason on exhaustion. DefaultTimeout 30s -> 45s.

Routed through openspec (auto-naming), archived in-PR. Adds 7 tests;
gotchas/misc.md updated.

Co-Authored-By: Claude <noreply@anthropic.com>

drn and others added 2 commits June 24, 2026 14:22
Recent Haiku auto-renames failed inconsistently, leaving tasks on their
regex slug. Root causes (both confirmed against live ~/.argus logs):

- claude -p writes RUNTIME errors (budget exceeded / usage-rate limit /
  overload) to STDOUT and exits non-zero with an EMPTY stderr. The prior
  fix folded only ExitError.Stderr, so every real failure logged as a
  bare "exit status 1" — undiagnosable.
- The --max-budget-usd 0.01 cap was tuned to a stale "~$0.0002/call"
  estimate; a live call measures ~1235 input + ~111 output tokens
  (~$0.0034), leaving only ~3x headroom. A longer pasted prompt crossed
  it → "Error: Exceeded USD budget" → exit 1 → slug kept.
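
The headroom figures quoted here can be checked with a few lines of arithmetic. The sketch below only divides the cap by the measured per-call cost from this message; the function name headroom is made up for illustration.

```go
package main

import "fmt"

// headroom reports how many typical calls' worth of cost fit under a cap.
func headroom(capUSD, costPerCallUSD float64) float64 {
	return capUSD / costPerCallUSD
}

func main() {
	const measured = 0.0034 // approximate per-call cost quoted in the commit message

	fmt.Printf("old cap 0.01: %.1fx\n", headroom(0.01, measured)) // prints "old cap 0.01: 2.9x"
	fmt.Printf("new cap 0.05: %.1fx\n", headroom(0.05, measured)) // prints "new cap 0.05: 14.7x"
}
```

With roughly 3x headroom, a prompt about three times the typical size is enough to cross the old cap, which matches the "longer pasted prompt" failure described above.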

Fixes in internal/llm/namegen.go:
- wrapRunError folds stdout FIRST (the runtime-error channel), then
  stderr, so failures are diagnosable.
- Raise --max-budget-usd 0.01 → 0.05 (~15x measured cost); correct the
  stale cost comments.
- Retry the CLI once with short backoff on a non-zero exit (transient
  overload/limit); validation failures are NOT retried.
- Raise DefaultTimeout 30s → 45s (a "signal: killed" was seen at 30s).

Adds 4 tests (stdout-fold regression, retry-then-succeed,
retry-exhausted, budget-flag pin). Routed through openspec
(auto-naming) and archived in-PR; gotchas/misc.md updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…lassify

Address code-review findings on the retry/diagnostics path:

- Log-injection (WARNING): the folded claude stdout reason is untrusted
  (may echo prompt text) and flowed unescaped into uxlog (%s, newline-
  terminated) and slog's TextHandler (unquoted). A newline could forge a
  second physical log line (a fake "[autoname] renamed" record). New
  scrubReason maps CR/LF/tab/ANSI-ESC/control runes to spaces before
  folding.
- wrapRunError now folds BOTH stdout and stderr when both are present
  (was: stdout-only, dropping a concurrent flag-parse stderr).
- Retry exhaustion kept the LAST attempt's error while the comment
  claimed "first" — a rich attempt-0 reason (budget) was lost when a
  retry died bare. Now keeps firstErr; comment corrected; shared-deadline
  behavior documented honestly.
- generateNameOnce returns (name, retryable bool, err) — clearer than the
  implicit-mutual-exclusion three-error-return.
- Dedup double string(out) conversion; tests use testutil.Error.

Adds tests: control-char scrub (log-injection guard), keep-first-reason
on exhaustion, scrubReason table.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@drn drn merged commit dfb0f06 into master Jun 24, 2026
@github-actions

Merging this branch will not change overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/drn/argus/internal/llm | 94.12% (ø) | |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/drn/argus/internal/llm/namegen.go | 94.12% (ø) | 68 | 64 | 4 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/drn/argus/internal/llm/namegen_test.go

@drn drn deleted the argus/Recent-task-renames-aren-t branch June 25, 2026 00:50