feat: retry Iron Loop executor on API overload (529) with configurable backoff by davidbijl · Pull Request #7 · robotijn/ctoc

davidbijl · 2026-05-25T15:10:53Z

Fixes #6.

Summary

Layer 1 (executor agent, agents/iron-loop/iron-loop-executor.md): a new "API Overload (529) Handling" section instructs the executor to distinguish pre-write overloads (safe to retry) from mid-write overloads (human gate required), and to write the matching status to the plan's .status file.
Layer 2 (state layer: background.js, actions.js, state.js): overload-retry and overload-partial are added to the status enum; cleanupStaleInProgress skips overload plans; startAgent resumes an in-progress plan whose status is overload-retry instead of picking a new todo plan, and blocks with a human-gate error when a plan is in overload-partial; getAgentStatus surfaces overload states when no lock is held.
Layer 3 (dashboard, menu-screens.js): the AGENT section shows ⏳ retry in Xm — <plan> for scheduled retries and ⚠ partial write — review: <plan> for mid-write overloads.
Config (settings.js): a new retry category with overloadIntervalSeconds, the wait before an overload retry (default 600 s / 10 min).
Tests (tests/overload-retry.test.js): 9 unit tests covering all three layers (icon enum, writeStatus fields, cleanup skip logic, startAgent resume/block paths, dashboard labels, config schema).
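A minimal sketch of the Layer 2 selection order, for readers reviewing actions.js. The plan objects ({ name, status }), the function names isStaleCandidate and chooseNextAction, and the return shape are hypothetical, not the PR's actual API. The sketch also assumes overload-partial is checked before overload-retry, which the description does not specify.

```javascript
// Hypothetical sketch of the Layer 2 rules; names and shapes are assumptions.
const OVERLOAD_STATUSES = new Set(['overload-retry', 'overload-partial']);

// cleanupStaleInProgress-style predicate: overload plans are never treated as stale.
function isStaleCandidate(plan) {
  return !OVERLOAD_STATUSES.has(plan.status);
}

// startAgent-style decision: inspect in-progress plans before touching the todo queue.
function chooseNextAction(inProgress, todo) {
  // Assumption: a mid-write overload takes priority and blocks everything else.
  const partial = inProgress.find((p) => p.status === 'overload-partial');
  if (partial) {
    return {
      action: 'block',
      plan: partial.name,
      error: `Plan ${partial.name} was interrupted mid-write by an API overload; review it before restarting.`,
    };
  }
  // A pre-write overload: resume the same plan rather than starting a new one.
  const retry = inProgress.find((p) => p.status === 'overload-retry');
  if (retry) {
    return { action: 'resume', plan: retry.name };
  }
  // No overload state: fall back to the next todo plan, if any.
  if (todo.length > 0) {
    return { action: 'start', plan: todo[0].name };
  }
  return { action: 'idle' };
}
```

In this sketch, checking the partial case first means a mid-write overload cannot be bypassed by resuming or starting another plan.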

Answers to the open questions in #6

Preferred layer for retry logic: the executor agent writes the status and exits (Layer 1); the state layer then resumes or blocks on the next startAgent call (Layer 2). There is no hard ScheduleWakeup dependency: the operator restarts via the menu when ready, or the executor can call ScheduleWakeup if it is available in its context (the agent instructions mark it as optional).
Step-level resume vs full restart: the executor restarts the current plan from the beginning. Because the plan's completed [x] checkboxes are persisted on disk, it can fast-forward past steps that are already done, so no separate step-marker mechanism is needed for a first pass.
ScheduleWakeup availability: the agent instructions treat it as optional; if it is available, use it, otherwise exit cleanly. The dashboard indicator and the menu's Start Agent button provide the manual resume path.
Scope: all three layers are included, but the changes are minimal and additive. No existing behaviour changes except in cleanupStaleInProgress (which now skips overload plans) and startAgent (which now checks in-progress plans before picking a todo plan).
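The fast-forward described under "Step-level resume" only needs the first unchecked step. A hedged sketch, assuming plan steps are markdown task-list items (- [ ] / - [x]); firstOpenStep is a hypothetical helper, and in the PR the executor agent does this by reading the plan, not through code.

```javascript
// Hypothetical helper: locate the first step in a plan's markdown that is not yet checked.
// Assumes GitHub-style task list items; the real plan format may differ.
function firstOpenStep(planMarkdown) {
  const lines = planMarkdown.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    // Matches "- [ ] text", "* [x] text", or "- [X] text", with optional indentation.
    const m = /^\s*[-*]\s+\[( |x|X)\]\s+(.*)$/.exec(lines[i]);
    if (m && m[1] === ' ') {
      return { lineIndex: i, text: m[2] };
    }
  }
  return null; // every step is already checked
}
```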

Test plan

Run node --test tests/overload-retry.test.js: 9 tests, 0 failures.
Run node --test tests/*.test.js: the existing suite passes except for 1 pre-existing failure in update.test.js, which is unrelated to this PR and was already failing on main before these changes.
Manually: simulate an overload-retry status file in a plan under plans/in-progress/, open /ctoc:menu, and confirm the dashboard AGENT section shows ⏳ retry in Xm.
Manually: simulate an overload-partial status file and confirm the dashboard shows ⚠ partial write — review.
Manually: click Start Agent while an overload-retry plan is present and confirm the executor resumes that plan rather than picking a new todo plan.
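The label strings checked manually above could be derived from a status's retry_at roughly as follows. This is a sketch: overloadLabel, the status object shape, and the round-up-to-whole-minutes rule are assumptions, not the actual menu-screens.js code.

```javascript
// Hypothetical sketch of the AGENT-section labels for overload states.
// Assumed input shape: { status, plan, retry_at } where retry_at is parseable by Date.
function overloadLabel(agentStatus, now = Date.now()) {
  if (agentStatus.status === 'overload-retry') {
    const retryAt = new Date(agentStatus.retry_at).getTime();
    // Round up so "retry in 0m" only appears once the retry time has passed.
    const minutes = Math.max(0, Math.ceil((retryAt - now) / 60000));
    return `⏳ retry in ${minutes}m — ${agentStatus.plan}`;
  }
  if (agentStatus.status === 'overload-partial') {
    return `⚠ partial write — review: ${agentStatus.plan}`;
  }
  return null; // not an overload state; the dashboard renders its usual label
}
```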

🤖 Generated with Claude Code

…e backoff

Implements three-layer recovery for HTTP 529 (API overloaded) errors during Iron Loop executor runs, resolving issue robotijn#6.

Layer 1 — iron-loop-executor.md: adds explicit instructions for the executor agent to distinguish pre-write (safe to retry) from mid-write (human review required) overload events and write the appropriate status to the plan's .status file.

Layer 2 — state layer:
- background.js: adds overload-retry and overload-partial to the status enum, preserves retry_at timestamp in writeStatus, adds markOverloadRetry() and markOverloadPartial() helpers.
- actions.js: cleanupStaleInProgress now skips overload plans; startAgent resumes an overload-retry plan in-progress instead of picking a new todo plan, and blocks with a human-gate error when an overload-partial plan exists.
- state.js: getAgentStatus surfaces overload-retry / overload-partial from in-progress plan status files when no lock is held.

Layer 3 — menu-screens.js: dashboard AGENT section shows ⏳ retry in Xm for scheduled retries and ⚠ partial write — review for mid-write overloads.

Config — settings.js: adds retry.overloadIntervalSeconds (default 600s / 10 min).

Tests — tests/overload-retry.test.js: 9 unit tests covering all three layers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
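The commit message names markOverloadRetry() and markOverloadPartial() without showing their signatures. The sketch below is hypothetical: each helper only builds the status object a writeStatus-style call might persist, the argument order is invented, and recordOverload expresses as code the pre-write/mid-write decision that the executor agent actually makes from its markdown instructions.

```javascript
// Hypothetical sketch; real signatures, fields, and the .status file format are not shown in the PR.
const DEFAULT_OVERLOAD_INTERVAL_SECONDS = 600; // retry.overloadIntervalSeconds default (10 min)

function markOverloadRetry(plan, settings = {}, now = Date.now()) {
  const seconds = settings.retry?.overloadIntervalSeconds ?? DEFAULT_OVERLOAD_INTERVAL_SECONDS;
  return {
    plan,
    status: 'overload-retry',
    // retry_at is preserved so the dashboard can show "retry in Xm".
    retry_at: new Date(now + seconds * 1000).toISOString(),
  };
}

function markOverloadPartial(plan) {
  // Mid-write: no retry_at, because a human must review before anything resumes.
  return { plan, status: 'overload-partial' };
}

// Layer 1 decision: nothing written yet means safe to retry; anything written means human gate.
function recordOverload(plan, { wroteFiles }, settings = {}, now = Date.now()) {
  return wroteFiles ? markOverloadPartial(plan) : markOverloadRetry(plan, settings, now);
}
```

Passing now explicitly keeps retry_at deterministic in unit tests.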

davidbijl wants to merge 1 commit into robotijn:main from davidbijl:feat/overload-retry-529