fix(forge): resolve abort race condition in agentic loop #4

Open
Mog9 wants to merge 1 commit into tensormux:main from Mog9:fix-abort-race-condition

Mog9 commented Jun 14, 2026

Collaborator

The Problem

The agentic loop's abort mechanism had a race condition that made the abort button fail silently. When a user tried to cancel a running kernel optimization, the orchestrator's next save_state() call overwrote the abort flag on disk with its stale in-memory value, so the run never stopped.

Before:

  • User clicks abort → API sets abort_requested=true on disk
  • Orchestrator finishes tool execution → calls save_state() with in-memory state
  • In-memory state has abort_requested=false → overwrites disk state
  • User's abort request is lost → run continues burning API credits

After:

  • User clicks abort → API sets abort_requested=true on disk
  • Orchestrator calls _sync_abort_flag() before every save_state()
  • Sync function reads disk state → merges abort flag into memory
  • Abort flag is preserved → run stops correctly
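The two sequences above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `RunState`, the JSON file layout, `api_request_abort()`, and the simplified one-argument `load_state()`/`save_state()` signatures are all assumptions made for illustration. Only the merge logic in `sync_abort_flag()` mirrors the function described in this PR.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RunState:
    # Hypothetical stand-in for the orchestrator's run state.
    iteration: int = 0
    abort_requested: bool = False


def save_state(state: RunState, path: Path) -> None:
    """Write the whole state to disk, replacing whatever is there."""
    path.write_text(json.dumps(asdict(state)))


def load_state(path: Path) -> Optional[RunState]:
    """Read the state from disk, or return None if nothing was saved yet."""
    if not path.exists():
        return None
    return RunState(**json.loads(path.read_text()))


def sync_abort_flag(state: RunState, path: Path) -> None:
    """Merge the on-disk abort flag into the in-memory state."""
    disk_state = load_state(path)
    if disk_state and disk_state.abort_requested:
        state.abort_requested = True


def api_request_abort(path: Path) -> None:
    """What the abort endpoint is assumed to do: set the flag on disk."""
    disk_state = load_state(path) or RunState()
    disk_state.abort_requested = True
    save_state(disk_state, path)


def run_one_iteration(path: Path, *, sync: bool) -> Optional[RunState]:
    state = RunState()
    save_state(state, path)
    # Tool execution is in flight; the user clicks abort now.
    api_request_abort(path)
    state.iteration += 1
    if sync:
        sync_abort_flag(state, path)  # the fix: merge before saving
    save_state(state, path)  # without the merge, this clobbers the flag
    return load_state(path)


with tempfile.TemporaryDirectory() as tmp:
    before = run_one_iteration(Path(tmp) / "before.json", sync=False)
    after = run_one_iteration(Path(tmp) / "after.json", sync=True)

print(before.abort_requested)  # False: the abort request was lost
print(after.abort_requested)   # True: the abort request survived the save
```

The merge is one-directional: the flag can only go from false to true, so a stale in-memory state can never clear an abort that is already on disk.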

The Fix

Added _sync_abort_flag() function in app/services/forge/agent_runner.py that reads the abort flag from disk and merges it into the in-memory state before every save_state() call. This ensures the API's abort request survives the orchestrator's state saves.

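    # Signature inferred from the names used in the body; parameter order is an assumption.
    def _sync_abort_flag(state, run, repo_root):
        """Merge abort flag from disk into in-memory state before save_state()."""
        disk_state = load_state(run, repo_root)
        if disk_state and disk_state.abort_requested:
            state.abort_requested = True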

Files Changed

  • app/services/forge/agent_runner.py — Added _sync_abort_flag() and called it before every save_state() in the main loop
  • tests/test_abort_race_condition.py — New file with 6 tests covering the fix

The orchestrator's in-memory state overwrites the abort flag set by the
API when save_state() is called after tool execution. This made the abort
button silently fail — users couldn't cancel expensive runs.

Added _sync_abort_flag() that reads the abort flag from disk and merges
it into in-memory state before every save_state() call. This ensures the
API's abort request is preserved through the orchestrator's state saves.

Added 6 tests covering the fix:
- _sync_abort_flag preserves/respects abort flag
- Abort during tool execution is preserved
- Abort after iteration stops loop correctly
- Abort flag survives multiple save_state calls

All 109 tests pass.