Problem
Stopping a hung or non-progressing auto-run frequently does not work. The UI appears to accept the stop request but the underlying process continues running, forcing a full app restart to recover.
Root Cause Analysis
Extracted and analyzed the compiled source (ProcessManager.js, web-server-factory.js, messageHandlers.js, IPC handlers). The stop mechanism has several architectural weaknesses:
1. Six-hop fire-and-forget kill chain
UI "Stop" → WebSocket → Web Server → IPC to renderer → Renderer handler → IPC to main → ProcessManager.kill()
If any link in this chain is blocked (renderer busy, IPC backed up, handler not registered), the stop request silently disappears. The stopAutoRun callback returns true/false but is fire-and-forget with no confirmation back to the UI.
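One way to make a lost request visible is to pair every stop request with an acknowledgement and a deadline. The sketch below is illustrative only: `requestStopWithAck` and the `send` callback are hypothetical names, not part of Maestro, and `send` stands in for whichever transport (WebSocket or IPC) carries the request.

```javascript
// Hypothetical helper: turns a one-way "stop" send into a request/ack exchange.
// `send` receives an `ack` callback that the far end is expected to call once
// the kill has been confirmed (true) or refused (false).
function requestStopWithAck(send, timeoutMs) {
  return new Promise((resolve) => {
    let settled = false;
    const finish = (result) => {
      if (settled) return; // ignore late acks after a timeout, and vice versa
      settled = true;
      clearTimeout(timer);
      resolve(result);
    };
    // If no ack arrives in time, report that instead of assuming success.
    const timer = setTimeout(
      () => finish({ confirmed: false, reason: "timeout" }),
      timeoutMs
    );
    try {
      send((ok) =>
        finish({ confirmed: Boolean(ok), reason: ok ? "acked" : "rejected" })
      );
    } catch (err) {
      finish({ confirmed: false, reason: "send-failed" });
    }
  });
}
```

With something like this, the UI could show "stopping…" while the promise is pending and treat a `timeout` result as a signal to offer a stronger kill path, rather than silently assuming the stop worked.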
2. Process removed from tracking before confirmed dead
ProcessManager.kill() removes the process from the processes Map immediately (line ~239), then sets a 2-second escalation timer. If SIGKILL also fails, or the process is stuck in uninterruptible I/O, Maestro has already dropped it from tracking and treats it as dead while it is still running.
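A minimal sketch of the alternative ordering, assuming a registry keyed by session id: the entry is marked "stopping" when the kill is requested and is only removed when Node reports the child's exit event. `ProcessRegistry`, `track`, and `stateOf` are hypothetical names for illustration and do not reflect the actual ProcessManager code.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical registry: entries leave the Map only on a confirmed exit.
class ProcessRegistry {
  constructor() {
    this.processes = new Map();
  }

  track(id, child) {
    this.processes.set(id, { child, state: "running" });
    child.once("exit", () => {
      // Only drop the entry if it still refers to this child.
      if (this.processes.get(id)?.child === child) this.processes.delete(id);
    });
  }

  kill(id, escalateAfterMs = 2000) {
    const entry = this.processes.get(id);
    if (!entry) return false;
    entry.state = "stopping"; // still tracked, so the UI can show "stopping…"
    entry.child.kill("SIGTERM");
    // Escalate if no exit has been observed within the grace period.
    const timer = setTimeout(() => entry.child.kill("SIGKILL"), escalateAfterMs);
    entry.child.once("exit", () => clearTimeout(timer));
    return true;
  }

  stateOf(id) {
    return this.processes.get(id)?.state ?? "exited";
  }
}
```

Because the entry survives until the exit event, a process that ignores both signals stays visible as "stopping" instead of disappearing from Maestro's view.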
3. No process-group killing
Child processes spawned by the CLI agent (tool calls, bash commands, subagents) may not be in the same process group. Killing the parent leaves orphan children holding resources, keeping the session effectively hung.
4. Race condition with auto-run re-spawn
If the auto-run engine advances to the next task while the kill is in flight, the new spawn can race with the termination — the kill targets a process that's already been replaced.
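One possible guard against this race, sketched under the assumption that the auto-run loop and the stop handler run in the same Node process: stop latches a flag that the spawn path checks before launching the next task, and the kill acts on whatever handle is current at that moment. `AutoRunSession` and `spawnTask` are hypothetical names, not Maestro's.

```javascript
// Hypothetical auto-run session. `spawnTask(task)` is assumed to return a
// handle exposing kill(); it is a stand-in for the real spawn logic.
class AutoRunSession {
  constructor(spawnTask) {
    this.spawnTask = spawnTask;
    this.current = null;
    this.stopped = false;
  }

  // Called by the auto-run engine when it wants to advance to the next task.
  startNext(task) {
    if (this.stopped) return null; // refuse to re-spawn once stop was requested
    this.current = this.spawnTask(task);
    return this.current;
  }

  // Called by the stop path. Latches the flag before killing, so a spawn
  // request that arrives afterwards cannot replace the process being killed.
  stop() {
    this.stopped = true;
    const target = this.current;
    this.current = null;
    if (target) target.kill();
    return target !== null;
  }
}
```

Since JavaScript runs each of these methods to completion, `stop()` cannot interleave with `startNext()`; the flag covers the window between the kill request and the confirmed exit.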
5. Silent error swallowing in escalation path
On macOS/Linux, the SIGTERM → SIGKILL escalation has no verification step. On Windows, taskkill errors are logged at debug level only (line ~260) and don't trigger further action.
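A verification step could be as small as a liveness probe that runs after the escalation. The sketch below uses Node's `process.kill(pid, 0)`, which sends no signal and only checks whether the pid exists; `isAlive` and `confirmDead` are hypothetical helper names. For a child that Maestro spawned itself, the child's exit event remains the more reliable confirmation, since pids can be reused; a probe like this is mainly useful when the handle is no longer available.

```javascript
// Signal 0 delivers nothing; it only asks the OS whether the pid can be signalled.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it. ESRCH: no such process.
    return err.code === "EPERM";
  }
}

// Hypothetical verification step: poll until the pid disappears or the deadline passes.
async function confirmDead(pid, timeoutMs = 2000, intervalMs = 50) {
  const deadline = Date.now() + timeoutMs;
  while (isAlive(pid)) {
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return true;
}
```

If `confirmDead` resolves to `false`, that result could be logged above debug level and used to keep the session marked as not stopped, instead of being dropped.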
Proposed Fix
- Process-group killing: Spawn agent processes with setsid / detached: true and kill via kill(-pgid, signal) so the entire process tree dies together.
- Synchronous kill confirmation: Don't report "stopped" to the UI until waitpid or a process exit event confirms death. Keep a "stopping…" state visible to the user.
- Direct main-process kill path: Add a process:forceKill IPC handler in the main process that bypasses the renderer entirely. The renderer may itself be blocked by a hung IPC call to the agent.
- Timeout + auto-escalate: If soft stop (SIGINT → SIGTERM) doesn't produce a confirmed exit within 5 seconds, automatically escalate to SIGKILL on the entire process group.
- UI fallback button: If the first stop attempt doesn't confirm within ~3 seconds, surface a "Force Kill" button that uses the direct main-process path.
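The process-group, confirmation, and escalation points above can be sketched together in Node. This is a rough illustration for macOS/Linux only, not Maestro's implementation: `spawnInOwnGroup`, `signalGroup`, and `stopGroup` are hypothetical names. It relies on `detached: true` making the child a process-group leader on POSIX, and on `process.kill` with a negative pid signalling the whole group. Node does not expose waitpid directly, so the child's exit event serves as the confirmation.

```javascript
const { spawn } = require("node:child_process");

// On POSIX, detached: true makes the child the leader of a new process group,
// so its pid doubles as the group id (pgid).
function spawnInOwnGroup(command, args) {
  return spawn(command, args, { detached: true, stdio: "ignore" });
}

// A negative pid targets every process in the group (POSIX only).
function signalGroup(child, signal) {
  try {
    process.kill(-child.pid, signal);
    return true;
  } catch (err) {
    if (err.code === "ESRCH") return false; // the group is already gone
    throw err;
  }
}

// Resolves only once the exit event confirms the group leader is dead.
// Escalates to SIGKILL on the whole group if the grace period runs out.
function stopGroup(child, graceMs = 5000) {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve({ code: child.exitCode, signal: child.signalCode, escalated: false });
      return;
    }
    let escalated = false;
    const timer = setTimeout(() => {
      escalated = true;
      signalGroup(child, "SIGKILL");
    }, graceMs);
    child.once("exit", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, escalated });
    });
    signalGroup(child, "SIGTERM");
  });
}
```

A main-process handler for the proposed process:forceKill channel could call something like `stopGroup(child, 0)` to skip the grace period, and the UI would only leave its "stopping…" state once the returned promise resolves.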
Environment
- macOS Darwin 25.4.0
- Maestro desktop app (Electron)
- Agent: Claude Code CLI