Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 19 additions & 2 deletions src/app/api/agents/[id]/run/route.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ import { db, schema } from '@/lib/db'
import { eq } from 'drizzle-orm'
import { runAgent, writeAgentSkillConfig } from '@/lib/claude'
import { resolveAgentCwd } from '@/lib/directories'
import { registerRun, unregisterRun } from '@/lib/run-registry'

export const runtime = 'nodejs'
// Allow long-running streams; default Vercel limit is short, but we're
Expand Down Expand Up @@ -72,6 +73,9 @@ export async function POST(
const encoder = new TextEncoder()
const collected: string[] = []
let exitCode: number | null = null
// Track whether the run was killed via the Stop endpoint so we can
// record status='cancelled' instead of 'failed' on a non-zero exit.
let cancelled = false

const stream = new ReadableStream({
async start(controller) {
Expand All @@ -94,6 +98,14 @@ export async function POST(
systemPrompt: agent.systemPrompt ?? '',
model: agent.model,
skills: agent.skills ?? [],
onChild: (child) => {
registerRun(run.id, child)
// Detect external SIGTERM (i.e. our Stop endpoint) so we
// can persist 'cancelled' instead of 'failed' on exit.
child.once('exit', (_code, signal) => {
if (signal === 'SIGTERM') cancelled = true
})
},
})) {
send(event as object)
collected.push(JSON.stringify(event))
Expand All @@ -112,10 +124,15 @@ export async function POST(
type: 'error',
message: err instanceof Error ? err.message : String(err),
})
} finally {
unregisterRun(run.id)
}

const status: 'completed' | 'failed' =
exitCode === 0 ? 'completed' : 'failed'
const status: 'completed' | 'failed' | 'cancelled' = cancelled
? 'cancelled'
: exitCode === 0
? 'completed'
: 'failed'
Comment on lines +131 to +135
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid reporting successful-looking exit codes for cancelled runs.

When status is cancelled, persisting/emitting a normal exit code is misleading in history and live output.

💡 Proposed fix
       const status: 'completed' | 'failed' | 'cancelled' = cancelled
         ? 'cancelled'
         : exitCode === 0
           ? 'completed'
           : 'failed'
+      const effectiveExitCode = status === 'cancelled' ? null : exitCode

       try {
         await db
           .update(schema.runs)
           .set({
             output: collected.join('\n'),
             status,
-            exitCode: exitCode ?? -1,
+            exitCode: effectiveExitCode ?? -1,
             endedAt: new Date(),
           })
           .where(eq(schema.runs.id, run.id))
       } catch {
         // best-effort persistence; ignore.
       }

-      send({ type: 'run:end', runId: run.id, status, exitCode })
+      send({ type: 'run:end', runId: run.id, status, exitCode: effectiveExitCode })

Also applies to: 140-143, 150-150

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/api/agents/`[id]/run/route.ts around lines 131 - 135, The code sets
status and implicitly treats exitCode === 0 as success even when cancelled,
which misleads history/live output; update the logic around the status and
exitCode handling (the variables status, cancelled, exitCode and the places
where you persist/emit run results) so that when cancelled is true you set
status = 'cancelled' and avoid emitting or persisting a success exit code (e.g.,
set exitCode to null/undefined or a dedicated cancelled marker and ensure
emit/persist code checks cancelled before using exitCode); apply the same change
to the other occurrences that compute status/emit results using cancelled and
exitCode.

try {
await db
.update(schema.runs)
Expand Down
24 changes: 24 additions & 0 deletions src/app/api/runs/[id]/stop/route.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
import { NextResponse } from 'next/server'
import { killRun, isRunning } from '@/lib/run-registry'

export const runtime = 'nodejs'

/**
* POST /api/runs/[id]/stop
*
* SIGTERM the run's child process group if it's still alive. The
* run-route's SSE loop sees the child exit, persists `status='cancelled'`
* (detected via the SIGTERM signal), and closes the stream. Idempotent:
* stopping an already-finished run returns 200 with stopped:false.
*/
export async function POST(
_req: Request,
{ params }: { params: Promise<{ id: string }> },
) {
const { id } = await params
if (!isRunning(id)) {
return NextResponse.json({ ok: true, stopped: false })
}
const killed = killRun(id)
return NextResponse.json({ ok: true, stopped: killed })
}
9 changes: 7 additions & 2 deletions src/app/workspace/[id]/page.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -105,9 +105,14 @@ export default async function WorkspacePage({
<h2 className="text-sm font-semibold uppercase tracking-[0.18em] text-neutral-500">
Agents
</h2>
<div className="mt-3 flex flex-col gap-4">
<p className="mt-1 max-w-2xl text-xs text-neutral-500">
Each agent runs independently. Hit Run on more than one to
watch them work in parallel; Stop sends SIGTERM to that
agent&apos;s child without touching the others.
</p>
<div className="mt-3 grid grid-cols-1 gap-4 xl:grid-cols-2">
{agents.length === 0 && (
<div className="rounded-lg border border-dashed border-neutral-800 p-6 text-sm text-neutral-500">
<div className="rounded-lg border border-dashed border-neutral-800 p-6 text-sm text-neutral-500 xl:col-span-2">
No agents in this workspace yet. Create one below.
</div>
)}
Expand Down
60 changes: 51 additions & 9 deletions src/components/AgentPanel.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -30,15 +30,30 @@ export function AgentPanel({
const [prompt, setPrompt] = useState('')
const [events, setEvents] = useState<StreamEvent[]>([])
const [running, setRunning] = useState(false)
// RunId becomes known after the server emits its run:start frame.
// The Stop button needs this to address the right child process.
const [runId, setRunId] = useState<string | null>(null)
// Bumped each time a live run ends, so RunHistory re-fetches the
// newly persisted row without a full page reload.
const [historyVersion, setHistoryVersion] = useState(0)
const outputRef = useRef<HTMLDivElement>(null)

async function stop() {
if (!runId) return
try {
await fetch(`/api/runs/${runId}/stop`, { method: 'POST' })
} catch {
// Server might be gone or restarted. The run-end frame will
// never arrive in that case; the UI just stays in "running"
// until the user reloads. Not worth more handling for v1.
}
}

async function run() {
if (!prompt.trim() || running) return
setRunning(true)
setEvents([])
setRunId(null)
try {
const res = await fetch(`/api/agents/${agent.id}/run`, {
method: 'POST',
Expand Down Expand Up @@ -70,6 +85,14 @@ export function AgentPanel({
if (!dataLine) continue
try {
const event = JSON.parse(dataLine.slice('data: '.length))
if (
event &&
typeof event === 'object' &&
event.type === 'run:start' &&
typeof event.runId === 'string'
) {
setRunId(event.runId)
}
setEvents((prev) => [...prev, event])
// Auto-scroll to bottom of output
requestAnimationFrame(() => {
Expand All @@ -92,6 +115,7 @@ export function AgentPanel({
])
} finally {
setRunning(false)
setRunId(null)
setHistoryVersion((v) => v + 1)
}
}
Expand Down Expand Up @@ -159,14 +183,26 @@ export function AgentPanel({
disabled={running}
className="rounded border border-neutral-800 bg-neutral-950 px-3 py-2 text-sm focus:border-neutral-600 focus:outline-none disabled:opacity-50"
/>
<div className="flex justify-between">
<button
onClick={run}
disabled={running || !prompt.trim()}
className="rounded bg-amber-500 px-4 py-2 text-sm font-medium text-neutral-950 transition hover:bg-amber-400 disabled:cursor-not-allowed disabled:opacity-50"
>
{running ? 'running...' : 'Run'}
</button>
<div className="flex items-center justify-between gap-2">
<div className="flex items-center gap-2">
<button
onClick={run}
disabled={running || !prompt.trim()}
className="rounded bg-amber-500 px-4 py-2 text-sm font-medium text-neutral-950 transition hover:bg-amber-400 disabled:cursor-not-allowed disabled:opacity-50"
>
{running ? 'running...' : 'Run'}
</button>
{running && (
<button
onClick={stop}
disabled={!runId}
className="rounded border border-red-900 px-3 py-2 text-sm font-medium text-red-400 transition hover:bg-red-950 disabled:cursor-not-allowed disabled:opacity-50"
title={runId ? 'send SIGTERM to the running child' : 'waiting for runId'}
>
Stop
</button>
)}
</div>
{events.length > 0 && (
<button
onClick={() => setEvents([])}
Expand Down Expand Up @@ -206,8 +242,14 @@ function EventLine({ event }: { event: StreamEvent }) {
status: string
exitCode: number | null
}
const color =
e.status === 'completed'
? 'text-green-500'
: e.status === 'cancelled'
? 'text-amber-400'
: 'text-red-400'
return (
<div className={e.status === 'completed' ? 'text-green-500' : 'text-red-400'}>
<div className={color}>
▸ run {e.status} (exit {e.exitCode ?? '?'})
</div>
)
Expand Down
2 changes: 1 addition & 1 deletion src/components/RunHistory.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ function RunRow({
const status = run.status
const exit = run.exitCode
const statusColor =
status === 'running'
status === 'running' || status === 'cancelled'
? 'text-amber-400'
: exit === 0
? 'text-green-500'
Expand Down
13 changes: 13 additions & 0 deletions src/lib/claude.ts
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,13 @@ export type AgentRunOptions = {
systemPrompt?: string
model?: string | null
skills?: string[]
/**
* Called once with the freshly spawned child process. The caller can
* use this to register the process in a kill-from-elsewhere registry
* (so a Stop button on a parallel pane can SIGTERM this run). The
* child is unspecified if spawn itself fails before this callback.
*/
onChild?: (child: ChildProcessWithoutNullStreams) => void
}

/**
Expand Down Expand Up @@ -93,6 +100,11 @@ export async function* runAgent(opts: AgentRunOptions) {
child = spawn('claude', args, {
cwd: opts.cwd,
env: { ...process.env },
// detached:true puts the child in its own process group. SIGTERM
// sent to -pid then kills the whole group, including any tool
// subprocesses claude spawned. We still wait for the child in the
// current process, so the OS treats this as a regular child.
detached: true,
})
} catch (err) {
yield {
Expand All @@ -102,6 +114,7 @@ export async function* runAgent(opts: AgentRunOptions) {
yield { type: 'exit', code: 1 }
return
}
if (opts.onChild) opts.onChild(child)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard onChild callback failures to prevent leaked child processes.

If Line 117 throws, the stream setup aborts but the spawned process can keep running unmanaged.

💡 Proposed fix
-  if (opts.onChild) opts.onChild(child)
+  if (opts.onChild) {
+    try {
+      opts.onChild(child)
+    } catch (err) {
+      try {
+        child.kill('SIGTERM')
+      } catch {
+        // ignore best-effort cleanup failure
+      }
+      yield {
+        type: 'error',
+        message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,
+      }
+      yield { type: 'exit', code: 1 }
+      return
+    }
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (opts.onChild) opts.onChild(child)
if (opts.onChild) {
try {
opts.onChild(child)
} catch (err) {
try {
child.kill('SIGTERM')
} catch {
// ignore best-effort cleanup failure
}
yield {
type: 'error',
message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,
}
yield { type: 'exit', code: 1 }
return
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/claude.ts` at line 117, The call to opts.onChild(child) is unguarded
and if that callback throws the spawned child process can be left running; wrap
the invocation of opts.onChild in a try/catch around the site where the child is
created (the code that calls opts.onChild(child) in src/lib/claude.ts) and on
error ensure you clean up the child (e.g.,
child.kill()/child.removeAllListeners() or call the same shutdown path used
elsewhere) and rethrow or handle the error; this keeps the child from leaking
while preserving the original error behavior.


// Forward stdout as JSONL events. Buffer partial lines across
// chunks so we never yield half a JSON object.
Expand Down
67 changes: 67 additions & 0 deletions src/lib/run-registry.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
import type { ChildProcessWithoutNullStreams } from 'node:child_process'

/**
* Process-local registry mapping runId -> the child process spawned
* for that run. Used by the Stop endpoint to find and SIGTERM a run
* that was started in a different request.
*
* In-memory only, single process. A `next start` worker restart drops
* the map and any leftover process becomes an orphan (cleaned up by
* the OS or by exiting on its own). Good enough for a local single-
* user app; revisit if Argus ever runs multi-process.
*
* Cached on globalThis so Next.js dev's module hot reload doesn't
* silently fork the map.
*/

declare global {
// eslint-disable-next-line no-var
var __argus_run_registry: Map<string, ChildProcessWithoutNullStreams> | undefined
}

const registry: Map<string, ChildProcessWithoutNullStreams> =
globalThis.__argus_run_registry ??
(globalThis.__argus_run_registry = new Map())

export function registerRun(
runId: string,
child: ChildProcessWithoutNullStreams,
): void {
registry.set(runId, child)
// Auto-clean when the process exits, so the map doesn't grow forever
// and so a Stop request after a natural exit just no-ops cleanly.
child.once('exit', () => {
if (registry.get(runId) === child) registry.delete(runId)
})
}

export function unregisterRun(runId: string): void {
registry.delete(runId)
}

export function isRunning(runId: string): boolean {
return registry.has(runId)
}

/**
* SIGTERM the run's process group. Returns true if a process was
* found and signaled, false if there's nothing to kill (already
* exited, never registered, etc).
*
* Targets `-pid` (negative) so the signal hits the whole process group
* the child was placed into (see detached:true on spawn). That kills
* any tool subprocesses claude itself started.
*/
export function killRun(runId: string): boolean {
const child = registry.get(runId)
if (!child || child.pid == null) return false
try {
process.kill(-child.pid, 'SIGTERM')
return true
Comment on lines +55 to +60
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Node.js, does process.kill() support negative PIDs (process group signaling) on Windows, and what is the recommended cross-platform approach to terminate a child process tree?

💡 Result:

No. On Windows, Node’s process.kill does not support killing a “process group” via a negative PID; the Node docs state that Windows will throw an error if the pid is used to kill a process group [1]. Recommended cross-platform approach to terminate a child process tree: - Don’t rely on negative-PID / process-group signaling for tree termination; it’s Unix-oriented and not supported on Windows by Node [1]. - Instead, track the spawned child and explicitly terminate descendants in a platform-appropriate way (e.g., enumerate children and kill them recursively, or use a cross-platform library that does this). For example, libraries like node-tree-kill/execa’s “tree kill” approach exist specifically because Windows and Unix have very different process/tree and signaling semantics [2]. - If you control the child process, implement cooperative shutdown (listen for a shutdown message/signal and exit), and/or use an out-of-band watchdog so the parent/manager can kill the entire subtree when the parent is terminated [3]. If you want a practical pattern in Node: spawn with child_process (optionally {detached: true} only if you understand the lifecycle implications), then perform explicit recursive cleanup rather than process.kill(-pid) on Windows; use a maintained process-tree kill utility so behavior is consistent across OSes [2][4].

Citations:


Fix Windows process termination: avoid negative-PID process-group kill in killRun

File: src/lib/run-registry.ts (lines 55-60)

export function killRun(runId: string): boolean {
  const child = registry.get(runId)
  if (!child || child.pid == null) return false
  try {
    process.kill(-child.pid, 'SIGTERM')
    return true

process.kill(-child.pid, ...) relies on Unix process-group semantics; Node does not support this “negative PID” process-group signaling on Windows (it will error / not behave as intended). Since this function catches errors and returns false, Stop can report stopped: false while the run is still alive.

 export function killRun(runId: string): boolean {
   const child = registry.get(runId)
   if (!child || child.pid == null) return false
   try {
-    process.kill(-child.pid, 'SIGTERM')
+    if (process.platform === 'win32') {
+      // Best-effort fallback: terminate the child process itself.
+      // (Process-group kill via negative PID is POSIX-specific.)
+      process.kill(child.pid, 'SIGTERM')
+    } else {
+      process.kill(-child.pid, 'SIGTERM')
+    }
     return true
   } catch {

This fallback avoids the Windows negative-PID issue but only targets the parent process; if the goal is to stop the whole process tree consistently across OSes, implement cross-platform descendant termination (e.g., recursive child enumeration or a maintained “tree kill” utility) instead of relying on process-group signaling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/run-registry.ts` around lines 55 - 60, The current killRun function
uses process.kill(-child.pid, ...) which fails on Windows; update killRun (and
its use of registry, child.pid, process.kill) to branch by platform: on POSIX
keep sending signal to the negative pid to target the process group, but on
Windows call process.kill(child.pid, 'SIGTERM') and then implement or call a
cross-platform descendant-termination routine (e.g., a recursive
child-enumeration/tree-kill utility) to ensure the whole subprocess tree is
terminated; ensure errors are propagated/logged rather than silently swallowed
so callers can know if stopping actually failed.

} catch {
// Process already gone or signal not permitted; either way, drop
// it from the map so we don't try again.
registry.delete(runId)
return false
}
}
Loading