feat(agent): runner wire contract and tool execution by mmabrouk · Pull Request #4773 · Agenta-AI/agenta

mmabrouk · 2026-06-19T16:29:01Z

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is main.

feat(sdk): agent runtime behind backend/harness ports #4771: SDK agent runtime: ports, adapters, tools, messages protocol
- feat(agent): agent workflow service and tool-resolution API #4772: Agent service + tool-resolution API
feat(agent): runner wire contract and tool execution #4773: Runner wire contract + tool execution <- you are here
- feat(agent): runner engines, server, and tracing #4774: Runner engines, server, tracing
feat(frontend): agent config playground controls #4775: Playground agent config UI
chore(hosting): wire the agent runner sidecar into compose #4776: Hosting compose wiring
docs(agent): agent-workflows design and ground truth #4777: Docs: design + ground truth

Context

The agent service drives an LLM harness (Pi in process, or Pi/Claude over ACP) for one turn. The harness needs two things from Agenta: a payload shape it can deserialize, and a way to run the tools the agent calls. This PR is the lower half of the TypeScript runner. It defines the /run wire contract and the tool executors, with nothing that drives a model yet. It branches off main and is independent; the runner-engine PR stacks on top of it and imports these executors. This is a functional slice that shows the final code, not the path the code took to get there.

What this changes

The PR adds a new services/agent TypeScript package with two parts.

protocol.ts is the /run wire contract shared by both engines: AgentRunRequest, AgentRunResult, ResolvedToolSpec, HarnessCapabilities, the AgentEvent union, the StreamRecord NDJSON line, and the helpers resolvePromptText and resolveRunSessionId. The Python side mirrors these names in sdks/python/agenta/sdk/agents/utils/wire.py, and shared golden fixtures pin the two together. Defining the types here, rather than in one engine that the other imports from, is what lets the Pi and rivet engines stay peers.

The tools/ folder executes a tool the backend already resolved. Before, each delivery path carried its own copy of "look at the spec, decide how to run it." Now dispatch.ts owns that decision once and branches on spec.kind: a code tool runs in a sandbox subprocess, a callback tool (the default) POSTs back to Agenta's /tools/call, and a client tool throws because the browser fulfills it across a turn boundary. The call sites keep their own result-wrapping shape; only the execution itself is shared. relay.ts adds the Daytona file relay, used when the in-sandbox process cannot reach a private Agenta backend. mcp-bridge.ts and mcp-server.ts expose the same resolved specs as a stdio MCP server for harnesses driven over ACP, which only accept tools through MCP.

Key architectural decision to review

Two decisions carry the weight.

The dispatch-by-kind seam in tools/dispatch.ts. runResolvedTool is the single place that turns a ResolvedToolSpec into a result, and the three executor kinds are orthogonal axes on the spec, not subclasses. The tradeoff: every caller funnels through this one function, so an executor bug surfaces everywhere at once, but a change to how a kind runs is a one-line edit instead of three. Check that the default (absent kind) stays callback for back-compat, and that the client branch throwing is what the call sites expect.

The subprocess env in tools/code.ts. A code tool runs author-supplied snippet code, so buildChildEnv is the security boundary. The child inherits only BASE_ENV_ALLOWLIST (PATH, HOME, locale, temp) plus the tool's own scoped secrets. It does not inherit process.env, so the provider keys the in-process Pi path writes there (OPENAI_API_KEY and friends), AGENTA_* config, and COMPOSIO_* / DAYTONA_* never reach the snippet. Scrutinize the allowlist itself: anything added to it leaks to every code tool. The leak test in test/code-tool.test.ts is the guard.

How to review this PR

Read src/protocol.ts first. It is the contract everything else speaks. Read the doc comments on AgentRunRequest and ResolvedToolSpec; they explain each field's owner and lifetime.
Then tools/dispatch.ts for the seam, tools/code.ts for the sandbox env, tools/callback.ts for the /tools/call envelope.
Then mcp-bridge.ts (which specs justify attaching the server) and mcp-server.ts (the JSON-RPC stdio loop).
Skip pnpm-lock.yaml (3712 of the lines, generated) and the tsconfig / .dockerignore scaffolding.

Likely regression: the env allowlist. A future hand that adds a variable here to "fix" a code tool would punch a hole in the isolation boundary. The other watch point is the callback default: a spec that arrives with no kind must still route to /tools/call, not get dropped.

Tests / notes

test/ holds four assertion scripts run with tsx, no test framework. code-tool.test.ts exercises both runtimes through real subprocesses, including the node bare-function main case and the env-isolation guarantee (provider key absent, scoped secret present). tool-dispatch.test.ts covers the kind branching, tool-bridge.test.ts the MCP round-trip, and mcp-servers.test.ts the attach decision. The engines that consume runResolvedTool and buildToolMcpServers land in the stacked runner-engine PR, so the wiring is exercised there.

vercel · 2026-06-19T16:29:09Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 19, 2026 4:29pm

coderabbitai · 2026-06-19T16:29:24Z

📝 Walkthrough

Walkthrough

Introduces services/agent as a new private ESM TypeScript package (agenta-agent-pi-wrapper) containing a shared wire-protocol contract (protocol.ts), three tool execution backends (HTTP callback, local code subprocess, filesystem relay for Daytona), an MCP stdio bridge and JSON-RPC server, and integration/unit tests for each component.

Changes

Agent Pi Wrapper Package

Layer / File(s)	Summary
Project scaffolding `.gitignore`, `services/agent/.dockerignore`, `services/agent/package.json`, `services/agent/tsconfig.json`	Adds the `agenta-agent-pi-wrapper` package manifest with Pi/ACP/OTel dependencies and scripts, a strict ESNext TypeScript config, a `.gitignore` rule for the built `dist/` bundle, and a `.dockerignore` excluding `node_modules`/logs/`.git` from Docker build context.
Shared wire-protocol types and helpers `services/agent/src/protocol.ts`	Defines all shared TypeScript contracts: message/content primitives (`ContentBlock`, `ChatMessage`, `TraceContext`), tool/harness configuration (`ResolvedToolSpec`, `ToolCallbackContext`, `McpServerConfig`, `HarnessCapabilities`), the `AgentEvent` discriminated union, run payload contracts (`AgentRunRequest`, `AgentRunResult`, `StreamRecord`), and helper functions `messageText`, `resolvePromptText`, and `resolveRunSessionId`.
Tool callback HTTP transport `services/agent/src/tools/callback.ts`	Implements `callAgentaTool()` to POST tool-call envelopes to an Agenta `/tools/call` endpoint with configurable timeout, abort signal merging, HTTP error propagation, and JSON content extraction.
Code tool subprocess executor `services/agent/src/tools/code.ts`, `services/agent/test/code-tool.test.ts`	Implements `runCodeTool()`, which writes a snippet to a temp directory, spawns Python or Node with runtime-specific bootstrap scripts, enforces `CODE_TOOL_TIMEOUT_MS` via SIGKILL, and communicates via stdin/stdout JSON. Tests exercise multiple entrypoint patterns, env isolation, and error rejection.
Filesystem relay for Daytona/sandbox isolation `services/agent/src/tools/relay.ts`	Implements `startToolRelay()`, a runner-side polling loop that reads `.req.json` files from a relay directory, dispatches `callAgentaTool()`, and writes `.res.json` responses back, enabling tool calls across network-isolated sandboxes.
Tool dispatch routing `services/agent/src/tools/dispatch.ts`, `services/agent/test/tool-dispatch.test.ts`	Implements `runResolvedTool()` branching by `spec.kind`: `code` → `runCodeTool`, `client` → throw, `callback` → `relayToolCall` (filesystem) or `callAgentaTool` (HTTP). Tests validate routing and end-to-end code execution.
MCP stdio bridge and server `services/agent/src/tools/mcp-bridge.ts`, `services/agent/src/tools/mcp-server.ts`, `services/agent/test/tool-bridge.test.ts`, `services/agent/test/mcp-servers.test.ts`	`mcp-bridge.ts` builds `McpServerStdio` entries via `buildToolMcpServers()`, filtering client tools and wiring `AGENTA_TOOL_SPECS`/endpoint env vars. `mcp-server.ts` is a minimal JSON-RPC 2.0 stdio server handling `initialize`, `tools/list`, and `tools/call` via `runResolvedTool`. Tests validate bridge attachment logic and MCP→ACP env conversion.

Sequence Diagram(s)

sequenceDiagram
  participant AcpAgent as ACP Agent
  participant McpBridge as mcp-bridge.ts
  participant McpServer as mcp-server.ts
  participant Dispatch as runResolvedTool
  participant CodeExec as runCodeTool
  participant CallbackHTTP as callAgentaTool
  participant RelayFS as startToolRelay

  rect rgba(100, 149, 237, 0.5)
    note over AcpAgent,McpBridge: Bridge setup at harness start
    AcpAgent->>McpBridge: buildToolMcpServers(specs, callback)
    McpBridge-->>AcpAgent: McpServerStdio config (command, args, env)
    AcpAgent->>McpServer: spawn via stdio (AGENTA_TOOL_SPECS in env)
  end

  rect rgba(60, 179, 113, 0.5)
    note over AcpAgent,Dispatch: Per tool-call execution
    AcpAgent->>McpServer: tools/call {name, arguments} (JSON-RPC stdin)
    McpServer->>Dispatch: runResolvedTool(spec, params, opts)
    alt spec.kind == "code"
      Dispatch->>CodeExec: runCodeTool(runtime, code, env, args)
      CodeExec-->>Dispatch: stdout JSON string
    else spec.kind == "callback" and relayDir set
      Dispatch->>RelayFS: write *.req.json, poll for *.res.json
      RelayFS->>CallbackHTTP: callAgentaTool(endpoint, auth, args)
      CallbackHTTP-->>RelayFS: result string
      RelayFS-->>Dispatch: result string
    else spec.kind == "callback" direct
      Dispatch->>CallbackHTTP: callAgentaTool(endpoint, auth, args)
      CallbackHTTP-->>Dispatch: result string
    end
    Dispatch-->>McpServer: result string
    McpServer-->>AcpAgent: JSON-RPC success response (stdout)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(agent): runner wire contract and tool execution' accurately and specifically summarizes the main changes: introducing wire protocol contracts and tool execution capabilities for the agent service.
Docstring Coverage	✅ Passed	Docstring coverage is 76.47% which is sufficient. The required threshold is 60.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description comprehensively explains the changes, context, architecture decisions, and review guidance related to the agent runner wire contract and tool execution.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/agent-runner-tools

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

mmabrouk · 2026-06-19T16:33:34Z

Reviewer guide: interesting code

Start here to find the load-bearing code fast.

services/agent/src/protocol.ts:64 — ResolvedToolSpec: the three orthogonal axes (executor kind, needsApproval, render) and why absent kind means callback.
services/agent/src/protocol.ts:293 — resolveRunSessionId: prefer the platform conversation id over the harness's ephemeral id.
services/agent/src/tools/dispatch.ts:104 — runResolvedTool: the one place that branches on kind (code runs local, client throws, callback routes back or relays).
services/agent/src/tools/code.ts:82 — BASE_ENV_ALLOWLIST and buildChildEnv: the deny-by-default boundary that keeps provider keys and AGENTA_* out of an author's snippet.
services/agent/src/tools/callback.ts:32 — callAgentaTool: the /tools/call envelope and response parse, kept in one place so the contract changes once.
services/agent/src/tools/relay.ts:51 — startToolRelay: the Daytona file relay for when the in-sandbox process cannot reach Agenta directly.

mmabrouk · 2026-06-19T16:33:44Z

+// excludes everything secret-bearing or sidecar-specific: no AGENTA_*, no *_API_KEY /
+// *_TOKEN, no COMPOSIO_* / DAYTONA_*, no provider keys that the in-process Pi path writes
+// into process.env before a run. Only the tool's declared scoped `env` is layered on top.
+const BASE_ENV_ALLOWLIST = [


Security boundary to scrutinize: this is a deny-by-default allowlist for the code-tool subprocess. The child never inherits the sidecar process.env, so provider keys, AGENTA_* config, and COMPOSIO_/DAYTONA_ stay out of an author-supplied snippet. Only the tool's own resolved secrets are layered on top. Check that no entry here can carry a secret, and note the tradeoff: a runtime needing an env var not listed will fail to start.

mmabrouk · 2026-06-19T16:33:55Z

+  params: unknown,
+  opts: RunResolvedToolOpts,
+): Promise<string> {
+  if (spec.kind === "code") {


Dispatch-by-kind: this is the single home for 'branch on spec.kind to run a resolved tool', previously duplicated across the Pi engine, the Pi-under-rivet extension, and the MCP server. Note the default: a spec with no kind falls through to the callback path (preserving the older gateway-only spec). Worth confirming a malformed spec that loses its kind routing to callback is intended, and that the callback path rejects what it cannot fulfil rather than failing silently.

mmabrouk · 2026-06-19T16:33:56Z

+}
+
+/** Prefer the platform conversation id, falling back to the harness's ephemeral id. */
+export function resolveRunSessionId(request: AgentRunRequest, fallback: string): string {


resolveRunSessionId prefers the platform conversation id over the harness's ephemeral id. This is what keeps a session stable across turns when the harness mints its own throwaway id. Check the fallback: an empty/whitespace sessionId must fall back, not pin the run to a blank id.

mmabrouk · 2026-06-19T16:33:57Z

+      let stdout = "";
+      let stderr = "";
+      let settled = false;
+      const finish = (fn: () => void) => {


Subprocess lifecycle is the likely regression surface. The 'settled' guard plus finish() must resolve/reject exactly once across the timeout, abort, error, and close paths, and the temp dir is removed in finally. Worth a close read that no path can settle twice or leak the snippet dir.

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

services/agent/test/tool-dispatch.test.ts (1)

64-83: ⚡ Quick win

Add callback/relay branch coverage for runResolvedTool

This file currently verifies only code and client branches. Adding tests for callback direct-call and relay paths would protect the new dispatch contract from regressions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e248ef9d-0540-48ef-b446-fb03d48acbe9

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and 2f467ce.

⛔ Files ignored due to path filters (1)

services/agent/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (15)

.gitignore
services/agent/.dockerignore
services/agent/package.json
services/agent/src/protocol.ts
services/agent/src/tools/callback.ts
services/agent/src/tools/code.ts
services/agent/src/tools/dispatch.ts
services/agent/src/tools/mcp-bridge.ts
services/agent/src/tools/mcp-server.ts
services/agent/src/tools/relay.ts
services/agent/test/code-tool.test.ts
services/agent/test/mcp-servers.test.ts
services/agent/test/tool-bridge.test.ts
services/agent/test/tool-dispatch.test.ts
services/agent/tsconfig.json

coderabbitai · 2026-06-19T16:37:48Z

+  const timeoutSignal = AbortSignal.timeout(TOOL_CALL_TIMEOUT_MS);
+  const anyOf = (AbortSignal as any).any;
+  const combined =
+    signal && typeof anyOf === "function" ? anyOf([signal, timeoutSignal]) : timeoutSignal;
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Declared Node/runtime constraints ==" if [ -f services/agent/package.json ]; then jq '{engines, packageManager, volta}' services/agent/package.json else echo "services/agent/package.json not found" fi echo echo "== CI / container node version hints ==" rg -n --iglob '*.yml' --iglob '*.yaml' --iglob 'Dockerfile*' \ 'node-version|setup-node|FROM node|NODE_VERSION' echo echo "== AbortSignal API usage in repo ==" rg -n 'AbortSignal\.any|AbortSignal\.timeout'

Repository: Agenta-AI/agenta

Length of output: 2201

🏁 Script executed:

#!/bin/bash echo "== .nvmrc or .tool-versions hints ==" find . -maxdepth 3 -name ".nvmrc" -o -name ".tool-versions" | head -20 echo echo "== services/agent directory structure ==" ls -la services/agent/ | head -20 echo echo "== Check if services/agent has its own .nvmrc ==" if [ -f services/agent/.nvmrc ]; then cat services/agent/.nvmrc else echo "No .nvmrc found in services/agent" fi echo echo "== Full services/agent/package.json (snippet) ==" head -30 services/agent/package.json

Repository: Agenta-AI/agenta

Length of output: 1839

Caller abort is dropped when AbortSignal.any is unavailable.

At lines 44–46, if AbortSignal.any is missing, the function only uses timeoutSignal, so external cancellation (signal) never propagates. While all current deployments target Node 20+, the services/agent/package.json lacks an explicit engines constraint, leaving this vulnerability open if the code is ever run on an older Node version.

Add an explicit minimum Node version to engines in services/agent/package.json (Node 20 or higher), or implement the fallback merge logic to handle runtimes lacking AbortSignal.any:

💡 Suggested fix (if engines constraint is not added)

const timeoutSignal = AbortSignal.timeout(TOOL_CALL_TIMEOUT_MS); const anyOf = (AbortSignal as any).any; - const combined = - signal && typeof anyOf === "function" ? anyOf([signal, timeoutSignal]) : timeoutSignal; + let combined: AbortSignal = timeoutSignal; + if (signal) { + if (typeof anyOf === "function") { + combined = anyOf([signal, timeoutSignal]); + } else { + const ctrl = new AbortController(); + const abort = () => ctrl.abort(); + timeoutSignal.addEventListener("abort", abort, { once: true }); + signal.addEventListener("abort", abort, { once: true }); + combined = ctrl.signal; + } + }

coderabbitai · 2026-06-19T16:37:48Z

+export async function relayToolCall(
+  dir: string,
+  callRef: string,
+  toolCallId: string,
+  params: unknown,
+  signal?: AbortSignal,
+): Promise<string> {
+  const id = sanitizeRelayId(toolCallId);
+  const reqPath = `${dir}/${id}${RELAY_REQ_SUFFIX}`;
+  const resPath = `${dir}/${id}${RELAY_RES_SUFFIX}`;
+  try {
+    mkdirSync(dir, { recursive: true });
+  } catch {
+    // The runner also creates it; a race here is harmless.
+  }
+  writeFileSync(reqPath, JSON.stringify({ callRef, toolCallId, args: params ?? {} }), "utf-8");
+
+  const deadline = Date.now() + RELAY_TIMEOUT_MS;
+  while (Date.now() < deadline) {
+    if (signal?.aborted) throw new Error("aborted");
+    if (existsSync(resPath)) {
+      const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
+      try {
+        unlinkSync(reqPath);
+      } catch {
+        /* best-effort cleanup */
+      }
+      try {
+        unlinkSync(resPath);
+      } catch {
+        /* best-effort cleanup */
+      }
+      if (res.ok) return res.text ?? "";
+      throw new Error(res.error || `tool relay failed for ${callRef}`);
+    }
+    await sleep(RELAY_POLL_MS);
+  }
+  throw new Error(`tool relay timed out for ${callRef}`);
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Aborted or timed-out relay calls can still execute later

On Line 70 and Line 92, relayToolCall leaves the request file behind when aborted/timed out. The runner loop in services/agent/src/tools/relay.ts will still pick it up and execute the external tool call, creating side effects after cancellation.

Suggested fix

export async function relayToolCall( dir: string, callRef: string, toolCallId: string, params: unknown, signal?: AbortSignal, ): Promise<string> { const id = sanitizeRelayId(toolCallId); const reqPath = `${dir}/${id}${RELAY_REQ_SUFFIX}`; const resPath = `${dir}/${id}${RELAY_RES_SUFFIX}`; + if (signal?.aborted) throw new Error("aborted"); try { mkdirSync(dir, { recursive: true }); } catch { // The runner also creates it; a race here is harmless. } writeFileSync(reqPath, JSON.stringify({ callRef, toolCallId, args: params ?? {} }), "utf-8"); - const deadline = Date.now() + RELAY_TIMEOUT_MS; - while (Date.now() < deadline) { - if (signal?.aborted) throw new Error("aborted"); - if (existsSync(resPath)) { - const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse; - try { - unlinkSync(reqPath); - } catch { - /* best-effort cleanup */ - } - try { - unlinkSync(resPath); - } catch { - /* best-effort cleanup */ - } - if (res.ok) return res.text ?? ""; - throw new Error(res.error || `tool relay failed for ${callRef}`); + try { + const deadline = Date.now() + RELAY_TIMEOUT_MS; + while (Date.now() < deadline) { + if (signal?.aborted) throw new Error("aborted"); + if (existsSync(resPath)) { + const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse; + if (res.ok) return res.text ?? ""; + throw new Error(res.error || `tool relay failed for ${callRef}`); + } + await sleep(RELAY_POLL_MS); } - await sleep(RELAY_POLL_MS); + throw new Error(`tool relay timed out for ${callRef}`); + } finally { + try { unlinkSync(reqPath); } catch {} + try { unlinkSync(resPath); } catch {} } - throw new Error(`tool relay timed out for ${callRef}`); }

coderabbitai · 2026-06-19T16:37:48Z

+  // callback (default): route back to Agenta's /tools/call (directly or via the Daytona relay).
+  if (opts.relayDir) {
+    return relayToolCall(opts.relayDir, spec.callRef ?? "", opts.toolCallId, params, opts.signal);
+  }
+  return callAgentaTool(
+    opts.endpoint ?? "",
+    opts.authorization,
+    spec.callRef ?? "",
+    opts.toolCallId,
+    params,
+    opts.signal,
+  );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate callback prerequisites before dispatching

On Line 122 and Line 124, falling back to "" for endpoint/callRef turns contract mistakes into opaque runtime failures. Fail fast with explicit validation before relay/HTTP dispatch.

Suggested fix

export async function runResolvedTool( spec: ResolvedToolSpec, params: unknown, opts: RunResolvedToolOpts, ): Promise<string> { if (spec.kind === "code") { return runCodeTool(spec.runtime, spec.code ?? "", spec.env, params, opts.signal); } if (spec.kind === "client") { throw new Error( `client tool '${spec.name}' is browser-fulfilled and cannot be executed in-sandbox`, ); } + if (!spec.callRef?.trim()) { + throw new Error(`callback tool '${spec.name}' is missing callRef`); + } // callback (default): route back to Agenta's /tools/call (directly or via the Daytona relay). if (opts.relayDir) { - return relayToolCall(opts.relayDir, spec.callRef ?? "", opts.toolCallId, params, opts.signal); + return relayToolCall(opts.relayDir, spec.callRef, opts.toolCallId, params, opts.signal); } + if (!opts.endpoint?.trim()) { + throw new Error(`callback tool '${spec.name}' requires a callback endpoint`); + } return callAgentaTool( - opts.endpoint ?? "", + opts.endpoint, opts.authorization, - spec.callRef ?? "", + spec.callRef, opts.toolCallId, params, opts.signal, ); }

coderabbitai · 2026-06-19T16:37:48Z

+import { EMPTY_OBJECT_SCHEMA } from "./callback.ts";
+import { runResolvedTool } from "./dispatch.ts";
+
+const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_SPECS ?? "[]");


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard AGENTA_TOOL_SPECS parsing to prevent hard startup crashes.

Line 25 parses env JSON at module load without error handling. A malformed value crashes the bridge before it can even reply to initialize. Default to [] with a stderr warning instead.

Suggested fix

-const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_SPECS ?? "[]"); +function parseSpecs(raw: string | undefined): ResolvedToolSpec[] { + try { + const parsed = JSON.parse(raw ?? "[]"); + return Array.isArray(parsed) ? (parsed as ResolvedToolSpec[]) : []; + } catch { + process.stderr.write("[tool-bridge] invalid AGENTA_TOOL_SPECS JSON; defaulting to []\n"); + return []; + } +} + +const SPECS: ResolvedToolSpec[] = parseSpecs(process.env.AGENTA_TOOL_SPECS);

coderabbitai · 2026-06-19T16:37:48Z

+  process.stdin.setEncoding("utf8");
+  process.stdin.on("data", (chunk: string) => {
+    buffer += chunk;
+    let newline: number;
+    while ((newline = buffer.indexOf("\n")) !== -1) {
+      const line = buffer.slice(0, newline).trim();
+      buffer = buffer.slice(newline + 1);
+      if (!line) continue;
+      let parsed: any;
+      try {
+        parsed = JSON.parse(line);
+      } catch {
+        log(`skipping non-JSON line: ${line.slice(0, 120)}`);
+        continue;
+      }
+      Promise.resolve(handle(parsed))
+        .then((response) => {
+          if (response) send(response);
+        })
+        .catch((err) => log(`handler error: ${err?.message ?? err}`));
+    }
+  });
+  process.stdin.on("end", () => process.exit(0));
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exiting while requests are still in flight.

Line 131 calls process.exit(0) as soon as stdin ends. If a tools/call is still running, the process can terminate before sending its response.

Suggested fix

function main(): void { log(`serving ${SPECS.length} tool(s) -> ${ENDPOINT || "(no endpoint)"}`); let buffer = ""; + let inFlight = 0; + let stdinEnded = false; + const maybeExit = () => { + if (stdinEnded && inFlight === 0) process.exit(0); + }; + process.stdin.setEncoding("utf8"); process.stdin.on("data", (chunk: string) => { buffer += chunk; @@ - Promise.resolve(handle(parsed)) + inFlight += 1; + Promise.resolve(handle(parsed)) .then((response) => { if (response) send(response); }) - .catch((err) => log(`handler error: ${err?.message ?? err}`)); + .catch((err) => log(`handler error: ${err?.message ?? err}`)) + .finally(() => { + inFlight -= 1; + maybeExit(); + }); } }); - process.stdin.on("end", () => process.exit(0)); + process.stdin.on("end", () => { + stdinEnded = true; + maybeExit(); + }); }

coderabbitai · 2026-06-19T16:37:48Z

+): { stop: () => Promise<void> } {
+  let active = true;
+  const seen = new Set<string>();
+  const inflight: Promise<void>[] = [];


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

inflight grows forever across relay lifetime.

At Lines 58 and 103-104, every handled request is retained permanently in inflight; this can create steady memory growth on long-running runners.

💡 Suggested fix

- const inflight: Promise<void>[] = []; + const inflight = new Set<Promise<void>>(); @@ - inflight.push(handle(name)); + const task = handle(name).finally(() => inflight.delete(task)); + inflight.add(task); @@ - await Promise.allSettled(inflight); + await Promise.allSettled([...inflight]);

Also applies to: 103-104, 110-110

mmabrouk · 2026-06-19T17:51:47Z

Reviewer guide: interesting code

Start here, in order:

services/agent/src/protocol.ts:70 — ResolvedToolSpec.kind is the executor axis (callback | code | client). Absent means callback for back-compat. This one field drives the whole dispatch seam.
services/agent/src/tools/dispatch.ts:108 — runResolvedTool: the single place a resolved spec becomes a result. Branch on kind, one home, no import cycle back into a call site.
services/agent/src/tools/code.ts:82 — BASE_ENV_ALLOWLIST plus buildChildEnv: the security boundary. The subprocess inherits only this allowlist and the tool's scoped secrets, never process.env, so provider keys do not reach author code.
services/agent/src/tools/callback.ts:50 — the /tools/call envelope. Arguments go as an object, not a JSON string, to avoid double-encoding; the Composio key stays server-side.
services/agent/src/tools/relay.ts:48 — startToolRelay: the Daytona file relay. The in-sandbox process writes a request file; the runner (which can reach Agenta) executes and writes the response file.
services/agent/src/tools/mcp-bridge.ts:80 — the attach decision: a missing callback endpoint warns and names the affected tools but does not drop the whole MCP server, so code tools are not silently lost.

mmabrouk · 2026-06-19T17:51:57Z

+// excludes everything secret-bearing or sidecar-specific: no AGENTA_*, no *_API_KEY /
+// *_TOKEN, no COMPOSIO_* / DAYTONA_*, no provider keys that the in-process Pi path writes
+// into process.env before a run. Only the tool's declared scoped `env` is layered on top.
+const BASE_ENV_ALLOWLIST = [


This allowlist is the isolation boundary for author-supplied code tools. The child gets only these vars plus the tool's scoped secrets, never process.env, so the provider keys the in-process Pi path writes there (OPENAI_API_KEY etc.) and AGENTA_* / COMPOSIO_* config stay out of the snippet. Anything added here leaks to every code tool; treat additions as a security change. The leak case in test/code-tool.test.ts guards it.

mmabrouk · 2026-06-19T17:52:03Z

+  spec: ResolvedToolSpec,
+  params: unknown,
+  opts: RunResolvedToolOpts,
+): Promise<string> {


The dispatch seam. Every delivery path (in-process Pi, Pi-under-rivet, the MCP bridge) used to carry its own copy of this branch. Now it lives once. Note the default: a spec with no kind falls through to callback, so older resolvers that never set kind still route to /tools/call rather than getting dropped. The client branch throwing is intentional; browser-fulfilled tools must never run in-sandbox.

mmabrouk · 2026-06-19T17:52:09Z

+  inputSchema?: Record<string, unknown> | null;
+  /** Set for `callback` (gateway) tools only; absent for `code` / `client`. */
+  callRef?: string;
+  kind?: "callback" | "code" | "client";


This single field is the executor axis the whole tools/ folder branches on. It is orthogonal to needsApproval and render, not a subclass tag. The Python wire (sdks/python/agenta/sdk/agents/utils/wire.py) mirrors these names and the golden fixtures pin the two, so renaming or reordering here without updating that side fails test_wire_contract.py.

mmabrouk · 2026-06-19T17:52:15Z

+  const callbackSpecs = executable.filter((s) => (s.kind ?? "callback") === "callback");
+  const hasEndpoint = Boolean(callback?.endpoint);
+
+  if (callbackSpecs.length > 0 && !hasEndpoint) {


The careful case: callback tools present but no endpoint. Dropping the whole server here would silently lose any code tools in the same set (they need no endpoint). Instead it warns, names the callback tools whose calls will fail, and still attaches the server. Worth confirming this is the behavior you want over failing fast.

mmabrouk

Codex subagent review for #4773

Findings:

Blocking security: services/agent/src/tools/mcp-bridge.ts:88 serializes every executable ResolvedToolSpec into the long-lived bridge process environment as AGENTA_TOOL_SPECS, and services/agent/src/tools/mcp-bridge.ts:96 also puts callback auth in env. That breaks the scoped-secret boundary documented in services/agent/src/protocol.ts:49 and enforced only at the final runCodeTool child env in services/agent/src/tools/code.ts:149. For the MCP bridge path, all code-tool spec.env values exist in the parent bridge env before a selected tool is spawned; an author-supplied code tool that can run local code can inspect same-user process environments on common Linux setups (for example via /proc/$PPID/environ) or otherwise benefit from env/process inspection, recovering other tools' scoped secrets and callback auth. Please avoid carrying full specs with env/auth through harness or bridge environment variables, or redact secret-bearing fields from AGENTA_TOOL_SPECS and inject only the selected tool's scoped env over a narrower channel immediately before spawning the code subprocess. This should also have a regression test that one code tool cannot read bridge-level tool specs, callback auth, or another tool's scoped secret.
Test/validation breakage: services/agent/test/tool-dispatch.test.ts:20 imports ../src/engines/pi.ts, and services/agent/test/mcp-servers.test.ts:15 imports ../src/engines/rivet.ts, but those files are not present on this PR head or on main. The commands advertised in those test files fail standalone with module-not-found, despite the PR body saying this branch is independent off main. Either keep these tests with the engine PR that introduces those modules, or make #4773's tests exercise only the contract/tool modules present here. Separately, services/agent/src/protocol.ts:4 says the Python wire/golden contract is pinned by files that also are not present on this branch, so this PR currently does not provide the promised standalone TS/Python parity guard.

Context checked: I read the #4779 docs branch first; the docs describe the broader active stack, but #4773's stack nav still points at #4777 and the docs are ahead of code in places, so I treated them as context rather than source of truth for this branch.

Tests not run: review-only via the GitHub app; I did not check out or execute the branch.

feat(agent): runner wire protocol and tool execution

2f467ce

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Feature Request New feature or request typescript labels Jun 19, 2026

vercel Bot deployed to Preview June 19, 2026 16:29 View deployment

mmabrouk commented Jun 19, 2026

View reviewed changes

mmabrouk mentioned this pull request Jun 19, 2026

feat(agent): runner engines, HTTP server, tracing, and docker image #4778

Open

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

mmabrouk commented Jun 19, 2026

View reviewed changes

This was referenced Jun 19, 2026

docs(agent): agent-workflows design and ground truth #4777

Open

feat(agent): runner engines, server, and tracing #4774

Open

mmabrouk commented Jun 19, 2026

View reviewed changes

Conversation

mmabrouk commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Agent-workflows: functional PR set

Context

What this changes

Key architectural decision to review

How to review this PR

Tests / notes

Uh oh!

vercel Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

mmabrouk commented Jun 19, 2026

Reviewer guide: interesting code

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk commented Jun 19, 2026

Reviewer guide: interesting code

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mmabrouk commented Jun 19, 2026 •

edited

Loading

vercel Bot commented Jun 19, 2026 •

edited

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading