Skip to content

feat(agent): runner wire contract and tool execution#4773

Open
mmabrouk wants to merge 1 commit into
mainfrom
feat/agent-runner-tools
Open

feat(agent): runner wire contract and tool execution#4773
mmabrouk wants to merge 1 commit into
mainfrom
feat/agent-runner-tools

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 19, 2026

Copy link
Copy Markdown
Member

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is main.

Context

The agent service drives an LLM harness (Pi in process, or Pi/Claude over ACP) for one turn. The harness needs two things from Agenta: a payload shape it can deserialize, and a way to run the tools the agent calls. This PR is the lower half of the TypeScript runner. It defines the /run wire contract and the tool executors, with nothing that drives a model yet. It branches off main and is independent; the runner-engine PR stacks on top of it and imports these executors. This is a functional slice that shows the final code, not the path the code took to get there.

What this changes

The PR adds a new services/agent TypeScript package with two parts.

protocol.ts is the /run wire contract shared by both engines: AgentRunRequest, AgentRunResult, ResolvedToolSpec, HarnessCapabilities, the AgentEvent union, the StreamRecord NDJSON line, and the helpers resolvePromptText and resolveRunSessionId. The Python side mirrors these names in sdks/python/agenta/sdk/agents/utils/wire.py, and shared golden fixtures pin the two together. Defining the types here, rather than in one engine that the other imports from, is what lets the Pi and rivet engines stay peers.

The tools/ folder executes a tool the backend already resolved. Before, each delivery path carried its own copy of "look at the spec, decide how to run it." Now dispatch.ts owns that decision once and branches on spec.kind: a code tool runs in a sandbox subprocess, a callback tool (the default) POSTs back to Agenta's /tools/call, and a client tool throws because the browser fulfills it across a turn boundary. The call sites keep their own result-wrapping shape; only the execution itself is shared. relay.ts adds the Daytona file relay, used when the in-sandbox process cannot reach a private Agenta backend. mcp-bridge.ts and mcp-server.ts expose the same resolved specs as a stdio MCP server for harnesses driven over ACP, which only accept tools through MCP.

Key architectural decision to review

Two decisions carry the weight.

The dispatch-by-kind seam in tools/dispatch.ts. runResolvedTool is the single place that turns a ResolvedToolSpec into a result, and the three executor kinds are orthogonal axes on the spec, not subclasses. The tradeoff: every caller funnels through this one function, so an executor bug surfaces everywhere at once, but a change to how a kind runs is a one-line edit instead of three. Check that the default (absent kind) stays callback for back-compat, and that the client branch throwing is what the call sites expect.

The subprocess env in tools/code.ts. A code tool runs author-supplied snippet code, so buildChildEnv is the security boundary. The child inherits only BASE_ENV_ALLOWLIST (PATH, HOME, locale, temp) plus the tool's own scoped secrets. It does not inherit process.env, so the provider keys the in-process Pi path writes there (OPENAI_API_KEY and friends), AGENTA_* config, and COMPOSIO_* / DAYTONA_* never reach the snippet. Scrutinize the allowlist itself: anything added to it leaks to every code tool. The leak test in test/code-tool.test.ts is the guard.

How to review this PR

  1. Read src/protocol.ts first. It is the contract everything else speaks. Read the doc comments on AgentRunRequest and ResolvedToolSpec; they explain each field's owner and lifetime.
  2. Then tools/dispatch.ts for the seam, tools/code.ts for the sandbox env, tools/callback.ts for the /tools/call envelope.
  3. Then mcp-bridge.ts (which specs justify attaching the server) and mcp-server.ts (the JSON-RPC stdio loop).
  4. Skip pnpm-lock.yaml (3712 of the lines, generated) and the tsconfig / .dockerignore scaffolding.

Likely regression: the env allowlist. A future hand that adds a variable here to "fix" a code tool would punch a hole in the isolation boundary. The other watch point is the callback default: a spec that arrives with no kind must still route to /tools/call, not get dropped.

Tests / notes

test/ holds four assertion scripts run with tsx, no test framework. code-tool.test.ts exercises both runtimes through real subprocesses, including the node bare-function main case and the env-isolation guarantee (provider key absent, scoped secret present). tool-dispatch.test.ts covers the kind branching, tool-bridge.test.ts the MCP round-trip, and mcp-servers.test.ts the attach decision. The engines that consume runResolvedTool and buildToolMcpServers land in the stacked runner-engine PR, so the wiring is exercised there.

@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Feature Request New feature or request typescript labels Jun 19, 2026
@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 19, 2026 4:29pm

Request Review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Introduces services/agent as a new private ESM TypeScript package (agenta-agent-pi-wrapper) containing a shared wire-protocol contract (protocol.ts), three tool execution backends (HTTP callback, local code subprocess, filesystem relay for Daytona), an MCP stdio bridge and JSON-RPC server, and integration/unit tests for each component.

Changes

Agent Pi Wrapper Package

Layer / File(s) Summary
Project scaffolding
.gitignore, services/agent/.dockerignore, services/agent/package.json, services/agent/tsconfig.json
Adds the agenta-agent-pi-wrapper package manifest with Pi/ACP/OTel dependencies and scripts, a strict ESNext TypeScript config, a .gitignore rule for the built dist/ bundle, and a .dockerignore excluding node_modules/logs/.git from Docker build context.
Shared wire-protocol types and helpers
services/agent/src/protocol.ts
Defines all shared TypeScript contracts: message/content primitives (ContentBlock, ChatMessage, TraceContext), tool/harness configuration (ResolvedToolSpec, ToolCallbackContext, McpServerConfig, HarnessCapabilities), the AgentEvent discriminated union, run payload contracts (AgentRunRequest, AgentRunResult, StreamRecord), and helper functions messageText, resolvePromptText, and resolveRunSessionId.
Tool callback HTTP transport
services/agent/src/tools/callback.ts
Implements callAgentaTool() to POST tool-call envelopes to an Agenta /tools/call endpoint with configurable timeout, abort signal merging, HTTP error propagation, and JSON content extraction.
Code tool subprocess executor
services/agent/src/tools/code.ts, services/agent/test/code-tool.test.ts
Implements runCodeTool(), which writes a snippet to a temp directory, spawns Python or Node with runtime-specific bootstrap scripts, enforces CODE_TOOL_TIMEOUT_MS via SIGKILL, and communicates via stdin/stdout JSON. Tests exercise multiple entrypoint patterns, env isolation, and error rejection.
Filesystem relay for Daytona/sandbox isolation
services/agent/src/tools/relay.ts
Implements startToolRelay(), a runner-side polling loop that reads *.req.json files from a relay directory, dispatches callAgentaTool(), and writes *.res.json responses back, enabling tool calls across network-isolated sandboxes.
Tool dispatch routing
services/agent/src/tools/dispatch.ts, services/agent/test/tool-dispatch.test.ts
Implements runResolvedTool() branching by spec.kind: coderunCodeTool, client → throw, callbackrelayToolCall (filesystem) or callAgentaTool (HTTP). Tests validate routing and end-to-end code execution.
MCP stdio bridge and server
services/agent/src/tools/mcp-bridge.ts, services/agent/src/tools/mcp-server.ts, services/agent/test/tool-bridge.test.ts, services/agent/test/mcp-servers.test.ts
mcp-bridge.ts builds McpServerStdio entries via buildToolMcpServers(), filtering client tools and wiring AGENTA_TOOL_SPECS/endpoint env vars. mcp-server.ts is a minimal JSON-RPC 2.0 stdio server handling initialize, tools/list, and tools/call via runResolvedTool. Tests validate bridge attachment logic and MCP→ACP env conversion.

Sequence Diagram(s)

sequenceDiagram
  participant AcpAgent as ACP Agent
  participant McpBridge as mcp-bridge.ts
  participant McpServer as mcp-server.ts
  participant Dispatch as runResolvedTool
  participant CodeExec as runCodeTool
  participant CallbackHTTP as callAgentaTool
  participant RelayFS as startToolRelay

  rect rgba(100, 149, 237, 0.5)
    note over AcpAgent,McpBridge: Bridge setup at harness start
    AcpAgent->>McpBridge: buildToolMcpServers(specs, callback)
    McpBridge-->>AcpAgent: McpServerStdio config (command, args, env)
    AcpAgent->>McpServer: spawn via stdio (AGENTA_TOOL_SPECS in env)
  end

  rect rgba(60, 179, 113, 0.5)
    note over AcpAgent,Dispatch: Per tool-call execution
    AcpAgent->>McpServer: tools/call {name, arguments} (JSON-RPC stdin)
    McpServer->>Dispatch: runResolvedTool(spec, params, opts)
    alt spec.kind == "code"
      Dispatch->>CodeExec: runCodeTool(runtime, code, env, args)
      CodeExec-->>Dispatch: stdout JSON string
    else spec.kind == "callback" and relayDir set
      Dispatch->>RelayFS: write *.req.json, poll for *.res.json
      RelayFS->>CallbackHTTP: callAgentaTool(endpoint, auth, args)
      CallbackHTTP-->>RelayFS: result string
      RelayFS-->>Dispatch: result string
    else spec.kind == "callback" direct
      Dispatch->>CallbackHTTP: callAgentaTool(endpoint, auth, args)
      CallbackHTTP-->>Dispatch: result string
    end
    Dispatch-->>McpServer: result string
    McpServer-->>AcpAgent: JSON-RPC success response (stdout)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(agent): runner wire contract and tool execution' accurately and specifically summarizes the main changes: introducing wire protocol contracts and tool execution capabilities for the agent service.
Docstring Coverage ✅ Passed Docstring coverage is 76.47% which is sufficient. The required threshold is 60.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively explains the changes, context, architecture decisions, and review guidance related to the agent runner wire contract and tool execution.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-runner-tools

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

Start here to find the load-bearing code fast.

  • services/agent/src/protocol.ts:64ResolvedToolSpec: the three orthogonal axes (executor kind, needsApproval, render) and why absent kind means callback.
  • services/agent/src/protocol.ts:293resolveRunSessionId: prefer the platform conversation id over the harness's ephemeral id.
  • services/agent/src/tools/dispatch.ts:104runResolvedTool: the one place that branches on kind (code runs local, client throws, callback routes back or relays).
  • services/agent/src/tools/code.ts:82BASE_ENV_ALLOWLIST and buildChildEnv: the deny-by-default boundary that keeps provider keys and AGENTA_* out of an author's snippet.
  • services/agent/src/tools/callback.ts:32callAgentaTool: the /tools/call envelope and response parse, kept in one place so the contract changes once.
  • services/agent/src/tools/relay.ts:51startToolRelay: the Daytona file relay for when the in-sandbox process cannot reach Agenta directly.

// excludes everything secret-bearing or sidecar-specific: no AGENTA_*, no *_API_KEY /
// *_TOKEN, no COMPOSIO_* / DAYTONA_*, no provider keys that the in-process Pi path writes
// into process.env before a run. Only the tool's declared scoped `env` is layered on top.
const BASE_ENV_ALLOWLIST = [

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security boundary to scrutinize: this is a deny-by-default allowlist for the code-tool subprocess. The child never inherits the sidecar process.env, so provider keys, AGENTA_* config, and COMPOSIO_/DAYTONA_ stay out of an author-supplied snippet. Only the tool's own resolved secrets are layered on top. Check that no entry here can carry a secret, and note the tradeoff: a runtime needing an env var not listed will fail to start.

params: unknown,
opts: RunResolvedToolOpts,
): Promise<string> {
if (spec.kind === "code") {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dispatch-by-kind: this is the single home for 'branch on spec.kind to run a resolved tool', previously duplicated across the Pi engine, the Pi-under-rivet extension, and the MCP server. Note the default: a spec with no kind falls through to the callback path (preserving the older gateway-only spec). Worth confirming a malformed spec that loses its kind routing to callback is intended, and that the callback path rejects what it cannot fulfil rather than failing silently.

}

/** Prefer the platform conversation id, falling back to the harness's ephemeral id. */
export function resolveRunSessionId(request: AgentRunRequest, fallback: string): string {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolveRunSessionId prefers the platform conversation id over the harness's ephemeral id. This is what keeps a session stable across turns when the harness mints its own throwaway id. Check the fallback: an empty/whitespace sessionId must fall back, not pin the run to a blank id.

let stdout = "";
let stderr = "";
let settled = false;
const finish = (fn: () => void) => {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subprocess lifecycle is the likely regression surface. The 'settled' guard plus finish() must resolve/reject exactly once across the timeout, abort, error, and close paths, and the temp dir is removed in finally. Worth a close read that no path can settle twice or leak the snippet dir.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
services/agent/test/tool-dispatch.test.ts (1)

64-83: ⚡ Quick win

Add callback/relay branch coverage for runResolvedTool

This file currently verifies only code and client branches. Adding tests for callback direct-call and relay paths would protect the new dispatch contract from regressions.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e248ef9d-0540-48ef-b446-fb03d48acbe9

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and 2f467ce.

⛔ Files ignored due to path filters (1)
  • services/agent/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (15)
  • .gitignore
  • services/agent/.dockerignore
  • services/agent/package.json
  • services/agent/src/protocol.ts
  • services/agent/src/tools/callback.ts
  • services/agent/src/tools/code.ts
  • services/agent/src/tools/dispatch.ts
  • services/agent/src/tools/mcp-bridge.ts
  • services/agent/src/tools/mcp-server.ts
  • services/agent/src/tools/relay.ts
  • services/agent/test/code-tool.test.ts
  • services/agent/test/mcp-servers.test.ts
  • services/agent/test/tool-bridge.test.ts
  • services/agent/test/tool-dispatch.test.ts
  • services/agent/tsconfig.json

Comment on lines +43 to +47
const timeoutSignal = AbortSignal.timeout(TOOL_CALL_TIMEOUT_MS);
const anyOf = (AbortSignal as any).any;
const combined =
signal && typeof anyOf === "function" ? anyOf([signal, timeoutSignal]) : timeoutSignal;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Declared Node/runtime constraints =="
if [ -f services/agent/package.json ]; then
  jq '{engines, packageManager, volta}' services/agent/package.json
else
  echo "services/agent/package.json not found"
fi

echo
echo "== CI / container node version hints =="
rg -n --iglob '*.yml' --iglob '*.yaml' --iglob 'Dockerfile*' \
  'node-version|setup-node|FROM node|NODE_VERSION'

echo
echo "== AbortSignal API usage in repo =="
rg -n 'AbortSignal\.any|AbortSignal\.timeout'

Repository: Agenta-AI/agenta

Length of output: 2201


🏁 Script executed:

#!/bin/bash

echo "== .nvmrc or .tool-versions hints =="
find . -maxdepth 3 -name ".nvmrc" -o -name ".tool-versions" | head -20

echo
echo "== services/agent directory structure =="
ls -la services/agent/ | head -20

echo
echo "== Check if services/agent has its own .nvmrc =="
if [ -f services/agent/.nvmrc ]; then
  cat services/agent/.nvmrc
else
  echo "No .nvmrc found in services/agent"
fi

echo
echo "== Full services/agent/package.json (snippet) =="
head -30 services/agent/package.json

Repository: Agenta-AI/agenta

Length of output: 1839


Caller abort is dropped when AbortSignal.any is unavailable.

At lines 44–46, if AbortSignal.any is missing, the function only uses timeoutSignal, so external cancellation (signal) never propagates. While all current deployments target Node 20+, the services/agent/package.json lacks an explicit engines constraint, leaving this vulnerability open if the code is ever run on an older Node version.

Add an explicit minimum Node version to engines in services/agent/package.json (Node 20 or higher), or implement the fallback merge logic to handle runtimes lacking AbortSignal.any:

💡 Suggested fix (if engines constraint is not added)
   const timeoutSignal = AbortSignal.timeout(TOOL_CALL_TIMEOUT_MS);
   const anyOf = (AbortSignal as any).any;
-  const combined =
-    signal && typeof anyOf === "function" ? anyOf([signal, timeoutSignal]) : timeoutSignal;
+  let combined: AbortSignal = timeoutSignal;
+  if (signal) {
+    if (typeof anyOf === "function") {
+      combined = anyOf([signal, timeoutSignal]);
+    } else {
+      const ctrl = new AbortController();
+      const abort = () => ctrl.abort();
+      timeoutSignal.addEventListener("abort", abort, { once: true });
+      signal.addEventListener("abort", abort, { once: true });
+      combined = ctrl.signal;
+    }
+  }

Comment on lines +55 to +93
export async function relayToolCall(
dir: string,
callRef: string,
toolCallId: string,
params: unknown,
signal?: AbortSignal,
): Promise<string> {
const id = sanitizeRelayId(toolCallId);
const reqPath = `${dir}/${id}${RELAY_REQ_SUFFIX}`;
const resPath = `${dir}/${id}${RELAY_RES_SUFFIX}`;
try {
mkdirSync(dir, { recursive: true });
} catch {
// The runner also creates it; a race here is harmless.
}
writeFileSync(reqPath, JSON.stringify({ callRef, toolCallId, args: params ?? {} }), "utf-8");

const deadline = Date.now() + RELAY_TIMEOUT_MS;
while (Date.now() < deadline) {
if (signal?.aborted) throw new Error("aborted");
if (existsSync(resPath)) {
const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
try {
unlinkSync(reqPath);
} catch {
/* best-effort cleanup */
}
try {
unlinkSync(resPath);
} catch {
/* best-effort cleanup */
}
if (res.ok) return res.text ?? "";
throw new Error(res.error || `tool relay failed for ${callRef}`);
}
await sleep(RELAY_POLL_MS);
}
throw new Error(`tool relay timed out for ${callRef}`);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Aborted or timed-out relay calls can still execute later

On Line 70 and Line 92, relayToolCall leaves the request file behind when aborted/timed out. The runner loop in services/agent/src/tools/relay.ts will still pick it up and execute the external tool call, creating side effects after cancellation.

Suggested fix
 export async function relayToolCall(
   dir: string,
   callRef: string,
   toolCallId: string,
   params: unknown,
   signal?: AbortSignal,
 ): Promise<string> {
   const id = sanitizeRelayId(toolCallId);
   const reqPath = `${dir}/${id}${RELAY_REQ_SUFFIX}`;
   const resPath = `${dir}/${id}${RELAY_RES_SUFFIX}`;
+  if (signal?.aborted) throw new Error("aborted");
   try {
     mkdirSync(dir, { recursive: true });
   } catch {
     // The runner also creates it; a race here is harmless.
   }
   writeFileSync(reqPath, JSON.stringify({ callRef, toolCallId, args: params ?? {} }), "utf-8");
 
-  const deadline = Date.now() + RELAY_TIMEOUT_MS;
-  while (Date.now() < deadline) {
-    if (signal?.aborted) throw new Error("aborted");
-    if (existsSync(resPath)) {
-      const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
-      try {
-        unlinkSync(reqPath);
-      } catch {
-        /* best-effort cleanup */
-      }
-      try {
-        unlinkSync(resPath);
-      } catch {
-        /* best-effort cleanup */
-      }
-      if (res.ok) return res.text ?? "";
-      throw new Error(res.error || `tool relay failed for ${callRef}`);
+  try {
+    const deadline = Date.now() + RELAY_TIMEOUT_MS;
+    while (Date.now() < deadline) {
+      if (signal?.aborted) throw new Error("aborted");
+      if (existsSync(resPath)) {
+        const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
+        if (res.ok) return res.text ?? "";
+        throw new Error(res.error || `tool relay failed for ${callRef}`);
+      }
+      await sleep(RELAY_POLL_MS);
     }
-    await sleep(RELAY_POLL_MS);
+    throw new Error(`tool relay timed out for ${callRef}`);
+  } finally {
+    try { unlinkSync(reqPath); } catch {}
+    try { unlinkSync(resPath); } catch {}
   }
-  throw new Error(`tool relay timed out for ${callRef}`);
 }

Comment on lines +117 to +128
// callback (default): route back to Agenta's /tools/call (directly or via the Daytona relay).
if (opts.relayDir) {
return relayToolCall(opts.relayDir, spec.callRef ?? "", opts.toolCallId, params, opts.signal);
}
return callAgentaTool(
opts.endpoint ?? "",
opts.authorization,
spec.callRef ?? "",
opts.toolCallId,
params,
opts.signal,
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate callback prerequisites before dispatching

On Line 122 and Line 124, falling back to "" for endpoint/callRef turns contract mistakes into opaque runtime failures. Fail fast with explicit validation before relay/HTTP dispatch.

Suggested fix
 export async function runResolvedTool(
   spec: ResolvedToolSpec,
   params: unknown,
   opts: RunResolvedToolOpts,
 ): Promise<string> {
   if (spec.kind === "code") {
     return runCodeTool(spec.runtime, spec.code ?? "", spec.env, params, opts.signal);
   }
   if (spec.kind === "client") {
     throw new Error(
       `client tool '${spec.name}' is browser-fulfilled and cannot be executed in-sandbox`,
     );
   }
+  if (!spec.callRef?.trim()) {
+    throw new Error(`callback tool '${spec.name}' is missing callRef`);
+  }
   // callback (default): route back to Agenta's /tools/call (directly or via the Daytona relay).
   if (opts.relayDir) {
-    return relayToolCall(opts.relayDir, spec.callRef ?? "", opts.toolCallId, params, opts.signal);
+    return relayToolCall(opts.relayDir, spec.callRef, opts.toolCallId, params, opts.signal);
   }
+  if (!opts.endpoint?.trim()) {
+    throw new Error(`callback tool '${spec.name}' requires a callback endpoint`);
+  }
   return callAgentaTool(
-    opts.endpoint ?? "",
+    opts.endpoint,
     opts.authorization,
-    spec.callRef ?? "",
+    spec.callRef,
     opts.toolCallId,
     params,
     opts.signal,
   );
 }

import { EMPTY_OBJECT_SCHEMA } from "./callback.ts";
import { runResolvedTool } from "./dispatch.ts";

const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_SPECS ?? "[]");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard AGENTA_TOOL_SPECS parsing to prevent hard startup crashes.

Line 25 parses env JSON at module load without error handling. A malformed value crashes the bridge before it can even reply to initialize. Default to [] with a stderr warning instead.

Suggested fix
-const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_SPECS ?? "[]");
+function parseSpecs(raw: string | undefined): ResolvedToolSpec[] {
+  try {
+    const parsed = JSON.parse(raw ?? "[]");
+    return Array.isArray(parsed) ? (parsed as ResolvedToolSpec[]) : [];
+  } catch {
+    process.stderr.write("[tool-bridge] invalid AGENTA_TOOL_SPECS JSON; defaulting to []\n");
+    return [];
+  }
+}
+
+const SPECS: ResolvedToolSpec[] = parseSpecs(process.env.AGENTA_TOOL_SPECS);

Comment on lines +109 to +132
process.stdin.setEncoding("utf8");
process.stdin.on("data", (chunk: string) => {
buffer += chunk;
let newline: number;
while ((newline = buffer.indexOf("\n")) !== -1) {
const line = buffer.slice(0, newline).trim();
buffer = buffer.slice(newline + 1);
if (!line) continue;
let parsed: any;
try {
parsed = JSON.parse(line);
} catch {
log(`skipping non-JSON line: ${line.slice(0, 120)}`);
continue;
}
Promise.resolve(handle(parsed))
.then((response) => {
if (response) send(response);
})
.catch((err) => log(`handler error: ${err?.message ?? err}`));
}
});
process.stdin.on("end", () => process.exit(0));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exiting while requests are still in flight.

Line 131 calls process.exit(0) as soon as stdin ends. If a tools/call is still running, the process can terminate before sending its response.

Suggested fix
 function main(): void {
   log(`serving ${SPECS.length} tool(s) -> ${ENDPOINT || "(no endpoint)"}`);
   let buffer = "";
+  let inFlight = 0;
+  let stdinEnded = false;
+  const maybeExit = () => {
+    if (stdinEnded && inFlight === 0) process.exit(0);
+  };
+
   process.stdin.setEncoding("utf8");
   process.stdin.on("data", (chunk: string) => {
     buffer += chunk;
@@
-      Promise.resolve(handle(parsed))
+      inFlight += 1;
+      Promise.resolve(handle(parsed))
         .then((response) => {
           if (response) send(response);
         })
-        .catch((err) => log(`handler error: ${err?.message ?? err}`));
+        .catch((err) => log(`handler error: ${err?.message ?? err}`))
+        .finally(() => {
+          inFlight -= 1;
+          maybeExit();
+        });
     }
   });
-  process.stdin.on("end", () => process.exit(0));
+  process.stdin.on("end", () => {
+    stdinEnded = true;
+    maybeExit();
+  });
 }

): { stop: () => Promise<void> } {
let active = true;
const seen = new Set<string>();
const inflight: Promise<void>[] = [];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

inflight grows forever across relay lifetime.

At Lines 58 and 103-104, every handled request is retained permanently in inflight; this can create steady memory growth on long-running runners.

💡 Suggested fix
-  const inflight: Promise<void>[] = [];
+  const inflight = new Set<Promise<void>>();
@@
-          inflight.push(handle(name));
+          const task = handle(name).finally(() => inflight.delete(task));
+          inflight.add(task);
@@
-    await Promise.allSettled(inflight);
+    await Promise.allSettled([...inflight]);

Also applies to: 103-104, 110-110

@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

Start here, in order:

  • services/agent/src/protocol.ts:70ResolvedToolSpec.kind is the executor axis (callback | code | client). Absent means callback for back-compat. This one field drives the whole dispatch seam.
  • services/agent/src/tools/dispatch.ts:108runResolvedTool: the single place a resolved spec becomes a result. Branch on kind, one home, no import cycle back into a call site.
  • services/agent/src/tools/code.ts:82BASE_ENV_ALLOWLIST plus buildChildEnv: the security boundary. The subprocess inherits only this allowlist and the tool's scoped secrets, never process.env, so provider keys do not reach author code.
  • services/agent/src/tools/callback.ts:50 — the /tools/call envelope. Arguments go as an object, not a JSON string, to avoid double-encoding; the Composio key stays server-side.
  • services/agent/src/tools/relay.ts:48startToolRelay: the Daytona file relay. The in-sandbox process writes a request file; the runner (which can reach Agenta) executes and writes the response file.
  • services/agent/src/tools/mcp-bridge.ts:80 — the attach decision: a missing callback endpoint warns and names the affected tools but does not drop the whole MCP server, so code tools are not silently lost.

// excludes everything secret-bearing or sidecar-specific: no AGENTA_*, no *_API_KEY /
// *_TOKEN, no COMPOSIO_* / DAYTONA_*, no provider keys that the in-process Pi path writes
// into process.env before a run. Only the tool's declared scoped `env` is layered on top.
const BASE_ENV_ALLOWLIST = [

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allowlist is the isolation boundary for author-supplied code tools. The child gets only these vars plus the tool's scoped secrets, never process.env, so the provider keys the in-process Pi path writes there (OPENAI_API_KEY etc.) and AGENTA_* / COMPOSIO_* config stay out of the snippet. Anything added here leaks to every code tool; treat additions as a security change. The leak case in test/code-tool.test.ts guards it.

spec: ResolvedToolSpec,
params: unknown,
opts: RunResolvedToolOpts,
): Promise<string> {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dispatch seam. Every delivery path (in-process Pi, Pi-under-rivet, the MCP bridge) used to carry its own copy of this branch. Now it lives once. Note the default: a spec with no kind falls through to callback, so older resolvers that never set kind still route to /tools/call rather than getting dropped. The client branch throwing is intentional; browser-fulfilled tools must never run in-sandbox.

inputSchema?: Record<string, unknown> | null;
/** Set for `callback` (gateway) tools only; absent for `code` / `client`. */
callRef?: string;
kind?: "callback" | "code" | "client";

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This single field is the executor axis the whole tools/ folder branches on. It is orthogonal to needsApproval and render, not a subclass tag. The Python wire (sdks/python/agenta/sdk/agents/utils/wire.py) mirrors these names and the golden fixtures pin the two, so renaming or reordering here without updating that side fails test_wire_contract.py.

const callbackSpecs = executable.filter((s) => (s.kind ?? "callback") === "callback");
const hasEndpoint = Boolean(callback?.endpoint);

if (callbackSpecs.length > 0 && !hasEndpoint) {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The careful case: callback tools present but no endpoint. Dropping the whole server here would silently lose any code tools in the same set (they need no endpoint). Instead it warns, names the callback tools whose calls will fail, and still attaches the server. Worth confirming this is the behavior you want over failing fast.

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex subagent review for #4773

Findings:

  • Blocking security: services/agent/src/tools/mcp-bridge.ts:88 serializes every executable ResolvedToolSpec into the long-lived bridge process environment as AGENTA_TOOL_SPECS, and services/agent/src/tools/mcp-bridge.ts:96 also puts callback auth in env. That breaks the scoped-secret boundary documented in services/agent/src/protocol.ts:49 and enforced only at the final runCodeTool child env in services/agent/src/tools/code.ts:149. For the MCP bridge path, all code-tool spec.env values exist in the parent bridge env before a selected tool is spawned; an author-supplied code tool that can run local code can inspect same-user process environments on common Linux setups (for example via /proc/$PPID/environ) or otherwise benefit from env/process inspection, recovering other tools' scoped secrets and callback auth. Please avoid carrying full specs with env/auth through harness or bridge environment variables, or redact secret-bearing fields from AGENTA_TOOL_SPECS and inject only the selected tool's scoped env over a narrower channel immediately before spawning the code subprocess. This should also have a regression test that one code tool cannot read bridge-level tool specs, callback auth, or another tool's scoped secret.

  • Test/validation breakage: services/agent/test/tool-dispatch.test.ts:20 imports ../src/engines/pi.ts, and services/agent/test/mcp-servers.test.ts:15 imports ../src/engines/rivet.ts, but those files are not present on this PR head or on main. The commands advertised in those test files fail standalone with module-not-found, despite the PR body saying this branch is independent off main. Either keep these tests with the engine PR that introduces those modules, or make #4773's tests exercise only the contract/tool modules present here. Separately, services/agent/src/protocol.ts:4 says the Python wire/golden contract is pinned by files that also are not present on this branch, so this PR currently does not provide the promised standalone TS/Python parity guard.

Context checked: I read the #4779 docs branch first; the docs describe the broader active stack, but #4773's stack nav still points at #4777 and the docs are ahead of code in places, so I treated them as context rather than source of truth for this branch.

Tests not run: review-only via the GitHub app; I did not check out or execute the branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature Request New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files. typescript

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant