feat(agent): runner wire contract and tool execution #4773
📝 Walkthrough
Introduces
Changes
Agent Pi Wrapper Package
Sequence Diagram(s)
sequenceDiagram
participant AcpAgent as ACP Agent
participant McpBridge as mcp-bridge.ts
participant McpServer as mcp-server.ts
participant Dispatch as runResolvedTool
participant CodeExec as runCodeTool
participant CallbackHTTP as callAgentaTool
participant RelayFS as startToolRelay
rect rgba(100, 149, 237, 0.5)
note over AcpAgent,McpBridge: Bridge setup at harness start
AcpAgent->>McpBridge: buildToolMcpServers(specs, callback)
McpBridge-->>AcpAgent: McpServerStdio config (command, args, env)
AcpAgent->>McpServer: spawn via stdio (AGENTA_TOOL_SPECS in env)
end
rect rgba(60, 179, 113, 0.5)
note over AcpAgent,Dispatch: Per tool-call execution
AcpAgent->>McpServer: tools/call {name, arguments} (JSON-RPC stdin)
McpServer->>Dispatch: runResolvedTool(spec, params, opts)
alt spec.kind == "code"
Dispatch->>CodeExec: runCodeTool(runtime, code, env, args)
CodeExec-->>Dispatch: stdout JSON string
else spec.kind == "callback" and relayDir set
Dispatch->>RelayFS: write *.req.json, poll for *.res.json
RelayFS->>CallbackHTTP: callAgentaTool(endpoint, auth, args)
CallbackHTTP-->>RelayFS: result string
RelayFS-->>Dispatch: result string
else spec.kind == "callback" direct
Dispatch->>CallbackHTTP: callAgentaTool(endpoint, auth, args)
CallbackHTTP-->>Dispatch: result string
end
Dispatch-->>McpServer: result string
McpServer-->>AcpAgent: JSON-RPC success response (stdout)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
Reviewer guide: interesting code
Start here to find the load-bearing code fast.
// excludes everything secret-bearing or sidecar-specific: no AGENTA_*, no *_API_KEY /
// *_TOKEN, no COMPOSIO_* / DAYTONA_*, no provider keys that the in-process Pi path writes
// into process.env before a run. Only the tool's declared scoped `env` is layered on top.
const BASE_ENV_ALLOWLIST = [
Security boundary to scrutinize: this is a deny-by-default allowlist for the code-tool subprocess. The child never inherits the sidecar process.env, so provider keys, AGENTA_* config, and COMPOSIO_* / DAYTONA_* stay out of an author-supplied snippet. Only the tool's own resolved secrets are layered on top. Check that no entry here can carry a secret, and note the tradeoff: a runtime that needs an env var not on the list will fail to start.
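A minimal sketch of the deny-by-default pattern described above, for reviewers who want a reference shape to compare against. The entries in `BASE_ENV_ALLOWLIST_SKETCH` and the function name `buildChildEnvSketch` are assumptions for illustration; the PR names only the categories (PATH, HOME, locale, temp), and the real list lives in `tools/code.ts`.

```typescript
// Hypothetical allowlist: illustrative entries only, not the PR's actual list.
const BASE_ENV_ALLOWLIST_SKETCH = ["PATH", "HOME", "LANG", "LC_ALL", "TMPDIR"] as const;

type ParentEnvSketch = Record<string, string | undefined>;

/**
 * Build the child environment from scratch: copy only allowlisted keys from
 * the parent, then layer the tool's own scoped env on top. Nothing else from
 * the parent is inherited.
 */
function buildChildEnvSketch(
  parentEnv: ParentEnvSketch,
  scopedEnv: Record<string, string> = {},
): Record<string, string> {
  const child: Record<string, string> = {};
  for (const key of BASE_ENV_ALLOWLIST_SKETCH) {
    const value = parentEnv[key];
    if (typeof value === "string") child[key] = value;
  }
  // Scoped secrets are applied last so a tool can override an allowlisted value.
  return { ...child, ...scopedEnv };
}
```

Passing the parent environment as a parameter, rather than reading `process.env` inside the function, keeps the leak case easy to assert in a test.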
  params: unknown,
  opts: RunResolvedToolOpts,
): Promise<string> {
  if (spec.kind === "code") {
Dispatch-by-kind: this is the single home for 'branch on spec.kind to run a resolved tool', previously duplicated across the Pi engine, the Pi-under-rivet extension, and the MCP server. Note the default: a spec with no kind falls through to the callback path (preserving the older gateway-only spec). Worth confirming a malformed spec that loses its kind routing to callback is intended, and that the callback path rejects what it cannot fulfil rather than failing silently.
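The routing rule in this comment can be sketched as a pure function, which makes the default easy to see. This is an illustration under stated assumptions, not the PR's `runResolvedTool`: the types `DispatchSpecSketch` and `RouteSketch` and the function `routeResolvedToolSketch` are hypothetical, and the sketch returns a route label instead of executing anything.

```typescript
type DispatchKindSketch = "callback" | "code" | "client";

// Hypothetical, trimmed-down spec: only the fields the routing decision reads.
interface DispatchSpecSketch {
  name: string;
  kind?: DispatchKindSketch;
}

type RouteSketch = "code" | "relay" | "callback";

/** Decide where a resolved tool would run, without running it. */
function routeResolvedToolSketch(
  spec: DispatchSpecSketch,
  opts: { relayDir?: string },
): RouteSketch {
  if (spec.kind === "code") return "code";
  if (spec.kind === "client") {
    // Browser-fulfilled tools must never execute in the sandbox.
    throw new Error(`client tool '${spec.name}' cannot be executed in-sandbox`);
  }
  // Absent kind falls through here, matching the back-compat default.
  return opts.relayDir ? "relay" : "callback";
}
```

A spec that has lost its `kind` lands in the last branch, which is exactly the case the comment asks to confirm.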
}

/** Prefer the platform conversation id, falling back to the harness's ephemeral id. */
export function resolveRunSessionId(request: AgentRunRequest, fallback: string): string {
resolveRunSessionId prefers the platform conversation id over the harness's ephemeral id. This is what keeps a session stable across turns when the harness mints its own throwaway id. Check the fallback: an empty/whitespace sessionId must fall back, not pin the run to a blank id.
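The fallback check asked for here could look like the sketch below. It assumes the request carries an optional `sessionId` string; the interface `RunRequestSketch` and the function name are hypothetical, and the real `AgentRunRequest` in `protocol.ts` may name or nest the field differently.

```typescript
// Hypothetical request shape: only the field this helper reads.
interface RunRequestSketch {
  sessionId?: string | null;
}

/**
 * Prefer the platform-provided id, but treat missing, empty, and
 * whitespace-only values as absent so the run is never pinned to a blank id.
 */
function resolveRunSessionIdSketch(request: RunRequestSketch, fallback: string): string {
  const candidate = request.sessionId?.trim();
  return candidate ? candidate : fallback;
}
```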
let stdout = "";
let stderr = "";
let settled = false;
const finish = (fn: () => void) => {
Subprocess lifecycle is the likely regression surface. The 'settled' guard plus finish() must resolve/reject exactly once across the timeout, abort, error, and close paths, and the temp dir is removed in finally. Worth a close read that no path can settle twice or leak the snippet dir.
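For reference while reading that code, here is a minimal sketch of a settle-once guard in isolation. It is an assumption about the general pattern, not a copy of `tools/code.ts`; the helper name `settleOnceSketch` and its return shape are hypothetical.

```typescript
/**
 * Wrap a resolve/reject pair so that only the first outcome is delivered.
 * Later calls from competing paths (timeout, abort, error, close) are no-ops.
 */
function settleOnceSketch<T>(
  resolve: (value: T) => void,
  reject: (error: Error) => void,
): { succeed: (value: T) => boolean; fail: (error: Error) => boolean } {
  let settled = false;
  const finish = (fn: () => void): boolean => {
    if (settled) return false; // a previous path already won
    settled = true;
    fn();
    return true;
  };
  return {
    succeed: (value) => finish(() => resolve(value)),
    fail: (error) => finish(() => reject(error)),
  };
}
```

The boolean return value is an addition for the sketch; it lets a caller log which path lost the race without changing the outcome.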
Actionable comments posted: 6
🧹 Nitpick comments (1)
services/agent/test/tool-dispatch.test.ts (1)
64-83: ⚡ Quick win
Add callback/relay branch coverage for `runResolvedTool`
This file currently verifies only `code` and `client` branches. Adding tests for callback direct-call and relay paths would protect the new dispatch contract from regressions.
⛔ Files ignored due to path filters (1)
- services/agent/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (15)
- .gitignore
- services/agent/.dockerignore
- services/agent/package.json
- services/agent/src/protocol.ts
- services/agent/src/tools/callback.ts
- services/agent/src/tools/code.ts
- services/agent/src/tools/dispatch.ts
- services/agent/src/tools/mcp-bridge.ts
- services/agent/src/tools/mcp-server.ts
- services/agent/src/tools/relay.ts
- services/agent/test/code-tool.test.ts
- services/agent/test/mcp-servers.test.ts
- services/agent/test/tool-bridge.test.ts
- services/agent/test/tool-dispatch.test.ts
- services/agent/tsconfig.json
const timeoutSignal = AbortSignal.timeout(TOOL_CALL_TIMEOUT_MS);
const anyOf = (AbortSignal as any).any;
const combined =
  signal && typeof anyOf === "function" ? anyOf([signal, timeoutSignal]) : timeoutSignal;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Declared Node/runtime constraints =="
if [ -f services/agent/package.json ]; then
jq '{engines, packageManager, volta}' services/agent/package.json
else
echo "services/agent/package.json not found"
fi
echo
echo "== CI / container node version hints =="
rg -n --iglob '*.yml' --iglob '*.yaml' --iglob 'Dockerfile*' \
'node-version|setup-node|FROM node|NODE_VERSION'
echo
echo "== AbortSignal API usage in repo =="
rg -n 'AbortSignal\.any|AbortSignal\.timeout'
Repository: Agenta-AI/agenta
Length of output: 2201
🏁 Script executed:
#!/bin/bash
echo "== .nvmrc or .tool-versions hints =="
find . -maxdepth 3 -name ".nvmrc" -o -name ".tool-versions" | head -20
echo
echo "== services/agent directory structure =="
ls -la services/agent/ | head -20
echo
echo "== Check if services/agent has its own .nvmrc =="
if [ -f services/agent/.nvmrc ]; then
cat services/agent/.nvmrc
else
echo "No .nvmrc found in services/agent"
fi
echo
echo "== Full services/agent/package.json (snippet) =="
head -30 services/agent/package.json
Repository: Agenta-AI/agenta
Length of output: 1839
Caller abort is dropped when AbortSignal.any is unavailable.
At lines 44–46, if AbortSignal.any is missing, the function only uses timeoutSignal, so external cancellation (signal) never propagates. While all current deployments target Node 20+, the services/agent/package.json lacks an explicit engines constraint, leaving this vulnerability open if the code is ever run on an older Node version.
Add an explicit minimum Node version to engines in services/agent/package.json (Node 20 or higher), or implement the fallback merge logic to handle runtimes lacking AbortSignal.any:
💡 Suggested fix (if engines constraint is not added)
const timeoutSignal = AbortSignal.timeout(TOOL_CALL_TIMEOUT_MS);
const anyOf = (AbortSignal as any).any;
- const combined =
- signal && typeof anyOf === "function" ? anyOf([signal, timeoutSignal]) : timeoutSignal;
+ let combined: AbortSignal = timeoutSignal;
+ if (signal) {
+ if (typeof anyOf === "function") {
+ combined = anyOf([signal, timeoutSignal]);
+ } else {
+ const ctrl = new AbortController();
+ const abort = () => ctrl.abort();
+ timeoutSignal.addEventListener("abort", abort, { once: true });
+ signal.addEventListener("abort", abort, { once: true });
+ combined = ctrl.signal;
+ }
+ }

export async function relayToolCall(
  dir: string,
  callRef: string,
  toolCallId: string,
  params: unknown,
  signal?: AbortSignal,
): Promise<string> {
  const id = sanitizeRelayId(toolCallId);
  const reqPath = `${dir}/${id}${RELAY_REQ_SUFFIX}`;
  const resPath = `${dir}/${id}${RELAY_RES_SUFFIX}`;
  try {
    mkdirSync(dir, { recursive: true });
  } catch {
    // The runner also creates it; a race here is harmless.
  }
  writeFileSync(reqPath, JSON.stringify({ callRef, toolCallId, args: params ?? {} }), "utf-8");

  const deadline = Date.now() + RELAY_TIMEOUT_MS;
  while (Date.now() < deadline) {
    if (signal?.aborted) throw new Error("aborted");
    if (existsSync(resPath)) {
      const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
      try {
        unlinkSync(reqPath);
      } catch {
        /* best-effort cleanup */
      }
      try {
        unlinkSync(resPath);
      } catch {
        /* best-effort cleanup */
      }
      if (res.ok) return res.text ?? "";
      throw new Error(res.error || `tool relay failed for ${callRef}`);
    }
    await sleep(RELAY_POLL_MS);
  }
  throw new Error(`tool relay timed out for ${callRef}`);
}
Aborted or timed-out relay calls can still execute later
On Line 70 and Line 92, relayToolCall leaves the request file behind when aborted/timed out. The runner loop in services/agent/src/tools/relay.ts will still pick it up and execute the external tool call, creating side effects after cancellation.
Suggested fix
export async function relayToolCall(
dir: string,
callRef: string,
toolCallId: string,
params: unknown,
signal?: AbortSignal,
): Promise<string> {
const id = sanitizeRelayId(toolCallId);
const reqPath = `${dir}/${id}${RELAY_REQ_SUFFIX}`;
const resPath = `${dir}/${id}${RELAY_RES_SUFFIX}`;
+ if (signal?.aborted) throw new Error("aborted");
try {
mkdirSync(dir, { recursive: true });
} catch {
// The runner also creates it; a race here is harmless.
}
writeFileSync(reqPath, JSON.stringify({ callRef, toolCallId, args: params ?? {} }), "utf-8");
- const deadline = Date.now() + RELAY_TIMEOUT_MS;
- while (Date.now() < deadline) {
- if (signal?.aborted) throw new Error("aborted");
- if (existsSync(resPath)) {
- const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
- try {
- unlinkSync(reqPath);
- } catch {
- /* best-effort cleanup */
- }
- try {
- unlinkSync(resPath);
- } catch {
- /* best-effort cleanup */
- }
- if (res.ok) return res.text ?? "";
- throw new Error(res.error || `tool relay failed for ${callRef}`);
+ try {
+ const deadline = Date.now() + RELAY_TIMEOUT_MS;
+ while (Date.now() < deadline) {
+ if (signal?.aborted) throw new Error("aborted");
+ if (existsSync(resPath)) {
+ const res = JSON.parse(readFileSync(resPath, "utf-8")) as RelayResponse;
+ if (res.ok) return res.text ?? "";
+ throw new Error(res.error || `tool relay failed for ${callRef}`);
+ }
+ await sleep(RELAY_POLL_MS);
}
- await sleep(RELAY_POLL_MS);
+ throw new Error(`tool relay timed out for ${callRef}`);
+ } finally {
+ try { unlinkSync(reqPath); } catch {}
+ try { unlinkSync(resPath); } catch {}
}
- throw new Error(`tool relay timed out for ${callRef}`);
}

// callback (default): route back to Agenta's /tools/call (directly or via the Daytona relay).
if (opts.relayDir) {
  return relayToolCall(opts.relayDir, spec.callRef ?? "", opts.toolCallId, params, opts.signal);
}
return callAgentaTool(
  opts.endpoint ?? "",
  opts.authorization,
  spec.callRef ?? "",
  opts.toolCallId,
  params,
  opts.signal,
);
Validate callback prerequisites before dispatching
On Line 122 and Line 124, falling back to "" for endpoint/callRef turns contract mistakes into opaque runtime failures. Fail fast with explicit validation before relay/HTTP dispatch.
Suggested fix
export async function runResolvedTool(
spec: ResolvedToolSpec,
params: unknown,
opts: RunResolvedToolOpts,
): Promise<string> {
if (spec.kind === "code") {
return runCodeTool(spec.runtime, spec.code ?? "", spec.env, params, opts.signal);
}
if (spec.kind === "client") {
throw new Error(
`client tool '${spec.name}' is browser-fulfilled and cannot be executed in-sandbox`,
);
}
+ if (!spec.callRef?.trim()) {
+ throw new Error(`callback tool '${spec.name}' is missing callRef`);
+ }
// callback (default): route back to Agenta's /tools/call (directly or via the Daytona relay).
if (opts.relayDir) {
- return relayToolCall(opts.relayDir, spec.callRef ?? "", opts.toolCallId, params, opts.signal);
+ return relayToolCall(opts.relayDir, spec.callRef, opts.toolCallId, params, opts.signal);
}
+ if (!opts.endpoint?.trim()) {
+ throw new Error(`callback tool '${spec.name}' requires a callback endpoint`);
+ }
return callAgentaTool(
- opts.endpoint ?? "",
+ opts.endpoint,
opts.authorization,
- spec.callRef ?? "",
+ spec.callRef,
opts.toolCallId,
params,
opts.signal,
);
}

import { EMPTY_OBJECT_SCHEMA } from "./callback.ts";
import { runResolvedTool } from "./dispatch.ts";

const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_SPECS ?? "[]");
Guard AGENTA_TOOL_SPECS parsing to prevent hard startup crashes.
Line 25 parses env JSON at module load without error handling. A malformed value crashes the bridge before it can even reply to initialize. Default to [] with a stderr warning instead.
Suggested fix
-const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_SPECS ?? "[]");
+function parseSpecs(raw: string | undefined): ResolvedToolSpec[] {
+ try {
+ const parsed = JSON.parse(raw ?? "[]");
+ return Array.isArray(parsed) ? (parsed as ResolvedToolSpec[]) : [];
+ } catch {
+ process.stderr.write("[tool-bridge] invalid AGENTA_TOOL_SPECS JSON; defaulting to []\n");
+ return [];
+ }
+}
+
+const SPECS: ResolvedToolSpec[] = parseSpecs(process.env.AGENTA_TOOL_SPECS);

  process.stdin.setEncoding("utf8");
  process.stdin.on("data", (chunk: string) => {
    buffer += chunk;
    let newline: number;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (!line) continue;
      let parsed: any;
      try {
        parsed = JSON.parse(line);
      } catch {
        log(`skipping non-JSON line: ${line.slice(0, 120)}`);
        continue;
      }
      Promise.resolve(handle(parsed))
        .then((response) => {
          if (response) send(response);
        })
        .catch((err) => log(`handler error: ${err?.message ?? err}`));
    }
  });
  process.stdin.on("end", () => process.exit(0));
}
Avoid exiting while requests are still in flight.
Line 131 calls process.exit(0) as soon as stdin ends. If a tools/call is still running, the process can terminate before sending its response.
Suggested fix
function main(): void {
log(`serving ${SPECS.length} tool(s) -> ${ENDPOINT || "(no endpoint)"}`);
let buffer = "";
+ let inFlight = 0;
+ let stdinEnded = false;
+ const maybeExit = () => {
+ if (stdinEnded && inFlight === 0) process.exit(0);
+ };
+
process.stdin.setEncoding("utf8");
process.stdin.on("data", (chunk: string) => {
buffer += chunk;
@@
- Promise.resolve(handle(parsed))
+ inFlight += 1;
+ Promise.resolve(handle(parsed))
.then((response) => {
if (response) send(response);
})
- .catch((err) => log(`handler error: ${err?.message ?? err}`));
+ .catch((err) => log(`handler error: ${err?.message ?? err}`))
+ .finally(() => {
+ inFlight -= 1;
+ maybeExit();
+ });
}
});
- process.stdin.on("end", () => process.exit(0));
+ process.stdin.on("end", () => {
+ stdinEnded = true;
+ maybeExit();
+ });
}

): { stop: () => Promise<void> } {
  let active = true;
  const seen = new Set<string>();
  const inflight: Promise<void>[] = [];
inflight grows forever across relay lifetime.
At Lines 58 and 103-104, every handled request is retained permanently in inflight; this can create steady memory growth on long-running runners.
💡 Suggested fix
- const inflight: Promise<void>[] = [];
+ const inflight = new Set<Promise<void>>();
@@
- inflight.push(handle(name));
+ const task = handle(name).finally(() => inflight.delete(task));
+ inflight.add(task);
@@
- await Promise.allSettled(inflight);
+ await Promise.allSettled([...inflight]);
Also applies to: 103-104, 110-110
Reviewer guide: interesting code
Start here, in order:
// excludes everything secret-bearing or sidecar-specific: no AGENTA_*, no *_API_KEY /
// *_TOKEN, no COMPOSIO_* / DAYTONA_*, no provider keys that the in-process Pi path writes
// into process.env before a run. Only the tool's declared scoped `env` is layered on top.
const BASE_ENV_ALLOWLIST = [
This allowlist is the isolation boundary for author-supplied code tools. The child gets only these vars plus the tool's scoped secrets, never process.env, so the provider keys the in-process Pi path writes there (OPENAI_API_KEY etc.) and AGENTA_* / COMPOSIO_* config stay out of the snippet. Anything added here leaks to every code tool; treat additions as a security change. The leak case in test/code-tool.test.ts guards it.
  spec: ResolvedToolSpec,
  params: unknown,
  opts: RunResolvedToolOpts,
): Promise<string> {
The dispatch seam. Every delivery path (in-process Pi, Pi-under-rivet, the MCP bridge) used to carry its own copy of this branch. Now it lives once. Note the default: a spec with no kind falls through to callback, so older resolvers that never set kind still route to /tools/call rather than getting dropped. The client branch throwing is intentional; browser-fulfilled tools must never run in-sandbox.
  inputSchema?: Record<string, unknown> | null;
  /** Set for `callback` (gateway) tools only; absent for `code` / `client`. */
  callRef?: string;
  kind?: "callback" | "code" | "client";
This single field is the executor axis the whole tools/ folder branches on. It is orthogonal to needsApproval and render, not a subclass tag. The Python wire (sdks/python/agenta/sdk/agents/utils/wire.py) mirrors these names and the golden fixtures pin the two, so renaming or reordering here without updating that side fails test_wire_contract.py.
const callbackSpecs = executable.filter((s) => (s.kind ?? "callback") === "callback");
const hasEndpoint = Boolean(callback?.endpoint);

if (callbackSpecs.length > 0 && !hasEndpoint) {
The careful case: callback tools present but no endpoint. Dropping the whole server here would silently lose any code tools in the same set (they need no endpoint). Instead it warns, names the callback tools whose calls will fail, and still attaches the server. Worth confirming this is the behavior you want over failing fast.
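To make the warn-but-attach behavior concrete, here is a sketch of the decision as a pure function. Everything in it is an assumption for illustration: `planMcpAttachSketch`, its return shape, and the rule that `client` tools are excluded from the executable set are not taken from `mcp-bridge.ts`, which only the two quoted lines above are known from.

```typescript
type AttachKindSketch = "callback" | "code" | "client";

interface AttachSpecSketch {
  name: string;
  kind?: AttachKindSketch;
}

interface AttachPlanSketch {
  attach: boolean;
  warnings: string[];
}

/** Decide whether to attach the MCP server, and what to warn about. */
function planMcpAttachSketch(specs: AttachSpecSketch[], endpoint?: string): AttachPlanSketch {
  // Assumption: client tools are browser-fulfilled and never served by the bridge.
  const executable = specs.filter((s) => s.kind !== "client");
  if (executable.length === 0) return { attach: false, warnings: [] };

  const callbackSpecs = executable.filter((s) => (s.kind ?? "callback") === "callback");
  const warnings: string[] = [];
  if (callbackSpecs.length > 0 && !endpoint) {
    // Warn and name the affected tools, but keep the server so code tools still work.
    const names = callbackSpecs.map((s) => s.name).join(", ");
    warnings.push(`no callback endpoint; calls to these tools will fail: ${names}`);
  }
  return { attach: true, warnings };
}
```

The fail-fast alternative would return `attach: false` (or throw) in the warning branch, at the cost of dropping the code tools in the same set.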
mmabrouk left a comment
Codex subagent review for #4773
Findings:
- Blocking security: `services/agent/src/tools/mcp-bridge.ts:88` serializes every executable `ResolvedToolSpec` into the long-lived bridge process environment as `AGENTA_TOOL_SPECS`, and `services/agent/src/tools/mcp-bridge.ts:96` also puts callback auth in env. That breaks the scoped-secret boundary documented in `services/agent/src/protocol.ts:49` and enforced only at the final `runCodeTool` child env in `services/agent/src/tools/code.ts:149`. For the MCP bridge path, all code-tool `spec.env` values exist in the parent bridge env before a selected tool is spawned; an author-supplied code tool that can run local code can inspect same-user process environments on common Linux setups (for example via `/proc/$PPID/environ`) or otherwise benefit from env/process inspection, recovering other tools' scoped secrets and callback auth. Please avoid carrying full specs with `env`/auth through harness or bridge environment variables, or redact secret-bearing fields from `AGENTA_TOOL_SPECS` and inject only the selected tool's scoped env over a narrower channel immediately before spawning the code subprocess. This should also have a regression test that one code tool cannot read bridge-level tool specs, callback auth, or another tool's scoped secret.
- Test/validation breakage: `services/agent/test/tool-dispatch.test.ts:20` imports `../src/engines/pi.ts`, and `services/agent/test/mcp-servers.test.ts:15` imports `../src/engines/rivet.ts`, but those files are not present on this PR head or on `main`. The commands advertised in those test files fail standalone with module-not-found, despite the PR body saying this branch is independent off `main`. Either keep these tests with the engine PR that introduces those modules, or make #4773's tests exercise only the contract/tool modules present here. Separately, `services/agent/src/protocol.ts:4` says the Python wire/golden contract is pinned by files that also are not present on this branch, so this PR currently does not provide the promised standalone TS/Python parity guard.
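One of the options named in the first finding, redacting secret-bearing fields before specs are serialized into the environment, might be sketched as follows. The types and the function `redactSpecsForEnvSketch` are hypothetical, and the sketch covers only the redaction step; delivering the selected tool's scoped env over a narrower channel is a separate problem it does not address.

```typescript
// Hypothetical spec shape: only the fields relevant to redaction.
interface BridgeSpecSketch {
  name: string;
  kind?: "callback" | "code" | "client";
  code?: string;
  env?: Record<string, string>;
}

// The redacted form keeps the env key names (useful for diagnostics) but no values.
type RedactedSpecSketch = Omit<BridgeSpecSketch, "env"> & { envKeys: string[] };

/** Strip scoped secret values from specs before they go into an env variable. */
function redactSpecsForEnvSketch(specs: BridgeSpecSketch[]): RedactedSpecSketch[] {
  return specs.map(({ env, ...rest }) => ({
    ...rest,
    envKeys: Object.keys(env ?? {}),
  }));
}
```

A regression test in the spirit of the finding would serialize the redacted specs and assert that no secret value appears in the resulting string.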
Context checked: I read the #4779 docs branch first; the docs describe the broader active stack, but #4773's stack nav still points at #4777 and the docs are ahead of code in places, so I treated them as context rather than source of truth for this branch.
Tests not run: review-only via the GitHub app; I did not check out or execute the branch.
Agent-workflows: functional PR set
Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off `main`; two pairs are stacked. This PR's base is `main`.

Context
The agent service drives an LLM harness (Pi in process, or Pi/Claude over ACP) for one turn. The harness needs two things from Agenta: a payload shape it can deserialize, and a way to run the tools the agent calls. This PR is the lower half of the TypeScript runner. It defines the `/run` wire contract and the tool executors, with nothing that drives a model yet. It branches off `main` and is independent; the runner-engine PR stacks on top of it and imports these executors. This is a functional slice that shows the final code, not the path the code took to get there.

What this changes
The PR adds a new `services/agent` TypeScript package with two parts.

`protocol.ts` is the `/run` wire contract shared by both engines: `AgentRunRequest`, `AgentRunResult`, `ResolvedToolSpec`, `HarnessCapabilities`, the `AgentEvent` union, the `StreamRecord` NDJSON line, and the helpers `resolvePromptText` and `resolveRunSessionId`. The Python side mirrors these names in `sdks/python/agenta/sdk/agents/utils/wire.py`, and shared golden fixtures pin the two together. Defining the types here, rather than in one engine that the other imports from, is what lets the Pi and rivet engines stay peers.

The `tools/` folder executes a tool the backend already resolved. Before, each delivery path carried its own copy of "look at the spec, decide how to run it." Now `dispatch.ts` owns that decision once and branches on `spec.kind`: a `code` tool runs in a sandbox subprocess, a `callback` tool (the default) POSTs back to Agenta's `/tools/call`, and a `client` tool throws because the browser fulfills it across a turn boundary. The call sites keep their own result-wrapping shape; only the execution itself is shared. `relay.ts` adds the Daytona file relay, used when the in-sandbox process cannot reach a private Agenta backend. `mcp-bridge.ts` and `mcp-server.ts` expose the same resolved specs as a stdio MCP server for harnesses driven over ACP, which only accept tools through MCP.

Key architectural decision to review
Two decisions carry the weight.

The dispatch-by-kind seam in `tools/dispatch.ts`. `runResolvedTool` is the single place that turns a `ResolvedToolSpec` into a result, and the three executor kinds are orthogonal axes on the spec, not subclasses. The tradeoff: every caller funnels through this one function, so an executor bug surfaces everywhere at once, but a change to how a kind runs is a one-line edit instead of three. Check that the default (absent `kind`) stays `callback` for back-compat, and that the `client` branch throwing is what the call sites expect.

The subprocess env in `tools/code.ts`. A code tool runs author-supplied snippet code, so `buildChildEnv` is the security boundary. The child inherits only `BASE_ENV_ALLOWLIST` (PATH, HOME, locale, temp) plus the tool's own scoped secrets. It does not inherit `process.env`, so the provider keys the in-process Pi path writes there (`OPENAI_API_KEY` and friends), `AGENTA_*` config, and `COMPOSIO_*` / `DAYTONA_*` never reach the snippet. Scrutinize the allowlist itself: anything added to it leaks to every code tool. The leak test in `test/code-tool.test.ts` is the guard.

How to review this PR
- `src/protocol.ts` first. It is the contract everything else speaks. Read the doc comments on `AgentRunRequest` and `ResolvedToolSpec`; they explain each field's owner and lifetime.
- `tools/dispatch.ts` for the seam, `tools/code.ts` for the sandbox env, `tools/callback.ts` for the `/tools/call` envelope.
- `mcp-bridge.ts` (which specs justify attaching the server) and `mcp-server.ts` (the JSON-RPC stdio loop).
- `pnpm-lock.yaml` (3712 of the lines, generated) and the `tsconfig` / `.dockerignore` scaffolding.

Likely regression: the env allowlist. A future hand that adds a variable here to "fix" a code tool would punch a hole in the isolation boundary. The other watch point is the `callback` default: a spec that arrives with no `kind` must still route to `/tools/call`, not get dropped.

Tests / notes
`test/` holds four assertion scripts run with tsx, no test framework. `code-tool.test.ts` exercises both runtimes through real subprocesses, including the node bare-function `main` case and the env-isolation guarantee (provider key absent, scoped secret present). `tool-dispatch.test.ts` covers the kind branching, `tool-bridge.test.ts` the MCP round-trip, and `mcp-servers.test.ts` the attach decision. The engines that consume `runResolvedTool` and `buildToolMcpServers` land in the stacked runner-engine PR, so the wiring is exercised there.