feat(agent): runner engines, server, and tracing by mmabrouk · Pull Request #4774 · Agenta-AI/agenta

mmabrouk · 2026-06-19T16:29:03Z

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is #4773 (review that first).

feat(sdk): agent runtime behind backend/harness ports #4771: SDK agent runtime: ports, adapters, tools, messages protocol
- feat(agent): agent workflow service and tool-resolution API #4772: Agent service + tool-resolution API
feat(agent): runner wire contract and tool execution #4773: Runner wire contract + tool execution
- feat(agent): runner engines, server, and tracing #4774: Runner engines, server, tracing <- you are here
feat(frontend): agent config playground controls #4775: Playground agent config UI
chore(hosting): wire the agent runner sidecar into compose #4776: Hosting compose wiring
docs(agent): agent-workflows design and ground truth #4777: Docs: design + ground truth

Context

The runner is the TypeScript process that actually drives a coding harness for one agent turn. The Python service stays thin: it resolves config, secrets, and tools, then hands a neutral AgentRunRequest to the runner and reads back an AgentRunResult. This PR adds the runner's engines, its two transports, its tracing, and the harness assets it ships with.

This is a functional slice that shows the final code. It stacks on feat/agent-runner-tools (the tool resolution + dispatch this engine layer calls into). Review that PR first.

What this changes

Two engines now sit behind one /run contract. engines/pi.ts drives the Pi SDK in process: it builds an in-memory session, injects AGENTS.md and forced skills through the resource loader, registers the resolved tools as Pi customTools, and self-instruments through the OTel extension. engines/rivet.ts drives a harness (pi or claude) over the Agent Client Protocol through a rivet sandbox-agent daemon, on a local host or in a Daytona sandbox. The harness and the sandbox are two independent axes that swap as config, not new code.

Two transports expose the same contract. server.ts is the dockerized HTTP sidecar: GET /health and POST /run, with NDJSON streaming selected by Accept: application/x-ndjson. cli.ts is the one-shot stdio path: one JSON request on stdin, one JSON result on stdout, stderr for logs.

Before, the rivet path answered a harness permission gate inline with a hardcoded auto-approve. Now responder.ts lifts that decision behind a Responder interface. PolicyResponder reproduces the old behavior exactly (auto-allow, or deny per policy/env), and a cross-turn HITL responder slots in later without touching the harness adapter.

Key architectural decision to review

The split-tracing model in tracing/otel.ts is the decision to scrutinize. Both engines must produce one uniform span tree (invoke_agent > turn > chat/execute_tool) nested under the caller's /invoke span, but they get their signal from two different places.

Local Pi self-instruments. The runner propagates the W3C traceparent into the Pi process through extensions/agenta.ts, and Pi's own pi.on(...) lifecycle hooks emit the real spans (createAgentaOtel). The rivet path cannot see those hooks because the harness is a separate process, so createRivetOtel rebuilds the same tree from ACP session/update events (agent_message_chunk, tool_call, usage_update). The emitSpans flag is what arbitrates: it is off for local Pi so the two paths never double up, and on for any other harness or for Daytona Pi (where the in-sandbox process cannot reach Agenta's OTLP, so the runner traces from the stream instead).

The load-bearing consequence is TraceBatchProcessor. Agenta rolls up token and cost metrics per ingest batch, and the harness span tree exports in a separate OTLP batch from the workflow span. So the per-batch roll-up cannot bridge them. The fix is to stamp the run-total usage directly on the invoke_agent span and to flush each trace as one batch by trace id (the root has a remote parent that never ends in this process, so root-end never fires). If you change how usage is summed or how batches are flushed, spans drop silently rather than erroring. That is the regression to guard.

The second decision worth a look is the cold continuation model in rivet.ts. Each invoke is a fresh sandbox, so prior turns replay as transcript text, and resolved tool turns are encoded as text (messageTranscript) because ACP prompt content blocks cannot carry tool calls or results. This is the substrate the cross-turn HITL path resumes from.

How to review this PR

Read engines/rivet.ts first. It is the largest file and it ties the rest together: sandbox selection, the daemon and extension env, the ACP session, the permission seam, the tracer wiring, and the usage readback. Check the runRivet body from line 674 down, especially the emitSpans: !isPi || isDaytona choice and the finally that always disposes the sandbox.

Then read engines/pi.ts for the in-process counterpart, tracing/otel.ts for the two tracers (the TraceBatchProcessor and the createRivetOtel delta state machine are the parts that earn their tests), responder.ts for the permission seam, and docker/README.md for the licensing posture.

Skip the line-by-line ACP marshalling helpers (acpBlockText, stripStartupBanner, model id matching). They are mechanical and covered by the stream-events test. The likely regression is in the tracing batch and usage path, not in the transports.

Tests / notes

test/stream-events.test.ts drives createRivetOtel with a hand-built ACP sequence and asserts the streaming delta lifecycle and the one-shot coalesced shape are both correct and never double-emit. test/responder.test.ts pins the responder parity with the old inline auto-approve and the emitEvent choke point. test/continuation.test.ts covers transcript replay of resolved tool turns. These run offline with no harness and no network.

Live harness runs (Pi local, Pi/Claude over rivet, Daytona) are verified by hand, not in CI. The Daytona snapshot builder and the OAuth upload paths need a live account to exercise. The licensing posture in docker/README.md is the rule to keep: bake Pi (MIT), never bake or distribute Claude, ship the snapshot builder rather than a prebuilt snapshot.

vercel · 2026-06-19T16:29:09Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 19, 2026 5:49pm

coderabbitai · 2026-06-19T16:29:11Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 316c8e1a-2fdd-4a6b-9e12-fca5aac0563a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/agent-runner-engine

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-19T17:53:04Z

Railway Preview Environment


Preview URL	https://gateway-production-aad1.up.railway.app/w
Project	`agenta-oss-pr-4774`
Image tag	`pr-4774-0340527`
Status	Deployed
Railway logs	Open logs
Workflow logs	View workflow run
Updated at 2026-06-19T17:53:03.664Z

mmabrouk · 2026-06-19T17:54:53Z

Reviewer guide: interesting code

The most informative places to read, in order:

services/agent/src/engines/rivet.ts:836 — emitSpans: !isPi || isDaytona. This one expression decides the whole tracing strategy: local Pi self-instruments (off), every other harness and Daytona Pi traces from the ACP stream (on). Get this wrong and you either lose the span tree or double it.
services/agent/src/tracing/otel.ts:114 — TraceBatchProcessor. Agenta rolls up usage per ingest batch. The harness span tree and the workflow span land in separate batches, so the trace is flushed as one batch by trace id and the run-total usage is stamped on the agent span directly. Change either and spans drop silently.
services/agent/src/tracing/otel.ts:711 — createRivetOtel. Rebuilds the same span tree from ACP session/update events plus a delta/lifecycle state machine for streaming. This is the part the stream-events test pins.
services/agent/src/responder.ts:39 — the Responder seam. PolicyResponder reproduces the old inline auto-approve exactly; a cross-turn HITL responder slots in here without touching the harness adapter.
services/agent/src/engines/rivet.ts:283 — messageTranscript / buildTurnText. Cold continuation: prior turns and resolved tool turns replay as text because ACP prompt blocks cannot carry tool calls or results.
services/agent/docker/README.md:14 — the licensing rule that shapes every image: bake Pi (MIT), never bake or distribute Claude, ship the snapshot builder, never bake a credential.

mmabrouk · 2026-06-19T17:55:08Z

+      endpoint: request.trace?.endpoint,
+      authorization: request.trace?.authorization,
+      captureContent: request.trace?.captureContent,
+      emitSpans: !isPi || isDaytona,


This is the tracing fork. emitSpans is off for local Pi (it self-instruments through extensions/agenta.ts under the propagated traceparent) and on otherwise. On Daytona Pi self-instruments for tools and usage but cannot reach Agenta's OTLP, so the runner traces from the ACP stream instead — hence || isDaytona. The two paths must never both be on for the same run or the span tree doubles.

mmabrouk · 2026-06-19T17:55:16Z

+    spans.push(span);
+    this.buffers.set(traceId, spans);
+    // No parent in this process => this is the local root and the trace is done.
+    if (!span.parentSpanId) {


The load-bearing reason this processor exists. The agent span tree and the workflow span export in separate OTLP batches, and Agenta computes cumulative usage per batch, so the per-batch roll-up cannot bridge them. A trace whose root has a remote parent never fires root-end here, so it is flushed explicitly by trace id (flushTrace), and the run-total usage is stamped on invoke_agent directly (agent_end). If batching or the usage sum changes, spans drop with no error.

mmabrouk · 2026-06-19T17:55:19Z

+}
+
+/** Headless responder: a fixed policy, no human in the loop. */
+export class PolicyResponder implements Responder {


This is the seam that replaces the old hardcoded inline auto-approve in rivet.ts. PolicyResponder reproduces the previous behavior exactly (auto-allow trusted backend tools, or deny per permissionPolicy / AGENTA_RIVET_DENY_PERMISSIONS), and responder.test.ts pins that parity. The point of the interface is that a cross-turn HITL responder — surface the gate to the browser, end the turn, resolve on the next turn's reply — slots in here without the harness adapter changing.

mmabrouk · 2026-06-19T17:55:22Z

+// uncaught exception that kills the whole process — taking every in-flight request with
+// it (the caller sees "Server disconnected"). Log and keep serving instead; the failing
+// run still returns its own error to its caller.
+process.on("unhandledRejection", (reason) => {


Worth keeping. The rivet SDK can reject a background promise (an adapter install, a Daytona preview SSE drop) outside any awaited path. Node's default turns that into an uncaught exception that kills the process and every in-flight request with it (the caller sees 'Server disconnected'). Logging and continuing keeps the sidecar serving; the failing run still returns its own error to its own caller.

mmabrouk

Codex subagent review for #4774

I found one blocking security issue. I attempted to submit this as REQUEST_CHANGES, but GitHub rejected that because the authenticated account owns the PR, so this is posted as a review comment instead. I read the #4779 design files as context only, with the coordination caveat that those docs are not code on #4779's own main-based SHA and pr-stack.md may be stale/proposed. I also checked #4778: the reviewed key files are code-identical to #4774, and #4778 appears to supersede this runner-engine PR organizationally, so these findings apply there too.

Findings:

Blocking security: Pi-over-rivet leaks scoped tool secrets and callback auth into the harness process env (services/agent/src/engines/rivet.ts:164, services/agent/src/extensions/agenta.ts:47). buildPiExtensionEnv serializes the full request.customTools array into AGENTA_TOOL_SPECS and also exports AGENTA_TOOL_CALLBACK_AUTH. Per #4773's wire/tool contract, a code tool's env contains resolved secrets that are supposed to be scoped to that code-tool subprocess only. On this path they become environment variables for the daemon/adapter/Pi harness process so any harness shell/read-env capability, or any prompt-influenced code running in that process, can read every code-tool secret plus the tool callback authorization. That bypasses the buildChildEnv isolation boundary from #4773. Please scrub secret-bearing fields from the harness-visible specs and move tool execution credentials behind a runner-owned side channel/relay, or otherwise keep them out of the model/harness process environment.
Local Pi ACP runs drop tool and usage events when emitSpans is false (services/agent/src/engines/rivet.ts:836, services/agent/src/tracing/otel.ts:873). The split-tracing switch disables synthetic spans for local Pi, but handleUpdate returns before processing tool_call, tool_call_update, and usage_update. That means local Pi-over-rivet still accumulates assistant text, but its structured event log/live stream loses tool call/results and usage events even if ACP emits them. Span emission and result-event emission need to be separated here; a regression test should drive createRivetOtel({ emitSpans: false }) with tool/usage updates.
ACP tracing exports before the final usage split is available and does not stamp usage on invoke_agent (services/agent/src/engines/rivet.ts:887, services/agent/src/tracing/otel.ts:966). runRivet calls run.finish() and run.flush() before it reads PromptResponse.usage, so the input/output split that is returned in AgentRunResult.usage never reaches the exported spans. In createRivetOtel.finish, only usage.total/usage.cost from the stream are put on the LLM span, and the invoke_agent span only gets output text before ending. That does not match the PR's stated load-bearing behavior of stamping run-total usage on invoke_agent, and it risks trace rollups missing prompt/completion usage for non-Pi harnesses. Consider computing final usage before flush and setting the run totals on the agent span, with a test that asserts the exported span attributes.

I did not run the test suite or live harnesses; this was a GitHub-only review. I did not find a Docker licensing blocker in the reviewed Dockerfile/README, aside from the #4778 supersession note above.

coderabbitai

Actionable comments posted: 12

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 99d58672-9613-45ca-b2b5-0555d6aab0db

📥 Commits

Reviewing files that changed from the base of the PR and between 2f467ce and b9a2e2e.

📒 Files selected for processing (18)

services/agent/README.md
services/agent/config/AGENTS.md
services/agent/config/agent.json
services/agent/docker/Dockerfile
services/agent/docker/Dockerfile.dev
services/agent/docker/README.md
services/agent/scripts/build-extension.mjs
services/agent/skills/agenta-getting-started/SKILL.md
services/agent/src/cli.ts
services/agent/src/engines/pi.ts
services/agent/src/engines/rivet.ts
services/agent/src/extensions/agenta.ts
services/agent/src/responder.ts
services/agent/src/server.ts
services/agent/src/tracing/otel.ts
services/agent/test/continuation.test.ts
services/agent/test/responder.test.ts
services/agent/test/stream-events.test.ts

coderabbitai · 2026-06-19T18:27:52Z

+FROM node:24-slim
+
+WORKDIR /app
+
+# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude
+# Code) over HTTPS using the system trust store, which node:*-slim omits — without this
+# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers
+# fetch git deps.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN corepack enable
+
+# Install deps as a cached layer (manifest + lockfile only). The full dependency set is
+# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,
+# both devDependencies.
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+
+# Bake the source (no bind mount in production).
+COPY tsconfig.json ./
+COPY scripts ./scripts
+COPY src ./src
+COPY config ./config
+COPY skills ./skills
+
+# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs
+# this baked copy into Pi's agent dir on every run. Rebuild the image after editing
+# src/extensions/agenta.ts or the tracer.
+RUN pnpm run build:extension
+
+ENV NODE_ENV=production \
+    PORT=8765
+
+EXPOSE 8765
+
+# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
+# container runs as a non-root host uid.
+CMD ["node_modules/.bin/tsx", "src/server.ts"]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the production container as a non-root user.

Line 16-55 currently runs the sidecar as root (no USER set), which weakens container isolation for a network-exposed service process.

Suggested fix

FROM node:24-slim WORKDIR /app @@ RUN pnpm run build:extension ENV NODE_ENV=production \ PORT=8765 +RUN groupadd --system app && useradd --system --gid app --create-home app \ + && chown -R app:app /app +USER app + EXPOSE 8765 @@ CMD ["node_modules/.bin/tsx", "src/server.ts"]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

FROM node:24-slim

WORKDIR /app

# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude

# Code) over HTTPS using the system trust store, which node:*-slim omits — without this

# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers

# fetch git deps.

RUN apt-get update \

&& apt-get install -y --no-install-recommends ca-certificates git \

&& rm -rf /var/lib/apt/lists/*

RUN corepack enable

# Install deps as a cached layer (manifest + lockfile only). The full dependency set is

# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,

# both devDependencies.

COPY package.json pnpm-lock.yaml ./

RUN pnpm install --frozen-lockfile

# Bake the source (no bind mount in production).

COPY tsconfig.json ./

COPY scripts ./scripts

COPY src ./src

COPY config ./config

COPY skills ./skills

# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs

# this baked copy into Pi's agent dir on every run. Rebuild the image after editing

# src/extensions/agenta.ts or the tracer.

RUN pnpm run build:extension

ENV NODE_ENV=production \

PORT=8765

EXPOSE 8765

# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the

# container runs as a non-root host uid.

CMD ["node_modules/.bin/tsx", "src/server.ts"]

FROM node:24-slim

WORKDIR /app

# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude

# Code) over HTTPS using the system trust store, which node:*-slim omits — without this

# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers

# fetch git deps.

RUN apt-get update \

&& apt-get install -y --no-install-recommends ca-certificates git \

&& rm -rf /var/lib/apt/lists/*

RUN corepack enable

# Install deps as a cached layer (manifest + lockfile only). The full dependency set is

# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,

# both devDependencies.

COPY package.json pnpm-lock.yaml ./

RUN pnpm install --frozen-lockfile

# Bake the source (no bind mount in production).

COPY tsconfig.json ./

COPY scripts ./scripts

COPY src ./src

COPY config ./config

COPY skills ./skills

# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs

# this baked copy into Pi's agent dir on every run. Rebuild the image after editing

# src/extensions/agenta.ts or the tracer.

RUN pnpm run build:extension

ENV NODE_ENV=production \

PORT=8765

RUN groupadd --system app && useradd --system --gid app --create-home app \

&& chown -R app:app /app

USER app

EXPOSE 8765

# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the

# container runs as a non-root host uid.

CMD ["node_modules/.bin/tsx", "src/server.ts"]

Source: Linters/SAST tools

coderabbitai · 2026-06-19T18:27:52Z

+FROM node:24-slim
+
+WORKDIR /app
+
+# CA certificates: the rivet daemon (Rust) downloads harness CLIs (e.g. Claude Code) over
+# HTTPS using the system trust store, which node:*-slim omits — without this the daemon's
+# `install-agent claude` fails TLS verification. git lets npm/installers fetch git deps.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN corepack enable
+
+# Install deps as a cached layer (manifest + lockfile only).
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+
+# Fallback copy for non-mounted runs; in dev these are bind-mounted over.
+COPY tsconfig.json ./
+COPY scripts ./scripts
+COPY src ./src
+
+# Bundle the Agenta Pi extension (tracing + tools) into dist/. dist/ is NOT bind-mounted
+# in dev, so this baked copy is what runRivet installs into Pi's agent dir. Rebuild the
+# image after editing src/piExtension.ts or src/agenta-otel.ts.
+RUN pnpm run build:extension
+
+ENV NODE_ENV=development \
+    PORT=8765
+
+EXPOSE 8765
+
+# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
+# container runs as a non-root host uid.
+CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a non-root runtime user in the dev image too.

Line 7-41 leaves the process running as root; switching to a non-root user keeps dev closer to prod hardening and avoids unnecessary privilege.

Suggested fix

FROM node:24-slim @@ RUN pnpm run build:extension ENV NODE_ENV=development \ PORT=8765 +RUN groupadd --system app && useradd --system --gid app --create-home app \ + && chown -R app:app /app +USER app + EXPOSE 8765 @@ CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]

Source: Linters/SAST tools

coderabbitai · 2026-06-19T18:27:52Z

+  const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();
+  return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

CLI backend routing does not match the /run transport contract.

Line 24-25 routes everything except "rivet" to Pi. Requests with backend: "auto" (or only harness/sandbox) won’t follow the same engine selection behavior as server.ts.

💡 Suggested fix

function runAgent( request: AgentRunRequest, emit?: EmitEvent, ): Promise<AgentRunResult> { - const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase(); - return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit); + const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase(); + if (backend === "rivet") return runRivet(request, emit); + if (backend === "pi") return runPi(request, emit); + return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();

return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);

}

const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase();

if (backend === "rivet") return runRivet(request, emit);

if (backend === "pi") return runPi(request, emit);

return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit);

}

coderabbitai · 2026-06-19T18:27:52Z

+    process.exit(1);
+  }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n services/agent/src/cli.ts | head -100

Repository: Agenta-AI/agenta

Length of output: 3675

🌐 Web query:

Node.js process.exit stdout buffer flush truncation issue

💡 Result:

In Node.js, calling process.exit frequently leads to the truncation of data written to process.stdout and process.stderr [1][2][3]. This occurs because writes to these streams are often asynchronous and can span multiple ticks of the Node.js event loop [1][2]. When process.exit is invoked, it forces the process to terminate synchronously and immediately, effectively abandoning any pending asynchronous I/O operations before they have a chance to complete and flush their buffers [1][2][4]. This behavior is well-documented as a common pitfall [1][2]. Because the termination is abrupt, the Node.js runtime does not guarantee that data already passed to stdout or stderr will be fully written to the underlying operating system file descriptors [5][6]. To avoid this issue, the recommended practice is to avoid calling process.exit explicitly whenever possible [1][2]. Instead, developers should use the following strategies: 1. Allow the process to exit naturally: By ensuring no additional work is scheduled in the event loop, the Node.js process will exit gracefully on its own after all pending I/O operations are completed [1][2][7]. 2. Use process.exitCode: If you need to specify a particular exit status, set the process.exitCode property to the desired integer [1][2]. This informs Node.js of the correct code to use when the process terminates naturally, without forcing an immediate shutdown [2][4][7]. 3. Handle error conditions gracefully: If the process must terminate due to an error, it is considered safer to throw an uncaught exception (or handle the error appropriately) rather than calling process.exit [1][2]. While some workarounds involving setting stream handles to blocking mode (e.g., process.stdout._handle.setBlocking(true)) have been discussed in the community, they are generally discouraged as they rely on internal, unsupported APIs and may not be cross-platform or stable across different Node.js versions [8]. The most robust approach remains allowing the natural conclusion of the event loop [2][7].

Citations:

1: https://beta.docs.nodejs.org/process.html

2: https://nodejs.org/api/process.html

3: https://nodejs.org/docs/latest-v26.x/api/process.html

4: https://stackoverflow.com/questions/59260733/is-there-any-difference-between-process-exitcode-and-process-exit-in-node

5: Stdout is not completely flushed on process exit nodejs/node#2972

6: process: flush stdout/stderr upon process.exit() nodejs/node#6773

7: https://www.echooff.dev/blog/prefer-process-exitcode-over-process-exit-in-nodejs

8: stdio buffered writes (chunked) issues & process.exit() truncation nodejs/node#6456

Replace process.exit() with process.exitCode to prevent stdout truncation.

Lines 58, 65, 71, and 85 call process.exit() immediately after writing JSON/NDJSON to stdout. This forces abrupt process termination before buffered output fully flushes, causing data loss when stdout is piped—a common scenario in production systems.

Use process.exitCode = statusCode; followed by return; instead to allow the process to exit naturally after all buffered I/O completes:

Suggested fix

- process.exit(1); + process.exitCode = 1; + return; @@ - process.exit(result.ok ? 0 : 1); + process.exitCode = result.ok ? 0 : 1; + return; @@ - process.exit(1); + process.exitCode = 1; + return; @@ - process.exit(result.ok ? 0 : 1); + process.exitCode = result.ok ? 0 : 1;

coderabbitai · 2026-06-19T18:27:52Z

+function applySecrets(secrets: Record<string, string> | undefined): void {
+  for (const [key, value] of Object.entries(secrets ?? {})) {
+    if (value) process.env[key] = value;
+  }
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore request secrets after each run instead of leaving them in process.env.

Line 99 and Line 213 mutate global process env and never revert it. In a long-lived runner, one request’s provider keys can leak into subsequent runs.

Proposed fix

-function applySecrets(secrets: Record<string, string> | undefined): void { +function applySecrets(secrets: Record<string, string> | undefined): () => void { + const previous = new Map<string, string | undefined>(); for (const [key, value] of Object.entries(secrets ?? {})) { - if (value) process.env[key] = value; + if (!value) continue; + if (!previous.has(key)) previous.set(key, process.env[key]); + process.env[key] = value; } + return () => { + for (const [key, oldValue] of previous) { + if (oldValue === undefined) delete process.env[key]; + else process.env[key] = oldValue; + } + }; } @@ - applySecrets(request.secrets); + const restoreSecrets = applySecrets(request.secrets); @@ } finally { + restoreSecrets(); try { rmSync(cwd, { recursive: true, force: true });

Also applies to: 213-214, 394-400

coderabbitai · 2026-06-19T18:27:52Z

+  let specs: ResolvedToolSpec[] = [];
+  try {
+    specs = JSON.parse(raw);
+  } catch (err) {
+    log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate AGENTA_TOOL_SPECS shape before iterating.

If AGENTA_TOOL_SPECS parses to an object/string instead of an array, the loop at Line 59 throws and the extension can fail the run.

Proposed fix

let specs: ResolvedToolSpec[] = []; try { - specs = JSON.parse(raw); + const parsed = JSON.parse(raw); + if (!Array.isArray(parsed)) { + log("bad AGENTA_TOOL_SPECS: expected a JSON array"); + return; + } + specs = parsed as ResolvedToolSpec[]; } catch (err) { log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`); return; }

Also applies to: 59-60

coderabbitai · 2026-06-19T18:27:52Z

+  if (backend === "pi") return runPi(request, emit);
+  return request.harness || request.sandbox
+    ? runRivet(request, emit, signal)
+    : runPi(request, emit);


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Streaming disconnect cancellation is not effective for Pi runs.

Line 39 and Line 42 call runPi without an abort signal, so Line 68 abort only cancels Rivet runs. A disconnected NDJSON client can still leave Pi work running unobserved.

Also applies to: 62-68, 78-79

coderabbitai · 2026-06-19T18:27:52Z

+async function readBody(req: IncomingMessage): Promise<string> {
+  const chunks: Buffer[] = [];
+  for await (const chunk of req) {
+    chunks.push(chunk as Buffer);
+  }
+  return Buffer.concat(chunks).toString("utf8");
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a request-size limit before buffering the entire body.

Line 97-103 reads the full body into memory with no cap. A large request can exhaust memory and take down the sidecar.

💡 Suggested fix

+const MAX_BODY_BYTES = Number(process.env.AGENT_MAX_REQUEST_BYTES ?? 1_048_576); async function readBody(req: IncomingMessage): Promise<string> { const chunks: Buffer[] = []; + let total = 0; for await (const chunk of req) { - chunks.push(chunk as Buffer); + const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk); + total += buf.length; + if (total > MAX_BODY_BYTES) { + throw new Error("REQUEST_TOO_LARGE"); + } + chunks.push(buf); } return Buffer.concat(chunks).toString("utf8"); } @@ if (req.method === "POST" && req.url === "/run") { - const raw = await readBody(req); + let raw: string; + try { + raw = await readBody(req); + } catch (err) { + if (err instanceof Error && err.message === "REQUEST_TOO_LARGE") { + return send(res, 413, { ok: false, error: "Request body too large" }); + } + throw err; + }

Also applies to: 111-113, 133-136

coderabbitai · 2026-06-19T18:27:52Z

+    if (!emitSpans) return; // output accumulated above; spans come from the harness
+
+    if (kind === "tool_call") {


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

emitSpans=false currently drops non-text Rivet events (tool/usage).

Line 873 short-circuits before tool_call, tool_call_update, and usage_update. Also, maybeCloseTool only emits tool_result when a span entry exists, which never happens when spans are disabled. In local Pi-through-Rivet mode, this yields incomplete events()/stream output.

Also applies to: 891-915, 919-931

coderabbitai · 2026-06-19T18:27:52Z

+  delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+  assert.equal(policyFromRequest(undefined), "auto");
+  assert.equal(policyFromRequest("auto"), "auto");
+  assert.equal(policyFromRequest("deny"), "deny");
+
+  process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
+  assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
+  assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
+  delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore the previous env value to keep tests process-safe.

Line 23-31 mutates AGENTA_RIVET_DENY_PERMISSIONS but does not restore a pre-existing value.

💡 Suggested fix

{ - delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; - assert.equal(policyFromRequest(undefined), "auto"); - assert.equal(policyFromRequest("auto"), "auto"); - assert.equal(policyFromRequest("deny"), "deny"); - - process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; - assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); - assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); - delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; + const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS; + try { + delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; + assert.equal(policyFromRequest(undefined), "auto"); + assert.equal(policyFromRequest("auto"), "auto"); + assert.equal(policyFromRequest("deny"), "deny"); + + process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; + assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); + assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); + } finally { + if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; + else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev; + } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;

assert.equal(policyFromRequest(undefined), "auto");

assert.equal(policyFromRequest("auto"), "auto");

assert.equal(policyFromRequest("deny"), "deny");

process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";

assert.equal(policyFromRequest(undefined), "deny", "env forces deny");

assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");

delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;

const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS;

try {

delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;

assert.equal(policyFromRequest(undefined), "auto");

assert.equal(policyFromRequest("auto"), "auto");

assert.equal(policyFromRequest("deny"), "deny");

process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";

assert.equal(policyFromRequest(undefined), "deny", "env forces deny");

assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");

} finally {

if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;

else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev;

}

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 19, 2026

dosubot Bot added feature typescript labels Jun 19, 2026

vercel Bot deployed to Preview June 19, 2026 16:29 View deployment

feat(agent): runner engines, server, and tracing

b9a2e2e

mmabrouk force-pushed the feat/agent-runner-engine branch from 72644cc to b9a2e2e Compare June 19, 2026 17:47

vercel Bot deployed to Preview June 19, 2026 17:49 View deployment

mmabrouk mentioned this pull request Jun 19, 2026

docs(agent): agent-workflows design and ground truth #4777

Open

mmabrouk commented Jun 19, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

		if (!emitSpans) return; // output accumulated above; spans come from the harness

		if (kind === "tool_call") {

Conversation

mmabrouk commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Agent-workflows: functional PR set

Context

What this changes

Key architectural decision to review

How to review this PR

Tests / notes

Uh oh!

vercel Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions Bot commented Jun 19, 2026

Railway Preview Environment

Uh oh!

mmabrouk commented Jun 19, 2026

Reviewer guide: interesting code

Uh oh!

mmabrouk Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mmabrouk commented Jun 19, 2026 •

edited

Loading

vercel Bot commented Jun 19, 2026 •

edited

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading