feat(agent): runner engines, server, and tracing #4774
72644cc to
b9a2e2e
Compare
Reviewer guide: interesting code
The most informative places to read, in order:
| endpoint: request.trace?.endpoint, | ||
| authorization: request.trace?.authorization, | ||
| captureContent: request.trace?.captureContent, | ||
| emitSpans: !isPi || isDaytona, |
This is the tracing fork. emitSpans is off for local Pi (it self-instruments through extensions/agenta.ts under the propagated traceparent) and on otherwise. On Daytona Pi self-instruments for tools and usage but cannot reach Agenta's OTLP, so the runner traces from the ACP stream instead — hence || isDaytona. The two paths must never both be on for the same run or the span tree doubles.
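The rule in this comment reduces to a small pure function. A minimal sketch, assuming hypothetical `Harness` and `Sandbox` unions (the real request fields in rivet.ts may be shaped differently):

```typescript
// Hypothetical unions for illustration; the real request types may differ.
type Harness = "pi" | "claude";
type Sandbox = "local" | "daytona";

// Decide whether the runner should synthesize spans from the ACP stream.
// Local Pi self-instruments, so the runner stays silent to avoid a doubled tree.
// Daytona Pi cannot reach Agenta's OTLP, so the runner traces for it.
function shouldEmitSpans(harness: Harness, sandbox: Sandbox): boolean {
  const isPi = harness === "pi";
  const isDaytona = sandbox === "daytona";
  return !isPi || isDaytona;
}
```

Keeping the decision in one place like this would make the "never both on" invariant easy to pin with a table-driven test.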
| spans.push(span); | ||
| this.buffers.set(traceId, spans); | ||
| // No parent in this process => this is the local root and the trace is done. | ||
| if (!span.parentSpanId) { |
The load-bearing reason this processor exists. The agent span tree and the workflow span export in separate OTLP batches, and Agenta computes cumulative usage per batch, so the per-batch roll-up cannot bridge them. A trace whose root has a remote parent never fires root-end here, so it is flushed explicitly by trace id (flushTrace), and the run-total usage is stamped on invoke_agent directly (agent_end). If batching or the usage sum changes, spans drop with no error.
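The buffering behavior described here can be sketched without any OTel dependency. This is an illustration only: `SpanLike`, `TraceBuffer`, and `ExportBatch` are hypothetical names, and the real processor works on OTel span objects.

```typescript
// Minimal span shape for illustration; not the OTel ReadableSpan type.
interface SpanLike {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
}

type ExportBatch = (spans: SpanLike[]) => void;

class TraceBuffer {
  private buffers = new Map<string, SpanLike[]>();
  private exportBatch: ExportBatch;

  constructor(exportBatch: ExportBatch) {
    this.exportBatch = exportBatch;
  }

  // Called when a span ends: buffer it, then flush if it is the local root.
  onEnd(span: SpanLike): void {
    const spans = this.buffers.get(span.traceId) ?? [];
    spans.push(span);
    this.buffers.set(span.traceId, spans);
    // No parent in this process means the trace is complete here.
    if (!span.parentSpanId) this.flushTrace(span.traceId);
  }

  // Explicit flush for traces whose root has a remote parent and so never
  // triggers the root-end path above.
  flushTrace(traceId: string): void {
    const spans = this.buffers.get(traceId);
    if (!spans || spans.length === 0) return;
    this.buffers.delete(traceId);
    this.exportBatch(spans); // one export call per trace, so one ingest batch
  }
}
```

The point of the shape is that every span of a trace leaves in a single export call, so a per-batch roll-up sees the whole tree.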
| } | ||
|
|
||
| /** Headless responder: a fixed policy, no human in the loop. */ | ||
| export class PolicyResponder implements Responder { |
This is the seam that replaces the old hardcoded inline auto-approve in rivet.ts. PolicyResponder reproduces the previous behavior exactly (auto-allow trusted backend tools, or deny per permissionPolicy / AGENTA_RIVET_DENY_PERMISSIONS), and responder.test.ts pins that parity. The point of the interface is that a cross-turn HITL responder — surface the gate to the browser, end the turn, resolve on the next turn's reply — slots in here without the harness adapter changing.
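For readers who have not opened responder.ts, the seam looks roughly like this. A simplified sketch, not the file's contents: `PermissionGate`, `Decision`, and `resolvePolicy` are hypothetical, the env map is passed in explicitly to keep the example pure, and the trusted-backend-tool check is omitted.

```typescript
type PermissionPolicy = "auto" | "deny";
type Decision = "allow" | "deny";

// Hypothetical gate shape; the real ACP permission request carries more fields.
interface PermissionGate {
  toolName: string;
}

interface Responder {
  respond(gate: PermissionGate): Promise<Decision>;
}

// Resolve the effective policy. The env flag wins over the request value,
// which mirrors the assertions in responder.test.ts. The exact "true"
// comparison is an assumption.
function resolvePolicy(
  requested: PermissionPolicy | undefined,
  env: Record<string, string | undefined>,
): PermissionPolicy {
  if (env["AGENTA_RIVET_DENY_PERMISSIONS"] === "true") return "deny";
  return requested ?? "auto";
}

// Headless responder: a fixed policy, no human in the loop.
class PolicyResponder implements Responder {
  private policy: PermissionPolicy;

  constructor(policy: PermissionPolicy) {
    this.policy = policy;
  }

  async respond(_gate: PermissionGate): Promise<Decision> {
    return this.policy === "deny" ? "deny" : "allow";
  }
}
```

A cross-turn HITL responder would implement the same `respond` method but resolve its promise from the next turn's reply, which is why the harness adapter would not need to change.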
| // uncaught exception that kills the whole process — taking every in-flight request with | ||
| // it (the caller sees "Server disconnected"). Log and keep serving instead; the failing | ||
| // run still returns its own error to its caller. | ||
| process.on("unhandledRejection", (reason) => { |
Worth keeping. The rivet SDK can reject a background promise (an adapter install, a Daytona preview SSE drop) outside any awaited path. Node's default turns that into an uncaught exception that kills the process and every in-flight request with it (the caller sees 'Server disconnected'). Logging and continuing keeps the sidecar serving; the failing run still returns its own error to its own caller.
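A testable shape for this guard, as a sketch: `RejectionSource`, `describeRejection`, and `installRejectionGuard` are hypothetical names, and the log prefix is made up. In the server the source would be the Node `process` object and the logger something like `console.error`.

```typescript
// Narrow view of the Node process object: only the one method this sketch uses.
interface RejectionSource {
  on(event: "unhandledRejection", listener: (reason: unknown) => void): unknown;
}

// Turn any rejection reason into one log line.
function describeRejection(reason: unknown): string {
  if (reason instanceof Error) return reason.stack ?? reason.message;
  return String(reason);
}

// Register a handler that logs and returns, so the process keeps serving
// instead of dying on a background rejection.
function installRejectionGuard(
  source: RejectionSource,
  log: (line: string) => void,
): void {
  source.on("unhandledRejection", (reason) => {
    log(`[agent] unhandled rejection (continuing): ${describeRejection(reason)}`);
  });
}
```

Injecting the source keeps the handler unit-testable without triggering a real unhandled rejection in the test process.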
mmabrouk
left a comment
Codex subagent review for #4774
I found one blocking security issue. I attempted to submit this as REQUEST_CHANGES, but GitHub rejected that because the authenticated account owns the PR, so this is posted as a review comment instead. I read the #4779 design files as context only, with the coordination caveat that those docs are not code on #4779's own main-based SHA and pr-stack.md may be stale/proposed. I also checked #4778: the reviewed key files are code-identical to #4774, and #4778 appears to supersede this runner-engine PR organizationally, so these findings apply there too.
Findings:
- Blocking security: Pi-over-rivet leaks scoped tool secrets and callback auth into the harness process env (services/agent/src/engines/rivet.ts:164, services/agent/src/extensions/agenta.ts:47). buildPiExtensionEnv serializes the full request.customTools array into AGENTA_TOOL_SPECS and also exports AGENTA_TOOL_CALLBACK_AUTH. Per #4773's wire/tool contract, a code tool's env contains resolved secrets that are supposed to be scoped to that code-tool subprocess only. On this path they become environment variables for the daemon/adapter/Pi harness process, so any harness shell/read-env capability, or any prompt-influenced code running in that process, can read every code-tool secret plus the tool callback authorization. That bypasses the buildChildEnv isolation boundary from #4773. Please scrub secret-bearing fields from the harness-visible specs and move tool execution credentials behind a runner-owned side channel/relay, or otherwise keep them out of the model/harness process environment.
- Local Pi ACP runs drop tool and usage events when emitSpans is false (services/agent/src/engines/rivet.ts:836, services/agent/src/tracing/otel.ts:873). The split-tracing switch disables synthetic spans for local Pi, but handleUpdate returns before processing tool_call, tool_call_update, and usage_update. That means local Pi-over-rivet still accumulates assistant text, but its structured event log/live stream loses tool call/results and usage events even if ACP emits them. Span emission and result-event emission need to be separated here; a regression test should drive createRivetOtel({ emitSpans: false }) with tool/usage updates.
- ACP tracing exports before the final usage split is available and does not stamp usage on invoke_agent (services/agent/src/engines/rivet.ts:887, services/agent/src/tracing/otel.ts:966). runRivet calls run.finish() and run.flush() before it reads PromptResponse.usage, so the input/output split that is returned in AgentRunResult.usage never reaches the exported spans. In createRivetOtel.finish, only usage.total/usage.cost from the stream are put on the LLM span, and the invoke_agent span only gets output text before ending. That does not match the PR's stated load-bearing behavior of stamping run-total usage on invoke_agent, and it risks trace rollups missing prompt/completion usage for non-Pi harnesses. Consider computing final usage before flush and setting the run totals on the agent span, with a test that asserts the exported span attributes.
I did not run the test suite or live harnesses; this was a GitHub-only review. I did not find a Docker licensing blocker in the reviewed Dockerfile/README, aside from the #4778 supersession note above.
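One possible shape for the scrub requested in the first finding, as a sketch only. `ToolSpec` and `scrubToolSpecs` are hypothetical; the only field taken from the finding is `env`, and the real spec type may hold secrets in other fields that would need the same treatment.

```typescript
// Hypothetical spec shape: only `env` is taken from the finding above.
interface ToolSpec {
  name: string;
  kind: string;
  env?: Record<string, string>;
}

type HarnessVisibleSpec = Omit<ToolSpec, "env">;

// Return copies of the specs that are safe to serialize into the harness
// environment: everything except the secret-bearing `env` map.
function scrubToolSpecs(specs: ToolSpec[]): HarnessVisibleSpec[] {
  return specs.map(({ env: _env, ...rest }) => rest);
}
```

The runner would keep the unscrubbed specs on its own side and attach `env` only when it spawns the code-tool subprocess.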
Actionable comments posted: 12
📒 Files selected for processing (18)
- services/agent/README.md
- services/agent/config/AGENTS.md
- services/agent/config/agent.json
- services/agent/docker/Dockerfile
- services/agent/docker/Dockerfile.dev
- services/agent/docker/README.md
- services/agent/scripts/build-extension.mjs
- services/agent/skills/agenta-getting-started/SKILL.md
- services/agent/src/cli.ts
- services/agent/src/engines/pi.ts
- services/agent/src/engines/rivet.ts
- services/agent/src/extensions/agenta.ts
- services/agent/src/responder.ts
- services/agent/src/server.ts
- services/agent/src/tracing/otel.ts
- services/agent/test/continuation.test.ts
- services/agent/test/responder.test.ts
- services/agent/test/stream-events.test.ts
| FROM node:24-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude | ||
| # Code) over HTTPS using the system trust store, which node:*-slim omits — without this | ||
| # the daemon's `install-agent claude` fails TLS verification. git lets npm/installers | ||
| # fetch git deps. | ||
| RUN apt-get update \ | ||
| && apt-get install -y --no-install-recommends ca-certificates git \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN corepack enable | ||
|
|
||
| # Install deps as a cached layer (manifest + lockfile only). The full dependency set is | ||
| # installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`, | ||
| # both devDependencies. | ||
| COPY package.json pnpm-lock.yaml ./ | ||
| RUN pnpm install --frozen-lockfile | ||
|
|
||
| # Bake the source (no bind mount in production). | ||
| COPY tsconfig.json ./ | ||
| COPY scripts ./scripts | ||
| COPY src ./src | ||
| COPY config ./config | ||
| COPY skills ./skills | ||
|
|
||
| # Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs | ||
| # this baked copy into Pi's agent dir on every run. Rebuild the image after editing | ||
| # src/extensions/agenta.ts or the tracer. | ||
| RUN pnpm run build:extension | ||
|
|
||
| ENV NODE_ENV=production \ | ||
| PORT=8765 | ||
|
|
||
| EXPOSE 8765 | ||
|
|
||
| # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the | ||
| # container runs as a non-root host uid. | ||
| CMD ["node_modules/.bin/tsx", "src/server.ts"] |
Run the production container as a non-root user.
Lines 16-55 currently run the sidecar as root (no USER set), which weakens container isolation for a network-exposed service process.
Suggested fix
FROM node:24-slim
WORKDIR /app
@@
RUN pnpm run build:extension
ENV NODE_ENV=production \
PORT=8765
+RUN groupadd --system app && useradd --system --gid app --create-home app \
+ && chown -R app:app /app
+USER app
+
EXPOSE 8765
@@
CMD ["node_modules/.bin/tsx", "src/server.ts"]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| FROM node:24-slim | |
| WORKDIR /app | |
| # CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude | |
| # Code) over HTTPS using the system trust store, which node:*-slim omits — without this | |
| # the daemon's `install-agent claude` fails TLS verification. git lets npm/installers | |
| # fetch git deps. | |
| RUN apt-get update \ | |
| && apt-get install -y --no-install-recommends ca-certificates git \ | |
| && rm -rf /var/lib/apt/lists/* | |
| RUN corepack enable | |
| # Install deps as a cached layer (manifest + lockfile only). The full dependency set is | |
| # installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`, | |
| # both devDependencies. | |
| COPY package.json pnpm-lock.yaml ./ | |
| RUN pnpm install --frozen-lockfile | |
| # Bake the source (no bind mount in production). | |
| COPY tsconfig.json ./ | |
| COPY scripts ./scripts | |
| COPY src ./src | |
| COPY config ./config | |
| COPY skills ./skills | |
| # Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs | |
| # this baked copy into Pi's agent dir on every run. Rebuild the image after editing | |
| # src/extensions/agenta.ts or the tracer. | |
| RUN pnpm run build:extension | |
| ENV NODE_ENV=production \ | |
| PORT=8765 | |
| RUN groupadd --system app && useradd --system --gid app --create-home app \ | |
| && chown -R app:app /app | |
| USER app | |
| EXPOSE 8765 | |
| # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the | |
| # container runs as a non-root host uid. | |
| CMD ["node_modules/.bin/tsx", "src/server.ts"] |
Source: Linters/SAST tools
| FROM node:24-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # CA certificates: the rivet daemon (Rust) downloads harness CLIs (e.g. Claude Code) over | ||
| # HTTPS using the system trust store, which node:*-slim omits — without this the daemon's | ||
| # `install-agent claude` fails TLS verification. git lets npm/installers fetch git deps. | ||
| RUN apt-get update \ | ||
| && apt-get install -y --no-install-recommends ca-certificates git \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN corepack enable | ||
|
|
||
| # Install deps as a cached layer (manifest + lockfile only). | ||
| COPY package.json pnpm-lock.yaml ./ | ||
| RUN pnpm install --frozen-lockfile | ||
|
|
||
| # Fallback copy for non-mounted runs; in dev these are bind-mounted over. | ||
| COPY tsconfig.json ./ | ||
| COPY scripts ./scripts | ||
| COPY src ./src | ||
|
|
||
| # Bundle the Agenta Pi extension (tracing + tools) into dist/. dist/ is NOT bind-mounted | ||
| # in dev, so this baked copy is what runRivet installs into Pi's agent dir. Rebuild the | ||
| # image after editing src/piExtension.ts or src/agenta-otel.ts. | ||
| RUN pnpm run build:extension | ||
|
|
||
| ENV NODE_ENV=development \ | ||
| PORT=8765 | ||
|
|
||
| EXPOSE 8765 | ||
|
|
||
| # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the | ||
| # container runs as a non-root host uid. | ||
| CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"] |
Use a non-root runtime user in the dev image too.
Lines 7-41 leave the process running as root; switching to a non-root user keeps dev closer to prod hardening and avoids unnecessary privilege.
Suggested fix
FROM node:24-slim
@@
RUN pnpm run build:extension
ENV NODE_ENV=development \
PORT=8765
+RUN groupadd --system app && useradd --system --gid app --create-home app \
+ && chown -R app:app /app
+USER app
+
EXPOSE 8765
@@
CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]
Source: Linters/SAST tools
| const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase(); | ||
| return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit); | ||
| } |
CLI backend routing does not match the /run transport contract.
Lines 24-25 route everything except "rivet" to Pi. Requests with backend: "auto" (or only harness/sandbox) won’t follow the same engine selection behavior as server.ts.
💡 Suggested fix
function runAgent(
request: AgentRunRequest,
emit?: EmitEvent,
): Promise<AgentRunResult> {
- const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();
- return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);
+ const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase();
+ if (backend === "rivet") return runRivet(request, emit);
+ if (backend === "pi") return runPi(request, emit);
+ return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit);
}
📝 Committable suggestion
| const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase(); | |
| if (backend === "rivet") return runRivet(request, emit); | |
| if (backend === "pi") return runPi(request, emit); | |
| return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit); | |
| } |
| process.exit(1); | ||
| } |
🧩 Analysis chain
🏁 Script executed:
cat -n services/agent/src/cli.ts | head -100
Repository: Agenta-AI/agenta
Length of output: 3675
🌐 Web query:
Node.js process.exit stdout buffer flush truncation issue
💡 Result:
In Node.js, calling process.exit frequently leads to the truncation of data written to process.stdout and process.stderr [1][2][3]. This occurs because writes to these streams are often asynchronous and can span multiple ticks of the Node.js event loop [1][2]. When process.exit is invoked, it forces the process to terminate synchronously and immediately, effectively abandoning any pending asynchronous I/O operations before they have a chance to complete and flush their buffers [1][2][4]. This behavior is well-documented as a common pitfall [1][2]. Because the termination is abrupt, the Node.js runtime does not guarantee that data already passed to stdout or stderr will be fully written to the underlying operating system file descriptors [5][6].
To avoid this issue, the recommended practice is to avoid calling process.exit explicitly whenever possible [1][2]. Instead, developers should use the following strategies:
1. Allow the process to exit naturally: By ensuring no additional work is scheduled in the event loop, the Node.js process will exit gracefully on its own after all pending I/O operations are completed [1][2][7].
2. Use process.exitCode: If you need to specify a particular exit status, set the process.exitCode property to the desired integer [1][2]. This informs Node.js of the correct code to use when the process terminates naturally, without forcing an immediate shutdown [2][4][7].
3. Handle error conditions gracefully: If the process must terminate due to an error, it is considered safer to throw an uncaught exception (or handle the error appropriately) rather than calling process.exit [1][2].
While some workarounds involving setting stream handles to blocking mode (e.g., process.stdout._handle.setBlocking(true)) have been discussed in the community, they are generally discouraged as they rely on internal, unsupported APIs and may not be cross-platform or stable across different Node.js versions [8].
The most robust approach remains allowing the natural conclusion of the event loop [2][7].
Citations:
- 1: https://beta.docs.nodejs.org/process.html
- 2: https://nodejs.org/api/process.html
- 3: https://nodejs.org/docs/latest-v26.x/api/process.html
- 4: https://stackoverflow.com/questions/59260733/is-there-any-difference-between-process-exitcode-and-process-exit-in-node
- 5: Stdout is not completely flushed on process exit nodejs/node#2972
- 6: process: flush stdout/stderr upon process.exit() nodejs/node#6773
- 7: https://www.echooff.dev/blog/prefer-process-exitcode-over-process-exit-in-nodejs
- 8: stdio buffered writes (chunked) issues & process.exit() truncation nodejs/node#6456
Replace process.exit() with process.exitCode to prevent stdout truncation.
Lines 58, 65, 71, and 85 call process.exit() immediately after writing JSON/NDJSON to stdout. This forces abrupt process termination before buffered output fully flushes, causing data loss when stdout is piped—a common scenario in production systems.
Use process.exitCode = statusCode; followed by return; instead to allow the process to exit naturally after all buffered I/O completes:
Suggested fix
- process.exit(1);
+ process.exitCode = 1;
+ return;
@@
- process.exit(result.ok ? 0 : 1);
+ process.exitCode = result.ok ? 0 : 1;
+ return;
@@
- process.exit(1);
+ process.exitCode = 1;
+ return;
@@
- process.exit(result.ok ? 0 : 1);
+ process.exitCode = result.ok ? 0 : 1;
| function applySecrets(secrets: Record<string, string> | undefined): void { | ||
| for (const [key, value] of Object.entries(secrets ?? {})) { | ||
| if (value) process.env[key] = value; | ||
| } | ||
| } |
Restore request secrets after each run instead of leaving them in process.env.
Line 99 and Line 213 mutate global process env and never revert it. In a long-lived runner, one request’s provider keys can leak into subsequent runs.
Proposed fix
-function applySecrets(secrets: Record<string, string> | undefined): void {
+function applySecrets(secrets: Record<string, string> | undefined): () => void {
+ const previous = new Map<string, string | undefined>();
for (const [key, value] of Object.entries(secrets ?? {})) {
- if (value) process.env[key] = value;
+ if (!value) continue;
+ if (!previous.has(key)) previous.set(key, process.env[key]);
+ process.env[key] = value;
}
+ return () => {
+ for (const [key, oldValue] of previous) {
+ if (oldValue === undefined) delete process.env[key];
+ else process.env[key] = oldValue;
+ }
+ };
}
@@
- applySecrets(request.secrets);
+ const restoreSecrets = applySecrets(request.secrets);
@@
} finally {
+ restoreSecrets();
try {
 rmSync(cwd, { recursive: true, force: true });
Also applies to: 213-214, 394-400
| let specs: ResolvedToolSpec[] = []; | ||
| try { | ||
| specs = JSON.parse(raw); | ||
| } catch (err) { | ||
| log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`); |
Validate AGENTA_TOOL_SPECS shape before iterating.
If AGENTA_TOOL_SPECS parses to an object/string instead of an array, the loop at Line 59 throws and the extension can fail the run.
Proposed fix
let specs: ResolvedToolSpec[] = [];
try {
- specs = JSON.parse(raw);
+ const parsed = JSON.parse(raw);
+ if (!Array.isArray(parsed)) {
+ log("bad AGENTA_TOOL_SPECS: expected a JSON array");
+ return;
+ }
+ specs = parsed as ResolvedToolSpec[];
} catch (err) {
log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`);
return;
}
Also applies to: 59-60
| if (backend === "pi") return runPi(request, emit); | ||
| return request.harness || request.sandbox | ||
| ? runRivet(request, emit, signal) | ||
| : runPi(request, emit); |
Streaming disconnect cancellation is not effective for Pi runs.
Line 39 and Line 42 call runPi without an abort signal, so the abort at Line 68 only cancels Rivet runs. A disconnected NDJSON client can still leave Pi work running unobserved.
Also applies to: 62-68, 78-79
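One way to close this gap is to give both engines the same optional signal parameter and pass it on every branch. A minimal sketch with stand-in engines: `Engine`, `fakeEngine`, and `route` are hypothetical and do not reflect the real runPi/runRivet signatures.

```typescript
// Hypothetical engine signature: every engine accepts the same optional signal.
type Engine = (prompt: string, signal?: AbortSignal) => Promise<string>;

// Stand-in for a long-running engine that checks the signal between steps.
const fakeEngine: Engine = async (prompt, signal) => {
  for (let step = 0; step < 3; step++) {
    if (signal?.aborted) throw new Error("aborted");
    await new Promise<void>((resolve) => setTimeout(resolve, 1));
  }
  return `done: ${prompt}`;
};

// Route to an engine and forward the signal on both branches, so a client
// disconnect cancels the work regardless of which engine was selected.
function route(
  useRivet: boolean,
  rivet: Engine,
  pi: Engine,
  prompt: string,
  signal?: AbortSignal,
): Promise<string> {
  return useRivet ? rivet(prompt, signal) : pi(prompt, signal);
}
```

Whether the Pi SDK exposes a cancellation hook to wire the signal into is not something this review establishes; the sketch only shows the plumbing at the routing layer.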
| async function readBody(req: IncomingMessage): Promise<string> { | ||
| const chunks: Buffer[] = []; | ||
| for await (const chunk of req) { | ||
| chunks.push(chunk as Buffer); | ||
| } | ||
| return Buffer.concat(chunks).toString("utf8"); | ||
| } |
Add a request-size limit before buffering the entire body.
Lines 97-103 read the full body into memory with no cap. A large request can exhaust memory and take down the sidecar.
💡 Suggested fix
+const MAX_BODY_BYTES = Number(process.env.AGENT_MAX_REQUEST_BYTES ?? 1_048_576);
async function readBody(req: IncomingMessage): Promise<string> {
const chunks: Buffer[] = [];
+ let total = 0;
for await (const chunk of req) {
- chunks.push(chunk as Buffer);
+ const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
+ total += buf.length;
+ if (total > MAX_BODY_BYTES) {
+ throw new Error("REQUEST_TOO_LARGE");
+ }
+ chunks.push(buf);
}
return Buffer.concat(chunks).toString("utf8");
}
@@
if (req.method === "POST" && req.url === "/run") {
- const raw = await readBody(req);
+ let raw: string;
+ try {
+ raw = await readBody(req);
+ } catch (err) {
+ if (err instanceof Error && err.message === "REQUEST_TOO_LARGE") {
+ return send(res, 413, { ok: false, error: "Request body too large" });
+ }
+ throw err;
+ }
Also applies to: 111-113, 133-136
| if (!emitSpans) return; // output accumulated above; spans come from the harness | ||
|
|
||
| if (kind === "tool_call") { |
emitSpans=false currently drops non-text Rivet events (tool/usage).
Line 873 short-circuits before tool_call, tool_call_update, and usage_update. Also, maybeCloseTool only emits tool_result when a span entry exists, which never happens when spans are disabled. In local Pi-through-Rivet mode, this yields incomplete events()/stream output.
Also applies to: 891-915, 919-931
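The separation this comment asks for can be sketched as two sinks with only one of them gated. Illustrative only: the `Update` union is a reduced, hypothetical version of the ACP update kinds named above, and `Sinks` is not an existing type in otel.ts.

```typescript
// Reduced, hypothetical view of the ACP session/update kinds.
type Update =
  | { kind: "agent_message_chunk"; text: string }
  | { kind: "tool_call"; id: string; name: string }
  | { kind: "usage_update"; total: number };

interface Sinks {
  emitEvent(update: Update): void; // structured event log / live stream
  emitSpan(update: Update): void; // synthetic span work
}

function handleUpdate(update: Update, emitSpans: boolean, sinks: Sinks): void {
  // Events are always recorded, regardless of who owns tracing.
  sinks.emitEvent(update);
  // Only span synthesis is gated by the flag.
  if (emitSpans) sinks.emitSpan(update);
}
```

With this split, a regression test can drive `emitSpans: false` and assert that tool and usage events still reach the stream while no spans are produced.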
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | ||
| assert.equal(policyFromRequest(undefined), "auto"); | ||
| assert.equal(policyFromRequest("auto"), "auto"); | ||
| assert.equal(policyFromRequest("deny"), "deny"); | ||
|
|
||
| process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; | ||
| assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); | ||
| assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); | ||
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; |
Restore the previous env value to keep tests process-safe.
Lines 23-31 mutate AGENTA_RIVET_DENY_PERMISSIONS but do not restore a pre-existing value.
💡 Suggested fix
{
- delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
- assert.equal(policyFromRequest(undefined), "auto");
- assert.equal(policyFromRequest("auto"), "auto");
- assert.equal(policyFromRequest("deny"), "deny");
-
- process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
- assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
- assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
- delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ try {
+ delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ assert.equal(policyFromRequest(undefined), "auto");
+ assert.equal(policyFromRequest("auto"), "auto");
+ assert.equal(policyFromRequest("deny"), "deny");
+
+ process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
+ assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
+ assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
+ } finally {
+ if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev;
+ }
}
📝 Committable suggestion
| const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| try { | |
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| assert.equal(policyFromRequest(undefined), "auto"); | |
| assert.equal(policyFromRequest("auto"), "auto"); | |
| assert.equal(policyFromRequest("deny"), "deny"); | |
| process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; | |
| assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); | |
| assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); | |
| } finally { | |
| if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev; | |
| } |
Agent-workflows: functional PR set
Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is #4773 (review that first).
Context
The runner is the TypeScript process that actually drives a coding harness for one agent turn. The Python service stays thin: it resolves config, secrets, and tools, then hands a neutral AgentRunRequest to the runner and reads back an AgentRunResult. This PR adds the runner's engines, its two transports, its tracing, and the harness assets it ships with.
This is a functional slice that shows the final code. It stacks on feat/agent-runner-tools (the tool resolution + dispatch this engine layer calls into). Review that PR first.
What this changes
Two engines now sit behind one /run contract. engines/pi.ts drives the Pi SDK in process: it builds an in-memory session, injects AGENTS.md and forced skills through the resource loader, registers the resolved tools as Pi customTools, and self-instruments through the OTel extension. engines/rivet.ts drives a harness (pi or claude) over the Agent Client Protocol through a rivet sandbox-agent daemon, on a local host or in a Daytona sandbox. The harness and the sandbox are two independent axes that swap as config, not new code.
Two transports expose the same contract. server.ts is the dockerized HTTP sidecar: GET /health and POST /run, with NDJSON streaming selected by Accept: application/x-ndjson. cli.ts is the one-shot stdio path: one JSON request on stdin, one JSON result on stdout, stderr for logs.
Before, the rivet path answered a harness permission gate inline with a hardcoded auto-approve. Now responder.ts lifts that decision behind a Responder interface. PolicyResponder reproduces the old behavior exactly (auto-allow, or deny per policy/env), and a cross-turn HITL responder slots in later without touching the harness adapter.
Key architectural decision to review
The split-tracing model in tracing/otel.ts is the decision to scrutinize. Both engines must produce one uniform span tree (invoke_agent > turn > chat / execute_tool) nested under the caller's /invoke span, but they get their signal from two different places.
Local Pi self-instruments. The runner propagates the W3C traceparent into the Pi process through extensions/agenta.ts, and Pi's own pi.on(...) lifecycle hooks emit the real spans (createAgentaOtel). The rivet path cannot see those hooks because the harness is a separate process, so createRivetOtel rebuilds the same tree from ACP session/update events (agent_message_chunk, tool_call, usage_update). The emitSpans flag is what arbitrates: it is off for local Pi so the two paths never double up, and on for any other harness or for Daytona Pi (where the in-sandbox process cannot reach Agenta's OTLP, so the runner traces from the stream instead).
The load-bearing consequence is TraceBatchProcessor. Agenta rolls up token and cost metrics per ingest batch, and the harness span tree exports in a separate OTLP batch from the workflow span. So the per-batch roll-up cannot bridge them. The fix is to stamp the run-total usage directly on the invoke_agent span and to flush each trace as one batch by trace id (the root has a remote parent that never ends in this process, so root-end never fires). If you change how usage is summed or how batches are flushed, spans drop silently rather than erroring. That is the regression to guard.
The second decision worth a look is the cold continuation model in rivet.ts. Each invoke is a fresh sandbox, so prior turns replay as transcript text, and resolved tool turns are encoded as text (messageTranscript) because ACP prompt content blocks cannot carry tool calls or results. This is the substrate the cross-turn HITL path resumes from.
How to review this PR
Read engines/rivet.ts first. It is the largest file and it ties the rest together: sandbox selection, the daemon and extension env, the ACP session, the permission seam, the tracer wiring, and the usage readback. Check the runRivet body from line 674 down, especially the emitSpans: !isPi || isDaytona choice and the finally that always disposes the sandbox.
Then read engines/pi.ts for the in-process counterpart, tracing/otel.ts for the two tracers (the TraceBatchProcessor and the createRivetOtel delta state machine are the parts that earn their tests), responder.ts for the permission seam, and docker/README.md for the licensing posture.
Skip the line-by-line ACP marshalling helpers (acpBlockText, stripStartupBanner, model id matching). They are mechanical and covered by the stream-events test. The likely regression is in the tracing batch and usage path, not in the transports.
Tests / notes
test/stream-events.test.ts drives createRivetOtel with a hand-built ACP sequence and asserts the streaming delta lifecycle and the one-shot coalesced shape are both correct and never double-emit. test/responder.test.ts pins the responder parity with the old inline auto-approve and the emitEvent choke point. test/continuation.test.ts covers transcript replay of resolved tool turns. These run offline with no harness and no network.
Live harness runs (Pi local, Pi/Claude over rivet, Daytona) are verified by hand, not in CI. The Daytona snapshot builder and the OAuth upload paths need a live account to exercise. The licensing posture in docker/README.md is the rule to keep: bake Pi (MIT), never bake or distribute Claude, ship the snapshot builder rather than a prebuilt snapshot.