Skip to content

feat(agent): runner engines, HTTP server, tracing, and docker image#4778

Open
mmabrouk wants to merge 1 commit into
feat/agent-runner-toolsfrom
feat/agent-runner-engines
Open

feat(agent): runner engines, HTTP server, tracing, and docker image#4778
mmabrouk wants to merge 1 commit into
feat/agent-runner-toolsfrom
feat/agent-runner-engines

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 19, 2026

Copy link
Copy Markdown
Member

Part of the agent-workflows PR set (sliced by functional area, final code only)

Most of these are independent off main. Two short stacks exist (an indented item stacks on the line above it). Each PR shows only its own area.

This PR's base is #4773. Merge that first.

Context

The runner is the TypeScript process that actually drives a coding agent for one /invoke. The wire contract and tool resolution land in PR #4773 (this PR's base). This PR adds the part that runs the agent: the two engines behind the contract, the HTTP sidecar that fronts them, the tracing that ties a run into the caller's trace, the harness assets, and the production image. With this, the Python service can hand a runner one request and get back a traced agent run.

What this changes

This PR adds two engines that satisfy the same /run contract:

  • engines/pi.ts drives the Pi SDK in process. It builds a hermetic session in a throwaway temp dir, injects AGENTS.md and forced skills in memory, resolves the model, runs one user turn, and returns the structured result. Provider keys come from the request secrets or the local Pi login.
  • engines/rivet.ts drives a harness (Pi or Claude) over the Agent Client Protocol through a rivet sandbox-agent daemon. The harness (which engine) and the sandbox (local or Daytona) are two independent axes that swap by config, not by new code.

server.ts is the long-lived HTTP sidecar. It serves GET /health and POST /run, supports one-shot JSON and NDJSON streaming, and routes each request to an engine. cli.ts is the one-shot stdin/stdout transport for the same contract. responder.ts is the seam that answers the permission gates a harness raises; today a fixed auto-or-deny policy answers them headlessly.

tracing/otel.ts and extensions/agenta.ts turn a run into an OpenTelemetry span tree that nests under the caller's /invoke span. config/ and skills/ hold the runtime harness assets. docker/ holds the production Dockerfile, a dev Dockerfile, and a README that documents the licensing posture.

Key architectural decision to review

Split tracing, decided by one switch. engines/rivet.ts:836 sets emitSpans: !isPi || isDaytona. The reason is reachability. Local Pi self-instruments: the runner propagates the caller's trace context into Pi through the Agenta extension (extensions/agenta.ts), and Pi emits its own real span tree (turn, chat, tool, with token usage) under that parent. Every other case has the runner rebuild the spans from the ACP session/update event stream instead. Daytona is the case to scrutinize: the in-sandbox process cannot reach Agenta's OTLP, so even for Pi the extension drops tracing and keeps only tools plus a usage writeback, and the runner traces from the event stream. Check that the two paths produce the same span shape and that usage is never double counted.

Flush by trace id. tracing/otel.ts buffers a trace's spans and exports them in one OTLP batch (TraceBatchProcessor). Agenta rolls up cumulative token and cost metrics per ingest batch, so a trace split across batches loses its root aggregation. The processor flushes on two signals: the root span ends (a standalone run), or the run flushes explicitly by trace id. The second path is the one that matters here. invoke_agent has a remote parent that never ends inside the runner, so the root-end signal never fires and the explicit per-trace flush is the only thing that exports the run. Check onEnd (the !span.parentSpanId guard at otel.ts:125) against the explicit flushTrace calls in both engines.

Auto backend routing. server.ts picks the engine from the request. An explicit backend wins, then the AGENT_BACKEND env default, and auto falls back to the request shape: a request carrying harness or sandbox goes to rivet, otherwise to Pi (server.ts:40). Confirm the fallback cannot send a request to the wrong engine.

Docker licensing posture. docker/README.md and docker/Dockerfile state the rule: the image bakes Pi (MIT) through the npm dependencies and never bakes or distributes Claude (proprietary, under Anthropic's Commercial Terms, which grant use but not redistribution). The daemon installs Claude from Anthropic at runtime over HTTPS, which keeps Anthropic as the distributor. No credential is baked. For the Daytona path, the repo ships the snapshot builder recipe, not a prebuilt Claude-containing image. Verify the Dockerfile bakes only Pi and that nothing in the build path embeds Claude or a key.

How to review this PR

Read server.ts first. It is the entry point and shows the contract and the routing in one screen. Then read engines/pi.ts for the simpler in-process path, then engines/rivet.ts for the ACP path and the emitSpans switch. Then read extensions/agenta.ts and tracing/otel.ts together, since the split-tracing decision only makes sense across both. Finish with docker/README.md and docker/Dockerfile for the licensing posture.

Skip engines/rivet.ts binary resolution and the Daytona upload helpers on a first pass; they are plumbing. The likeliest regression sits in the tracing seam: a run whose spans never flush (a missing per-trace flush), or a run that flushes twice and double counts usage.

Tests / notes

The tests run offline with no harness and no network. stream-events.test.ts drives the createRivetOtel delta and lifecycle state machine with a hand-built ACP session/update sequence and asserts the streaming and one-shot event shapes. responder.test.ts covers the permission policy and that an out-of-stream interaction_request lands in both the live sink and the batch event log. continuation.test.ts covers the cross-turn substrate: tool turns survive into the replayed transcript, which the cold harness needs because ACP prompt content blocks cannot carry tool calls or results.

@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 19, 2026 4:30pm

Request Review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cacb9f28-a1d5-41d9-93b7-6aa71ad8838f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-runner-engines

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Backend CLI devops Feature Request New feature or request tests typescript labels Jun 19, 2026
@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

The interesting code, in reading order:

  • services/agent/src/server.ts:40 — the auto backend routing. With no explicit backend, a request carrying harness or sandbox goes to rivet, otherwise to Pi. Check the fallback cannot route to the wrong engine.
  • services/agent/src/engines/rivet.ts:836 — the split-tracing switch, emitSpans: !isPi || isDaytona. Local Pi self-instruments through the extension; every other case has the runner rebuild spans from the ACP event stream.
  • services/agent/src/extensions/agenta.ts:107 — the extension stays inert unless a run is wired, and on Daytona it drops tracing but keeps tools plus the usage writeback. This is the other half of the split-tracing decision.
  • services/agent/src/tracing/otel.ts:125TraceBatchProcessor.onEnd. The !span.parentSpanId guard decides root-end flush; when invoke_agent has a remote parent it never fires, so the explicit per-trace flushTrace is what exports the run.
  • services/agent/src/responder.ts:44 — the permission-gate seam. PolicyResponder answers gates headlessly with a fixed auto-or-deny policy; a cross-turn HITL responder slots in here later without touching the engine.
  • services/agent/docker/README.md:14 and services/agent/docker/Dockerfile:9 — the licensing posture: bake Pi (MIT), never bake or distribute Claude (proprietary), install Claude from Anthropic at runtime, bake no credential.

endpoint: request.trace?.endpoint,
authorization: request.trace?.authorization,
captureContent: request.trace?.captureContent,
emitSpans: !isPi || isDaytona,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split-tracing switch. emitSpans: !isPi || isDaytona. Local Pi self-instruments through the Agenta extension and emits its own span tree under the caller's parent; every other case (non-Pi harness, or Pi on Daytona where the sandbox can't reach Agenta's OTLP) has the runner rebuild spans from the ACP event stream. Worth checking the two paths yield the same span shape and never double-count usage.

spans.push(span);
this.buffers.set(traceId, spans);
// No parent in this process => this is the local root and the trace is done.
if (!span.parentSpanId) {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flush-by-trace-id is the load-bearing part. The !span.parentSpanId guard flushes when the local root ends, but invoke_agent has a remote parent that never ends in this process, so that signal never fires. The explicit per-trace flushTrace in each engine is then the only thing that exports the run. Buffering the whole trace into one OTLP batch also keeps Agenta's per-batch cumulative roll-up intact.

const backend = (request.backend ?? DEFAULT_BACKEND).toLowerCase();
if (backend === "rivet") return runRivet(request, emit, signal);
if (backend === "pi") return runPi(request, emit);
return request.harness || request.sandbox

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto backend routing. With no explicit backend and the env default at auto, the engine is chosen from the request shape: harness or sandbox present routes to rivet, otherwise Pi. Confirm a request that should hit rivet always carries one of those fields, so the fallback can't silently pick the wrong engine.

#
# Licensing posture (see docker/README.md):
# - Pi (@earendil-works/pi-coding-agent, MIT) is baked via the npm dependencies.
# - Claude Code is proprietary (Anthropic Commercial Terms). It is NEVER baked into

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Licensing posture to verify: the image bakes Pi (MIT) via npm deps and never bakes Claude (proprietary). The daemon installs Claude from Anthropic at runtime over HTTPS (hence ca-certificates), keeping Anthropic as the distributor. No credential is baked. Check nothing in the build path embeds Claude or a key.

@github-actions

Copy link
Copy Markdown
Contributor

Railway Preview Environment

Preview URL https://gateway-production-4d26.up.railway.app/w
Project agenta-oss-pr-4778
Image tag pr-4778-b85e101
Status Deployed
Railway logs Open logs
Workflow logs View workflow run
Updated at 2026-06-19T16:45:43.162Z

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex subagent review for #4778

Blocking security finding:

  • services/agent/src/engines/pi.ts:101 and services/agent/src/engines/pi.ts:213 write request.secrets into global process.env for the in-process Pi engine and never restore or remove them. This runner is also exposed by the long-lived HTTP sidecar (services/agent/src/server.ts:39 routes explicit backend: "pi", and the auto fallback can also reach runPi), so a request carrying one project's vault key leaves OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. in the shared process. A later request without that secret can authenticate with the previous project's key, and concurrent runPi requests can race on whichever key was last assigned. The rivet path keeps secrets in a per-daemon env; the Pi path needs equivalent per-run isolation, or at least snapshot/restore/delete the touched provider env vars in finally and avoid concurrent in-process Pi runs while global env is the auth channel. Please add a regression test that runs two Pi requests in the same process and proves the second cannot see the first request's secret.

Review-map and stack consistency:

  • I read the #4779 ground-truth and PR-stack docs first. Those docs are useful as cross-PR design context, but #4779 is a main-based docs PR and its current pages describe code that lands in the code PRs rather than on #4779's own SHA. I treated them as review context, not as proof that #4779 itself contains this runtime.
  • #4778's stack-nav/body are internally consistent with the active stack I saw: #4778 is based on #4773 and is the newer runner engines/server/tracing/docker-image PR. I did not find a stale #4774 reference in #4778's PR body or in the sampled #4778 files.
  • The main #4778 code/Docker blobs I sampled match #4774 (server.ts, pi.ts, rivet.ts, otel.ts, extensions/agenta.ts, Dockerfile/README, and package.json), so #4778 appears to supersede or duplicate #4774 rather than introduce runtime drift. Coordinate closing or merging only one, and carry the secret-isolation finding to whichever PR remains active.

Additional context:

  • The split tracing path and flushTrace calls look coherent: local Pi self-instruments, Daytona and non-Pi trace from ACP, and both engines explicitly flush by trace id before returning or surfacing errors.
  • Docker/package review did not show Claude Code or credentials baked into the production image; the documented runtime-install posture is intact. The OAuth fallback remains deployment-sensitive: cloud and multi-tenant runners must not mount host OAuth state.

Tests not run: review was through GitHub file/patch inspection only; I did not run local tests.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (5)
services/agent/docker/Dockerfile (1)

1-56: ⚡ Quick win

Verify non-root user requirement and licensing posture compliance.

The Dockerfile is well-structured for the stated purpose. However, Trivy flags a security concern and the past review requested verification of licensing compliance.

Security: The image runs as root (no USER directive). While this is a common pattern, running containers as non-root is a security best practice and may be required depending on the deployment environment. Consider adding:

RUN groupadd -r nodeapp && useradd -r -g nodeapp nodeapp
USER nodeapp

If non-root is not feasible due to permission constraints (e.g., runtime config injection, volume mounts), this should be explicitly documented in a comment.

Licensing: The Dockerfile comments correctly state that Pi (MIT) is baked via npm dependencies and Claude Code is never baked. The build path (lines 33–46) only copies package.json, pnpm-lock.yaml, source, config, and skills—no proprietary artifacts. The pnpm install --frozen-lockfile ensures no unexpected transitive dependencies are pulled. This aligns with the licensing posture in docker/README.md. ✓

services/agent/docker/Dockerfile.dev (1)

1-41: ⚡ Quick win

Verify non-root user requirement (consistent with production Dockerfile).

The dev image follows the same structure as production but uses tsx watch for hot reload and relies on a baked extension bundle (since dist/ is not bind-mounted). The comment at lines 29–31 correctly explains that developers must rebuild the image after editing the extension or tracer—this is appropriate given the bind-mount strategy.

However, like the production Dockerfile, this image runs as root (Trivy DS-0002). If a non-root user is required or recommended, both images should be updated consistently. See the parallel review comment on the production Dockerfile for the suggested fix.

services/agent/README.md (2)

25-25: 💤 Low value

Fix missing language specifier on code fence (MD040).

The code block at line 25 is missing a language specifier. Markdown linters expect fenced code blocks to declare their content type.

 ## Layout (`src/`)
 
-```
+```text
 src/
   cli.ts              entrypoint: stdin/stdout (subprocess transport)
   server.ts           entrypoint: HTTP sidecar on :8765

Alternatively, use plaintext or bash depending on the intent.


1-122: ⚡ Quick win

Excellent contract documentation; one formatting nit.

The README comprehensively documents the service contract, invocation modes, engine architecture, tool routing, tracing integration, and authentication strategy. The engine vs. harness distinction (lines 46–54) is especially helpful for users. The local usage example is runnable and useful.

The only issue is the MD040 markdown lint warning on line 25 (code fence language specifier), flagged in the parallel comment above.

services/agent/src/engines/rivet.ts (1)

316-318: 💤 Low value

Minor: character-boundary truncation could produce malformed text.

transcript.slice(-maxChars) operates on UTF-16 code units and may split a surrogate pair (emoji, etc.), leaving an unpaired surrogate at the start. For a history replay prompt this is unlikely to cause model failures, but could produce garbled leading characters.

If you want to harden this, consider trimming to the next newline after slicing to align on a message boundary.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7aebe7ce-6165-4b85-9e83-59167f1260df

📥 Commits

Reviewing files that changed from the base of the PR and between 2f467ce and 965e180.

📒 Files selected for processing (18)
  • services/agent/README.md
  • services/agent/config/AGENTS.md
  • services/agent/config/agent.json
  • services/agent/docker/Dockerfile
  • services/agent/docker/Dockerfile.dev
  • services/agent/docker/README.md
  • services/agent/scripts/build-extension.mjs
  • services/agent/skills/agenta-getting-started/SKILL.md
  • services/agent/src/cli.ts
  • services/agent/src/engines/pi.ts
  • services/agent/src/engines/rivet.ts
  • services/agent/src/extensions/agenta.ts
  • services/agent/src/responder.ts
  • services/agent/src/server.ts
  • services/agent/src/tracing/otel.ts
  • services/agent/test/continuation.test.ts
  • services/agent/test/responder.test.ts
  • services/agent/test/stream-events.test.ts

Comment thread services/agent/src/cli.ts
Comment on lines +20 to +26
function runAgent(
request: AgentRunRequest,
emit?: EmitEvent,
): Promise<AgentRunResult> {
const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();
return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align CLI backend routing with server auto-routing semantics.

When backend is omitted (or set to auto via env), CLI currently falls through to runPi, so rivet-shaped requests (harness/sandbox) can execute on the wrong engine.

Suggested fix
 function runAgent(
   request: AgentRunRequest,
   emit?: EmitEvent,
 ): Promise<AgentRunResult> {
-  const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();
-  return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);
+  const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase();
+  if (backend === "rivet") return runRivet(request, emit);
+  if (backend === "pi") return runPi(request, emit);
+  return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function runAgent(
request: AgentRunRequest,
emit?: EmitEvent,
): Promise<AgentRunResult> {
const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();
return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);
}
function runAgent(
request: AgentRunRequest,
emit?: EmitEvent,
): Promise<AgentRunResult> {
const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase();
if (backend === "rivet") return runRivet(request, emit);
if (backend === "pi") return runPi(request, emit);
return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit);
}

Comment on lines +98 to +103
/** Apply vault-resolved provider keys to the process env so Pi's model auth can see them. */
function applySecrets(secrets: Record<string, string> | undefined): void {
for (const [key, value] of Object.entries(secrets ?? {})) {
if (value) process.env[key] = value;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Scope and restore injected secrets per run to prevent cross-request leakage.

The current secret injection mutates global process.env and never restores it. In a long-lived runner, later invocations can inherit prior tenant/provider credentials.

Suggested fix
-function applySecrets(secrets: Record<string, string> | undefined): void {
+function applySecrets(secrets: Record<string, string> | undefined): () => void {
+  const previous = new Map<string, string | undefined>();
   for (const [key, value] of Object.entries(secrets ?? {})) {
-    if (value) process.env[key] = value;
+    if (!value) continue;
+    if (!previous.has(key)) previous.set(key, process.env[key]);
+    process.env[key] = value;
   }
+  return () => {
+    for (const [key, value] of previous) {
+      if (value === undefined) delete process.env[key];
+      else process.env[key] = value;
+    }
+  };
 }
@@
-  applySecrets(request.secrets);
+  const restoreSecrets = applySecrets(request.secrets);
   const cwd = mkdtempSync(join(tmpdir(), "agenta-agent-"));
@@
   } finally {
+    restoreSecrets();
     try {
       rmSync(cwd, { recursive: true, force: true });
     } catch {

Also applies to: 213-215, 394-400

Comment on lines +46 to +60
let specs: ResolvedToolSpec[] = [];
try {
specs = JSON.parse(raw);
} catch (err) {
log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`);
return;
}
const authorization = process.env.AGENTA_TOOL_CALLBACK_AUTH;
// Daytona: the in-sandbox process can't reach Agenta, so tool calls are relayed through
// the runner via files in this dir. Unset for local runs (direct /tools/call).
const relayDir = process.env.AGENTA_TOOL_RELAY_DIR;

let registered = 0;
for (const spec of specs) {
// `client` tools are browser-fulfilled; there is no browser in the sandbox to answer.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate parsed tool-spec payload shape before iterating.

JSON.parse success is not enough here; if AGENTA_TOOL_SPECS is valid JSON but not an array, startup can still fail at runtime when iterating/reading fields. Guard the parsed shape and bail cleanly.

Suggested fix
-  let specs: ResolvedToolSpec[] = [];
+  let specs: ResolvedToolSpec[] = [];
   try {
-    specs = JSON.parse(raw);
+    const parsed = JSON.parse(raw);
+    if (!Array.isArray(parsed)) {
+      log("bad AGENTA_TOOL_SPECS: expected a JSON array");
+      return;
+    }
+    specs = parsed as ResolvedToolSpec[];
   } catch (err) {
     log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`);
     return;
   }

Comment on lines +117 to +124
pi.on("agent_end", async () => {
if (hasTracing) await otel.flush(); // invoke_agent has a remote parent → flush by id
if (usageOut) {
try {
writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8");
} catch (err) {
log(`usage writeback skipped: ${(err as Error).message}`);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep usage writeback resilient even if trace flush fails.

In the agent_end hook, a flush failure currently short-circuits the handler and can skip AGENTA_USAGE_OUT writeback. Wrap flush in its own try/catch and continue.

Suggested fix
   pi.on("agent_end", async () => {
-    if (hasTracing) await otel.flush(); // invoke_agent has a remote parent → flush by id
+    if (hasTracing) {
+      try {
+        await otel.flush(); // invoke_agent has a remote parent → flush by id
+      } catch (err) {
+        log(`trace flush skipped: ${(err as Error).message}`);
+      }
+    }
     if (usageOut) {
       try {
         writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8");
       } catch (err) {
         log(`usage writeback skipped: ${(err as Error).message}`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pi.on("agent_end", async () => {
if (hasTracing) await otel.flush(); // invoke_agent has a remote parent → flush by id
if (usageOut) {
try {
writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8");
} catch (err) {
log(`usage writeback skipped: ${(err as Error).message}`);
}
pi.on("agent_end", async () => {
if (hasTracing) {
try {
await otel.flush(); // invoke_agent has a remote parent → flush by id
} catch (err) {
log(`trace flush skipped: ${(err as Error).message}`);
}
}
if (usageOut) {
try {
writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8");
} catch (err) {
log(`usage writeback skipped: ${(err as Error).message}`);
}
}

Comment on lines +65 to +76
export function decisionToReply(
decision: PermissionDecision,
availableReplies: string[],
): string {
if (decision === "deny") {
return availableReplies.find((r) => r === "reject") ?? "reject";
}
return (
availableReplies.find((r) => r === "always") ??
availableReplies.find((r) => r === "once") ??
"once"
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return only replies offered by the harness.

On Line 70 and Lines 73-75, the fallback can emit "reject"/"once" even if those values are not present in availableReplies. Downstream (services/agent/src/engines/rivet.ts Line 881) forwards this directly to session.respondPermission(...), so an unsupported reply can break permission resolution for that turn.

Suggested fix
 export function decisionToReply(
   decision: PermissionDecision,
   availableReplies: string[],
 ): string {
   if (decision === "deny") {
-    return availableReplies.find((r) => r === "reject") ?? "reject";
+    return (
+      availableReplies.find((r) => r === "reject") ??
+      availableReplies[0] ??
+      "reject"
+    );
   }
   return (
     availableReplies.find((r) => r === "always") ??
     availableReplies.find((r) => r === "once") ??
-    "once"
+    availableReplies.find((r) => r !== "reject") ??
+    availableReplies[0] ??
+    "once"
   );
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function decisionToReply(
decision: PermissionDecision,
availableReplies: string[],
): string {
if (decision === "deny") {
return availableReplies.find((r) => r === "reject") ?? "reject";
}
return (
availableReplies.find((r) => r === "always") ??
availableReplies.find((r) => r === "once") ??
"once"
);
export function decisionToReply(
decision: PermissionDecision,
availableReplies: string[],
): string {
if (decision === "deny") {
return (
availableReplies.find((r) => r === "reject") ??
availableReplies[0] ??
"reject"
);
}
return (
availableReplies.find((r) => r === "always") ??
availableReplies.find((r) => r === "once") ??
availableReplies.find((r) => r !== "reject") ??
availableReplies[0] ??
"once"
);
}

Comment on lines +97 to +103
async function readBody(req: IncomingMessage): Promise<string> {
const chunks: Buffer[] = [];
for await (const chunk of req) {
chunks.push(chunk as Buffer);
}
return Buffer.concat(chunks).toString("utf8");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a request body size limit to /run reads.

readBody currently buffers arbitrarily large payloads in memory, so a single oversized request can exhaust the process and impact all in-flight runs.

Suggested fix
+const MAX_BODY_BYTES = 1 * 1024 * 1024; // 1 MiB
+
 async function readBody(req: IncomingMessage): Promise<string> {
   const chunks: Buffer[] = [];
+  let total = 0;
   for await (const chunk of req) {
-    chunks.push(chunk as Buffer);
+    const buf = chunk as Buffer;
+    total += buf.length;
+    if (total > MAX_BODY_BYTES) {
+      throw new Error("Payload too large");
+    }
+    chunks.push(buf);
   }
   return Buffer.concat(chunks).toString("utf8");
 }
-      const raw = await readBody(req);
+      let raw: string;
+      try {
+        raw = await readBody(req);
+      } catch (err) {
+        const msg = err instanceof Error ? err.message : String(err);
+        const status = msg === "Payload too large" ? 413 : 400;
+        return send(res, status, { ok: false, error: msg });
+      }

Also applies to: 111-118

Comment on lines +94 to +103
function defaultTarget(): ExportTarget {
const host = (process.env.AGENTA_HOST || "https://cloud.agenta.ai").replace(
/\/+$/,
"",
);
const apiKey = process.env.AGENTA_API_KEY || "";
return {
endpoint: `${host}/api/otlp/v1/traces`,
authorization: apiKey ? `ApiKey ${apiKey}` : undefined,
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make OTLP export opt-in instead of defaulting to cloud host.

Line 95 defaults to https://cloud.agenta.ai when AGENTA_HOST is unset, and Lines 135-136 consume that fallback for export. With captureContent defaulting to true, this can send prompt/tool content off-cluster on unconfigured runs. Prefer requiring an explicit endpoint (run config or env) and skipping export when none is configured.

Also applies to: 135-136

Comment on lines +23 to +32
delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
assert.equal(policyFromRequest(undefined), "auto");
assert.equal(policyFromRequest("auto"), "auto");
assert.equal(policyFromRequest("deny"), "deny");

process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore the prior env value to keep tests isolated.

Lines 23-32 mutate process.env.AGENTA_RIVET_DENY_PERMISSIONS but do not restore an existing original value, which can leak state into other tests running in the same process.

Suggested fix
 {
-  delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
-  assert.equal(policyFromRequest(undefined), "auto");
-  assert.equal(policyFromRequest("auto"), "auto");
-  assert.equal(policyFromRequest("deny"), "deny");
-
-  process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
-  assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
-  assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
-  delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+  const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+  try {
+    delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+    assert.equal(policyFromRequest(undefined), "auto");
+    assert.equal(policyFromRequest("auto"), "auto");
+    assert.equal(policyFromRequest("deny"), "deny");
+
+    process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
+    assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
+    assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
+  } finally {
+    if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+    else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev;
+  }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
assert.equal(policyFromRequest(undefined), "auto");
assert.equal(policyFromRequest("auto"), "auto");
assert.equal(policyFromRequest("deny"), "deny");
process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
}
{
const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS;
try {
delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
assert.equal(policyFromRequest(undefined), "auto");
assert.equal(policyFromRequest("auto"), "auto");
assert.equal(policyFromRequest("deny"), "deny");
process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
} finally {
if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev;
}
}

Copy link
Copy Markdown
Member Author

Codex subagent follow-up for #4778

Verified: these mirror the #4774 findings. #4778 head 965e180 has code-identical blobs to #4774 for the affected files I checked: services/agent/src/engines/rivet.ts (876020b...) and services/agent/src/tracing/otel.ts (2815564...).

  • services/agent/src/engines/rivet.ts:164 and services/agent/src/engines/rivet.ts:167: buildPiExtensionEnv serializes full request.customTools into AGENTA_TOOL_SPECS and puts AGENTA_TOOL_CALLBACK_AUTH in the harness env. For Pi-over-rivet this exposes code-tool spec.env scoped secrets plus callback auth to the whole harness process, bypassing the feat(agent): runner wire contract and tool execution #4773 boundary where code-tool secrets should only reach the subprocess that executes that tool.
  • services/agent/src/engines/rivet.ts:836 plus services/agent/src/tracing/otel.ts:873: local Pi ACP sets emitSpans: false, then handleUpdate returns before tool_call, tool_call_update, and usage_update handling. Text still flows, but local Pi-over-rivet drops tool and usage events from the result/stream event path. Split event recording from span emission.
  • services/agent/src/engines/rivet.ts:895, services/agent/src/engines/rivet.ts:896, and services/agent/src/engines/rivet.ts:901: run.finish() closes spans and run.flush() exports before final usage is read/computed. In services/agent/src/tracing/otel.ts:965 through services/agent/src/tracing/otel.ts:979, stream usage is only placed on the LLM span and invoke_agent ends without the run-total usage. Compute final usage before finish/flush and stamp totals on invoke_agent, matching the Pi extension path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend CLI devops Feature Request New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files. tests typescript

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant