feat(agent): runner engines, HTTP server, tracing, and docker image#4778
feat(agent): runner engines, HTTP server, tracing, and docker image#4778mmabrouk wants to merge 1 commit into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer guide: interesting codeThe interesting code, in reading order:
|
| endpoint: request.trace?.endpoint, | ||
| authorization: request.trace?.authorization, | ||
| captureContent: request.trace?.captureContent, | ||
| emitSpans: !isPi || isDaytona, |
There was a problem hiding this comment.
Split-tracing switch. emitSpans: !isPi || isDaytona. Local Pi self-instruments through the Agenta extension and emits its own span tree under the caller's parent; every other case (non-Pi harness, or Pi on Daytona where the sandbox can't reach Agenta's OTLP) has the runner rebuild spans from the ACP event stream. Worth checking the two paths yield the same span shape and never double-count usage.
| spans.push(span); | ||
| this.buffers.set(traceId, spans); | ||
| // No parent in this process => this is the local root and the trace is done. | ||
| if (!span.parentSpanId) { |
There was a problem hiding this comment.
Flush-by-trace-id is the load-bearing part. The !span.parentSpanId guard flushes when the local root ends, but invoke_agent has a remote parent that never ends in this process, so that signal never fires. The explicit per-trace flushTrace in each engine is then the only thing that exports the run. Buffering the whole trace into one OTLP batch also keeps Agenta's per-batch cumulative roll-up intact.
| const backend = (request.backend ?? DEFAULT_BACKEND).toLowerCase(); | ||
| if (backend === "rivet") return runRivet(request, emit, signal); | ||
| if (backend === "pi") return runPi(request, emit); | ||
| return request.harness || request.sandbox |
There was a problem hiding this comment.
Auto backend routing. With no explicit backend and the env default at auto, the engine is chosen from the request shape: harness or sandbox present routes to rivet, otherwise Pi. Confirm a request that should hit rivet always carries one of those fields, so the fallback can't silently pick the wrong engine.
| # | ||
| # Licensing posture (see docker/README.md): | ||
| # - Pi (@earendil-works/pi-coding-agent, MIT) is baked via the npm dependencies. | ||
| # - Claude Code is proprietary (Anthropic Commercial Terms). It is NEVER baked into |
There was a problem hiding this comment.
Licensing posture to verify: the image bakes Pi (MIT) via npm deps and never bakes Claude (proprietary). The daemon installs Claude from Anthropic at runtime over HTTPS (hence ca-certificates), keeping Anthropic as the distributor. No credential is baked. Check nothing in the build path embeds Claude or a key.
Railway Preview Environment
|
mmabrouk
left a comment
There was a problem hiding this comment.
Codex subagent review for #4778
Blocking security finding:
services/agent/src/engines/pi.ts:101andservices/agent/src/engines/pi.ts:213writerequest.secretsinto globalprocess.envfor the in-process Pi engine and never restore or remove them. This runner is also exposed by the long-lived HTTP sidecar (services/agent/src/server.ts:39routes explicitbackend: "pi", and the auto fallback can also reachrunPi), so a request carrying one project's vault key leavesOPENAI_API_KEY,ANTHROPIC_API_KEY, etc. in the shared process. A later request without that secret can authenticate with the previous project's key, and concurrentrunPirequests can race on whichever key was last assigned. The rivet path keeps secrets in a per-daemon env; the Pi path needs equivalent per-run isolation, or at least snapshot/restore/delete the touched provider env vars infinallyand avoid concurrent in-process Pi runs while global env is the auth channel. Please add a regression test that runs two Pi requests in the same process and proves the second cannot see the first request's secret.
Review-map and stack consistency:
- I read the #4779 ground-truth and PR-stack docs first. Those docs are useful as cross-PR design context, but #4779 is a main-based docs PR and its current pages describe code that lands in the code PRs rather than on #4779's own SHA. I treated them as review context, not as proof that #4779 itself contains this runtime.
- #4778's stack-nav/body are internally consistent with the active stack I saw: #4778 is based on #4773 and is the newer runner engines/server/tracing/docker-image PR. I did not find a stale #4774 reference in #4778's PR body or in the sampled #4778 files.
- The main #4778 code/Docker blobs I sampled match #4774 (
server.ts,pi.ts,rivet.ts,otel.ts,extensions/agenta.ts, Dockerfile/README, and package.json), so #4778 appears to supersede or duplicate #4774 rather than introduce runtime drift. Coordinate closing or merging only one, and carry the secret-isolation finding to whichever PR remains active.
Additional context:
- The split tracing path and
flushTracecalls look coherent: local Pi self-instruments, Daytona and non-Pi trace from ACP, and both engines explicitly flush by trace id before returning or surfacing errors. - Docker/package review did not show Claude Code or credentials baked into the production image; the documented runtime-install posture is intact. The OAuth fallback remains deployment-sensitive: cloud and multi-tenant runners must not mount host OAuth state.
Tests not run: review was through GitHub file/patch inspection only; I did not run local tests.
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (5)
services/agent/docker/Dockerfile (1)
1-56: ⚡ Quick winVerify non-root user requirement and licensing posture compliance.
The Dockerfile is well-structured for the stated purpose. However, Trivy flags a security concern and the past review requested verification of licensing compliance.
Security: The image runs as
root(noUSERdirective). While this is a common pattern, running containers as non-root is a security best practice and may be required depending on the deployment environment. Consider adding:RUN groupadd -r nodeapp && useradd -r -g nodeapp nodeapp USER nodeappIf non-root is not feasible due to permission constraints (e.g., runtime config injection, volume mounts), this should be explicitly documented in a comment.
Licensing: The Dockerfile comments correctly state that Pi (MIT) is baked via npm dependencies and Claude Code is never baked. The build path (lines 33–46) only copies
package.json,pnpm-lock.yaml, source, config, and skills—no proprietary artifacts. Thepnpm install --frozen-lockfileensures no unexpected transitive dependencies are pulled. This aligns with the licensing posture indocker/README.md. ✓services/agent/docker/Dockerfile.dev (1)
1-41: ⚡ Quick winVerify non-root user requirement (consistent with production Dockerfile).
The dev image follows the same structure as production but uses
tsx watchfor hot reload and relies on a baked extension bundle (sincedist/is not bind-mounted). The comment at lines 29–31 correctly explains that developers must rebuild the image after editing the extension or tracer—this is appropriate given the bind-mount strategy.However, like the production Dockerfile, this image runs as
root(Trivy DS-0002). If a non-root user is required or recommended, both images should be updated consistently. See the parallel review comment on the production Dockerfile for the suggested fix.services/agent/README.md (2)
25-25: 💤 Low valueFix missing language specifier on code fence (MD040).
The code block at line 25 is missing a language specifier. Markdown linters expect fenced code blocks to declare their content type.
## Layout (`src/`) -``` +```text src/ cli.ts entrypoint: stdin/stdout (subprocess transport) server.ts entrypoint: HTTP sidecar on :8765Alternatively, use
plaintextorbashdepending on the intent.
1-122: ⚡ Quick winExcellent contract documentation; one formatting nit.
The README comprehensively documents the service contract, invocation modes, engine architecture, tool routing, tracing integration, and authentication strategy. The engine vs. harness distinction (lines 46–54) is especially helpful for users. The local usage example is runnable and useful.
The only issue is the MD040 markdown lint warning on line 25 (code fence language specifier), flagged in the parallel comment above.
services/agent/src/engines/rivet.ts (1)
316-318: 💤 Low valueMinor: character-boundary truncation could produce malformed text.
transcript.slice(-maxChars)operates on UTF-16 code units and may split a surrogate pair (emoji, etc.), leaving an unpaired surrogate at the start. For a history replay prompt this is unlikely to cause model failures, but could produce garbled leading characters.If you want to harden this, consider trimming to the next newline after slicing to align on a message boundary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7aebe7ce-6165-4b85-9e83-59167f1260df
📒 Files selected for processing (18)
services/agent/README.mdservices/agent/config/AGENTS.mdservices/agent/config/agent.jsonservices/agent/docker/Dockerfileservices/agent/docker/Dockerfile.devservices/agent/docker/README.mdservices/agent/scripts/build-extension.mjsservices/agent/skills/agenta-getting-started/SKILL.mdservices/agent/src/cli.tsservices/agent/src/engines/pi.tsservices/agent/src/engines/rivet.tsservices/agent/src/extensions/agenta.tsservices/agent/src/responder.tsservices/agent/src/server.tsservices/agent/src/tracing/otel.tsservices/agent/test/continuation.test.tsservices/agent/test/responder.test.tsservices/agent/test/stream-events.test.ts
| function runAgent( | ||
| request: AgentRunRequest, | ||
| emit?: EmitEvent, | ||
| ): Promise<AgentRunResult> { | ||
| const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase(); | ||
| return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit); | ||
| } |
There was a problem hiding this comment.
Align CLI backend routing with server auto-routing semantics.
When backend is omitted (or set to auto via env), CLI currently falls through to runPi, so rivet-shaped requests (harness/sandbox) can execute on the wrong engine.
Suggested fix
function runAgent(
request: AgentRunRequest,
emit?: EmitEvent,
): Promise<AgentRunResult> {
- const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase();
- return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit);
+ const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase();
+ if (backend === "rivet") return runRivet(request, emit);
+ if (backend === "pi") return runPi(request, emit);
+ return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function runAgent( | |
| request: AgentRunRequest, | |
| emit?: EmitEvent, | |
| ): Promise<AgentRunResult> { | |
| const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "pi").toLowerCase(); | |
| return backend === "rivet" ? runRivet(request, emit) : runPi(request, emit); | |
| } | |
| function runAgent( | |
| request: AgentRunRequest, | |
| emit?: EmitEvent, | |
| ): Promise<AgentRunResult> { | |
| const backend = (request.backend ?? process.env.AGENT_BACKEND ?? "auto").toLowerCase(); | |
| if (backend === "rivet") return runRivet(request, emit); | |
| if (backend === "pi") return runPi(request, emit); | |
| return request.harness || request.sandbox ? runRivet(request, emit) : runPi(request, emit); | |
| } |
| /** Apply vault-resolved provider keys to the process env so Pi's model auth can see them. */ | ||
| function applySecrets(secrets: Record<string, string> | undefined): void { | ||
| for (const [key, value] of Object.entries(secrets ?? {})) { | ||
| if (value) process.env[key] = value; | ||
| } | ||
| } |
There was a problem hiding this comment.
Scope and restore injected secrets per run to prevent cross-request leakage.
The current secret injection mutates global process.env and never restores it. In a long-lived runner, later invocations can inherit prior tenant/provider credentials.
Suggested fix
-function applySecrets(secrets: Record<string, string> | undefined): void {
+function applySecrets(secrets: Record<string, string> | undefined): () => void {
+ const previous = new Map<string, string | undefined>();
for (const [key, value] of Object.entries(secrets ?? {})) {
- if (value) process.env[key] = value;
+ if (!value) continue;
+ if (!previous.has(key)) previous.set(key, process.env[key]);
+ process.env[key] = value;
}
+ return () => {
+ for (const [key, value] of previous) {
+ if (value === undefined) delete process.env[key];
+ else process.env[key] = value;
+ }
+ };
}
@@
- applySecrets(request.secrets);
+ const restoreSecrets = applySecrets(request.secrets);
const cwd = mkdtempSync(join(tmpdir(), "agenta-agent-"));
@@
} finally {
+ restoreSecrets();
try {
rmSync(cwd, { recursive: true, force: true });
} catch {Also applies to: 213-215, 394-400
| let specs: ResolvedToolSpec[] = []; | ||
| try { | ||
| specs = JSON.parse(raw); | ||
| } catch (err) { | ||
| log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`); | ||
| return; | ||
| } | ||
| const authorization = process.env.AGENTA_TOOL_CALLBACK_AUTH; | ||
| // Daytona: the in-sandbox process can't reach Agenta, so tool calls are relayed through | ||
| // the runner via files in this dir. Unset for local runs (direct /tools/call). | ||
| const relayDir = process.env.AGENTA_TOOL_RELAY_DIR; | ||
|
|
||
| let registered = 0; | ||
| for (const spec of specs) { | ||
| // `client` tools are browser-fulfilled; there is no browser in the sandbox to answer. |
There was a problem hiding this comment.
Validate parsed tool-spec payload shape before iterating.
JSON.parse success is not enough here; if AGENTA_TOOL_SPECS is valid JSON but not an array, startup can still fail at runtime when iterating/reading fields. Guard the parsed shape and bail cleanly.
Suggested fix
- let specs: ResolvedToolSpec[] = [];
+ let specs: ResolvedToolSpec[] = [];
try {
- specs = JSON.parse(raw);
+ const parsed = JSON.parse(raw);
+ if (!Array.isArray(parsed)) {
+ log("bad AGENTA_TOOL_SPECS: expected a JSON array");
+ return;
+ }
+ specs = parsed as ResolvedToolSpec[];
} catch (err) {
log(`bad AGENTA_TOOL_SPECS: ${(err as Error).message}`);
return;
}| pi.on("agent_end", async () => { | ||
| if (hasTracing) await otel.flush(); // invoke_agent has a remote parent → flush by id | ||
| if (usageOut) { | ||
| try { | ||
| writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8"); | ||
| } catch (err) { | ||
| log(`usage writeback skipped: ${(err as Error).message}`); | ||
| } |
There was a problem hiding this comment.
Keep usage writeback resilient even if trace flush fails.
In the agent_end hook, a flush failure currently short-circuits the handler and can skip AGENTA_USAGE_OUT writeback. Wrap flush in its own try/catch and continue.
Suggested fix
pi.on("agent_end", async () => {
- if (hasTracing) await otel.flush(); // invoke_agent has a remote parent → flush by id
+ if (hasTracing) {
+ try {
+ await otel.flush(); // invoke_agent has a remote parent → flush by id
+ } catch (err) {
+ log(`trace flush skipped: ${(err as Error).message}`);
+ }
+ }
if (usageOut) {
try {
writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8");
} catch (err) {
log(`usage writeback skipped: ${(err as Error).message}`);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pi.on("agent_end", async () => { | |
| if (hasTracing) await otel.flush(); // invoke_agent has a remote parent → flush by id | |
| if (usageOut) { | |
| try { | |
| writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8"); | |
| } catch (err) { | |
| log(`usage writeback skipped: ${(err as Error).message}`); | |
| } | |
| pi.on("agent_end", async () => { | |
| if (hasTracing) { | |
| try { | |
| await otel.flush(); // invoke_agent has a remote parent → flush by id | |
| } catch (err) { | |
| log(`trace flush skipped: ${(err as Error).message}`); | |
| } | |
| } | |
| if (usageOut) { | |
| try { | |
| writeFileSync(usageOut, JSON.stringify(otel.usage()), "utf-8"); | |
| } catch (err) { | |
| log(`usage writeback skipped: ${(err as Error).message}`); | |
| } | |
| } |
| export function decisionToReply( | ||
| decision: PermissionDecision, | ||
| availableReplies: string[], | ||
| ): string { | ||
| if (decision === "deny") { | ||
| return availableReplies.find((r) => r === "reject") ?? "reject"; | ||
| } | ||
| return ( | ||
| availableReplies.find((r) => r === "always") ?? | ||
| availableReplies.find((r) => r === "once") ?? | ||
| "once" | ||
| ); |
There was a problem hiding this comment.
Return only replies offered by the harness.
On Line 70 and Lines 73-75, the fallback can emit "reject"/"once" even if those values are not present in availableReplies. Downstream (services/agent/src/engines/rivet.ts Line 881) forwards this directly to session.respondPermission(...), so an unsupported reply can break permission resolution for that turn.
Suggested fix
export function decisionToReply(
decision: PermissionDecision,
availableReplies: string[],
): string {
if (decision === "deny") {
- return availableReplies.find((r) => r === "reject") ?? "reject";
+ return (
+ availableReplies.find((r) => r === "reject") ??
+ availableReplies[0] ??
+ "reject"
+ );
}
return (
availableReplies.find((r) => r === "always") ??
availableReplies.find((r) => r === "once") ??
- "once"
+ availableReplies.find((r) => r !== "reject") ??
+ availableReplies[0] ??
+ "once"
);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function decisionToReply( | |
| decision: PermissionDecision, | |
| availableReplies: string[], | |
| ): string { | |
| if (decision === "deny") { | |
| return availableReplies.find((r) => r === "reject") ?? "reject"; | |
| } | |
| return ( | |
| availableReplies.find((r) => r === "always") ?? | |
| availableReplies.find((r) => r === "once") ?? | |
| "once" | |
| ); | |
| export function decisionToReply( | |
| decision: PermissionDecision, | |
| availableReplies: string[], | |
| ): string { | |
| if (decision === "deny") { | |
| return ( | |
| availableReplies.find((r) => r === "reject") ?? | |
| availableReplies[0] ?? | |
| "reject" | |
| ); | |
| } | |
| return ( | |
| availableReplies.find((r) => r === "always") ?? | |
| availableReplies.find((r) => r === "once") ?? | |
| availableReplies.find((r) => r !== "reject") ?? | |
| availableReplies[0] ?? | |
| "once" | |
| ); | |
| } |
| async function readBody(req: IncomingMessage): Promise<string> { | ||
| const chunks: Buffer[] = []; | ||
| for await (const chunk of req) { | ||
| chunks.push(chunk as Buffer); | ||
| } | ||
| return Buffer.concat(chunks).toString("utf8"); | ||
| } |
There was a problem hiding this comment.
Add a request body size limit to /run reads.
readBody currently buffers arbitrarily large payloads in memory, so a single oversized request can exhaust the process and impact all in-flight runs.
Suggested fix
+const MAX_BODY_BYTES = 1 * 1024 * 1024; // 1 MiB
+
async function readBody(req: IncomingMessage): Promise<string> {
const chunks: Buffer[] = [];
+ let total = 0;
for await (const chunk of req) {
- chunks.push(chunk as Buffer);
+ const buf = chunk as Buffer;
+ total += buf.length;
+ if (total > MAX_BODY_BYTES) {
+ throw new Error("Payload too large");
+ }
+ chunks.push(buf);
}
return Buffer.concat(chunks).toString("utf8");
}- const raw = await readBody(req);
+ let raw: string;
+ try {
+ raw = await readBody(req);
+ } catch (err) {
+ const msg = err instanceof Error ? err.message : String(err);
+ const status = msg === "Payload too large" ? 413 : 400;
+ return send(res, status, { ok: false, error: msg });
+ }Also applies to: 111-118
| function defaultTarget(): ExportTarget { | ||
| const host = (process.env.AGENTA_HOST || "https://cloud.agenta.ai").replace( | ||
| /\/+$/, | ||
| "", | ||
| ); | ||
| const apiKey = process.env.AGENTA_API_KEY || ""; | ||
| return { | ||
| endpoint: `${host}/api/otlp/v1/traces`, | ||
| authorization: apiKey ? `ApiKey ${apiKey}` : undefined, | ||
| }; |
There was a problem hiding this comment.
Make OTLP export opt-in instead of defaulting to cloud host.
Line 95 defaults to https://cloud.agenta.ai when AGENTA_HOST is unset, and Lines 135-136 consume that fallback for export. With captureContent defaulting to true, this can send prompt/tool content off-cluster on unconfigured runs. Prefer requiring an explicit endpoint (run config or env) and skipping export when none is configured.
Also applies to: 135-136
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | ||
| assert.equal(policyFromRequest(undefined), "auto"); | ||
| assert.equal(policyFromRequest("auto"), "auto"); | ||
| assert.equal(policyFromRequest("deny"), "deny"); | ||
|
|
||
| process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; | ||
| assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); | ||
| assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); | ||
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | ||
| } |
There was a problem hiding this comment.
Restore the prior env value to keep tests isolated.
Lines 23-32 mutate process.env.AGENTA_RIVET_DENY_PERMISSIONS but do not restore an existing original value, which can leak state into other tests running in the same process.
Suggested fix
{
- delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
- assert.equal(policyFromRequest(undefined), "auto");
- assert.equal(policyFromRequest("auto"), "auto");
- assert.equal(policyFromRequest("deny"), "deny");
-
- process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
- assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
- assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
- delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ try {
+ delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ assert.equal(policyFromRequest(undefined), "auto");
+ assert.equal(policyFromRequest("auto"), "auto");
+ assert.equal(policyFromRequest("deny"), "deny");
+
+ process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true";
+ assert.equal(policyFromRequest(undefined), "deny", "env forces deny");
+ assert.equal(policyFromRequest("auto"), "deny", "env overrides auto");
+ } finally {
+ if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS;
+ else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev;
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| assert.equal(policyFromRequest(undefined), "auto"); | |
| assert.equal(policyFromRequest("auto"), "auto"); | |
| assert.equal(policyFromRequest("deny"), "deny"); | |
| process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; | |
| assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); | |
| assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); | |
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| } | |
| { | |
| const prev = process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| try { | |
| delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| assert.equal(policyFromRequest(undefined), "auto"); | |
| assert.equal(policyFromRequest("auto"), "auto"); | |
| assert.equal(policyFromRequest("deny"), "deny"); | |
| process.env.AGENTA_RIVET_DENY_PERMISSIONS = "true"; | |
| assert.equal(policyFromRequest(undefined), "deny", "env forces deny"); | |
| assert.equal(policyFromRequest("auto"), "deny", "env overrides auto"); | |
| } finally { | |
| if (prev === undefined) delete process.env.AGENTA_RIVET_DENY_PERMISSIONS; | |
| else process.env.AGENTA_RIVET_DENY_PERMISSIONS = prev; | |
| } | |
| } |
|
Codex subagent follow-up for #4778 Verified: these mirror the #4774 findings. #4778 head
|
Part of the agent-workflows PR set (sliced by functional area, final code only)
Most of these are independent off
main. Two short stacks exist (an indented item stacks on the line above it). Each PR shows only its own area.This PR's base is #4773. Merge that first.
Context
The runner is the TypeScript process that actually drives a coding agent for one
/invoke. The wire contract and tool resolution land in PR #4773 (this PR's base). This PR adds the part that runs the agent: the two engines behind the contract, the HTTP sidecar that fronts them, the tracing that ties a run into the caller's trace, the harness assets, and the production image. With this, the Python service can hand a runner one request and get back a traced agent run.What this changes
This PR adds two engines that satisfy the same
/runcontract:engines/pi.tsdrives the Pi SDK in process. It builds a hermetic session in a throwaway temp dir, injectsAGENTS.mdand forced skills in memory, resolves the model, runs one user turn, and returns the structured result. Provider keys come from the request secrets or the local Pi login.engines/rivet.tsdrives a harness (Pi or Claude) over the Agent Client Protocol through a rivetsandbox-agentdaemon. The harness (which engine) and the sandbox (local or Daytona) are two independent axes that swap by config, not by new code.server.tsis the long-lived HTTP sidecar. It servesGET /healthandPOST /run, supports one-shot JSON and NDJSON streaming, and routes each request to an engine.cli.tsis the one-shot stdin/stdout transport for the same contract.responder.tsis the seam that answers the permission gates a harness raises; today a fixed auto-or-deny policy answers them headlessly.tracing/otel.tsandextensions/agenta.tsturn a run into an OpenTelemetry span tree that nests under the caller's/invokespan.config/andskills/hold the runtime harness assets.docker/holds the production Dockerfile, a dev Dockerfile, and a README that documents the licensing posture.Key architectural decision to review
Split tracing, decided by one switch.
engines/rivet.ts:836setsemitSpans: !isPi || isDaytona. The reason is reachability. Local Pi self-instruments: the runner propagates the caller's trace context into Pi through the Agenta extension (extensions/agenta.ts), and Pi emits its own real span tree (turn, chat, tool, with token usage) under that parent. Every other case has the runner rebuild the spans from the ACP session/update event stream instead. Daytona is the case to scrutinize: the in-sandbox process cannot reach Agenta's OTLP, so even for Pi the extension drops tracing and keeps only tools plus a usage writeback, and the runner traces from the event stream. Check that the two paths produce the same span shape and that usage is never double counted.Flush by trace id.
tracing/otel.tsbuffers a trace's spans and exports them in one OTLP batch (TraceBatchProcessor). Agenta rolls up cumulative token and cost metrics per ingest batch, so a trace split across batches loses its root aggregation. The processor flushes on two signals: the root span ends (a standalone run), or the run flushes explicitly by trace id. The second path is the one that matters here.invoke_agenthas a remote parent that never ends inside the runner, so the root-end signal never fires and the explicit per-trace flush is the only thing that exports the run. CheckonEnd(the!span.parentSpanIdguard atotel.ts:125) against the explicitflushTracecalls in both engines.Auto backend routing.
server.tspicks the engine from the request. An explicitbackendwins, then theAGENT_BACKENDenv default, andautofalls back to the request shape: a request carryingharnessorsandboxgoes to rivet, otherwise to Pi (server.ts:40). Confirm the fallback cannot send a request to the wrong engine.Docker licensing posture.
docker/README.mdanddocker/Dockerfilestate the rule: the image bakes Pi (MIT) through the npm dependencies and never bakes or distributes Claude (proprietary, under Anthropic's Commercial Terms, which grant use but not redistribution). The daemon installs Claude from Anthropic at runtime over HTTPS, which keeps Anthropic as the distributor. No credential is baked. For the Daytona path, the repo ships the snapshot builder recipe, not a prebuilt Claude-containing image. Verify the Dockerfile bakes only Pi and that nothing in the build path embeds Claude or a key.How to review this PR
Read
server.tsfirst. It is the entry point and shows the contract and the routing in one screen. Then readengines/pi.tsfor the simpler in-process path, thenengines/rivet.tsfor the ACP path and theemitSpansswitch. Then readextensions/agenta.tsandtracing/otel.tstogether, since the split-tracing decision only makes sense across both. Finish withdocker/README.mdanddocker/Dockerfilefor the licensing posture.Skip
engines/rivet.tsbinary resolution and the Daytona upload helpers on a first pass; they are plumbing. The likeliest regression sits in the tracing seam: a run whose spans never flush (a missing per-trace flush), or a run that flushes twice and double counts usage.Tests / notes
The tests run offline with no harness and no network.
stream-events.test.tsdrives thecreateRivetOteldelta and lifecycle state machine with a hand-built ACPsession/updatesequence and asserts the streaming and one-shot event shapes.responder.test.tscovers the permission policy and that an out-of-streaminteraction_requestlands in both the live sink and the batch event log.continuation.test.tscovers the cross-turn substrate: tool turns survive into the replayed transcript, which the cold harness needs because ACP prompt content blocks cannot carry tool calls or results.