Agenta-AI · mmabrouk · Jun 19, 2026 · coderabbitai
diff --git a/services/agent/README.md b/services/agent/README.md
@@ -0,0 +1,121 @@
+# Agent runner (TypeScript)
+
+The Node side of the agent workflow service. It runs the actual agent loop and serves one
+contract: a JSON request in, a structured result out. The Python service
+(`services/oss/src/agent/`) decides *what* to run (config, tools, secrets, trace) and calls
+in here; this package *runs* it. It lives in Node because the harnesses (Pi, Claude Code,
+rivet's `sandbox-agent`) are Node libraries with no Python SDK.
+
+## How it is invoked
+
+Two entrypoints, same `/run` contract (see `src/protocol.ts`):
+
+- **`src/cli.ts`** — one JSON request on stdin, one result on stdout. The Python
+  SDK adapters use this subprocess transport when `AGENTA_AGENT_PI_URL` is unset. stdout is
+  the result channel only; logs go to stderr.
+- **`src/server.ts`** — the same thing as a long-lived HTTP server on `:8765`
+  (`GET /health`, `POST /run`). This is the dockerized agent runner sidecar the Python SDK
+  adapters call over HTTP when `AGENTA_AGENT_PI_URL` points at it. The dev image
+  (`docker/Dockerfile.dev`) runs `tsx watch src/server.ts`.
+
+Both route to an engine by the request's `backend` field.
+
+## Layout (`src/`)
+
+```
+src/
+  cli.ts              entrypoint: stdin/stdout (subprocess transport)
+  server.ts           entrypoint: HTTP sidecar on :8765
+  protocol.ts         the /run wire contract (request, result, events, capabilities)
+  engines/
+    pi.ts             engine: drive the Pi SDK in-process
+    rivet.ts          engine: drive a harness over ACP via a rivet sandbox-agent daemon
+  tracing/
+    otel.ts           turn a run into OpenTelemetry spans nested under /invoke
+  tools/
+    callback.ts       the one /tools/call HTTP client
+    code.ts           execute resolved code tools in a scoped subprocess
+    dispatch.ts       dispatch resolved tools by executor kind
+    mcp-bridge.ts     build the MCP server config that exposes tools to a harness
+    mcp-server.ts     the stdio MCP server itself (launched per session by the daemon)
+  extensions/
+    agenta.ts         the Pi extension (tracing + tools), bundled into dist/ for Pi to load
+```
+
+## Engines
+
+- **`pi`** (`engines/pi.ts`) — drives the Pi SDK directly in-process.
+- **`rivet`** (`engines/rivet.ts`) — drives any harness (`pi`, `claude`) over the Agent
+  Client Protocol through a rivet `sandbox-agent` daemon, either local or in a Daytona
+  sandbox. This is the default on the platform.
+
+The engine is a deployment choice (`backend` on the wire / `AGENT_BACKEND` env), not a
+harness. Harness choice (`pi`, `claude`, or experimental `agenta`) and sandbox (`local` or
+`daytona`, where supported) are per-run config the Python service sends.
+
+## Result
+
+```json
+{
+  "ok": true,
+  "output": "Rome",
+  "messages": [{ "role": "assistant", "content": "Rome" }],
+  "events": [{ "type": "message", "text": "Rome" }, { "type": "done" }],
+  "usage": { "input": 1297, "output": 5, "total": 1302, "cost": 0.0066 },
+  "stopReason": "end_turn",
+  "capabilities": { "mcpTools": false, "images": true, "...": "..." },
+  "sessionId": "...",
+  "model": "openai-codex/gpt-5.5",
+  "traceId": "..."
+}
+```
+
+`runRivet` probes the harness's capabilities and branches on them (for example, tools go
+over MCP only when the harness advertises `mcpTools`); usage and the structured event log
+come back on every run.
+
+## Tracing
+
+When the request carries a `trace` block, the run is exported to Agenta as OpenTelemetry
+spans nested under the caller's `/invoke` span. The Pi path self-instruments via the
+bundled extension (`extensions/agenta.ts`); other harnesses are traced from the rivet ACP
+event stream (`tracing/otel.ts`). The Python `tracing` module fills `trace` in from the
+live workflow span.
+
+## Tools
+
+Tools are resolved in the Python backend and arrive on the request as `customTools` plus a
+`toolCallback`. Delivery is capability-routed: the Pi extension registers them natively;
+other harnesses get them over MCP through `tools/mcp-bridge.ts` + `tools/mcp-server.ts`.
+Either way each call POSTs back to Agenta's `/tools/call` (`tools/callback.ts`), so the
+provider key and connection auth stay server-side.
+
+## The extension bundle
+
+`scripts/build-extension.mjs` esbuild-bundles `src/extensions/agenta.ts` into one
+self-contained `dist/extensions/agenta.js` that Pi can load anywhere (host, the sidecar, a
+Daytona snapshot). The dev image bakes it; rebuild after editing the extension or the
+tracer:
+
+```bash
+pnpm run build:extension
+```
+
+## Auth
+
+Provider keys arrive as `request.secrets` (resolved from the project vault) or fall back to
+the harness's own login: Pi reads `~/.pi/agent/auth.json` (`pnpm exec pi` then `/login`),
+Claude Code reads `~/.claude`. Set `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` to override.
+
+## config/
+
+`config/AGENTS.md` and `config/agent.json` are a fallback "hello-world" agent, used only
+when a request arrives with no config. In practice the playground always sends the agent
+revision's config, so these are rarely hit.
+
+## Local use
+
+```bash
+pnpm install
+echo '{"backend":"pi","messages":[{"role":"user","content":"Hi"}]}' | pnpm run run:cli
+```
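The result shape documented in this README can be consumed with a small typed parser. The following is a minimal sketch, not the contents of `src/protocol.ts`: the field names are taken from the README's example result, while `RunResult`, `RunEvent`, `parseRunResult`, and `collectText` are hypothetical names introduced here for illustration, and all fields other than `ok` are assumed to be optional.

```typescript
// Field names mirror the README's example result. The type and helper names are
// hypothetical and are not taken from src/protocol.ts.
interface RunEvent {
  type: string;
  text?: string;
}

interface RunResult {
  ok: boolean;
  output?: string;
  messages?: { role: string; content: string }[];
  events?: RunEvent[];
  usage?: { input: number; output: number; total: number; cost: number };
  stopReason?: string;
  capabilities?: Record<string, unknown>;
  sessionId?: string;
  model?: string;
  traceId?: string;
}

// Parse the single JSON document the runner returns (stdout for the CLI,
// response body for POST /run) and check the one field this sketch relies on.
function parseRunResult(raw: string): RunResult {
  const parsed: unknown = JSON.parse(raw);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { ok?: unknown }).ok !== "boolean"
  ) {
    throw new Error("not a /run result: missing boolean `ok`");
  }
  return parsed as RunResult;
}

// Concatenate the text of every "message" event in the structured event log.
function collectText(events: RunEvent[]): string {
  return events
    .filter((e) => e.type === "message" && typeof e.text === "string")
    .map((e) => e.text)
    .join("");
}

const example = parseRunResult(
  '{"ok":true,"output":"Rome","events":[{"type":"message","text":"Rome"},{"type":"done"}]}',
);
console.log(example.ok, collectText(example.events ?? [])); // prints: true Rome
```

Validating only `ok` keeps the sketch tolerant of fields the real contract may add or omit; a stricter caller would validate against the types in `src/protocol.ts`.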
diff --git a/services/agent/config/AGENTS.md b/services/agent/config/AGENTS.md
@@ -0,0 +1,7 @@
+# Hello-world agent
+
+You are a friendly hello-world agent running on the Agenta agent service.
+
+- Greet the user warmly.
+- Answer the user's message in one or two short sentences.
+- Do not use tools. Keep replies plain text.
diff --git a/services/agent/config/agent.json b/services/agent/config/agent.json
@@ -0,0 +1,4 @@
+{
+  "model": "gpt-5.5",
+  "tools": []
+}
diff --git a/services/agent/docker/Dockerfile b/services/agent/docker/Dockerfile
@@ -0,0 +1,55 @@
+# Agent runner sidecar (sandbox-agent server), production image.
+#
+# Runs the TypeScript runner (src/server.ts) as a long-lived HTTP server on :8765.
+# The Python agent service calls it in-network. Unlike Dockerfile.dev there is no
+# `tsx watch` and no bind mount: the source is baked in.
+#
+# Licensing posture (see docker/README.md):
+#   - Pi (@earendil-works/pi-coding-agent, MIT) is baked via the npm dependencies.
+#   - Claude Code is proprietary (Anthropic Commercial Terms). It is NEVER baked into
+#     this image. The sandbox-agent daemon installs it at runtime from Anthropic over
+#     HTTPS (the reason ca-certificates is installed). That keeps Anthropic as the
+#     distributor, the only compliant path for an image we build and ship.
+#   - No credential is baked: no API key, no OAuth login. Auth is injected at runtime
+#     (ANTHROPIC_API_KEY / request secrets; OAuth self-host is a mounted opt-in only).
+
+FROM node:24-slim
+
+WORKDIR /app
+
+# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude
+# Code) over HTTPS using the system trust store, which node:*-slim omits — without this
+# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers
+# fetch git deps.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN corepack enable
+
+# Install deps as a cached layer (manifest + lockfile only). The full dependency set is
+# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,
+# both devDependencies.
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+
+# Bake the source (no bind mount in production).
+COPY tsconfig.json ./
+COPY scripts ./scripts
+COPY src ./src
+COPY config ./config
+COPY skills ./skills
+
+# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs
+# this baked copy into Pi's agent dir on every run. Rebuild the image after editing
+# src/extensions/agenta.ts or the tracer.
+RUN pnpm run build:extension
+
+ENV NODE_ENV=production \
+    PORT=8765
+
+EXPOSE 8765
+
+# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
+# container runs as a non-root host uid.
+CMD ["node_modules/.bin/tsx", "src/server.ts"]
-FROM node:24-slim
-
-WORKDIR /app
-
-# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude
-# Code) over HTTPS using the system trust store, which node:*-slim omits — without this
-# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers
-# fetch git deps.
-RUN apt-get update \
-    && apt-get install -y --no-install-recommends ca-certificates git \
-    && rm -rf /var/lib/apt/lists/*
-
-RUN corepack enable
-
-# Install deps as a cached layer (manifest + lockfile only). The full dependency set is
-# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,
-# both devDependencies.
-COPY package.json pnpm-lock.yaml ./
-RUN pnpm install --frozen-lockfile
-
-# Bake the source (no bind mount in production).
-COPY tsconfig.json ./
-COPY scripts ./scripts
-COPY src ./src
-COPY config ./config
-COPY skills ./skills
-
-# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs
-# this baked copy into Pi's agent dir on every run. Rebuild the image after editing
-# src/extensions/agenta.ts or the tracer.
-RUN pnpm run build:extension
-
-ENV NODE_ENV=production \
-    PORT=8765
-
-EXPOSE 8765
-
-# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
-# container runs as a non-root host uid.
-CMD ["node_modules/.bin/tsx", "src/server.ts"]
+FROM node:24-slim
+
+WORKDIR /app
+
+# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude
+# Code) over HTTPS using the system trust store, which node:*-slim omits — without this
+# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers
+# fetch git deps.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN corepack enable
+
+# Install deps as a cached layer (manifest + lockfile only). The full dependency set is
+# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,
+# both devDependencies.
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+
+# Bake the source (no bind mount in production).
+COPY tsconfig.json ./
+COPY scripts ./scripts
+COPY src ./src
+COPY config ./config
+COPY skills ./skills
+
+# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs
+# this baked copy into Pi's agent dir on every run. Rebuild the image after editing
+# src/extensions/agenta.ts or the tracer.
+RUN pnpm run build:extension
+
+ENV NODE_ENV=production \
+    PORT=8765
+
+RUN groupadd --system app && useradd --system --gid app --create-home app \
+    && chown -R app:app /app
+USER app
+
+EXPOSE 8765
+
+# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
+# container runs as a non-root host uid.
+CMD ["node_modules/.bin/tsx", "src/server.ts"]
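Because the image starts the server on `:8765` and the README lists `GET /health`, a caller may want to wait for the sidecar to come up before sending `/run` requests. The following is a minimal sketch under stated assumptions: `waitForHealthy` and `FetchLike` are hypothetical names, the retry counts are arbitrary, and only the HTTP status is checked because the shape of the `/health` response body is not documented here.

```typescript
// Hypothetical readiness probe for the sidecar's GET /health endpoint.
// Only `ok` (a 2xx status) is inspected; the response body is not assumed.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

async function waitForHealthy(
  baseUrl: string,
  fetchFn: FetchLike,
  attempts = 10,
  delayMs = 500,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/health`);
      if (res.ok) return true;
    } catch {
      // Connection errors are expected while the server is still starting.
    }
    // Sleep between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

On a Node version with a global `fetch`, this could be called as `waitForHealthy("http://localhost:8765", fetch)`. Passing the fetch function in keeps the probe testable without a running container.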
diff --git a/services/agent/docker/Dockerfile.dev b/services/agent/docker/Dockerfile.dev
@@ -0,0 +1,41 @@
+# Agent runner sidecar (WP-2), dev image.
+#
+# Runs the TypeScript runner (src/server.ts) as an HTTP server on :8765. The Python agent
+# service calls it in-network. Source is bind-mounted in dev so `tsx watch` hot-reloads;
+# node_modules stays baked into the image. Build context is services/agent.
+
+FROM node:24-slim
+
+WORKDIR /app
+
+# CA certificates: the rivet daemon (Rust) downloads harness CLIs (e.g. Claude Code) over
+# HTTPS using the system trust store, which node:*-slim omits — without this the daemon's
+# `install-agent claude` fails TLS verification. git lets npm/installers fetch git deps.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN corepack enable
+
+# Install deps as a cached layer (manifest + lockfile only).
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+
+# Fallback copy for non-mounted runs; in dev these are bind-mounted over.
+COPY tsconfig.json ./
+COPY scripts ./scripts
+COPY src ./src
+
+# Bundle the Agenta Pi extension (tracing + tools) into dist/. dist/ is NOT bind-mounted
+# in dev, so this baked copy is what runRivet installs into Pi's agent dir. Rebuild the
+# image after editing src/extensions/agenta.ts or src/tracing/otel.ts.
+RUN pnpm run build:extension
+
+ENV NODE_ENV=development \
+    PORT=8765
+
+EXPOSE 8765
+
+# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
+# container runs as a non-root host uid.
+CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]
diff --git a/services/agent/docker/README.md b/services/agent/docker/README.md
@@ -0,0 +1,66 @@
+# Agent sidecar images
+
+Images for the agent runner sidecar (the `sandbox-agent server` runtime in
+`services/agent/src/server.ts`). The Python service calls it in-network at
+`:8765`.
+
+- `Dockerfile.dev` — dev image. `tsx watch`, source bind-mounted, hot reload.
+- `Dockerfile` — production image. Source baked in, no watcher.
+
+## Licensing posture (read before changing any image or build recipe)
+
+The rule that shapes every image here:
+
+> **We ship build recipes, not Claude-containing images, and we never bake a
+> credential into any image.**
+
+Why:
+
+- **Pi** (`@earendil-works/pi-coding-agent`) is MIT. We bake it freely via the npm
+  dependencies, in every image and snapshot.
+- **Claude Code** is proprietary (© Anthropic PBC, governed by Anthropic's
+  [Commercial Terms](https://www.anthropic.com/legal/commercial-terms);
+  [legal & compliance](https://code.claude.com/docs/en/legal-and-compliance)). The
+  Commercial Terms grant a usage license only. They do not grant any right to
+  redistribute, resell, sublicense, or repackage the Services. So an image **we
+  build and distribute must not contain Claude Code.**
+- Claude Code is installed **from Anthropic** (`npm install -g
+  @anthropic-ai/claude-code`, `https://claude.ai/install.sh`, or the daemon's
+  `install-agent claude`). That keeps Anthropic as the distributor, which is the
+  permitted path. The production sidecar does this at runtime; a snapshot we build
+  for our own use does it at build time.
+
+## Authentication
+
+Auth is injected at runtime, never baked into a layer.
+
+- **API key (default, and the only option for cloud / multi-tenant).** Set
+  `ANTHROPIC_API_KEY` (or pass provider keys as request secrets from the vault).
+  Anthropic directs products and services that interact with Claude to use API key
+  auth, so this is the path for any Agenta-orchestrated run that serves users.
+- **OAuth subscription (self-host opt-in only).** An individual operator may mount
+  their own Claude login (e.g. `~/.claude`) into the container and run with their
+  own subscription. This is for personal, individual use of Claude Code, never for
+  serving other users, and it is the operator's responsibility. Anthropic restricts
+  Free/Pro/Max OAuth to first-party use and forbids third parties routing requests
+  through it (enforced since 2026-03). Cloud and multi-tenant deployments must stay
+  API-key only.
+
+We never bake an OAuth login or an API key into an image.
+
+## Build recipes (two paths)
+
+- **Cloud / Daytona (API key).** The Daytona snapshot recipe bakes Pi. Agenta Cloud
+  builds and uses its own snapshot internally; self-hosters run the same recipe
+  against their own Daytona account. We ship the build script (the recipe), not the
+  built snapshot, so we never distribute a Claude-containing artifact. Snapshot
+  builder: `docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py`.
+  Today it bases on rivet's `-full` image, which already bundles Claude. That is
+  compliant under the recipe-not-image model. **Cleaner-provenance follow-up
+  (needs a live Daytona build to verify):** base on a daemon-only rivet image and
+  install Claude from Anthropic at build time, so the snapshot's Claude comes straight
+  from Anthropic rather than from a third party's bundled image. Relocation of the
+  builder into this folder is a follow-up.
+- **Self-host (API key, OAuth optional).** Build the production `Dockerfile` (it
+  bakes neither Claude nor a credential), then supply auth at runtime: an
+  `ANTHROPIC_API_KEY` env var, or, for individual use, a mounted OAuth login dir.
diff --git a/services/agent/scripts/build-extension.mjs b/services/agent/scripts/build-extension.mjs
@@ -0,0 +1,30 @@
+/**
+ * Bundle the Agenta Pi extension into one self-contained file so its OpenTelemetry deps
+ * resolve wherever Pi loads it (host, docker sidecar, Daytona snapshot). Pi only accepts
+ * `.ts`/`.js` extension files, so we emit `.js` (ESM) with a default export.
+ *
+ * Run: pnpm run build:extension  ->  dist/extensions/agenta.js
+ */
+import { build } from "esbuild";
+import { dirname, join } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const root = join(dirname(fileURLToPath(import.meta.url)), "..");
+
+await build({
+  entryPoints: [join(root, "src/extensions/agenta.ts")],
+  outfile: join(root, "dist/extensions/agenta.js"),
+  bundle: true,
+  platform: "node",
+  format: "esm",
+  target: "node20",
+  // Pi provides the ExtensionAPI at load time; never bundle the harness SDK.
+  external: ["@earendil-works/pi-coding-agent"],
+  banner: {
+    // protobufjs and some deps expect CommonJS globals under ESM; shim them.
+    js: "import{createRequire as __cr}from'node:module';const require=__cr(import.meta.url);",
+  },
+  logLevel: "info",
+});
+
+process.stderr.write("[build-extension] wrote dist/extensions/agenta.js\n");
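The `banner` in the build script exists because an ESM bundle has no `require` global, while some bundled CommonJS dependencies still call it. The following is a minimal sketch of that shim in isolation, using Node's `createRequire`. It differs from the banner in one respect, which is an assumption made for portability: the banner anchors resolution at `import.meta.url`, whereas this sketch anchors it at a file URL under the current working directory so the snippet runs under either module format. The name `requireShim` is hypothetical.

```typescript
import { createRequire } from "node:module";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// The anchor only decides where relative and package specifiers are resolved from.
// The build banner uses import.meta.url; a cwd-based file URL is used here instead.
const anchor = pathToFileURL(join(process.cwd(), "index.js")).href;
const requireShim = createRequire(anchor);

// Built-in modules resolve regardless of the anchor.
const os = requireShim("node:os") as typeof import("node:os");

console.log(typeof requireShim, typeof os.platform()); // prints: function string
```

In the bundle itself the shim is named `require`, so bundled CommonJS code that calls `require(...)` finds it without modification.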
diff --git a/services/agent/skills/agenta-getting-started/SKILL.md b/services/agent/skills/agenta-getting-started/SKILL.md
@@ -0,0 +1,21 @@
+---
+name: agenta-getting-started
+description: Baseline guidance for agents running on the Agenta platform. Use at the start of a task to recall how to work with the tools and skills Agenta provides and how to report results clearly.
+---
+
+# Agenta getting started
+
+This is a placeholder Agenta skill that ships with the `AgentaHarness`. It proves the
+forced-skill path end to end; replace its content with real Agenta guidance.
+
+## When to use
+
+Read this when you begin a task and want a reminder of the Agenta conventions below.
+
+## Conventions
+
+- Prefer the provided tools and skills over guessing; call a tool when one fits.
+- When another skill matches the task, read its `SKILL.md` fully before acting.
+- Keep answers grounded in what the tools and skills actually return. Do not fabricate
+  results or tool output.
+- Be concise. State what you did, what it returned, and what is left.