feat(agent): runner engines, server, and tracing #4774
Open. mmabrouk wants to merge 3 commits into feat/agent-runner-tools from feat/agent-runner-engine
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| # Agent runner (TypeScript) | ||
|
|
||
| The Node side of the agent workflow service. It runs the actual agent loop and serves one | ||
| contract: a JSON request in, a structured result out. The Python service | ||
| (`services/oss/src/agent/`) decides *what* to run (config, tools, secrets, trace) and calls | ||
| in here; this package *runs* it. It lives in Node because the harnesses (Pi, Claude Code, | ||
| rivet's `sandbox-agent`) are Node libraries with no Python SDK. | ||
|
|
||
| ## How it is invoked | ||
|
|
||
| Two entrypoints, same `/run` contract (see `src/protocol.ts`): | ||
|
|
||
| - **`src/cli.ts`** — one JSON request on stdin, one result on stdout. The Python | ||
| SDK adapters use this subprocess transport when `AGENTA_AGENT_PI_URL` is unset. stdout is | ||
| the result channel only; logs go to stderr. | ||
| - **`src/server.ts`** — the same thing as a long-lived HTTP server on `:8765` | ||
| (`GET /health`, `POST /run`). This is the dockerized agent runner sidecar the Python SDK | ||
| adapters call over HTTP when `AGENTA_AGENT_PI_URL` points at it. The dev image | ||
| (`docker/Dockerfile.dev`) runs `tsx watch src/server.ts`. | ||
|
|
||
| Both route to an engine by the request's `backend` field. | ||
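
The routing step described above could look roughly like the sketch below. Only the `backend` field and the engine names `pi` and `rivet` come from the README; `RunRequest`, `selectEngine`, and the `fallback` argument are hypothetical names used for illustration, and the real `src/protocol.ts` types may differ.

```typescript
// Sketch only. `backend`, `pi`, and `rivet` come from the README; every other
// name here (RunRequest, selectEngine, the fallback argument) is hypothetical.
interface RunRequest {
  backend?: string; // untrusted wire value, so typed as a plain string
  messages: { role: string; content: string }[];
}

// Generic over the engine type so the lookup stays independent of how an
// engine is actually implemented.
function selectEngine<E>(
  request: RunRequest,
  engines: Record<string, E>,
  fallback: string,
): E {
  const backend = request.backend ?? fallback;
  // Own-property check so names like "constructor" never resolve to
  // something inherited from Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(engines, backend)) {
    throw new Error(`unknown backend: ${backend}`);
  }
  return engines[backend] as E;
}
```

Both entrypoints could share one such lookup, for example `selectEngine(request, { pi: runPi, rivet: runRivet }, "rivet")`, where `runPi` is an assumed name and the fallback would presumably be read from `AGENT_BACKEND`.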
|
|
||
| ## Layout (`src/`) | ||
|
|
||
| ``` | ||
| src/ | ||
| cli.ts entrypoint: stdin/stdout (subprocess transport) | ||
| server.ts entrypoint: HTTP sidecar on :8765 | ||
| protocol.ts the /run wire contract (request, result, events, capabilities) | ||
| engines/ | ||
| pi.ts engine: drive the Pi SDK in-process | ||
| rivet.ts engine: drive a harness over ACP via a rivet sandbox-agent daemon | ||
| tracing/ | ||
| otel.ts turn a run into OpenTelemetry spans nested under /invoke | ||
| tools/ | ||
| callback.ts the one /tools/call HTTP client | ||
| code.ts execute resolved code tools in a scoped subprocess | ||
| dispatch.ts dispatch resolved tools by executor kind | ||
| mcp-bridge.ts build the MCP server config that exposes tools to a harness | ||
| mcp-server.ts the stdio MCP server itself (launched per session by the daemon) | ||
| extensions/ | ||
| agenta.ts the Pi extension (tracing + tools), bundled into dist/ for Pi to load | ||
| ``` | ||
|
|
||
| ## Engines | ||
|
|
||
| - **`pi`** (`engines/pi.ts`) — drives the Pi SDK directly in-process. | ||
| - **`rivet`** (`engines/rivet.ts`) — drives any harness (`pi`, `claude`) over the Agent | ||
| Client Protocol through a rivet `sandbox-agent` daemon, either local or in a Daytona | ||
| sandbox. This is the default on the platform. | ||
|
|
||
| The engine is a deployment choice (`backend` on the wire / `AGENT_BACKEND` env), not a | ||
| harness. Harness choice (`pi`, `claude`, or experimental `agenta`) and sandbox (`local` or | ||
| `daytona`, where supported) are per-run config the Python service sends. | ||
|
|
||
| ## Result | ||
|
|
||
| ```json | ||
| { | ||
| "ok": true, | ||
| "output": "Rome", | ||
| "messages": [{ "role": "assistant", "content": "Rome" }], | ||
| "events": [{ "type": "message", "text": "Rome" }, { "type": "done" }], | ||
| "usage": { "input": 1297, "output": 5, "total": 1302, "cost": 0.0066 }, | ||
| "stopReason": "end_turn", | ||
| "capabilities": { "mcpTools": false, "images": true, "...": "..." }, | ||
| "sessionId": "...", | ||
| "model": "openai-codex/gpt-5.5", | ||
| "traceId": "..." | ||
| } | ||
| ``` | ||
|
|
||
| `runRivet` probes the harness's capabilities and branches on them (for example, tools go | ||
| over MCP only when the harness advertises `mcpTools`); usage and the structured event log | ||
| come back on every run. | ||
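
For a caller on the stdout transport, reading the result might look like the sketch below. The field names are inferred from the sample result above and nothing more: which fields are optional, and what a failed run looks like, are assumptions, and `parseRunResult` and `eventText` are hypothetical helpers rather than exports of this package.

```typescript
// Shape inferred from the sample result; optionality is an assumption.
interface RunUsage {
  input: number;
  output: number;
  total: number;
  cost: number;
}

interface RunResult {
  ok: boolean;
  output?: string;
  messages?: { role: string; content: string }[];
  events?: { type: string; text?: string }[];
  usage?: RunUsage;
  stopReason?: string;
  capabilities?: Record<string, unknown>;
  sessionId?: string;
  model?: string;
  traceId?: string;
}

// Parse the single JSON document the runner writes to stdout and check the
// one field this sketch relies on.
function parseRunResult(raw: string): RunResult {
  const value: unknown = JSON.parse(raw);
  if (
    typeof value !== "object" ||
    value === null ||
    typeof (value as { ok?: unknown }).ok !== "boolean"
  ) {
    throw new Error("not a run result: missing boolean `ok`");
  }
  return value as RunResult;
}

// Concatenate the text of every `message` event, in order.
function eventText(result: RunResult): string {
  return (result.events ?? [])
    .filter((event) => event.type === "message" && typeof event.text === "string")
    .map((event) => event.text as string)
    .join("");
}
```

Because logs go to stderr, the whole of stdout can be handed to `parseRunResult` without filtering.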
|
|
||
| ## Tracing | ||
|
|
||
| When the request carries a `trace` block, the run is exported to Agenta as OpenTelemetry | ||
| spans nested under the caller's `/invoke` span. The Pi path self-instruments via the | ||
| bundled extension (`extensions/agenta.ts`); other harnesses are traced from the rivet ACP | ||
| event stream (`tracing/otel.ts`). The Python `tracing` module fills `trace` in from the | ||
| live workflow span. | ||
|
|
||
| ## Tools | ||
|
|
||
| Tools are resolved in the Python backend and arrive on the request as `customTools` plus a | ||
| `toolCallback`. Delivery is capability-routed: the Pi extension registers them natively; | ||
| other harnesses get them over MCP through `tools/mcp-bridge.ts` + `tools/mcp-server.ts`. | ||
| Either way each call POSTs back to Agenta's `/tools/call` (`tools/callback.ts`), so the | ||
| provider key and connection auth stay server-side. | ||
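
The capability-routed delivery above can be sketched as a small decision function. The `mcpTools` flag and the native Pi path come from the README; `chooseToolDelivery`, the `ToolDelivery` labels, and the `"none"` outcome for a harness that advertises no MCP support are assumptions made for illustration.

```typescript
// Hypothetical labels for the two delivery paths the README describes, plus
// an assumed "none" for runs where no path applies.
type ToolDelivery = "native" | "mcp" | "none";

interface Capabilities {
  mcpTools?: boolean;
}

function chooseToolDelivery(
  harness: string,
  capabilities: Capabilities,
  toolCount: number,
): ToolDelivery {
  // Nothing to deliver.
  if (toolCount === 0) return "none";
  // Pi gets its tools registered natively by the bundled extension.
  if (harness === "pi") return "native";
  // Other harnesses get tools over MCP only when they advertise support.
  return capabilities.mcpTools === true ? "mcp" : "none";
}
```

Whichever path is chosen, the tool call itself still goes back through `/tools/call`, so this decision only affects how the harness sees the tool, not where it executes.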
|
|
||
| ## The extension bundle | ||
|
|
||
| `scripts/build-extension.mjs` esbuild-bundles `src/extensions/agenta.ts` into one | ||
| self-contained `dist/extensions/agenta.js` that Pi can load anywhere (host, the sidecar, a | ||
| Daytona snapshot). The dev image bakes it; rebuild after editing the extension or the | ||
| tracer: | ||
|
|
||
| ```bash | ||
| pnpm run build:extension | ||
| ``` | ||
|
|
||
| ## Auth | ||
|
|
||
| Provider keys arrive as `request.secrets` (resolved from the project vault) or fall back to | ||
| the harness's own login: Pi reads `~/.pi/agent/auth.json` (`pnpm exec pi` then `/login`), | ||
| Claude Code reads `~/.claude`. Set `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` to override. | ||
|
|
||
| ## config/ | ||
|
|
||
| `config/AGENTS.md` and `config/agent.json` are a fallback "hello-world" agent, used only | ||
| when a request arrives with no config. In practice the playground always sends the agent | ||
| revision's config, so these are rarely hit. | ||
|
|
||
| ## Local use | ||
|
|
||
| ```bash | ||
| pnpm install | ||
| echo '{"backend":"pi","messages":[{"role":"user","content":"Hi"}]}' | pnpm run run:cli | ||
| ``` |
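
The same request can be sent to the HTTP sidecar instead of the CLI. The sketch below only builds the call so it stays free of network code; the port and the `POST /run` route come from the README, while `buildRunCall`, `RunCall`, and the `content-type` header are assumptions.

```typescript
// Hypothetical helper: describes a POST /run call without performing it.
interface RunCall {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildRunCall(
  baseUrl: string,
  request: { backend: string; messages: { role: string; content: string }[] },
): RunCall {
  return {
    // Drop trailing slashes so the path is not doubled.
    url: baseUrl.replace(/\/+$/, "") + "/run",
    init: {
      method: "POST",
      // Assumed: the server expects a JSON body.
      headers: { "content-type": "application/json" },
      body: JSON.stringify(request),
    },
  };
}

const exampleCall = buildRunCall("http://localhost:8765/", {
  backend: "pi",
  messages: [{ role: "user", content: "Hi" }],
});
// With a runtime that provides fetch:
//   const response = await fetch(exampleCall.url, exampleCall.init);
//   const result = await response.json();
```

A `GET /health` request against the same base URL is a cheap way to check that the sidecar is up before sending a run.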
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # Hello-world agent | ||
|
|
||
| You are a friendly hello-world agent running on the Agenta agent service. | ||
|
|
||
| - Greet the user warmly. | ||
| - Answer the user's message in one or two short sentences. | ||
| - Do not use tools. Keep replies plain text. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| { | ||
| "model": "gpt-5.5", | ||
| "tools": [] | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| # Agent runner sidecar (sandbox-agent server), production image. | ||
| # | ||
| # Runs the TypeScript runner (src/server.ts) as a long-lived HTTP server on :8765. | ||
| # The Python agent service calls it in-network. Unlike Dockerfile.dev there is no | ||
| # `tsx watch` and no bind mount: the source is baked in. | ||
| # | ||
| # Licensing posture (see docker/README.md): | ||
| # - Pi (@earendil-works/pi-coding-agent, MIT) is baked via the npm dependencies. | ||
| # - Claude Code is proprietary (Anthropic Commercial Terms). It is NEVER baked into | ||
| # this image. The sandbox-agent daemon installs it at runtime from Anthropic over | ||
| # HTTPS (the reason ca-certificates is installed). That keeps Anthropic as the | ||
| # distributor, the only compliant path for an image we build and ship. | ||
| # - No credential is baked: no API key, no OAuth login. Auth is injected at runtime | ||
| # (ANTHROPIC_API_KEY / request secrets; OAuth self-host is a mounted opt-in only). | ||
|
|
||
| FROM node:24-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude | ||
| # Code) over HTTPS using the system trust store, which node:*-slim omits — without this | ||
| # the daemon's `install-agent claude` fails TLS verification. git lets npm/installers | ||
| # fetch git deps. | ||
| RUN apt-get update \ | ||
| && apt-get install -y --no-install-recommends ca-certificates git \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN corepack enable | ||
|
|
||
| # Install deps as a cached layer (manifest + lockfile only). The full dependency set is | ||
| # installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`, | ||
| # both devDependencies. | ||
| COPY package.json pnpm-lock.yaml ./ | ||
| RUN pnpm install --frozen-lockfile | ||
|
|
||
| # Bake the source (no bind mount in production). | ||
| COPY tsconfig.json ./ | ||
| COPY scripts ./scripts | ||
| COPY src ./src | ||
| COPY config ./config | ||
| COPY skills ./skills | ||
|
|
||
| # Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs | ||
| # this baked copy into Pi's agent dir on every run. Rebuild the image after editing | ||
| # src/extensions/agenta.ts or the tracer. | ||
| RUN pnpm run build:extension | ||
|
|
||
| ENV NODE_ENV=production \ | ||
| PORT=8765 | ||
|
|
||
| EXPOSE 8765 | ||
|
|
||
| # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the | ||
| # container runs as a non-root host uid. | ||
| CMD ["node_modules/.bin/tsx", "src/server.ts"] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # Pi harness sidecar (WP-2), dev image. | ||
| # | ||
| # Runs the TypeScript Pi wrapper as an HTTP server. The Python agent service calls | ||
| # it in-network. Source is bind-mounted in dev so `tsx watch` hot-reloads; node_modules | ||
| # stays baked into the image. Build context is services/agent. | ||
|
|
||
| FROM node:24-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # CA certificates: the rivet daemon (Rust) downloads harness CLIs (e.g. Claude Code) over | ||
| # HTTPS using the system trust store, which node:*-slim omits — without this the daemon's | ||
| # `install-agent claude` fails TLS verification. git lets npm/installers fetch git deps. | ||
| RUN apt-get update \ | ||
| && apt-get install -y --no-install-recommends ca-certificates git \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN corepack enable | ||
|
|
||
| # Install deps as a cached layer (manifest + lockfile only). | ||
| COPY package.json pnpm-lock.yaml ./ | ||
| RUN pnpm install --frozen-lockfile | ||
|
|
||
| # Fallback copy for non-mounted runs; in dev these are bind-mounted over. | ||
| COPY tsconfig.json ./ | ||
| COPY scripts ./scripts | ||
| COPY src ./src | ||
|
|
||
| # Bundle the Agenta Pi extension (tracing + tools) into dist/. dist/ is NOT bind-mounted | ||
| # in dev, so this baked copy is what runRivet installs into Pi's agent dir. Rebuild the | ||
| # image after editing src/extensions/agenta.ts or src/tracing/otel.ts. | ||
| RUN pnpm run build:extension | ||
|
|
||
| ENV NODE_ENV=development \ | ||
| PORT=8765 | ||
|
|
||
| EXPOSE 8765 | ||
|
|
||
| # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the | ||
| # container runs as a non-root host uid. | ||
| CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"] | ||
|
Comment on lines +7 to +41

Use a non-root runtime user in the dev image too.

Lines 7-41 leave the process running as root; switching to a non-root user keeps dev closer to prod hardening and avoids unnecessary privilege.

Suggested fix

FROM node:24-slim
@@
RUN pnpm run build:extension
ENV NODE_ENV=development \
PORT=8765
+RUN groupadd --system app && useradd --system --gid app --create-home app \
+ && chown -R app:app /app
+USER app
+
EXPOSE 8765
@@
CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]

Source: Linters/SAST tools
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Agent sidecar images | ||
|
|
||
| Images for the agent runner sidecar (the `sandbox-agent server` runtime in | ||
| `services/agent/src/server.ts`). The Python service calls it in-network at | ||
| `:8765`. | ||
|
|
||
| - `Dockerfile.dev` — dev image. `tsx watch`, source bind-mounted, hot reload. | ||
| - `Dockerfile` — production image. Source baked in, no watcher. | ||
|
|
||
| ## Licensing posture (read before changing any image or build recipe) | ||
|
|
||
| The rule that shapes every image here: | ||
|
|
||
| > **We ship build recipes, not Claude-containing images, and we never bake a | ||
| > credential into any image.** | ||
|
|
||
| Why: | ||
|
|
||
| - **Pi** (`@earendil-works/pi-coding-agent`) is MIT. We bake it freely via the npm | ||
| dependencies, in every image and snapshot. | ||
| - **Claude Code** is proprietary (© Anthropic PBC, governed by Anthropic's | ||
| [Commercial Terms](https://www.anthropic.com/legal/commercial-terms); | ||
| [legal & compliance](https://code.claude.com/docs/en/legal-and-compliance)). The | ||
| Commercial Terms grant a usage license only. They do not grant any right to | ||
| redistribute, resell, sublicense, or repackage the Services. So an image **we | ||
| build and distribute must not contain Claude Code.** | ||
| - Claude Code is installed **from Anthropic** (`npm install -g | ||
| @anthropic-ai/claude-code`, `https://claude.ai/install.sh`, or the daemon's | ||
| `install-agent claude`). That keeps Anthropic as the distributor, which is the | ||
| permitted path. The production sidecar does this at runtime; a snapshot we build | ||
| for our own use does it at build time. | ||
|
|
||
| ## Authentication | ||
|
|
||
| Auth is injected at runtime, never baked into a layer. | ||
|
|
||
| - **API key (default, and the only option for cloud / multi-tenant).** Set | ||
| `ANTHROPIC_API_KEY` (or pass provider keys as request secrets from the vault). | ||
| Anthropic directs products and services that interact with Claude to use API key | ||
| auth, so this is the path for any Agenta-orchestrated run that serves users. | ||
| - **OAuth subscription (self-host opt-in only).** An individual operator may mount | ||
| their own Claude login (e.g. `~/.claude`) into the container and run with their | ||
| own subscription. This is for personal, individual use of Claude Code, never for | ||
| serving other users, and it is the operator's responsibility. Anthropic restricts | ||
| Free/Pro/Max OAuth to first-party use and forbids third parties routing requests | ||
| through it (enforced since 2026-03). Cloud and multi-tenant deployments must stay | ||
| API-key only. | ||
|
|
||
| We never bake an OAuth login or an API key into an image. | ||
|
|
||
| ## Build recipes (two paths) | ||
|
|
||
| - **Cloud / Daytona (API key).** The Daytona snapshot recipe bakes Pi. Agenta Cloud | ||
| builds and uses its own snapshot internally; self-hosters run the same recipe | ||
| against their own Daytona account. We ship the build script (the recipe), not the | ||
| built snapshot, so we never distribute a Claude-containing artifact. Snapshot | ||
| builder: `docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py`. | ||
| Today it bases on rivet's `-full` image, which already bundles Claude. That is | ||
| compliant under the recipe-not-image model. **Cleaner-provenance follow-up | ||
| (needs a live Daytona build to verify):** base on a daemon-only rivet image and | ||
| install Claude from Anthropic at build, so the snapshot's Claude comes straight | ||
| from Anthropic rather than from a third party's bundled image. Relocation of the | ||
| builder into this folder is a follow-up. | ||
| - **Self-host (API key, OAuth optional).** Build the production `Dockerfile` (it | ||
| bakes neither Claude nor a credential), then supply auth at runtime: an | ||
| `ANTHROPIC_API_KEY` env var, or, for individual use, a mounted OAuth login dir. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| /** | ||
| * Bundle the Agenta Pi extension into one self-contained file so its OpenTelemetry deps | ||
| * resolve wherever Pi loads it (host, docker sidecar, Daytona snapshot). Pi only accepts | ||
| * `.ts`/`.js` extension files, so we emit `.js` (ESM) with a default export. | ||
| * | ||
| * Run: pnpm run build:extension -> dist/extensions/agenta.js | ||
| */ | ||
| import { build } from "esbuild"; | ||
| import { dirname, join } from "node:path"; | ||
| import { fileURLToPath } from "node:url"; | ||
|
|
||
| const root = join(dirname(fileURLToPath(import.meta.url)), ".."); | ||
|
|
||
| await build({ | ||
| entryPoints: [join(root, "src/extensions/agenta.ts")], | ||
| outfile: join(root, "dist/extensions/agenta.js"), | ||
| bundle: true, | ||
| platform: "node", | ||
| format: "esm", | ||
| target: "node20", | ||
| // Pi provides the ExtensionAPI at load time; never bundle the harness SDK. | ||
| external: ["@earendil-works/pi-coding-agent"], | ||
| banner: { | ||
| // protobufjs and some deps expect CommonJS globals under ESM; shim them. | ||
| js: "import{createRequire as __cr}from'node:module';const require=__cr(import.meta.url);", | ||
| }, | ||
| logLevel: "info", | ||
| }); | ||
|
|
||
| process.stderr.write("[build-extension] wrote dist/extensions/agenta.js\n"); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| --- | ||
| name: agenta-getting-started | ||
| description: Baseline guidance for agents running on the Agenta platform. Use at the start of a task to recall how to work with the tools and skills Agenta provides and how to report results clearly. | ||
| --- | ||
|
|
||
| # Agenta getting started | ||
|
|
||
| This is a placeholder Agenta skill that ships with the `AgentaHarness`. It proves the | ||
| forced-skill path end to end; replace its content with real Agenta guidance. | ||
|
|
||
| ## When to use | ||
|
|
||
| Read this when you begin a task and want a reminder of the Agenta conventions below. | ||
|
|
||
| ## Conventions | ||
|
|
||
| - Prefer the provided tools and skills over guessing; call a tool when one fits. | ||
| - When another skill matches the task, read its `SKILL.md` fully before acting. | ||
| - Keep answers grounded in what the tools and skills actually return. Do not fabricate | ||
| results or tool output. | ||
| - Be concise. State what you did, what it returned, and what is left. |
Run the production container as a non-root user.

Lines 16-55 currently run the sidecar as root (no `USER` set), which weakens container isolation for a network-exposed service process.

Suggested fix

FROM node:24-slim
WORKDIR /app
@@
RUN pnpm run build:extension
ENV NODE_ENV=production \
PORT=8765
+RUN groupadd --system app && useradd --system --gid app --create-home app \
+ && chown -R app:app /app
+USER app
+
EXPOSE 8765
@@
CMD ["node_modules/.bin/tsx", "src/server.ts"]

Source: Linters/SAST tools