Agenta-AI · mmabrouk · Jun 19, 2026 · Jun 19, 2026
diff --git a/docs/design/agent-workflows/README.md b/docs/design/agent-workflows/README.md
@@ -0,0 +1,66 @@
+# Agent Workflows
+
+This workspace documents the active agent-workflows PR stack and the work still needed to
+make it production-ready.
+
+The source of truth is the code listed in [Ground Truth](ground-truth.md). Design pages at
+this level describe the active-stack implementation unless they explicitly say "planned",
+"blocked", or "not implemented". The docs PR commit itself is docs-only and does not
+contain every referenced code file. Use [PR Stack](pr-stack.md) to map each code reference
+to the sibling PR that carries it. Historical work-package notes and old RFCs live in
+[trash/](trash/).
+
+## Read In This Order
+
+1. [Ground Truth](ground-truth.md): what the active-stack code does, what is wired, and
+   what is still missing.
+2. [Status](status.md): active-stack cleanup state, decisions, blockers, and next steps.
+3. [Meeting Alignment](meeting-alignment.md): where the active work matches the June 18
+   design discussion, where it diverges, and what still needs to be done.
+4. [Architecture](architecture.md): the service, agent runner sidecar, harnesses, and
+   sandboxes.
+5. [Protocol](protocol.md): `/invoke`, `/messages`, `/load-session`, and the runner `/run`
+   wire contract.
+6. [Ports and Adapters](ports-and-adapters.md): the SDK runtime ports, backend adapters,
+   harness adapters, and browser protocol adapter.
+7. [Agent Template](agent-template.md): the intended split between generic agent identity,
+   harness-specific config, and runtime infrastructure.
+8. [Sessions](sessions.md): cold replay, streaming, session ids, and the missing session
+   store.
+9. [Triggers](triggers.md): planned trigger/event integration and the missing Compose.io
+   POC.
+10. [Pi Adapter](adapters/pi.md): Pi-specific tool delivery, prompt layers, tracing, and
+   usage writeback.
+11. [Claude Code Adapter](adapters/claude-code.md): Claude over ACP, MCP tool delivery,
+   permissions, tracing, and usage.
+12. [Agenta Harness](adapters/agenta.md): the experimental Agenta-flavored Pi harness.
+13. [SDK Local Tools](sdk-local-tools/): planned and partly implemented work for standalone
+   SDK tool resolution. This remains blocked by `LocalBackend`.
+14. [PR Stack](pr-stack.md): functional breakpoints for reviewable stacked PRs.
+15. [Implementation Review](implementation-review.md): high-level cleanup risks and PR
+    slicing notes.
+16. [Open Issues](open-issues.md): deferred decisions that need ownership.
+
+## Active-Stack State
+
+The agent workflow runs a coding harness as an Agenta workflow. It supports:
+
+- A batch `/invoke` path that returns the final assistant message.
+- An agent-only `/messages` path that accepts Vercel `UIMessage` input and can stream a
+  Vercel UI Message Stream over SSE.
+- A `/load-session` route with the right contract but no durable storage by default.
+- Pi and Claude harnesses through the rivet runner.
+- Pi and the experimental `agenta` harness through the in-process Pi backend.
+- Server-resolved tool specs, code tool execution, callback tools, and MCP plumbing behind
+  a feature flag.
+
+The main missing pieces are durable server-owned sessions, future session snapshot
+interfaces, the agent template/config split, trigger integration, a working standalone
+`LocalBackend`, production Agenta harness content, first-class built-in workflow
+registration, and the final cleanup of historical work-package names in comments and docs.
+
+## Trash
+
+[trash/](trash/) holds old work-package notes, research spikes, and superseded RFCs. It is
+kept for archaeology only. Do not treat it as design truth unless a current page links to a
+specific note as background.
diff --git a/docs/design/agent-workflows/adapters/agenta.md b/docs/design/agent-workflows/adapters/agenta.md
@@ -0,0 +1,64 @@
+# The Agenta harness
+
+`AgentaHarness` is Pi with an opinion. It runs on the same engine as the [Pi
+adapter](pi.md) and produces a Pi-shaped config, so it inherits everything Pi does (native
+tools, the system-prompt layers, tracing). What it adds is a fixed set of Agenta-shipped
+extras that the agent author cannot turn off:
+
+- **Forced tools** — always unioned into the agent's resolved tools. At minimum `read`
+  (Pi only renders the skills section when `read` is enabled) and `bash` (so skills can run
+  their helper scripts).
+- **Forced skills** — Agenta-shipped Pi skills loaded on every run.
+- **A base AGENTS.md preamble** — the author's `instructions` are appended after it.
+- **A base persona** — forced onto Pi's `append_system`, with any author-supplied
+  `append_system` appended after it.
+
+Read the [architecture](../architecture.md), [ports and adapters](../ports-and-adapters.md),
+and [Pi adapter](pi.md) pages first. This page assumes them.
+
+## Where the forced bits live
+
+The forced *policy* lives in the SDK harness layer, in one editable module:
+`sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py` (`AGENTA_PREAMBLE`,
+`AGENTA_FORCED_APPEND_SYSTEM`, `AGENTA_FORCED_TOOLS`, `AGENTA_FORCED_SKILLS`). `AgentaHarness`
+(`adapters/harnesses.py`) reads them in `_to_harness_config` and layers them onto the neutral
+`SessionConfig`, exactly where `PiHarness` and `ClaudeHarness` do their own translation.
+
+The forced skill *files* live with the runner that runs Pi, under
+`services/agent/skills/<name>/` (each a directory with a `SKILL.md`). Skills are real files on
+disk because they reference relative scripts and assets, so they cannot ride the wire as
+text. The contract between the two halves is the skill **name**: `AGENTA_FORCED_SKILLS` lists
+names, and each must match a committed directory under the runner's skills root.
+
+## How a skill reaches the model
+
+1. `AgentaHarness._to_harness_config` puts the forced skill names on the `skills` field of
+   the `/run` request (`AgentaAgentConfig.wire_tools`).
+2. The in-process Pi engine (`engines/pi.ts`) resolves each name against its bundled
+   `skills/` root (override with `AGENTA_AGENT_SKILLS_DIR`) and passes the directories to Pi's
+   `DefaultResourceLoader` as `additionalSkillPaths`, with `noSkills: true` so only the
+   bundled skills load (the run stays hermetic, like `noContextFiles`).
+3. Pi loads them, and because the forced `read` tool is enabled, surfaces them in the system
+   prompt. The model reads a skill's `SKILL.md` on demand (progressive disclosure).
+
+## Two prompt layers, kept distinct
+
+This follows Pi's own split (see `PiAgentConfig`): the **persona** ("who the agent is")
+belongs in `append_system`, and **project conventions** belong in `AGENTS.md`. So the Agenta
+persona is a forced `append_system`, while the Agenta base preamble plus the author's
+instructions are the `AGENTS.md`. An author's own `system` / `append_system` (via
+`AgentConfig.harness_options["pi"]`) still apply, layered after the forced persona.
+
+## Selecting it
+
+`agenta` is a harness option alongside `pi` and `claude` (the playground dropdown, the
+`harness` field). It runs on the in-process Pi backend (`InProcessPiBackend` now lists
+`HarnessType.AGENTA` as supported), so `select_backend` keeps `agenta` on the local Pi path.
+
+## Deferred
+
+Only the in-process Pi (local) path is wired. The ACP/rivet path (and therefore the Daytona
+sandbox) does not yet deliver the forced skills — it would teach `runRivet` to read the
+`skills` field and lay the bundled skill directories into the sandbox via the existing
+bundled-file provisioning. Until then, `agenta` with a non-local sandbox raises
+`UnsupportedHarnessError` rather than silently running without its skills.
diff --git a/docs/design/agent-workflows/adapters/claude-code.md b/docs/design/agent-workflows/adapters/claude-code.md
@@ -0,0 +1,95 @@
+# The Claude Code adapter
+
+Claude Code is the second harness. It proves the central claim of this PoC: that swapping
+the agent is one config value. Where the [Pi adapter](pi.md) does much of its work inside Pi
+through an extension, Claude does its work through standard ACP. That makes Claude the
+template for any MCP-capable harness rivet can drive.
+
+Read the [architecture](../architecture.md) and [ports and adapters](../ports-and-adapters.md)
+pages first.
+
+## Running Claude
+
+The daemon resolves the harness id `claude` to the `claude-agent-acp` adapter, which starts
+the `claude` CLI. One operational detail is worth calling out, because it caused a real bug.
+The daemon does not ship the `claude` CLI. It downloads it over HTTPS the first time a run
+asks for Claude. The sidecar image is a slim Node image with no root certificates, so that
+HTTPS download failed until we added `ca-certificates` to the image. With the certs in
+place, the download verifies and Claude runs.
+
+Auth is config, like everything else. Claude authenticates with `ANTHROPIC_API_KEY` from the
+project vault when present, or with an OAuth token (`CLAUDE_CODE_OAUTH_TOKEN`) otherwise. The
+runner turns the common failures into one clear line, so a user sees "add the project's
+Anthropic key" rather than a stack trace.
+
+## Tools over MCP
+
+Claude advertises the `mcpTools` capability, so the runner delivers tools to Claude the
+standard ACP way, over MCP. This is the branch that the [capability probe](../ports-and-adapters.md)
+chooses: deliver over MCP when the harness reports `mcpTools`, not when the harness name is
+something in particular.
+
+The mechanism is a small stdio MCP server (`tools/mcp-server.ts`) that the daemon launches
+and attaches to the session. Its tool bodies POST back to Agenta's `/tools/call` with the
+same callback-tool envelope the Pi path uses. The resolved specs and the callback endpoint reach the
+MCP server through its environment, so nothing tool-specific is written to a file the agent
+can read. The safety property is identical to Pi's: the provider key and the connection auth
+stay server-side, and the agent only ever asks Agenta to run a named tool.
+
+## Permissions
+
+Claude gates tool use behind a permission prompt. In an Agenta run there is no human at the
+keyboard to answer it, so the runner answers for it. By default it auto-approves, because the
+tools are backend-resolved and trusted. The per-run permission policy (or an env override)
+can flip this to deny, which rejects tool use instead. This is handled on
+`session.onPermissionRequest`, a hook Pi does not need because Pi does not gate tools this
+way.
+
+## Tracing from the event stream
+
+Claude does not self-instrument the way Pi does, because we do not load an Agenta extension
+into Claude. So the runner builds the trace itself, from the ACP event stream. It subscribes
+to the session's `session/update` notifications and turns them into the same span tree Pi
+produces:
+
+```
+invoke_agent            (AGENT)
+  turn 0                (CHAIN)
+    chat <model>        (LLM)
+    execute_tool <name> (TOOL)   one per ACP tool_call
+```
+
+This is the general path. Any harness rivet drives that does not bring its own
+instrumentation gets traced this way. Pi is the exception that traces itself; Claude is the
+rule.
+
+## Usage and output
+
+Claude reports usage in two places, so the runner reads both. The per-call input and output
+token split rides on the ACP `PromptResponse`, and the cost rides on the `usage_update`
+event. The runner combines them into the run total, which then rolls onto the workflow span
+the same way Pi's writeback total does.
+
+Output needs one small piece of care. Claude streams text deltas and also periodically
+streams a full cumulative snapshot of the message so far. If the runner naively appended
+everything, the answer would double. The runner detects a snapshot (a chunk that is a
+superset of what it already has) and replaces rather than appends, so the final text is
+correct whether a chunk is a delta or a snapshot.
+
+## Models
+
+Claude ignores a model id meant for another provider. Ask it for `gpt-5.5` and it keeps its
+own default. The runner handles this honestly: when the harness does not accept the requested
+model, the chat span is labelled `chat` rather than falsely claiming a model the run did not
+use.
+
+## What Claude demonstrates
+
+Claude is the proof that the seam works. Adding it took a `ClaudeHarness` (which holds its
+Pi-versus-Claude config mapping) and no change to the workflow handler above the ports; the
+same `RivetBackend` drives it. It also exercises the capability-driven branches the design is
+built on: tools over MCP because it reports `mcpTools`, a permission answer because it gates
+tools, and event-stream tracing because it does not self-instrument. A future harness that
+rivet can drive would reuse this exact path. A future harness that rivet cannot drive would
+instead get its own backend beside `RivetBackend` and `InProcessPiBackend`, behind the same
+`/run` contract.