66 changes: 66 additions & 0 deletions docs/design/agent-workflows/README.md
@@ -0,0 +1,66 @@
# Agent Workflows

This workspace documents the active agent-workflows PR stack and the work still needed to
make it production-ready.

The source of truth is the code listed in [Ground Truth](ground-truth.md). Design pages at
this level describe the active-stack implementation unless they explicitly say "planned",
"blocked", or "not implemented". The docs PR commit itself is docs-only and does not
contain every referenced code file. Use [PR Stack](pr-stack.md) to map each code reference
to the sibling PR that carries it. Historical work-package notes and old RFCs live in
[trash/](trash/).

## Read In This Order

1. [Ground Truth](ground-truth.md): what the active-stack code does, what is wired, and
what is still missing.
2. [Status](status.md): active-stack cleanup state, decisions, blockers, and next steps.
3. [Meeting Alignment](meeting-alignment.md): where the active work matches the June 18
design discussion, where it diverges, and what still needs to be done.
4. [Architecture](architecture.md): the service, agent runner sidecar, harnesses, and
sandboxes.
5. [Protocol](protocol.md): `/invoke`, `/messages`, `/load-session`, and the runner `/run`
wire contract.
6. [Ports and Adapters](ports-and-adapters.md): the SDK runtime ports, backend adapters,
harness adapters, and browser protocol adapter.
7. [Agent Template](agent-template.md): the intended split between generic agent identity,
harness-specific config, and runtime infrastructure.
8. [Sessions](sessions.md): cold replay, streaming, session ids, and the missing session
store.
9. [Triggers](triggers.md): planned trigger/event integration and the missing Compose.io
POC.
10. [Pi Adapter](adapters/pi.md): Pi-specific tool delivery, prompt layers, tracing, and
usage writeback.
11. [Claude Code Adapter](adapters/claude-code.md): Claude over ACP, MCP tool delivery,
permissions, tracing, and usage.
12. [Agenta Harness](adapters/agenta.md): the experimental Agenta-flavored Pi harness.
13. [SDK Local Tools](sdk-local-tools/): planned and partly implemented work for standalone
SDK tool resolution. This remains blocked until `LocalBackend` works standalone.
14. [PR Stack](pr-stack.md): functional breakpoints for reviewable stacked PRs.
15. [Implementation Review](implementation-review.md): high-level cleanup risks and PR
slicing notes.
16. [Open Issues](open-issues.md): deferred decisions that need ownership.

## Active-Stack State

The agent workflow runs a coding harness as an Agenta workflow. It supports:

- A batch `/invoke` path that returns the final assistant message.
- An agent-only `/messages` path that accepts Vercel `UIMessage` input and can stream a
Vercel UI Message Stream over SSE.
- A `/load-session` route with the right contract but no durable storage by default.
- Pi and Claude harnesses through the rivet runner.
- Pi and the experimental `agenta` harness through the in-process Pi backend.
- Server-resolved tool specs, code tool execution, callback tools, and MCP plumbing behind
a feature flag.

The main missing pieces are durable server-owned sessions, future session snapshot
interfaces, the agent template/config split, trigger integration, a working standalone
`LocalBackend`, production Agenta harness content, first-class built-in workflow
registration, and the final cleanup of historical work-package names in comments and docs.

## Trash

[trash/](trash/) holds old work-package notes, research spikes, and superseded RFCs. It is
kept for archaeology only. Do not treat it as design truth unless a current page links to a
specific note as background.
64 changes: 64 additions & 0 deletions docs/design/agent-workflows/adapters/agenta.md
@@ -0,0 +1,64 @@
# The Agenta harness

`AgentaHarness` is Pi with an opinion. It runs on the same engine as the [Pi
adapter](pi.md) and produces a Pi-shaped config, so it inherits everything Pi does (native
tools, the system-prompt layers, tracing). What it adds is a fixed set of Agenta-shipped
extras that the agent author cannot turn off:

- **Forced tools** — always unioned into the agent's resolved tools. At minimum `read`
(Pi only renders the skills section when `read` is enabled) and `bash` (so skills can run
their helper scripts).
- **Forced skills** — Agenta-shipped Pi skills loaded on every run.
- **A base AGENTS.md preamble** — the author's `instructions` are appended after it.
- **A base persona** — forced onto Pi's `append_system`, with any author-supplied
`append_system` appended after it.

Read the [architecture](../architecture.md), [ports and adapters](../ports-and-adapters.md),
and [Pi adapter](pi.md) pages first. This page assumes them.

## Where the forced bits live

The forced *policy* lives in the SDK harness layer, in one editable module:
`sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py` (`AGENTA_PREAMBLE`,
`AGENTA_FORCED_APPEND_SYSTEM`, `AGENTA_FORCED_TOOLS`, `AGENTA_FORCED_SKILLS`). `AgentaHarness`
(`adapters/harnesses.py`) reads them in `_to_harness_config` and layers them onto the neutral
`SessionConfig`, exactly where `PiHarness` and `ClaudeHarness` do their own translation.
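
The layering can be sketched roughly as follows. This is an illustration, not the code in `harnesses.py`: the constant values, the `AuthorConfig` type, and the `layer_forced_policy` function are hypothetical stand-ins; only the four constant names come from `agenta_builtins.py`.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder values (assumptions); the real ones live in agenta_builtins.py.
AGENTA_PREAMBLE = "# Agenta base conventions"
AGENTA_FORCED_APPEND_SYSTEM = "You are an Agenta agent."
AGENTA_FORCED_TOOLS = ("read", "bash")
AGENTA_FORCED_SKILLS = ("example-skill",)  # hypothetical skill name


@dataclass
class AuthorConfig:
    """Hypothetical stand-in for the author-controlled part of SessionConfig."""

    tools: List[str] = field(default_factory=list)
    instructions: str = ""
    append_system: str = ""


def layer_forced_policy(author: AuthorConfig) -> dict:
    """Layer the forced extras onto the author's config; the author cannot remove them."""
    # Union of forced and author tools, forced first, duplicates dropped.
    tools = list(dict.fromkeys([*AGENTA_FORCED_TOOLS, *author.tools]))
    # The author's instructions are appended after the base preamble.
    agents_md = "\n\n".join(p for p in (AGENTA_PREAMBLE, author.instructions) if p)
    # The author's append_system is appended after the forced persona.
    append_system = "\n\n".join(
        p for p in (AGENTA_FORCED_APPEND_SYSTEM, author.append_system) if p
    )
    return {
        "tools": tools,
        "skills": list(AGENTA_FORCED_SKILLS),
        "agents_md": agents_md,
        "append_system": append_system,
    }
```

An author who lists only `grep` still ends up with `read`, `bash`, and `grep`, and an author who lists `bash` does not get it twice.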

The forced skill *files* live with the runner that runs Pi, under
`services/agent/skills/<name>/` (each a directory with a `SKILL.md`). Skills are real files on
disk because they reference relative scripts and assets, so they cannot ride the wire as
text. The contract between the two halves is the skill **name**: `AGENTA_FORCED_SKILLS` lists
names, and each must match a committed directory under the runner's skills root.

## How a skill reaches the model

1. `AgentaHarness._to_harness_config` puts the forced skill names on the `skills` field of
the `/run` request (`AgentaAgentConfig.wire_tools`).
2. The in-process Pi engine (`engines/pi.ts`) resolves each name against its bundled
`skills/` root (override with `AGENTA_AGENT_SKILLS_DIR`) and passes the directories to Pi's
`DefaultResourceLoader` as `additionalSkillPaths`, with `noSkills: true` so only the
bundled skills load (the run stays hermetic, like `noContextFiles`).
3. Pi loads them, and because the forced `read` tool is enabled, surfaces them in the system
prompt. The model reads a skill's `SKILL.md` on demand (progressive disclosure).
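
Step 2 is the name-to-directory contract in action. The real resolution happens in TypeScript inside `engines/pi.ts`; the Python sketch below only illustrates the logic. The function name `resolve_skill_dirs` and the choice to fail loudly on a missing skill are assumptions, while the `AGENTA_AGENT_SKILLS_DIR` override and the `SKILL.md` requirement come from this page.

```python
import os
from pathlib import Path
from typing import List


def resolve_skill_dirs(names: List[str], bundled_root: Path) -> List[Path]:
    """Map forced skill names to directories that contain a SKILL.md."""
    # The environment override, when set, replaces the bundled skills root.
    root = Path(os.environ.get("AGENTA_AGENT_SKILLS_DIR") or bundled_root)
    resolved = []
    for name in names:
        skill_dir = root / name
        # Assumption: an unknown name is an error, not a silent skip.
        if not (skill_dir / "SKILL.md").is_file():
            raise FileNotFoundError(f"forced skill {name!r} not found under {root}")
        resolved.append(skill_dir)
    return resolved
```

The returned directories are what would be handed to the loader as `additionalSkillPaths`.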

## Two prompt layers, kept distinct

This follows Pi's own split (see `PiAgentConfig`): the **persona** ("who the agent is")
belongs in `append_system`, and **project conventions** belong in `AGENTS.md`. So the Agenta
persona is a forced `append_system`, while the Agenta base preamble plus the author's
instructions are the `AGENTS.md`. An author's own `system` / `append_system` (via
`AgentConfig.harness_options["pi"]`) still apply, layered after the forced persona.

## Selecting it

`agenta` is a harness option alongside `pi` and `claude` (the playground dropdown, the
`harness` field). It runs on the in-process Pi backend (`InProcessPiBackend` now lists
`HarnessType.AGENTA` as supported), so `select_backend` keeps `agenta` on the local Pi path.

## Deferred

Only the in-process Pi (local) path is wired. The ACP/rivet path (and therefore the Daytona
sandbox) does not yet deliver the forced skills. Wiring it would mean teaching `runRivet` to
read the `skills` field and lay the bundled skill directories into the sandbox via the
existing bundled-file provisioning. Until then, `agenta` with a non-local sandbox raises
`UnsupportedHarnessError` rather than silently running without its skills.
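
A minimal sketch of that guard, assuming the sandbox is identified by a string and that `"local"` names the in-process path. The exception class here is a local stand-in for the SDK's, and `check_agenta_sandbox` is a hypothetical name.

```python
class UnsupportedHarnessError(Exception):
    """Local stand-in for the SDK error of the same name."""


def check_agenta_sandbox(harness: str, sandbox: str) -> None:
    """Refuse to run `agenta` where its forced skills cannot be delivered."""
    # Assumption: "local" is the identifier of the in-process Pi path.
    if harness == "agenta" and sandbox != "local":
        raise UnsupportedHarnessError(
            f"harness 'agenta' is not supported on sandbox {sandbox!r}: "
            "forced skills are only delivered on the local Pi path"
        )
```

Failing early keeps the behaviour honest: a run either has its forced skills or does not start.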
95 changes: 95 additions & 0 deletions docs/design/agent-workflows/adapters/claude-code.md
@@ -0,0 +1,95 @@
# The Claude Code adapter

Claude Code is the second harness. It proves the central claim of this PoC: that swapping
the agent is one config value. Where the [Pi adapter](pi.md) does much of its work inside Pi
through an extension, Claude does its work through standard ACP. That makes Claude the
template for any MCP-capable harness rivet can drive.

Read the [architecture](../architecture.md) and [ports and adapters](../ports-and-adapters.md)
pages first.

## Running Claude

The daemon resolves the harness id `claude` to the `claude-agent-acp` adapter, which starts
the `claude` CLI. One operational detail is worth calling out, because it caused a real bug.
The daemon does not ship the `claude` CLI. It downloads it over HTTPS the first time a run
asks for Claude. The sidecar image is a slim Node image with no root certificates, so that
HTTPS download failed until we added `ca-certificates` to the image. With the certs in
place, the download verifies and Claude runs.

Auth is config, like everything else. Claude authenticates with `ANTHROPIC_API_KEY` from the
project vault when present, or with an OAuth token (`CLAUDE_CODE_OAUTH_TOKEN`) otherwise. The
runner turns the common failures into one clear line, so a user sees "add the project's
Anthropic key" rather than a stack trace.
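
The precedence can be sketched as below. The two environment variable names come from this page; the function name, the `ClaudeAuthError` type, and the exact message wording are illustrative assumptions.

```python
from typing import Dict, Mapping, Optional


class ClaudeAuthError(Exception):
    """Hypothetical error carrying a single user-facing line."""


def claude_auth_env(vault: Mapping[str, str], oauth_token: Optional[str]) -> Dict[str, str]:
    """Pick the credentials to pass to the Claude CLI."""
    api_key = vault.get("ANTHROPIC_API_KEY")
    if api_key:
        # The project vault key wins when present.
        return {"ANTHROPIC_API_KEY": api_key}
    if oauth_token:
        return {"CLAUDE_CODE_OAUTH_TOKEN": oauth_token}
    # One clear line instead of a stack trace.
    raise ClaudeAuthError("No Claude credentials found: add the project's Anthropic key.")
```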

## Tools over MCP

Claude advertises the `mcpTools` capability, so the runner delivers tools to Claude the
standard ACP way, over MCP. This is the branch that the [capability probe](../ports-and-adapters.md)
chooses: tools go over MCP because the harness reports `mcpTools`, not because of what the
harness is called.
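
In code, the point is that the branch keys on a reported capability and never on the harness id. The sketch below is hypothetical: `choose_tool_delivery` and the `"extension"` label for the non-MCP branch are assumptions, and only the `mcpTools` capability name is taken from this page.

```python
from typing import Iterable


def choose_tool_delivery(capabilities: Iterable[str]) -> str:
    """Pick a tool delivery route from what the harness reports, not from its name."""
    # Any harness that reports mcpTools takes the MCP route.
    if "mcpTools" in set(capabilities):
        return "mcp"
    # Assumption: the fallback is harness-native delivery, as Pi does via its extension.
    return "extension"
```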

The mechanism is a small stdio MCP server (`tools/mcp-server.ts`) that the daemon launches
and attaches to the session. Its tool bodies POST back to Agenta's `/tools/call` with the
same callback-tool envelope the Pi path uses. The resolved specs and the callback endpoint reach the
MCP server through its environment, so nothing tool-specific is written to a file the agent
can read. The safety property is identical to Pi's: the provider key and the connection auth
stay server-side, and the agent only ever asks Agenta to run a named tool.

## Permissions

Claude gates tool use behind a permission prompt. In an Agenta run there is no human at the
keyboard to answer it, so the runner answers for it. By default it auto-approves, because the
tools are backend-resolved and trusted. The per-run permission policy (or an env override)
can flip this to deny, which rejects tool use instead. This is handled on
`session.onPermissionRequest`, a hook Pi does not need because Pi does not gate tools this
way.
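
A sketch of the decision the hook makes. The policy values, the function name, and the rule that the env override takes precedence over the per-run policy are all assumptions; the page only states that the default is to approve and that either source can flip it to deny.

```python
from typing import Optional

ALLOW = "allow"
DENY = "deny"


def answer_permission_request(
    run_policy: str = ALLOW, env_override: Optional[str] = None
) -> str:
    """Answer a tool permission prompt on behalf of the absent human."""
    # Assumption: an env override, when set, wins over the per-run policy.
    policy = env_override or run_policy
    if policy not in (ALLOW, DENY):
        raise ValueError(f"unknown permission policy: {policy!r}")
    return policy
```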

## Tracing from the event stream

Claude does not self-instrument the way Pi does, because we do not load an Agenta extension
into Claude. So the runner builds the trace itself, from the ACP event stream. It subscribes
to the session's `session/update` notifications and turns them into the same span tree Pi
produces:

```
invoke_agent (AGENT)
  turn 0 (CHAIN)
    chat <model> (LLM)
    execute_tool <name> (TOOL)   one per ACP tool_call
```

This is the general path. Any harness rivet drives that does not bring its own
instrumentation gets traced this way. Pi is the exception that traces itself; Claude is the
rule.
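
A simplified sketch of building that tree from events. The event dictionaries here are a reduced, hypothetical shape rather than the real ACP `session/update` payload, and placing tool spans as siblings of the chat span under the turn is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Span:
    name: str
    kind: str
    children: List["Span"] = field(default_factory=list)


def build_trace(events: Iterable[dict], model: Optional[str] = None) -> Span:
    """Turn a stream of simplified events into a single-turn span tree."""
    root = Span("invoke_agent", "AGENT")
    turn = Span("turn 0", "CHAIN")
    root.children.append(turn)
    turn.children.append(Span(f"chat {model}" if model else "chat", "LLM"))
    for event in events:
        # One TOOL span per tool_call event; other events are ignored here.
        if event.get("type") == "tool_call":
            turn.children.append(Span(f"execute_tool {event['name']}", "TOOL"))
    return root


def render(span: Span, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per span."""
    lines = [f"{'  ' * depth}{span.name} ({span.kind})"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines
```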

## Usage and output

Claude reports usage in two places, so the runner reads both. The per-call input and output
token split rides on the ACP `PromptResponse`, and the cost rides on the `usage_update`
event. The runner combines them into the run total, which then rolls onto the workflow span
the same way Pi's writeback total does.
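
A sketch of the combination, under stated assumptions: the field names `input_tokens`, `output_tokens`, and `cost` are hypothetical, token counts are summed across prompt responses, and the most recent `usage_update` is treated as carrying the running cost. The real payloads may differ on each of these points.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0


def combine_usage(prompt_responses: List[dict], usage_updates: List[dict]) -> RunUsage:
    """Merge token counts from prompt responses with cost from usage updates."""
    total = RunUsage()
    for response in prompt_responses:
        total.input_tokens += response.get("input_tokens", 0)
        total.output_tokens += response.get("output_tokens", 0)
    if usage_updates:
        # Assumption: cost is cumulative, so the last update holds the run total.
        total.cost = usage_updates[-1].get("cost", 0.0)
    return total
```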

Output needs one small piece of care. Claude streams text deltas and also periodically
streams a full cumulative snapshot of the message so far. If the runner naively appended
everything, the answer would double. The runner detects a snapshot (a chunk that is a
superset of what it already has) and replaces rather than appends, so the final text is
correct whether a chunk is a delta or a snapshot.
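
One way to implement the check is a prefix test, sketched below. Treating "superset" as "starts with the text accumulated so far" is an assumption about the runner's rule, and `merge_chunk` is a hypothetical name.

```python
def merge_chunk(current: str, chunk: str) -> str:
    """Fold one streamed chunk into the accumulated text."""
    # A snapshot repeats everything seen so far and then extends it: replace.
    if chunk.startswith(current):
        return chunk
    # Otherwise the chunk is a delta: append.
    return current + chunk


def assemble(chunks) -> str:
    """Assemble the final text from a mixed stream of deltas and snapshots."""
    text = ""
    for chunk in chunks:
        text = merge_chunk(text, chunk)
    return text
```

For the stream `"Hel"`, `"lo"`, `"Hello wor"`, `"ld"`, this yields `"Hello world"` rather than a doubled answer. A delta that happens to begin with the entire accumulated text would be misread as a snapshot, which is the main limitation of a pure prefix test.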

## Models

Claude ignores a model id meant for another provider. Ask it for `gpt-5.5` and it keeps its
own default. The runner handles this honestly: when the harness does not accept the requested
model, the chat span is labelled `chat` rather than falsely claiming a model the run did not
use.
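
The labelling rule is small enough to state as a function. This is a sketch: `chat_span_name` and the boolean `accepted` flag are hypothetical, and how the runner actually learns that the harness rejected the model is not covered here.

```python
from typing import Optional


def chat_span_name(requested_model: Optional[str], accepted: bool) -> str:
    """Name the LLM span without claiming a model the run did not use."""
    if accepted and requested_model:
        return f"chat {requested_model}"
    # The harness kept its own default, so the span stays unlabelled.
    return "chat"
```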

## What Claude demonstrates

Claude is the proof that the seam works. Adding it took a `ClaudeHarness` (which holds the
Claude-specific config mapping) and no change to the workflow handler above the ports; the
same `RivetBackend` drives it. It also exercises the capability-driven branches the design is
built on: tools over MCP because it reports `mcpTools`, a permission answer because it gates
tools, and event-stream tracing because it does not self-instrument. A future harness that
rivet can drive would reuse this exact path. A future harness that rivet cannot drive would
instead get its own backend beside `RivetBackend` and `InProcessPiBackend`, behind the same
`/run` contract.