docs(agent): agent-workflows design wiki, ground truth, and archived POCs by mmabrouk · Pull Request #4779 · Agenta-AI/agenta

mmabrouk · 2026-06-19T16:30:19Z

Part of the agent-workflows PR set (sliced by functional area, final code only)

Most of these are independent off main. Two short stacks exist (an indented item stacks on the line above it). Each PR shows only its own area.

feat(sdk): agent runtime behind backend/harness ports #4771: SDK agent runtime: ports, adapters, tools, messages protocol
- feat(agent): agent workflow service and tool-resolution API #4772: agent workflow service + tool-resolution API
feat(agent): runner wire contract and tool execution #4773: runner wire protocol + tool execution
- feat(agent): runner engines, HTTP server, tracing, and docker image #4778: runner engines, server, tracing, docker image
feat(frontend): agent config playground controls #4775: agent config playground UI
chore(hosting): wire the agent runner sidecar into compose #4776: hosting compose wiring
docs(agent): agent-workflows design wiki, ground truth, and archived POCs #4779: agent-workflows design docs <- you are here

This PR's base is main. Review it on its own.

Context

This is the design wiki for the agent-workflows project. It is an independent docs PR off main and carries no runnable code. Read it before the code PRs in the stack. The pages map what the current code does, what is still missing, and how the remaining work should be sliced. Start here so the behavior PRs are easier to follow.

What this adds

Current-state design pages under docs/design/agent-workflows/. ground-truth.md maps every code surface to its current role and splits the work into implemented, not implemented, and planned. meeting-alignment.md checks the work against the June 18 design discussion. implementation-review.md lists cleanup risks. pr-stack.md proposes the reviewable PR breakpoints. agent-template.md, protocol.md, triggers.md, status.md, ports-and-adapters.md, and sessions.md each cover one axis of the design. adapters/ holds the Pi, Claude Code, and Agenta harness notes.

Two scoped design folders. sdk-local-tools/ covers standalone SDK tool resolution, which is partly built and blocked on LocalBackend. docs/design/vault-named-secrets/ covers user-named project secrets in the vault; this iteration is limited to storage and the management UI.

Archived POC material under trash/. This holds the original work-package notes, research spikes, proof-of-concept code, and superseded RFCs. It is kept for provenance. It is not design truth.

How to review this PR

Read ground-truth.md first. It is the source of truth that the other pages defer to, so it frames everything else. Then read pr-stack.md to see how the remaining work splits into reviewable PRs. Treat trash/ as an archive, not files to read line by line. Open a trash/ note only when a current page points you there for history or rationale.


vercel · 2026-06-19T16:30:26Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 19, 2026 8:32pm

coderabbitai · 2026-06-19T16:30:30Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 86b63082-f0ba-4040-bf46-3b2c84576934

📥 Commits

Reviewing files that changed from the base of the PR and between 1087fa2 and 8131d20.

📒 Files selected for processing (6)

docs/design/agent-workflows/README.md
docs/design/agent-workflows/architecture.md
docs/design/agent-workflows/ground-truth.md
docs/design/agent-workflows/implementation-review.md
docs/design/agent-workflows/pr-stack.md
docs/design/agent-workflows/status.md

✅ Files skipped from review due to trivial changes (5)

docs/design/agent-workflows/README.md
docs/design/agent-workflows/status.md
docs/design/agent-workflows/architecture.md
docs/design/agent-workflows/pr-stack.md
docs/design/agent-workflows/ground-truth.md

🚧 Files skipped from review as they are similar to previous changes (1)

docs/design/agent-workflows/implementation-review.md

📝 Walkthrough

Summary by CodeRabbit

Documentation
- Added comprehensive agent-workflows documentation covering overall architecture, the public protocol (invoke/messages/load-session), session behavior (cold replay), and upcoming trigger concepts.
- Published detailed adapter design write-ups for Pi, Claude Code, and the Agenta harness, plus a “ground truth” hub and status/known-gaps guidance.
- Documented SDK local tools work, including review findings and conventions, along with vault named secrets planning and related design decisions.

Walkthrough

This PR adds agent-workflow design documentation, archived historical research and POC notes, sdk-local-tools review artifacts, and a separate planning set for vault named secrets.

Changes

Agent workflows documentation

Layer / File(s)	Summary
Current workflow architecture and contracts `docs/design/agent-workflows/{README.md,architecture.md,ground-truth.md,ports-and-adapters.md,protocol.md,sessions.md,agent-template.md,triggers.md,status.md,meeting-alignment.md,open-issues.md,pr-stack.md,implementation-review.md}`	Documents the current agent workflow runtime, routing, protocol contracts, session model, template shape, trigger concept, open gaps, status, and proposed PR slicing.
Harness and adapter behavior `docs/design/agent-workflows/adapters/{agenta,claude-code,pi}.md`	Describes Pi, Claude, and Agenta harness behavior, including execution paths, prompt layering, tool delivery, tracing, permission handling, and backend support limits.
SDK local tools design set `docs/design/agent-workflows/sdk-local-tools/{README,context,research,plan,codebase-conventions,conventions-review,organization-proposal,status}.md`	Defines the standalone local-tools problem, proposed package organization, conventions, phased rollout, current limitations, and status for local tool resolution work.
SDK local tools review records `docs/design/agent-workflows/sdk-local-tools/review/*`	Adds the review scope, evidence, findings, risk tracking, scorecard, progress, metadata, and summary for the sdk-local-tools review.
Historical architecture and RFC archive `docs/design/agent-workflows/trash/README.md`, `docs/design/agent-workflows/trash/harness-port-redesign/`, `docs/design/agent-workflows/trash/old-rfcs/`, `docs/design/agent-workflows/trash/sdk-local-backend/status.md`	Archives superseded architecture proposals, build plans, RFCs, and older local-backend status material under the trash workspace.
Historical research archive `docs/design/agent-workflows/trash/research/*`	Adds archived research notes covering auth and secrets, Daytona sandboxes, in-memory Pi execution, telemetry, Pi interaction details, and shared-sandbox behavior.
Historical work packages and POCs `docs/design/agent-workflows/trash/wp-*/`	Adds archived work-package docs and POC scripts for Pi tracing, agent service experiments, Daytona sandbox runs, tool bridging, and Rivet ACP runtime experiments.

Vault named secrets planning docs

Layer / File(s)	Summary
Vault named secrets design and rollout `docs/design/vault-named-secrets/{README,context,plan,research,status}.md`	Adds planning documentation for a new `custom_secret` vault kind, including backend, migration, generated client, frontend, and verification work.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.81% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding comprehensive agent-workflows design documentation including ground truth, design pages, and archived POCs.
Description check	✅ Passed	The description is well-organized and directly related to the changeset, explaining what design pages are added, the purpose of ground-truth.md, the scoped design folders, archived material, and review guidance.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


mmabrouk · 2026-06-19T16:36:53Z

Reviewer guide: where to look

docs/design/agent-workflows/ground-truth.md (read first). The current implementation map. It ties each code surface to its role and splits the work into implemented, not implemented, and planned. Every other page defers to this one.
docs/design/agent-workflows/pr-stack.md. The proposed breakpoints for turning this work into reviewable stacked PRs, with scope, out-of-scope, and validation per PR. Read it second to see the review order.
docs/design/agent-workflows/agent-template.md. The intended split between generic agent identity, harness-specific config, and runtime infrastructure. This is the contract the template PR should land.
docs/design/agent-workflows/README.md. The "read in this order" index if you want the full tour past the two headline pages.
docs/design/agent-workflows/trash/ (archive, do not read line by line). Historical work-package notes, research spikes, POC code, and superseded RFCs, kept for provenance only. Open a note here only when a current page links to it for history or rationale.

mmabrouk · 2026-06-19T18:15:46Z

@coderabbitai review

coderabbitai · 2026-06-19T18:15:53Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

mmabrouk

Codex subagent review for #4779

I found one correctness issue to resolve before treating this as mergeable design truth:

docs/design/agent-workflows/ground-truth.md:3 and docs/design/agent-workflows/ground-truth.md:10: this page says it is the current implementation map and lists concrete code surfaces, but this PR is independent off main and the referenced code is not present in this PR head. I checked 1087fa2: services/agent/src/server.ts, sdks/python/agenta/sdk/agents/interfaces.py, and services/oss/src/agent/app.py all 404 at this SHA. README.md:3 and status.md:5 reinforce the same current/checked-in framing. If #4779 lands before the code PRs, main gets a ground-truth page that points to absent files and claims implemented behavior that is only present in sibling PRs. Please either make the docs explicitly scoped to the open agent-workflows PR set (including prerequisite/merge-order expectations), or stack/retarget this docs PR onto the code slices it documents so the file references are true at merge time.

Secondary coordination risk:

docs/design/agent-workflows/pr-stack.md:3 and docs/design/agent-workflows/pr-stack.md:14: the reviewer guide points people here for the review order, but the page lists proposed future slices rather than the concrete active PR map (#4771, #4772, #4773, #4775, #4776, #4778, #4779). Since some sibling PR bodies still point at older #4777/#4774 review targets, please add a short active PR set map or rename the page/intro so reviewers can distinguish proposal from live review stack.

I did not see high-level implemented/not-implemented drift against the sibling PR descriptions, assuming these docs are meant to describe the stack state rather than main at this SHA. I did not run a docs build; this was a GitHub content/consistency review.

coderabbitai

Actionable comments posted: 15

🧹 Nitpick comments (10)

docs/design/agent-workflows/sdk-local-tools/review/progress.md (1)

26-37: ⚡ Quick win

Clarify timeline for remediation-phase test counts.

The "Remediation" section reports different test categories and counts than the initial review phase (e.g., 146 SDK vs. 118 earlier). While this aligns with the narrative in summary.md about subsequent fixes, progress.md doesn't explicitly state that these counts reflect validation after findings were addressed. Consider adding a subheading like "Remediation and post-fix validation" or a brief note explaining the temporal shift.

docs/design/agent-workflows/sdk-local-tools/review/scorecard.md (1)

1-31: ⚡ Quick win

Clarify the temporal relationship between scorecard and summary verdicts.

The scorecard states "PASS WITH CONDITIONS" and notes that medium findings "warrant fixes before rollout" (lines 28-30), suggesting conditions are pending. However, summary.md declares "PASS — CONDITIONS RESOLVED" and explicitly states findings "were addressed in the subsequent organization refactor" (line 25). The documents describe two different timepoints—original review vs. post-remediation—but don't make this relationship explicit.

To avoid reader confusion, consider adding a header note or footer that clearly states: "This scorecard reflects the original review verdict before remediation. See summary.md for the post-remediation verdict."

docs/design/agent-workflows/sdk-local-tools/review/summary.md (1)

64-78: ⚡ Quick win

Clarify the "positional-ordering coupling" detail.

The invariant verification section (line 64) lists all six invariants as verified and notes they "all hold." However, progress.md line 20 mentions "one positional-ordering coupling noted," which isn't explained here. This creates ambiguity about whether this coupling:

Is a risk (but not listed in risks.md)

Is an observation (but unexplained)

Was already addressed (but not detailed)

Either elaborate on what this coupling is in the invariants section, or remove the unexplained reference from progress.md.

docs/design/agent-workflows/adapters/claude-code.md (1)

55-55: 💤 Low value

Add language specification to code block.

The fenced code block at line 55 should include a language identifier for syntax highlighting. Based on the context (showing mcp_servers and callback patterns), specify the language:
-```
+```python
 # Example or context here
Source: Linters/SAST tools

docs/design/agent-workflows/adapters/pi.md (1)

110-110: 💤 Low value

Add language specification to code block.

The fenced code block at line 110 showing the span tree structure should include a language identifier for consistency:
-```
+```
 invoke_agent            (AGENT)
   turn N                (CHAIN)
     chat <model>        (LLM)    real token usage from the provider call
     execute_tool <name> (TOOL)   one per tool the turn ran
-```
+```
Alternatively, if this is meant to be read as a plain-text structure, use `text` as the language identifier, or render it as a blockquote or list instead of a code block.

Source: Linters/SAST tools

docs/design/agent-workflows/trash/harness-port-redesign/plan.md (1)

93-93: 💤 Low value

Fix hyphenation in "Cross cutting" section header.

Line 93 uses "Cross cutting" but should use "Cross-cutting" as a compound adjective modifying the following noun.
Fix
-## Cross cutting
+## Cross-cutting

docs/design/agent-workflows/trash/harness-port-redesign/proposal.md (2)

141-141: 💤 Low value

Fix hyphenation: "client side" → "client-side" streaming.

Compound adjective modifying "streaming" should be hyphenated.
Fix
-  client side streaming (WP-4) becomes a small add on.
+  client-side streaming (WP-4) becomes a small add on.

146-146: 💤 Low value

Fix hyphenation: "cold per invoke" sandboxes.

This should be hyphenated to clarify that "cold-per-invoke" modifies "sandboxes".
Fix
-  decide warm vs cold (the WP-8 status calls this out). First class sessions and ACP
-  `session/load` want a daemon that survives between turns, which reopens the per session
-  env and folder jail questions in
+  decide warm vs cold (the WP-8 status calls this out). First class sessions and ACP
+  `session/load` want a daemon that survives between turns, which reopens the per-session
+  env and folder jail questions in

docs/design/agent-workflows/trash/harness-port-redesign/research.md (1)

150-154: 💤 Low value

Add language identifier to code block.

The fenced code block at lines 150–154 is missing a language identifier. This block contains a Rivet capability model list and should be marked as plain text or bash.
Fix
-```
+```text
 commandExecution, errorEvents, fileAttachments, fileChanges, images, itemStarted,
 mcpTools, permissions, planMode, questions, reasoning, sessionLifecycle, sharedProcess,
 status, streamingDeltas, textMessages, toolCalls, toolResults
-```
+```

docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/research.md (1)

91-91: 💤 Low value

Minor style: consider alternative to overused "exactly".

LanguageTool flags "exactly" as an overused word. Consider rephrasing for variety (e.g., "in the same way" or "identically").
Suggested alternative
-sandbox `env_vars` on Daytona. This is exactly how `DaytonaRunner`
+sandbox `env_vars` on Daytona. This mirrors how `DaytonaRunner`

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 94c1db69-7596-4692-a169-915065ff7567

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and 1087fa2.

⛔ Files ignored due to path filters (1)

docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (95)

docs/design/agent-workflows/README.md
docs/design/agent-workflows/adapters/agenta.md
docs/design/agent-workflows/adapters/claude-code.md
docs/design/agent-workflows/adapters/pi.md
docs/design/agent-workflows/agent-template.md
docs/design/agent-workflows/architecture.md
docs/design/agent-workflows/ground-truth.md
docs/design/agent-workflows/implementation-review.md
docs/design/agent-workflows/meeting-alignment.md
docs/design/agent-workflows/open-issues.md
docs/design/agent-workflows/ports-and-adapters.md
docs/design/agent-workflows/pr-stack.md
docs/design/agent-workflows/protocol.md
docs/design/agent-workflows/sdk-local-tools/README.md
docs/design/agent-workflows/sdk-local-tools/codebase-conventions.md
docs/design/agent-workflows/sdk-local-tools/context.md
docs/design/agent-workflows/sdk-local-tools/conventions-review.md
docs/design/agent-workflows/sdk-local-tools/organization-proposal.md
docs/design/agent-workflows/sdk-local-tools/plan.md
docs/design/agent-workflows/sdk-local-tools/research.md
docs/design/agent-workflows/sdk-local-tools/review/evidence/app-mcp-reassign.md
docs/design/agent-workflows/sdk-local-tools/review/evidence/attach-orthogonal-mutation.md
docs/design/agent-workflows/sdk-local-tools/review/evidence/description-default-inconsistency.md
docs/design/agent-workflows/sdk-local-tools/review/evidence/gateway-no-logging.md
docs/design/agent-workflows/sdk-local-tools/review/evidence/gateway-orthogonal-untested.md
docs/design/agent-workflows/sdk-local-tools/review/evidence/handler-resolution-error.md
docs/design/agent-workflows/sdk-local-tools/review/findings.md
docs/design/agent-workflows/sdk-local-tools/review/metadata.json
docs/design/agent-workflows/sdk-local-tools/review/plan.md
docs/design/agent-workflows/sdk-local-tools/review/progress.md
docs/design/agent-workflows/sdk-local-tools/review/questions.md
docs/design/agent-workflows/sdk-local-tools/review/risks.md
docs/design/agent-workflows/sdk-local-tools/review/scope.md
docs/design/agent-workflows/sdk-local-tools/review/scorecard.md
docs/design/agent-workflows/sdk-local-tools/review/summary.md
docs/design/agent-workflows/sdk-local-tools/status.md
docs/design/agent-workflows/sessions.md
docs/design/agent-workflows/status.md
docs/design/agent-workflows/trash/README.md
docs/design/agent-workflows/trash/harness-port-redesign/README.md
docs/design/agent-workflows/trash/harness-port-redesign/implementation.md
docs/design/agent-workflows/trash/harness-port-redesign/plan.md
docs/design/agent-workflows/trash/harness-port-redesign/proposal.md
docs/design/agent-workflows/trash/harness-port-redesign/research.md
docs/design/agent-workflows/trash/harness-port-redesign/status.md
docs/design/agent-workflows/trash/old-rfcs/agent-protocol-rfc.md
docs/design/agent-workflows/trash/old-rfcs/streaming-and-sessions.md
docs/design/agent-workflows/trash/research/auth-secrets.md
docs/design/agent-workflows/trash/research/daytona-sandbox.md
docs/design/agent-workflows/trash/research/diskless-in-memory-config.md
docs/design/agent-workflows/trash/research/open-questions.md
docs/design/agent-workflows/trash/research/otel-instrumentation.md
docs/design/agent-workflows/trash/research/pi-interaction.md
docs/design/agent-workflows/trash/research/sandbox-sharing.md
docs/design/agent-workflows/trash/sdk-local-backend/status.md
docs/design/agent-workflows/trash/wp-1-pi-tracing/README.md
docs/design/agent-workflows/trash/wp-1-pi-tracing/integrating-the-tracing-extension.md
docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/.env.example
docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/README.md
docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/agenta-otel.ts
docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/package.json
docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/run.ts
docs/design/agent-workflows/trash/wp-1-pi-tracing/tracing-in-the-agent-service.md
docs/design/agent-workflows/trash/wp-2-agent-service/README.md
docs/design/agent-workflows/trash/wp-2-agent-service/implementation-plan.md
docs/design/agent-workflows/trash/wp-2-agent-service/qa.md
docs/design/agent-workflows/trash/wp-3-daytona-sandbox/README.md
docs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/README.md
docs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/bench_coldstart.py
docs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/build_snapshot.py
docs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/cleanup.py
docs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/run_agent.py
docs/design/agent-workflows/trash/wp-4-multi-message-output/README.md
docs/design/agent-workflows/trash/wp-5-chat-vs-completion/README.md
docs/design/agent-workflows/trash/wp-6-workflow-type-and-template/README.md
docs/design/agent-workflows/trash/wp-7-tools/README.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/README.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/architecture.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/context.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/isolation-and-fork.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/plan.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/commit_agent_config.py
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/debug-events.ts
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/dump-full.ts
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/package.json
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/spike.ts
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/research.md
docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/status.md
docs/design/agent-workflows/triggers.md
docs/design/vault-named-secrets/README.md
docs/design/vault-named-secrets/context.md
docs/design/vault-named-secrets/plan.md
docs/design/vault-named-secrets/research.md
docs/design/vault-named-secrets/status.md

coderabbitai · 2026-06-19T18:27:37Z

+
+# Status
+
+Source of truth for this design effort. Keep it current.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Archive intent conflicts with “source of truth” wording.

Line 5 asks to “keep it current,” but Line 1 marks this file as superseded and historical. Please reword Line 5 to avoid signaling this as active design authority.

Suggested wording

-Source of truth for this design effort. Keep it current.
+Historical status snapshot for this design effort at the time. Kept for provenance only.


coderabbitai · 2026-06-19T18:27:37Z

+1. **Ambition: full A to E arc.** Plan all five phases, including first class sessions and
+   retiring the `Runtime.exec` port. See [`plan.md`](plan.md).
+2. **Session model: stay cold and replay.** Keep WP-8's one daemon per invoke. Do not
+   stand up a warm daemon. This avoids the per session env channel and the folder jail.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hyphenate “per-session” for clarity.

Use a compound modifier here to match standard technical writing style.

Suggested edit

-This avoids the per session env channel and the folder jail.
+This avoids the per-session env channel and the folder jail.


🧰 Tools

🪛 LanguageTool

[grammar] ~49-~49: Use a hyphen to join words.
Context: ...nd up a warm daemon. This avoids the per session env channel and the folder jail....

(QB_NEW_EN_HYPHEN)

Source: Linters/SAST tools

coderabbitai · 2026-06-19T18:27:37Z

+```
+            ┌─────────────────────────── client (useChat) ───────────────────────────┐
+            │                                                                          │
+   POST /messages  (Accept: text/event-stream)               POST /load-session        │
+            │                                                          │               │
+            ▼                                                          ▼               │
+   ┌──────────────────┐   AgentEvent stream    ┌───────────────────────────────────┐  │
+   │ agent run         │ ─────────────────────▶│ streaming edge → UI Message Stream │──┘
+   │ (harness loop)    │                        └───────────────────────────────────┘
+   └──────────────────┘                                   persists per turn
+            │                                                     │
+            └──────────────── trace store (ag.session.id) ◀───────┘   load-session reads here
+```
+
+## 3. Relationship to `/invoke`
+
+`/messages` is a new endpoint. It does not change `/invoke`. The generic, stateless workflow
+invoke keeps its exact request and response, and a client that does not run a chat agent never
+touches `/messages`.
+
+`/messages` reuses two things from the workflow contract so the backend does not fork: the
+response envelope (`WorkflowServiceResponse`, with the answer in `data.outputs`) and revision
+resolution (`references`). It diverges from `/invoke` in three ways, which is why it is its own
+endpoint:
+
+1. The conversation is a first-class member, `data.messages`, in the `UIMessage` shape, rather
+   than nested in `data.inputs.messages` as `{role, content}`.
+2. The response can stream as a UI Message Stream (Section 6.2).
+3. A turn belongs to a session (`session_id`, Section 4).
+
+A server **SHOULD** map a `/messages` request onto the same internal agent invocation that
+`/invoke` uses, after lifting `data.messages` and `data.inputs` into the handler's `messages`
+and `inputs` arguments.
+
+## 4. Session model
+
+### 4.1 Identity
+
+A `session_id` is an opaque string scoped to a project. The pair `(project_id, session_id)`
+**MUST** be unique. A bare `session_id` is not a global identifier.
+
+A client-supplied `session_id`:
+
+- **MUST** be treated as an opaque token. A server **MUST NOT** interpolate it into a storage
+  path, a query, or a trace attribute without escaping.
+- **SHOULD** be constrained by the server to a bounded length and a restricted character set.
+  A server **MAY** reject an id outside those bounds with `400 Bad Request`.
+
+### 4.2 Resolution
+
+On receiving a turn, the server resolves the session as follows:
+
+1. If the request omits `session_id`, the server **MUST** mint a new unique id, associate the
+   turn with it, and return that id (Section 6).
+2. If the request supplies a `session_id` that does not exist for the caller's project, the
+   server **MUST** create a session with that id and associate the turn with it.
+3. If the request supplies a `session_id` that exists for the caller's project, the server
+   **MUST** associate the turn with that existing session.
+4. If the request supplies a `session_id` that exists under a **different** project, the
+   server **MUST NOT** resume it. The server **MUST** treat it as case 2 within the caller's
+   own project, or reject the turn. A server **MUST NOT** disclose the existence of a session
+   the caller does not own.
+
+Rule 4 is the ownership boundary. "Resume if it exists" means "resume if it exists and
+belongs to the caller."
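
Reviewer sketch (not part of the diff): the four resolution rules above can be read as a single lookup keyed on the `(project_id, session_id)` pair. The Python below is a minimal illustration under stated assumptions, not the service's implementation. `SessionStore`, `InvalidSessionId`, the `sess_` prefix on minted ids, and the length and character bounds in `SESSION_ID_PATTERN` are all hypothetical; the spec only asks for "a bounded length and a restricted character set".

```python
import re
import uuid

# Hypothetical bounds. The spec does not fix a length or an alphabet.
SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,128}")


class InvalidSessionId(ValueError):
    """Raised for an out-of-bounds id; a server MAY map this to 400 Bad Request."""


class SessionStore:
    """In-memory stand-in for a table with a unique (project_id, session_id) key."""

    def __init__(self):
        self._sessions = set()

    def resolve(self, project_id, session_id=None):
        """Return (session_id, created) for one turn, following rules 1 to 4."""
        if session_id is None:
            # Rule 1: no id supplied, so mint a new unique one.
            session_id = f"sess_{uuid.uuid4().hex}"
        elif not SESSION_ID_PATTERN.fullmatch(session_id):
            raise InvalidSessionId("session_id is outside the allowed bounds")

        key = (project_id, session_id)
        # Rules 2 and 3: create the session if this project has none with that id,
        # otherwise resume the existing one.
        # Rule 4 falls out of the key: a session with the same id under another
        # project is never looked up, so it is neither resumed nor disclosed.
        created = key not in self._sessions
        self._sessions.add(key)
        return session_id, created
```

With this shape, `store.resolve("proj_b", sid)` for an id that `proj_a` already owns returns `(sid, True)`: a fresh session inside `proj_b`, which is the "treat it as case 2" branch of rule 4. In a real database the same effect would come from a unique constraint on the pair, which also covers the concurrent-creation case in Section 4.4.
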
+
+### 4.3 Continuation semantics for this version
+
+In this version, associating a turn with a session records the turn under that session for
+tracing and later retrieval. The conversation context the model sees is supplied by the
+`messages` in the request (Section 5.2), not reconstructed from the server's record.
+
+A future version MAY make the server's record authoritative, at which point a turn carries
+only the new message and the server supplies the prior history. The request field is
+unchanged by that evolution. See [streaming-and-sessions.md](streaming-and-sessions.md).
+
+### 4.4 Concurrency
+
+Two turns that create the same new `(project_id, session_id)` concurrently **MUST** resolve
+to a single session. A server **SHOULD** enforce this with a unique constraint and treat the
+losing creation as a resume (case 3).
+
+## 5. Request format (`POST /messages`)
+
+### 5.1 Envelope
+
+```jsonc
+{
+  "session_id": "sess_123",       // OPTIONAL (Section 4)
+  "references":  { ... },         // OPTIONAL: selects the workflow revision (as on /invoke)
+  "data": {
+    "messages":   [ /* UIMessage[] */ ],    // REQUIRED: the conversation (Section 5.2)
+    "inputs":     { "<name>": <value> },     // OPTIONAL: named input variables (Section 5.3)
+    "parameters": { /* agent config */ }     // OPTIONAL (Section 5.4)
+  }
+}
+```
+
+`session_id` sits at the envelope top level, alongside the existing `trace_id` and `span_id`.
+It **MUST NOT** be required in a request header.
+
+`data.messages`, `data.inputs`, and `data.parameters` are siblings. They map onto the agent
+handler's `messages`, `inputs`, and `parameters` arguments. On `/invoke` the conversation is
+nested at `data.inputs.messages`; on `/messages` it is lifted out to `data.messages`, because
+the conversation is the primary input of this endpoint.
+
+### 5.2 `data.messages`
+
+`data.messages` is the conversation as an array of `UIMessage` objects (Appendix B). It is
+REQUIRED. The last element is the new user turn.
+
+In this version the client **MUST** send the full conversation in `data.messages`. Each
+element uses the parts-based `UIMessage` shape (Appendix B), not the `{role, content}` shape
+of `/invoke`.
+
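Reviewer sketch, not part of the diff: the envelope rules in Sections 5.1 and 5.2 can be expressed as a small builder. The function name `build_messages_request` is hypothetical; only the field names come from the spec text above.

```python
from typing import Any, Optional


def build_messages_request(
    messages: list[dict[str, Any]],
    session_id: Optional[str] = None,
    inputs: Optional[dict[str, Any]] = None,
    parameters: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a POST /messages body; only data.messages is required."""
    if not messages:
        raise ValueError("data.messages is REQUIRED and must end with the new user turn")
    data: dict[str, Any] = {"messages": messages}
    if inputs is not None:
        data["inputs"] = inputs  # sibling of messages, not nested under it
    if parameters is not None:
        data["parameters"] = parameters
    body: dict[str, Any] = {"data": data}
    if session_id is not None:
        body["session_id"] = session_id  # envelope top level, never a header
    return body


# Usage: a first turn without a session id, using the parts-based UIMessage shape.
first_turn = build_messages_request(
    [{"id": "m1", "role": "user", "parts": [{"type": "text", "text": "capital of France?"}]}]
)
```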
+### 5.3 `data.inputs`
+
+`data.inputs` carries the agent's named input variables for the turn: the workflow's declared
+inputs and any per-turn context the caller supplies (for example a retrieved document or a
+record id). Keys are input names; values are arbitrary JSON. This is the same `inputs` as the
+workflow contract, with the conversation no longer nested inside it.
+
+`data.inputs` is OPTIONAL and MAY be sent on every turn, since its values can change between
+turns.
+
+### 5.4 `data.parameters` and `references`
+
+The agent configuration (instructions, model, tools, harness, sandbox, permission policy)
+travels as on `/invoke`: inline in `data.parameters.agent`, or resolved by the platform from
+`references` when the request targets a stored revision. This protocol does not change that
+resolution.
+
+### 5.5 Content negotiation
+
+The response mode is selected by the `Accept` request header:
+
+| `Accept` | Response |
+| --- | --- |
+| `application/json` (or absent) | Single JSON response (Section 6.1) |
+| `text/event-stream` | UI Message Stream over SSE (Section 6.2) |
+
+A server that cannot satisfy the `Accept` header **MUST** respond `406 Not Acceptable`.
+
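Reviewer sketch, not part of the diff: the negotiation table in Section 5.5 reduces to a three-way decision. The names `select_response_mode` and `NotAcceptable` are hypothetical. This sketch ignores `q` weights, and treating `*/*` as JSON is an assumption the spec text does not state.

```python
from typing import Optional


class NotAcceptable(Exception):
    """Signals that the server should answer 406 Not Acceptable."""


def select_response_mode(accept: Optional[str]) -> str:
    """Map an Accept header value to "json" or "sse", or raise NotAcceptable."""
    if accept is None or not accept.strip():
        return "json"  # an absent header selects the single JSON response
    # Drop parameters such as ";q=0.9" and compare media types case-insensitively.
    media_types = [item.split(";")[0].strip().lower() for item in accept.split(",")]
    if "text/event-stream" in media_types:
        return "sse"
    if "application/json" in media_types or "*/*" in media_types:
        return "json"  # "*/*" handling is an assumption of this sketch
    raise NotAcceptable(accept)
```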
+## 6. Response formats
+
+### 6.1 Single JSON response
+
+For `Accept: application/json`, the server returns `200 OK` with a body extending
+`WorkflowServiceResponse`:
+
+```jsonc
+{
+  "trace_id":   "...",
+  "span_id":    "...",
+  "session_id": "sess_123",                 // the resolved id (minted or echoed)
+  "status":     { "code": 200 },
+  "data":       { "outputs": { "role": "assistant", "content": "Berlin." } }
+}
+```
+
+The response **MUST** include `session_id`, set to the resolved session (Section 4). The
+assistant answer rides in `data.outputs` as today. Token usage is not in the body; it is
+recorded on the trace.
+
+### 6.2 UI Message Stream (SSE)
+
+For `Accept: text/event-stream`, the server returns `200 OK` and streams the run in the
+Vercel UI Message Stream format (AI SDK v5/v6).
+
+#### 6.2.1 Response headers
+
+The response **MUST** set:
+
+```
+content-type: text/event-stream
+x-vercel-ai-ui-message-stream: v1
+```
+
+and **SHOULD** set:
+
+```
+cache-control: no-cache
+connection: keep-alive
+x-accel-buffering: no
+```
+
+`x-accel-buffering: no` disables proxy buffering so parts flush immediately.
+
+#### 6.2.2 Framing
+
+Each part is one SSE event: the literal bytes `data: `, followed by the part as compact JSON
+(no insignificant whitespace), followed by `\n\n`.
+
+```
+data: {"type":"text-delta","id":"t1","delta":"Hello"}\n\n
+```
+
+The stream **MUST** terminate with the literal line `data: [DONE]\n\n`.
+
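Reviewer sketch, not part of the diff: the framing rule in Section 6.2.2 (compact JSON after `data: `, a blank line after each event, and a final `[DONE]`) can be written as two small helpers. The names `frame_part` and `frame_stream` are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def frame_part(part: dict[str, Any]) -> str:
    """Serialize one part as an SSE event carrying compact JSON."""
    payload = json.dumps(part, separators=(",", ":"), ensure_ascii=False)
    return f"data: {payload}\n\n"


def frame_stream(parts: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield each framed part, then the mandatory terminator line."""
    for part in parts:
        yield frame_part(part)
    yield "data: [DONE]\n\n"
```

Passing `separators=(",", ":")` is what removes the insignificant whitespace that `json.dumps` adds by default.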
+#### 6.2.3 Part registry
+
+The table below lists the parts a server emits, with their REQUIRED fields. Fields not
+listed are OPTIONAL and MAY be omitted.
+
+| `type` | Required fields | Meaning |
+| --- | --- | --- |
+| `start` | none | Begin a message. Carries `messageId` and `messageMetadata` (Section 6.2.4). |
+| `start-step` | none | Begin a step of the agent loop. |
+| `finish-step` | none | End the current step. |
+| `finish` | none | End the message. Carries `finishReason`, `messageMetadata`. |
+| `text-start` | `id` | Begin a text block. |
+| `text-delta` | `id`, `delta` | Append `delta` to the text block `id`. |
+| `text-end` | `id` | End the text block. |
+| `reasoning-start` | `id` | Begin a reasoning block. |
+| `reasoning-delta` | `id`, `delta` | Append to the reasoning block. |
+| `reasoning-end` | `id` | End the reasoning block. |
+| `tool-input-start` | `toolCallId`, `toolName` | A tool call begins. |
+| `tool-input-delta` | `toolCallId`, `inputTextDelta` | Append a fragment of the tool arguments (note: `inputTextDelta`, not `delta`). |
+| `tool-input-available` | `toolCallId`, `toolName`, `input` | The full tool arguments are known. |
+| `tool-output-available` | `toolCallId`, `output` | The tool result. |
+| `tool-output-error` | `toolCallId`, `errorText` | The tool failed. |
+| `file` | `url`, `mediaType` | A file or image. `url` MAY be an `https:` or `data:` URL. |
+| `data-<name>` | `data` | An application-defined part (generative UI). MAY carry `id` and `transient`. |
+| `error` | `errorText` | A stream-level error (Section 8.2). |
+
+A server **MUST** order parts so that for any `id` or `toolCallId`, a `*-start` precedes its
+deltas, which precede its `*-end` or `*-available`. Text and reasoning deltas are
+concatenated by `id`. Tool parts are keyed by `toolCallId`.
+
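Reviewer sketch, not part of the diff: on the client side, the ordering rule in Section 6.2.3 means text blocks can be rebuilt by concatenating deltas per `id`, and a delta outside its start/end pair is a protocol violation. The function name `assemble_text` is hypothetical, and the sketch covers text parts only.

```python
from typing import Any, Iterable


def assemble_text(parts: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Concatenate text-delta parts by id, rejecting deltas outside start/end."""
    open_ids: set[str] = set()
    blocks: dict[str, str] = {}
    for part in parts:
        kind = part.get("type")
        if kind == "text-start":
            open_ids.add(part["id"])
            blocks[part["id"]] = ""
        elif kind == "text-delta":
            if part["id"] not in open_ids:
                raise ValueError(f"text-delta outside an open block: {part['id']!r}")
            blocks[part["id"]] += part["delta"]
        elif kind == "text-end":
            if part["id"] not in open_ids:
                raise ValueError(f"text-end without text-start: {part['id']!r}")
            open_ids.discard(part["id"])
    return blocks
```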
+#### 6.2.4 Session id in the stream
+
+The server **MUST** convey the resolved `session_id` as `messageMetadata.sessionId` on the
+`start` part, which is the first part of the stream:
+
+```
+data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_123"}}
+```
+
+A server **MAY** additionally mirror `session_id` to a response header. The body remains the
+normative source.
+
+#### 6.2.5 Mapping from agent events
+
+The streaming edge consumes the agent's internal `AgentEvent` stream
+(`services/agent/src/protocol.ts:74`) and emits parts as follows:
+
+| `AgentEvent` | Parts |
+| --- | --- |
+| run start (synthesized) | `start` (with `messageId`, `messageMetadata.sessionId`), then `start-step` |
+| `message` | `text-start`, one or more `text-delta`, `text-end` |
+| `thought` | `reasoning-start`, `reasoning-delta`, `reasoning-end` |
+| `tool_call` | `tool-input-start`, then `tool-input-available` |
+| `tool_result` with `isError=false` | `tool-output-available` |
+| `tool_result` with `isError=true` | `tool-output-error` |
+| `usage` | `messageMetadata` on the `finish` part |
+| `error` | `error` (Section 8.2) |
+| `done` | `finish-step`, then `finish` (`finishReason` = `stopReason`), then `[DONE]` |
+
+A harness that reports `capabilities.streamingDeltas` produces token-level `text-delta`
+parts. A harness without that capability produces a single `text-delta` carrying the whole
+text. The wire shape is identical, so the client does not need to distinguish them.
+
+The protocol streams deltas only. There is no full-message snapshot part. The client
+assembles the final `UIMessage` from the parts. The server **SHOULD** record the assembled
+turn on the trace (`ag.session.id`), which is the source `load-session` reads.
+
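Reviewer sketch, not part of the diff: the mapping table in Section 6.2.5 can be read as a generator from events to parts. The event field names used here (`type`, `text`, `toolCallId`, `toolName`, `input`, `output`, `usage`, `message`) are assumptions; only `isError` and `stopReason` appear in the spec text, and the real `AgentEvent` shape lives in `services/agent/src/protocol.ts`. The sketch keeps everything in one step and emits a single delta per block, which corresponds to a harness without `streamingDeltas`. The `[DONE]` terminator is left to the framing layer.

```python
from typing import Any, Iterable, Iterator


def to_parts(
    events: Iterable[dict[str, Any]], session_id: str, message_id: str
) -> Iterator[dict[str, Any]]:
    """Translate a hypothetical AgentEvent sequence into UI Message Stream parts."""
    yield {"type": "start", "messageId": message_id,
           "messageMetadata": {"sessionId": session_id}}
    yield {"type": "start-step"}
    usage = None
    counter = 0
    for event in events:
        kind = event["type"]
        if kind in ("message", "thought"):
            counter += 1
            prefix = "text" if kind == "message" else "reasoning"
            block_id = f"{prefix[0]}{counter}"
            yield {"type": f"{prefix}-start", "id": block_id}
            yield {"type": f"{prefix}-delta", "id": block_id, "delta": event["text"]}
            yield {"type": f"{prefix}-end", "id": block_id}
        elif kind == "tool_call":
            call = {"toolCallId": event["toolCallId"], "toolName": event["toolName"]}
            yield {"type": "tool-input-start", **call}
            yield {"type": "tool-input-available", **call, "input": event["input"]}
        elif kind == "tool_result":
            if event.get("isError"):
                yield {"type": "tool-output-error", "toolCallId": event["toolCallId"],
                       "errorText": str(event["output"])}
            else:
                yield {"type": "tool-output-available", "toolCallId": event["toolCallId"],
                       "output": event["output"]}
        elif kind == "usage":
            usage = event["usage"]  # held back until the finish part
        elif kind == "error":
            yield {"type": "error", "errorText": event["message"]}
        elif kind == "done":
            yield {"type": "finish-step"}
            finish: dict[str, Any] = {"type": "finish", "finishReason": event.get("stopReason")}
            if usage is not None:
                finish["messageMetadata"] = {"usage": usage}
            yield finish
```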
+## 7. The `load-session` endpoint (`POST /load-session`)
+
+Returns the history of a session so a client can rebuild a conversation it does not hold
+locally.
+
+### 7.1 Request
+
+```jsonc
+{ "session_id": "sess_123" }
+```
+
+`session_id` is REQUIRED. The server **MUST** apply the ownership rule of Section 4.2: if the
+session does not exist for the caller's project, the server **MUST** respond `404 Not Found`
+and **MUST NOT** reveal a session owned by another project.
+
+### 7.2 Response (default, `Accept: application/json`)
+
+The server returns `200 OK` with the conversation as `UIMessage` objects, the shape `useChat`
+accepts as its initial `messages`:
+
+```jsonc
+{
+  "session_id": "sess_123",
+  "messages": [
+    { "id": "m1", "role": "user",      "parts": [ { "type": "text", "text": "capital of France?" } ] },
+    { "id": "m2", "role": "assistant", "parts": [ { "type": "text", "text": "Paris." } ] }
+  ]
+}
+```
+
+### 7.3 Response (negotiated replay, `Accept: text/event-stream`)
+
+A server **MAY** support a delta replay of the stored history under
+`Accept: text/event-stream`, re-emitting the session as a UI Message Stream (Section 6.2).
+This is OPTIONAL. Whether the folded form or the replay is the primary form is left open by
+this draft; a conformant client **SHOULD** request `application/json` for rebuilding a static
+view.
+
+## 8. Error handling
+
+### 8.1 Request and endpoint errors (JSON)
+
+Before a stream begins, the server reports errors with an HTTP status and the existing
+`status` envelope (`WorkflowServiceStatus`: `code`, `message`, `type`, `stacktrace`):
+
+| Status | Condition |
+| --- | --- |
+| `400 Bad Request` | Malformed body, or a `session_id` that violates Section 4.1. |
+| `401 Unauthorized` / `403 Forbidden` | Missing or invalid credentials. |
+| `404 Not Found` | `load-session` on a session the caller does not own. |
+| `406 Not Acceptable` | The `Accept` header cannot be satisfied. |
+| `5xx` | Server failure before streaming starts. |
+
+### 8.2 In-stream errors
+
+A failure after the stream has started **MUST** be reported as an `error` part:
+
+```
+data: {"type":"error","errorText":"the agent run failed: ..."}
+```
+
+After emitting an `error` part, the server **SHOULD** terminate the stream. It **MAY** omit
+the `finish` part. It **SHOULD** still emit `[DONE]` to close the SSE channel cleanly. The
+client surfaces the error to the user.
+
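Reviewer sketch, not part of the diff: the in-stream error rule in Section 8.2 can be implemented as a wrapper around the part generator. The name `stream_with_error_part` is hypothetical. Echoing the exception text to the client is a choice of this sketch; a production server would likely sanitize it.

```python
import json
from typing import Any, Iterable, Iterator


def _frame(part: dict[str, Any]) -> str:
    """Frame one part as an SSE event with compact JSON."""
    return "data: " + json.dumps(part, separators=(",", ":")) + "\n\n"


def stream_with_error_part(parts: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Frame parts; on failure emit an error part, skip finish, and still close."""
    try:
        for part in parts:
            yield _frame(part)
    except Exception as exc:
        # The stream has already started, so the failure travels in-band.
        yield _frame({"type": "error", "errorText": f"the agent run failed: {exc}"})
    # [DONE] closes the SSE channel on both the success and the failure path.
    yield "data: [DONE]\n\n"
```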
+## 9. Security considerations
+
+- **Session ownership.** Section 4.2 rule 4 is a security requirement, not a convenience.
+  Because a client may supply a `session_id` the server has not seen before (case 2), a
+  server that keys sessions on `session_id` alone would let a caller read or extend another tenant's
+  conversation. Servers **MUST** key on `(project_id, session_id)` and scope every resume,
+  every `load-session`, and every existence check to the caller's project.
+- **Opaque ids.** A client-supplied `session_id` is untrusted input. See Section 4.1.
+- **Secrets.** Provider keys and tool credentials travel and resolve as in the current
+  contract. This protocol adds no new secret-bearing field. `inputs` is caller-supplied
+  input and **MUST NOT** be used to smuggle credentials in place of the existing `secrets`
+  and signed-credential mechanisms.
+- **Content negotiation and buffering.** A streaming response disables proxy buffering
+  (Section 6.2.1). Operators **MUST** ensure intermediaries do not re-buffer
+  `text/event-stream` responses, or streaming degrades to a single delayed flush.
+
+## 10. Interaction sequences
+
+### 10.1 New session, streaming turn
+
+```
+client                                  server
+  │  POST /messages                        │
+  │  Accept: text/event-stream             │
+  │  { data:{ messages:[...] } }           │   (no session_id)
+  │───────────────────────────────────────▶│
+  │                                         │  mint sess_123
+  │  200 text/event-stream                  │
+  │  data: {"type":"start",                 │
+  │         "messageMetadata":              │
+  │           {"sessionId":"sess_123"}}     │
+  │◀───────────────────────────────────────│
+  │  data: {"type":"start-step"} ...        │
+  │  ... tool / text parts ...              │
+  │  data: {"type":"finish"}                │
+  │  data: [DONE]                           │
+  │◀───────────────────────────────────────│
+  │  (client stores sess_123 for next turn) │
+```
+
+### 10.2 Returning to a known session
+
+```
+client                                  server
+  │  POST /load-session                     │
+  │  { "session_id": "sess_123" }           │
+  │───────────────────────────────────────▶│  check ownership
+  │  200 { messages: [ UIMessage, ... ] }   │
+  │◀───────────────────────────────────────│
+  │  (render history; hold it)              │
+  │                                         │
+  │  POST /messages                         │
+  │  Accept: text/event-stream              │
+  │  { session_id:"sess_123",               │
+  │    data:{ messages:[...full] } }        │
+  │───────────────────────────────────────▶│  resolve existing sess_123
+  │  200 text/event-stream → parts → [DONE] │
+  │◀───────────────────────────────────────│
+```
+
+## Appendix A: Full stream transcript
+
+One turn: the agent calls a weather tool, reads the result, and answers. Every `data:` line
+in order, each followed by a blank line.
+
+```
+data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_123"}}
+
+data: {"type":"start-step"}
+
+data: {"type":"tool-input-start","toolCallId":"call_1","toolName":"getWeather"}
+
+data: {"type":"tool-input-available","toolCallId":"call_1","toolName":"getWeather","input":{"city":"Paris"}}
+
+data: {"type":"tool-output-available","toolCallId":"call_1","output":{"weather":"sunny","temp":24}}
+
+data: {"type":"finish-step"}
+
+data: {"type":"start-step"}
+
+data: {"type":"text-start","id":"t1"}
+
+data: {"type":"text-delta","id":"t1","delta":"It is sunny "}
+
+data: {"type":"text-delta","id":"t1","delta":"and 24°C in Paris."}
+
+data: {"type":"text-end","id":"t1"}
+
+data: {"type":"finish-step"}
+
+data: {"type":"finish","messageMetadata":{"usage":{"input":820,"output":36,"cost":0.004}}}
+
+data: [DONE]
+```
+
+## Appendix B: `UIMessage` schema
+
+A message as accumulated by the client, sent in `data.messages`, and returned by
+`load-session`:
+
+```jsonc
+{
+  "id":   "m2",
+  "role": "user | assistant | system",
+  "parts": [
+    { "type": "text",        "text": "..." },
+    { "type": "reasoning",   "text": "..." },
+    { "type": "tool-<name>", "toolCallId": "...", "state": "output-available", "input": {}, "output": {} },
+    { "type": "file",        "url": "...", "mediaType": "image/png" },
+    { "type": "data-<name>", "data": { } },
+    { "type": "step-start" }
+  ],
+  "metadata": { }
+}
+```
+
+A `UIMessage` carries no top-level `content` string in v5/v6. All content lives in `parts`.
+
+## Appendix C: References
+
+- RFC 2119, RFC 8174: requirement keywords.
+- RFC 8259: JSON.
+- WHATWG HTML, Server-Sent Events: `text/event-stream`.
+- Vercel AI SDK UI Message Stream (v5/v6): https://ai-sdk.dev, and the chunk schema at
+  https://github.com/vercel/ai/blob/main/packages/ai/src/ui-message-stream/ui-message-chunks.ts
+- Current contract: `sdks/python/agenta/sdk/models/workflows.py`,
+  `sdks/python/agenta/sdk/decorators/routing.py` (Accept negotiation at `:236`).
+- Agent events and session id: `services/agent/src/protocol.ts:74`,
+  `sdks/python/agenta/sdk/agents/dtos.py`, `services/oss/src/agent/app.py`.
+- Design rationale and trade-offs: [streaming-and-sessions.md](streaming-and-sessions.md).
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix markdownlint violations for fenced blocks and code span formatting.

This file currently triggers MD040 across multiple fences and MD038 around the code span near Line 256. Please add explicit fence languages (text, json, jsonc) and remove extra spaces inside the affected code span.

Example fixes

-```
+```text
 ┌─────────────────────────── client (useChat) ───────────────────────────┐
 ...
-```
+```
@@
-```
+```text
 content-type: text/event-stream
 x-vercel-ai-ui-message-stream: v1
-```
+```
@@
-Each part is one SSE event: the literal bytes `data: `, followed by ...
+Each part is one SSE event: the literal bytes `data:`, followed by ...

🧰 Tools

🪛 LanguageTool

[grammar] ~283-~283: Ensure spelling is correct
Context: ..., toolName| A tool call begins. | |tool-input-delta|toolCallId, inputTextDelta`...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~283-~283: Ensure spelling is correct
Context: ... | | tool-input-delta | toolCallId, inputTextDelta | Append a fragment of the tool argumen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~283-~283: Ensure spelling is correct
Context: ...a fragment of the tool arguments (note: inputTextDelta, not delta). | | `tool-input-availabl...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)

[warning] 66-66: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 239-239: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 246-246: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 256-256: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 259-259: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 300-300: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 417-417: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 439-439: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 462-462: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 526-526: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Source: Linters/SAST tools

coderabbitai · 2026-06-19T18:27:37Z

+AGENTA_HOST=http://144.76.237.122:8280/
+AGENTA_API_KEY=your-agenta-project-api-key


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid publishing an API-key example over plaintext HTTP to a fixed public IP.

Line 2 currently encourages sending AGENTA_API_KEY over http://..., which can leak credentials in transit and creates a brittle environment default.

🔒 Suggested fix

-AGENTA_HOST=http://144.76.237.122:8280/
+AGENTA_HOST=https://cloud.agenta.ai
 AGENTA_API_KEY=your-agenta-project-api-key

🧰 Tools

🪛 dotenv-linter (4.0.0)

[warning] 3-3: [UnorderedKey] The AGENTA_API_KEY key should go before the AGENTA_HOST key

(UnorderedKey)

coderabbitai · 2026-06-19T18:27:37Z

+const toolSpans = new Map<string, Span>();
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close tool spans even when toolCallId is absent.

At Line 362 a tool span is always started, but at Line 371/375 it is only tracked and closed when toolCallId exists. Missing IDs leave spans open and skew the exported trace.

🛠️ Suggested fix

 const toolSpans = new Map<string, Span>();
+const anonymousToolSpans: Span[] = [];
@@
 pi.on("tool_execution_start", async (event: any) => {
@@
-  if (event?.toolCallId) toolSpans.set(event.toolCallId, span);
+  if (event?.toolCallId) toolSpans.set(event.toolCallId, span);
+  else anonymousToolSpans.push(span);
 });
@@
 pi.on("tool_execution_end", async (event: any) => {
-  const span = event?.toolCallId ? toolSpans.get(event.toolCallId) : undefined;
+  const span = event?.toolCallId
+    ? toolSpans.get(event.toolCallId)
+    : anonymousToolSpans.shift();
   if (!span) return;
   setOutput(span, toolResultText(event?.result));
   if (event?.isError) span.setStatus({ code: SpanStatusCode.ERROR });
   span.end();
-  toolSpans.delete(event.toolCallId);
+  if (event?.toolCallId) toolSpans.delete(event.toolCallId);
 });

Also applies to: 362-380

coderabbitai · 2026-06-19T18:27:38Z

+        t = time.monotonic()
+        sb = daytona.create(
+            CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
+            timeout=120,
+        )
+        dt = time.monotonic() - t
+        times.append(dt)
+        print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
+        daytona.delete(sb)
+    results[snap] = times


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guarantee sandbox teardown per run to prevent leaks.

daytona.delete(sb) only runs on the happy path. Any exception after create() can leave sandboxes running and stop the benchmark early.

Suggested fix

 for snap in SNAPSHOTS:
     times: list[float] = []
     for i in range(N):
-        t = time.monotonic()
-        sb = daytona.create(
-            CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
-            timeout=120,
-        )
-        dt = time.monotonic() - t
-        times.append(dt)
-        print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
-        daytona.delete(sb)
+        sb = None
+        try:
+            t = time.monotonic()
+            sb = daytona.create(
+                CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
+                timeout=120,
+            )
+            dt = time.monotonic() - t
+            times.append(dt)
+            print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
+        finally:
+            if sb is not None:
+                daytona.delete(sb)
     results[snap] = times

coderabbitai · 2026-06-19T18:27:38Z

+def arg(name: str, default: str) -> str:
+    return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

arg() can crash on missing flag values.

If a user passes --model (or --auth) without a value, sys.argv[index + 1] raises IndexError and exits with a traceback instead of a clear CLI error.

Suggested fix

 def arg(name: str, default: str) -> str:
-    return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default
+    if name not in sys.argv:
+        return default
+    idx = sys.argv.index(name) + 1
+    if idx >= len(sys.argv) or sys.argv[idx].startswith("--"):
+        raise ValueError(f"Missing value for {name}")
+    return sys.argv[idx]

coderabbitai · 2026-06-19T18:27:38Z

+            pi_cmd = (
+                f"cd {run_dir} && TMPDIR={run_dir}/tmp "
+                f"pi -p {json.dumps(PROMPT)} "
+                f"--mode json --approve --provider {provider} --model {model} "
+                f"-t read,bash,edit,write,ls "
+                f"--session-dir {run_dir}/.pi-sessions --name {session_id} "
+                f"< /dev/null"
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Shell command assembly is vulnerable to argument injection via --model.

model is user-controlled and interpolated unquoted into pi_cmd. A crafted value can append shell operators and execute unintended commands in the sandbox.

Suggested fix

+import shlex
 ...
 pi_cmd = (
-    f"cd {run_dir} && TMPDIR={run_dir}/tmp "
+    f"cd {shlex.quote(run_dir)} && TMPDIR={shlex.quote(run_dir)}/tmp "
     f"pi -p {json.dumps(PROMPT)} "
-    f"--mode json --approve --provider {provider} --model {model} "
+    f"--mode json --approve --provider {shlex.quote(provider)} --model {shlex.quote(model)} "
     f"-t read,bash,edit,write,ls "
-    f"--session-dir {run_dir}/.pi-sessions --name {session_id} "
+    f"--session-dir {shlex.quote(run_dir)}/.pi-sessions --name {shlex.quote(session_id)} "
     f"< /dev/null"
 )

🧰 Tools

🪛 ast-grep (0.43.0)

[info] 260-260: use jsonify instead of json.dumps for JSON output
Context: json.dumps(PROMPT)
Note: Security best practice.

(use-jsonify)

coderabbitai · 2026-06-19T18:27:38Z

+import os
+import httpx
+
+BASE = os.getenv("AGENTA_HOST", "http://144.76.237.122:8280").rstrip("/")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unencrypted HTTP default with API key exposure.

Line 15 defaults BASE to http://144.76.237.122:8280 (unencrypted HTTP). When AGENTA_API_KEY is sent to this endpoint at line 28/62, credentials are transmitted in plaintext. Although the AGENTA_HOST environment variable allows override to HTTPS, the hardcoded default should be HTTPS or explicitly documented as development-only.

Recommendation: Either (a) change the default to https://..., (b) add a prominent warning in the script's docstring, or (c) validate that AGENTA_HOST is HTTPS before sending credentials.
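Option (c) could look like the following minimal sketch. The helper name `resolve_base_url` and the `allow_insecure` escape hatch are hypothetical and not part of the POC script; the sketch also drops the hardcoded IP default and requires `AGENTA_HOST` to be set explicitly, which is a design choice of the sketch.

```python
import os
from urllib.parse import urlsplit


def resolve_base_url(env: dict[str, str], allow_insecure: bool = False) -> str:
    """Return AGENTA_HOST without a trailing slash, refusing plaintext HTTP by default."""
    raw = env.get("AGENTA_HOST", "").strip()
    if not raw:
        raise ValueError("AGENTA_HOST is not set")
    scheme = urlsplit(raw).scheme.lower()
    if scheme != "https" and not allow_insecure:
        # Fail before any request is built, so the API key never leaves the process.
        raise ValueError(f"refusing to send AGENTA_API_KEY over a non-HTTPS host: {raw}")
    return raw.rstrip("/")


# Usage in the script would be: BASE = resolve_base_url(dict(os.environ))
```

A local development run against a plaintext host would then have to opt in with `allow_insecure=True`, which keeps the unsafe path visible in the code.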

coderabbitai · 2026-06-19T18:27:38Z

+        out = resp.json()
+        new = out.get("workflow_revision") or out
+        print("new revision id:", new.get("id"), "version:", new.get("version"))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Robust error handling for HTTP response.

Line 69 calls resp.json() without checking the status code first. If the POST fails (4xx/5xx), the response may not contain the expected JSON structure, causing .get() on line 70 to mask the real error.

For a POC script, consider adding:

resp.raise_for_status() # after line 67, before line 69

This makes the script fail loudly if the API call fails.

docs(agent): agent-workflows design wiki, ground truth, and archived POCs

1087fa2

dosubot Bot added the size:XXL (this PR changes 1000+ lines, ignoring generated files) and documentation (improvements or additions to documentation) labels Jun 19, 2026

vercel Bot deployed to Preview June 19, 2026 16:30 View deployment

mmabrouk commented Jun 19, 2026

This was referenced Jun 19, 2026

docs(agent): agent-workflows design and ground truth #4777

Open

feat(agent): runner engines, server, and tracing #4774

Open

coderabbitai Bot reviewed Jun 19, 2026

docs(agent): clarify active stack docs map

8131d20

vercel Bot deployed to Preview June 19, 2026 20:32 View deployment


 # Status

-Source of truth for this design effort. Keep it current.
+Historical status snapshot for this design effort at the time. Kept for provenance only.

-stand up a warm daemon. This avoids the per session env channel and the folder jail.
+stand up a warm daemon. This avoids the per-session env channel and the folder jail.
