docs(agent): agent-workflows design wiki, ground truth, and archived POCs#4779
docs(agent): agent-workflows design wiki, ground truth, and archived POCs#4779mmabrouk wants to merge 2 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
✅ Files skipped from review due to trivial changes (5)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR adds agent-workflow design documentation, archived historical research and POC notes, sdk-local-tools review artifacts, and a separate planning set for vault named secrets. ChangesAgent workflows documentation
Vault named secrets planning docs
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer guide: where to look
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
mmabrouk
left a comment
There was a problem hiding this comment.
Codex subagent review for #4779
I found one correctness issue to resolve before treating this as mergeable design truth:
docs/design/agent-workflows/ground-truth.md:3anddocs/design/agent-workflows/ground-truth.md:10: this page says it is the current implementation map and lists concrete code surfaces, but this PR is independent offmainand the referenced code is not present in this PR head. I checked1087fa2:services/agent/src/server.ts,sdks/python/agenta/sdk/agents/interfaces.py, andservices/oss/src/agent/app.pyall 404 at this SHA.README.md:3andstatus.md:5reinforce the same current/checked-in framing. If #4779 lands before the code PRs,maingets a ground-truth page that points to absent files and claims implemented behavior that is only present in sibling PRs. Please either make the docs explicitly scoped to the open agent-workflows PR set (including prerequisite/merge-order expectations), or stack/retarget this docs PR onto the code slices it documents so the file references are true at merge time.
Secondary coordination risk:
docs/design/agent-workflows/pr-stack.md:3anddocs/design/agent-workflows/pr-stack.md:14: the reviewer guide points people here for the review order, but the page lists proposed future slices rather than the concrete active PR map (#4771, #4772, #4773, #4775, #4776, #4778, #4779). Since some sibling PR bodies still point at older #4777/#4774 review targets, please add a short active PR set map or rename the page/intro so reviewers can distinguish proposal from live review stack.
I did not see high-level implemented/not-implemented drift against the sibling PR descriptions, assuming these docs are meant to describe the stack state rather than main at this SHA. I did not run a docs build; this was a GitHub content/consistency review.
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (10)
docs/design/agent-workflows/sdk-local-tools/review/progress.md (1)
26-37: ⚡ Quick winClarify timeline for remediation-phase test counts.
The "Remediation" section reports different test categories and counts than the initial review phase (e.g., 146 SDK vs. 118 earlier). While this aligns with the narrative in
summary.mdabout subsequent fixes,progress.mddoesn't explicitly state that these counts reflect validation after findings were addressed. Consider adding a subheading like "Remediation and post-fix validation" or a brief note explaining the temporal shift.docs/design/agent-workflows/sdk-local-tools/review/scorecard.md (1)
1-31: ⚡ Quick winClarify the temporal relationship between scorecard and summary verdicts.
The scorecard states "PASS WITH CONDITIONS" and notes that medium findings "warrant fixes before rollout" (lines 28-30), suggesting conditions are pending. However,
summary.mddeclares "PASS — CONDITIONS RESOLVED" and explicitly states findings "were addressed in the subsequent organization refactor" (line 25). The documents describe two different timepoints—original review vs. post-remediation—but don't make this relationship explicit.To avoid reader confusion, consider adding a header note or footer that clearly states: "This scorecard reflects the original review verdict before remediation. See
summary.mdfor the post-remediation verdict."docs/design/agent-workflows/sdk-local-tools/review/summary.md (1)
64-78: ⚡ Quick winClarify the "positional-ordering coupling" detail.
The invariant verification section (line 64) lists all six invariants as verified and notes they "all hold." However,
progress.mdline 20 mentions "one positional-ordering coupling noted," which isn't explained here. This creates ambiguity about whether this coupling:
- Is a risk (but not listed in
risks.md)- Is an observation (but unexplained)
- Was already addressed (but not detailed)
Either elaborate on what this coupling is in the invariants section, or remove the unexplained reference from
progress.md.docs/design/agent-workflows/adapters/claude-code.md (1)
55-55: 💤 Low valueAdd language specification to code block.
The fenced code block at line 55 should include a language identifier for syntax highlighting. Based on the context (showing mcp_servers and callback patterns), specify the language:
-``` +```python # Example or context hereSource: Linters/SAST tools
docs/design/agent-workflows/adapters/pi.md (1)
110-110: 💤 Low valueAdd language specification to code block.
The fenced code block at line 110 showing the span tree structure should include a language identifier for consistency:
-``` +``` invoke_agent (AGENT) turn N (CHAIN) chat <model> (LLM) real token usage from the provider call execute_tool <name> (TOOL) one per tool the turn ran -``` +```Alternatively, if this is meant to be read as plain text structure, use
`text`` as the language identifier, or leave it as a blockquote/list instead of a code block.Source: Linters/SAST tools
docs/design/agent-workflows/trash/harness-port-redesign/plan.md (1)
93-93: 💤 Low valueFix hyphenation in "Cross cutting" section header.
Line 93 uses "Cross cutting" but should use "Cross-cutting" as a compound adjective modifying the following noun.
Fix
-## Cross cutting +## Cross-cuttingdocs/design/agent-workflows/trash/harness-port-redesign/proposal.md (2)
141-141: 💤 Low valueFix hyphenation: "client side" → "client-side" streaming.
Compound adjective modifying "streaming" should be hyphenated.
Fix
- client side streaming (WP-4) becomes a small add on. + client-side streaming (WP-4) becomes a small add on.
146-146: 💤 Low valueFix hyphenation: "cold per invoke" sandboxes.
This should be hyphenated to clarify that "cold-per-invoke" modifies "sandboxes".
Fix
- decide warm vs cold (the WP-8 status calls this out). First class sessions and ACP - `session/load` want a daemon that survives between turns, which reopens the per session - env and folder jail questions in + decide warm vs cold (the WP-8 status calls this out). First class sessions and ACP + `session/load` want a daemon that survives between turns, which reopens the per-session + env and folder jail questions indocs/design/agent-workflows/trash/harness-port-redesign/research.md (1)
150-154: 💤 Low valueAdd language identifier to code block.
The fenced code block at lines 150–154 is missing a language identifier. This block contains a Rivet capability model list and should be marked as plain text or bash.
Fix
-\`\`\` +\`\`\`text commandExecution, errorEvents, fileAttachments, fileChanges, images, itemStarted, mcpTools, permissions, planMode, questions, reasoning, sessionLifecycle, sharedProcess, status, streamingDeltas, textMessages, toolCalls, toolResults -\`\`\` +\`\`\`docs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/research.md (1)
91-91: 💤 Low valueMinor style: consider alternative to overused "exactly".
LanguageTool flags "exactly" as an overused word. Consider rephrasing for variety (e.g., "in the same way" or "identically").
Suggested alternative
-sandbox `env_vars` on Daytona. This is exactly how `DaytonaRunner` +sandbox `env_vars` on Daytona. This mirrors how `DaytonaRunner`
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 94c1db69-7596-4692-a169-915065ff7567
⛔ Files ignored due to path filters (1)
docs/design/agent-workflows/trash/wp-1-pi-tracing/poc/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (95)
docs/design/agent-workflows/README.mddocs/design/agent-workflows/adapters/agenta.mddocs/design/agent-workflows/adapters/claude-code.mddocs/design/agent-workflows/adapters/pi.mddocs/design/agent-workflows/agent-template.mddocs/design/agent-workflows/architecture.mddocs/design/agent-workflows/ground-truth.mddocs/design/agent-workflows/implementation-review.mddocs/design/agent-workflows/meeting-alignment.mddocs/design/agent-workflows/open-issues.mddocs/design/agent-workflows/ports-and-adapters.mddocs/design/agent-workflows/pr-stack.mddocs/design/agent-workflows/protocol.mddocs/design/agent-workflows/sdk-local-tools/README.mddocs/design/agent-workflows/sdk-local-tools/codebase-conventions.mddocs/design/agent-workflows/sdk-local-tools/context.mddocs/design/agent-workflows/sdk-local-tools/conventions-review.mddocs/design/agent-workflows/sdk-local-tools/organization-proposal.mddocs/design/agent-workflows/sdk-local-tools/plan.mddocs/design/agent-workflows/sdk-local-tools/research.mddocs/design/agent-workflows/sdk-local-tools/review/evidence/app-mcp-reassign.mddocs/design/agent-workflows/sdk-local-tools/review/evidence/attach-orthogonal-mutation.mddocs/design/agent-workflows/sdk-local-tools/review/evidence/description-default-inconsistency.mddocs/design/agent-workflows/sdk-local-tools/review/evidence/gateway-no-logging.mddocs/design/agent-workflows/sdk-local-tools/review/evidence/gateway-orthogonal-untested.mddocs/design/agent-workflows/sdk-local-tools/review/evidence/handler-resolution-error.mddocs/design/agent-workflows/sdk-local-tools/review/findings.mddocs/design/agent-workflows/sdk-local-tools/review/metadata.jsondocs/design/agent-workflows/sdk-local-tools/review/plan.mddocs/design/agent-workflows/sdk-local-tools/review/progress.mddocs/design/agent-workflows/sdk-local-tools/review/questions.mddocs/design/agent-workflows/sdk-local-tools/review/risks.mddocs/design/agent-workflows/sdk-local-tools/review/scope.mddocs/design/agent-workflows/sdk-local-tools/review/scorecard.mddocs/design/agent-workflows/sdk-local-tools/review/summary.mddocs/design/agent-workflows/sdk-local-tools/status.mddocs/design/agent-workflows/sessions.mddocs/design/agent-workflows/status.mddocs/design/agent-workflows/trash/README.mddocs/design/agent-workflows/trash/harness-port-redesign/README.mddocs/design/agent-workflows/trash/harness-port-redesign/implementation.mddocs/design/agent-workflows/trash/harness-port-redesign/plan.mddocs/design/agent-workflows/trash/harness-port-redesign/proposal.mddocs/design/agent-workflows/trash/harness-port-redesign/research.mddocs/design/agent-workflows/trash/harness-port-redesign/status.mddocs/design/agent-workflows/trash/old-rfcs/agent-protocol-rfc.mddocs/design/agent-workflows/trash/old-rfcs/streaming-and-sessions.mddocs/design/agent-workflows/trash/research/auth-secrets.mddocs/design/agent-workflows/trash/research/daytona-sandbox.mddocs/design/agent-workflows/trash/research/diskless-in-memory-config.mddocs/design/agent-workflows/trash/research/open-questions.mddocs/design/agent-workflows/trash/research/otel-instrumentation.mddocs/design/agent-workflows/trash/research/pi-interaction.mddocs/design/agent-workflows/trash/research/sandbox-sharing.mddocs/design/agent-workflows/trash/sdk-local-backend/status.mddocs/design/agent-workflows/trash/wp-1-pi-tracing/README.mddocs/design/agent-workflows/trash/wp-1-pi-tracing/integrating-the-tracing-extension.mddocs/design/agent-workflows/trash/wp-1-pi-tracing/poc/.env.exampledocs/design/agent-workflows/trash/wp-1-pi-tracing/poc/README.mddocs/design/agent-workflows/trash/wp-1-pi-tracing/poc/agenta-otel.tsdocs/design/agent-workflows/trash/wp-1-pi-tracing/poc/package.jsondocs/design/agent-workflows/trash/wp-1-pi-tracing/poc/run.tsdocs/design/agent-workflows/trash/wp-1-pi-tracing/tracing-in-the-agent-service.mddocs/design/agent-workflows/trash/wp-2-agent-service/README.mddocs/design/agent-workflows/trash/wp-2-agent-service/implementation-plan.mddocs/design/agent-workflows/trash/wp-2-agent-service/qa.mddocs/design/agent-workflows/trash/wp-3-daytona-sandbox/README.mddocs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/README.mddocs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/bench_coldstart.pydocs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/build_snapshot.pydocs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/cleanup.pydocs/design/agent-workflows/trash/wp-3-daytona-sandbox/poc/run_agent.pydocs/design/agent-workflows/trash/wp-4-multi-message-output/README.mddocs/design/agent-workflows/trash/wp-5-chat-vs-completion/README.mddocs/design/agent-workflows/trash/wp-6-workflow-type-and-template/README.mddocs/design/agent-workflows/trash/wp-7-tools/README.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/README.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/architecture.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/context.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/isolation-and-fork.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/plan.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.pydocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/commit_agent_config.pydocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/debug-events.tsdocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/dump-full.tsdocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/package.jsondocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/poc/spike.tsdocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/research.mddocs/design/agent-workflows/trash/wp-8-rivet-acp-runtime/status.mddocs/design/agent-workflows/triggers.mddocs/design/vault-named-secrets/README.mddocs/design/vault-named-secrets/context.mddocs/design/vault-named-secrets/plan.mddocs/design/vault-named-secrets/research.mddocs/design/vault-named-secrets/status.md
|
|
||
| # Status | ||
|
|
||
| Source of truth for this design effort. Keep it current. |
There was a problem hiding this comment.
Archive intent conflicts with “source of truth” wording.
Line 5 asks to “keep it current,” but Line 1 marks this file as superseded and historical. Please reword Line 5 to avoid signaling this as active design authority.
Suggested wording
-Source of truth for this design effort. Keep it current.
+Historical status snapshot for this design effort at the time. Kept for provenance only.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Source of truth for this design effort. Keep it current. | |
| Historical status snapshot for this design effort at the time. Kept for provenance only. |
| 1. **Ambition: full A to E arc.** Plan all five phases, including first class sessions and | ||
| retiring the `Runtime.exec` port. See [`plan.md`](plan.md). | ||
| 2. **Session model: stay cold and replay.** Keep WP-8's one daemon per invoke. Do not | ||
| stand up a warm daemon. This avoids the per session env channel and the folder jail. |
There was a problem hiding this comment.
Hyphenate “per-session” for clarity.
Use a compound modifier here to match standard technical writing style.
Suggested edit
-This avoids the per session env channel and the folder jail.
+This avoids the per-session env channel and the folder jail.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| stand up a warm daemon. This avoids the per session env channel and the folder jail. | |
| stand up a warm daemon. This avoids the per-session env channel and the folder jail. |
🧰 Tools
🪛 LanguageTool
[grammar] ~49-~49: Use a hyphen to join words.
Context: ...nd up a warm daemon. This avoids the per session env channel and the folder jail....
(QB_NEW_EN_HYPHEN)
Source: Linters/SAST tools
| ``` | ||
| ┌─────────────────────────── client (useChat) ───────────────────────────┐ | ||
| │ │ | ||
| POST /messages (Accept: text/event-stream) POST /load-session │ | ||
| │ │ │ | ||
| ▼ ▼ │ | ||
| ┌──────────────────┐ AgentEvent stream ┌───────────────────────────────────┐ │ | ||
| │ agent run │ ─────────────────────▶│ streaming edge → UI Message Stream │──┘ | ||
| │ (harness loop) │ └───────────────────────────────────┘ | ||
| └──────────────────┘ persists per turn | ||
| │ │ | ||
| └──────────────── trace store (ag.session.id) ◀───────┘ load-session reads here | ||
| ``` | ||
|
|
||
| ## 3. Relationship to `/invoke` | ||
|
|
||
| `/messages` is a new endpoint. It does not change `/invoke`. The generic, stateless workflow | ||
| invoke keeps its exact request and response, and a client that does not run a chat agent never | ||
| touches `/messages`. | ||
|
|
||
| `/messages` reuses two things from the workflow contract so the backend does not fork: the | ||
| response envelope (`WorkflowServiceResponse`, with the answer in `data.outputs`) and revision | ||
| resolution (`references`). It diverges from `/invoke` in three ways, which is why it is its own | ||
| endpoint: | ||
|
|
||
| 1. The conversation is a first-class member, `data.messages`, in the `UIMessage` shape, rather | ||
| than nested in `data.inputs.messages` as `{role, content}`. | ||
| 2. The response can stream as a UI Message Stream (Section 6.2). | ||
| 3. A turn belongs to a session (`session_id`, Section 4). | ||
|
|
||
| A server **SHOULD** map a `/messages` request onto the same internal agent invocation that | ||
| `/invoke` uses, after lifting `data.messages` and `data.inputs` into the handler's `messages` | ||
| and `inputs` arguments. | ||
|
|
||
| ## 4. Session model | ||
|
|
||
| ### 4.1 Identity | ||
|
|
||
| A `session_id` is an opaque string scoped to a project. The pair `(project_id, session_id)` | ||
| **MUST** be unique. A bare `session_id` is not a global identifier. | ||
|
|
||
| A client-supplied `session_id`: | ||
|
|
||
| - **MUST** be treated as an opaque token. A server **MUST NOT** interpolate it into a storage | ||
| path, a query, or a trace attribute without escaping. | ||
| - **SHOULD** be constrained by the server to a bounded length and a restricted character set. | ||
| A server **MAY** reject an id outside those bounds with `400 Bad Request`. | ||
|
|
||
| ### 4.2 Resolution | ||
|
|
||
| On receiving a turn, the server resolves the session as follows: | ||
|
|
||
| 1. If the request omits `session_id`, the server **MUST** mint a new unique id, associate the | ||
| turn with it, and return that id (Section 6). | ||
| 2. If the request supplies a `session_id` that does not exist for the caller's project, the | ||
| server **MUST** create a session with that id and associate the turn with it. | ||
| 3. If the request supplies a `session_id` that exists for the caller's project, the server | ||
| **MUST** associate the turn with that existing session. | ||
| 4. If the request supplies a `session_id` that exists under a **different** project, the | ||
| server **MUST NOT** resume it. The server **MUST** treat it as case 2 within the caller's | ||
| own project, or reject the turn. A server **MUST NOT** disclose the existence of a session | ||
| the caller does not own. | ||
|
|
||
| Rule 4 is the ownership boundary. "Resume if it exists" means "resume if it exists and | ||
| belongs to the caller." | ||
|
|
||
| ### 4.3 Continuation semantics for this version | ||
|
|
||
| In this version, associating a turn with a session records the turn under that session for | ||
| tracing and later retrieval. The conversation context the model sees is supplied by the | ||
| `messages` in the request (Section 5.2), not reconstructed from the server's record. | ||
|
|
||
| A future version MAY make the server's record authoritative, at which point a turn carries | ||
| only the new message and the server supplies the prior history. The request field is | ||
| unchanged by that evolution. See [streaming-and-sessions.md](streaming-and-sessions.md). | ||
|
|
||
| ### 4.4 Concurrency | ||
|
|
||
| Two turns that create the same new `(project_id, session_id)` concurrently **MUST** resolve | ||
| to a single session. A server **SHOULD** enforce this with a unique constraint and treat the | ||
| losing creation as a resume (case 3). | ||
|
|
||
| ## 5. Request format (`POST /messages`) | ||
|
|
||
| ### 5.1 Envelope | ||
|
|
||
| ```jsonc | ||
| { | ||
| "session_id": "sess_123", // OPTIONAL (Section 4) | ||
| "references": { ... }, // OPTIONAL: selects the workflow revision (as on /invoke) | ||
| "data": { | ||
| "messages": [ /* UIMessage[] */ ], // REQUIRED: the conversation (Section 5.2) | ||
| "inputs": { "<name>": <value> }, // OPTIONAL: named input variables (Section 5.3) | ||
| "parameters": { /* agent config */ } // OPTIONAL (Section 5.4) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| `session_id` sits at the envelope top level, alongside the existing `trace_id` and `span_id`. | ||
| It **MUST NOT** be required in a request header. | ||
|
|
||
| `data.messages`, `data.inputs`, and `data.parameters` are siblings. They map onto the agent | ||
| handler's `messages`, `inputs`, and `parameters` arguments. On `/invoke` the conversation is | ||
| nested at `data.inputs.messages`; on `/messages` it is lifted out to `data.messages`, because | ||
| the conversation is the primary input of this endpoint. | ||
|
|
||
| ### 5.2 `data.messages` | ||
|
|
||
| `data.messages` is the conversation as an array of `UIMessage` objects (Appendix B). It is | ||
| REQUIRED. The last element is the new user turn. | ||
|
|
||
| In this version the client **MUST** send the full conversation in `data.messages`. Each | ||
| element uses the parts-based `UIMessage` shape (Appendix B), not the `{role, content}` shape | ||
| of `/invoke`. | ||
|
|
||
| ### 5.3 `data.inputs` | ||
|
|
||
| `data.inputs` carries the agent's named input variables for the turn: the workflow's declared | ||
| inputs and any per-turn context the caller supplies (for example a retrieved document or a | ||
| record id). Keys are input names; values are arbitrary JSON. This is the same `inputs` as the | ||
| workflow contract, with the conversation no longer nested inside it. | ||
|
|
||
| `data.inputs` is OPTIONAL and MAY be sent on every turn, since its values can change between | ||
| turns. | ||
|
|
||
| ### 5.4 `data.parameters` and `references` | ||
|
|
||
| The agent configuration (instructions, model, tools, harness, sandbox, permission policy) | ||
| travels as on `/invoke`: inline in `data.parameters.agent`, or resolved by the platform from | ||
| `references` when the request targets a stored revision. This protocol does not change that | ||
| resolution. | ||
|
|
||
| ### 5.5 Content negotiation | ||
|
|
||
| The response mode is selected by the `Accept` request header: | ||
|
|
||
| | `Accept` | Response | | ||
| | --- | --- | | ||
| | `application/json` (or absent) | Single JSON response (Section 6.1) | | ||
| | `text/event-stream` | UI Message Stream over SSE (Section 6.2) | | ||
|
|
||
| A server that cannot satisfy the `Accept` header **MUST** respond `406 Not Acceptable`. | ||
|
|
||
| ## 6. Response formats | ||
|
|
||
| ### 6.1 Single JSON response | ||
|
|
||
| For `Accept: application/json`, the server returns `200 OK` with a body extending | ||
| `WorkflowServiceResponse`: | ||
|
|
||
| ```jsonc | ||
| { | ||
| "trace_id": "...", | ||
| "span_id": "...", | ||
| "session_id": "sess_123", // the resolved id (minted or echoed) | ||
| "status": { "code": 200 }, | ||
| "data": { "outputs": { "role": "assistant", "content": "Berlin." } } | ||
| } | ||
| ``` | ||
|
|
||
| The response **MUST** include `session_id`, set to the resolved session (Section 4). The | ||
| assistant answer rides in `data.outputs` as today. Token usage is not in the body; it is | ||
| recorded on the trace. | ||
|
|
||
| ### 6.2 UI Message Stream (SSE) | ||
|
|
||
| For `Accept: text/event-stream`, the server returns `200 OK` and streams the run in the | ||
| Vercel UI Message Stream format (AI SDK v5/v6). | ||
|
|
||
| #### 6.2.1 Response headers | ||
|
|
||
| The response **MUST** set: | ||
|
|
||
| ``` | ||
| content-type: text/event-stream | ||
| x-vercel-ai-ui-message-stream: v1 | ||
| ``` | ||
|
|
||
| and **SHOULD** set: | ||
|
|
||
| ``` | ||
| cache-control: no-cache | ||
| connection: keep-alive | ||
| x-accel-buffering: no | ||
| ``` | ||
|
|
||
| `x-accel-buffering: no` disables proxy buffering so parts flush immediately. | ||
|
|
||
| #### 6.2.2 Framing | ||
|
|
||
| Each part is one SSE event: the literal bytes `data: `, followed by the part as compact JSON | ||
| (no insignificant whitespace), followed by `\n\n`. | ||
|
|
||
| ``` | ||
| data: {"type":"text-delta","id":"t1","delta":"Hello"}\n\n | ||
| ``` | ||
|
|
||
| The stream **MUST** terminate with the literal line `data: [DONE]\n\n`. | ||
|
|
||
| #### 6.2.3 Part registry | ||
|
|
||
| The parts a server emits, with their REQUIRED fields. Fields not listed are OPTIONAL and MAY | ||
| be omitted. | ||
|
|
||
| | `type` | Required fields | Meaning | | ||
| | --- | --- | --- | | ||
| | `start` | none | Begin a message. Carries `messageId` and `messageMetadata` (Section 6.2.4). | | ||
| | `start-step` | none | Begin a step of the agent loop. | | ||
| | `finish-step` | none | End the current step. | | ||
| | `finish` | none | End the message. Carries `finishReason`, `messageMetadata`. | | ||
| | `text-start` | `id` | Begin a text block. | | ||
| | `text-delta` | `id`, `delta` | Append `delta` to the text block `id`. | | ||
| | `text-end` | `id` | End the text block. | | ||
| | `reasoning-start` | `id` | Begin a reasoning block. | | ||
| | `reasoning-delta` | `id`, `delta` | Append to the reasoning block. | | ||
| | `reasoning-end` | `id` | End the reasoning block. | | ||
| | `tool-input-start` | `toolCallId`, `toolName` | A tool call begins. | | ||
| | `tool-input-delta` | `toolCallId`, `inputTextDelta` | Append a fragment of the tool arguments (note: `inputTextDelta`, not `delta`). | | ||
| | `tool-input-available` | `toolCallId`, `toolName`, `input` | The full tool arguments are known. | | ||
| | `tool-output-available` | `toolCallId`, `output` | The tool result. | | ||
| | `tool-output-error` | `toolCallId`, `errorText` | The tool failed. | | ||
| | `file` | `url`, `mediaType` | A file or image. `url` MAY be an `https:` or `data:` URL. | | ||
| | `data-<name>` | `data` | An application-defined part (generative UI). MAY carry `id` and `transient`. | | ||
| | `error` | `errorText` | A stream-level error (Section 8.2). | | ||
|
|
||
| A server **MUST** order parts so that for any `id` or `toolCallId`, a `*-start` precedes its | ||
| deltas, which precede its `*-end` or `*-available`. Text and reasoning deltas are | ||
| concatenated by `id`. Tool parts are keyed by `toolCallId`. | ||
|
|
||
| #### 6.2.4 Session id in the stream | ||
|
|
||
| The server **MUST** convey the resolved `session_id` as `messageMetadata.sessionId` on the | ||
| `start` part, which is the first part of the stream: | ||
|
|
||
| ``` | ||
| data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_123"}} | ||
| ``` | ||
|
|
||
| A server **MAY** additionally mirror `session_id` to a response header. The body remains the | ||
| normative source. | ||
|
|
||
| #### 6.2.5 Mapping from agent events | ||
|
|
||
| The streaming edge consumes the agent's internal `AgentEvent` stream | ||
| (`services/agent/src/protocol.ts:74`) and emits parts as follows: | ||
|
|
||
| | `AgentEvent` | Parts | | ||
| | --- | --- | | ||
| | run start (synthesized) | `start` (with `messageId`, `messageMetadata.sessionId`), then `start-step` | | ||
| | `message` | `text-start`, one or more `text-delta`, `text-end` | | ||
| | `thought` | `reasoning-start`, `reasoning-delta`, `reasoning-end` | | ||
| | `tool_call` | `tool-input-start`, then `tool-input-available` | | ||
| | `tool_result` with `isError=false` | `tool-output-available` | | ||
| | `tool_result` with `isError=true` | `tool-output-error` | | ||
| | `usage` | `messageMetadata` on the `finish` part | | ||
| | `error` | `error` (Section 8.2) | | ||
| | `done` | `finish-step`, then `finish` (`finishReason` = `stopReason`), then `[DONE]` | | ||
|
|
||
| A harness that reports `capabilities.streamingDeltas` produces token-level `text-delta` | ||
| parts. A harness that does not produces one `text-delta` carrying the whole text. The wire | ||
| shape is identical, so the client does not distinguish them. | ||
|
|
||
| The protocol streams deltas only. There is no full-message snapshot part. The client | ||
| assembles the final `UIMessage` from the parts. The server **SHOULD** record the assembled | ||
| turn on the trace (`ag.session.id`), which is the source `load-session` reads. | ||
|
|
||
| ## 7. The `load-session` endpoint (`POST /load-session`) | ||
|
|
||
| Returns the history of a session so a client can rebuild a conversation it does not hold | ||
| locally. | ||
|
|
||
| ### 7.1 Request | ||
|
|
||
| ```jsonc | ||
| { "session_id": "sess_123" } | ||
| ``` | ||
|
|
||
| `session_id` is REQUIRED. The server **MUST** apply the ownership rule of Section 4.2: if the | ||
| session does not exist for the caller's project, the server **MUST** respond `404 Not Found` | ||
| and **MUST NOT** reveal a session owned by another project. | ||
|
|
||
| ### 7.2 Response (default, `Accept: application/json`) | ||
|
|
||
| The server returns `200 OK` with the conversation as `UIMessage` objects, the shape `useChat` | ||
| accepts as its initial `messages`: | ||
|
|
||
| ```jsonc | ||
| { | ||
| "session_id": "sess_123", | ||
| "messages": [ | ||
| { "id": "m1", "role": "user", "parts": [ { "type": "text", "text": "capital of France?" } ] }, | ||
| { "id": "m2", "role": "assistant", "parts": [ { "type": "text", "text": "Paris." } ] } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| ### 7.3 Response (negotiated replay, `Accept: text/event-stream`) | ||
|
|
||
| A server **MAY** support a delta replay of the stored history under | ||
| `Accept: text/event-stream`, re-emitting the session as a UI Message Stream (Section 6.2). | ||
| This is OPTIONAL. Whether the folded form or the replay is the primary form is left open by | ||
| this draft; a conformant client **SHOULD** request `application/json` for rebuilding a static | ||
| view. | ||
|
|
||
| ## 8. Error handling | ||
|
|
||
| ### 8.1 Request and endpoint errors (JSON) | ||
|
|
||
| Before a stream begins, the server reports errors with an HTTP status and the existing | ||
| `status` envelope (`WorkflowServiceStatus`: `code`, `message`, `type`, `stacktrace`): | ||
|
|
||
| | Status | Condition | | ||
| | --- | --- | | ||
| | `400 Bad Request` | Malformed body, or a `session_id` that violates Section 4.1. | | ||
| | `401 Unauthorized` / `403 Forbidden` | Missing or invalid credentials. | | ||
| | `404 Not Found` | `load-session` on a session the caller does not own. | | ||
| | `406 Not Acceptable` | The `Accept` header cannot be satisfied. | | ||
| | `5xx` | Server failure before streaming starts. | | ||
|
|
||
| ### 8.2 In-stream errors | ||
|
|
||
| A failure after the stream has started **MUST** be reported as an `error` part: | ||
|
|
||
| ``` | ||
| data: {"type":"error","errorText":"the agent run failed: ..."} | ||
| ``` | ||
|
|
||
| After emitting an `error` part, the server **SHOULD** terminate the stream. It **MAY** omit | ||
| the `finish` part. It **SHOULD** still emit `[DONE]` to close the SSE channel cleanly. The | ||
| client surfaces the error to the user. | ||
|
|
||
| ## 9. Security considerations | ||
|
|
||
| - **Session ownership.** Section 4.2 rule 4 is a security requirement, not a convenience. | ||
| Because a client may supply a `session_id` for an unknown id (case 2), a server that keys | ||
| sessions on `session_id` alone would let a caller read or extend another tenant's | ||
| conversation. Servers **MUST** key on `(project_id, session_id)` and scope every resume, | ||
| every `load-session`, and every existence check to the caller's project. | ||
| - **Opaque ids.** A client-supplied `session_id` is untrusted input. See Section 4.1. | ||
| - **Secrets.** Provider keys and tool credentials travel and resolve as in the current | ||
| contract. This protocol adds no new secret-bearing field. `inputs` is caller-supplied | ||
| input and **MUST NOT** be used to smuggle credentials in place of the existing `secrets` | ||
| and signed-credential mechanisms. | ||
| - **Content negotiation and buffering.** A streaming response disables proxy buffering | ||
| (Section 6.2.1). Operators **MUST** ensure intermediaries do not re-buffer `text/event- | ||
| stream` responses, or streaming degrades to a single delayed flush. | ||
|
|
||
| ## 10. Interaction sequences | ||
|
|
||
| ### 10.1 New session, streaming turn | ||
|
|
||
| ``` | ||
| client server | ||
| │ POST /messages │ | ||
| │ Accept: text/event-stream │ | ||
| │ { data:{ messages:[...] } } │ (no session_id) | ||
| │───────────────────────────────────────▶│ | ||
| │ │ mint sess_123 | ||
| │ 200 text/event-stream │ | ||
| │ data: {"type":"start", │ | ||
| │ "messageMetadata": │ | ||
| │ {"sessionId":"sess_123"}} │ | ||
| │◀───────────────────────────────────────│ | ||
| │ data: {"type":"start-step"} ... │ | ||
| │ ... tool / text parts ... │ | ||
| │ data: {"type":"finish"} │ | ||
| │ data: [DONE] │ | ||
| │◀───────────────────────────────────────│ | ||
| │ (client stores sess_123 for next turn) │ | ||
| ``` | ||
|
|
||
| ### 10.2 Returning to a known session | ||
|
|
||
| ``` | ||
| client server | ||
| │ POST /load-session │ | ||
| │ { "session_id": "sess_123" } │ | ||
| │───────────────────────────────────────▶│ check ownership | ||
| │ 200 { messages: [ UIMessage, ... ] } │ | ||
| │◀───────────────────────────────────────│ | ||
| │ (render history; hold it) │ | ||
| │ │ | ||
| │ POST /messages │ | ||
| │ Accept: text/event-stream │ | ||
| │ { session_id:"sess_123", │ | ||
| │ data:{ messages:[...full] } } │ | ||
| │───────────────────────────────────────▶│ resolve existing sess_123 | ||
| │ 200 text/event-stream → parts → [DONE] │ | ||
| │◀───────────────────────────────────────│ | ||
| ``` | ||
|
|
||
| ## Appendix A: Full stream transcript | ||
|
|
||
| One turn: the agent calls a weather tool, reads the result, and answers. Every `data:` line | ||
| in order, each followed by a blank line. | ||
|
|
||
| ``` | ||
| data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_123"}} | ||
|
|
||
| data: {"type":"start-step"} | ||
|
|
||
| data: {"type":"tool-input-start","toolCallId":"call_1","toolName":"getWeather"} | ||
|
|
||
| data: {"type":"tool-input-available","toolCallId":"call_1","toolName":"getWeather","input":{"city":"Paris"}} | ||
|
|
||
| data: {"type":"tool-output-available","toolCallId":"call_1","output":{"weather":"sunny","temp":24}} | ||
|
|
||
| data: {"type":"finish-step"} | ||
|
|
||
| data: {"type":"start-step"} | ||
|
|
||
| data: {"type":"text-start","id":"t1"} | ||
|
|
||
| data: {"type":"text-delta","id":"t1","delta":"It is sunny "} | ||
|
|
||
| data: {"type":"text-delta","id":"t1","delta":"and 24°C in Paris."} | ||
|
|
||
| data: {"type":"text-end","id":"t1"} | ||
|
|
||
| data: {"type":"finish-step"} | ||
|
|
||
| data: {"type":"finish","messageMetadata":{"usage":{"input":820,"output":36,"cost":0.004}}} | ||
|
|
||
| data: [DONE] | ||
| ``` | ||
|
|
||
| ## Appendix B: `UIMessage` schema | ||
|
|
||
| A message accumulated by the client and accepted by `load-session`: | ||
|
|
||
| ```jsonc | ||
| { | ||
| "id": "m2", | ||
| "role": "user | assistant | system", | ||
| "parts": [ | ||
| { "type": "text", "text": "..." }, | ||
| { "type": "reasoning", "text": "..." }, | ||
| { "type": "tool-<name>", "toolCallId": "...", "state": "output-available", "input": {}, "output": {} }, | ||
| { "type": "file", "url": "...", "mediaType": "image/png" }, | ||
| { "type": "data-<name>", "data": { } }, | ||
| { "type": "step-start" } | ||
| ], | ||
| "metadata": { } | ||
| } | ||
| ``` | ||
|
|
||
| A `UIMessage` carries no top-level `content` string in v5/v6. All content lives in `parts`. | ||
|
|
||
| ## Appendix C: References | ||
|
|
||
| - RFC 2119, RFC 8174: requirement keywords. | ||
| - RFC 8259: JSON. | ||
| - WHATWG HTML, Server-Sent Events: `text/event-stream`. | ||
| - Vercel AI SDK UI Message Stream (v5/v6): https://ai-sdk.dev, and the chunk schema at | ||
| https://github.com/vercel/ai/blob/main/packages/ai/src/ui-message-stream/ui-message-chunks.ts | ||
| - Current contract: `sdks/python/agenta/sdk/models/workflows.py`, | ||
| `sdks/python/agenta/sdk/decorators/routing.py` (Accept negotiation at `:236`). | ||
| - Agent events and session id: `services/agent/src/protocol.ts:74`, | ||
| `sdks/python/agenta/sdk/agents/dtos.py`, `services/oss/src/agent/app.py`. | ||
| - Design rationale and trade-offs: [streaming-and-sessions.md](streaming-and-sessions.md). | ||
| ``` |
There was a problem hiding this comment.
Fix markdownlint violations for fenced blocks and code span formatting.
This file currently triggers MD040 across multiple fences and MD038 around the code span near Line 256. Please add explicit fence languages (text, json, jsonc) and remove extra spaces inside the affected code span.
Example fixes
-```
+```text
┌─────────────────────────── client (useChat) ───────────────────────────┐
...
-```
+```
-```
+```text
content-type: text/event-stream
x-vercel-ai-ui-message-stream: v1
-```
+```
-Each part is one SSE event: the literal bytes `data: `, followed by ...
+Each part is one SSE event: the literal bytes `data:`, followed by ...🧰 Tools
🪛 LanguageTool
[grammar] ~283-~283: Ensure spelling is correct
Context: ..., toolName| A tool call begins. | |tool-input-delta|toolCallId, inputTextDelta`...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~283-~283: Ensure spelling is correct
Context: ... | | tool-input-delta | toolCallId, inputTextDelta | Append a fragment of the tool argumen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~283-~283: Ensure spelling is correct
Context: ...a fragment of the tool arguments (note: inputTextDelta, not delta). | | `tool-input-availabl...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
[warning] 66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 239-239: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 246-246: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 256-256: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 259-259: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 300-300: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 389-389: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 417-417: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 439-439: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 462-462: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 526-526: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Source: Linters/SAST tools
| AGENTA_HOST=http://144.76.237.122:8280/ | ||
| AGENTA_API_KEY=your-agenta-project-api-key |
There was a problem hiding this comment.
Avoid publishing an API-key example over plaintext HTTP to a fixed public IP.
Line 2 currently encourages sending AGENTA_API_KEY over http://..., which can leak credentials in transit and creates a brittle environment default.
🔒 Suggested fix
-AGENTA_HOST=http://144.76.237.122:8280/
+AGENTA_HOST=https://cloud.agenta.ai
AGENTA_API_KEY=your-agenta-project-api-key📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| AGENTA_HOST=http://144.76.237.122:8280/ | |
| AGENTA_API_KEY=your-agenta-project-api-key | |
| AGENTA_HOST=https://cloud.agenta.ai | |
| AGENTA_API_KEY=your-agenta-project-api-key |
🧰 Tools
🪛 dotenv-linter (4.0.0)
[warning] 3-3: [UnorderedKey] The AGENTA_API_KEY key should go before the AGENTA_HOST key
(UnorderedKey)
| const toolSpans = new Map<string, Span>(); | ||
|
|
There was a problem hiding this comment.
Close tool spans even when toolCallId is absent.
At Line 362 a tool span is always started, but at Line 371/375 it is only tracked and closed when toolCallId exists. Missing IDs leave spans open and skew the exported trace.
🛠️ Suggested fix
const toolSpans = new Map<string, Span>();
+const anonymousToolSpans: Span[] = [];
@@
pi.on("tool_execution_start", async (event: any) => {
@@
- if (event?.toolCallId) toolSpans.set(event.toolCallId, span);
+ if (event?.toolCallId) toolSpans.set(event.toolCallId, span);
+ else anonymousToolSpans.push(span);
});
@@
pi.on("tool_execution_end", async (event: any) => {
- const span = event?.toolCallId ? toolSpans.get(event.toolCallId) : undefined;
+ const span = event?.toolCallId
+ ? toolSpans.get(event.toolCallId)
+ : anonymousToolSpans.shift();
if (!span) return;
setOutput(span, toolResultText(event?.result));
if (event?.isError) span.setStatus({ code: SpanStatusCode.ERROR });
span.end();
- toolSpans.delete(event.toolCallId);
+ if (event?.toolCallId) toolSpans.delete(event.toolCallId);
});Also applies to: 362-380
| t = time.monotonic() | ||
| sb = daytona.create( | ||
| CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0), | ||
| timeout=120, | ||
| ) | ||
| dt = time.monotonic() - t | ||
| times.append(dt) | ||
| print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True) | ||
| daytona.delete(sb) | ||
| results[snap] = times |
There was a problem hiding this comment.
Guarantee sandbox teardown per run to prevent leaks.
daytona.delete(sb) only runs on the happy path. Any exception after create() can leave sandboxes running and stop the benchmark early.
Suggested fix
for snap in SNAPSHOTS:
times: list[float] = []
for i in range(N):
- t = time.monotonic()
- sb = daytona.create(
- CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
- timeout=120,
- )
- dt = time.monotonic() - t
- times.append(dt)
- print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True)
- daytona.delete(sb)
+ sb = None
+ try:
+ t = time.monotonic()
+ sb = daytona.create(
+ CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
+ timeout=120,
+ )
+ dt = time.monotonic() - t
+ times.append(dt)
+ print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True)
+ finally:
+ if sb is not None:
+ daytona.delete(sb)
results[snap] = times📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| t = time.monotonic() | |
| sb = daytona.create( | |
| CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0), | |
| timeout=120, | |
| ) | |
| dt = time.monotonic() - t | |
| times.append(dt) | |
| print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True) | |
| daytona.delete(sb) | |
| results[snap] = times | |
| for snap in SNAPSHOTS: | |
| times: list[float] = [] | |
| for i in range(N): | |
| sb = None | |
| try: | |
| t = time.monotonic() | |
| sb = daytona.create( | |
| CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0), | |
| timeout=120, | |
| ) | |
| dt = time.monotonic() - t | |
| times.append(dt) | |
| print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True) | |
| finally: | |
| if sb is not None: | |
| daytona.delete(sb) | |
| results[snap] = times |
| def arg(name: str, default: str) -> str: | ||
| return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default |
There was a problem hiding this comment.
arg() can crash on missing flag values.
If a user passes --model (or --auth) without a value, sys.argv[index + 1] raises IndexError and exits with a traceback instead of a clear CLI error.
Suggested fix
def arg(name: str, default: str) -> str:
- return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default
+ if name not in sys.argv:
+ return default
+ idx = sys.argv.index(name) + 1
+ if idx >= len(sys.argv) or sys.argv[idx].startswith("--"):
+ raise ValueError(f"Missing value for {name}")
+ return sys.argv[idx]| pi_cmd = ( | ||
| f"cd {run_dir} && TMPDIR={run_dir}/tmp " | ||
| f"pi -p {json.dumps(PROMPT)} " | ||
| f"--mode json --approve --provider {provider} --model {model} " | ||
| f"-t read,bash,edit,write,ls " | ||
| f"--session-dir {run_dir}/.pi-sessions --name {session_id} " | ||
| f"< /dev/null" | ||
| ) |
There was a problem hiding this comment.
Shell command assembly is vulnerable to argument injection via --model.
model is user-controlled and interpolated unquoted into pi_cmd. A crafted value can append shell operators and execute unintended commands in the sandbox.
Suggested fix
+import shlex
...
pi_cmd = (
- f"cd {run_dir} && TMPDIR={run_dir}/tmp "
+ f"cd {shlex.quote(run_dir)} && TMPDIR={shlex.quote(run_dir)}/tmp "
f"pi -p {json.dumps(PROMPT)} "
- f"--mode json --approve --provider {provider} --model {model} "
+ f"--mode json --approve --provider {shlex.quote(provider)} --model {shlex.quote(model)} "
f"-t read,bash,edit,write,ls "
- f"--session-dir {run_dir}/.pi-sessions --name {session_id} "
+ f"--session-dir {shlex.quote(run_dir)}/.pi-sessions --name {shlex.quote(session_id)} "
f"< /dev/null"
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pi_cmd = ( | |
| f"cd {run_dir} && TMPDIR={run_dir}/tmp " | |
| f"pi -p {json.dumps(PROMPT)} " | |
| f"--mode json --approve --provider {provider} --model {model} " | |
| f"-t read,bash,edit,write,ls " | |
| f"--session-dir {run_dir}/.pi-sessions --name {session_id} " | |
| f"< /dev/null" | |
| ) | |
| import shlex | |
| pi_cmd = ( | |
| f"cd {shlex.quote(run_dir)} && TMPDIR={shlex.quote(run_dir)}/tmp " | |
| f"pi -p {json.dumps(PROMPT)} " | |
| f"--mode json --approve --provider {shlex.quote(provider)} --model {shlex.quote(model)} " | |
| f"-t read,bash,edit,write,ls " | |
| f"--session-dir {shlex.quote(run_dir)}/.pi-sessions --name {shlex.quote(session_id)} " | |
| f"< /dev/null" | |
| ) |
🧰 Tools
🪛 ast-grep (0.43.0)
[info] 260-260: use jsonify instead of json.dumps for JSON output
Context: json.dumps(PROMPT)
Note: Security best practice.
(use-jsonify)
| import os | ||
| import httpx | ||
|
|
||
| BASE = os.getenv("AGENTA_HOST", "http://144.76.237.122:8280").rstrip("/") |
There was a problem hiding this comment.
Unencrypted HTTP default with API key exposure.
Line 15 defaults BASE to http://144.76.237.122:8280 (unencrypted HTTP). When AGENTA_API_KEY is sent to this endpoint at line 28/62, credentials are transmitted in plaintext. Although the AGENTA_HOST environment variable allows override to HTTPS, the hardcoded default should be HTTPS or explicitly documented as development-only.
Recommendation: Either (a) change the default to https://..., (b) add a prominent warning in the script's docstring, or (c) validate that AGENTA_HOST is HTTPS before sending credentials.
| out = resp.json() | ||
| new = out.get("workflow_revision") or out | ||
| print("new revision id:", new.get("id"), "version:", new.get("version")) |
There was a problem hiding this comment.
Robust error handling for HTTP response.
Line 69 calls resp.json() without checking the status code first. If the POST fails (4xx/5xx), the response may not contain the expected JSON structure, causing .get() on line 70 to mask the real error.
For a POC script, consider adding:
resp.raise_for_status() # after line 67, before line 69This makes the script fail loudly if the API call fails.
Part of the agent-workflows PR set (sliced by functional area, final code only)
Most of these are independent off
main. Two short stacks exist (an indented item stacks on the line above it). Each PR shows only its own area.This PR's base is
main. Review it on its own.Context
This is the design wiki for the agent-workflows project. It is an independent docs PR off
mainand carries no runnable code. Read it before the code PRs in the stack. The pages map what the current code does, what is still missing, and how the remaining work should be sliced. Start here so the behavior PRs are easier to follow.What this adds
Current-state design pages under
docs/design/agent-workflows/.ground-truth.mdmaps every code surface to its current role and splits the work into implemented, not implemented, and planned.meeting-alignment.mdchecks the work against the June 18 design discussion.implementation-review.mdlists cleanup risks.pr-stack.mdproposes the reviewable PR breakpoints.agent-template.md,protocol.md,triggers.md,status.md,ports-and-adapters.md, andsessions.mdeach cover one axis of the design.adapters/holds the Pi, Claude Code, and Agenta harness notes.Two scoped design folders.
sdk-local-tools/covers standalone SDK tool resolution, which is partly built and blocked onLocalBackend.docs/design/vault-named-secrets/covers user-named project secrets in the vault, storage and management UI only for this iteration.Archived POC material under
trash/. This holds the original work-package notes, research spikes, proof-of-concept code, and superseded RFCs. It is kept for provenance. It is not design truth.How to review this PR
Read
ground-truth.mdfirst. It is the source of truth that the other pages defer to, so it frames everything else. Then readpr-stack.mdto see how the remaining work splits into reviewable PRs. Treattrash/as an archive, not files to read line by line. Open atrash/note only when a current page points you there for history or rationale.