diff --git a/docs/docs.json b/docs/docs.json index dc0fc8d2..c097177f 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -148,6 +148,7 @@ "root": "rfds/v2/overview", "pages": [ "rfds/v2/prompt", + "rfds/v2/session-inject", "rfds/message-id", "rfds/streamable-http-websocket-transport" ] diff --git a/docs/rfds/v2/session-inject.mdx b/docs/rfds/v2/session-inject.mdx new file mode 100644 index 00000000..6fd5c9de --- /dev/null +++ b/docs/rfds/v2/session-inject.mdx @@ -0,0 +1,318 @@ +--- +title: "Mid-turn input: queue and steer" +--- + +Author(s): [@kennethsinder](https://github.com/kennethsinder) + +## Elevator pitch + +> What are you proposing to change? + +Add a single `session/inject` method, with a `mode` of `queue` or `steer`, so clients can hand the agent another user-role message without ending the current turn. The agent advertises which modes it supports via a capability. Delivery is echoed as a `user_message` notification so multi-client observers and `session/load` replay see one consistent history. Before delivery, the returned `messageId` is also the handle clients use to revoke the pending message, and optionally replace its content. + +This is a v2 follow-on. It rides on the [v2 prompt lifecycle](./prompt) (short-lived `session/prompt`, agent-owned `messageId`, `state_change` notifications) and replaces the standalone [prompt queueing RFD (#484)](https://github.com/agentclientprotocol/agent-client-protocol/pull/484), which predates that substrate and stops at the capability bit. + +## Status quo + +> How do things work today and what problems does this cause? Why would we change things? + +Coding-agent UIs have converged on two adjacent input affordances on top of "send a prompt and wait": + +- **Queue.** The user types while the agent is mid-turn; the text is held and delivered once the agent goes idle. Shipped in Cursor since 1.2 as a FIFO ([forum thread](https://forum.cursor.com/t/queue-agent-messages/110883)). 
Shipped in Windsurf Cascade: type while Cascade is working, press Enter, and it lands after the current task ([docs](https://docs.windsurf.com/plugins/cascade/cascade-overview)). Shipped in Gemini CLI with explicit `wait_for_idle` vs `wait_for_response` settings ([PR #7867](https://github.com/google-gemini/gemini-cli/pull/7867)), and with documented ordering bugs when tool calls are involved ([#17282](https://github.com/google-gemini/gemini-cli/issues/17282), [#17719](https://github.com/google-gemini/gemini-cli/issues/17719)). Shipped in Claude Code, with [known boundary bugs](https://github.com/anthropics/claude-code/issues/49373) where it flushes at every LLM pause instead of true end-of-turn ([related](https://github.com/anthropics/claude-code/issues/36326)). +- **Steer.** The user types while the agent is mid-turn; the text is delivered at the next safe break-point, typically after the in-flight tool call. Codex CLI has the most fleshed-out implementation, opt-in via `/experimental`. The Codex code base has had to evolve its own pending-steers buffer, requeue-on-turn-complete, pause-on-usage-limits, and a replay edge-case surface. See PRs [#12569](https://github.com/openai/codex/pull/12569), [#15235](https://github.com/openai/codex/pull/15235), [#22226](https://github.com/openai/codex/pull/22226), [#22879](https://github.com/openai/codex/pull/22879) and issues [#19975](https://github.com/openai/codex/issues/19975), [#22815](https://github.com/openai/codex/issues/22815). The original Codex request issues ([#4312](https://github.com/openai/codex/issues/4312), [#9096](https://github.com/openai/codex/issues/9096)) frame this as "queued corrections while the current turn is still running" — the same problem. Gemini CLI has an open proposal to add the same shape under a different name ([`/inject` for asynchronous mid-stream steering, #17197](https://github.com/google-gemini/gemini-cli/issues/17197)). + +The "queue vs. 
steer is two distinct things" framing has surfaced explicitly across ecosystems: see `langchain-ai/deepagents` [#1390 "Disambiguate steer/queue"](https://github.com/langchain-ai/deepagents/issues/1390) and `anomalyco/opencode` [#21388 "Allow messages to be sent mid-turn — interrupt, queue, or inject"](https://github.com/anomalyco/opencode/issues/21388). The convergence is visible, but each tool is reinventing the semantics from scratch. + +ACP today has one mid-turn lever: `session/cancel` followed by a fresh `session/prompt`. That discards in-flight work and forces tool calls to surface as cancelled even when they were milliseconds from completing. There is no way to express "let this tool call finish, then add this." There is no way to express "buffer this for after the turn." The `stopReason` enum has no value for a turn interrupted because a new message landed. + +The v2 prompt lifecycle RFD already flags this gap in its motivation: + +> Queueing messages (still an ongoing discussion) would also fit much nicer in this pattern... potentially the agent could decide whether it cancels the current turn and inserts it immediately, or inserts it at the next convenient break point. + +The missing piece is not queueing alone. It is the wire shape that distinguishes "deliver soon" from "deliver later," gives each delivery a stable handle for ack and revocation, and echoes the delivery into the session history so multi-client and replay stay coherent. The v2 prompt lifecycle gives us that base. This RFD adds the call sites on top. + +PR [#484](https://github.com/agentclientprotocol/agent-client-protocol/pull/484) (open since February 2026) takes a thinner cut: a `promptQueueing` capability plus an `end_turn` early-finish signal, with prompts queued via a parallel `session/prompt` call. It doesn't address steer, doesn't define delivery semantics, and predates the v2 prompt lifecycle's `state_change` and `user_message` notifications. 
The follow-up thread also asks how clients edit a queued message. This RFD answers that directly: editing is a pending-inject operation before `user_message`, not a transcript rewrite after delivery. The author also flags that the Claude Agent SDK had no public hook for "when was my queued message inserted," which is precisely what `user_message` echoing solves at the protocol level. The proposal below is intended to supersede #484, with credit to [@SteffenDE](https://github.com/SteffenDE) for raising the question first. + +## What we propose to do about it + +> What are you proposing to improve the situation? + +One method, two modes, one capability. + +### The method + +```jsonc +{ + "jsonrpc": "2.0", + "id": 42, + "method": "session/inject", + "params": { + "sessionId": "sess_abc", + "mode": "steer", + "content": [ + { + "type": "text", + "text": "stop using the legacy auth path, see auth_v2.go", + }, + ], + }, +} +``` + +The agent responds when the inject has been accepted for pending delivery, not when the model has processed it. The response carries the agent-assigned `messageId`, matching the v2 prompt lifecycle pattern: + +```jsonc +{ + "jsonrpc": "2.0", + "id": 42, + "result": { "messageId": "msg_inj_001" }, +} +``` + +At delivery (which happens later for both modes), the agent emits a `user_message` session update with the same `messageId`. This is the same notification the v2 prompt lifecycle defines for `session/prompt`, so clients, observers attached via [multi-client attach (#533)](https://github.com/agentclientprotocol/agent-client-protocol/pull/533), and `session/load` replay all see the message land in the same shape regardless of whether it arrived as a prompt or an inject. + +Until that `user_message` notification is emitted, the message is pending. Pending messages are not transcript history yet. They can be revoked by `messageId`, and agents may also support replacing their content while preserving the original mode and queue position. 
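The accept-then-deliver flow above can be sketched from the client side. This is a minimal illustration, not part of the proposed schema: `InjectTracker`, `TextBlock`, and the method names are hypothetical, and the sketch assumes the accept response is processed before the matching `user_message` update.

```typescript
// Hypothetical client-side bookkeeping for `session/inject`.
// InjectTracker and its method names are illustrative, not part of the proposed schema.
type InjectMode = "queue" | "steer";
type TextBlock = { type: "text"; text: string };

interface InjectRequest {
  jsonrpc: "2.0";
  id: number;
  method: "session/inject";
  params: { sessionId: string; mode: InjectMode; content: TextBlock[] };
}

interface TrackedInject {
  mode: InjectMode;
  content: TextBlock[];
}

class InjectTracker {
  private nextId = 1;
  // Requests sent but not yet accepted, keyed by JSON-RPC id.
  private inFlight = new Map<number, TrackedInject>();
  // Accepted but not yet delivered, keyed by the agent-assigned messageId.
  private pending = new Map<string, TrackedInject>();
  // Delivered injects, in the order their user_message echoes arrived.
  readonly delivered: string[] = [];

  buildRequest(sessionId: string, mode: InjectMode, text: string): InjectRequest {
    const id = this.nextId++;
    const content: TextBlock[] = [{ type: "text", text }];
    this.inFlight.set(id, { mode, content });
    return { jsonrpc: "2.0", id, method: "session/inject", params: { sessionId, mode, content } };
  }

  // The accept response arrived: the message is pending, not transcript history.
  onAccepted(requestId: number, messageId: string): void {
    const inject = this.inFlight.get(requestId);
    if (inject === undefined) return;
    this.inFlight.delete(requestId);
    this.pending.set(messageId, inject);
  }

  isPending(messageId: string): boolean {
    return this.pending.has(messageId);
  }

  // A user_message update arrived. Returns true if it matched one of our pending injects.
  onUserMessage(messageId: string): boolean {
    if (!this.pending.delete(messageId)) return false;
    this.delivered.push(messageId);
    return true;
  }
}
```

A UI built this way would render a message as "pending" between `onAccepted` and `onUserMessage`, and only move it into the transcript once the echo arrives.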
+ +### The capability + +```jsonc +{ + "agentCapabilities": { + "session": { + "inject": { + "modes": ["queue", "steer"], + "pending": { "replace": true }, + }, + }, + }, +} +``` + +An agent that only supports buffering at end-of-turn declares `["queue"]`. A tool-loop agent that can break safely between tool calls declares `["queue", "steer"]`. A streaming-only agent that can't do either declares the absence of the capability and clients fall back to `session/cancel`. + +Revoke is part of the pending-inject contract. `pending.replace` is optional. Clients should only offer edit-in-place for already-sent pending messages when it is advertised; otherwise they can keep drafts client-side longer or revoke and send a new inject when losing queue position is acceptable. + +### Semantics + +**`queue`** is the simple one. The agent buffers the content. The agent delivers it as a `user_message` notification once `state_change: idle` fires for the current turn. FIFO across multiple queued injects. If a queue lands on an already-idle session, the agent treats it as a normal user message and starts a turn, though clients should prefer `session/prompt` in that case. + +**`steer`** is the interesting one. The agent delivers the content at the next safe break-point during the running turn: + +- **Mid tool call**: complete the tool call, then deliver before the next LLM call. +- **Mid LLM stream with no tool call pending**: agent-defined. The agent may either interrupt the stream and re-prompt with the steer text included, or let the stream finish and deliver before the next iteration. The choice is a quality-of-implementation concern; both behaviors are spec-conformant. Clients should not assume one or the other. +- **Mid model-process pause** (provider rate limit, usage-limit cap): hold pending steers until the pause clears. This mirrors the shape Codex landed on in [PR #22226](https://github.com/openai/codex/pull/22226). 
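The break-point rules above can be sketched as a small agent-side scheduler. This is an illustration under stated assumptions, not a reference implementation: `InjectScheduler` and its hooks are hypothetical names, the message id format is made up, and two behaviors are assumptions the text does not settle: steers that never met a break-point are delivered at idle ahead of queued messages, and all queued messages are returned together at idle.

```typescript
// Hypothetical agent-side scheduler for pending injects; all names are illustrative.
type Mode = "queue" | "steer";

interface PendingMessage {
  messageId: string;
  mode: Mode;
  text: string;
}

class InjectScheduler {
  private steers: PendingMessage[] = [];
  private queued: PendingMessage[] = [];
  private paused = false;
  private seq = 0;

  // Accept an inject and return the agent-assigned messageId (format is made up here).
  accept(mode: Mode, text: string): string {
    this.seq += 1;
    const message: PendingMessage = { messageId: `msg_inj_${this.seq}`, mode, text };
    (mode === "steer" ? this.steers : this.queued).push(message);
    return message.messageId;
  }

  // Model-process pause (rate limit, usage cap): steers are held until it clears.
  setPaused(paused: boolean): void {
    this.paused = paused;
  }

  // Safe break-point inside a running turn, e.g. a tool call just completed.
  // Returns the steers to emit as user_message before the next LLM call.
  onBreakPoint(): PendingMessage[] {
    if (this.paused) return [];
    const due = this.steers;
    this.steers = [];
    return due;
  }

  // The turn reached idle. Assumption: leftover steers go first, then queued, FIFO.
  // Whether queued messages share one turn or get one turn each is left open here.
  onIdle(): PendingMessage[] {
    const due = [...this.steers, ...this.queued];
    this.steers = [];
    this.queued = [];
    return due;
  }
}
```

The "mid LLM stream" case is deliberately absent: since interrupting or waiting are both conformant, an agent would simply choose when to call `onBreakPoint`.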
+ +Both modes are fire-and-forget in the sense that the response confirms acceptance, not model processing. Both deliveries are observable via the same `user_message` notification. While an inject is pending, the `messageId` returned from `session/inject` is the only stable handle clients should use for pending operations. + +## Shiny future + +> How will things play out once this feature exists? + +A user typing into a Zed prompt box during a long agent turn no longer has to choose between "wait for the turn to finish" and "blow it up with cancel." If they want to add context for the next iteration, they queue. If they want to redirect the agent now without losing the tool call it just kicked off, they steer. + +A dashboard client attached to three concurrent sessions can route a steer into whichever session needs course-correcting without disturbing the other two. The `user_message` echo means every observer sees the steer land at the same point in history, so the side-by-side panels stay consistent. + +A `session/load` replay reconstructs the same transcript regardless of how messages originally arrived. The replay doesn't need to know whether message N was a steer that interrupted a tool call or a prompt that started a turn. It sees `user_message`, `agent_message`, `tool_call`, and `state_change` in delivery order, and renders them. + +For agents, this also means the existing `cancelled` stop reason starts carrying a clearer signal. Today, "cancelled" can mean either "user gave up" or "user wanted to redirect." Once steer exists, "cancelled" goes back to meaning "user gave up." Steer-interrupted turns are not cancelled. They end normally and the steer becomes the next user message. + +## Implementation details and plan + +> Tell me more about your implementation. What is your detailed implementation plan? + +### Schema additions + +In `schema/schema.json`: + +1. Add `SessionInjectMode` enum: `"queue" | "steer"`. +2. 
Add `SessionInjectRequest`: + - `sessionId: SessionId` + - `mode: SessionInjectMode` + - `content: ContentBlock[]` (same shape as `PromptRequest.prompt`) +3. Add `SessionInjectResponse`: + - `messageId: string` (opaque, agent-owned) +4. Add `agentCapabilities.session.inject`: + - `modes: SessionInjectMode[]` + - `pending.replace: boolean` (optional; default `false`) +5. Add `SessionRevokeInjectRequest`: + - `sessionId: SessionId` + - `messageId: string` +6. Add `SessionRevokeInjectResponse`: empty object. +7. Add `SessionReplaceInjectRequest`: + - `sessionId: SessionId` + - `messageId: string` + - `content: ContentBlock[]` +8. Add `SessionReplaceInjectResponse`: + - `messageId: string` (same value, returned for symmetry with `session/inject`) +9. Register `session/inject`, `session/revoke_inject`, and `session/replace_inject` as request methods. + +No changes to `session/update` or to existing notification types. The `user_message` notification introduced by the v2 prompt lifecycle is what carries the delivered content, with the same `messageId` that was returned from the inject response. + +### Pending injects: revoke and replace + +A pending inject can be revoked by `messageId`: + +```jsonc +{ + "jsonrpc": "2.0", + "id": 43, + "method": "session/revoke_inject", + "params": { + "sessionId": "sess_abc", + "messageId": "msg_inj_001", + }, +} +``` + +A successful response means the agent will not emit a future `user_message` for that `messageId`: + +```jsonc +{ + "jsonrpc": "2.0", + "id": 43, + "result": {}, +} +``` + +Revoke is ordered against delivery at the agent: + +- If revoke wins, the pending inject is dropped. No later `user_message` is emitted for that `messageId`. +- If delivery wins, the agent returns an error. The recommended shape is `-32602 Invalid params` with `error.data.reason: "already_delivered"` and the `messageId`. The client should keep the delivered message in the transcript. 
+- If the `messageId` is unknown for that session, return `-32602 Invalid params` with `error.data.reason: "unknown_message_id"`. +- If the message was already revoked and the agent still remembers the tombstone, return success. This makes revoke safe to retry after transport loss. + +Normal session errors still apply. If the session is unknown, closed, or no longer accepts input, return the same error the agent uses for other session requests. + +Delivery is the point where the agent commits the inject to session history and emits, or has irrevocably queued, the matching `user_message`. From that point, the content may already be in the next model input. Revoke must return `already_delivered`, and the client should expect the `user_message` if it has not seen it yet. There is no separate "revoke a delivered message" surface; that is [session rewind (#1214)](https://github.com/agentclientprotocol/agent-client-protocol/pull/1214) territory. + +Agents may also support replacing pending content. Replacement keeps the same `messageId`, mode, and queue position; only `content` changes. The eventual `user_message` carries the final content, not the edit history. + +```jsonc +{ + "jsonrpc": "2.0", + "id": 44, + "method": "session/replace_inject", + "params": { + "sessionId": "sess_abc", + "messageId": "msg_inj_001", + "content": [ + { + "type": "text", + "text": "actually use auth_v3.go", + }, + ], + }, +} +``` + +```jsonc +{ + "jsonrpc": "2.0", + "id": 44, + "result": { "messageId": "msg_inj_001" }, +} +``` + +The same failure cases apply to replace: `already_delivered`, `unknown_message_id`, and unsupported capability. If `pending.replace` was not advertised, clients should not call the method; agents may still return `Method not found` or `-32602 Invalid params` with `error.data.reason: "replace_not_supported"`. 
If replace races with delivery, the agent either delivers the old content and returns `already_delivered`, or accepts the replacement and later emits `user_message` with the new content. It must not report success and then deliver the old content. + +`$/cancel_request` still applies to an in-flight `session/inject` request before the accept response is sent. Once `session/inject` has returned a `messageId`, request cancellation is no longer the right tool; the request is complete, and the message is managed by `session/revoke_inject` or `session/replace_inject`. + +### Persistence and replay + +Delivered injected messages are part of session history. They appear in `session/load` replay as `user_message` notifications, ordered by delivery time. + +Pending messages are not replayed as history. Revoked messages are not replayed. Replaced messages replay once, as the final `user_message` content that was actually delivered. + +This means a session loaded from disk doesn't carry "this was a steer" provenance. That's deliberate. Once delivered, an injected message is just a user message in the transcript. The modality is observable in real time via the gap between response and `user_message`, not as a persistent attribute. + +### Multi-client interaction + +With [multi-client attach (#533)](https://github.com/agentclientprotocol/agent-client-protocol/pull/533), multiple controllers can inject into the same session. Ordering across controllers is by arrival time at the agent (FIFO from each controller; interleaved across controllers). Observers cannot inject, revoke, or replace. + +Every controller and observer sees the same `user_message` notification at delivery time, so transcripts converge. A revoked inject has no transcript update. This RFD does not add a shared pending-queue list or pending-message broadcast; clients can manage the pending handles returned to them, and delivery remains the shared signal. 
If an agent restricts revocation or replacement to the controller that created the pending inject, it should reject other controllers with a permission error instead of silently doing nothing. + +### Interaction with other in-flight requests + +- **`session/cancel` during a pending steer.** Codex [#22815](https://github.com/openai/codex/issues/22815) is a real bug surface here, and we should resolve it in the spec rather than leave it implicit. Recommended behavior: cancel applies to the in-flight turn only. Pending injects (both queue and steer) survive `session/cancel` and deliver as normal once the agent reaches idle. Clients that want to clear everything can call `session/revoke_inject` for pending injects first, then `session/cancel`. Agents should document if they deviate. +- **`session/request_permission` blocking the turn.** While the agent is awaiting a permission decision, the turn is paused but not idle. Queue accumulates as normal. Steer is held until the permission resolves, then delivered at the next break-point (which may be immediately if the permission decision is the break-point). Agents that prefer to drop steers during permission waits may do so, capability-advertised. +- **Elicitation in flight.** [Elicitation](./elicitation) is a separate request from the agent to the client. An inject is not an answer to an elicitation; they're orthogonal channels. Agents must not satisfy a pending `elicitation/create` from inject content. + +### Sub-agents + +Inject lands on the root session (the one the client knows about). If the agent has spawned sub-agents or delegated tool calls to nested workers, propagation is the parent agent's choice. The protocol stays out of sub-agent topology. + +### `stopReason` + +No new variant. A steer-interrupted LLM call that stops cleanly is `end_turn`. A steer that arrived while the user had also pressed cancel is `cancelled`. Existing semantics cover both. + +### Versioning + +This is a v2-only addition. 
It depends on the v2 prompt lifecycle's `user_message` notification and `state_change` semantics, neither of which exists in v1. v1 clients that want a forward-compatible story keep using `session/cancel` + new `session/prompt`. There is no proposed v1 backport. + +### Phased rollout + +1. Land schema and Rust SDK behind the `unstable` feature flag on v2. +2. Reference implementation in the v2 example agent: queue at minimum, steer if the example agent can demonstrate tool-call break-points cleanly. +3. Document in `docs/protocol/`. +4. Stabilize once at least two agent implementations and one client implementation have exercised both modes in real workflows. + +## Frequently asked questions + +> What questions have arisen over the course of authoring this document or during subsequent discussions? + +### What alternative approaches did you consider, and why did you settle on this one? + +**Two separate methods (`session/queue` and `session/steer`).** Cleaner type signatures, but every implementer has to wire two methods that share most of their plumbing. The `mode` parameter is more honest about the actual factoring: same delivery channel, same echo notification, different break-point policy. + +**Roll queue into the v2 prompt lifecycle RFD itself.** Tempting, because the prompt lifecycle already mentions queueing as future work. Rejected because steer is the harder design problem and deserves its own discussion, and pinning both to the v2 prompt lifecycle RFD would either bloat that RFD or commit to a design before the steer semantics are settled. + +**Parallel `session/prompt` calls (the PR #484 approach).** Reuses the existing method but conflates "start a turn" with "add to a turn." It also can't express steer cleanly, because a parallel `session/prompt` already implies its own turn boundary. Splitting inject out keeps `session/prompt` meaning "start a turn." 
+ +**`session/prompt` with a `priority` field.** Same problem as parallel `session/prompt`: overloads a method whose semantics are about turn ownership, not insertion policy. + +### Why one method with a mode parameter instead of one method that always queues, where the agent decides whether to steer? + +Because the agent doesn't know the user's intent. A user typing "actually skip the migration, do it later" expects steer. A user typing "also, when you're done, run the tests" expects queue. The mode is the user's choice expressed by the client. The agent only chooses whether it supports each mode. + +### What if a client requests `steer` from an agent that only supports `queue`? + +The agent returns an error. Clients should check the capability before offering steer in the UI. Agents may, as a quality-of-implementation choice, downgrade to `queue` and return success, but the spec recommends erroring so clients don't silently lose the user's stated intent. + +### How does this interact with the [turn-complete signal RFD (#644)](https://github.com/agentclientprotocol/agent-client-protocol/pull/644)? + +The turn-complete signal gives clients a deterministic barrier for "all updates for this turn have been delivered." `queue` mode delivers after that barrier. Either the `state_change: idle` notification or the explicit `turn_complete` signal works as the trigger; they're the same point in time. The two RFDs are complementary, not competing. + +### How does this differ from [session rewind (#1214)](https://github.com/agentclientprotocol/agent-client-protocol/pull/1214)? + +Rewind is destructive: it truncates history and changes what the agent sees as context. Inject is additive: it adds a new user message without touching what came before. They're siblings on the "mid-conversation user intervention" axis: inject for "add this," rewind for "undo that." + +### What about non-text content? Images, file references, resource links? 
+ +`content` is a full `ContentBlock[]`, identical to `PromptRequest.prompt`. Anything you can send in a prompt, you can send in an inject. Whether the agent does something useful with an image steered mid-tool-call is up to the agent. + +### Is this related to discussion [#1224 `session/remind`](https://github.com/orgs/agentclientprotocol/discussions/1224)? + +Sibling but separate. `session/inject` carries user-role content; the agent treats it as the user speaking. `session/remind` (proposed separately) carries system-role context that the host wants the agent to see without faking a user turn. They could share the underlying break-point machinery, but the role contract differs at call sites and conflating them muddies what the agent sees in its message list. + +### Why fire-and-forget on acceptance instead of waiting for delivery? + +Because delivery for `queue` can be arbitrarily delayed (a long-running turn might mean the queue waits minutes), and forcing the JSON-RPC request to stay open for that duration is wasteful and brittle on stateful transports. The `messageId` returned on acceptance is the handle clients need; the `user_message` notification at delivery time is the signal. + +### Can a queued message be edited? + +Yes, while it is pending. If the agent advertises `pending.replace`, clients can call `session/replace_inject` and keep the same queue position. If replacement is not advertised, clients can still keep unsent drafts editable on their side, or revoke and send a new inject when appending at the end is acceptable. + +After `user_message` fires, the message is transcript history. Editing or removing it is a rewind or fork operation, not an inject operation. + +### Should the agent be allowed to drop an injected message? + +Only via revocation. Once accepted (the response has gone out with a `messageId`), the agent commits to delivering. If the agent crashes mid-turn and loses the queue, that's a durability bug, not a protocol-allowed behavior. 
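The revoke and replace outcomes described under "Pending injects: revoke and replace" can be sketched as a small state machine. This is a hedged illustration: `PendingStore` and its method names are hypothetical, and treating a replace of an already-revoked message as `unknown_message_id` is an assumption the RFD text does not state. Because each call runs to completion, revoke and replace are naturally ordered against `deliver`.

```typescript
// Hypothetical agent-side store for pending injects; all names are illustrative.
type Reason = "already_delivered" | "unknown_message_id" | "replace_not_supported";
type Result = { ok: true } | { ok: false; code: -32602; reason: Reason };
type State =
  | { kind: "pending"; text: string }
  | { kind: "delivered" }
  | { kind: "revoked" }; // tombstone, so a retried revoke still succeeds

const fail = (reason: Reason): Result => ({ ok: false, code: -32602, reason });

class PendingStore {
  private states = new Map<string, State>();
  private readonly replaceSupported: boolean;

  constructor(replaceSupported: boolean) {
    this.replaceSupported = replaceSupported;
  }

  accept(messageId: string, text: string): void {
    this.states.set(messageId, { kind: "pending", text });
  }

  // Commit point: returns the final content to emit as user_message, if still pending.
  deliver(messageId: string): string | undefined {
    const state = this.states.get(messageId);
    if (state === undefined || state.kind !== "pending") return undefined;
    this.states.set(messageId, { kind: "delivered" });
    return state.text;
  }

  revoke(messageId: string): Result {
    const state = this.states.get(messageId);
    if (state === undefined) return fail("unknown_message_id");
    if (state.kind === "delivered") return fail("already_delivered");
    this.states.set(messageId, { kind: "revoked" }); // idempotent for retries
    return { ok: true };
  }

  replace(messageId: string, text: string): Result {
    if (!this.replaceSupported) return fail("replace_not_supported");
    const state = this.states.get(messageId);
    // Assumption: a revoked message is treated as unknown for replace.
    if (state === undefined || state.kind === "revoked") return fail("unknown_message_id");
    if (state.kind === "delivered") return fail("already_delivered");
    this.states.set(messageId, { kind: "pending", text });
    return { ok: true };
  }
}
```

The invariant that matters is the last one in the replace rules: a successful `replace` is always followed by delivery of the new content, never the old.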
+ +### What about ordering when a queue inject lands after a steer inject during the same turn? + +Steer delivers at the next break-point; queue delivers at idle. The steer reaches the model first regardless of the order in which the two injects were accepted. This is intentional: mode determines delivery time; arrival order only breaks ties within the same mode. + +## Revision history + +- 2026-05-20: Replaced request-cancellation-based revocation with `messageId`-based pending revoke/replace semantics, including race outcomes for already-delivered messages. +- 2026-05-19: Initial draft, lifted from [discussion #1220](https://github.com/orgs/agentclientprotocol/discussions/1220) with two changes: an explicit acknowledgment of PR #484 and the intent to supersede it, and tightened `messageId` ownership to match the v2 prompt lifecycle (agent-owned, returned in the inject response).