3 changes: 2 additions & 1 deletion CLAUDE.md
@@ -26,7 +26,8 @@ These are settled. Do not revisit without explicit discussion.
- **FastMCP v3** is the MCP client -- not v2
- **Platform mode** is opt-in delegation of LLM orchestration to OGX (LlamaStack rebrand) via `client.responses.create()` instead of `chat.completions.create()`. Set `platform.enabled: true` in `agent.yaml` and OGX takes over MCP tool calls, shield enforcement, and the inference loop server-side; the framework skips its own `connect_mcp()` startup loop. New `LLMClient.call_model_responses()` / `call_model_responses_stream()` / `moderate()` plus `BaseAgent` wrappers; new `GuardrailFiredEvent` `StreamEvent` variant. `PlatformMcpServer` accepts both `connector_id` (pre-registered in OGX's stack YAML) and inline `url` reference modes; `name` always maps to `server_label`. `guardrails` travels via `extra_body` because the OpenAI Python SDK rejects unknown top-level kwargs (this is an SDK behaviour, not OGX-specific). Decoupled from #81 (observability) and #35 (expose `/v1/responses` from the agent's HTTP server). Full design in `docs/architecture.md` ("Platform Mode" subsection).
- **Two tool planes**: agent-code tools (plane 1, invisible to LLM) and LLM-callable tools (plane 2). Both go through BaseAgent for logging/RBAC/retry. Visibility per tool: `agent_only`, `llm_only`, `both`.
- **Subagent-as-tool** (per `planning/subagent-tool-design.md`, #165) — register peer agents in `agent.yaml` under `subagents:` and BaseAgent auto-registers a stock `delegate_to_agent(agent_name, task, context)` tool. Two transports: `remote` (HTTP to another agent's `/v1/chat/completions`, with W3C `traceparent` propagation) and `inprocess` (same-process BaseAgent class via `class_path`). `SubagentResult` carries `content`, `tokens_used`, `tool_calls_made`, `cost_usd`, `span_id`; cost rolls up into the parent's session via `OpenAIChatServer._persist_cost_data` draining `agent._subagent_token_usage`. Stream events: `SubagentInvoked` / `SubagentCompleted` / `SubagentFailed` (`SubagentDelta` is forward-compat for v2 nested streaming). Errors: `SubagentTimeoutError`, `SubagentRemoteError`, `MaxDelegationDepthError`, `SubagentCrashedError`; `BudgetExceededError` reuses the existing server-layer class. v1 scope cuts: `permission_scope` is parsed but not enforced (logs WARNING; gated on #164), streaming is buffered, registry is static (no kagenti discovery), `identity: service_account` is forbidden on inprocess transport, depth enforcement is parent-side only.
- **Subagent-as-tool** (per `planning/subagent-tool-design.md`, #165) — register peer agents in `agent.yaml` under `subagents:` and BaseAgent auto-registers a stock `delegate_to_agent(agent_name, task, context)` tool. Two transports: `remote` (HTTP to another agent's `/v1/chat/completions`, with W3C `traceparent` propagation) and `inprocess` (same-process BaseAgent class via `class_path`). `SubagentResult` carries `content`, `tokens_used`, `tool_calls_made`, `cost_usd`, `span_id`; cost rolls up into the parent's session via `OpenAIChatServer._persist_cost_data` draining `agent._subagent_token_usage`. Stream events: `SubagentInvoked` / `SubagentCompleted` / `SubagentFailed` (`SubagentDelta` is forward-compat for v2 nested streaming). Errors: `SubagentTimeoutError`, `SubagentRemoteError`, `MaxDelegationDepthError`, `SubagentCrashedError`; `BudgetExceededError` reuses the existing server-layer class. v1 scope cuts: `permission_scope` is parsed but not enforced (logs WARNING; gated on #164), streaming is buffered, registry is static (no kagenti discovery), `identity: service_account` is forbidden on inprocess transport, depth enforcement is parent-side only. v2 follow-ups tracked separately: streaming nested deltas (#179), kagenti-driven dynamic registry (#180), remote-side delegation depth enforcement (#181).
- **Session state foundation** (per `planning/session-state-compaction-design.md`, #182) — shared contract for #163 (Question tool), #164 (per-tool permissions), #166 (auto-compaction), and #168 (session fork). Adds stable ULID `id` on every `self.messages` entry plus new `SessionRecord` columns: `pending_question`, `open_tool_calls`, `pending_subagent_calls`, `permission_scope_active`, `parent_session_id`, `forked_at_message_id`, `compaction_state`. New server-layer ABCs alongside `SessionStore` / `TraceStore`: `Compactor` (default `NullCompactor`; `LLMSummarizer` lands in #166) and `PermissionSource` with three pluggable implementations — `StaticPermissionSource` (yaml; vanilla RHOAI default), `KagentiPermissionSource` (per-tenant identity/policy), `OGXPermissionSource` (LlamaStack shield deferral). Compaction is client-side and marker-based (rolling summary, frozen prefix, pinned tail) — matches Anthropic `compact_2026_01_12` / Claude Code / Codex CLI shape. Tool-call/tool-result pairs survive compaction together; orphaned `tool_use_ids` are a documented LLM-failure class. New stream events: `CompactionStarted` / `CompactionCompleted` / `CompactionSkipped` plus `PermissionDecisionMade`. Pending state must either block compaction or be explicitly carried forward. BaseAgent stays unaware — all new ABCs are server-layer. Phased rollout: Phase 0 = #182 (foundation), Phase 1 = #163, Phase 2 = #164, Phase 3 = #166, Phase 4 = #168.
- **@tool decorator** for local tools, same convention as FastMCP. Auto-discovered from `tools/` directory.
- **Prompts** are Markdown with YAML frontmatter, one file per prompt in `prompts/`
- **Skills** follow the agentskills.io spec exactly -- directory per skill, SKILL.md with frontmatter, progressive disclosure
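Review bot: the new "Session state foundation" bullet introduces a `permission_scope_active` column and a pluggable `PermissionSource` ABC, while the bullet directly above still records that `permission_scope` is parsed but not enforced (logs WARNING; gated on #164). Please say explicitly that Phase 0 (#182) adds the column and the ABC as a contract only, and that enforcement (and the WARNING going away) arrives with Phase 2 (#164), so nobody reads the presence of `StaticPermissionSource` / `KagentiPermissionSource` / `OGXPermissionSource` as a live check. Two related documentation gaps worth a sentence each in the same bullet: which of `pending_question`, `open_tool_calls` and `pending_subagent_calls` block compaction versus being carried forward (the text says pending state "must either block compaction or be explicitly carried forward" without saying which applies to which), and whether a fork (`parent_session_id` / `forked_at_message_id`, #168) copies `permission_scope_active` from the parent or resets it, since an active scope silently inherited across a fork is a decision the session-fork phase should make up front rather than discover later. Minor style point: the three subagent/session bullets use spaced em-dashes where the rest of the list uses `--`.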