Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,8 @@ These are settled. Do not revisit without explicit discussion.
- **FastMCP v3** is the MCP client -- not v2
- **Platform mode** is opt-in delegation of LLM orchestration to OGX (LlamaStack rebrand) via `client.responses.create()` instead of `chat.completions.create()`. Set `platform.enabled: true` in `agent.yaml` and OGX takes over MCP tool calls, shield enforcement, and the inference loop server-side; the framework skips its own `connect_mcp()` startup loop. New `LLMClient.call_model_responses()` / `call_model_responses_stream()` / `moderate()` plus `BaseAgent` wrappers; new `GuardrailFiredEvent` `StreamEvent` variant. `PlatformMcpServer` accepts both `connector_id` (pre-registered in OGX's stack YAML) and inline `url` reference modes; `name` always maps to `server_label`. `guardrails` travels via `extra_body` because the OpenAI Python SDK rejects unknown top-level kwargs (this is an SDK behaviour, not OGX-specific). Decoupled from #81 (observability) and #35 (expose `/v1/responses` from the agent's HTTP server). Full design in `docs/architecture.md` ("Platform Mode" subsection).
- **Two tool planes**: agent-code tools (plane 1, invisible to LLM) and LLM-callable tools (plane 2). Both go through BaseAgent for logging/RBAC/retry. Visibility per tool: `agent_only`, `llm_only`, `both`.
- **Subagent-as-tool** (per `planning/subagent-tool-design.md`, #165) — register peer agents in `agent.yaml` under `subagents:` and BaseAgent auto-registers a stock `delegate_to_agent(agent_name, task, context)` tool. Two transports: `remote` (HTTP to another agent's `/v1/chat/completions`, with W3C `traceparent` propagation) and `inprocess` (same-process BaseAgent class via `class_path`). `SubagentResult` carries `content`, `tokens_used`, `tool_calls_made`, `cost_usd`, `span_id`; cost rolls up into the parent's session via `OpenAIChatServer._persist_cost_data` draining `agent._subagent_token_usage`. Stream events: `SubagentInvoked` / `SubagentCompleted` / `SubagentFailed` (`SubagentDelta` is forward-compat for v2 nested streaming). Errors: `SubagentTimeoutError`, `SubagentRemoteError`, `MaxDelegationDepthError`, `SubagentCrashedError`; `BudgetExceededError` reuses the existing server-layer class. v1 scope cuts: `permission_scope` is parsed but not enforced (logs WARNING; gated on #164), streaming is buffered, registry is static (no kagenti discovery), `identity: service_account` is forbidden on inprocess transport, depth enforcement is parent-side only.
- **Subagent-as-tool** (per `planning/subagent-tool-design.md`, #165) — register peer agents in `agent.yaml` under `subagents:` and BaseAgent auto-registers a stock `delegate_to_agent(agent_name, task, context)` tool. Two transports: `remote` (HTTP to another agent's `/v1/chat/completions`, with W3C `traceparent` propagation) and `inprocess` (same-process BaseAgent class via `class_path`). `SubagentResult` carries `content`, `tokens_used`, `tool_calls_made`, `cost_usd`, `span_id`; cost rolls up into the parent's session via `OpenAIChatServer._persist_cost_data` draining `agent._subagent_token_usage`. Stream events: `SubagentInvoked` / `SubagentCompleted` / `SubagentFailed` (`SubagentDelta` is forward-compat for v2 nested streaming). Errors: `SubagentTimeoutError`, `SubagentRemoteError`, `MaxDelegationDepthError`, `SubagentCrashedError`; `BudgetExceededError` reuses the existing server-layer class. v1 scope cuts: `permission_scope` is parsed but not enforced (logs WARNING; gated on #164), streaming is buffered, registry is static (no kagenti discovery), `identity: service_account` is forbidden on inprocess transport, depth enforcement is parent-side only. v2 follow-ups tracked separately: streaming nested deltas (#179), kagenti-driven dynamic registry (#180), remote-side delegation depth enforcement (#181).
- **Session state foundation** (per `planning/session-state-compaction-design.md`, #182) — shared contract for #163 (Question tool), #164 (per-tool permissions), #166 (auto-compaction), and #168 (session fork). Adds stable ULID `id` on every `self.messages` entry plus new `SessionRecord` columns: `pending_question`, `open_tool_calls`, `pending_subagent_calls`, `permission_scope_active`, `parent_session_id`, `forked_at_message_id`, `compaction_state`. New server-layer ABCs alongside `SessionStore` / `TraceStore`: `Compactor` (default `NullCompactor`; `LLMSummarizer` lands in #166) and `PermissionSource` with three pluggable implementations — `StaticPermissionSource` (yaml; vanilla RHOAI default), `KagentiPermissionSource` (per-tenant identity/policy), `OGXPermissionSource` (LlamaStack shield deferral). Compaction is client-side and marker-based (rolling summary, frozen prefix, pinned tail) — matches Anthropic `compact_2026_01_12` / Claude Code / Codex CLI shape. Tool-call/tool-result pairs survive compaction together; orphaned `tool_use_ids` are a documented LLM-failure class. New stream events: `CompactionStarted` / `CompactionCompleted` / `CompactionSkipped` plus `PermissionDecisionMade`. Pending state must either block compaction or be explicitly carried forward. BaseAgent stays unaware — all new ABCs are server-layer. Phased rollout: Phase 0 = #182 (foundation), Phase 1 = #163, Phase 2 = #164, Phase 3 = #166, Phase 4 = #168.
- **@tool decorator** for local tools, same convention as FastMCP. Auto-discovered from `tools/` directory.
- **Prompts** are Markdown with YAML frontmatter, one file per prompt in `prompts/`
- **Skills** follow the agentskills.io spec exactly -- directory per skill, SKILL.md with frontmatter, progressive disclosure
Expand Down
Loading