feat: agentic-core public API — composable step functions per ADR-03

# Proposal: `agentic-core` public API — composable step functions per ADR-03

## Context

ADR-03 defines the core crate's public API as a set of composable async functions that implement the agentic loop. Each function is a standalone step that can be:
- Called directly in standalone mode (`execute()` composes them)
- Wrapped in a Praxis `HttpFilter` for gateway mode
- Tested in isolation without HTTP infrastructure

The ADR intentionally left the function parameters as `(...)` placeholders. This issue proposes concrete type signatures, informed by:
- The ADR-02 store schema (three-table model: Conversation/Response/Item)
- PR #34's working agentic loop implementation (tool dispatch pattern)
- vLLM's own `ConversationContext` pattern in `responses/serving.py`
- RFC #18's session-aware KV cache requirements (future extensibility)

## Design Goals

1. **Each step is independently testable** — no hidden coupling between steps
2. **`serde_json::Value` at boundaries, typed internally** — the gateway is model-agnostic; we don't enforce a fixed item schema at the API level
3. **Trait-based backends** — store, inference client, and tool dispatch are swappable
4. **Streaming-first** — inference returns a stream, not a collected response
5. **Tool loop is explicit** — the caller decides iteration policy, not the library
6. **Future-proof for RFC #18** — `ExecutionContext` carries optional cache hints without breaking the API

## Proposed Types

### `ExecutionContext` — shared state across steps

```rust
pub struct ExecutionContext {
    /// Pluggable response/conversation store (ADR-02).
    pub store: Arc<dyn ResponseStore>,

    /// Inference backend caller.
    pub inference: Arc<dyn InferenceClient>,

    /// Tool registry for dispatch.
    pub tools: Arc<dyn ToolRegistry>,

    /// Gateway configuration (LLM base URL, timeouts, etc.)
    pub config: CoreConfig,
}
```
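One caveat on extensibility: in Rust, adding a field to a struct whose fields are all public breaks any downstream crate that builds it with a struct literal. The sketch below shows one way to keep later additions (such as the RFC #18 cache handle) non-breaking, assuming `#[non_exhaustive]` plus a constructor. The empty placeholder traits, the `llm_base_url` field, `new`, `with_session_cache`, and `Noop` are all hypothetical names used only for illustration.

```rust
use std::sync::Arc;

// Placeholder traits standing in for the ones proposed in this issue.
pub trait ResponseStore: Send + Sync {}
pub trait InferenceClient: Send + Sync {}
pub trait ToolRegistry: Send + Sync {}
// Hypothetical RFC #18 extension point; its real shape is not defined yet.
pub trait SessionCache: Send + Sync {}

#[derive(Clone, Debug, Default)]
pub struct CoreConfig {
    pub llm_base_url: String, // assumed field, for illustration only
}

/// `#[non_exhaustive]` stops other crates from using struct-literal syntax,
/// so fields added later do not break their builds.
#[non_exhaustive]
pub struct ExecutionContext {
    pub store: Arc<dyn ResponseStore>,
    pub inference: Arc<dyn InferenceClient>,
    pub tools: Arc<dyn ToolRegistry>,
    pub config: CoreConfig,
    /// Example of a field added after the first release; defaults to `None`.
    pub session_cache: Option<Arc<dyn SessionCache>>,
}

impl ExecutionContext {
    /// The only way for downstream crates to build a context.
    pub fn new(
        store: Arc<dyn ResponseStore>,
        inference: Arc<dyn InferenceClient>,
        tools: Arc<dyn ToolRegistry>,
        config: CoreConfig,
    ) -> Self {
        Self { store, inference, tools, config, session_cache: None }
    }

    /// Opt-in setter for the optional cache handle.
    pub fn with_session_cache(mut self, cache: Arc<dyn SessionCache>) -> Self {
        self.session_cache = Some(cache);
        self
    }
}

// No-op backend used only to exercise the constructor.
pub struct Noop;
impl ResponseStore for Noop {}
impl InferenceClient for Noop {}
impl ToolRegistry for Noop {}
impl SessionCache for Noop {}
```

With this shape, existing callers of `ExecutionContext::new` keep compiling when a new optional field appears.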

### Store trait (ADR-02 alignment)

```rust
#[async_trait]
pub trait ResponseStore: Send + Sync {
    /// Load a response by ID, returning its metadata and history_item_ids.
    async fn get_response(&self, response_id: &str) -> Result<StoredResponse, Error>;

    /// Bulk-fetch items by ID. The result order is unspecified; the caller
    /// re-orders the items to match `item_ids`.
    async fn get_items(&self, item_ids: &[String]) -> Result<Vec<StoredItem>, Error>;

    /// Persist a completed response with its items.
    async fn persist(
        &self,
        response_id: &str,
        conversation_id: Option<&str>,
        previous_response_id: Option<&str>,
        items: &[Item],
        metadata: &ResponseMetadata,
    ) -> Result<(), Error>;
}
```

### Inference trait

```rust
#[async_trait]
pub trait InferenceClient: Send + Sync {
    /// Send a request to the LLM and return a stream of output chunks.
    /// The stream yields typed events (delta text, tool calls, done).
    async fn call(
        &self,
        request: &InferenceRequest,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<InferenceEvent, Error>> + Send>>, Error>;
}
```
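To make the "typed events" idea concrete, here is a minimal sketch of consuming such a stream while both forwarding each event and accumulating the full output. It uses a plain `Iterator` as a stand-in for the async `Stream` so it runs without a runtime. The variants of `InferenceEvent`, the `Collected` struct, and `tee_and_collect` are assumptions; the issue only names "delta text, tool calls, done".

```rust
/// Hypothetical event shape; the real enum is not defined in this issue.
#[derive(Clone, Debug, PartialEq)]
pub enum InferenceEvent {
    TextDelta(String),
    ToolCall { call_id: String, name: String, arguments: String },
    Done,
}

/// Accumulated view of one inference call.
#[derive(Debug, Default, PartialEq)]
pub struct Collected {
    pub text: String,
    /// (call_id, name, arguments) for each tool call seen.
    pub tool_calls: Vec<(String, String, String)>,
    pub finished: bool,
}

/// Forward every event to `sink` (for example an SSE writer) while
/// accumulating the full output for tool-call detection.
/// An `Iterator` stands in for the async `Stream` in this sketch.
pub fn tee_and_collect<I, F>(events: I, mut sink: F) -> Result<Collected, String>
where
    I: IntoIterator<Item = Result<InferenceEvent, String>>,
    F: FnMut(&InferenceEvent),
{
    let mut out = Collected::default();
    for event in events {
        let event = event?; // stop at the first transport error
        sink(&event);
        match event {
            InferenceEvent::TextDelta(delta) => out.text.push_str(&delta),
            InferenceEvent::ToolCall { call_id, name, arguments } => {
                out.tool_calls.push((call_id, name, arguments));
            }
            InferenceEvent::Done => {
                out.finished = true;
                break;
            }
        }
    }
    Ok(out)
}
```

The same accumulate-while-forwarding shape would carry over to an async stream; only the iteration mechanism changes.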

### Tool dispatch trait

```rust
#[async_trait]
pub trait ToolRegistry: Send + Sync {
    /// Execute a tool call, returning the result.
    async fn dispatch(&self, call: &ToolCall) -> Result<ToolResult, Error>;

    /// Check if a tool is registered and available.
    fn has_tool(&self, name: &str) -> bool;
}
```
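For testing steps in isolation, an in-memory registry is probably the simplest backend. The sketch below is synchronous and uses `String` errors to stay dependency-free; the real trait is async and returns the crate's `Error`. `InMemoryRegistry`, `Handler`, and `register` are hypothetical names, and the `ToolCall`/`ToolResult` structs mirror the ones under Key Domain Types.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct ToolCall {
    pub id: String,
    pub call_id: String,
    pub name: String,
    pub arguments: String,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ToolResult {
    pub call_id: String,
    pub output: String,
}

/// A tool handler takes the raw argument string and returns the output.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Hypothetical test double for the `ToolRegistry` trait.
#[derive(Default)]
pub struct InMemoryRegistry {
    handlers: HashMap<String, Handler>,
}

impl InMemoryRegistry {
    /// Register a handler under a tool name, replacing any previous one.
    pub fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    pub fn has_tool(&self, name: &str) -> bool {
        self.handlers.contains_key(name)
    }

    /// Look up the handler by name and echo `call_id` back in the result,
    /// so the output can be matched to the originating call.
    pub fn dispatch(&self, call: &ToolCall) -> Result<ToolResult, String> {
        let handler = self
            .handlers
            .get(&call.name)
            .ok_or_else(|| format!("unknown tool: {}", call.name))?;
        let output = handler(call.arguments.as_str())?;
        Ok(ToolResult { call_id: call.call_id.clone(), output })
    }
}
```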

## Proposed Step Functions

### 1. `rehydrate_conversation`

```rust
/// Reconstruct conversation history from a previous_response_id.
///
/// Loads the stored response, fetches referenced items, restores ordering,
/// and returns the full conversation ready for the next inference call.
pub async fn rehydrate_conversation(
    previous_response_id: &str,
    ctx: &ExecutionContext,
) -> Result<Conversation, Error>
```

**Returns:** `Conversation` containing ordered history items + metadata from the prior turn.
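The "restores ordering" part matters because a bulk fetch may return rows in any order. A minimal sketch of that step, assuming the stored response carries an ordered list of item IDs: `restore_order` is a hypothetical helper, and `data` is a plain `String` here only to avoid a `serde_json` dependency.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct StoredItem {
    pub id: String,
    pub data: String, // a JSON value in the proposal; a string in this sketch
}

/// Restore the order given by `history_item_ids` after a bulk fetch that
/// may return rows in any order. Fails if a referenced item is missing.
pub fn restore_order(
    history_item_ids: &[String],
    fetched: Vec<StoredItem>,
) -> Result<Vec<StoredItem>, String> {
    // Index the fetched rows by ID for constant-time lookup.
    let by_id: HashMap<String, StoredItem> = fetched
        .into_iter()
        .map(|item| (item.id.clone(), item))
        .collect();

    // Walk the IDs in their stored order; `cloned` tolerates repeated IDs.
    history_item_ids
        .iter()
        .map(|id| {
            by_id
                .get(id)
                .cloned()
                .ok_or_else(|| format!("missing item: {}", id))
        })
        .collect()
}
```

Treating a missing item as an error, rather than silently skipping it, is a design choice of this sketch and not something the issue specifies.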

### 2. `call_inference`

```rust
/// Send the conversation to the LLM backend and return the collected result.
///
/// Drains the event stream from `InferenceClient::call` into an
/// `InferenceResult`. Whether streaming consumers instead receive a stream
/// handle is still open (see Open Questions, item 1).
pub async fn call_inference(
    conversation: &Conversation,
    request: &InferenceRequest,
    ctx: &ExecutionContext,
) -> Result<InferenceResult, Error>
```

**Returns:** `InferenceResult` containing output items + any tool calls detected.

### 3. `dispatch_tools`

```rust
/// Execute all tool calls from an inference result.
///
/// Dispatches each call through the ToolRegistry, collecting results.
/// Returns the tool outputs ready to be appended to the conversation.
pub async fn dispatch_tools(
    tool_calls: &[ToolCall],
    ctx: &ExecutionContext,
) -> Result<Vec<ToolResult>, Error>
```

### 4. `assemble_response`

```rust
/// Build the final API response from inference output and tool results.
///
/// Produces the ResponsesResponse object with stable item IDs,
/// proper lifecycle events, and correct output typing.
pub async fn assemble_response(
    request_id: &str,
    output: &InferenceResult,
    model: &str,
    metadata: &ResponseMetadata,
) -> Result<ResponsesResponse, Error>
```

### 5. `persist_response`

```rust
/// Save the completed response and its items to the store.
///
/// Only persists when store is enabled and the request requires it.
pub async fn persist_response(
    response: &ResponsesResponse,
    conversation_id: Option<&str>,
    previous_response_id: Option<&str>,
    ctx: &ExecutionContext,
) -> Result<(), Error>
```

### 6. `execute` — the convenience orchestrator

```rust
/// Run the full agentic loop: rehydrate → infer → (tool loop) → assemble → persist.
///
/// This is the default composition of the step functions with standard
/// iteration policy (max_iterations capped, tool results fed back).
pub async fn execute(
    request: ResponsesRequest,
    ctx: &ExecutionContext,
) -> Result<ResponsesResponse, Error>
```
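The default iteration policy can be sketched as a small loop. This version is synchronous, takes the inference and dispatch steps as closures, and models history as `Vec<String>` so it runs without a runtime or store. `run_loop`, `LoopError`, and the simplified structs are hypothetical; how `execute()` reports an exhausted cap is not specified in this issue.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct ToolCall {
    pub call_id: String,
    pub name: String,
    pub arguments: String,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ToolResult {
    pub call_id: String,
    pub output: String,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct InferenceResult {
    pub text: String,
    pub tool_calls: Vec<ToolCall>,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum LoopError {
    MaxIterations(usize),
    Step(String),
}

/// Infer, dispatch tools, feed the results back, and repeat until the model
/// stops requesting tools or the iteration cap is reached.
pub fn run_loop<I, D>(
    mut history: Vec<String>,
    max_iterations: usize,
    mut infer: I,
    mut dispatch: D,
) -> Result<InferenceResult, LoopError>
where
    I: FnMut(&[String]) -> Result<InferenceResult, String>,
    D: FnMut(&ToolCall) -> Result<ToolResult, String>,
{
    for _ in 0..max_iterations {
        let result = infer(&history[..]).map_err(LoopError::Step)?;
        if result.tool_calls.is_empty() {
            return Ok(result); // final answer, no more tools requested
        }
        for call in &result.tool_calls {
            let tool_result = dispatch(call).map_err(LoopError::Step)?;
            // Feed the tool output back so the next inference call sees it.
            history.push(format!("{}={}", tool_result.call_id, tool_result.output));
        }
    }
    Err(LoopError::MaxIterations(max_iterations))
}
```

Returning a distinct error when the cap is hit keeps a runaway tool loop distinguishable from a backend failure.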

## Key Domain Types

```rust
pub struct Conversation {
    pub id: Option<String>,
    pub history: Vec<Item>,
    pub metadata: ConversationMetadata,
}

pub struct Item {
    pub id: String,
    pub data: serde_json::Value,  // model-agnostic
}

pub struct InferenceRequest {
    pub model: String,
    pub input: Vec<serde_json::Value>,  // history + new input
    pub tools: Vec<ToolConfig>,
    pub stream: bool,
    pub params: serde_json::Map<String, serde_json::Value>,  // passthrough
}

pub struct InferenceResult {
    pub output: Vec<OutputItem>,
    pub tool_calls: Vec<ToolCall>,
    pub usage: Option<Usage>,
}

pub struct ToolCall {
    pub id: String,
    pub call_id: String,
    pub name: String,
    pub arguments: String,
}

pub struct ToolResult {
    pub call_id: String,
    pub output: String,
}
```

## Composability Example

### Standalone mode (default `execute()`)
```rust
let response = agentic_core::execute(request, &ctx).await?;
```

### Custom loop with guardrails
```rust
let conversation = rehydrate_conversation(&prev_id, &ctx).await?;
let mut inference_req = build_request(&conversation, &request);

for _ in 0..max_iterations {
    let result = call_inference(&conversation, &inference_req, &ctx).await?;

    // No tool calls means the model produced its final answer.
    if result.tool_calls.is_empty() {
        return assemble_response(&req_id, &result, &model, &meta).await;
    }

    // Custom guardrail: validate tool calls before dispatch
    validate_tool_calls(&result.tool_calls)?;

    let tool_results = dispatch_tools(&result.tool_calls, &ctx).await?;
    // Feed the calls and their outputs back into the next request.
    append_tool_results(&mut inference_req, &result, &tool_results);
}
// Reaching this point means `max_iterations` was exhausted without a final
// answer; the caller chooses how to report that.
```

### Praxis filter chain
```yaml
filter_chains:
  - name: agentic-loop
    filters:
      - filter: rehydrate
      - filter: inference
      - filter: tool_dispatch
        branch_chains:
          - name: tool-loop
            on_result: { filter: tool_dispatch, key: action, result: loop }
            rejoin: inference
            max_iterations: 10
      - filter: assemble
      - filter: persist
```

## Implementation Plan

### Phase 1: Type definitions + traits (this issue)
- Define all types in `crates/agentic-core/src/types.rs`
- Define traits in `crates/agentic-core/src/traits.rs`
- Stub all step functions with `todo!()` bodies
- Ship as a PR so other PRs (#33, #34) can align their interfaces

### Phase 2: `rehydrate_conversation` + `persist_response`
- Implement against the `ResponseStore` trait
- PR #33's SQLx backend becomes one implementation of the trait
- Include in-memory store for testing

### Phase 3: `call_inference`
- Implement against `InferenceClient` trait
- Default implementation wraps the existing proxy logic (reqwest to vLLM)
- Handle streaming vs non-streaming

### Phase 4: `dispatch_tools` + `execute`
- MCP client as first `ToolRegistry` implementation
- `execute()` ties all steps together with default iteration policy
- Aligns with PR #34's loop pattern

### Phase 5: Praxis filter wrappers
- Thin `HttpFilter` impls in `agentic-praxis` that delegate to step functions

## Relationship to Existing PRs

| PR | Relationship |
|----|-------------|
| #33 (store CRUD) | Becomes a `ResponseStore` trait implementation. Types align with `StoredResponse`/`StoredItem`. |
| #34 (OGX loop) | `OgxStore` becomes a `ResponseStore` impl. The agentic loop in `agentic.rs` becomes the reference for `execute()`. |
| #35 (module reloc) | Reverted. Proxy stays in core for now; step functions live alongside it. |
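| RFC #18 (KV cache) | `ExecutionContext` is extensible: a `session_cache: Option<Arc<dyn SessionCache>>` field can be added later. For that to be non-breaking in Rust, the struct needs `#[non_exhaustive]` or a constructor, since a new public field breaks struct-literal construction downstream. |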

## Open Questions

1. **Streaming + tool loop tension**: SSE streaming to clients requires forwarding chunks in real-time. But the tool loop needs to collect the full output to detect tool calls before iterating. vLLM solves this with a `ConversationContext` that accumulates output while yielding each chunk to the SSE stream — effectively "tee-ing" the stream. Should `call_inference` return a stream that the caller both forwards and accumulates, or should we have separate streaming/non-streaming paths? Proposal: `call_inference` returns a `Stream<InferenceEvent>`. For the tool-loop path, `execute()` collects internally. For direct streaming, the server layer can forward events as they arrive and buffer for tool detection simultaneously.

2. **`serde_json::Value` vs typed items**: PR #34 uses `Vec<serde_json::Value>` for input items (maximum flexibility, no schema enforcement). PR #33 defines a typed `InOutItem` enum. Proposal: the core API uses `serde_json::Value` at boundaries; typed wrappers are an optional convenience in a `types` module.

3. **Error granularity**: Should each step have its own error type, or share a unified `Error` enum? Proposal: single `Error` enum with variants per failure domain (`Store`, `Inference`, `ToolDispatch`, `Serialization`).

4. **Where does SSE assembly live?** `assemble_response` produces the final API object, but SSE event framing (`data: ...\n\n`) is HTTP-level. Proposal: assembly produces the typed response; SSE framing lives in `agentic-server`.
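For question 3, the single-enum proposal could look roughly like the sketch below. Only the four variant names come from this issue; the payloads, the `Display` messages, and the `SqlError`, `load_row`, and `get_response` helpers are hypothetical, included to show how `?` would convert a backend error into the shared type.

```rust
use std::fmt;

/// One error type for the whole crate, with a variant per failure domain.
#[derive(Debug)]
pub enum Error {
    Store(String),
    Inference(String),
    ToolDispatch { tool: String, message: String },
    Serialization(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Store(msg) => write!(f, "store error: {}", msg),
            Error::Inference(msg) => write!(f, "inference error: {}", msg),
            Error::ToolDispatch { tool, message } => {
                write!(f, "tool `{}` failed: {}", tool, message)
            }
            Error::Serialization(msg) => write!(f, "serialization error: {}", msg),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical backend error, standing in for a database driver's type.
#[derive(Debug)]
pub struct SqlError(pub String);

impl From<SqlError> for Error {
    fn from(err: SqlError) -> Self {
        Error::Store(err.0)
    }
}

// Pretend storage call that fails when the row is absent.
fn load_row(found: bool) -> Result<&'static str, SqlError> {
    if found {
        Ok("resp_1")
    } else {
        Err(SqlError("row not found".to_string()))
    }
}

/// `?` converts `SqlError` into `Error::Store` through the `From` impl.
pub fn get_response(found: bool) -> Result<&'static str, Error> {
    let row = load_row(found)?;
    Ok(row)
}
```

Callers that care about the failure domain can match on the variant; callers that do not can propagate the error with `?`.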

---

cc @leseb — as discussed, happy to drive the implementation. Would appreciate feedback on the trait boundaries and type choices before I start coding.

