From 4f970e6aa949f99df43e723d69e655618edcbb06 Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Wed, 4 Feb 2026 14:29:41 +0100 Subject: [PATCH 1/8] Add RFC: Session-scoped MCP client lifecycle management The RFC captures the complete architectural vision from the discussion, providing a roadmap for simplifying the client pooling implementation while maintaining all current functionality. --- ...HV-0034-session-scoped-client-lifecycle.md | 432 ++++++++++++++++++ 1 file changed, 432 insertions(+) create mode 100644 rfcs/THV-0034-session-scoped-client-lifecycle.md diff --git a/rfcs/THV-0034-session-scoped-client-lifecycle.md b/rfcs/THV-0034-session-scoped-client-lifecycle.md new file mode 100644 index 0000000..8268438 --- /dev/null +++ b/rfcs/THV-0034-session-scoped-client-lifecycle.md @@ -0,0 +1,432 @@ +# RFC: Session-Scoped MCP Client Lifecycle Management + +- **Status**: Draft +- **Author(s)**: @yrobla, @jerm-dro +- **Created**: 2026-02-04 +- **Last Updated**: 2026-02-04 +- **Target Repository**: `toolhive` +- **Related Issues**: #3062 + +## Summary + +Move MCP backend client lifecycle from per-request creation/destruction to session-scoped management. Clients will be created once during session initialization, reused throughout the session lifetime, and closed during session cleanup. This simplifies the client pooling architecture and ensures consistent backend state preservation. + +## Problem Statement + +### Current Limitations + +The current `httpBackendClient` implementation creates and closes MCP clients on every request. Each method (`CallTool`, `ReadResource`, `GetPrompt`, `ListCapabilities`) follows the same pattern: + +1. Create a new client via `clientFactory()` +2. Defer client closure with `c.Close()` +3. Initialize the client with MCP handshake +4. Perform the requested operation +5. Close the client when the function returns + +**Problems with Per-Request Client Lifecycle**: + +1. 
**Connection Overhead**: Every tool call incurs TCP handshake, TLS negotiation, and MCP protocol initialization overhead. + +2. **State Loss**: MCP backends that maintain stateful contexts (Playwright browser sessions, database transactions, conversation history) lose state between requests. While sessions preserve routing information, the underlying connections are recreated each time. + +3. **Redundant Initialization**: Capability discovery during session creation (`AfterInitialize` hook) establishes which backends exist, but clients are created fresh for every subsequent request. + +4. **Authentication Overhead**: Each client creation re-resolves authentication strategies and validates credentials, even though this information doesn't change within a session. + +5. **Resource Waste**: Creating and destroying clients repeatedly wastes CPU cycles and network bandwidth, especially problematic for high-throughput scenarios. + +### Affected Parties + +- **vMCP Server Performance**: Every tool call creates/closes connections, multiplying latency by the number of requests +- **Backend Servers**: Repeated connection churn increases load on backend MCP servers +- **Stateful Backends**: Backends relying on persistent state (Playwright, databases) lose context between calls +- **High-Throughput Scenarios**: Connection overhead becomes a bottleneck for applications making many tool calls + +### Why This Matters + +MCP backends often maintain stateful contexts (browser sessions, database connections, conversation history). The current per-request pattern means: +- A browser automation workflow must navigate to the same page repeatedly +- Database transactions cannot span multiple tool calls +- Conversation context in AI assistants is lost between interactions + +Sessions already exist and have defined lifetimes (TTL-based expiration). Aligning client lifecycle with session lifecycle is a natural fit that simplifies the architecture while enabling better state preservation. 
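
To make the contrast concrete, here is a minimal, self-contained Go sketch. It is not toolhive code: `fakeClient` and its methods are stand-ins for mcp-go's `client.Client`, and the functions only count MCP handshakes to show why the per-request pattern pays initialization cost on every call while a session-scoped client pays it once.

```go
package main

import "fmt"

// fakeClient stands in for an MCP backend client (the real type would be
// mcp-go's client.Client); it only counts lifecycle events.
type fakeClient struct{ handshakes int }

func (c *fakeClient) Initialize()               { c.handshakes++ } // MCP handshake + auth resolution
func (c *fakeClient) CallTool(name string) string { return "ok:" + name }
func (c *fakeClient) Close()                    {}

// perRequestCalls mirrors the current pattern: create, initialize, use,
// and close a fresh client for every single tool call.
func perRequestCalls(tools []string) int {
	total := 0
	for _, t := range tools {
		c := &fakeClient{} // clientFactory() invoked per request
		c.Initialize()     // handshake repeated on every call
		_ = c.CallTool(t)
		c.Close() // the "defer c.Close()" pattern in the real code
		total += c.handshakes
	}
	return total
}

// sessionScopedCalls mirrors the proposal: one client per session,
// initialized during session setup, reused, closed with the session.
func sessionScopedCalls(tools []string) int {
	c := &fakeClient{}
	c.Initialize() // handshake happens exactly once per session
	for _, t := range tools {
		_ = c.CallTool(t)
	}
	c.Close() // performed by session cleanup (TTL expiry/Delete/Stop)
	return c.handshakes
}

func main() {
	tools := []string{"navigate", "click", "screenshot"}
	fmt.Println(perRequestCalls(tools))    // 3 handshakes for 3 calls
	fmt.Println(sessionScopedCalls(tools)) // 1 handshake for the whole session
}
```

For a Playwright-style workflow of N tool calls, the per-request pattern performs N handshakes (and loses browser state between them), while the session-scoped pattern performs exactly one.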
+ +## Goals & Non-Goals + +### Goals + +1. **Align Lifecycle with Sessions**: Move client lifecycle from per-request to session-scoped +2. **Eager Initialization**: Create all backend clients during session initialization alongside capability discovery +3. **Enable Stateful Workflows**: Allow backends to maintain state (browser contexts, DB transactions) across multiple tool calls within a session +4. **Reduce Overhead**: Eliminate redundant connection creation, TLS negotiation, and MCP initialization +5. **Simplify Code**: Remove repeated client creation/closure boilerplate from every method + +### Non-Goals + +- **Connection Pooling Within Clients**: Individual MCP clients may internally pool connections, but that's outside this RFC's scope +- **Multi-Session Client Sharing**: Clients remain session-scoped and are not shared across sessions +- **Lazy Backend Discovery**: Backend discovery remains eager (current behavior) +- **Client Versioning**: Handling MCP protocol version negotiation is out of scope + +## Proposed Solution + +### High-Level Design + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Session Manager │ +│ │ +│ ┌────────────────────────────────────────────────────┐ │ +│ │ VMCPSession (TTL-scoped) │ │ +│ │ │ │ +│ │ - Session ID │ │ +│ │ - Routing Table │ │ +│ │ - Tools List │ │ +│ │ - Backend Clients Map[WorkloadID]*client.Client ◄├───┼─── Created during AfterInitialize +│ │ │ │ +│ │ Lifecycle: │ │ +│ │ 1. Create session (Generate) │ │ +│ │ 2. Initialize clients (AfterInitialize) │ │ +│ │ 3. Use clients (CallTool/ReadResource/...) │ │ +│ │ 4. 
Close clients (TTL expiry/Delete/Stop) │ │ +│ └────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ + │ + ▼ + ┌────────────────────────┐ + │ Backend Client │ + │ (mcp-go/client.Client) │ + │ │ + │ - Initialized once │ + │ - Reused per session │ + │ - Closed with session │ + └────────────────────────┘ +``` + +**Key Changes**: +- Add `backendClients` map to `VMCPSession` for storing initialized clients +- Modify `AfterInitialize` hook to create and initialize clients alongside capability discovery +- Change `httpBackendClient` methods to retrieve clients from session instead of creating them +- Add `Close()` method to `VMCPSession` to close all clients on session expiration +- Remove per-request `defer c.Close()` pattern from all client methods + +### Detailed Design + +#### 1. VMCPSession Structure Changes + +**Add Backend Clients Map**: +- Add `backendClients map[string]*client.Client` field to `VMCPSession` struct +- This map stores initialized MCP clients keyed by workload ID +- Protected by existing mutex for thread-safe access + +**New Methods**: +- `GetBackendClient(workloadID string)`: Retrieves an initialized client for the given backend +- `SetBackendClients(clients map[string]*client.Client)`: Stores the client map during session initialization +- `Close()`: Iterates through all clients and closes them, returning any errors encountered + +**Location**: `pkg/vmcp/session/vmcp_session.go` + +#### 2. Client Initialization During Session Setup + +**Extend AfterInitialize Hook**: + +The `AfterInitialize` hook in the discovery middleware already performs capability discovery for each backend. Extend this to also create and initialize MCP clients: + +1. After capability discovery completes, iterate through all backends in the group +2. For each backend, create a client using the backend target configuration +3. Initialize the client immediately (MCP handshake) +4. 
Store successful clients in a map keyed by workload ID +5. For failed initializations: log warning and continue (partial initialization is acceptable) +6. Store the complete client map in the session via `SetBackendClients()` + +**Error Handling**: +- Individual client initialization failures don't fail session creation +- Failed backends are marked unhealthy via existing health monitoring +- Subsequent requests to failed backends return "no client" errors + +**Location**: `pkg/vmcp/server/middleware/discovery/discovery.go` + +#### 3. Refactored BackendClient Implementation + +**Add Session Manager Reference**: +- Add `sessionManager` field to `httpBackendClient` struct to enable session lookup +- Pass session manager to constructor during server initialization + +**New Helper Methods**: +- `getSessionClient(workloadID)`: Retrieves pre-initialized client from session + - Extracts session ID from request context + - Looks up VMCPSession from manager + - Returns the client for the requested backend +- `createAndInitializeClient(target)`: Creates and initializes a client + - Uses existing `clientFactory` to create the client + - Immediately initializes with MCP handshake + - Returns initialized client ready for use + - Called only during session setup, not per-request + +**Refactor Request Methods**: +- `CallTool`, `ReadResource`, `GetPrompt`: Replace client creation pattern with: + - Call `getSessionClient()` to retrieve pre-initialized client + - Remove `defer c.Close()` (client managed by session) + - Remove `initializeClient()` call (already initialized) + - Use client directly for the operation +- `ListCapabilities`: Keep existing per-request pattern (only called during session init) + +**Location**: `pkg/vmcp/client/client.go` + +#### 4. Session Cleanup Integration + +The transport layer's `LocalStorage` implementation already calls `Close()` on sessions before deletion (implemented in a previous PR). We leverage this existing mechanism to close backend clients. 
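
As a hedged sketch of that session-owned cleanup: the `VMCPSession` shape, `backendClients` field, and error aggregation below are assumptions based on this RFC's descriptions, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// backendClient is a stand-in for the real mcp-go client type.
type backendClient interface{ Close() error }

// VMCPSession is a simplified sketch of the session described in this
// RFC; only the pieces relevant to cleanup are shown.
type VMCPSession struct {
	mu             sync.Mutex
	backendClients map[string]backendClient
}

// Close shuts down every backend client owned by the session and
// returns the combined errors, so one failing backend does not stop
// the others from being closed.
func (s *VMCPSession) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	var errs []error
	for workloadID, c := range s.backendClients {
		if err := c.Close(); err != nil {
			errs = append(errs, fmt.Errorf("close client %q: %w", workloadID, err))
		}
	}
	s.backendClients = nil // later lookups report "no client"
	return errors.Join(errs...)
}

type okClient struct{}

func (okClient) Close() error { return nil }

type badClient struct{}

func (badClient) Close() error { return errors.New("connection already torn down") }

func main() {
	s := &VMCPSession{backendClients: map[string]backendClient{
		"github-mcp":     okClient{},
		"playwright-mcp": badClient{},
	}}
	fmt.Println(s.Close() != nil) // true: the failing client's error is reported
}
```

`errors.Join` returns nil when no client fails, so the happy path stays quiet while partial failures still surface every underlying error.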

**VMCPSession.Close() Implementation**:
- Iterate through all clients in the `backendClients` map
- Call `Close()` on each client
- Collect any errors encountered
- Return combined errors if any failures occurred

**Cleanup Triggers**:

The existing session cleanup infrastructure ensures clients are closed when:
- **TTL Expiration**: Session manager's cleanup worker calls `DeleteExpired()`, which calls `Close()` on expired sessions
- **Explicit Deletion**: Manual session deletion via `Delete()` calls `Close()` before removing the session
- **Manager Shutdown**: `Stop()` calls `Close()` on all remaining sessions

No changes are needed in the transport layer; it already has the hooks in place.

#### 5. Error Handling

**Connection Failures During Session Creation**:
- Log warning and skip that backend
- Backend marked unhealthy via existing health monitor
- Session creation succeeds with partial backend connectivity

**Connection Failures During Tool Calls**:
- Return error to client (existing behavior)
- Health monitor marks backend unhealthy
- Subsequent sessions won't attempt to initialize unhealthy backends

**Client Already Closed**:
- If the session has expired and its clients are closed, return a clear error
- The caller should establish a fresh session via the `/initialize` endpoint

## Security

### Threat Model

No new security boundaries are introduced. This is a refactoring of existing client lifecycle management.

### Authentication & Authorization

**No Changes**: Authentication flow remains identical:
1. Client → vMCP: Validate incoming token (existing)
2. vMCP → Backend: Use outgoing auth configured per backend (existing)
3. Backend → External APIs: API-level validation (existing)

The only difference is **when** clients are created (session init vs first use), not **how** they authenticate.

### Data Protection

**Session Isolation**: Each session has its own client map. No cross-session data leakage risk.
+ +**Connection Security**: TLS configuration and certificate validation remain unchanged. + +### Secrets Management + +**No Changes**: Outgoing auth secrets are still retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism is identical. + +### Audit Logging + +**New Log Event**: Add audit log entry for client initialization during session setup: +```json +{ + "event": "backend_client_initialized", + "session_id": "sess-123", + "workload_id": "github-mcp", + "timestamp": "2026-02-04T10:30:00Z" +} +``` + +**Existing Events Unchanged**: Tool call logs, authentication logs, and session lifecycle logs remain the same. + +## Alternatives Considered + +### Alternative 1: Keep Pooling but Decouple from httpBackendClient + +**Approach**: Make `pooledBackendClient` accept an interface instead of embedding `*httpBackendClient`. + +**Pros**: +- Less refactoring required +- Maintains lazy initialization pattern + +**Cons**: +- Still has pooling complexity +- Doesn't address first-call latency +- Doesn't eliminate redundant initialization logic + +**Decision**: Rejected. Decoupling addresses tight coupling but not the fundamental architectural issue. + +### Alternative 2: Lazy Pool Initialization with Eager Client Creation + +**Approach**: Keep pool but create all clients immediately when pool is first accessed. + +**Pros**: +- Minimal code changes +- Backward compatible with existing pool interface + +**Cons**: +- Pool abstraction still adds unnecessary indirection +- First tool call per session still incurs initialization cost +- Doesn't simplify architecture + +**Decision**: Rejected. This is a middle ground that retains most of the complexity. + +### Alternative 3: Global Client Pool Across Sessions + +**Approach**: Share clients across sessions to amortize initialization cost. 
+ +**Pros**: +- Maximum connection reuse +- Lowest per-session overhead + +**Cons**: +- **State Pollution**: Backend state (Playwright contexts, DB transactions) would leak across sessions +- **Security Risk**: Potential for cross-session data exposure +- **Complexity**: Requires complex client reset/sanitization logic + +**Decision**: Rejected. Session isolation is a core requirement for vMCP. + +## Backward Compatibility + +### Breaking Changes + +**None for External APIs**: The `/vmcp/v1/*` HTTP API remains unchanged. Clients see identical behavior. + +**Internal API Changes**: +- `BackendClient` interface: No signature changes, but behavior changes from "create on demand" to "retrieve from session" +- Components depending on `pooledBackendClient` or `BackendClientPool` types will need updates + +### Forward Compatibility + +Future enhancements are easier with this design: + +- **Connection Warming**: Pre-connect to backends during idle time +- **Health-Based Initialization**: Skip unhealthy backends during session setup +- **Client Version Negotiation**: Perform protocol version negotiation once per session + +## Implementation + +### Phase 1: Session Client Storage + +Add client storage capabilities to VMCPSession: + +- Add `backendClients map[string]*client.Client` field to `VMCPSession` struct +- Implement `GetBackendClient(workloadID string)` method for retrieving clients +- Implement `SetBackendClients(clients map[string]*client.Client)` for storing client map +- Implement `Close()` method to iterate and close all clients + +**Files to modify**: +- `pkg/vmcp/session/vmcp_session.go` + +**Testing**: +- Unit tests for client storage/retrieval +- Unit tests for Close() error handling + +### Phase 2: Client Initialization at Session Creation + +Extend the discovery middleware to initialize clients during session setup: + +- Modify `AfterInitialize` hook in discovery middleware +- After capability discovery, create clients for each backend +- Initialize each 
client immediately (MCP handshake) +- Store successful clients in session via `SetBackendClients()` +- Log warnings for failed initializations but continue (partial initialization acceptable) + +**Files to modify**: +- `pkg/vmcp/server/middleware/discovery/discovery.go` + +**Testing**: +- Integration tests verifying clients created during session init +- Tests for partial backend initialization (some backends fail) +- Tests confirming failed backends don't block session creation + +### Phase 3: Refactor Backend Client Methods + +Modify httpBackendClient to retrieve clients from session instead of creating per-request: + +- Add `sessionManager` field to `httpBackendClient` struct +- Pass session manager to `NewHTTPBackendClient()` constructor +- Create `getSessionClient(ctx, workloadID)` helper method +- Create `createAndInitializeClient(ctx, target)` method for session init use +- Refactor `CallTool` to use `getSessionClient()` instead of `clientFactory()` +- Refactor `ReadResource` to use `getSessionClient()` +- Refactor `GetPrompt` to use `getSessionClient()` +- Remove `defer c.Close()` calls from all methods +- Remove `initializeClient()` calls from all methods (except `ListCapabilities`) + +**Files to modify**: +- `pkg/vmcp/client/client.go` + +**Testing**: +- Integration tests confirming clients reused across multiple tool calls +- Tests verifying no client leaks (clients closed on session expiration) +- Tests for "no client found" errors when backend unavailable + +### Phase 4: Observability & Monitoring + +Add observability for client lifecycle: + +- Add audit log event for client initialization during session setup +- Add metrics for client creation success/failure per backend +- Add metrics for client lifetime (session duration) +- Add traces spanning session init → client creation → first tool call +- Consider health-based filtering: skip unhealthy backends during initialization + +**Files to create/modify**: +- Add logging in discovery middleware +- Add 
metrics in client initialization path + +**Testing**: +- Verify audit logs contain session ID and backend ID +- Verify metrics show client init success/failure rates +- Verify traces show reduced latency for subsequent tool calls + +### Dependencies + +- Existing transport session cleanup infrastructure (sessions already call `Close()` on expiration) +- Existing discovery middleware (`AfterInitialize` hook) +- Existing health monitoring system + +### Testing Strategy + +**Unit Tests**: +- Session client map operations (get, set, iterate) +- Client closure on session cleanup with error handling +- Error handling for missing/uninitialized clients + +**Integration Tests**: +- Session creation initializes clients for all configured backends +- Multiple tool calls within same session reuse clients +- Session expiration triggers client closure +- Partial initialization doesn't fail session creation +- Failed backend initialization marked in health system + +**End-to-End Tests**: +- Full vMCP workflow with multiple backends +- Backend state preservation across tool calls (e.g., Playwright browser session) +- Client resource cleanup on TTL expiration +- High-throughput scenarios verify no connection leaks + +## Review History + +| Date | Reviewer | Status | Comments | +|------|----------|--------|----------| +| 2026-02-04 | @yrobla | Author | Initial draft | +| 2026-02-04 | @jerm-dro | Contributor | Original proposal | + +## Implementation Tracking + +| Repository | PR | Status | Notes | +|------------|-------|--------|-------| +| toolhive | #TBD | Not Started | Phase 1: Session client storage | +| toolhive | #TBD | Not Started | Phase 2: Client initialization at session creation | +| toolhive | #TBD | Not Started | Phase 3: Refactor backend client methods | +| toolhive | #TBD | Not Started | Phase 4: Observability & monitoring | From ceb6adfb0d8c83daa51cdd781dda94ff67e9ec25 Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Wed, 4 Feb 2026 14:54:48 +0100 Subject: [PATCH 
2/8] fixes from review Refactor RFC to address scattered session architecture concerns Address architectural feedback from PR #38 review, specifically jerm-dro's comments about the root cause being scattered session concerns rather than just client lifecycle management. Major changes: 1. **Expanded Problem Statement**: - Identified root cause: session concerns scattered across middleware, adapters, and hooks - Current VMCPSession is a passive data container, not a domain object - Session construction tangled with SDK lifecycle callbacks - Hard to create session-scoped resources (clients, optimizer features) 2. **Introduced Core Interfaces**: - Session interface: Domain object that owns clients, encapsulates routing, manages lifecycle (not just a data container) - SessionFactory interface: Creates fully-formed sessions with all dependencies (discovery, client init, routing setup) - SessionManager: Bridges between domain (Session) and protocol (SDK) 3. **Updated Goals**: - Primary goal: Encapsulate session behavior in clean interfaces - Decouple session creation from SDK wiring - Enable testing without full server (addresses toolhive#2852) - Enable future features through interface decoration 4. **Revised Implementation Plan**: - Phase 1: Introduce Session/SessionFactory interfaces - Phase 2: Wire SessionManager to SDK - Phase 3: Migrate existing code to use Session interface - Phase 4: Cleanup and observability 5. **Added Benefits Documentation**: - Concrete examples of decorator pattern (optimizer-in-vmcp) - Simplified capability refresh approach - Clear testing strategy without server dependency 6. 
**Fixed Formatting Issues** (from Copilot review): - Split Goals & Non-Goals into separate sections - Changed "Security" to "Security Considerations" - Restructured Compatibility section - Fixed References to use full repo-qualified format - Added credential refresh considerations The RFC now addresses the architectural root cause (scattered session concerns) rather than just the symptom (per-request client creation). Co-Authored-By: Claude Sonnet 4.5 Update Last Updated date to 2026-02-09 --- ...HV-0034-session-scoped-client-lifecycle.md | 432 ------ ...HV-0038-session-scoped-client-lifecycle.md | 1170 +++++++++++++++++ 2 files changed, 1170 insertions(+), 432 deletions(-) delete mode 100644 rfcs/THV-0034-session-scoped-client-lifecycle.md create mode 100644 rfcs/THV-0038-session-scoped-client-lifecycle.md diff --git a/rfcs/THV-0034-session-scoped-client-lifecycle.md b/rfcs/THV-0034-session-scoped-client-lifecycle.md deleted file mode 100644 index 8268438..0000000 --- a/rfcs/THV-0034-session-scoped-client-lifecycle.md +++ /dev/null @@ -1,432 +0,0 @@ -# RFC: Session-Scoped MCP Client Lifecycle Management - -- **Status**: Draft -- **Author(s)**: @yrobla, @jerm-dro -- **Created**: 2026-02-04 -- **Last Updated**: 2026-02-04 -- **Target Repository**: `toolhive` -- **Related Issues**: #3062 - -## Summary - -Move MCP backend client lifecycle from per-request creation/destruction to session-scoped management. Clients will be created once during session initialization, reused throughout the session lifetime, and closed during session cleanup. This simplifies the client pooling architecture and ensures consistent backend state preservation. - -## Problem Statement - -### Current Limitations - -The current `httpBackendClient` implementation creates and closes MCP clients on every request. Each method (`CallTool`, `ReadResource`, `GetPrompt`, `ListCapabilities`) follows the same pattern: - -1. Create a new client via `clientFactory()` -2. 
Defer client closure with `c.Close()` -3. Initialize the client with MCP handshake -4. Perform the requested operation -5. Close the client when the function returns - -**Problems with Per-Request Client Lifecycle**: - -1. **Connection Overhead**: Every tool call incurs TCP handshake, TLS negotiation, and MCP protocol initialization overhead. - -2. **State Loss**: MCP backends that maintain stateful contexts (Playwright browser sessions, database transactions, conversation history) lose state between requests. While sessions preserve routing information, the underlying connections are recreated each time. - -3. **Redundant Initialization**: Capability discovery during session creation (`AfterInitialize` hook) establishes which backends exist, but clients are created fresh for every subsequent request. - -4. **Authentication Overhead**: Each client creation re-resolves authentication strategies and validates credentials, even though this information doesn't change within a session. - -5. **Resource Waste**: Creating and destroying clients repeatedly wastes CPU cycles and network bandwidth, especially problematic for high-throughput scenarios. - -### Affected Parties - -- **vMCP Server Performance**: Every tool call creates/closes connections, multiplying latency by the number of requests -- **Backend Servers**: Repeated connection churn increases load on backend MCP servers -- **Stateful Backends**: Backends relying on persistent state (Playwright, databases) lose context between calls -- **High-Throughput Scenarios**: Connection overhead becomes a bottleneck for applications making many tool calls - -### Why This Matters - -MCP backends often maintain stateful contexts (browser sessions, database connections, conversation history). 
The current per-request pattern means: -- A browser automation workflow must navigate to the same page repeatedly -- Database transactions cannot span multiple tool calls -- Conversation context in AI assistants is lost between interactions - -Sessions already exist and have defined lifetimes (TTL-based expiration). Aligning client lifecycle with session lifecycle is a natural fit that simplifies the architecture while enabling better state preservation. - -## Goals & Non-Goals - -### Goals - -1. **Align Lifecycle with Sessions**: Move client lifecycle from per-request to session-scoped -2. **Eager Initialization**: Create all backend clients during session initialization alongside capability discovery -3. **Enable Stateful Workflows**: Allow backends to maintain state (browser contexts, DB transactions) across multiple tool calls within a session -4. **Reduce Overhead**: Eliminate redundant connection creation, TLS negotiation, and MCP initialization -5. **Simplify Code**: Remove repeated client creation/closure boilerplate from every method - -### Non-Goals - -- **Connection Pooling Within Clients**: Individual MCP clients may internally pool connections, but that's outside this RFC's scope -- **Multi-Session Client Sharing**: Clients remain session-scoped and are not shared across sessions -- **Lazy Backend Discovery**: Backend discovery remains eager (current behavior) -- **Client Versioning**: Handling MCP protocol version negotiation is out of scope - -## Proposed Solution - -### High-Level Design - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Session Manager │ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ VMCPSession (TTL-scoped) │ │ -│ │ │ │ -│ │ - Session ID │ │ -│ │ - Routing Table │ │ -│ │ - Tools List │ │ -│ │ - Backend Clients Map[WorkloadID]*client.Client ◄├───┼─── Created during AfterInitialize -│ │ │ │ -│ │ Lifecycle: │ │ -│ │ 1. Create session (Generate) │ │ -│ │ 2. 
Initialize clients (AfterInitialize) │ │ -│ │ 3. Use clients (CallTool/ReadResource/...) │ │ -│ │ 4. Close clients (TTL expiry/Delete/Stop) │ │ -│ └────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Backend Client │ - │ (mcp-go/client.Client) │ - │ │ - │ - Initialized once │ - │ - Reused per session │ - │ - Closed with session │ - └────────────────────────┘ -``` - -**Key Changes**: -- Add `backendClients` map to `VMCPSession` for storing initialized clients -- Modify `AfterInitialize` hook to create and initialize clients alongside capability discovery -- Change `httpBackendClient` methods to retrieve clients from session instead of creating them -- Add `Close()` method to `VMCPSession` to close all clients on session expiration -- Remove per-request `defer c.Close()` pattern from all client methods - -### Detailed Design - -#### 1. VMCPSession Structure Changes - -**Add Backend Clients Map**: -- Add `backendClients map[string]*client.Client` field to `VMCPSession` struct -- This map stores initialized MCP clients keyed by workload ID -- Protected by existing mutex for thread-safe access - -**New Methods**: -- `GetBackendClient(workloadID string)`: Retrieves an initialized client for the given backend -- `SetBackendClients(clients map[string]*client.Client)`: Stores the client map during session initialization -- `Close()`: Iterates through all clients and closes them, returning any errors encountered - -**Location**: `pkg/vmcp/session/vmcp_session.go` - -#### 2. Client Initialization During Session Setup - -**Extend AfterInitialize Hook**: - -The `AfterInitialize` hook in the discovery middleware already performs capability discovery for each backend. Extend this to also create and initialize MCP clients: - -1. After capability discovery completes, iterate through all backends in the group -2. 
For each backend, create a client using the backend target configuration -3. Initialize the client immediately (MCP handshake) -4. Store successful clients in a map keyed by workload ID -5. For failed initializations: log warning and continue (partial initialization is acceptable) -6. Store the complete client map in the session via `SetBackendClients()` - -**Error Handling**: -- Individual client initialization failures don't fail session creation -- Failed backends are marked unhealthy via existing health monitoring -- Subsequent requests to failed backends return "no client" errors - -**Location**: `pkg/vmcp/server/middleware/discovery/discovery.go` - -#### 3. Refactored BackendClient Implementation - -**Add Session Manager Reference**: -- Add `sessionManager` field to `httpBackendClient` struct to enable session lookup -- Pass session manager to constructor during server initialization - -**New Helper Methods**: -- `getSessionClient(workloadID)`: Retrieves pre-initialized client from session - - Extracts session ID from request context - - Looks up VMCPSession from manager - - Returns the client for the requested backend -- `createAndInitializeClient(target)`: Creates and initializes a client - - Uses existing `clientFactory` to create the client - - Immediately initializes with MCP handshake - - Returns initialized client ready for use - - Called only during session setup, not per-request - -**Refactor Request Methods**: -- `CallTool`, `ReadResource`, `GetPrompt`: Replace client creation pattern with: - - Call `getSessionClient()` to retrieve pre-initialized client - - Remove `defer c.Close()` (client managed by session) - - Remove `initializeClient()` call (already initialized) - - Use client directly for the operation -- `ListCapabilities`: Keep existing per-request pattern (only called during session init) - -**Location**: `pkg/vmcp/client/client.go` - -#### 4. 
Session Cleanup Integration - -The transport layer's `LocalStorage` implementation already calls `Close()` on sessions before deletion (implemented in a previous PR). We leverage this existing mechanism to close backend clients. - -**VMCPSession.Close() Implementation**: -- Iterate through all clients in the `backendClients` map -- Call `Close()` on each client -- Collect any errors encountered -- Return combined errors if any failures occurred - -**Cleanup Triggers**: - -The existing session cleanup infrastructure ensures clients are closed when: -- **TTL Expiration**: Session manager's cleanup worker calls `DeleteExpired()`, which calls `Close()` on expired sessions -- **Explicit Deletion**: Manual session deletion via `Delete()` calls `Close()` before removing the session -- **Manager Shutdown**: `Stop()` calls `Close()` on all remaining sessions - -No changes needed to the transport layer - it already has the hooks in place. - -#### 5. Error Handling - -**Connection Failures During Session Creation**: -- Log warning and skip that backend -- Backend marked unhealthy via existing health monitor -- Session creation succeeds with partial backend connectivity - -**Connection Failures During Tool Calls**: -- Return error to client (existing behavior) -- Health monitor marks backend unhealthy -- Subsequent sessions won't attempt to initialize unhealthy backends - -**Client Already Closed**: -- If session expired and client is closed, return clear error -- Client should refresh session via `/initialize` endpoint - -## Security - -### Threat Model - -No new security boundaries are introduced. This is a refactoring of existing client lifecycle management. - -### Authentication & Authorization - -**No Changes**: Authentication flow remains identical: -1. Client → vMCP: Validate incoming token (existing) -2. vMCP → Backend: Use outgoing auth configured per backend (existing) -3. 
Backend → External APIs: API-level validation (existing) - -The only difference is **when** clients are created (session init vs first use), not **how** they authenticate. - -### Data Protection - -**Session Isolation**: Each session has its own client map. No cross-session data leakage risk. - -**Connection Security**: TLS configuration and certificate validation remain unchanged. - -### Secrets Management - -**No Changes**: Outgoing auth secrets are still retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism is identical. - -### Audit Logging - -**New Log Event**: Add audit log entry for client initialization during session setup: -```json -{ - "event": "backend_client_initialized", - "session_id": "sess-123", - "workload_id": "github-mcp", - "timestamp": "2026-02-04T10:30:00Z" -} -``` - -**Existing Events Unchanged**: Tool call logs, authentication logs, and session lifecycle logs remain the same. - -## Alternatives Considered - -### Alternative 1: Keep Pooling but Decouple from httpBackendClient - -**Approach**: Make `pooledBackendClient` accept an interface instead of embedding `*httpBackendClient`. - -**Pros**: -- Less refactoring required -- Maintains lazy initialization pattern - -**Cons**: -- Still has pooling complexity -- Doesn't address first-call latency -- Doesn't eliminate redundant initialization logic - -**Decision**: Rejected. Decoupling addresses tight coupling but not the fundamental architectural issue. - -### Alternative 2: Lazy Pool Initialization with Eager Client Creation - -**Approach**: Keep pool but create all clients immediately when pool is first accessed. - -**Pros**: -- Minimal code changes -- Backward compatible with existing pool interface - -**Cons**: -- Pool abstraction still adds unnecessary indirection -- First tool call per session still incurs initialization cost -- Doesn't simplify architecture - -**Decision**: Rejected. 
This is a middle ground that retains most of the complexity. - -### Alternative 3: Global Client Pool Across Sessions - -**Approach**: Share clients across sessions to amortize initialization cost. - -**Pros**: -- Maximum connection reuse -- Lowest per-session overhead - -**Cons**: -- **State Pollution**: Backend state (Playwright contexts, DB transactions) would leak across sessions -- **Security Risk**: Potential for cross-session data exposure -- **Complexity**: Requires complex client reset/sanitization logic - -**Decision**: Rejected. Session isolation is a core requirement for vMCP. - -## Backward Compatibility - -### Breaking Changes - -**None for External APIs**: The `/vmcp/v1/*` HTTP API remains unchanged. Clients see identical behavior. - -**Internal API Changes**: -- `BackendClient` interface: No signature changes, but behavior changes from "create on demand" to "retrieve from session" -- Components depending on `pooledBackendClient` or `BackendClientPool` types will need updates - -### Forward Compatibility - -Future enhancements are easier with this design: - -- **Connection Warming**: Pre-connect to backends during idle time -- **Health-Based Initialization**: Skip unhealthy backends during session setup -- **Client Version Negotiation**: Perform protocol version negotiation once per session - -## Implementation - -### Phase 1: Session Client Storage - -Add client storage capabilities to VMCPSession: - -- Add `backendClients map[string]*client.Client` field to `VMCPSession` struct -- Implement `GetBackendClient(workloadID string)` method for retrieving clients -- Implement `SetBackendClients(clients map[string]*client.Client)` for storing client map -- Implement `Close()` method to iterate and close all clients - -**Files to modify**: -- `pkg/vmcp/session/vmcp_session.go` - -**Testing**: -- Unit tests for client storage/retrieval -- Unit tests for Close() error handling - -### Phase 2: Client Initialization at Session Creation - -Extend the discovery 
middleware to initialize clients during session setup: - -- Modify `AfterInitialize` hook in discovery middleware -- After capability discovery, create clients for each backend -- Initialize each client immediately (MCP handshake) -- Store successful clients in session via `SetBackendClients()` -- Log warnings for failed initializations but continue (partial initialization acceptable) - -**Files to modify**: -- `pkg/vmcp/server/middleware/discovery/discovery.go` - -**Testing**: -- Integration tests verifying clients created during session init -- Tests for partial backend initialization (some backends fail) -- Tests confirming failed backends don't block session creation - -### Phase 3: Refactor Backend Client Methods - -Modify httpBackendClient to retrieve clients from session instead of creating per-request: - -- Add `sessionManager` field to `httpBackendClient` struct -- Pass session manager to `NewHTTPBackendClient()` constructor -- Create `getSessionClient(ctx, workloadID)` helper method -- Create `createAndInitializeClient(ctx, target)` method for session init use -- Refactor `CallTool` to use `getSessionClient()` instead of `clientFactory()` -- Refactor `ReadResource` to use `getSessionClient()` -- Refactor `GetPrompt` to use `getSessionClient()` -- Remove `defer c.Close()` calls from all methods -- Remove `initializeClient()` calls from all methods (except `ListCapabilities`) - -**Files to modify**: -- `pkg/vmcp/client/client.go` - -**Testing**: -- Integration tests confirming clients reused across multiple tool calls -- Tests verifying no client leaks (clients closed on session expiration) -- Tests for "no client found" errors when backend unavailable - -### Phase 4: Observability & Monitoring - -Add observability for client lifecycle: - -- Add audit log event for client initialization during session setup -- Add metrics for client creation success/failure per backend -- Add metrics for client lifetime (session duration) -- Add traces spanning session init 
→ client creation → first tool call -- Consider health-based filtering: skip unhealthy backends during initialization - -**Files to create/modify**: -- Add logging in discovery middleware -- Add metrics in client initialization path - -**Testing**: -- Verify audit logs contain session ID and backend ID -- Verify metrics show client init success/failure rates -- Verify traces show reduced latency for subsequent tool calls - -### Dependencies - -- Existing transport session cleanup infrastructure (sessions already call `Close()` on expiration) -- Existing discovery middleware (`AfterInitialize` hook) -- Existing health monitoring system - -### Testing Strategy - -**Unit Tests**: -- Session client map operations (get, set, iterate) -- Client closure on session cleanup with error handling -- Error handling for missing/uninitialized clients - -**Integration Tests**: -- Session creation initializes clients for all configured backends -- Multiple tool calls within same session reuse clients -- Session expiration triggers client closure -- Partial initialization doesn't fail session creation -- Failed backend initialization marked in health system - -**End-to-End Tests**: -- Full vMCP workflow with multiple backends -- Backend state preservation across tool calls (e.g., Playwright browser session) -- Client resource cleanup on TTL expiration -- High-throughput scenarios verify no connection leaks - -## Review History - -| Date | Reviewer | Status | Comments | -|------|----------|--------|----------| -| 2026-02-04 | @yrobla | Author | Initial draft | -| 2026-02-04 | @jerm-dro | Contributor | Original proposal | - -## Implementation Tracking - -| Repository | PR | Status | Notes | -|------------|-------|--------|-------| -| toolhive | #TBD | Not Started | Phase 1: Session client storage | -| toolhive | #TBD | Not Started | Phase 2: Client initialization at session creation | -| toolhive | #TBD | Not Started | Phase 3: Refactor backend client methods | -| toolhive | #TBD | 
Not Started | Phase 4: Observability & monitoring | diff --git a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md new file mode 100644 index 0000000..c96a25f --- /dev/null +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -0,0 +1,1170 @@ +# THV-0038: Session-Scoped MCP Client Lifecycle Management + +- **Status**: Draft +- **Author(s)**: @yrobla, @jerm-dro +- **Created**: 2026-02-04 +- **Last Updated**: 2026-02-09 +- **Target Repository**: toolhive +- **Related Issues**: [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) + +## Summary + +Refactor vMCP session architecture to encapsulate behavior in well-defined interfaces (`Session` and `SessionFactory`), decoupling session creation (domain logic) from SDK integration (protocol concerns). This restructuring naturally enables session-scoped MCP client lifecycle management, where clients are created once during initialization, reused throughout the session, and closed during cleanup. The new architecture simplifies testing, reduces connection overhead, enables stateful backend workflows, and makes future features (optimizer-in-vmcp, capability refresh) straightforward to implement through interface decoration. + +## Problem Statement + +### The Symptom: Per-Request Client Lifecycle + +The current `httpBackendClient` implementation creates and closes MCP clients on every request. Each method (`CallTool`, `ReadResource`, `GetPrompt`, `ListCapabilities`) follows the same pattern: + +1. Create a new client via `clientFactory()` +2. Defer client closure with `c.Close()` +3. Initialize the client with MCP handshake +4. Perform the requested operation +5. Close the client when the function returns + +This creates multiple problems: + +1. **Connection Overhead**: Every tool call incurs TCP handshake, TLS negotiation, and MCP protocol initialization overhead +2. 
**State Loss**: Backends maintaining stateful contexts (Playwright browser sessions, database transactions) lose state between requests +3. **Redundant Initialization**: Capability discovery establishes which backends exist, but clients are recreated for every request +4. **Resource Waste**: Repeated client creation/destruction wastes CPU and bandwidth + +### The Root Cause: Scattered Session Architecture + +The per-request client pattern is a symptom of a deeper architectural problem: **session concerns are scattered throughout the codebase without clear encapsulation**. + +#### Current Architecture Problems + +**1. Session Creation is Tangled with SDK Integration** + +Session construction is spread across middleware, adapters, and hooks—tightly coupled to the MCP SDK's lifecycle callbacks: + +- **Discovery middleware** triggers capability discovery before the session exists, then stuffs results into request context ([discovery/middleware.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go#L109-L110)) +- **Session ID adapter** creates an empty session when the SDK calls `Generate()`, with no access to discovered capabilities ([session_adapter.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/session_adapter.go#L50-L71)) +- **OnRegisterSession hook** fishes capabilities back out of context and populates the session ([server.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/server.go#L777-L850)) + +This creates cognitive load: to understand "how does a session get created?", you must trace through middleware, adapter callbacks, SDK hooks, and context threading across multiple files. + +**2. 
VMCPSession is a Passive Data Container, Not a Domain Object** + +The existing `VMCPSession` is a struct with getters and setters, not an object with encapsulated behavior: + +```go +type VMCPSession struct { + routingTable *vmcp.RoutingTable + tools []vmcp.Tool + mu sync.RWMutex +} + +func (s *VMCPSession) SetRoutingTable(rt *vmcp.RoutingTable) { ... } +func (s *VMCPSession) RoutingTable() *vmcp.RoutingTable { ... } +``` + +Problems with this approach: + +- **Data written in one place, read in another**: Routing table set in `OnRegisterSession`, but read by router via context, not from the session object +- **No single source of truth**: Session state scattered across context values, `VMCPSession` struct, and transport layer's `StreamableSession` +- **Objects don't do anything**: Router, handler factory, and backend client are separate stateless components operating on data pulled from context + +**3. Routing Logic is Context-Based, Not Session-Based** + +The current design routes requests via **context**, not session object lookup: + +- Middleware stuffs capabilities into context before SDK sees request ([discovery/middleware.go:109-110](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go#L109-L110)) +- Router is stateless - extracts routing table from context on every request ([router/default_router.go:54](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/router/default_router.go#L54)): + ```go + capabilities, ok := discovery.DiscoveredCapabilitiesFromContext(ctx) + target := capabilities.RoutingTable.Tools[toolName] + ``` +- Handler factory uses router to find backend target, then calls shared backend client ([adapter/handler_factory.go:103-125](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/adapter/handler_factory.go#L103-L125)) + +**Flow**: `request → middleware (stuff ctx) → handler → router (read ctx) → backend client` + +There's no `sessions.Load(sessionID)` - routing data flows through context. 
The backend client is shared across all sessions and creates new MCP connections per request. + +**4. Hard to Create Session-Scoped Resources** + +Because session construction is tangled with SDK lifecycle hooks and routing logic flows through context instead of session objects, it's difficult to create objects with session lifetimes (like persistent MCP clients). This is the immediate problem we're solving, but it's symptomatic of the larger architectural issue. + +### Consequences + +These architectural problems have cascading effects: + +1. **Client lifecycle complexity**: Per-request client creation is the path of least resistance when you can't easily create session-scoped resources +2. **Missing integration tests**: Can't create sessions without spinning up the whole vMCP server ([toolhive#2852](https://github.com/stacklok/toolhive/issues/2852)) +3. **Difficult feature additions**: Adding optimizer-in-vmcp ([PR #3517](https://github.com/stacklok/toolhive/pull/3517), [PR #3312](https://github.com/stacklok/toolhive/pull/3312)) required extensive changes to `server.go` hooks +4. **Capability refresh complexity**: Refreshing session capabilities requires coordinating updates across middleware, router, and SDK registration ([PR #3642 discussion](https://github.com/stacklok/toolhive/pull/3642#issuecomment-3861182622)) + +### Affected Parties + +- **vMCP Server Performance**: Every tool call creates/closes connections, multiplying latency by the number of requests +- **Backend Servers**: Repeated connection churn increases load on backend MCP servers +- **Stateful Backends**: Backends relying on persistent state (Playwright, databases) lose context between calls +- **Developers**: Scattered session concerns make the codebase hard to understand, test, and extend +- **Feature Development**: Adding session-related features (optimizer, capability refresh) requires changes across multiple layers + +## Goals + +1. 
**Encapsulate Session Behavior**: Introduce a `Session` interface as a domain object that owns its resources, encapsulates routing logic, and manages its own lifecycle +2. **Decouple Session Creation from SDK Wiring**: Separate the concern of *building a session* (domain logic) from *integrating with the MCP SDK* (protocol concerns) using a `SessionFactory` interface +3. **Session-Scoped Client Lifecycle**: Move MCP backend client lifecycle from per-request to session-scoped, enabling stateful workflows +4. **Simplify Testing**: Enable unit and integration tests of session logic without requiring a full vMCP server +5. **Enable Future Features**: Make it straightforward to add features like optimizer-in-vmcp, capability refresh, and connection warming through clean interfaces + +## Non-Goals + +- **Rewrite All Session Types**: Initial focus is on `VMCPSession`; other session types (ProxySession, SSESession) remain unchanged except for interface compliance +- **Connection Pooling Within Clients**: Individual MCP clients may internally pool connections, but that's outside this RFC's scope +- **Multi-Session Client Sharing**: Clients remain session-scoped and are not shared across sessions +- **Lazy Backend Discovery**: Backend discovery remains eager (current behavior) +- **Client Versioning**: Handling MCP protocol version negotiation is out of scope + +## Proposed Solution + +### High-Level Design + +This RFC proposes restructuring session architecture around two key interfaces: + +1. **SessionFactory**: Creates fully-formed sessions with all dependencies (capability discovery, client initialization, resource allocation) +2. 
**Session**: A domain object that owns its resources, encapsulates routing logic, and manages its own lifecycle + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ SessionManager (implements SDK SessionIdManager) │ +│ │ +│ - Uses SessionFactory to create sessions │ +│ - Stores sessions in sync.Map │ +│ - Adapts Session interface to SDK tool/resource registration │ +│ │ +│ Generate(ctx) string: │ +│ 1. Extract identity, backends from context │ +│ 2. session := factory.MakeSession(ctx, identity, backends) │ +│ 3. Store session by ID │ +│ 4. Return session ID to SDK │ +│ │ +│ Terminate(sessionID) error: │ +│ 1. Load and delete session │ +│ 2. Call session.Close() │ +└─────────────────────────────────────────────────────────────────┘ + │ + │ creates + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ SessionFactory.MakeSession(ctx, identity, backends) │ +│ │ +│ 1. Discover capabilities (aggregator.AggregateCapabilities) │ +│ 2. Create MCP clients for all backends (clientFactory) │ +│ 3. 
Return fully-formed Session with: │ +│ - ID, identity │ +│ - Routing table, tools, resources │ +│ - Pre-initialized backend clients map │ +└─────────────────────────────────────────────────────────────────┘ + │ + │ returns + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Session (domain object) │ +│ │ +│ - ID() string │ +│ - Tools() []Tool, Resources() []Resource │ +│ │ +│ - CallTool(ctx, name, args) (*ToolResult, error) │ +│ → Routes to backend via routing table │ +│ → Uses pre-initialized client from clients map │ +│ │ +│ - ReadResource(ctx, uri) (*ResourceResult, error) │ +│ - GetPrompt(ctx, name, args) (*PromptResult, error) │ +│ │ +│ - Close() error │ +│ → Closes all backend clients │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### Terminology Clarification + +This RFC uses the following terms consistently: + +- **Session creation**: Building a fully-formed session with all resources (domain concern) - handled by `SessionFactory` +- **SDK wiring**: Integrating sessions with the MCP SDK lifecycle callbacks and tool/resource registration (protocol concern) - handled by `SessionManager` +- **Domain object**: An object that encapsulates business logic and owns its resources, not just a data container +- **Backend client**: An MCP client connected to a specific backend workload (instance of `mcp-go/client.Client`) +- **Session-scoped**: Resources that live for the duration of a session (created on init, closed on expiration) + +### Key Architectural Changes + +**1. Session as a Domain Object** + +Today's `VMCPSession` is a passive data container: +```go +// Current: passive data container +type VMCPSession struct { + routingTable *vmcp.RoutingTable + tools []vmcp.Tool +} +func (s *VMCPSession) SetRoutingTable(rt *vmcp.RoutingTable) { ... } +func (s *VMCPSession) RoutingTable() *vmcp.RoutingTable { ... 
} +``` + +Proposed `Session` interface is an active domain object: +```go +// Proposed: active domain object +type Session interface { + ID() string + Tools() []Tool + Resources() []Resource + + // Routing logic encapsulated here + CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) + ReadResource(ctx context.Context, uri string) (*ResourceResult, error) + GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error) + + Close() error +} +``` + +**2. Decoupling Session Creation from SDK Wiring** + +Today: Session construction is spread across middleware (discovery), adapter callbacks (`Generate()`), and SDK hooks (`OnRegisterSession`). + +Proposed: Clean separation: +- **SessionFactory**: Builds complete sessions (discovery + client initialization + routing setup) +- **SessionManager**: Bridges between `SessionFactory` (domain) and MCP SDK (protocol) + +**3. Pre-initialized Backend Clients** + +Clients are created once during `SessionFactory.MakeSession()` and owned by the session. The same client used for capability discovery becomes the session-scoped client (no redundant connections). + +### Detailed Design + +#### 1. Core Interfaces + +**Session Interface** (`pkg/vmcp/session/session.go`): + +```go +// Session represents an active MCP session with all its capabilities and resources. +// This is a pure domain object - no protocol concerns. Can be tested without spinning up a server. 
+type Session interface { + ID() string + Identity() *auth.Identity + + // Capabilities - returns discovered tools/resources for this session + Tools() []Tool + Resources() []Resource + Prompts() []Prompt + + // MCP operations - routing logic is encapsulated here + CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) + ReadResource(ctx context.Context, uri string) (*ResourceResult, error) + GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error) + + // Lifecycle + Close() error +} +``` + +**SessionFactory Interface** (`pkg/vmcp/session/factory.go`): + +```go +// SessionFactory creates fully-formed sessions from configuration and runtime inputs. +type SessionFactory interface { + // MakeSession constructs a session with all its dependencies. + // This is where capability discovery, client creation, and resource allocation happen. + MakeSession( + ctx context.Context, + identity *auth.Identity, + backends []Backend, + ) (Session, error) +} +``` + +#### 2. SessionFactory Implementation + +**defaultSessionFactory** (`pkg/vmcp/session/factory.go`): + +```go +type defaultSessionFactory struct { + aggregator Aggregator // Capability discovery + clientFactory ClientFactory // Backend client creation + config *Config +} + +func (f *defaultSessionFactory) MakeSession( + ctx context.Context, + identity *auth.Identity, + backends []Backend, +) (Session, error) { + // 1. Discover capabilities AND create clients (single pass) + // The aggregator creates clients, queries capabilities, and returns both + result, err := f.aggregator.AggregateCapabilities(ctx, backends) + if err != nil { + return nil, fmt.Errorf("capability discovery failed: %w", err) + } + + // 2. 
Return fully-formed session with clients from discovery + // No need to create new clients - we reuse the ones from capability discovery + return &defaultSession{ + id: uuid.New().String(), + identity: identity, + routingTable: result.RoutingTable, + tools: result.Tools, + resources: result.Resources, + prompts: result.Prompts, + clients: result.Clients, // ← Reused from discovery + }, nil +} +``` + +**Key Design Decision**: Reuse the same client for capability discovery and session operations. The `aggregator.AggregateCapabilities()` creates clients to query backend capabilities and returns both the aggregated capabilities AND the initialized clients. The factory passes these clients directly to the session, avoiding redundant connection setup. + +**Aggregator Return Type** (`pkg/vmcp/aggregator/types.go`): +```go +type AggregationResult struct { + RoutingTable *RoutingTable + Tools []Tool + Resources []Resource + Prompts []Prompt + Clients map[string]*Client // Initialized clients keyed by workload ID +} +``` + +**Why this matters**: Creating a client requires TCP handshake, TLS negotiation, and MCP protocol initialization. By reusing clients from discovery, we eliminate this overhead and session creation is significantly faster. + +#### 3. 
Session Implementation + +**defaultSession** (`pkg/vmcp/session/default_session.go`): + +```go +type defaultSession struct { + id string + identity *auth.Identity + routingTable *RoutingTable + tools []Tool + resources []Resource + prompts []Prompt + clients map[string]*Client // backendID -> client + mu sync.RWMutex +} + +func (s *defaultSession) ID() string { return s.id } +func (s *defaultSession) Identity() *auth.Identity { return s.identity } +func (s *defaultSession) Tools() []Tool { return s.tools } +func (s *defaultSession) Resources() []Resource { return s.resources } +func (s *defaultSession) Prompts() []Prompt { return s.prompts } + +func (s *defaultSession) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) { + s.mu.RLock() + defer s.mu.RUnlock() + + // Route to backend + target, ok := s.routingTable.Tools[name] + if !ok { + return nil, fmt.Errorf("tool not found: %s", name) + } + + // Get pre-initialized client + client, ok := s.clients[target.WorkloadID] + if !ok { + return nil, fmt.Errorf("no client found for backend: %s", target.WorkloadID) + } + + // Call backend with original (un-prefixed) tool name + return client.CallTool(ctx, target.OriginalName, args) +} + +func (s *defaultSession) ReadResource(ctx context.Context, uri string) (*ResourceResult, error) { + s.mu.RLock() + defer s.mu.RUnlock() + + target, ok := s.routingTable.Resources[uri] + if !ok { + return nil, fmt.Errorf("resource not found: %s", uri) + } + + client, ok := s.clients[target.WorkloadID] + if !ok { + return nil, fmt.Errorf("no client found for backend: %s", target.WorkloadID) + } + + return client.ReadResource(ctx, target.OriginalURI) +} + +func (s *defaultSession) Close() error { + s.mu.Lock() + defer s.mu.Unlock() + + var errs []error + for id, c := range s.clients { + if err := c.Close(); err != nil { + errs = append(errs, fmt.Errorf("closing client %s: %w", id, err)) + } + } + s.clients = nil + return errors.Join(errs...) +} +``` + +#### 4. 
Wiring into the MCP SDK + +##### SDK Interface Constraint + +⚠️ **Critical Design Constraint**: The MCP SDK's `SessionIdManager` interface has a limitation: + +```go +// From mark3labs/mcp-go SDK +type SessionIdManager interface { + Generate() string // ← No context parameter! + Validate(sessionID string) (isTerminated bool, err error) + Terminate(sessionID string) (isNotAllowed bool, err error) +} +``` + +**Problem**: `Generate()` has no context, so it cannot access: +- Request context (for identity, backends, etc.) +- Discovered capabilities from middleware +- Authentication information + +**Current workaround**: The existing `sessionIDAdapter.Generate()` creates an **empty session** (just a UUID), then `OnRegisterSession` hook populates it later. This is the "tangled flow" we're trying to fix. + +**Proposed Solution**: Discovery middleware creates the session, SDK retrieves the ID: + +**Flow**: +1. **Discovery middleware** (has context) calls `SessionManager.CreateSession(ctx, identity, backends)` +2. SessionManager uses SessionFactory to create fully-formed session and stores it +3. SessionManager stores session ID in request context: `ctx = WithSessionID(ctx, sessionID)` +4. Request proceeds to SDK +5. **SDK calls `Generate()`** (no context) on SessionManager +6. SessionManager retrieves session ID from thread-local storage or goroutine-local context (set by middleware) +7. 
Returns the session ID - session is already fully initialized and stored + +**Implementation Options**: + +The SDK constraint requires one of these approaches: + +**Option A: Hybrid approach (minimal changes to current flow)** +- Discovery middleware discovers capabilities and stores in context (current behavior) +- `Generate()` creates session ID, stores empty session +- `OnRegisterSession` hook retrieves capabilities from context and calls `SessionFactory.MakeSession()` +- Replace empty session with fully-formed session in SessionManager +- **Pros**: Works with SDK as-is, minimal risk +- **Cons**: Still uses context-passing and hook pattern (not fully encapsulated) + +**Option B: Custom HTTP middleware wrapper** +- Add HTTP middleware before SDK handler that has access to request context +- Middleware calls `SessionManager.CreateSession(ctx, identity, backends)` to create fully-formed session +- Stores session ID in request-scoped storage (e.g., goroutine-local or HTTP request metadata) +- SDK calls `Generate()` which retrieves pre-created session ID from request-scoped storage +- **Pros**: Session fully created before SDK sees request, better encapsulation +- **Cons**: Requires careful goroutine/request scoping, potential race conditions + +**Option C: Fork/patch SDK** (not recommended) +- Modify mark3labs/mcp-go SDK to pass context to `Generate(ctx context.Context)` +- **Pros**: Clean interface, no workarounds +- **Cons**: Upstream dependency, maintenance burden, delays implementation + +**Recommended**: **Option A** for Phase 1-2 implementation (pragmatic), consider **Option B** for Phase 3-4 (better encapsulation). + +**SessionManager** (showing Option B - idealized version): + +```go +// SessionManager implements SDK session lifecycle and provides adapted capabilities. 
+type SessionManager struct {
+	factory  SessionFactory
+	sessions sync.Map // map[string]Session
+}
+
+// --- Public API (called by HTTP middleware before SDK) ---
+
+// CreateSession is called by HTTP middleware (has full context) to create a fully-formed session
+func (m *SessionManager) CreateSession(ctx context.Context, identity *auth.Identity, backends []Backend) (string, error) {
+	session, err := m.factory.MakeSession(ctx, identity, backends)
+	if err != nil {
+		return "", err
+	}
+
+	m.sessions.Store(session.ID(), session)
+
+	// Store session ID in request-scoped storage for Generate() to retrieve
+	// Implementation detail: use context.WithValue or goroutine-local storage
+	setRequestSessionID(ctx, session.ID())
+
+	return session.ID(), nil
+}
+
+// --- SDK SessionIdManager interface ---
+
+func (m *SessionManager) Generate() string { // ← Must match SDK signature (no ctx)
+	// Retrieve session ID from request-scoped storage
+	// (set by CreateSession in HTTP middleware)
+	sessionID := getRequestSessionID()
+	if sessionID == "" {
+		logger.Errorf("Generate() called but no session ID in request scope")
+		return ""
+	}
+
+	// Session already exists in m.sessions (created by CreateSession)
+	return sessionID
+}
+
+func (m *SessionManager) Terminate(sessionID string) (isNotAllowed bool, err error) {
+	val, ok := m.sessions.LoadAndDelete(sessionID)
+	if !ok {
+		return false, nil
+	}
+
+	session := val.(Session)
+	if err := session.Close(); err != nil {
+		logger.Warnf("error closing session %s: %v", sessionID, err)
+	}
+	return false, nil
+}
+
+// --- Adapt Session to SDK tool/resource registration ---
+
+func (m *SessionManager) GetAdaptedTools(sessionID string) []server.ServerTool {
+	val, ok := m.sessions.Load(sessionID)
+	if !ok {
+		return nil
+	}
+	session := val.(Session)
+
+	var sdkTools []server.ServerTool
+	for _, tool := range session.Tools() {
+		// mcp.NewTool accepts only ToolOption values, so the handler is paired
+		// with its tool via server.ServerTool and registered by AddSessionTools.
+		sdkTools = append(sdkTools, server.ServerTool{
+			Tool: mcp.NewTool(
+				tool.Name,
+				mcp.WithDescription(tool.Description),
+				mcp.WithToolInputSchema(tool.InputSchema),
+			),
+			Handler: m.createToolHandler(sessionID, tool.Name),
+		})
+	}
+	return sdkTools
+}
+
+func (m *SessionManager) createToolHandler(sessionID, toolName string) server.ToolHandlerFunc {
+	return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
+		val, ok := m.sessions.Load(sessionID)
+		if !ok {
+			return nil, fmt.Errorf("session not found: %s", sessionID)
+		}
+		session := val.(Session)
+
+		// Validate arguments type - must be map[string]any per MCP spec
+		args, ok := req.Params.Arguments.(map[string]any)
+		if !ok {
+			return mcp.NewToolResultError(fmt.Sprintf(
+				"invalid arguments type: expected map[string]any, got %T",
+				req.Params.Arguments,
+			)), nil
+		}
+
+		result, err := session.CallTool(ctx, toolName, args)
+		if err != nil {
+			return mcp.NewToolResultError(err.Error()), nil
+		}
+
+		return toSDKResult(result), nil
+	}
+}
+```
+
+**SDK Registration** (`pkg/vmcp/server/server.go`):
+
+```go
+// 1. Session lifecycle management
+streamableServer := server.NewStreamableHTTPServer(
+	mcpServer,
+	server.WithSessionIdManager(sessionManager),
+)
+
+// 2. In OnRegisterSession hook, register adapted tools/resources
+hooks.AddOnRegisterSession(func(ctx context.Context, sdkSession server.ClientSession) {
+	sessionID := sdkSession.SessionID()
+
+	mcpServer.AddSessionTools(sessionID, sessionManager.GetAdaptedTools(sessionID)...)
+	mcpServer.AddSessionResources(sessionID, sessionManager.GetAdaptedResources(sessionID)...)
+})
+```
+
+#### 5. Migration from Current Architecture
+
+**Phase 1**: Introduce interfaces and new implementation alongside existing code
+**Phase 2**: Update server initialization to use new SessionManager
+**Phase 3**: Remove old discovery middleware and httpBackendClient patterns
+**Phase 4**: Clean up deprecated code paths
+
+Details in Implementation Plan section below.
+
+#### 6.
Error Handling
+
+**Session Creation Failures**:
+
+If `SessionFactory.MakeSession()` fails completely (e.g., all backends unreachable):
+- `SessionManager.Generate()` logs the error and returns the empty string `""`
+- The SDK interprets an empty session ID as failure (it will not send an `Mcp-Session-Id` header in the response)
+- The client receives an MCP initialization response without a session ID and knows initialization failed
+- The client must retry initialization or report the error to the user
+
+**Rationale**: The SDK's `SessionIdManager.Generate()` interface returns `string`, not `(string, error)`. Returning an empty string is the signal for failure. This matches the existing `sessionIDAdapter` behavior (see `pkg/vmcp/server/session_adapter.go:64`).
+
+**Partial Backend Initialization**:
+
+If some backends fail during `MakeSession()`:
+- Log warnings for the failed backends
+- Continue with the successfully initialized backends
+- Session creation succeeds with a partial backend set
+- Failed backends are not added to the clients map
+- Subsequent tool calls to failed backends return "no client found for backend X"
+- Health monitoring marks failed backends as unhealthy (existing behavior)
+
+**Tool Call Failures**:
+
+If a tool call fails after successful session creation:
+- Return the error to the client (existing behavior)
+- The backend client remains usable for subsequent requests
+- No automatic client re-initialization (clients live for the session lifetime)
+- Health monitoring tracks backend health (existing behavior)
+
+**Client Closed Mid-Request**:
+
+A race condition exists: a session may be terminated while a request is in flight:
+- Session `Close()` closes all backend clients
+- In-flight requests receive a "client closed" error from the MCP library
+- Mitigation: Session `Touch()` extends the TTL on every request, reducing the race window
+- Future enhancement: Add reference counting to delay `Close()` until in-flight requests complete
+
+**Session Not Found**:
+
+If a client uses an expired or invalid session ID:
+- Return a clear error:
"session not found" or "session expired" +- Client should re-initialize via `/mcp/initialize` endpoint to create new session + +## Security Considerations + +### Threat Model + +No new security boundaries are introduced. This is a refactoring of existing session and client lifecycle management to encapsulate behavior in well-defined interfaces. + +### Authentication & Authorization + +**Incoming Authentication (Client → vMCP)**: No changes +- Validate incoming token (existing) +- Store identity in session (existing) + +**Outgoing Authentication (vMCP → Backend)**: Timing changes, not mechanism +- Credentials resolved during session creation instead of per-request +- Uses existing `OutgoingAuthRegistry` (same code path) +- Identity context passed to `SessionFactory.MakeSession()` + +**Credential Lifecycle Considerations**: + +⚠️ **Short-lived credentials**: With session-scoped clients, short-lived outgoing credentials (e.g., expiring OAuth tokens) resolved at client creation could become stale mid-session. + +**Mitigation strategies** (implementation details for future work): +1. **Per-request header injection**: Resolve credentials fresh for each request, inject into MCP client headers +2. **Token refresh hooks**: Backend client library refreshes tokens automatically (if supported by `mcp-go`) +3. **Client recreation on auth failure**: Detect 401/403 errors, recreate client with refreshed credentials +4. **Session TTL alignment**: Set session TTL shorter than credential lifetime + +For initial implementation, we assume: +- Most backends use long-lived credentials (API keys, client certificates) +- Session TTLs (typically 30 minutes) are shorter than credential lifetimes +- Per-request credential resolution is future enhancement if needed + +### Data Protection + +**Session Isolation**: Each session has its own client map. No cross-session data leakage risk. + +**Connection Security**: TLS configuration and certificate validation remain unchanged. 
+ +### Concurrency & Resource Safety + +**Client Usage During Cleanup**: +- Race condition exists: request may use client while session is being closed +- Mitigation: Session `Touch()` extends TTL on every request, reducing race window +- MCP client library handles `Close()` on active connections gracefully (returns errors) +- Handlers should catch and handle "client closed" errors appropriately +- Future Enhancement: Add reference counting to delay `Close()` until all in-flight requests complete + +### Secrets Management + +**Storage and Retrieval**: Outgoing auth secrets are retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism and storage are identical. No secrets are stored in session objects—only references to auth configurations. + +### Audit Logging + +**New Log Event**: Add audit log entry for client initialization during session setup: +```json +{ + "event": "backend_client_initialized", + "session_id": "sess-123", + "workload_id": "github-mcp", + "timestamp": "2026-02-04T10:30:00Z" +} +``` + +**Existing Events Unchanged**: Tool call logs, authentication logs, and session lifecycle logs remain the same. + +## Alternatives Considered + +### Alternative 1: Keep Per-Request Pattern with Connection Pooling + +**Approach**: Continue creating/closing clients per request but add a connection pool underneath to reuse TCP connections. + +**Pros**: +- Minimal changes to existing code +- Reduces TCP handshake overhead + +**Cons**: +- Doesn't address MCP protocol initialization overhead (still happens per request) +- Doesn't solve state preservation problem (each request still gets fresh MCP client) +- Authentication still resolved and validated per request +- Adds complexity at wrong layer + +**Decision**: Rejected. This addresses symptoms but not the root cause. 
+ +### Alternative 2: Lazy Client Creation on First Use + +**Approach**: Create clients on first tool call to a backend, then store in session for reuse. + +**Pros**: +- Avoids creating clients for unused backends +- Delays initialization until needed + +**Cons**: +- First tool call to each backend still has initialization latency +- More complex state management (need to track which clients exist) +- Doesn't leverage existing capability discovery phase +- Inconsistent performance (first vs subsequent calls) + +**Decision**: Rejected. Complexity outweighs benefits. Capability discovery already knows which backends exist. + +### Alternative 3: Global Client Cache Across Sessions + +**Approach**: Share clients across all sessions using a global cache, keyed by backend + auth identity. + +**Pros**: +- Maximum connection reuse +- Lowest initialization overhead + +**Cons**: +- **State Pollution**: Backend state (Playwright contexts, DB transactions) would leak across sessions +- **Security Risk**: Potential for cross-session data exposure if keying is incorrect +- **Complexity**: Requires complex cache eviction, client sanitization, and identity tracking +- **Violates Session Isolation**: Sessions should be independent + +**Decision**: Rejected. Session isolation is a core requirement for vMCP. The security and complexity risks outweigh performance gains. + +## Compatibility + +### Backward Compatibility + +**External APIs**: No breaking changes. +- The `/vmcp/v1/*` HTTP API remains unchanged +- Clients see identical behavior (tools, resources, prompts work the same way) +- Session lifecycle (initialize, tool calls, expiration) unchanged from client perspective + +**Internal APIs**: + +⚠️ **Breaking Changes**: + +1. 
**New `Session` interface replaces passive data container pattern**: + - Old: `VMCPSession` with getters/setters (`SetRoutingTable()`, `RoutingTable()`) + - New: `Session` interface with behavior methods (`CallTool()`, `Close()`) + - **Impact**: Code that directly accesses `VMCPSession` fields or calls setters must migrate + - **Migration**: Update callsites to use `Session` interface methods instead + +2. **Session creation flow changes**: + - Old: Discovery middleware → context → `OnRegisterSession` hook → session population + - New: `SessionFactory.MakeSession()` → fully-formed session + - **Impact**: Code that relies on `OnRegisterSession` hook timing or context-based capability passing must migrate + - **Migration**: Move logic into `SessionFactory` or `Session` implementation + +3. **Backend client retrieval changes**: + - Old: `httpBackendClient` creates clients per-request via `clientFactory()` + - New: Session owns clients, retrieved via `session.CallTool()` (encapsulated) + - **Impact**: Direct users of `httpBackendClient` must migrate (though most code should use `Session` interface) + - **Migration**: Replace direct backend client calls with session method calls + +**Migration Path**: +- Phase 1: Introduce new interfaces alongside existing code +- Phase 2: Update server initialization to use new `SessionManager` and `SessionFactory` +- Phase 3: Migrate callsites from old patterns to new `Session` interface +- Phase 4: Remove deprecated code (old `httpBackendClient` pattern, discovery middleware) + +**No external packages affected**: All changes are internal to ToolHive. 
+ +### Forward Compatibility + +The interface-based design enables future enhancements: + +**Session Features**: +- **Capability refresh**: `Session.RefreshCapabilities()` method for updating tools/resources mid-session +- **Connection warming**: Pre-connect to backends during idle time +- **Health-based initialization**: Skip unhealthy backends in `SessionFactory.MakeSession()` +- **Credential refresh**: Add token refresh hooks in session implementation + +**Decorators and Composition**: + +The `Session` interface enables clean feature composition through decoration. This addresses complexity issues seen in previous feature additions: + +**Example: Optimizer-in-vMCP** (simplified approach vs [PR #3517](https://github.com/stacklok/toolhive/pull/3517)): +```go +// sessionWithOptimizer implements Session interface +// It only exposes find_tool and call_tool, with semantic search over underlying tools +type sessionWithOptimizer struct { + inner Session + vectorDB *VectorDB // In-memory semantic search index +} + +func (s *sessionWithOptimizer) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) { + // If tool is "find_tool", perform semantic search over s.inner.Tools() + if name == "find_tool" { + return s.semanticSearch(args["query"].(string)) + } + + // Otherwise delegate to underlying session + return s.inner.CallTool(ctx, name, args) +} + +// Assert that sessionWithOptimizer implements Session interface +var _ Session = (*sessionWithOptimizer)(nil) +``` + +**Benefits**: +1. **No SDK integration concerns**: Optimizer doesn't touch MCP protocol, just implements `Session` interface +2. **Testable**: Write unit tests for optimizer without spinning up vMCP server +3. 
**Minimal cognitive load**: Familiar decorator pattern, clear separation of concerns + +**Example: Session Capability Refresh** (simplified vs [PR #3642](https://github.com/stacklok/toolhive/pull/3642)): +```go +func (s *defaultSession) RefreshCapabilities(ctx context.Context) error { + // Re-discover capabilities from backends + capabilities, err := s.aggregator.AggregateCapabilities(ctx, s.backends) + if err != nil { + return err + } + + // Update session state (encapsulated, no middleware coordination needed) + s.mu.Lock() + defer s.mu.Unlock() + s.routingTable = capabilities.RoutingTable + s.tools = capabilities.Tools + + // Note: SDK tool registration would need separate refresh mechanism + // but session state update is simple and encapsulated + return nil +} +``` + +**Other Decoration Examples**: +- **Caching layer**: Wrap session to cache tool results for repeated calls +- **Rate limiting**: Wrap session to enforce per-session rate limits +- **Audit logging**: Wrap session to log all tool calls with detailed context +- **Circuit breaker**: Wrap session to implement circuit breaker pattern per backend + +**Testing**: +- Unit test session logic without spinning up full vMCP server +- Mock `Session` interface for testing code that depends on sessions +- Integration test `SessionFactory` with fake backends + +## Implementation Plan + +This implementation introduces new interfaces and gradually migrates from the current architecture to the proposed design. All phases are additive until Phase 4, ensuring low risk and easy rollback. + +### Phase 1: Introduce Core Interfaces and Factory + +**Goal**: Establish the `Session` and `SessionFactory` interfaces and provide a working implementation without changing existing flows. 
+ +**New Files**: +- `pkg/vmcp/session/session.go` - Define `Session` interface +- `pkg/vmcp/session/factory.go` - Define `SessionFactory` interface and `defaultSessionFactory` implementation +- `pkg/vmcp/session/default_session.go` - Implement `defaultSession` with client ownership and routing logic + +**Implementation Details**: +```go +// Session interface with behavior methods (not just getters/setters) +type Session interface { + ID() string + Tools() []Tool + CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) + // ... other methods + Close() error +} + +// SessionFactory creates fully-formed sessions +type SessionFactory interface { + MakeSession(ctx context.Context, identity *auth.Identity, backends []Backend) (Session, error) +} + +// defaultSession owns clients and encapsulates routing +type defaultSession struct { + id string + routingTable *RoutingTable + clients map[string]*Client + // ... +} +``` + +**Key Design**: The `defaultSession` owns backend clients. When `SessionFactory.MakeSession()` runs: +1. Aggregate capabilities from all backends +2. Create and initialize one MCP client per backend +3. Return session with pre-initialized clients map + +**Testing**: +- Unit tests for `Session` interface methods (CallTool, ReadResource, Close) +- Unit tests for `SessionFactory.MakeSession()` with mocked aggregator and client factory +- Test partial backend initialization (some backends fail, session still created) +- Test session closure closes all clients + +**Files Modified**: None (purely additive) + +### Phase 2: Introduce SessionManager and Wire to SDK + +**Goal**: Create `SessionManager` that uses `SessionFactory` and implements the SDK's `SessionIdManager` interface. 
+
+**New Files**:
+- `pkg/vmcp/server/session_manager.go` - Implement `SessionManager` bridging domain (Session) to protocol (SDK)
+
+**Implementation Details**:
+```go
+type SessionManager struct {
+    factory  SessionFactory
+    sessions sync.Map // map[string]Session
+}
+
+// SDK SessionIdManager interface.
+// NOTE: the SDK's actual signature is Generate() string (no context), so
+// identity and backends cannot be read from ctx directly. This sketch assumes
+// the workaround chosen for Open Question 7 (e.g., an HTTP middleware wrapper
+// that stashes request-scoped data) makes them available here.
+func (m *SessionManager) Generate(ctx context.Context) string {
+    identity, _ := auth.IdentityFromContext(ctx)
+    backends := getBackendsFromContext(ctx)
+
+    session, err := m.factory.MakeSession(ctx, identity, backends)
+    if err != nil {
+        logger.Errorf("session creation failed: %v", err)
+        return "" // SDK will handle empty session ID gracefully
+    }
+
+    m.sessions.Store(session.ID(), session)
+    return session.ID()
+}
+
+func (m *SessionManager) Terminate(sessionID string) (isNotAllowed bool, err error) {
+    val, ok := m.sessions.LoadAndDelete(sessionID)
+    if !ok {
+        return false, nil
+    }
+
+    session := val.(Session)
+    if err := session.Close(); err != nil {
+        logger.Warnf("error closing session %s: %v", sessionID, err)
+    }
+    return false, nil
+}
+
+// Adapt Session to SDK tool registration
+func (m *SessionManager) GetAdaptedTools(sessionID string) []mcp.Tool {
+    val, ok := m.sessions.Load(sessionID)
+    if !ok {
+        logger.Warnf("GetAdaptedTools called for non-existent session: %s", sessionID)
+        return nil
+    }
+    session := val.(Session)
+    return adaptToolsForSDK(session.Tools(), m.createToolHandler(sessionID))
+}
+```
+
+**Integration**: Update `pkg/vmcp/server/server.go`:
+- Create `SessionManager` with `SessionFactory`
+- Pass `SessionManager` to SDK's `NewStreamableHTTPServer()` as `WithSessionIdManager()`
+- In `OnRegisterSession` hook, call `sessionManager.GetAdaptedTools()` to register tools
+
+**Testing**:
+- Integration tests: Create session via `SessionManager.Generate()`, verify session stored
+- Test tool handler routing: Call tool, verify session's `CallTool()` is invoked
+- Test session termination: Call `Terminate()`, verify session closed
+- Test SDK
integration: Initialize session via HTTP `/mcp/initialize`, verify tools registered + +**Files Modified**: +- `pkg/vmcp/server/server.go` - Add SessionManager creation and wiring + +### Phase 3: Migrate Existing Code to Use Session Interface + +**Goal**: Update callsites to use the new `Session` interface instead of accessing `VMCPSession` directly or using `httpBackendClient`. + +**Changes**: +- Remove discovery middleware's context-passing pattern (capabilities no longer passed via context) +- Remove `httpBackendClient` per-request client creation pattern (session owns clients now) +- Update any code that called `VMCPSession.SetRoutingTable()` / `SetTools()` to use `Session` interface methods + +**Testing**: +- End-to-end tests: Full vMCP workflow with multiple backends +- Verify backend state preservation across tool calls (e.g., Playwright browser session maintains context) +- High-throughput tests: Verify no connection leaks, clients reused across tool calls +- Session expiration test: Verify TTL cleanup closes all clients + +**Files Modified**: +- `pkg/vmcp/discovery/middleware.go` - Remove context-based capability passing (deprecated by SessionFactory) +- `pkg/vmcp/client/client.go` - Remove `httpBackendClient` per-request creation (deprecated by Session ownership) +- Any code directly accessing `VMCPSession` fields - Migrate to `Session` interface + +### Phase 4: Cleanup and Observability + +**Goal**: Remove deprecated code and add observability for session lifecycle. 
+ +**Cleanup**: +- Remove old `VMCPSession` data container implementation (replaced by `defaultSession`) +- Remove old discovery middleware hooks (replaced by `SessionFactory`) +- Remove old `httpBackendClient` patterns (replaced by session-owned clients) + +**Observability**: +- Add audit log events for session creation with backend initialization results + ```json + { + "event": "session_created", + "session_id": "sess-123", + "backends_initialized": 3, + "backends_failed": 1 + } + ``` +- Add metrics: + - `vmcp_session_backend_init_duration_seconds` (histogram by backend) + - `vmcp_session_backend_init_success_total` (counter by backend) + - `vmcp_session_backend_init_failure_total` (counter by backend, with reason label) + - `vmcp_session_tool_call_duration_seconds` (histogram, shows latency improvement) +- Add distributed traces spanning session creation → first tool call + +**Testing**: +- Verify audit logs contain session ID and backend initialization status +- Verify metrics show client init success/failure rates per backend +- Verify traces show reduced latency for subsequent tool calls (no handshake overhead) + +**Files Modified**: +- `pkg/vmcp/session/default_session.go` - Add telemetry hooks +- `pkg/vmcp/session/factory.go` - Add audit logging + +### Dependencies + +- Existing capability aggregation (`pkg/vmcp/aggregator`) +- Existing client factory (`pkg/vmcp/client`) +- Existing health monitoring system +- MCP SDK (`mark3labs/mcp-go`) + +### Rollback Plan + +Each phase is independently testable and can be rolled back: + +- **Phase 1**: New interfaces unused, no behavior change +- **Phase 2**: SessionManager wired but old code paths still present +- **Phase 3**: Can revert to Phase 2 state (both implementations coexist) +- **Phase 4**: Can keep deprecated code if needed (cleanup optional) + +### Testing Strategy + +**Unit Tests** (each phase): +- Session interface methods (CallTool, ReadResource, Close) +- SessionFactory capability discovery and client 
initialization +- SessionManager SDK integration (Generate, Terminate) +- Error handling (partial initialization, client closed, session not found) + +**Integration Tests** (Phase 2+): +- Session creation via HTTP `/mcp/initialize` endpoint +- Tool calls routed to correct backends via session +- Session expiration triggers client closure +- Multiple tool calls reuse same clients (no reconnection overhead) + +**End-to-End Tests** (Phase 3+): +- Full vMCP workflow with multiple backends +- Backend state preservation across tool calls (Playwright browser context, database transaction) +- High-throughput scenarios (verify no connection leaks, reduced latency) +- Session lifecycle (create, use, expire, cleanup) + +## Documentation + +**Architecture Documentation** (in `toolhive` repository): +- Update `docs/arch/10-virtual-mcp-architecture.md`: + - Add section on session architecture with `Session` interface and `SessionFactory` + - Document the separation between domain logic (session creation) and protocol concerns (SDK wiring) + - Add sequence diagram showing new session creation flow: HTTP `/initialize` → `SessionFactory.MakeSession()` → `SessionManager.Generate()` → SDK registration + - Document client lifecycle: created during session init, reused for all requests, closed on session expiration +- Update `docs/arch/02-core-concepts.md`: + - Add "Session" as a core concept with its responsibilities (owns clients, encapsulates routing, manages lifecycle) + - Clarify that sessions are domain objects, not just data containers + +**Code Documentation**: +- Add package-level comments to `pkg/vmcp/session/`: + ```go + // Package session provides the core Session domain object and SessionFactory + // for vMCP. Sessions own backend MCP clients, encapsulate routing logic, + // and manage their own lifecycle. 
+ // + // Key interfaces: + // - Session: Domain object representing an active MCP session + // - SessionFactory: Creates fully-formed sessions with all dependencies + // + // The SessionManager bridges between Session (domain) and the MCP SDK (protocol). + ``` +- Document `Session` interface methods with usage examples +- Document `SessionFactory.MakeSession()` with capability discovery and client initialization details + +**Developer Guides**: +- Add "Session Architecture" guide explaining: + - How sessions are created (SessionFactory) + - How sessions integrate with SDK (SessionManager) + - How to test session logic without spinning up a server + - How to decorate Session interface (e.g., optimizer-in-vmcp pattern) + +**Operational Guides**: +- Update troubleshooting guide: + - Debugging session creation failures + - Investigating backend initialization errors + - Understanding "no client found" errors +- Document observability: + - Audit logs: `session_created` event with backend init status + - Metrics: `vmcp_session_backend_init_duration_seconds`, `vmcp_session_tool_call_duration_seconds` + - Traces: Session creation span → first tool call span (shows reduced latency) +- Add runbook for investigating client initialization failures + +## Open Questions + +Most major design questions have been resolved during RFC review. One implementation detail requires resolution: + +1. ~~Should clients be created eagerly or lazily?~~ → **Resolved: Eager initialization during `SessionFactory.MakeSession()`** +2. ~~How should session concerns be organized?~~ → **Resolved: Encapsulate in `Session` interface (domain object), separate from SDK wiring (`SessionManager`)** +3. ~~What happens if backend client initialization fails during session creation?~~ → **Resolved: Log warning, continue with partial initialization, failed backends not in clients map** +4. 
~~Should we share clients across sessions for efficiency?~~ → **Resolved: No, session isolation is critical (prevents state leakage, security risks)** +5. ~~How to handle short-lived credentials with session-scoped clients?~~ → **Resolved: Document limitation, provide mitigation strategies (per-request header injection, token refresh hooks), assume most backends use long-lived credentials for initial implementation** +6. ~~How should client closure be triggered?~~ → **Resolved: Session owns clients, `Session.Close()` called on expiration/termination via `SessionManager.Terminate()`** +7. **How to work around SDK `Generate()` having no context parameter?** → **Requires resolution during Phase 2 implementation** + - SDK's `SessionIdManager.Generate()` has no context, cannot access identity/backends directly + - Three options documented: (A) Hybrid with OnRegisterSession hook, (B) Custom HTTP middleware wrapper, (C) Fork SDK + - **Recommendation**: Start with Option A (pragmatic), evaluate Option B for better encapsulation + - See "SDK Interface Constraint" section in Detailed Design for full analysis + +## References + +- [MCP Specification](https://modelcontextprotocol.io/specification/2025-06-18) - Model Context Protocol specification +- [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) - Original issue: Per-request client creation overhead +- [toolhive#2852](https://github.com/stacklok/toolhive/issues/2852) - Missing integration tests below server level +- [mark3labs/mcp-go SDK](https://github.com/mark3labs/mcp-go) - MCP Go SDK used for backend clients +- [Virtual MCP Architecture Documentation](https://github.com/stacklok/toolhive/blob/main/docs/arch/10-virtual-mcp-architecture.md) - Current vMCP architecture +- Related PRs in ToolHive: + - [toolhive#1989](https://github.com/stacklok/toolhive/pull/1989) - Session management infrastructure (provides storage and cleanup hooks) + - [toolhive#3517](https://github.com/stacklok/toolhive/pull/3517) - 
Optimizer-in-vMCP implementation (motivates Session interface design) + - [toolhive#3312](https://github.com/stacklok/toolhive/pull/3312) - Optimizer-in-vMCP changes to server.go + - [toolhive#3642](https://github.com/stacklok/toolhive/pull/3642) - Discussion on session capability refresh (simplified by Session encapsulation) + +--- + +## RFC Lifecycle + + + +### Review History + +| Date | Reviewer | Decision | Notes | +|------|----------|----------|-------| +| 2026-02-04 | @yrobla, @jerm-dro | Draft | Initial submission | + +### Implementation Tracking + +| Repository | PR | Status | +|------------|-----|--------| +| toolhive | - | Pending | From ebf97f55f2d2798ca410d0b54e9d5402f6bbfb82 Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Mon, 9 Feb 2026 17:02:15 +0100 Subject: [PATCH 3/8] rewrite style --- ...HV-0038-session-scoped-client-lifecycle.md | 419 ++++++------------ 1 file changed, 127 insertions(+), 292 deletions(-) diff --git a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md index c96a25f..be40d31 100644 --- a/rfcs/THV-0038-session-scoped-client-lifecycle.md +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -129,57 +129,42 @@ This RFC proposes restructuring session architecture around two key interfaces: 1. **SessionFactory**: Creates fully-formed sessions with all dependencies (capability discovery, client initialization, resource allocation) 2. 
**Session**: A domain object that owns its resources, encapsulates routing logic, and manages its own lifecycle +```mermaid +sequenceDiagram + participant Client + participant DiscoveryMW as Discovery Middleware + participant SessionMgr as SessionManager + participant Factory as SessionFactory + participant Aggregator + participant Session + participant SDK as MCP SDK + + Client->>DiscoveryMW: POST /mcp/initialize + DiscoveryMW->>SessionMgr: CreateSession(ctx, identity, backends) + SessionMgr->>Factory: MakeSession(ctx, identity, backends) + Factory->>Aggregator: AggregateCapabilities(ctx, backends) + Aggregator-->>Factory: {Capabilities, Clients} + Factory->>Session: new Session(capabilities, clients) + Session-->>Factory: session + Factory-->>SessionMgr: session + SessionMgr->>SessionMgr: Store session by ID + SessionMgr-->>DiscoveryMW: sessionID + DiscoveryMW->>SDK: Continue to SDK handler + SDK->>SessionMgr: Generate() // Returns pre-created session ID + SessionMgr-->>SDK: sessionID + SDK->>SessionMgr: OnRegisterSession(sessionID) + SessionMgr->>Session: Tools(), Resources() + Session-->>SessionMgr: capabilities + SessionMgr->>SDK: AddSessionTools(sessionID, tools) + SDK-->>Client: InitializeResult with Mcp-Session-Id header ``` -┌─────────────────────────────────────────────────────────────────┐ -│ SessionManager (implements SDK SessionIdManager) │ -│ │ -│ - Uses SessionFactory to create sessions │ -│ - Stores sessions in sync.Map │ -│ - Adapts Session interface to SDK tool/resource registration │ -│ │ -│ Generate(ctx) string: │ -│ 1. Extract identity, backends from context │ -│ 2. session := factory.MakeSession(ctx, identity, backends) │ -│ 3. Store session by ID │ -│ 4. Return session ID to SDK │ -│ │ -│ Terminate(sessionID) error: │ -│ 1. Load and delete session │ -│ 2. 
Call session.Close() │ -└─────────────────────────────────────────────────────────────────┘ - │ - │ creates - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ SessionFactory.MakeSession(ctx, identity, backends) │ -│ │ -│ 1. Discover capabilities (aggregator.AggregateCapabilities) │ -│ 2. Create MCP clients for all backends (clientFactory) │ -│ 3. Return fully-formed Session with: │ -│ - ID, identity │ -│ - Routing table, tools, resources │ -│ - Pre-initialized backend clients map │ -└─────────────────────────────────────────────────────────────────┘ - │ - │ returns - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ Session (domain object) │ -│ │ -│ - ID() string │ -│ - Tools() []Tool, Resources() []Resource │ -│ │ -│ - CallTool(ctx, name, args) (*ToolResult, error) │ -│ → Routes to backend via routing table │ -│ → Uses pre-initialized client from clients map │ -│ │ -│ - ReadResource(ctx, uri) (*ResourceResult, error) │ -│ - GetPrompt(ctx, name, args) (*PromptResult, error) │ -│ │ -│ - Close() error │ -│ → Closes all backend clients │ -└─────────────────────────────────────────────────────────────────┘ -``` + +**Key Flow:** +1. Discovery middleware intercepts initialize request, calls `SessionManager.CreateSession()` +2. SessionManager uses SessionFactory to create fully-formed session (capabilities + clients) +3. Session stored before SDK sees request +4. SDK calls `Generate()` which returns pre-created session ID +5. SDK registers session's tools/resources via `OnRegisterSession` hook ### Terminology Clarification @@ -239,12 +224,13 @@ Clients are created once during `SessionFactory.MakeSession()` and owned by the #### 1. Core Interfaces -**Session Interface** (`pkg/vmcp/session/session.go`): +**Session Interface** - A domain object representing an active MCP session: ```go // Session represents an active MCP session with all its capabilities and resources. // This is a pure domain object - no protocol concerns. 
Can be tested without spinning up a server. type Session interface { + // Identity and metadata ID() string Identity() *auth.Identity @@ -263,7 +249,13 @@ type Session interface { } ``` -**SessionFactory Interface** (`pkg/vmcp/session/factory.go`): +**Key behaviors:** +- **Owns backend clients**: Session stores pre-initialized MCP clients in an internal map +- **Encapsulates routing**: `CallTool()` looks up tool in routing table, routes to correct backend client +- **Manages lifecycle**: `Close()` iterates through all clients and closes them +- **Thread-safe**: All methods protected by mutex for concurrent access + +**SessionFactory Interface** - Creates fully-formed sessions: ```go // SessionFactory creates fully-formed sessions from configuration and runtime inputs. @@ -278,61 +270,53 @@ type SessionFactory interface { } ``` +**Responsibilities:** +- Calls aggregator to discover capabilities from all backends +- Receives both capabilities AND initialized clients from aggregator (reuse for efficiency) +- Constructs session with routing table, tools, resources, clients +- Returns fully-formed session ready for use + #### 2. SessionFactory Implementation -**defaultSessionFactory** (`pkg/vmcp/session/factory.go`): +**Implementation approach** (`pkg/vmcp/session/factory.go`): -```go -type defaultSessionFactory struct { - aggregator Aggregator // Capability discovery - clientFactory ClientFactory // Backend client creation - config *Config -} +The default factory implementation follows this pattern: -func (f *defaultSessionFactory) MakeSession( - ctx context.Context, - identity *auth.Identity, - backends []Backend, -) (Session, error) { - // 1. Discover capabilities AND create clients (single pass) - // The aggregator creates clients, queries capabilities, and returns both - result, err := f.aggregator.AggregateCapabilities(ctx, backends) - if err != nil { - return nil, fmt.Errorf("capability discovery failed: %w", err) - } +1. 
**Single-pass discovery**: Call `aggregator.AggregateCapabilities(ctx, backends)` + - Aggregator creates MCP clients for each backend + - Queries capabilities (tools, resources, prompts) from each backend + - Resolves conflicts using configured strategy (prefix, priority, manual) + - Returns both aggregated capabilities AND initialized clients - // 2. Return fully-formed session with clients from discovery - // No need to create new clients - we reuse the ones from capability discovery - return &defaultSession{ - id: uuid.New().String(), - identity: identity, - routingTable: result.RoutingTable, - tools: result.Tools, - resources: result.Resources, - prompts: result.Prompts, - clients: result.Clients, // ← Reused from discovery - }, nil -} -``` +2. **Client reuse**: Use clients from aggregator (don't create new ones) + - Avoids redundant TCP handshake + TLS negotiation + MCP initialization + - Clients used for discovery are already working and initialized + +3. **Return fully-formed session**: Construct session with all dependencies + - Session ID (UUID) + - User identity + - Routing table (maps tool names to backend workload IDs) + - Tools, resources, prompts (aggregated from all backends) + - Pre-initialized clients map (keyed by workload ID) -**Key Design Decision**: Reuse the same client for capability discovery and session operations. The `aggregator.AggregateCapabilities()` creates clients to query backend capabilities and returns both the aggregated capabilities AND the initialized clients. The factory passes these clients directly to the session, avoiding redundant connection setup. +**Key Design Decision**: The aggregator returns both capabilities and clients together. This enables the factory to pass initialized clients directly to the session without additional connection overhead. 
-**Aggregator Return Type** (`pkg/vmcp/aggregator/types.go`): +**Updated Aggregator Return Type**: ```go type AggregationResult struct { RoutingTable *RoutingTable Tools []Tool Resources []Resource Prompts []Prompt - Clients map[string]*Client // Initialized clients keyed by workload ID + Clients map[string]*Client // ← Initialized clients from discovery } ``` -**Why this matters**: Creating a client requires TCP handshake, TLS negotiation, and MCP protocol initialization. By reusing clients from discovery, we eliminate this overhead and session creation is significantly faster. +**Performance impact**: Reusing discovery clients eliminates ~100-500ms of connection overhead per backend (depending on network latency and TLS handshake time). #### 3. Session Implementation -**defaultSession** (`pkg/vmcp/session/default_session.go`): +**Internal structure** (`pkg/vmcp/session/default_session.go`): ```go type defaultSession struct { @@ -345,64 +329,28 @@ type defaultSession struct { clients map[string]*Client // backendID -> client mu sync.RWMutex } +``` -func (s *defaultSession) ID() string { return s.id } -func (s *defaultSession) Identity() *auth.Identity { return s.identity } -func (s *defaultSession) Tools() []Tool { return s.tools } -func (s *defaultSession) Resources() []Resource { return s.resources } -func (s *defaultSession) Prompts() []Prompt { return s.prompts } - -func (s *defaultSession) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) { - s.mu.RLock() - defer s.mu.RUnlock() - - // Route to backend - target, ok := s.routingTable.Tools[name] - if !ok { - return nil, fmt.Errorf("tool not found: %s", name) - } - - // Get pre-initialized client - client, ok := s.clients[target.WorkloadID] - if !ok { - return nil, fmt.Errorf("no client found for backend: %s", target.WorkloadID) - } - - // Call backend with original (un-prefixed) tool name - return client.CallTool(ctx, target.OriginalName, args) -} - -func (s *defaultSession) 
ReadResource(ctx context.Context, uri string) (*ResourceResult, error) { - s.mu.RLock() - defer s.mu.RUnlock() +**Method behaviors:** - target, ok := s.routingTable.Resources[uri] - if !ok { - return nil, fmt.Errorf("resource not found: %s", uri) - } +- **`CallTool(ctx, name, args)`**: + 1. Look up tool in routing table to find target backend workload ID + 2. Retrieve pre-initialized client from `clients` map + 3. Call `client.CallTool(ctx, originalToolName, args)` on backend + 4. Return result or error - client, ok := s.clients[target.WorkloadID] - if !ok { - return nil, fmt.Errorf("no client found for backend: %s", target.WorkloadID) - } + **Key**: Tool names may be prefixed (e.g., `github__create_pr`), but backend receives original name (`create_pr`) - return client.ReadResource(ctx, target.OriginalURI) -} +- **`ReadResource(ctx, uri)`**: Similar pattern - route via routing table, use pre-initialized client -func (s *defaultSession) Close() error { - s.mu.Lock() - defer s.mu.Unlock() +- **`Close()`**: + 1. Lock session (exclusive access) + 2. Iterate through all clients in `clients` map + 3. Call `Close()` on each client + 4. Collect errors and return combined error if any failures + 5. Set `clients` to nil - var errs []error - for id, c := range s.clients { - if err := c.Close(); err != nil { - errs = append(errs, fmt.Errorf("closing client %s: %w", id, err)) - } - } - s.clients = nil - return errors.Join(errs...) -} -``` +**Thread safety**: All methods use mutex (`sync.RWMutex`) for concurrent access. Read operations (CallTool, ReadResource) use read lock, Close uses write lock. #### 4. Wiring into the MCP SDK @@ -464,126 +412,45 @@ The SDK constraint requires one of these approaches: **Recommended**: **Option A** for Phase 1-2 implementation (pragmatic), consider **Option B** for Phase 3-4 (better encapsulation). 
-**SessionManager** (showing Option B - idealized version): +##### SessionManager Design + +**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic to SDK protocol: ```go -// SessionManager implements SDK session lifecycle and provides adapted capabilities. type SessionManager struct { factory SessionFactory sessions sync.Map // map[string]Session } +``` -// --- Public API (called by HTTP middleware before SDK) --- - -// CreateSession is called by HTTP middleware (has full context) to create a fully-formed session -func (m *SessionManager) CreateSession(ctx context.Context, identity *auth.Identity, backends []Backend) (string, error) { - session, err := m.factory.MakeSession(ctx, identity, backends) - if err != nil { - return "", err - } - - m.sessions.Store(session.ID(), session) - - // Store session ID in request-scoped storage for Generate() to retrieve - // Implementation detail: use context.WithValue or goroutine-local storage - setRequestSessionID(ctx, session.ID()) - - return session.ID(), nil -} - -// --- SDK SessionIdManager interface --- - -func (m *SessionManager) Generate() string { // ← Must match SDK signature (no ctx) - // Retrieve session ID from request-scoped storage - // (set by CreateSession in HTTP middleware) - sessionID := getRequestSessionID() - if sessionID == "" { - logger.Errorf("Generate() called but no session ID in request scope") - return "" - } - - // Session already exists in m.sessions (created by CreateSession) - return sessionID -} - -func (m *SessionManager) Terminate(sessionID string) (isNotAllowed bool, err error) { - val, ok := m.sessions.LoadAndDelete(sessionID) - if !ok { - return false, nil - } +**Key responsibilities:** - session := val.(Session) - if err := session.Close(); err != nil { - logger.Warnf("error closing session %s: %v", sessionID, err) - } - return false, nil -} +1. 
**Session creation**: `CreateSession(ctx, identity, backends)` - Used by discovery middleware + - Calls `factory.MakeSession()` to build fully-formed session + - Stores session in map + - Returns session ID (or error if creation fails) -// --- Adapt Session to SDK tool/resource registration --- +2. **SDK lifecycle** (implements `SessionIdManager`): + - `Generate() string` - Returns session ID (pre-created or newly created depending on option) + - `Validate(sessionID) (bool, error)` - Checks if session exists + - `Terminate(sessionID) (bool, error)` - Loads session, calls `Close()`, removes from map -func (m *SessionManager) GetAdaptedTools(sessionID string) []mcp.Tool { - val, ok := m.sessions.Load(sessionID) - if !ok { - return nil - } - session := val.(Session) - - var sdkTools []mcp.Tool - for _, tool := range session.Tools() { - sdkTools = append(sdkTools, mcp.NewTool( - tool.Name, - m.createToolHandler(sessionID, tool.Name), - mcp.WithDescription(tool.Description), - mcp.WithToolInputSchema(tool.InputSchema), - )) - } - return sdkTools -} +3. 
**SDK adaptation**: + - `GetAdaptedTools(sessionID)` - Converts session's tools to SDK format + - Creates tool handlers that delegate to `session.CallTool()` + - Wraps results in SDK types (`mcp.ToolResult`) -func (m *SessionManager) createToolHandler(sessionID, toolName string) mcp.ToolHandler { - return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) { - val, ok := m.sessions.Load(sessionID) - if !ok { - return nil, fmt.Errorf("session not found: %s", sessionID) - } - session := val.(Session) - - // Validate arguments type - must be map[string]any per MCP spec - args, ok := req.Params.Arguments.(map[string]any) - if !ok { - return mcp.NewToolResultError(fmt.Sprintf( - "invalid arguments type: expected map[string]any, got %T", - req.Params.Arguments, - )), nil - } - - result, err := session.CallTool(ctx, toolName, args) - if err != nil { - return mcp.NewToolResultError(err.Error()), nil - } - - return toSDKResult(result), nil - } -} -``` +**Handler pattern**: Tool handlers are closures that: +- Load session from map by ID +- Validate request arguments (type assertions) +- Call `session.CallTool()` with validated args +- Convert domain result to SDK format +- Return MCP-formatted response -**SDK Registration** (`pkg/vmcp/server/server.go`): - -```go -// 1. Session lifecycle management -streamableServer := server.NewStreamableHTTPServer( - mcpServer, - server.WithSessionIdManager(sessionManager), -) - -// 2. In OnRegisterSession hook, register adapted tools/resources -hooks.AddOnRegisterSession(func(ctx context.Context, sdkSession server.ClientSession) { - sessionID := sdkSession.SessionID() - - mcpServer.AddSessionTools(sessionID, sessionManager.GetAdaptedTools(sessionID)...) - mcpServer.AddSessionResources(sessionID, sessionManager.GetAdaptedResources(sessionID)...) -}) -``` +**SDK Registration Flow**: +1. Create SessionManager with SessionFactory +2. Pass to SDK as `WithSessionIdManager(sessionManager)` +3. 
In `OnRegisterSession` hook: register tools via `sessionManager.GetAdaptedTools(sessionID)` #### 5. Migration from Current Architecture @@ -924,57 +791,25 @@ type defaultSession struct { **New Files**: - `pkg/vmcp/server/session_manager.go` - Implement `SessionManager` bridging domain (Session) to protocol (SDK) -**Implementation Details**: +**Core structure**: ```go type SessionManager struct { factory SessionFactory sessions sync.Map // map[string]Session } - -// SDK SessionIdManager interface -func (m *SessionManager) Generate(ctx context.Context) string { - identity, _ := auth.IdentityFromContext(ctx) - backends := getBackendsFromContext(ctx) - - session, err := m.factory.MakeSession(ctx, identity, backends) - if err != nil { - logger.Errorf("session creation failed: %v", err) - return "" // SDK will handle empty session ID gracefully - } - - m.sessions.Store(session.ID(), session) - return session.ID() -} - -func (m *SessionManager) Terminate(sessionID string) (isNotAllowed bool, err error) { - val, ok := m.sessions.LoadAndDelete(sessionID) - if !ok { - return false, nil - } - - session := val.(Session) - if err := session.Close(); err != nil { - logger.Warnf("error closing session %s: %v", sessionID, err) - } - return false, nil -} - -// Adapt Session to SDK tool registration -func (m *SessionManager) GetAdaptedTools(sessionID string) []mcp.Tool { - val, ok := m.sessions.Load(sessionID) - if !ok { - logger.Warnf("GetAdaptedTools called for non-existent session: %s", sessionID) - return nil - } - session := val.(Session) - return adaptToolsForSDK(session.Tools(), m.createToolHandler(sessionID)) -} ``` -**Integration**: Update `pkg/vmcp/server/server.go`: -- Create `SessionManager` with `SessionFactory` -- Pass `SessionManager` to SDK's `NewStreamableHTTPServer()` as `WithSessionIdManager()` -- In `OnRegisterSession` hook, call `sessionManager.GetAdaptedTools()` to register tools +**Key methods** (see "Wiring into the MCP SDK" section in Detailed Design for 
full behavior): +- `Generate() string` - Creates session via factory or returns pre-created ID +- `Terminate(sessionID) (bool, error)` - Closes session, removes from map +- `GetAdaptedTools(sessionID) []mcp.Tool` - Converts session tools to SDK format +- `CreateSession(ctx, identity, backends) (string, error)` - Used by discovery middleware + +**Integration steps**: +1. Create `SessionManager` with `SessionFactory` dependency +2. Pass to SDK as `WithSessionIdManager(sessionManager)` +3. In `OnRegisterSession` hook: register tools via `GetAdaptedTools()` +4. For SDK constraint workarounds, see "SDK Interface Constraint" in Detailed Design **Testing**: - Integration tests: Create session via `SessionManager.Generate()`, verify session stored From d93c8c2dc26b5202c82845c9b67095a4031f7402 Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Mon, 9 Feb 2026 17:15:50 +0100 Subject: [PATCH 4/8] rewrite style --- ...HV-0038-session-scoped-client-lifecycle.md | 359 +++++++----------- 1 file changed, 139 insertions(+), 220 deletions(-) diff --git a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md index be40d31..a7b39f9 100644 --- a/rfcs/THV-0038-session-scoped-client-lifecycle.md +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -48,20 +48,9 @@ This creates cognitive load: to understand "how does a session get created?", yo **2. VMCPSession is a Passive Data Container, Not a Domain Object** -The existing `VMCPSession` is a struct with getters and setters, not an object with encapsulated behavior: +The existing `VMCPSession` is a passive data container with getters and setters (e.g., `SetRoutingTable()`, `SetTools()`), not an object with encapsulated behavior. -```go -type VMCPSession struct { - routingTable *vmcp.RoutingTable - tools []vmcp.Tool - mu sync.RWMutex -} - -func (s *VMCPSession) SetRoutingTable(rt *vmcp.RoutingTable) { ... } -func (s *VMCPSession) RoutingTable() *vmcp.RoutingTable { ... 
} -``` - -Problems with this approach: +**Problems with this approach:** - **Data written in one place, read in another**: Routing table set in `OnRegisterSession`, but read by router via context, not from the session object - **No single source of truth**: Session state scattered across context values, `VMCPSession` struct, and transport layer's `StreamableSession` @@ -72,11 +61,7 @@ Problems with this approach: The current design routes requests via **context**, not session object lookup: - Middleware stuffs capabilities into context before SDK sees request ([discovery/middleware.go:109-110](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go#L109-L110)) -- Router is stateless - extracts routing table from context on every request ([router/default_router.go:54](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/router/default_router.go#L54)): - ```go - capabilities, ok := discovery.DiscoveredCapabilitiesFromContext(ctx) - target := capabilities.RoutingTable.Tools[toolName] - ``` +- Router is stateless - extracts routing table from context on every request ([router/default_router.go:54](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/router/default_router.go#L54)) by reading capabilities from context - Handler factory uses router to find backend target, then calls shared backend client ([adapter/handler_factory.go:103-125](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/adapter/handler_factory.go#L103-L125)) **Flow**: `request → middleware (stuff ctx) → handler → router (read ctx) → backend client` @@ -122,13 +107,54 @@ These architectural problems have cascading effects: ## Proposed Solution -### High-Level Design +### Current State: Tangled Flow -This RFC proposes restructuring session architecture around two key interfaces: +**Today's session creation** is scattered across middleware, adapter, and hooks: + +```mermaid +sequenceDiagram + participant Client + participant DiscoveryMW as Discovery Middleware + 
participant SDK as MCP SDK + participant Adapter as sessionIDAdapter + participant Hook as OnRegisterSession + participant VMCPSess as VMCPSession + + Client->>DiscoveryMW: POST /mcp/initialize + Note over DiscoveryMW: Capabilities discovered
BEFORE session exists + DiscoveryMW->>DiscoveryMW: Discover capabilities + DiscoveryMW->>DiscoveryMW: Stuff into request context + DiscoveryMW->>SDK: Forward request + SDK->>Adapter: Generate() // No context access! + Note over Adapter: Creates EMPTY session
(just UUID) + Adapter->>VMCPSess: Create empty session + Adapter-->>SDK: sessionID + SDK->>Hook: OnRegisterSession(ctx, sessionID) + Note over Hook: Fish capabilities
from context + Hook->>Hook: Get capabilities from ctx + Hook->>VMCPSess: SetRoutingTable(capabilities) + Hook->>SDK: AddSessionTools() + SDK-->>Client: InitializeResult +``` + +**Problems:** No single component owns session creation. To understand how a session is created, you must trace through: +1. Discovery middleware (triggers discovery, stores in context) +2. SessionIDAdapter (creates empty session) +3. OnRegisterSession hook (populates session from context) + +This makes testing difficult (can't create sessions without full server) and adding features complex (optimizer-in-vmcp required extensive server.go changes). + +--- + +### Proposed Design + +This RFC proposes restructuring around two key interfaces: 1. **SessionFactory**: Creates fully-formed sessions with all dependencies (capability discovery, client initialization, resource allocation) 2. **Session**: A domain object that owns its resources, encapsulates routing logic, and manages its own lifecycle +**Proposed session creation flow:** + ```mermaid sequenceDiagram participant Client @@ -180,33 +206,9 @@ This RFC uses the following terms consistently: **1. Session as a Domain Object** -Today's `VMCPSession` is a passive data container: -```go -// Current: passive data container -type VMCPSession struct { - routingTable *vmcp.RoutingTable - tools []vmcp.Tool -} -func (s *VMCPSession) SetRoutingTable(rt *vmcp.RoutingTable) { ... } -func (s *VMCPSession) RoutingTable() *vmcp.RoutingTable { ... 
} -``` - -Proposed `Session` interface is an active domain object: -```go -// Proposed: active domain object -type Session interface { - ID() string - Tools() []Tool - Resources() []Resource - - // Routing logic encapsulated here - CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) - ReadResource(ctx context.Context, uri string) (*ResourceResult, error) - GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error) +Today's `VMCPSession` is a passive data container with getters and setters (e.g., `SetRoutingTable()`, `RoutingTable()`), not an object with encapsulated behavior. - Close() error -} -``` +Proposed `Session` interface is an active domain object that owns backend clients and encapsulates routing logic. Instead of exposing internal state through setters, it provides behavior methods like `CallTool()` that look up the target in the routing table and invoke the appropriate backend client. **2. Decoupling Session Creation from SDK Wiring** @@ -255,6 +257,8 @@ type Session interface { - **Manages lifecycle**: `Close()` iterates through all clients and closes them - **Thread-safe**: All methods protected by mutex for concurrent access +**Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). TTL management and storage are handled by the existing session storage layer (`pkg/transport/session`) - sessions are stored in a storage backend that automatically extends TTL on access via `Get()` and expires sessions based on configured TTL. + **SessionFactory Interface** - Creates fully-formed sessions: ```go @@ -301,56 +305,34 @@ The default factory implementation follows this pattern: **Key Design Decision**: The aggregator returns both capabilities and clients together. This enables the factory to pass initialized clients directly to the session without additional connection overhead. 
-**Updated Aggregator Return Type**: -```go -type AggregationResult struct { - RoutingTable *RoutingTable - Tools []Tool - Resources []Resource - Prompts []Prompt - Clients map[string]*Client // ← Initialized clients from discovery -} -``` +**Updated Aggregator Interface**: + +The aggregator must be updated to return clients alongside capabilities. Currently, `AggregateCapabilities()` returns only discovered capabilities (routing table, tools, resources, prompts). The updated version returns an `AggregationResult` struct that includes both capabilities **and** the initialized clients map (keyed by workload ID). -**Performance impact**: Reusing discovery clients eliminates ~100-500ms of connection overhead per backend (depending on network latency and TLS handshake time). +**Rationale**: The aggregator already creates clients internally to query backend capabilities. This change just makes it return those clients instead of discarding them. No additional connection overhead. + +**Performance impact**: Reusing discovery clients eliminates redundant connection setup (TCP handshake + TLS negotiation + MCP initialization), saving ~100-500ms per backend depending on network conditions. #### 3. Session Implementation **Internal structure** (`pkg/vmcp/session/default_session.go`): -```go -type defaultSession struct { - id string - identity *auth.Identity - routingTable *RoutingTable - tools []Tool - resources []Resource - prompts []Prompt - clients map[string]*Client // backendID -> client - mu sync.RWMutex -} -``` +The default session implementation stores: +- Session ID and user identity +- Routing table mapping tool/resource names to backend workload IDs +- Discovered capabilities (tools, resources, prompts) +- Pre-initialized backend clients map (keyed by workload ID) +- Mutex for thread-safe access **Method behaviors:** -- **`CallTool(ctx, name, args)`**: - 1. Look up tool in routing table to find target backend workload ID - 2. 
Retrieve pre-initialized client from `clients` map - 3. Call `client.CallTool(ctx, originalToolName, args)` on backend - 4. Return result or error - - **Key**: Tool names may be prefixed (e.g., `github__create_pr`), but backend receives original name (`create_pr`) +- **`CallTool(ctx, name, args)`**: Looks up tool in routing table to find target backend, retrieves pre-initialized client from map, calls backend with original (un-prefixed) tool name. Tool names may be prefixed for conflict resolution (e.g., `github__create_pr`), but backend receives original name (`create_pr`). -- **`ReadResource(ctx, uri)`**: Similar pattern - route via routing table, use pre-initialized client +- **`ReadResource(ctx, uri)`**: Similar routing pattern - consults routing table, uses pre-initialized client. -- **`Close()`**: - 1. Lock session (exclusive access) - 2. Iterate through all clients in `clients` map - 3. Call `Close()` on each client - 4. Collect errors and return combined error if any failures - 5. Set `clients` to nil +- **`Close()`**: Acquires write lock, iterates through all clients, calls `Close()` on each, collects errors, returns combined error. -**Thread safety**: All methods use mutex (`sync.RWMutex`) for concurrent access. Read operations (CallTool, ReadResource) use read lock, Close uses write lock. +**Thread safety**: Read operations (CallTool, ReadResource) use read locks; Close uses write lock. #### 4. Wiring into the MCP SDK @@ -374,54 +356,36 @@ type SessionIdManager interface { **Current workaround**: The existing `sessionIDAdapter.Generate()` creates an **empty session** (just a UUID), then `OnRegisterSession` hook populates it later. This is the "tangled flow" we're trying to fix. -**Proposed Solution**: Discovery middleware creates the session, SDK retrieves the ID: - -**Flow**: -1. **Discovery middleware** (has context) calls `SessionManager.CreateSession(ctx, identity, backends)` -2. 
SessionManager uses SessionFactory to create fully-formed session and stores it -3. SessionManager stores session ID in request context: `ctx = WithSessionID(ctx, sessionID)` -4. Request proceeds to SDK -5. **SDK calls `Generate()`** (no context) on SessionManager -6. SessionManager retrieves session ID from thread-local storage or goroutine-local context (set by middleware) -7. Returns the session ID - session is already fully initialized and stored +**The Challenge**: Ideally, discovery middleware (which has request context) would create the fully-formed session, and `Generate()` would just return the ID. However, Go lacks goroutine-local storage, so `Generate()` can't retrieve a pre-created session ID without unsafe patterns. **Implementation Options**: -The SDK constraint requires one of these approaches: +The SDK constraint requires choosing between these approaches: -**Option A: Hybrid approach (minimal changes to current flow)** +**Option A: Hybrid approach (minimal changes to current flow)** ⭐ **Recommended for Phase 1** - Discovery middleware discovers capabilities and stores in context (current behavior) - `Generate()` creates session ID, stores empty session - `OnRegisterSession` hook retrieves capabilities from context and calls `SessionFactory.MakeSession()` - Replace empty session with fully-formed session in SessionManager -- **Pros**: Works with SDK as-is, minimal risk +- **Pros**: Works with SDK as-is, minimal risk, no unsafe patterns - **Cons**: Still uses context-passing and hook pattern (not fully encapsulated) -**Option B: Custom HTTP middleware wrapper** -- Add HTTP middleware before SDK handler that has access to request context -- Middleware calls `SessionManager.CreateSession(ctx, identity, backends)` to create fully-formed session -- Stores session ID in request-scoped storage (e.g., goroutine-local or HTTP request metadata) -- SDK calls `Generate()` which retrieves pre-created session ID from request-scoped storage -- **Pros**: Session 
fully created before SDK sees request, better encapsulation -- **Cons**: Requires careful goroutine/request scoping, potential race conditions +**Option B: Upstream SDK change** +- Contribute patch to mark3labs/mcp-go to add context parameter: `Generate(ctx context.Context) string` +- Once merged, migrate to clean flow where middleware creates session via `CreateSession(ctx, ...)` +- **Pros**: Clean interface, proper context propagation, no workarounds +- **Cons**: Requires upstream coordination, blocks implementation until merged -**Option C: Fork/patch SDK** (not recommended) -- Modify mark3labs/mcp-go SDK to pass context to `Generate(ctx context.Context)` -- **Pros**: Clean interface, no workarounds -- **Cons**: Upstream dependency, maintenance burden, delays implementation +**Option C: Fork SDK** (not recommended) +- Maintain internal fork with `Generate(ctx context.Context)` signature +- **Pros**: Clean interface immediately available +- **Cons**: Maintenance burden, divergence from upstream, delayed security/bug fixes -**Recommended**: **Option A** for Phase 1-2 implementation (pragmatic), consider **Option B** for Phase 3-4 (better encapsulation). +**Recommendation**: **Option A** for initial implementation (Phase 1-2), then propose **Option B** as upstream improvement for Phase 3-4. ##### SessionManager Design -**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic to SDK protocol: - -```go -type SessionManager struct { - factory SessionFactory - sessions sync.Map // map[string]Session -} -``` +**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic to SDK protocol. It stores a `SessionFactory` reference and a concurrent map of sessions (keyed by session ID). **Key responsibilities:** @@ -463,15 +427,26 @@ Details in Implementation Plan section below. #### 6. 
Error Handling -**Session Creation Failures**: +**Session Creation Failures (Option A - Hybrid Approach)**: + +With the recommended Option A implementation: -If `SessionFactory.MakeSession()` fails completely (e.g., all backends unreachable): -- `SessionManager.Generate()` logs error and returns empty string `""` -- SDK interprets empty session ID as failure (will not send `Mcp-Session-Id` header in response) -- Client receives MCP initialization response without session ID, knows initialization failed -- Client must retry initialization or report error to user +1. **`Generate()` phase**: Creates empty session (UUID only), always succeeds + - Stores empty session in map + - Returns session ID to SDK + - SDK sends `Mcp-Session-Id` header in `InitializeResult` -**Rationale**: The SDK's `SessionIdManager.Generate()` interface returns `string`, not `(string, error)`. Returning an empty string is the signal for failure. This matches the existing `sessionIDAdapter` behavior (see `pkg/vmcp/server/session_adapter.go:64`). +2. **`OnRegisterSession` hook phase**: Calls `SessionFactory.MakeSession()` - failure possible here + - If `MakeSession()` fails completely (e.g., all backends unreachable): + - Log error with session ID and failure details + - Store error state in session metadata (e.g., `session.SetMetadata("init_error", err.Error())`) + - Keep session in map but mark as failed + - Subsequent tool calls to this session: + - Check session metadata for `init_error` + - Return clear error: "Session initialization failed: [details]" + - Client should delete session and re-initialize + +**Rationale**: Option A uses a two-phase creation (empty session + populate via hook). The SDK's `Generate()` must return a session ID, so failures during `MakeSession()` are handled by storing error state in the session object itself. 
**Partial Backend Initialization**: @@ -496,7 +471,7 @@ If a tool call fails after successful session creation: Race condition exists: session may be terminated while a request is in flight: - Session `Close()` closes all backend clients - In-flight requests receive "client closed" error from MCP library -- Mitigation: Session `Touch()` extends TTL on every request, reducing race window +- Mitigation: Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing race window - Future enhancement: Add reference counting to delay `Close()` until in-flight requests complete **Session Not Found**: @@ -547,14 +522,23 @@ For initial implementation, we assume: **Client Usage During Cleanup**: - Race condition exists: request may use client while session is being closed -- Mitigation: Session `Touch()` extends TTL on every request, reducing race window +- Mitigation: Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing race window - MCP client library handles `Close()` on active connections gracefully (returns errors) - Handlers should catch and handle "client closed" errors appropriately - Future Enhancement: Add reference counting to delay `Close()` until all in-flight requests complete ### Secrets Management -**Storage and Retrieval**: Outgoing auth secrets are retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism and storage are identical. No secrets are stored in session objects—only references to auth configurations. +**Storage and Retrieval**: Outgoing auth secrets are retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism and storage are identical. 
+ +**Credential Lifetime**: While session objects don't directly store secrets, they hold initialized backend clients that **do** retain credentials in memory (bearer tokens, API keys, mTLS key material, Authorization headers). This is the same as current behavior—clients have always held credentials during their lifetime. Key safeguards: + +- **Scoped lifetime**: Credentials held only for session duration (typically 30 minutes with TTL) +- **No persistence**: Credentials never written to disk, only in process memory +- **Client cleanup**: `Session.Close()` closes all clients, allowing credential cleanup by MCP client library +- **Existing protections**: MCP client library handles credential redaction in logs and error messages (inherited behavior) + +**No change in security posture**: Moving from per-request clients (short-lived credentials in memory) to session-scoped clients (longer-lived credentials in memory) increases credential lifetime but does not change exposure—credentials were always in memory during client existence. Session TTL limits this window. ### Audit Logging @@ -674,53 +658,23 @@ The interface-based design enables future enhancements: The `Session` interface enables clean feature composition through decoration. 
This addresses complexity issues seen in previous feature additions: **Example: Optimizer-in-vMCP** (simplified approach vs [PR #3517](https://github.com/stacklok/toolhive/pull/3517)): -```go -// sessionWithOptimizer implements Session interface -// It only exposes find_tool and call_tool, with semantic search over underlying tools -type sessionWithOptimizer struct { - inner Session - vectorDB *VectorDB // In-memory semantic search index -} -func (s *sessionWithOptimizer) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) { - // If tool is "find_tool", perform semantic search over s.inner.Tools() - if name == "find_tool" { - return s.semanticSearch(args["query"].(string)) - } +The optimizer can be implemented as a `Session` decorator that wraps an underlying session and adds semantic search capabilities. The decorator: +- Stores a reference to the inner `Session` and maintains an in-memory vector database +- Implements the `Session` interface +- Intercepts `CallTool()` - if tool name is "find_tool", performs semantic search over `inner.Tools()` and returns matching tools; otherwise delegates to `inner.CallTool()` +- Delegates all other methods (`Tools()`, `Resources()`, `Close()`) directly to the inner session - // Otherwise delegate to underlying session - return s.inner.CallTool(ctx, name, args) -} +**Benefits vs [PR #3517](https://github.com/stacklok/toolhive/pull/3517) approach**: +1. **No SDK integration concerns**: Optimizer doesn't touch MCP protocol or server.go hooks +2. **Testable**: Unit test optimizer with mock inner session, no vMCP server required +3. **Minimal cognitive load**: Standard decorator pattern, clear separation of concerns -// Assert that sessionWithOptimizer implements Session interface -var _ Session = (*sessionWithOptimizer)(nil) -``` +**Example: Session Capability Refresh** (simplified vs [PR #3642](https://github.com/stacklok/toolhive/pull/3642)): -**Benefits**: -1. 
**No SDK integration concerns**: Optimizer doesn't touch MCP protocol, just implements `Session` interface -2. **Testable**: Write unit tests for optimizer without spinning up vMCP server -3. **Minimal cognitive load**: Familiar decorator pattern, clear separation of concerns +With Session as a domain object, refreshing capabilities becomes straightforward: add a `RefreshCapabilities(ctx)` method that calls the aggregator to re-discover capabilities from backends, then updates the session's internal routing table and tools list under a write lock. The session state update is encapsulated—no middleware coordination needed. -**Example: Session Capability Refresh** (simplified vs [PR #3642](https://github.com/stacklok/toolhive/pull/3642)): -```go -func (s *defaultSession) RefreshCapabilities(ctx context.Context) error { - // Re-discover capabilities from backends - capabilities, err := s.aggregator.AggregateCapabilities(ctx, s.backends) - if err != nil { - return err - } - - // Update session state (encapsulated, no middleware coordination needed) - s.mu.Lock() - defer s.mu.Unlock() - s.routingTable = capabilities.RoutingTable - s.tools = capabilities.Tools - - // Note: SDK tool registration would need separate refresh mechanism - // but session state update is simple and encapsulated - return nil -} -``` +**Note**: SDK tool registration would need a separate refresh mechanism, but the session domain logic is simple and self-contained. 
**Other Decoration Examples**: - **Caching layer**: Wrap session to cache tool results for repeated calls @@ -728,10 +682,15 @@ func (s *defaultSession) RefreshCapabilities(ctx context.Context) error { - **Audit logging**: Wrap session to log all tool calls with detailed context - **Circuit breaker**: Wrap session to implement circuit breaker pattern per backend -**Testing**: -- Unit test session logic without spinning up full vMCP server -- Mock `Session` interface for testing code that depends on sessions -- Integration test `SessionFactory` with fake backends +**Testing Benefits** (addresses [toolhive#2852](https://github.com/stacklok/toolhive/issues/2852)): + +With the Session interface, you can unit test session routing logic without spinning up a server: + +**Test approach**: Create a session instance directly with a mock client, configure the routing table with test data, call `session.CallTool()` with a prefixed tool name, and verify the session routes to the correct backend client with the original (un-prefixed) tool name. + +**Before this RFC**: Testing session behavior required spinning up the full vMCP server with HTTP handlers, SDK integration, discovery middleware, and context threading. + +**After this RFC**: Session is a testable domain object. Instantiate it directly with mocked dependencies (clients, routing table), test the routing and client delegation logic in isolation. ## Implementation Plan @@ -747,31 +706,8 @@ This implementation introduces new interfaces and gradually migrates from the cu - `pkg/vmcp/session/default_session.go` - Implement `defaultSession` with client ownership and routing logic **Implementation Details**: -```go -// Session interface with behavior methods (not just getters/setters) -type Session interface { - ID() string - Tools() []Tool - CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) - // ... 
other methods - Close() error -} - -// SessionFactory creates fully-formed sessions -type SessionFactory interface { - MakeSession(ctx context.Context, identity *auth.Identity, backends []Backend) (Session, error) -} -// defaultSession owns clients and encapsulates routing -type defaultSession struct { - id string - routingTable *RoutingTable - clients map[string]*Client - // ... -} -``` - -**Key Design**: The `defaultSession` owns backend clients. When `SessionFactory.MakeSession()` runs: +The `defaultSession` implementation owns backend clients in an internal map. When `SessionFactory.MakeSession()` runs: 1. Aggregate capabilities from all backends 2. Create and initialize one MCP client per backend 3. Return session with pre-initialized clients map @@ -791,19 +727,13 @@ type defaultSession struct { **New Files**: - `pkg/vmcp/server/session_manager.go` - Implement `SessionManager` bridging domain (Session) to protocol (SDK) -**Core structure**: -```go -type SessionManager struct { - factory SessionFactory - sessions sync.Map // map[string]Session -} -``` +**Core structure**: SessionManager stores a `SessionFactory` reference and a concurrent map of sessions (keyed by session ID). 
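A minimal sketch of that core structure, assuming `sync.Map` for the concurrent session map and a pared-down session interface; both are illustrative choices for this example, not the actual toolhive code.

```go
package main

import (
	"fmt"
	"sync"
)

// closableSession is the slice of the Session interface this sketch needs;
// the real interface also carries tools, resources, and routing behavior.
type closableSession interface {
	ID() string
	Close() error
}

type stubSession struct {
	id     string
	closed bool
}

func (s *stubSession) ID() string   { return s.id }
func (s *stubSession) Close() error { s.closed = true; return nil }

// sessionManager sketches the "concurrent map keyed by session ID" shape.
type sessionManager struct {
	sessions sync.Map // map[string]closableSession
}

func (m *sessionManager) add(s closableSession) { m.sessions.Store(s.ID(), s) }

// Terminate loads the session, closes it, and removes it from the map,
// matching the behavior listed for SessionIdManager.Terminate.
func (m *sessionManager) Terminate(sessionID string) (bool, error) {
	v, ok := m.sessions.LoadAndDelete(sessionID)
	if !ok {
		return false, nil // unknown session: nothing to terminate
	}
	return true, v.(closableSession).Close()
}

func main() {
	m := &sessionManager{}
	s := &stubSession{id: "abc"}
	m.add(s)
	found, err := m.Terminate("abc")
	fmt.Println(found, err, s.closed)
}
```

Using `LoadAndDelete` makes removal atomic with lookup, so two concurrent `Terminate` calls for the same ID cannot both close the session.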
**Key methods** (see "Wiring into the MCP SDK" section in Detailed Design for full behavior): -- `Generate() string` - Creates session via factory or returns pre-created ID -- `Terminate(sessionID) (bool, error)` - Closes session, removes from map -- `GetAdaptedTools(sessionID) []mcp.Tool` - Converts session tools to SDK format -- `CreateSession(ctx, identity, backends) (string, error)` - Used by discovery middleware +- `Generate() string` - Creates session via factory or returns pre-created ID (depending on SDK constraint workaround chosen) +- `Terminate(sessionID) (bool, error)` - Loads session, calls `Close()`, removes from map +- `GetAdaptedTools(sessionID) []mcp.Tool` - Retrieves session, converts its tools to SDK format with handlers +- `CreateSession(ctx, identity, backends) (string, error)` - Creates fully-formed session via factory (used by discovery middleware) **Integration steps**: 1. Create `SessionManager` with `SessionFactory` dependency @@ -924,18 +854,7 @@ Each phase is independently testable and can be rolled back: - Clarify that sessions are domain objects, not just data containers **Code Documentation**: -- Add package-level comments to `pkg/vmcp/session/`: - ```go - // Package session provides the core Session domain object and SessionFactory - // for vMCP. Sessions own backend MCP clients, encapsulate routing logic, - // and manage their own lifecycle. - // - // Key interfaces: - // - Session: Domain object representing an active MCP session - // - SessionFactory: Creates fully-formed sessions with all dependencies - // - // The SessionManager bridges between Session (domain) and the MCP SDK (protocol). 
- ``` +- Add package-level comments to `pkg/vmcp/session/` explaining that Session is a domain object (owns clients, encapsulates routing, manages lifecycle) and SessionFactory creates fully-formed sessions - Document `Session` interface methods with usage examples - Document `SessionFactory.MakeSession()` with capability discovery and client initialization details From 986be265a000444e85aaa1dd734486a35be238c2 Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Tue, 10 Feb 2026 10:17:26 +0100 Subject: [PATCH 5/8] changes from review --- ...HV-0038-session-scoped-client-lifecycle.md | 351 +++++++++++------- 1 file changed, 209 insertions(+), 142 deletions(-) diff --git a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md index a7b39f9..428ec49 100644 --- a/rfcs/THV-0038-session-scoped-client-lifecycle.md +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -1,4 +1,4 @@ -# THV-0038: Session-Scoped MCP Client Lifecycle Management +# THV-0038: Session-Scoped Architecture for vMCP - **Status**: Draft - **Author(s)**: @yrobla, @jerm-dro @@ -9,7 +9,13 @@ ## Summary -Refactor vMCP session architecture to encapsulate behavior in well-defined interfaces (`Session` and `SessionFactory`), decoupling session creation (domain logic) from SDK integration (protocol concerns). This restructuring naturally enables session-scoped MCP client lifecycle management, where clients are created once during initialization, reused throughout the session, and closed during cleanup. The new architecture simplifies testing, reduces connection overhead, enables stateful backend workflows, and makes future features (optimizer-in-vmcp, capability refresh) straightforward to implement through interface decoration. +Refactor vMCP session architecture to encapsulate behavior in well-defined interfaces (`Session` and `SessionFactory`), decoupling session creation (domain logic) from SDK integration (protocol concerns). 
+
+Restructuring session management enables:
+
+- **Client reuse throughout the session**: Clients are created once during initialization, reused for all requests, and closed during cleanup, enabling stateful workflows (e.g., Playwright browser sessions, database transactions)
+- **Testing without the full server**: Exercise session capabilities and routing logic in isolation, without spinning up the complete vMCP server
+- **Simplified feature extensions**: Add capabilities like optimizer-in-vMCP and capability refresh through interface decoration, since protocol concerns are decoupled from how capabilities are calculated
 
 ## Problem Statement
 
@@ -40,9 +46,9 @@ The per-request client pattern is a symptom of a deeper architectural problem: *
 
 Session construction is spread across middleware, adapters, and hooks—tightly coupled to the MCP SDK's lifecycle callbacks:
 
-- **Discovery middleware** triggers capability discovery before the session exists, then stuffs results into request context ([discovery/middleware.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go#L109-L110))
-- **Session ID adapter** creates an empty session when the SDK calls `Generate()`, with no access to discovered capabilities ([session_adapter.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/session_adapter.go#L50-L71))
-- **OnRegisterSession hook** fishes capabilities back out of context and populates the session ([server.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/server.go#L777-L850))
+- **Discovery middleware** triggers capability discovery before the session exists, then stuffs results into request context ([discovery/middleware.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go))
+- **Session ID adapter** creates an empty session when the SDK calls `Generate()`, with no access to discovered capabilities 
([session_adapter.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/session_adapter.go)) +- **OnRegisterSession hook** fishes capabilities back out of context and populates the session ([server.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/server.go)) This creates cognitive load: to understand "how does a session get created?", you must trace through middleware, adapter callbacks, SDK hooks, and context threading across multiple files. @@ -60,9 +66,9 @@ The existing `VMCPSession` is a passive data container with getters and setters The current design routes requests via **context**, not session object lookup: -- Middleware stuffs capabilities into context before SDK sees request ([discovery/middleware.go:109-110](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go#L109-L110)) -- Router is stateless - extracts routing table from context on every request ([router/default_router.go:54](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/router/default_router.go#L54)) by reading capabilities from context -- Handler factory uses router to find backend target, then calls shared backend client ([adapter/handler_factory.go:103-125](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/adapter/handler_factory.go#L103-L125)) +- Middleware stuffs capabilities into context before SDK sees request ([discovery/middleware.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/discovery/middleware.go)) +- Router is stateless - extracts routing table from context on every request ([router/default_router.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/router/default_router.go)) by reading capabilities from context +- Handler factory uses router to find backend target, then calls shared backend client ([adapter/handler_factory.go](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/adapter/handler_factory.go)) **Flow**: `request → middleware (stuff ctx) → handler → router (read 
ctx) → backend client` @@ -102,7 +108,7 @@ These architectural problems have cascading effects: - **Rewrite All Session Types**: Initial focus is on `VMCPSession`; other session types (ProxySession, SSESession) remain unchanged except for interface compliance - **Connection Pooling Within Clients**: Individual MCP clients may internally pool connections, but that's outside this RFC's scope - **Multi-Session Client Sharing**: Clients remain session-scoped and are not shared across sessions -- **Lazy Backend Discovery**: Backend discovery remains eager (current behavior) +- **Lazy Capability Discovery**: Capability discovery remains eager and static (i.e., done once at session creation, current behavior) - **Client Versioning**: Handling MCP protocol version negotiation is out of scope ## Proposed Solution @@ -158,15 +164,13 @@ This RFC proposes restructuring around two key interfaces: ```mermaid sequenceDiagram participant Client - participant DiscoveryMW as Discovery Middleware participant SessionMgr as SessionManager participant Factory as SessionFactory participant Aggregator participant Session participant SDK as MCP SDK - Client->>DiscoveryMW: POST /mcp/initialize - DiscoveryMW->>SessionMgr: CreateSession(ctx, identity, backends) + Client->>SessionMgr: POST /mcp/initialize SessionMgr->>Factory: MakeSession(ctx, identity, backends) Factory->>Aggregator: AggregateCapabilities(ctx, backends) Aggregator-->>Factory: {Capabilities, Clients} @@ -174,9 +178,8 @@ sequenceDiagram Session-->>Factory: session Factory-->>SessionMgr: session SessionMgr->>SessionMgr: Store session by ID - SessionMgr-->>DiscoveryMW: sessionID - DiscoveryMW->>SDK: Continue to SDK handler - SDK->>SessionMgr: Generate() // Returns pre-created session ID + SessionMgr->>SDK: Wire session to SDK + SDK->>SessionMgr: Generate() // Returns session ID SessionMgr-->>SDK: sessionID SDK->>SessionMgr: OnRegisterSession(sessionID) SessionMgr->>Session: Tools(), Resources() @@ -186,11 +189,12 @@ 
sequenceDiagram ``` **Key Flow:** -1. Discovery middleware intercepts initialize request, calls `SessionManager.CreateSession()` -2. SessionManager uses SessionFactory to create fully-formed session (capabilities + clients) -3. Session stored before SDK sees request -4. SDK calls `Generate()` which returns pre-created session ID -5. SDK registers session's tools/resources via `OnRegisterSession` hook +1. Initialize request triggers `SessionManager` to create session via `SessionFactory` +2. SessionFactory creates clients, passes them to aggregator for capability discovery +3. SessionFactory returns fully-formed session (capabilities + pre-initialized clients) +4. Session stored in SessionManager before SDK registration +5. SDK calls `Generate()` which returns the session ID +6. SDK registers session's tools/resources via `OnRegisterSession` hook ### Terminology Clarification @@ -208,7 +212,33 @@ This RFC uses the following terms consistently: Today's `VMCPSession` is a passive data container with getters and setters (e.g., `SetRoutingTable()`, `RoutingTable()`), not an object with encapsulated behavior. -Proposed `Session` interface is an active domain object that owns backend clients and encapsulates routing logic. Instead of exposing internal state through setters, it provides behavior methods like `CallTool()` that look up the target in the routing table and invoke the appropriate backend client. 
+Proposed `Session` interface is an active domain object that owns backend clients and encapsulates routing logic: + +```go +type Session interface { + // Identity and metadata + ID() string + Identity() *auth.Identity + + // Capabilities - returns discovered tools/resources for this session + Tools() []Tool + Resources() []Resource + Prompts() []Prompt + + // MCP operations - routing logic is encapsulated here + CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) + ReadResource(ctx context.Context, uri string) (*ResourceResult, error) + GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error) + + // Lifecycle + Close() error + + // Initialization state - returns error if session initialization failed, nil if successful + InitError() error +} +``` + +Instead of exposing internal state through setters, it provides behavior methods like `CallTool()` that look up the target in the routing table and invoke the appropriate backend client. **2. 
Decoupling Session Creation from SDK Wiring** @@ -216,7 +246,12 @@ Today: Session construction is spread across middleware (discovery), adapter cal Proposed: Clean separation: - **SessionFactory**: Builds complete sessions (discovery + client initialization + routing setup) -- **SessionManager**: Bridges between `SessionFactory` (domain) and MCP SDK (protocol) +- **SessionManager**: Bridges between `SessionFactory` (domain) and MCP SDK (protocol) by: + - Implementing `SessionIdManager` interface (`Generate`, `Validate`, `Terminate`) + - Converting domain types to SDK types (Session's `Tools()` → `[]mcp.Tool`) + - Registering capabilities with SDK via `server.AddTool()`, `server.AddResource()`, `server.AddPrompt()` + - Creating handlers that delegate to `session.CallTool()`, `session.ReadResource()`, `session.GetPrompt()` + - See example in [`pkg/vmcp/server/server.go` (injectCapabilities)](https://github.com/stacklok/toolhive/blob/main/pkg/vmcp/server/server.go) **3. Pre-initialized Backend Clients** @@ -248,14 +283,17 @@ type Session interface { // Lifecycle Close() error + + // Initialization state - returns error if session initialization failed, nil if successful + InitError() error } ``` **Key behaviors:** - **Owns backend clients**: Session stores pre-initialized MCP clients in an internal map - **Encapsulates routing**: `CallTool()` looks up tool in routing table, routes to correct backend client -- **Manages lifecycle**: `Close()` iterates through all clients and closes them -- **Thread-safe**: All methods protected by mutex for concurrent access +- **Manageable lifetime**: `Close()` cleans up clients and any other resources. The SessionManager/caller is decoupled from what exactly happens on close() +- **Thread-safe**: All methods protected by internal RWMutex for concurrent access. Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. 
Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session **Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). TTL management and storage are handled by the existing session storage layer (`pkg/transport/session`) - sessions are stored in a storage backend that automatically extends TTL on access via `Get()` and expires sessions based on configured TTL. @@ -275,9 +313,9 @@ type SessionFactory interface { ``` **Responsibilities:** -- Calls aggregator to discover capabilities from all backends -- Receives both capabilities AND initialized clients from aggregator (reuse for efficiency) -- Constructs session with routing table, tools, resources, clients +- Creates MCP clients for all backends (owns client lifecycle) +- Passes clients to aggregator to discover capabilities +- Constructs session with routing table, tools, resources, and the same pre-initialized clients - Returns fully-formed session ready for use #### 2. SessionFactory Implementation @@ -286,30 +324,34 @@ type SessionFactory interface { The default factory implementation follows this pattern: -1. **Single-pass discovery**: Call `aggregator.AggregateCapabilities(ctx, backends)` - - Aggregator creates MCP clients for each backend - - Queries capabilities (tools, resources, prompts) from each backend - - Resolves conflicts using configured strategy (prefix, priority, manual) - - Returns both aggregated capabilities AND initialized clients +1. **Create MCP clients**: Initialize clients for each backend + - Factory creates one client per backend using existing client factory + - Clients are connection-ready but not yet used -2. 
**Client reuse**: Use clients from aggregator (don't create new ones) - - Avoids redundant TCP handshake + TLS negotiation + MCP initialization - - Clients used for discovery are already working and initialized +2. **Capability discovery**: Pass clients to `aggregator.AggregateCapabilities(ctx, clients)` + - Aggregator uses provided clients to query capabilities (tools, resources, prompts) from each backend + - Resolves conflicts using configured strategy (prefix, priority, manual) + - Returns aggregated capabilities (routing table, tools, resources, prompts) + - Aggregator does NOT own client lifecycle - just uses them for queries 3. **Return fully-formed session**: Construct session with all dependencies - Session ID (UUID) - User identity - Routing table (maps tool names to backend workload IDs) - - Tools, resources, prompts (aggregated from all backends) - - Pre-initialized clients map (keyed by workload ID) + - Tools, resources, prompts (aggregated from backends) + - Pre-initialized clients map (keyed by workload ID) - same clients used for discovery -**Key Design Decision**: The aggregator returns both capabilities and clients together. This enables the factory to pass initialized clients directly to the session without additional connection overhead. +**Key Design Decision**: Clients are created once by the factory and threaded through: factory → aggregator (for discovery) → session (for reuse). This avoids redundant connection setup while maintaining clean separation of concerns (factory owns client creation, aggregator only queries capabilities). **Updated Aggregator Interface**: -The aggregator must be updated to return clients alongside capabilities. Currently, `AggregateCapabilities()` returns only discovered capabilities (routing table, tools, resources, prompts). The updated version returns an `AggregationResult` struct that includes both capabilities **and** the initialized clients map (keyed by workload ID). 
+The aggregator interface takes clients as input instead of creating them internally: -**Rationale**: The aggregator already creates clients internally to query backend capabilities. This change just makes it return those clients instead of discarding them. No additional connection overhead. +```go +AggregateCapabilities(ctx context.Context, clients map[string]*Client) (*AggregationResult, error) +``` + +**Rationale**: Separates client lifecycle (factory's concern) from capability discovery (aggregator's concern). The factory creates clients once, passes them to aggregator for discovery, then passes same clients to session for reuse. **Performance impact**: Reusing discovery clients eliminates redundant connection setup (TCP handshake + TLS negotiation + MCP initialization), saving ~100-500ms per backend depending on network conditions. @@ -322,7 +364,7 @@ The default session implementation stores: - Routing table mapping tool/resource names to backend workload IDs - Discovered capabilities (tools, resources, prompts) - Pre-initialized backend clients map (keyed by workload ID) -- Mutex for thread-safe access +- RWMutex for thread-safe access (read lock for queries/calls, write lock for Close) **Method behaviors:** @@ -358,30 +400,19 @@ type SessionIdManager interface { **The Challenge**: Ideally, discovery middleware (which has request context) would create the fully-formed session, and `Generate()` would just return the ID. However, Go lacks goroutine-local storage, so `Generate()` can't retrieve a pre-created session ID without unsafe patterns. 
-**Implementation Options**: +**Implementation Approach**: -The SDK constraint requires choosing between these approaches: +Since `Generate()` lacks context access, we use a two-phase creation pattern: -**Option A: Hybrid approach (minimal changes to current flow)** ⭐ **Recommended for Phase 1** -- Discovery middleware discovers capabilities and stores in context (current behavior) -- `Generate()` creates session ID, stores empty session -- `OnRegisterSession` hook retrieves capabilities from context and calls `SessionFactory.MakeSession()` -- Replace empty session with fully-formed session in SessionManager -- **Pros**: Works with SDK as-is, minimal risk, no unsafe patterns -- **Cons**: Still uses context-passing and hook pattern (not fully encapsulated) +1. **Phase 1 - Session ID generation**: `Generate()` creates session ID, stores empty session placeholder +2. **Phase 2 - Session population**: `OnRegisterSession` hook retrieves backends and identity from request context and calls `SessionFactory.MakeSession()` to create the fully-formed session, then replaces the placeholder -**Option B: Upstream SDK change** -- Contribute patch to mark3labs/mcp-go to add context parameter: `Generate(ctx context.Context) string` -- Once merged, migrate to clean flow where middleware creates session via `CreateSession(ctx, ...)` -- **Pros**: Clean interface, proper context propagation, no workarounds -- **Cons**: Requires upstream coordination, blocks implementation until merged +This hybrid approach: +- Works with SDK as-is (no unsafe patterns like goroutine-local storage) +- Reuses existing discovery middleware and context-passing patterns +- Enables incremental migration without blocking on upstream changes -**Option C: Fork SDK** (not recommended) -- Maintain internal fork with `Generate(ctx context.Context)` signature -- **Pros**: Clean interface immediately available -- **Cons**: Maintenance burden, divergence from upstream, delayed security/bug fixes - 
-**Recommendation**: **Option A** for initial implementation (Phase 1-2), then propose **Option B** as upstream improvement for Phase 3-4. +**Note**: While we could contribute a patch to mark3labs/mcp-go adding `Generate(ctx context.Context) string`, this would block implementation and isn't necessary—the two-phase pattern is pragmatic and maintains safety. ##### SessionManager Design @@ -395,7 +426,7 @@ The SDK constraint requires choosing between these approaches: - Returns session ID (or error if creation fails) 2. **SDK lifecycle** (implements `SessionIdManager`): - - `Generate() string` - Returns session ID (pre-created or newly created depending on option) + - `Generate() string` - Creates empty session placeholder, returns session ID (phase 1 of two-phase creation) - `Validate(sessionID) (bool, error)` - Checks if session exists - `Terminate(sessionID) (bool, error)` - Loads session, calls `Close()`, removes from map @@ -412,13 +443,15 @@ The SDK constraint requires choosing between these approaches: - Return MCP-formatted response **SDK Registration Flow**: -1. Create SessionManager with SessionFactory -2. Pass to SDK as `WithSessionIdManager(sessionManager)` +1. Construct SessionManager with SessionFactory as a dependency: `sessionManager := NewSessionManager(sessionFactory)` +2. Pass SessionManager to SDK: `WithSessionIdManager(sessionManager)` 3. In `OnRegisterSession` hook: register tools via `sessionManager.GetAdaptedTools(sessionID)` #### 5. Migration from Current Architecture **Phase 1**: Introduce interfaces and new implementation alongside existing code +- **Note**: Phase 1 will be done incrementally across multiple PRs if necessary, reusing existing implementation pieces. This allows us to introduce the bulk of the code without having to worry about refactoring the existing system. 
+ **Phase 2**: Update server initialization to use new SessionManager **Phase 3**: Remove old discovery middleware and httpBackendClient patterns **Phase 4**: Clean up deprecated code paths @@ -427,9 +460,9 @@ Details in Implementation Plan section below. #### 6. Error Handling -**Session Creation Failures (Option A - Hybrid Approach)**: +**Session Creation Failures**: -With the recommended Option A implementation: +With the two-phase creation pattern: 1. **`Generate()` phase**: Creates empty session (UUID only), always succeeds - Stores empty session in map @@ -439,14 +472,14 @@ With the recommended Option A implementation: 2. **`OnRegisterSession` hook phase**: Calls `SessionFactory.MakeSession()` - failure possible here - If `MakeSession()` fails completely (e.g., all backends unreachable): - Log error with session ID and failure details - - Store error state in session metadata (e.g., `session.SetMetadata("init_error", err.Error())`) - - Keep session in map but mark as failed + - Create a failed session that stores the initialization error internally + - Keep session in map (tool calls will check `session.InitError()`) - Subsequent tool calls to this session: - - Check session metadata for `init_error` - - Return clear error: "Session initialization failed: [details]" + - Check `session.InitError()` before processing request + - If non-nil, return clear error: "Session initialization failed: [details from InitError()]" - Client should delete session and re-initialize -**Rationale**: Option A uses a two-phase creation (empty session + populate via hook). The SDK's `Generate()` must return a session ID, so failures during `MakeSession()` are handled by storing error state in the session object itself. +**Rationale**: The two-phase creation pattern (empty session + populate via hook) is necessary because the SDK's `Generate()` must return a session ID. 
Additionally, the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly. Instead, errors are stored in the session object via `InitError()`, and subsequent tool calls check this state before processing requests. The Session interface includes `InitError() error` to support this pattern. **Partial Backend Initialization**: @@ -503,7 +536,7 @@ No new security boundaries are introduced. This is a refactoring of existing ses **Mitigation strategies** (implementation details for future work): 1. **Per-request header injection**: Resolve credentials fresh for each request, inject into MCP client headers -2. **Token refresh hooks**: Backend client library refreshes tokens automatically (if supported by `mcp-go`) +2. **Manual token refresh**: Use `mcp-go`'s `OAuthHandler.RefreshToken()` to refresh OAuth tokens when they expire (note: mcp-go provides refresh capability but does not do it automatically - requires explicit calls) 3. **Client recreation on auth failure**: Detect 401/403 errors, recreate client with refreshed credentials 4. **Session TTL alignment**: Set session TTL shorter than credential lifetime @@ -653,34 +686,15 @@ The interface-based design enables future enhancements: - **Health-based initialization**: Skip unhealthy backends in `SessionFactory.MakeSession()` - **Credential refresh**: Add token refresh hooks in session implementation -**Decorators and Composition**: +### Decorator Pattern and Extensibility -The `Session` interface enables clean feature composition through decoration. This addresses complexity issues seen in previous feature additions: - -**Example: Optimizer-in-vMCP** (simplified approach vs [PR #3517](https://github.com/stacklok/toolhive/pull/3517)): - -The optimizer can be implemented as a `Session` decorator that wraps an underlying session and adds semantic search capabilities. 
The decorator: -- Stores a reference to the inner `Session` and maintains an in-memory vector database -- Implements the `Session` interface -- Intercepts `CallTool()` - if tool name is "find_tool", performs semantic search over `inner.Tools()` and returns matching tools; otherwise delegates to `inner.CallTool()` -- Delegates all other methods (`Tools()`, `Resources()`, `Close()`) directly to the inner session +The Session interface enables clean feature composition through the decorator pattern. External components and features can be integrated by wrapping the base session implementation (see Appendix A for the optimizer integration example). -**Benefits vs [PR #3517](https://github.com/stacklok/toolhive/pull/3517) approach**: -1. **No SDK integration concerns**: Optimizer doesn't touch MCP protocol or server.go hooks -2. **Testable**: Unit test optimizer with mock inner session, no vMCP server required -3. **Minimal cognitive load**: Standard decorator pattern, clear separation of concerns - -**Example: Session Capability Refresh** (simplified vs [PR #3642](https://github.com/stacklok/toolhive/pull/3642)): - -With Session as a domain object, refreshing capabilities becomes straightforward: add a `RefreshCapabilities(ctx)` method that calls the aggregator to re-discover capabilities from backends, then updates the session's internal routing table and tools list under a write lock. The session state update is encapsulated—no middleware coordination needed. - -**Note**: SDK tool registration would need a separate refresh mechanism, but the session domain logic is simple and self-contained. 
- -**Other Decoration Examples**: -- **Caching layer**: Wrap session to cache tool results for repeated calls -- **Rate limiting**: Wrap session to enforce per-session rate limits -- **Audit logging**: Wrap session to log all tool calls with detailed context -- **Circuit breaker**: Wrap session to implement circuit breaker pattern per backend +**Potential decorators**: +- **Caching layer**: Cache tool results for repeated calls +- **Rate limiting**: Enforce per-session rate limits +- **Audit logging**: Log all tool calls with detailed context +- **Capability refresh**: Re-discover capabilities and update routing table (simplified vs [PR #3642](https://github.com/stacklok/toolhive/pull/3642)) **Testing Benefits** (addresses [toolhive#2852](https://github.com/stacklok/toolhive/issues/2852)): @@ -694,11 +708,11 @@ With the Session interface, you can unit test session routing logic without spin ## Implementation Plan -This implementation introduces new interfaces and gradually migrates from the current architecture to the proposed design. All phases are additive until Phase 4, ensuring low risk and easy rollback. +This implementation introduces new interfaces and gradually migrates from the current architecture to the proposed design. Phases 1-2 are purely additive (ensuring low risk and easy rollback), while Phase 3 switches over and removes old code. ### Phase 1: Introduce Core Interfaces and Factory -**Goal**: Establish the `Session` and `SessionFactory` interfaces and provide a working implementation without changing existing flows. +**Goal**: Create the `Session` and `SessionFactory` interfaces and provide working implementations, but do not wire them up to the existing system yet (purely additive, no changes to existing flows). 
**New Files**: - `pkg/vmcp/session/session.go` - Define `Session` interface @@ -708,9 +722,9 @@ This implementation introduces new interfaces and gradually migrates from the cu **Implementation Details**: The `defaultSession` implementation owns backend clients in an internal map. When `SessionFactory.MakeSession()` runs: -1. Aggregate capabilities from all backends -2. Create and initialize one MCP client per backend -3. Return session with pre-initialized clients map +1. Create and initialize one MCP client per backend +2. Pass clients to aggregator to discover capabilities from all backends +3. Return session with pre-initialized clients map (same clients used for discovery) **Testing**: - Unit tests for `Session` interface methods (CallTool, ReadResource, Close) @@ -720,44 +734,63 @@ The `defaultSession` implementation owns backend clients in an internal map. Whe **Files Modified**: None (purely additive) -### Phase 2: Introduce SessionManager and Wire to SDK +### Phase 2: Wire Up New Code Path Behind Feature Flag -**Goal**: Create `SessionManager` that uses `SessionFactory` and implements the SDK's `SessionIdManager` interface. +**Goal**: Implement the new ideal code path using `SessionManager` + `SessionFactory` + `Session`, but hide it behind a `sessionManagementV2` config flag (defaults to disabled). Do not attempt to migrate/refactor existing code - write fresh ideal implementation. **New Files**: - `pkg/vmcp/server/session_manager.go` - Implement `SessionManager` bridging domain (Session) to protocol (SDK) +- `pkg/vmcp/config/config.go` - Add `sessionManagementV2` feature flag (defaults to `false`) -**Core structure**: SessionManager stores a `SessionFactory` reference and a concurrent map of sessions (keyed by session ID). 
+**Feature Flag Integration**: +Add conditional logic in server/hooks/middleware to toggle between old and new behavior: + +```go +if config.SessionManagementV2 { + // New code path: Use SessionFactory + Session interface + sessionManager := NewSessionManager(sessionFactory) + server.WithSessionIdManager(sessionManager) + // ... new hooks using sessionManager.GetAdaptedTools() +} else { + // Old code path: Existing VMCPSession + httpBackendClient + discovery middleware + // ... existing code unchanged +} +``` -**Key methods** (see "Wiring into the MCP SDK" section in Detailed Design for full behavior): -- `Generate() string` - Creates session via factory or returns pre-created ID (depending on SDK constraint workaround chosen) +**Key methods in SessionManager** (see "Wiring into the MCP SDK" section in Detailed Design for full behavior): +- `Generate() string` - Creates empty session (two-phase creation pattern) - `Terminate(sessionID) (bool, error)` - Loads session, calls `Close()`, removes from map - `GetAdaptedTools(sessionID) []mcp.Tool` - Retrieves session, converts its tools to SDK format with handlers -- `CreateSession(ctx, identity, backends) (string, error)` - Creates fully-formed session via factory (used by discovery middleware) - -**Integration steps**: -1. Create `SessionManager` with `SessionFactory` dependency -2. Pass to SDK as `WithSessionIdManager(sessionManager)` -3. In `OnRegisterSession` hook: register tools via `GetAdaptedTools()` -4. 
For SDK constraint workarounds, see "SDK Interface Constraint" in Detailed Design +- `CreateSession(ctx, identity, backends) (string, error)` - Called from `OnRegisterSession` hook to create fully-formed session via factory **Testing**: -- Integration tests: Create session via `SessionManager.Generate()`, verify session stored +- Integration tests with feature flag enabled: Create session via `SessionManager.Generate()`, verify session stored - Test tool handler routing: Call tool, verify session's `CallTool()` is invoked - Test session termination: Call `Terminate()`, verify session closed - Test SDK integration: Initialize session via HTTP `/mcp/initialize`, verify tools registered +- Verify old code path still works when flag is disabled (no regressions) **Files Modified**: -- `pkg/vmcp/server/server.go` - Add SessionManager creation and wiring +- `pkg/vmcp/server/server.go` - Add conditional logic based on `sessionManagementV2` flag +- `pkg/vmcp/server/hooks.go` - Add conditional logic for new vs old session registration + +**Rationale**: Writing new ideal code behind a feature flag is easier than incrementally refactoring tangled existing code. Once validated, we can delete the old code path entirely (Phase 3) rather than spending cycles on gradual migration. + +### Phase 3: Validate, Switch Over, and Delete Old Code -### Phase 3: Migrate Existing Code to Use Session Interface +**Goal**: Validate the new code path, flip the default to enabled, and delete the old implementation entirely (no gradual migration). -**Goal**: Update callsites to use the new `Session` interface instead of accessing `VMCPSession` directly or using `httpBackendClient`. +**Validation and rollout**: +1. Enable `sessionManagementV2` flag in CI to validate new code path +2. Change `sessionManagementV2` default from `false` to `true` and release +3. Let it bake in production and fix any bugs that arise +4. 
Monitor metrics: session creation latency, tool call latency, connection counts, error rates -**Changes**: -- Remove discovery middleware's context-passing pattern (capabilities no longer passed via context) -- Remove `httpBackendClient` per-request client creation pattern (session owns clients now) -- Update any code that called `VMCPSession.SetRoutingTable()` / `SetTools()` to use `Session` interface methods +**Cleanup** (after successful bake period with no rollbacks): +- Remove old `VMCPSession` data container implementation (replaced by `defaultSession`) +- Remove old discovery middleware context-passing pattern (replaced by `SessionFactory`) +- Remove old `httpBackendClient` per-request creation pattern (replaced by session-owned clients) +- Remove feature flag and conditional logic (only new code path remains) **Testing**: - End-to-end tests: Full vMCP workflow with multiple backends @@ -766,18 +799,16 @@ The `defaultSession` implementation owns backend clients in an internal map. Whe - Session expiration test: Verify TTL cleanup closes all clients **Files Modified**: -- `pkg/vmcp/discovery/middleware.go` - Remove context-based capability passing (deprecated by SessionFactory) -- `pkg/vmcp/client/client.go` - Remove `httpBackendClient` per-request creation (deprecated by Session ownership) -- Any code directly accessing `VMCPSession` fields - Migrate to `Session` interface +- `pkg/vmcp/server/server.go` - Remove old code path and feature flag conditionals +- `pkg/vmcp/discovery/middleware.go` - Delete (replaced by SessionFactory) +- `pkg/vmcp/client/client.go` - Remove `httpBackendClient` (replaced by Session ownership) +- Delete old `VMCPSession` implementation files -### Phase 4: Cleanup and Observability +**Rationale**: Once the new code path is validated, delete the old code entirely rather than maintaining both paths. This avoids technical debt and ongoing maintenance burden. -**Goal**: Remove deprecated code and add observability for session lifecycle. 
+### Phase 4: Add Observability -**Cleanup**: -- Remove old `VMCPSession` data container implementation (replaced by `defaultSession`) -- Remove old discovery middleware hooks (replaced by `SessionFactory`) -- Remove old `httpBackendClient` patterns (replaced by session-owned clients) +**Goal**: Add comprehensive observability for session lifecycle (cleanup is done in Phase 3). **Observability**: - Add audit log events for session creation with backend initialization results @@ -816,10 +847,10 @@ The `defaultSession` implementation owns backend clients in an internal map. Whe Each phase is independently testable and can be rolled back: -- **Phase 1**: New interfaces unused, no behavior change -- **Phase 2**: SessionManager wired but old code paths still present -- **Phase 3**: Can revert to Phase 2 state (both implementations coexist) -- **Phase 4**: Can keep deprecated code if needed (cleanup optional) +- **Phase 1**: New interfaces unused, no behavior change. Can be safely ignored or deleted. +- **Phase 2**: Feature flag `sessionManagementV2` defaults to `false`, old code path still active. No behavior change unless flag is explicitly enabled. +- **Phase 3**: Can toggle `sessionManagementV2` back to `false` to revert to old code path. Once old code is deleted, rollback requires redeploying Phase 2 code. +- **Phase 4**: Observability additions are purely additive, no rollback needed. ### Testing Strategy @@ -878,7 +909,7 @@ Each phase is independently testable and can be rolled back: ## Open Questions -Most major design questions have been resolved during RFC review. One implementation detail requires resolution: +Most major design questions have been resolved during RFC review. One architectural question requires design during implementation: 1. ~~Should clients be created eagerly or lazily?~~ → **Resolved: Eager initialization during `SessionFactory.MakeSession()`** 2. 
~~How should session concerns be organized?~~ → **Resolved: Encapsulate in `Session` interface (domain object), separate from SDK wiring (`SessionManager`)** @@ -886,11 +917,25 @@ Most major design questions have been resolved during RFC review. One implementa 4. ~~Should we share clients across sessions for efficiency?~~ → **Resolved: No, session isolation is critical (prevents state leakage, security risks)** 5. ~~How to handle short-lived credentials with session-scoped clients?~~ → **Resolved: Document limitation, provide mitigation strategies (per-request header injection, token refresh hooks), assume most backends use long-lived credentials for initial implementation** 6. ~~How should client closure be triggered?~~ → **Resolved: Session owns clients, `Session.Close()` called on expiration/termination via `SessionManager.Terminate()`** -7. **How to work around SDK `Generate()` having no context parameter?** → **Requires resolution during Phase 2 implementation** - - SDK's `SessionIdManager.Generate()` has no context, cannot access identity/backends directly - - Three options documented: (A) Hybrid with OnRegisterSession hook, (B) Custom HTTP middleware wrapper, (C) Fork SDK - - **Recommendation**: Start with Option A (pragmatic), evaluate Option B for better encapsulation - - See "SDK Interface Constraint" section in Detailed Design for full analysis +7. ~~How to work around SDK `Generate()` having no context parameter?~~ → **Resolved: Use two-phase creation pattern** + - SDK's `SessionIdManager.Generate()` has no context parameter + - Solution: Two-phase pattern where `Generate()` creates empty session, then `OnRegisterSession` hook calls `SessionFactory.MakeSession()` to populate it + - Works with SDK as-is, no unsafe patterns (e.g., goroutine-local storage) + - See "SDK Interface Constraint" section in Detailed Design for details + +8. 
**How does this architecture work with inter-session state like circuit breakers and health checks?** → **Requires design during Phase 1-2 implementation** + - **Current state**: Circuit breaker and health monitoring exist at backend level (across all sessions) in `pkg/vmcp/health` + - **Challenge**: Clients are now session-scoped, but circuit breaker state must be shared across sessions + - **Proposed approach**: + - Circuit breaker and health check state remains at **backend/aggregator level** (inter-session, shared state) + - During `SessionFactory.MakeSession()`, consult health monitor to exclude unhealthy/circuit-broken backends + - Individual sessions don't maintain circuit breaker state - they call through to a shared health monitor service + - When a tool call fails, session delegates error reporting to the health monitor which updates shared circuit breaker state + - Existing health monitoring system (`pkg/vmcp/health`) continues tracking backend health independently of sessions + - **Open design questions**: + - Should sessions created before a backend is circuit-broken continue using that backend's client, or should they dynamically check health state on each call? + - How to handle mid-session backend failures? (Current mitigation: session storage extends TTL, short TTL limits exposure) + - **Recommendation**: Defer detailed design to Phase 1-2 implementation, leverage existing health monitoring patterns where possible ## References @@ -907,6 +952,27 @@ Most major design questions have been resolved during RFC review. One implementa --- +## Appendix A: Optimizer Integration + +The session architecture directly enables integration of the MCP Optimizer into vMCP. 
Per [stacklok-epics#201](https://github.com/stacklok/stacklok-epics/issues/201), the standalone optimizer will be migrated into vMCP to address production requirements: +- Support searching over ~150 tools with reasonable latency (<5s cold start) +- Production-grade reliability (standalone optimizer crashes with 3+ servers, 20+ tools) + +### Integration Approach + +The optimizer will be integrated as a `Session` decorator that wraps the base session implementation. This decorator: +- Stores a reference to the inner `Session` and maintains an in-memory vector database of tool embeddings +- Implements the `Session` interface +- Intercepts `CallTool()`: + - If tool name is `"find_tool"`, performs semantic search over `inner.Tools()` and returns matching tools + - If tool name is `"call_tool"`, delegates to appropriate backend via `inner.CallTool()` with the target tool name + - Otherwise, delegates directly to `inner.CallTool()` +- Delegates all other methods (`Tools()`, `Resources()`, `Close()`) directly to the inner session + +This decorator pattern allows the optimizer to be added without modifying core session logic or SDK integration, demonstrating how the Session interface enables clean feature composition. + +--- + ## RFC Lifecycle @@ -916,6 +982,7 @@ Most major design questions have been resolved during RFC review. 
One implementa | Date | Reviewer | Decision | Notes | |------|----------|----------|-------| | 2026-02-04 | @yrobla, @jerm-dro | Draft | Initial submission | +| 2026-02-09 | @jerm-dro, Copilot | Under Review | Addressed PR #38 comments | ### Implementation Tracking From 1d0cbe070db1c83f818b8be5383e03c181e939cd Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Thu, 12 Feb 2026 10:46:01 +0100 Subject: [PATCH 6/8] fixes from review --- ...HV-0038-session-scoped-client-lifecycle.md | 305 ++++++++++++++---- 1 file changed, 248 insertions(+), 57 deletions(-) diff --git a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md index 428ec49..41a8963 100644 --- a/rfcs/THV-0038-session-scoped-client-lifecycle.md +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -106,10 +106,12 @@ These architectural problems have cascading effects: ## Non-Goals - **Rewrite All Session Types**: Initial focus is on `VMCPSession`; other session types (ProxySession, SSESession) remain unchanged except for interface compliance +- **Replace Existing Session Storage Architecture**: This RFC **builds on** the existing `transportsession.Manager` and `Storage` interface ([PR #1989](https://github.com/stacklok/toolhive/pull/1989), [PR #1677](https://github.com/stacklok/toolhive/pull/1677), [PR #1770](https://github.com/stacklok/toolhive/pull/1770), [PR #1771](https://github.com/stacklok/toolhive/pull/1771)) rather than creating a parallel storage path. Session storage continues to use the pluggable `Storage` interface designed for Redis/Valkey support. 
- **Connection Pooling Within Clients**: Individual MCP clients may internally pool connections, but that's outside this RFC's scope - **Multi-Session Client Sharing**: Clients remain session-scoped and are not shared across sessions - **Lazy Capability Discovery**: Capability discovery remains eager and static (i.e., done once at session creation, current behavior) - **Client Versioning**: Handling MCP protocol version negotiation is out of scope +- **Transparent Horizontal Scaling**: Session-scoped MCP clients are held in Go process memory and cannot be serialized to Redis/Valkey. While session metadata can be stored in pluggable backends ([Issue #2417](https://github.com/stacklok/toolhive/issues/2417)), the MCP client objects themselves are in-process only. **Horizontal scaling of vMCP instances requires sticky sessions** (session affinity at load balancer). Transparent failover across instances is out of scope (clients must be recreated on instance switch, acceptable one-time cost). ## Proposed Solution @@ -232,9 +234,6 @@ type Session interface { // Lifecycle Close() error - - // Initialization state - returns error if session initialization failed, nil if successful - InitError() error } ``` @@ -283,9 +282,6 @@ type Session interface { // Lifecycle Close() error - - // Initialization state - returns error if session initialization failed, nil if successful - InitError() error } ``` @@ -293,9 +289,15 @@ type Session interface { - **Owns backend clients**: Session stores pre-initialized MCP clients in an internal map - **Encapsulates routing**: `CallTool()` looks up tool in routing table, routes to correct backend client - **Manageable lifetime**: `Close()` cleans up clients and any other resources. The SessionManager/caller is decoupled from what exactly happens on close() -- **Thread-safe**: All methods protected by internal RWMutex for concurrent access. 
Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session +- **Thread-safe**: All methods protected by internal RWMutex for concurrent access to prevent data races on internal state. Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. Note: The RWMutex prevents data races but does not prevent `CallTool()` from using a closed client after `Close()` completes—the session implementation must track closed state explicitly (see "Client Closed Mid-Request" in Error Handling). Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session -**Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). TTL management and storage are handled by the existing session storage layer (`pkg/transport/session`) - sessions are stored in a storage backend that automatically extends TTL on access via `Get()` and expires sessions based on configured TTL. +**Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). 
This RFC builds on the existing pluggable session storage architecture ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)) which provides `Storage` interface and `Manager` for Redis/Valkey support. The architecture uses a **dual-layer model** (detailed in section "Session Architecture and Serializability"): +- **Metadata layer** (serializable): Session ID, timestamps, identity reference - stored via `transportsession.Manager` + `Storage` interface (supports `LocalStorage` today, `RedisStorage` in future) +- **Runtime layer** (in-memory only): MCP client objects, routing table, capabilities - cannot be serialized due to TCP connections and goroutines + +**Important notes**: +- The existing `Storage.DeleteExpired()` only removes from storage without calling `Close()`, which would leak backend client connections. As part of this RFC, the session storage layer must be updated to call `session.Close()` before removing expired sessions from storage. +- **Horizontal scaling**: Session metadata can be in Redis/Valkey, but MCP clients are in-process only. Horizontal scaling requires sticky sessions (session affinity at load balancer) to ensure requests route to the instance holding the client objects. Without sticky sessions, clients must be recreated on instance switch (acceptable one-time cost per [Issue #2417](https://github.com/stacklok/toolhive/issues/2417)). **SessionFactory Interface** - Creates fully-formed sessions: @@ -324,8 +326,12 @@ type SessionFactory interface { The default factory implementation follows this pattern: -1. **Create MCP clients**: Initialize clients for each backend +1. **Create MCP clients**: Initialize clients for each backend **in parallel** - Factory creates one client per backend using existing client factory + - **Performance requirement**: Use parallel initialization (e.g., `errgroup` with bounded concurrency) to avoid sequential latency accumulation. 
With 20 backends at 100-500ms each, sequential would mean 2-10 seconds of session creation latency. + - **Bounded concurrency**: Limit parallel goroutines (e.g., 10 concurrent initializations) to avoid resource exhaustion + - **Per-backend timeout**: Apply context timeout (e.g., 5s per backend) so one slow backend doesn't block session creation + - **Partial initialization**: If some backends fail, log warnings and continue with successfully initialized backends (failed backends not added to clients map) - Clients are connection-ready but not yet used 2. **Capability discovery**: Pass clients to `aggregator.AggregateCapabilities(ctx, clients)` @@ -376,7 +382,85 @@ The default session implementation stores: **Thread safety**: Read operations (CallTool, ReadResource) use read locks; Close uses write lock. -#### 4. Wiring into the MCP SDK +#### 4. Session Architecture and Serializability + +**Dual-Layer Session Model**: This RFC builds on the existing session storage architecture introduced in [PR #1989](https://github.com/stacklok/toolhive/pull/1989) and [Issue #2417](https://github.com/stacklok/toolhive/issues/2417), which established a pluggable `Storage` interface designed for Redis/Valkey externalization. 
The session architecture follows a **two-layer model**: + +**Layer 1: Metadata (Serializable)** - Managed by `transportsession.Manager` + `Storage` interface: +- Session ID (string) +- Session type (string) +- Creation timestamp +- Last access timestamp (TTL tracking) +- User identity reference +- Backend workload IDs list +- **Stored via**: `pkg/transport/session/storage.Storage` interface (`Store`, `Load`, `Delete`, `DeleteExpired`, `Close`) +- **Backends**: `LocalStorage` (in-memory `sync.Map`) today, `RedisStorage` in future +- **Serialization**: JSON format already implemented in `pkg/transport/session/serialization.go` + +**Layer 2: Runtime State (In-Memory Only)** - Managed by behavior-oriented `Session`: +- Pre-initialized MCP client objects (contain TCP connections, goroutines) +- Routing table (tool/resource name → backend workload ID mapping) +- Discovered capabilities (tools, resources, prompts) +- Closed state flag +- **Cannot be serialized**: MCP clients contain active TCP connections and goroutine state that cannot be persisted + +**Composition Model**: How the two layers relate: + +```go +// Existing storage-oriented session interface (pkg/transport/session/session.go) +type Session interface { + ID() string + Type() string + Touch() time.Time + GetData() any +} + +// New behavior-oriented vMCP session implementation +type vmcpSession struct { + // Metadata layer - embeds transportsession.Session + transportSession transportsession.Session // provides ID(), Type(), Touch(), GetData() + + // Runtime layer - behavior and clients + identity *auth.Identity + routingTable map[string]string // tool name → backend ID + capabilities []Tool + clients map[string]*mcp.Client // backend ID → MCP client + closed bool + mu sync.RWMutex +} + +// Implements both interfaces +func (s *vmcpSession) ID() string { return s.transportSession.ID() } +func (s *vmcpSession) Type() string { return s.transportSession.Type() } +func (s *vmcpSession) Touch() time.Time { return 
s.transportSession.Touch() } +func (s *vmcpSession) GetData() any { return s.transportSession.GetData() } + +func (s *vmcpSession) CallTool(ctx, name, args) (*ToolResult, error) { /* behavior */ } +// ... other behavior methods +``` + +**Integration with `transportsession.Manager`**: The behavior-oriented `Session` is stored in `transportsession.Manager` (which uses the pluggable `Storage` interface), NOT in a separate `sync.Map`. This ensures: +- Redis/Valkey support works automatically when `RedisStorage` is implemented +- No parallel storage path that bypasses the existing architecture +- Consistent TTL management and expiration across all session types + +**Distributed Deployment Considerations** ([Issue #2417](https://github.com/stacklok/toolhive/issues/2417)): + +1. **With sticky sessions (recommended)**: + - Load balancer routes client to same vMCP instance for session lifetime + - Session metadata in Redis (shared), clients in-memory (per-instance) + - Optimal performance: clients reused across requests + +2. **Without sticky sessions (graceful degradation)**: + - Request may route to different vMCP instance + - New instance loads session metadata from Redis + - **Clients must be recreated** (one-time initialization cost ~100-500ms per backend) + - Subsequent requests to same instance reuse clients + - Acceptable trade-off: horizontal scaling with temporary perf hit on instance switch + +**Serializability constraints**: Only session metadata can be stored in Redis/Valkey. Runtime state (MCP clients) cannot be serialized due to TCP connections and goroutines. This is a **fundamental technical limitation**, not an implementation choice. + +#### 5. 
Wiring into the MCP SDK ##### SDK Interface Constraint @@ -396,9 +480,7 @@ type SessionIdManager interface { - Discovered capabilities from middleware - Authentication information -**Current workaround**: The existing `sessionIDAdapter.Generate()` creates an **empty session** (just a UUID), then `OnRegisterSession` hook populates it later. This is the "tangled flow" we're trying to fix. - -**The Challenge**: Ideally, discovery middleware (which has request context) would create the fully-formed session, and `Generate()` would just return the ID. However, Go lacks goroutine-local storage, so `Generate()` can't retrieve a pre-created session ID without unsafe patterns. +**The Challenge**: Due to the SDK's `Generate()` lacking context access, we cannot create fully-formed sessions in a single step. While not ideal, the well-encapsulated `Session` interface we're introducing makes it straightforward to swap out the SDK for a more ergonomic implementation in the future—this is a key benefit of decoupling domain logic from protocol concerns. **Implementation Approach**: @@ -416,37 +498,62 @@ This hybrid approach: ##### SessionManager Design -**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic to SDK protocol. It stores a `SessionFactory` reference and a concurrent map of sessions (keyed by session ID). +**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic (behavior-oriented `Session`) to SDK protocol. It **delegates session storage to the existing `transportsession.Manager`**, which uses the pluggable `Storage` interface ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)). 
+ +**Architecture**: +```go +type sessionManager struct { + factory SessionFactory + sessionStorage *transportsession.Manager // Uses Storage interface +} + +func NewSessionManager(factory SessionFactory, storage transportsession.Storage) *sessionManager { + return &sessionManager{ + factory: factory, + sessionStorage: transportsession.NewManagerWithStorage(storage), + } +} +``` **Key responsibilities:** 1. **Session creation**: `CreateSession(ctx, identity, backends)` - Used by discovery middleware - - Calls `factory.MakeSession()` to build fully-formed session - - Stores session in map + - Calls `factory.MakeSession()` to build fully-formed behavior-oriented session + - Wraps session in `transportsession.Session` (metadata layer) + - Stores via `sessionStorage.AddSession()` (uses `Storage` interface) - Returns session ID (or error if creation fails) 2. **SDK lifecycle** (implements `SessionIdManager`): - - `Generate() string` - Creates empty session placeholder, returns session ID (phase 1 of two-phase creation) - - `Validate(sessionID) (bool, error)` - Checks if session exists - - `Terminate(sessionID) (bool, error)` - Loads session, calls `Close()`, removes from map + - `Generate() string` - Creates empty session placeholder, stores via `sessionStorage.AddSession()`, returns session ID (phase 1 of two-phase creation) + - `Validate(sessionID) (bool, error)` - Checks if session exists via `sessionStorage.Get(sessionID)` + - `Terminate(sessionID) (bool, error)` - Loads session via `sessionStorage.Get()`, calls `Close()` on behavior-oriented session, removes via `sessionStorage.Delete()` 3. 
**SDK adaptation**: - - `GetAdaptedTools(sessionID)` - Converts session's tools to SDK format + - `GetAdaptedTools(sessionID)` - Loads session via `sessionStorage.Get()`, converts tools to SDK format - Creates tool handlers that delegate to `session.CallTool()` - Wraps results in SDK types (`mcp.ToolResult`) +**Why use `transportsession.Manager` instead of `sync.Map`?** +- **Pluggable storage**: Automatically supports `LocalStorage` (in-memory) and future `RedisStorage` (distributed) without code changes +- **No parallel storage path**: All sessions go through the same `Storage` interface, avoiding architectural debt +- **Consistent TTL**: `sessionStorage.Get()` automatically extends TTL via `Touch()` (existing behavior) +- **Expiration cleanup**: `sessionStorage.DeleteExpired()` (updated to call `session.Close()` before deletion) works uniformly + **Handler pattern**: Tool handlers are closures that: -- Load session from map by ID +- Load session from `sessionStorage` by ID +- Extract behavior-oriented session from metadata layer (`GetData()`) - Validate request arguments (type assertions) - Call `session.CallTool()` with validated args - Convert domain result to SDK format - Return MCP-formatted response **SDK Registration Flow**: -1. Construct SessionManager with SessionFactory as a dependency: `sessionManager := NewSessionManager(sessionFactory)` +1. Construct SessionManager with factory and storage: `sessionManager := NewSessionManager(sessionFactory, localStorage)` 2. Pass SessionManager to SDK: `WithSessionIdManager(sessionManager)` 3. In `OnRegisterSession` hook: register tools via `sessionManager.GetAdaptedTools(sessionID)` +**Redis/Valkey support**: When `RedisStorage` is implemented, just pass it to `NewSessionManager()` instead of `LocalStorage`. Session metadata goes to Redis, runtime state (clients) remains in-memory. No other changes needed. + #### 5. 
Migration from Current Architecture **Phase 1**: Introduce interfaces and new implementation alongside existing code @@ -472,14 +579,15 @@ With the two-phase creation pattern: 2. **`OnRegisterSession` hook phase**: Calls `SessionFactory.MakeSession()` - failure possible here - If `MakeSession()` fails completely (e.g., all backends unreachable): - Log error with session ID and failure details - - Create a failed session that stores the initialization error internally - - Keep session in map (tool calls will check `session.InitError()`) - - Subsequent tool calls to this session: - - Check `session.InitError()` before processing request - - If non-nil, return clear error: "Session initialization failed: [details from InitError()]" - - Client should delete session and re-initialize + - Create session with **empty capabilities** (no tools, resources, or prompts) + - SDK registers empty capability list, client sees "no tools available" + - Keep session in map (allows graceful handling without special error checks) + - Client experience: + - Session exists but has no capabilities + - Client can query session but sees empty tool list + - Client may delete session and re-initialize, or backend recovery may populate capabilities later -**Rationale**: The two-phase creation pattern (empty session + populate via hook) is necessary because the SDK's `Generate()` must return a session ID. Additionally, the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly. Instead, errors are stored in the session object via `InitError()`, and subsequent tool calls check this state before processing requests. The Session interface includes `InitError() error` to support this pattern. +**Rationale**: The two-phase creation pattern (empty session + populate via hook) is necessary because the SDK's `Generate()` must return a session ID. 
Additionally, the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly. Instead of storing initialization errors that require checks in every method (`InitError()` pattern is error-prone - easy to forget checks when adding new methods), we create sessions with empty capabilities. Failed backends don't advertise tools/resources, so clients never try to call them. This is simpler and safer than requiring defensive `InitError()` checks throughout the codebase. **Partial Backend Initialization**: @@ -503,8 +611,10 @@ If a tool call fails after successful session creation: Race condition exists: session may be terminated while a request is in flight: - Session `Close()` closes all backend clients +- **Race scenario**: RWMutex serializes access but doesn't prevent using closed clients—`CallTool()` can acquire a read lock after `Close()` completes and attempt to use a closed client - In-flight requests receive "client closed" error from MCP library -- Mitigation: Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing race window +- **Required mitigation**: Session implementation must track a `closed` flag; `CallTool()`, `ReadResource()`, and `GetPrompt()` check this flag under read lock before using clients, returning "session closed" error if true +- **Additional mitigation**: Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing race window - Future enhancement: Add reference counting to delay `Close()` until in-flight requests complete **Session Not Found**: @@ -534,16 +644,50 @@ No new security boundaries are introduced. This is a refactoring of existing ses ⚠️ **Short-lived credentials**: With session-scoped clients, short-lived outgoing credentials (e.g., expiring OAuth tokens) resolved at client creation could become stale mid-session. 
-**Mitigation strategies** (implementation details for future work): -1. **Per-request header injection**: Resolve credentials fresh for each request, inject into MCP client headers -2. **Manual token refresh**: Use `mcp-go`'s `OAuthHandler.RefreshToken()` to refresh OAuth tokens when they expire (note: mcp-go provides refresh capability but does not do it automatically - requires explicit calls) -3. **Client recreation on auth failure**: Detect 401/403 errors, recreate client with refreshed credentials -4. **Session TTL alignment**: Set session TTL shorter than credential lifetime +**Mitigation strategies**: + +**Phase 1 (initial implementation)**: +1. **Automatic client recreation on auth failure**: Detect 401/403 errors from backend calls, automatically recreate client with fresh credentials from `OutgoingAuthRegistry`, retry the operation. This handles token expiration gracefully without requiring new sessions. +2. **Session TTL alignment**: Set session TTL (typically 30 minutes) shorter than expected credential lifetime to reduce stale credential exposure. + +**Recommended approach (Phase 2 enhancement)**: +- **Modify `identityPropagatingRoundTripper`** (`pkg/vmcp/client/client.go`): Instead of capturing identity/credentials at client creation time, read identity from the request context on each operation. This ensures each backend call picks up the latest credentials dynamically, eliminating stale credential issues entirely. 
+ +**Alternative approaches** (for reference): +- **Manual token refresh**: Use `mcp-go`'s `OAuthHandler.RefreshToken()` to refresh OAuth tokens when they expire (note: mcp-go provides refresh capability but does not do it automatically - requires explicit calls) +- **Per-request header injection**: Resolve credentials fresh for each request, inject into MCP client headers + +For initial implementation, we assume most backends use long-lived credentials (API keys, client certificates), and Phase 1's automatic recreation on auth failure is sufficient. + +### Session Hijacking Prevention + +⚠️ **Session hijacking risk**: Session IDs are passed via HTTP header (`Mcp-Session-Id`). If a session ID leaks (network sniffing, logs, stolen from client), an attacker could potentially use the session to access backends with the victim's credentials. + +**Current state**: Sessions store identity (`auth.Identity`) but the RFC does not currently specify session binding to prevent hijacking. This is a **security gap** that should be addressed. -For initial implementation, we assume: -- Most backends use long-lived credentials (API keys, client certificates) -- Session TTLs (typically 30 minutes) are shorter than credential lifetimes -- Per-request credential resolution is future enhancement if needed +**Recommended mitigations**: + +**Phase 1 (required for production)**: +1. **Session binding to authentication token**: + - Store a cryptographic hash of the original authentication token (e.g., `SHA256(bearerToken)`) in the session during creation + - On each request, validate that the current auth token hash matches the session's bound token hash + - If mismatch, reject with "session authentication mismatch" error and terminate session + - This prevents stolen session IDs from being used with different credentials + +2. 
**TLS-only enforcement**: + - Require TLS for all vMCP connections (prevent session ID interception) + - Already enforced in production deployments, but document as requirement + +3. **Short session TTL**: + - Default 30-minute TTL limits exposure window for stolen session IDs + - Already specified in design, reinforces hijacking mitigation + +**Phase 2 (optional enhancements)**: +- **mTLS client certificate binding**: If using mTLS, bind session to client certificate fingerprint for stronger binding +- **IP address validation**: Optionally validate client IP hasn't changed (breaks mobile clients, proxies - use with caution) +- **Session rotation**: Periodically rotate session IDs (e.g., every 10 minutes) to limit stolen session lifetime + +**Implementation**: Add token hash binding in Phase 2 SessionManager implementation. Store `tokenHash` in session metadata, validate on each request via middleware. ### Data Protection @@ -560,6 +704,22 @@ For initial implementation, we assume: - Handlers should catch and handle "client closed" errors appropriately - Future Enhancement: Add reference counting to delay `Close()` until all in-flight requests complete +**Resource Exhaustion & DoS Protection**: + +⚠️ **Connection multiplication**: Session-scoped clients create N sessions × M backends = N×M backend connections. At scale (e.g., 500 sessions × 20 backends = 10,000 connections), this can lead to resource exhaustion or DoS. + +**Required mitigations**: +1. **Max concurrent sessions limit**: Implement configurable limit on active sessions (e.g., `TOOLHIVE_MAX_SESSIONS=1000`). Return clear error when limit reached: "maximum concurrent sessions exceeded, try again later" +2. **Per-client session limits**: Track sessions per client identity/IP, enforce per-client limits (e.g., 10 sessions per client) to prevent single client from exhausting resources +3. 
**Aggressive session TTL**: Default 30-minute TTL with idle timeout (e.g., 5 minutes of inactivity) to reclaim unused sessions faster +4. **Connection pooling consideration**: Future enhancement to share connections across sessions for same backend+identity combination (out of scope for this RFC but noted for future work) + +**Monitoring**: +- Expose metrics: `vmcp_active_sessions`, `vmcp_backend_connections_total`, `vmcp_sessions_rejected_total` +- Alert on: high session count, connection exhaustion, rejected sessions + +**Implementation**: Add max sessions limit in Phase 2 (SessionManager tracks active session count, rejects new sessions when limit reached). + ### Secrets Management **Storage and Retrieval**: Outgoing auth secrets are retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism and storage are identical. @@ -571,7 +731,20 @@ For initial implementation, we assume: - **Client cleanup**: `Session.Close()` closes all clients, allowing credential cleanup by MCP client library - **Existing protections**: MCP client library handles credential redaction in logs and error messages (inherited behavior) -**No change in security posture**: Moving from per-request clients (short-lived credentials in memory) to session-scoped clients (longer-lived credentials in memory) increases credential lifetime but does not change exposure—credentials were always in memory during client existence. Session TTL limits this window. 
+**Security trade-off**: Moving from per-request clients to session-scoped clients **significantly increases credential exposure window**: +- **Before**: Credentials in memory for milliseconds (duration of single request ~100-500ms) +- **After**: Credentials in memory for minutes (duration of session, typically 30 minutes) + +This trade-off is acceptable because: +- **Still in-process memory**: Credentials remain in process memory (not persisted to disk), same as before +- **Session TTL limits window**: Configurable TTL (default 30 minutes) bounds exposure duration +- **Performance benefit**: Eliminates per-request authentication overhead, enables stateful workflows +- **Mitigations available**: + - 401/403 detection with automatic client recreation reduces stale credential lifetime + - Can set shorter session TTLs for backends with sensitive credentials + - Future enhancement: Per-request credential resolution (modify `identityPropagatingRoundTripper`) + +This is a **deliberate trade-off** prioritizing performance and functionality while maintaining reasonable security boundaries. ### Audit Logging @@ -721,16 +894,24 @@ This implementation introduces new interfaces and gradually migrates from the cu **Implementation Details**: -The `defaultSession` implementation owns backend clients in an internal map. When `SessionFactory.MakeSession()` runs: -1. Create and initialize one MCP client per backend +The `defaultSession` implementation: +- **Embeds `transportsession.Session`** for metadata layer (ID, Type, Touch, GetData) +- **Owns backend clients** in an internal map (runtime layer) +- **Implements both interfaces**: storage-oriented `transportsession.Session` + behavior-oriented vMCP `Session` + +When `SessionFactory.MakeSession()` runs: +1. Create and initialize one MCP client per backend **in parallel** (with bounded concurrency, per-backend timeouts) 2. Pass clients to aggregator to discover capabilities from all backends -3. 
Return session with pre-initialized clients map (same clients used for discovery) +3. Create `transportsession.Session` for metadata layer (ID, timestamps) +4. Return behavior-oriented session that embeds transport session + owns clients map (same clients used for discovery) **Testing**: - Unit tests for `Session` interface methods (CallTool, ReadResource, Close) - Unit tests for `SessionFactory.MakeSession()` with mocked aggregator and client factory +- Test parallel client initialization with timeouts - Test partial backend initialization (some backends fail, session still created) - Test session closure closes all clients +- Test composition: verify session implements both `transportsession.Session` and behavior-oriented `Session` interfaces **Files Modified**: None (purely additive) @@ -739,7 +920,7 @@ The `defaultSession` implementation owns backend clients in an internal map. Whe **Goal**: Implement the new ideal code path using `SessionManager` + `SessionFactory` + `Session`, but hide it behind a `sessionManagementV2` config flag (defaults to disabled). Do not attempt to migrate/refactor existing code - write fresh ideal implementation. 
**New Files**: -- `pkg/vmcp/server/session_manager.go` - Implement `SessionManager` bridging domain (Session) to protocol (SDK) +- `pkg/vmcp/server/session_manager.go` - Implement `SessionManager` bridging domain (Session) to protocol (SDK), **delegates storage to `transportsession.Manager`** - `pkg/vmcp/config/config.go` - Add `sessionManagementV2` feature flag (defaults to `false`) **Feature Flag Integration**: @@ -747,8 +928,9 @@ Add conditional logic in server/hooks/middleware to toggle between old and new b ```go if config.SessionManagementV2 { - // New code path: Use SessionFactory + Session interface - sessionManager := NewSessionManager(sessionFactory) + // New code path: Use SessionFactory + Session interface + transportsession.Manager + localStorage := transportsession.NewLocalStorage() // or RedisStorage in future + sessionManager := NewSessionManager(sessionFactory, localStorage) server.WithSessionIdManager(sessionManager) // ... new hooks using sessionManager.GetAdaptedTools() } else { @@ -757,11 +939,12 @@ if config.SessionManagementV2 { } ``` -**Key methods in SessionManager** (see "Wiring into the MCP SDK" section in Detailed Design for full behavior): -- `Generate() string` - Creates empty session (two-phase creation pattern) -- `Terminate(sessionID) (bool, error)` - Loads session, calls `Close()`, removes from map -- `GetAdaptedTools(sessionID) []mcp.Tool` - Retrieves session, converts its tools to SDK format with handlers -- `CreateSession(ctx, identity, backends) (string, error)` - Called from `OnRegisterSession` hook to create fully-formed session via factory +**Key methods in SessionManager** (see "SessionManager Design" section in Detailed Design for full behavior): +- Uses `transportsession.Manager` internally (NOT a separate `sync.Map`) +- `Generate() string` - Creates empty session via `sessionStorage.AddSession()`, returns ID (two-phase creation pattern) +- `Terminate(sessionID) (bool, error)` - Loads session via `sessionStorage.Get()`, 
calls `Close()`, removes via `sessionStorage.Delete()` +- `GetAdaptedTools(sessionID) []mcp.Tool` - Loads session via `sessionStorage.Get()`, converts tools to SDK format with handlers +- `CreateSession(ctx, identity, backends) (string, error)` - Called from `OnRegisterSession` hook, creates session via factory, stores via `sessionStorage.AddSession()` **Testing**: - Integration tests with feature flag enabled: Create session via `SessionManager.Generate()`, verify session stored @@ -802,6 +985,7 @@ if config.SessionManagementV2 { - `pkg/vmcp/server/server.go` - Remove old code path and feature flag conditionals - `pkg/vmcp/discovery/middleware.go` - Delete (replaced by SessionFactory) - `pkg/vmcp/client/client.go` - Remove `httpBackendClient` (replaced by Session ownership) +- `pkg/transport/session/manager.go` - Update `DeleteExpired()` to call `session.Close()` before removing from storage (fixes resource leak) - Delete old `VMCPSession` implementation files **Rationale**: Once the new code path is validated, delete the old code entirely rather than maintaining both paths. This avoids technical debt and ongoing maintenance burden. @@ -932,20 +1116,27 @@ Most major design questions have been resolved during RFC review. One architectu - Individual sessions don't maintain circuit breaker state - they call through to a shared health monitor service - When a tool call fails, session delegates error reporting to the health monitor which updates shared circuit breaker state - Existing health monitoring system (`pkg/vmcp/health`) continues tracking backend health independently of sessions - - **Open design questions**: - - Should sessions created before a backend is circuit-broken continue using that backend's client, or should they dynamically check health state on each call? - - How to handle mid-session backend failures? 
(Current mitigation: session storage extends TTL, short TTL limits exposure) - - **Recommendation**: Defer detailed design to Phase 1-2 implementation, leverage existing health monitoring patterns where possible + - **Recommended approach for mid-session backend failures**: + - **Track backend health in session state**: When a tool call to a backend fails (connection error, timeout, etc.), mark that backend as unhealthy in session-local state + - **Return clean errors**: If a backend is marked unhealthy, subsequent calls to that backend return "backend unavailable" error immediately without attempting connection + - **Lazy reconnection**: On tool call to an unhealthy backend, check global health monitor to see if backend has recovered. If recovered, attempt to recreate client lazily and mark backend healthy again + - This avoids checking global health state on every call (performance) while allowing sessions to recover from transient failures + - **Implementation details**: Defer to Phase 1-2 implementation, leverage existing health monitoring patterns where possible ## References - [MCP Specification](https://modelcontextprotocol.io/specification/2025-06-18) - Model Context Protocol specification - [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) - Original issue: Per-request client creation overhead +- [toolhive#2417](https://github.com/stacklok/toolhive/issues/2417) - Session storage architecture and dual-layer model (metadata vs runtime state) - [toolhive#2852](https://github.com/stacklok/toolhive/issues/2852) - Missing integration tests below server level - [mark3labs/mcp-go SDK](https://github.com/mark3labs/mcp-go) - MCP Go SDK used for backend clients - [Virtual MCP Architecture Documentation](https://github.com/stacklok/toolhive/blob/main/docs/arch/10-virtual-mcp-architecture.md) - Current vMCP architecture -- Related PRs in ToolHive: - - [toolhive#1989](https://github.com/stacklok/toolhive/pull/1989) - Session management infrastructure 
(provides storage and cleanup hooks) +- **Session Storage Architecture PRs** (existing pluggable storage this RFC builds on): + - [toolhive#1677](https://github.com/stacklok/toolhive/pull/1677) - Foundation for pluggable session storage + - [toolhive#1770](https://github.com/stacklok/toolhive/pull/1770) - Session storage layer enhancements + - [toolhive#1771](https://github.com/stacklok/toolhive/pull/1771) - Additional session storage improvements + - [toolhive#1989](https://github.com/stacklok/toolhive/pull/1989) - Session management infrastructure with `Storage` interface and `Manager` for Redis/Valkey support +- **vMCP Feature PRs**: - [toolhive#3517](https://github.com/stacklok/toolhive/pull/3517) - Optimizer-in-vMCP implementation (motivates Session interface design) - [toolhive#3312](https://github.com/stacklok/toolhive/pull/3312) - Optimizer-in-vMCP changes to server.go - [toolhive#3642](https://github.com/stacklok/toolhive/pull/3642) - Discussion on session capability refresh (simplified by Session encapsulation) From 1f73b30c35bf0e77da34a2bfc0e3c7accbdd9455 Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Thu, 12 Feb 2026 11:05:49 +0100 Subject: [PATCH 7/8] fixes from review --- ...HV-0038-session-scoped-client-lifecycle.md | 372 +++++++----------- 1 file changed, 133 insertions(+), 239 deletions(-) diff --git a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md index 41a8963..f57fa03 100644 --- a/rfcs/THV-0038-session-scoped-client-lifecycle.md +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -3,13 +3,13 @@ - **Status**: Draft - **Author(s)**: @yrobla, @jerm-dro - **Created**: 2026-02-04 -- **Last Updated**: 2026-02-09 +- **Last Updated**: 2026-02-12 - **Target Repository**: toolhive - **Related Issues**: [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) ## Summary -Refactor vMCP session architecture to encapsulate behavior in well-defined interfaces (`Session` and 
`SessionFactory`), decoupling session creation (domain logic) from SDK integration (protocol concerns). +Refactor vMCP session architecture around three core components: `Session` (domain object with behavior), `SessionFactory` (creates fully-formed sessions), and `SessionManager` (bridges domain logic to MCP SDK protocol). This separates session creation (domain logic) from SDK integration (protocol concerns) and integrates with the existing pluggable session storage architecture. Restructuring the session management enables: @@ -156,47 +156,54 @@ This makes testing difficult (can't create sessions without full server) and add ### Proposed Design -This RFC proposes restructuring around two key interfaces: +This RFC proposes restructuring around three core components: -1. **SessionFactory**: Creates fully-formed sessions with all dependencies (capability discovery, client initialization, resource allocation) -2. **Session**: A domain object that owns its resources, encapsulates routing logic, and manages its own lifecycle +1. **Session**: A domain object that owns its resources (clients), encapsulates routing logic, and manages its own lifecycle +2. **SessionFactory**: Creates fully-formed sessions with all dependencies (capability discovery, client initialization, resource allocation) +3. **SessionManager**: Bridges domain logic (Session/SessionFactory) to MCP SDK protocol, using the existing `transportsession.Manager` for storage -**Proposed session creation flow:** +**Proposed session creation flow** (two-phase pattern due to SDK constraints): ```mermaid sequenceDiagram participant Client + participant SDK as MCP SDK participant SessionMgr as SessionManager participant Factory as SessionFactory participant Aggregator participant Session - participant SDK as MCP SDK - Client->>SessionMgr: POST /mcp/initialize + Client->>SDK: POST /mcp/initialize + Note over SDK: SDK handles initialize,
needs session ID + SDK->>SessionMgr: Generate() // No context! + Note over SessionMgr: Phase 1: Create empty
session placeholder + SessionMgr->>SessionMgr: Store empty session by ID + SessionMgr-->>SDK: sessionID + SDK->>SessionMgr: OnRegisterSession(sessionID, ctx) + Note over SessionMgr: Phase 2: Now we have context!
Populate the session SessionMgr->>Factory: MakeSession(ctx, identity, backends) - Factory->>Aggregator: AggregateCapabilities(ctx, backends) - Aggregator-->>Factory: {Capabilities, Clients} + Factory->>Factory: Create clients for each backend + Factory->>Aggregator: AggregateCapabilities(ctx, clients) + Aggregator-->>Factory: Capabilities (tools, resources, routing) Factory->>Session: new Session(capabilities, clients) Session-->>Factory: session Factory-->>SessionMgr: session - SessionMgr->>SessionMgr: Store session by ID - SessionMgr->>SDK: Wire session to SDK - SDK->>SessionMgr: Generate() // Returns session ID - SessionMgr-->>SDK: sessionID - SDK->>SessionMgr: OnRegisterSession(sessionID) + SessionMgr->>SessionMgr: Replace placeholder with real session SessionMgr->>Session: Tools(), Resources() Session-->>SessionMgr: capabilities SessionMgr->>SDK: AddSessionTools(sessionID, tools) SDK-->>Client: InitializeResult with Mcp-Session-Id header ``` -**Key Flow:** -1. Initialize request triggers `SessionManager` to create session via `SessionFactory` -2. SessionFactory creates clients, passes them to aggregator for capability discovery -3. SessionFactory returns fully-formed session (capabilities + pre-initialized clients) -4. Session stored in SessionManager before SDK registration -5. SDK calls `Generate()` which returns the session ID -6. SDK registers session's tools/resources via `OnRegisterSession` hook +**Key Flow (two-phase creation):** +1. SDK receives initialize request, needs session ID +2. **Phase 1**: SDK calls `Generate()` (no context) → SessionManager creates empty session placeholder, stores it, returns session ID +3. **Phase 2**: SDK calls `OnRegisterSession(sessionID, ctx)` (has context) → SessionManager calls `SessionFactory.MakeSession()` to create fully-formed session (capability discovery, client initialization) +4. SessionManager replaces empty placeholder with fully-formed session +5. 
SDK registers session's tools/resources via `AddSessionTools()` +6. Client receives InitializeResult with session ID header + +**Why two phases?** The SDK's `Generate()` has no context parameter (see "SDK Interface Constraint" section below), so we cannot create fully-formed sessions until `OnRegisterSession` hook provides request context. ### Terminology Clarification @@ -291,13 +298,15 @@ type Session interface { - **Manageable lifetime**: `Close()` cleans up clients and any other resources. The SessionManager/caller is decoupled from what exactly happens on close() - **Thread-safe**: All methods protected by internal RWMutex for concurrent access to prevent data races on internal state. Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. Note: The RWMutex prevents data races but does not prevent `CallTool()` from using a closed client after `Close()` completes—the session implementation must track closed state explicitly (see "Client Closed Mid-Request" in Error Handling). Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session -**Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). This RFC builds on the existing pluggable session storage architecture ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)) which provides `Storage` interface and `Manager` for Redis/Valkey support. The architecture uses a **dual-layer model** (detailed in section "Session Architecture and Serializability"): +**Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). 
This RFC builds on the existing pluggable session storage architecture ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)) which provides `Storage` interface and `Manager` for Redis/Valkey support. + +**Dual-layer architecture** (detailed in section "Session Architecture and Serializability"): - **Metadata layer** (serializable): Session ID, timestamps, identity reference - stored via `transportsession.Manager` + `Storage` interface (supports `LocalStorage` today, `RedisStorage` in future) - **Runtime layer** (in-memory only): MCP client objects, routing table, capabilities - cannot be serialized due to TCP connections and goroutines -**Important notes**: -- The existing `Storage.DeleteExpired()` only removes from storage without calling `Close()`, which would leak backend client connections. As part of this RFC, the session storage layer must be updated to call `session.Close()` before removing expired sessions from storage. -- **Horizontal scaling**: Session metadata can be in Redis/Valkey, but MCP clients are in-process only. Horizontal scaling requires sticky sessions (session affinity at load balancer) to ensure requests route to the instance holding the client objects. Without sticky sessions, clients must be recreated on instance switch (acceptable one-time cost per [Issue #2417](https://github.com/stacklok/toolhive/issues/2417)). +**Lifecycle management**: Sessions are stored via `transportsession.Manager.AddSession()` and expire based on configured TTL. Proper cleanup requires calling `session.Close()` to release backend client connections before removing from storage. The existing `Storage.DeleteExpired()` implementation only removes metadata without calling `Close()`, which would leak connections. This RFC updates the storage layer to call `Close()` before deletion, ensuring complete resource cleanup (see Implementation Plan Phase 3). + +**Horizontal scaling**: Session metadata can be in Redis/Valkey, but MCP clients are in-process only. 
Horizontal scaling requires sticky sessions (session affinity at load balancer) to ensure requests route to the instance holding the client objects. Without sticky sessions, clients must be recreated on instance switch (acceptable one-time cost per [Issue #2417](https://github.com/stacklok/toolhive/issues/2417)). **SessionFactory Interface** - Creates fully-formed sessions: @@ -328,7 +337,7 @@ The default factory implementation follows this pattern: 1. **Create MCP clients**: Initialize clients for each backend **in parallel** - Factory creates one client per backend using existing client factory - - **Performance requirement**: Use parallel initialization (e.g., `errgroup` with bounded concurrency) to avoid sequential latency accumulation. With 20 backends at 100-500ms each, sequential would mean 2-10 seconds of session creation latency. + - **Performance requirement**: Use parallel initialization (e.g., `errgroup` with bounded concurrency) to avoid sequential latency accumulation. Connection initialization (TCP handshake + TLS negotiation + MCP protocol handshake) can take tens to hundreds of milliseconds per backend depending on network latency and backend responsiveness. With 20 backends, sequential initialization could easily exceed acceptable session creation latency. - **Bounded concurrency**: Limit parallel goroutines (e.g., 10 concurrent initializations) to avoid resource exhaustion - **Per-backend timeout**: Apply context timeout (e.g., 5s per backend) so one slow backend doesn't block session creation - **Partial initialization**: If some backends fail, log warnings and continue with successfully initialized backends (failed backends not added to clients map) @@ -359,7 +368,7 @@ AggregateCapabilities(ctx context.Context, clients map[string]*Client) (*Aggrega **Rationale**: Separates client lifecycle (factory's concern) from capability discovery (aggregator's concern). 
The factory creates clients once, passes them to aggregator for discovery, then passes same clients to session for reuse. -**Performance impact**: Reusing discovery clients eliminates redundant connection setup (TCP handshake + TLS negotiation + MCP initialization), saving ~100-500ms per backend depending on network conditions. +**Performance impact**: Reusing discovery clients eliminates redundant connection setup (TCP handshake + TLS negotiation + MCP initialization), avoiding connection overhead on every request (this overhead varies significantly based on network latency, TLS configuration, and backend responsiveness). #### 3. Session Implementation @@ -382,83 +391,49 @@ The default session implementation stores: **Thread safety**: Read operations (CallTool, ReadResource) use read locks; Close uses write lock. -#### 4. Session Architecture and Serializability +**Health monitoring integration**: Sessions interact with the existing global health monitor (`pkg/vmcp/health`) to handle backend failures: +- **Session creation**: `SessionFactory.MakeSession()` consults health monitor to exclude unhealthy/circuit-broken backends (don't create clients for known-bad backends) +- **Mid-session failures**: When `CallTool()` fails (connection error, timeout), mark backend unhealthy in session-local state and delegate error reporting to global health monitor +- **Recovery**: On subsequent calls to unhealthy backend, check global health monitor; if recovered, lazily recreate client +- **Design approach**: Global health monitor maintains shared circuit breaker state across sessions. Individual sessions track local backend health (unhealthy flag) to avoid redundant connection attempts while allowing recovery when backends come back online. Implementation details deferred to Phase 1-2. 
-**Dual-Layer Session Model**: This RFC builds on the existing session storage architecture introduced in [PR #1989](https://github.com/stacklok/toolhive/pull/1989) and [Issue #2417](https://github.com/stacklok/toolhive/issues/2417), which established a pluggable `Storage` interface designed for Redis/Valkey externalization. The session architecture follows a **two-layer model**: +#### 4. Storage Integration and Distributed Deployment -**Layer 1: Metadata (Serializable)** - Managed by `transportsession.Manager` + `Storage` interface: -- Session ID (string) -- Session type (string) -- Creation timestamp -- Last access timestamp (TTL tracking) -- User identity reference -- Backend workload IDs list -- **Stored via**: `pkg/transport/session/storage.Storage` interface (`Store`, `Load`, `Delete`, `DeleteExpired`, `Close`) -- **Backends**: `LocalStorage` (in-memory `sync.Map`) today, `RedisStorage` in future -- **Serialization**: JSON format already implemented in `pkg/transport/session/serialization.go` +**Building on existing architecture**: This RFC integrates with the existing pluggable session storage layer ([PR #1989](https://github.com/stacklok/toolhive/pull/1989), [Issue #2417](https://github.com/stacklok/toolhive/issues/2417)). 
The existing architecture provides: +- `Storage` interface with pluggable backends (`LocalStorage` today, `RedisStorage` in future) +- `transportsession.Manager` for session lifecycle management +- JSON serialization for session metadata -**Layer 2: Runtime State (In-Memory Only)** - Managed by behavior-oriented `Session`: -- Pre-initialized MCP client objects (contain TCP connections, goroutines) -- Routing table (tool/resource name → backend workload ID mapping) -- Discovered capabilities (tools, resources, prompts) -- Closed state flag -- **Cannot be serialized**: MCP clients contain active TCP connections and goroutine state that cannot be persisted +**The fundamental constraint**: MCP clients contain active TCP connections and goroutine state that **cannot be serialized**. This is a technical limitation, not an implementation choice. You cannot persist a live TCP connection to Redis. -**Composition Model**: How the two layers relate: +**The solution: Dual-layer model** -```go -// Existing storage-oriented session interface (pkg/transport/session/session.go) -type Session interface { - ID() string - Type() string - Touch() time.Time - GetData() any -} - -// New behavior-oriented vMCP session implementation -type vmcpSession struct { - // Metadata layer - embeds transportsession.Session - transportSession transportsession.Session // provides ID(), Type(), Touch(), GetData() - - // Runtime layer - behavior and clients - identity *auth.Identity - routingTable map[string]string // tool name → backend ID - capabilities []Tool - clients map[string]*mcp.Client // backend ID → MCP client - closed bool - mu sync.RWMutex -} +Sessions are split into two layers with different lifecycles: -// Implements both interfaces -func (s *vmcpSession) ID() string { return s.transportSession.ID() } -func (s *vmcpSession) Type() string { return s.transportSession.Type() } -func (s *vmcpSession) Touch() time.Time { return s.transportSession.Touch() } -func (s *vmcpSession) GetData() any 
{ return s.transportSession.GetData() } - -func (s *vmcpSession) CallTool(ctx, name, args) (*ToolResult, error) { /* behavior */ } -// ... other behavior methods -``` +| Layer | Contents | Storage | Lifetime | +|-------|----------|---------|----------| +| **Metadata** | Session ID, timestamps, identity reference, backend IDs list | Serializable to Redis/Valkey via `Storage` interface | Persists across vMCP restarts, shared across instances | +| **Runtime** | MCP client objects, routing table, capabilities, closed flag | In-process memory only | Lives only while vMCP instance is running | -**Integration with `transportsession.Manager`**: The behavior-oriented `Session` is stored in `transportsession.Manager` (which uses the pluggable `Storage` interface), NOT in a separate `sync.Map`. This ensures: -- Redis/Valkey support works automatically when `RedisStorage` is implemented -- No parallel storage path that bypasses the existing architecture -- Consistent TTL management and expiration across all session types +**How it works**: +- The behavior-oriented `Session` embeds `transportsession.Session` (metadata layer) and owns MCP clients (runtime layer) +- Sessions are stored via `transportsession.Manager.AddSession()` which uses the pluggable `Storage` interface +- When `RedisStorage` is implemented, metadata automatically persists to Redis, clients remain in-memory +- No parallel storage path—all sessions go through the same `Storage` interface -**Distributed Deployment Considerations** ([Issue #2417](https://github.com/stacklok/toolhive/issues/2417)): +**Distributed deployment implications**: -1. 
**With sticky sessions (recommended)**: - - Load balancer routes client to same vMCP instance for session lifetime - - Session metadata in Redis (shared), clients in-memory (per-instance) - - Optimal performance: clients reused across requests +With **sticky sessions** (recommended): +- Load balancer routes client to same vMCP instance → clients stay warm, optimal performance +- Session metadata in Redis (shared), clients in-memory (per-instance) -2. **Without sticky sessions (graceful degradation)**: - - Request may route to different vMCP instance - - New instance loads session metadata from Redis - - **Clients must be recreated** (one-time initialization cost ~100-500ms per backend) - - Subsequent requests to same instance reuse clients - - Acceptable trade-off: horizontal scaling with temporary perf hit on instance switch +Without sticky sessions (graceful degradation): +- Request routes to different vMCP instance → that instance loads metadata from Redis +- **Clients must be recreated** (connection initialization overhead: TCP + TLS + MCP handshake, varies with network conditions) +- Subsequent requests to same instance reuse recreated clients +- Trade-off: horizontal scaling flexibility vs temporary performance impact on instance switch -**Serializability constraints**: Only session metadata can be stored in Redis/Valkey. Runtime state (MCP clients) cannot be serialized due to TCP connections and goroutines. This is a **fundamental technical limitation**, not an implementation choice. +**Implementation detail**: See Phase 1 in Implementation Plan for how `vmcpSession` struct composes `transportsession.Session` (metadata) with runtime state (clients map, routing table). #### 5. Wiring into the MCP SDK @@ -498,61 +473,50 @@ This hybrid approach: ##### SessionManager Design -**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic (behavior-oriented `Session`) to SDK protocol. 
It **delegates session storage to the existing `transportsession.Manager`**, which uses the pluggable `Storage` interface ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)). +**SessionManager** (`pkg/vmcp/server/session_manager.go`) bridges domain logic to SDK protocol, coordinating three components: +1. `SessionFactory` - Creates fully-formed sessions with initialized clients +2. `transportsession.Manager` - Handles session storage via pluggable `Storage` interface ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)) +3. MCP SDK's `SessionIdManager` - Lifecycle hooks for session creation/termination + +**Key insight**: By delegating storage to `transportsession.Manager`, SessionManager automatically supports pluggable backends (LocalStorage today, RedisStorage in future) without creating a parallel storage path. + +**Session lifecycle flow**: + +``` +Client Request + ↓ +Generate() → Create empty placeholder → sessionStorage.AddSession() → Return session ID + ↓ +OnRegisterSession hook → factory.MakeSession() → Wrap in transportsession.Session → sessionStorage.AddSession() → SDK registers tools + ↓ +CallTool request → sessionStorage.Get(sessionID) → Extract behavior session → session.CallTool() → Return result + ↓ +Terminate() → sessionStorage.Get() → session.Close() → sessionStorage.Delete() +``` -**Architecture**: +**Implementation structure**: ```go type sessionManager struct { - factory SessionFactory - sessionStorage *transportsession.Manager // Uses Storage interface + factory SessionFactory + sessionStorage *transportsession.Manager // Delegates to Storage interface } -func NewSessionManager(factory SessionFactory, storage transportsession.Storage) *sessionManager { - return &sessionManager{ - factory: factory, - sessionStorage: transportsession.NewManagerWithStorage(storage), - } -} +// Constructor accepts pluggable storage backend +func NewSessionManager(factory SessionFactory, storage transportsession.Storage) *sessionManager ``` -**Key 
responsibilities:** - -1. **Session creation**: `CreateSession(ctx, identity, backends)` - Used by discovery middleware - - Calls `factory.MakeSession()` to build fully-formed behavior-oriented session - - Wraps session in `transportsession.Session` (metadata layer) - - Stores via `sessionStorage.AddSession()` (uses `Storage` interface) - - Returns session ID (or error if creation fails) - -2. **SDK lifecycle** (implements `SessionIdManager`): - - `Generate() string` - Creates empty session placeholder, stores via `sessionStorage.AddSession()`, returns session ID (phase 1 of two-phase creation) - - `Validate(sessionID) (bool, error)` - Checks if session exists via `sessionStorage.Get(sessionID)` - - `Terminate(sessionID) (bool, error)` - Loads session via `sessionStorage.Get()`, calls `Close()` on behavior-oriented session, removes via `sessionStorage.Delete()` - -3. **SDK adaptation**: - - `GetAdaptedTools(sessionID)` - Loads session via `sessionStorage.Get()`, converts tools to SDK format - - Creates tool handlers that delegate to `session.CallTool()` - - Wraps results in SDK types (`mcp.ToolResult`) - -**Why use `transportsession.Manager` instead of `sync.Map`?** -- **Pluggable storage**: Automatically supports `LocalStorage` (in-memory) and future `RedisStorage` (distributed) without code changes -- **No parallel storage path**: All sessions go through the same `Storage` interface, avoiding architectural debt -- **Consistent TTL**: `sessionStorage.Get()` automatically extends TTL via `Touch()` (existing behavior) -- **Expiration cleanup**: `sessionStorage.DeleteExpired()` (updated to call `session.Close()` before deletion) works uniformly - -**Handler pattern**: Tool handlers are closures that: -- Load session from `sessionStorage` by ID -- Extract behavior-oriented session from metadata layer (`GetData()`) -- Validate request arguments (type assertions) -- Call `session.CallTool()` with validated args -- Convert domain result to SDK format -- Return 
MCP-formatted response - -**SDK Registration Flow**: -1. Construct SessionManager with factory and storage: `sessionManager := NewSessionManager(sessionFactory, localStorage)` -2. Pass SessionManager to SDK: `WithSessionIdManager(sessionManager)` -3. In `OnRegisterSession` hook: register tools via `sessionManager.GetAdaptedTools(sessionID)` - -**Redis/Valkey support**: When `RedisStorage` is implemented, just pass it to `NewSessionManager()` instead of `LocalStorage`. Session metadata goes to Redis, runtime state (clients) remains in-memory. No other changes needed. +**Core operations** (implements SDK's `SessionIdManager`): +- `Generate()` - Creates empty session, stores via `sessionStorage.AddSession()`, returns ID (phase 1 of two-phase pattern) +- `CreateSession(ctx, identity, backends)` - Calls `factory.MakeSession()`, stores fully-formed session (phase 2, called by `OnRegisterSession` hook) +- `Validate(sessionID)` - Checks session exists via `sessionStorage.Get()` +- `Terminate(sessionID)` - Loads session, calls `Close()` on clients, removes via `sessionStorage.Delete()` +- `GetAdaptedTools(sessionID)` - Loads session, converts domain types to SDK format, creates handlers that delegate to `session.CallTool()` + +**Benefits of using `transportsession.Manager`**: +- Pluggable storage works automatically (just pass `RedisStorage` instead of `LocalStorage` to constructor) +- TTL management handled by existing `sessionStorage.Get()` which calls `Touch()` +- No parallel storage path—consistent with all other session types +- Expiration cleanup works uniformly (updated to call `session.Close()` before deletion) #### 5. 
Migration from Current Architecture @@ -611,11 +575,21 @@ If a tool call fails after successful session creation: Race condition exists: session may be terminated while a request is in flight: - Session `Close()` closes all backend clients -- **Race scenario**: RWMutex serializes access but doesn't prevent using closed clients—`CallTool()` can acquire a read lock after `Close()` completes and attempt to use a closed client -- In-flight requests receive "client closed" error from MCP library -- **Required mitigation**: Session implementation must track a `closed` flag; `CallTool()`, `ReadResource()`, and `GetPrompt()` check this flag under read lock before using clients, returning "session closed" error if true -- **Additional mitigation**: Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing race window -- Future enhancement: Add reference counting to delay `Close()` until in-flight requests complete +- **Race scenario**: RWMutex serializes access but doesn't fully prevent using closed clients—even with a `closed` flag check, there's a window between checking the flag (under read lock) and actually invoking the client method where `Close()` could complete +- **Timing issue**: Thread A checks `closed == false` → releases read lock → Thread B calls `Close()` (acquires write lock, closes clients, sets `closed = true`) → Thread A invokes client method on closed client +- In-flight requests receive "client closed" error from MCP library when this race occurs + +**Required mitigation (Phase 1 implementation)**: +- **Reference counting**: Session tracks in-flight operation count with atomic operations: + - `CallTool()`, `ReadResource()`, `GetPrompt()` atomically increment counter before using client, decrement after completion + - `Close()` sets `closed` flag, waits for counter to reach zero (with timeout, e.g., 30s), then closes clients + - Operations check `closed` flag with atomic load; if true, return "session 
closed" error without incrementing counter + - This eliminates the race window by ensuring `Close()` waits for all in-flight operations +- **Graceful degradation**: If `Close()` timeout expires with operations still in-flight, log warning and close clients anyway (in-flight operations get "client closed" errors) +- **Alternative approach**: Use context cancellation—`Close()` cancels session context, operations detect cancellation and abort before using clients + +**Additional mitigation**: +- Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing likelihood of expiration during active use **Session Not Found**: @@ -647,16 +621,17 @@ No new security boundaries are introduced. This is a refactoring of existing ses **Mitigation strategies**: **Phase 1 (initial implementation)**: -1. **Automatic client recreation on auth failure**: Detect 401/403 errors from backend calls, automatically recreate client with fresh credentials from `OutgoingAuthRegistry`, retry the operation. This handles token expiration gracefully without requiring new sessions. +1. **Automatic client recreation on auth failure**: Detect 401/403 errors from backend calls, automatically recreate client with fresh credentials from `OutgoingAuthRegistry`, retry the operation with safeguards: + - **Retry limit**: Maximum 3 attempts per operation to prevent infinite retry loops + - **Exponential backoff**: Initial 100ms delay, doubling on each retry (100ms, 200ms, 400ms) to avoid overwhelming backends + - **Circuit breaker integration**: Report auth failures to existing health monitor; if backend exceeds threshold (e.g., 5 failures in 60s), open circuit to prevent cascading failures + - **Audit logging**: Log each retry attempt with session ID, backend ID, and attempt number for debugging + - This handles token expiration gracefully without requiring new sessions while preventing resource exhaustion from misconfigured backends 2. 
**Session TTL alignment**: Set session TTL (typically 30 minutes) shorter than expected credential lifetime to reduce stale credential exposure. **Recommended approach (Phase 2 enhancement)**: - **Modify `identityPropagatingRoundTripper`** (`pkg/vmcp/client/client.go`): Instead of capturing identity/credentials at client creation time, read identity from the request context on each operation. This ensures each backend call picks up the latest credentials dynamically, eliminating stale credential issues entirely. -**Alternative approaches** (for reference): -- **Manual token refresh**: Use `mcp-go`'s `OAuthHandler.RefreshToken()` to refresh OAuth tokens when they expire (note: mcp-go provides refresh capability but does not do it automatically - requires explicit calls) -- **Per-request header injection**: Resolve credentials fresh for each request, inject into MCP client headers - For initial implementation, we assume most backends use long-lived credentials (API keys, client certificates), and Phase 1's automatic recreation on auth failure is sufficient. ### Session Hijacking Prevention @@ -667,7 +642,7 @@ For initial implementation, we assume most backends use long-lived credentials ( **Recommended mitigations**: -**Phase 1 (required for production)**: +**Required for production deployment**: 1. 
**Session binding to authentication token**: - Store a cryptographic hash of the original authentication token (e.g., `SHA256(bearerToken)`) in the session during creation - On each request, validate that the current auth token hash matches the session's bound token hash @@ -682,12 +657,12 @@ For initial implementation, we assume most backends use long-lived credentials ( - Default 30-minute TTL limits exposure window for stolen session IDs - Already specified in design, reinforces hijacking mitigation -**Phase 2 (optional enhancements)**: +**Optional enhancements**: - **mTLS client certificate binding**: If using mTLS, bind session to client certificate fingerprint for stronger binding - **IP address validation**: Optionally validate client IP hasn't changed (breaks mobile clients, proxies - use with caution) - **Session rotation**: Periodically rotate session IDs (e.g., every 10 minutes) to limit stolen session lifetime -**Implementation**: Add token hash binding in Phase 2 SessionManager implementation. Store `tokenHash` in session metadata, validate on each request via middleware. +**Implementation timing**: Token hash binding should be implemented during Implementation Plan Phase 2 (SessionManager implementation) before production rollout. Store `tokenHash` in session metadata, validate on each request via middleware. 
### Data Protection @@ -732,7 +707,7 @@ For initial implementation, we assume most backends use long-lived credentials ( - **Existing protections**: MCP client library handles credential redaction in logs and error messages (inherited behavior) **Security trade-off**: Moving from per-request clients to session-scoped clients **significantly increases credential exposure window**: -- **Before**: Credentials in memory for milliseconds (duration of single request ~100-500ms) +- **Before**: Credentials in memory for milliseconds to seconds (duration of single request) - **After**: Credentials in memory for minutes (duration of session, typically 30 minutes) This trade-off is acceptable because: @@ -760,56 +735,6 @@ This is a **deliberate trade-off** prioritizing performance and functionality wh **Existing Events Unchanged**: Tool call logs, authentication logs, and session lifecycle logs remain the same. -## Alternatives Considered - -### Alternative 1: Keep Per-Request Pattern with Connection Pooling - -**Approach**: Continue creating/closing clients per request but add a connection pool underneath to reuse TCP connections. - -**Pros**: -- Minimal changes to existing code -- Reduces TCP handshake overhead - -**Cons**: -- Doesn't address MCP protocol initialization overhead (still happens per request) -- Doesn't solve state preservation problem (each request still gets fresh MCP client) -- Authentication still resolved and validated per request -- Adds complexity at wrong layer - -**Decision**: Rejected. This addresses symptoms but not the root cause. - -### Alternative 2: Lazy Client Creation on First Use - -**Approach**: Create clients on first tool call to a backend, then store in session for reuse. 
- -**Pros**: -- Avoids creating clients for unused backends -- Delays initialization until needed - -**Cons**: -- First tool call to each backend still has initialization latency -- More complex state management (need to track which clients exist) -- Doesn't leverage existing capability discovery phase -- Inconsistent performance (first vs subsequent calls) - -**Decision**: Rejected. Complexity outweighs benefits. Capability discovery already knows which backends exist. - -### Alternative 3: Global Client Cache Across Sessions - -**Approach**: Share clients across all sessions using a global cache, keyed by backend + auth identity. - -**Pros**: -- Maximum connection reuse -- Lowest initialization overhead - -**Cons**: -- **State Pollution**: Backend state (Playwright contexts, DB transactions) would leak across sessions -- **Security Risk**: Potential for cross-session data exposure if keying is incorrect -- **Complexity**: Requires complex cache eviction, client sanitization, and identity tracking -- **Violates Session Isolation**: Sessions should be independent - -**Decision**: Rejected. Session isolation is a core requirement for vMCP. The security and complexity risks outweigh performance gains. - ## Compatibility ### Backward Compatibility @@ -1091,41 +1016,9 @@ Each phase is independently testable and can be rolled back: - Traces: Session creation span → first tool call span (shows reduced latency) - Add runbook for investigating client initialization failures -## Open Questions - -Most major design questions have been resolved during RFC review. One architectural question requires design during implementation: - -1. ~~Should clients be created eagerly or lazily?~~ → **Resolved: Eager initialization during `SessionFactory.MakeSession()`** -2. ~~How should session concerns be organized?~~ → **Resolved: Encapsulate in `Session` interface (domain object), separate from SDK wiring (`SessionManager`)** -3. 
~~What happens if backend client initialization fails during session creation?~~ → **Resolved: Log warning, continue with partial initialization, failed backends not in clients map** -4. ~~Should we share clients across sessions for efficiency?~~ → **Resolved: No, session isolation is critical (prevents state leakage, security risks)** -5. ~~How to handle short-lived credentials with session-scoped clients?~~ → **Resolved: Document limitation, provide mitigation strategies (per-request header injection, token refresh hooks), assume most backends use long-lived credentials for initial implementation** -6. ~~How should client closure be triggered?~~ → **Resolved: Session owns clients, `Session.Close()` called on expiration/termination via `SessionManager.Terminate()`** -7. ~~How to work around SDK `Generate()` having no context parameter?~~ → **Resolved: Use two-phase creation pattern** - - SDK's `SessionIdManager.Generate()` has no context parameter - - Solution: Two-phase pattern where `Generate()` creates empty session, then `OnRegisterSession` hook calls `SessionFactory.MakeSession()` to populate it - - Works with SDK as-is, no unsafe patterns (e.g., goroutine-local storage) - - See "SDK Interface Constraint" section in Detailed Design for details - -8. 
**How does this architecture work with inter-session state like circuit breakers and health checks?** → **Requires design during Phase 1-2 implementation** - - **Current state**: Circuit breaker and health monitoring exist at backend level (across all sessions) in `pkg/vmcp/health` - - **Challenge**: Clients are now session-scoped, but circuit breaker state must be shared across sessions - - **Proposed approach**: - - Circuit breaker and health check state remains at **backend/aggregator level** (inter-session, shared state) - - During `SessionFactory.MakeSession()`, consult health monitor to exclude unhealthy/circuit-broken backends - - Individual sessions don't maintain circuit breaker state - they call through to a shared health monitor service - - When a tool call fails, session delegates error reporting to the health monitor which updates shared circuit breaker state - - Existing health monitoring system (`pkg/vmcp/health`) continues tracking backend health independently of sessions - - **Recommended approach for mid-session backend failures**: - - **Track backend health in session state**: When a tool call to a backend fails (connection error, timeout, etc.), mark that backend as unhealthy in session-local state - - **Return clean errors**: If a backend is marked unhealthy, subsequent calls to that backend return "backend unavailable" error immediately without attempting connection - - **Lazy reconnection**: On tool call to an unhealthy backend, check global health monitor to see if backend has recovered. 
If recovered, attempt to recreate client lazily and mark backend healthy again - - This avoids checking global health state on every call (performance) while allowing sessions to recover from transient failures - - **Implementation details**: Defer to Phase 1-2 implementation, leverage existing health monitoring patterns where possible - ## References -- [MCP Specification](https://modelcontextprotocol.io/specification/2025-06-18) - Model Context Protocol specification +- [MCP Specification](https://modelcontextprotocol.io/specification) - Model Context Protocol specification - [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) - Original issue: Per-request client creation overhead - [toolhive#2417](https://github.com/stacklok/toolhive/issues/2417) - Session storage architecture and dual-layer model (metadata vs runtime state) - [toolhive#2852](https://github.com/stacklok/toolhive/issues/2852) - Missing integration tests below server level @@ -1174,6 +1067,7 @@ This decorator pattern allows the optimizer to be added without modifying core s |------|----------|----------|-------| | 2026-02-04 | @yrobla, @jerm-dro | Draft | Initial submission | | 2026-02-09 | @jerm-dro, Copilot | Under Review | Addressed PR #38 comments | +| 2026-02-12 | @jerm-dro, @ChrisJBurns | Under Review | Major revisions: Integrated with existing transportsession.Manager architecture, added dual-layer storage model, addressed security concerns (session hijacking, credential exposure, resource exhaustion), clarified error handling (removed InitError pattern), added parallel client initialization, simplified SessionManager design | ### Implementation Tracking From 0ca7b5ae90e1a4123875e8c67a5477fe4268ab3b Mon Sep 17 00:00:00 2001 From: Yolanda Robla Date: Mon, 16 Feb 2026 16:21:12 +0100 Subject: [PATCH 8/8] changes from review meeting --- ...HV-0038-session-scoped-client-lifecycle.md | 751 ++++++++++++++++-- 1 file changed, 692 insertions(+), 59 deletions(-) diff --git 
a/rfcs/THV-0038-session-scoped-client-lifecycle.md b/rfcs/THV-0038-session-scoped-client-lifecycle.md index f57fa03..372a9b5 100644 --- a/rfcs/THV-0038-session-scoped-client-lifecycle.md +++ b/rfcs/THV-0038-session-scoped-client-lifecycle.md @@ -3,7 +3,7 @@ - **Status**: Draft - **Author(s)**: @yrobla, @jerm-dro - **Created**: 2026-02-04 -- **Last Updated**: 2026-02-12 +- **Last Updated**: 2026-02-16 - **Target Repository**: toolhive - **Related Issues**: [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) @@ -11,12 +11,31 @@ Refactor vMCP session architecture around three core components: `Session` (domain object with behavior), `SessionFactory` (creates fully-formed sessions), and `SessionManager` (bridges domain logic to MCP SDK protocol). This separates session creation (domain logic) from SDK integration (protocol concerns) and integrates with the existing pluggable session storage architecture. +**Multi-client session model**: One vMCP instance serves multiple clients (e.g., Claude Desktop, VSCode, Cursor) via a single HTTP endpoint. Each client that sends an `InitializeRequest` gets a unique session ID (per MCP Streamable HTTP protocol). Sessions are created dynamically when clients connect, not pre-created at startup. Each session owns its own backend clients for complete isolation. 
+ Restructuring the session management enables: -- **Client reuse throughout the session**: Clients are created once during initialization, reused for all requests, and closed during cleanup, enabling stateful workflows (e.g., Playwright browser sessions, database transactions) +- **Client reuse throughout the session**: Backend clients are created once during session initialization, reused for all requests within that session, and closed during cleanup, enabling stateful workflows (e.g., Playwright browser sessions, database transactions) - **Integration testing without full server**: Test session capabilities and routing logic in isolation without spinning up the complete vMCP server - **Simplified feature extensions**: Add capabilities like optimizer-in-vmcp and capability refresh through interface decoration, since protocol concerns are decoupled from how capabilities are calculated +## Terminology + +This RFC uses the following terms consistently: + +| Term | Definition | Example | +|------|------------|---------| +| **Backend** | A deployed MCP server instance that vMCP aggregates | Filesystem MCP server, Playwright MCP server | +| **Backend workload** | Same as "backend" - the deployment unit | Used interchangeably with "backend" | +| **MCP server** | Generic term for any MCP-compliant server | Can refer to vMCP itself or a backend | +| **vMCP session** | Client-to-vMCP session identified by vMCP's session ID | Session created when Claude Desktop initializes with vMCP | +| **Backend session** | vMCP-to-backend session identified by backend's session ID | Session between vMCP's client and a Playwright backend | +| **Backend client** | MCP client instance connecting vMCP to one backend | Instance of `mcp-go/client.Client` | +| **Session-scoped** | Resources that live for the duration of a vMCP session | Backend clients created at init, closed at session expiration | +| **Session ID** | Unique identifier for a vMCP session (UUID) | Returned in `Mcp-Session-Id` 
header | + +**Key distinction**: A single **vMCP session** owns multiple **backend clients**, each potentially maintaining its own **backend session** with the respective backend MCP server. + ## Problem Statement ### The Symptom: Per-Request Client Lifecycle @@ -128,7 +147,7 @@ sequenceDiagram participant Hook as OnRegisterSession participant VMCPSess as VMCPSession - Client->>DiscoveryMW: POST /mcp/initialize + Client->>DiscoveryMW: POST /mcp
(method: initialize) Note over DiscoveryMW: Capabilities discovered
BEFORE session exists DiscoveryMW->>DiscoveryMW: Discover capabilities DiscoveryMW->>DiscoveryMW: Stuff into request context @@ -162,7 +181,51 @@ This RFC proposes restructuring around three core components: 2. **SessionFactory**: Creates fully-formed sessions with all dependencies (capability discovery, client initialization, resource allocation) 3. **SessionManager**: Bridges domain logic (Session/SessionFactory) to MCP SDK protocol, using the existing `transportsession.Manager` for storage -**Proposed session creation flow** (two-phase pattern due to SDK constraints): +#### Multi-Client Session Model + +**One vMCP instance serves multiple clients**, each with their own isolated session: + +``` +┌─────────────┐ +│ Client A │──┐ +│ (Claude) │ │ +└─────────────┘ │ ┌────────────────────────────┐ + │ │ vMCP Server │ +┌─────────────┐ ├────→│ Endpoint: /mcp │ +│ Client B │──┤ │ (Streamable HTTP) │ +│ (VSCode) │ │ └────────────────────────────┘ +└─────────────┘ │ │ + │ ├─ Session aaa-111 (Client A's session) +┌─────────────┐ │ ├─ Session bbb-222 (Client B's session) +│ Client C │──┘ └─ Session ccc-333 (Client C's session) +│ (Cursor) │ +└─────────────┘ +``` + +**How sessions are created per the MCP Streamable HTTP protocol:** + +- vMCP exposes **one HTTP endpoint** (e.g., `POST /mcp`) per [MCP Streamable HTTP spec](https://spec.modelcontextprotocol.io/specification/2025-06-18/basic/transports/) +- **All JSON-RPC methods** (`initialize`, `tools/call`, `resources/read`, etc.) 
go to the **same endpoint** + - Method specified in JSON-RPC body, not URL path + - Example: `POST /mcp` with body `{"jsonrpc": "2.0", "method": "initialize", ...}` +- **Each client** that sends an `InitializeRequest` triggers creation of a **new unique session** +- The server returns a unique session ID via the `Mcp-Session-Id` header +- Clients include their session ID in the `Mcp-Session-Id` header on all subsequent requests +- **Sessions are created dynamically when clients connect**, not pre-created at vMCP startup +- Each session owns its own backend clients (no sharing between sessions) + +**vMCP session vs backend sessions** (see [Terminology](#terminology) for definitions): + +Clients interact with a **single vMCP session ID** (e.g., `aaa-111`), but vMCP internally manages **multiple backend session IDs**—one per backend MCP server. vMCP maintains an explicit `[vMCP_session_id] -> {backend_id: backend_session_id}` mapping for observability (logging, metrics, health checks, debugging). + +**Key behaviors**: +- Client objects include backend session IDs in request headers (MCP protocol requirement) +- vMCP tracks backend session IDs separately (production observability requirement) +- Each vMCP session owns isolated backend clients and session mappings + +For detailed lifecycle management (initialization, runtime, re-initialization, cleanup), see [Backend Session ID Lifecycle Management](#3-session-implementation). + +**Proposed session creation flow (per client, two-phase pattern due to SDK constraints):** ```mermaid sequenceDiagram @@ -173,19 +236,22 @@ sequenceDiagram participant Aggregator participant Session - Client->>SDK: POST /mcp/initialize + Note over Client,Session: Session Creation (happens for EACH client that connects) + + Client->>SDK: POST /mcp
(method: initialize) Note over SDK: SDK handles initialize,
needs session ID SDK->>SessionMgr: Generate() // No context! - Note over SessionMgr: Phase 1: Create empty
session placeholder + Note over SessionMgr: Phase 1: Create NEW unique
session ID and placeholder SessionMgr->>SessionMgr: Store empty session by ID - SessionMgr-->>SDK: sessionID + SessionMgr-->>SDK: sessionID (unique per client) SDK->>SessionMgr: OnRegisterSession(sessionID, ctx) Note over SessionMgr: Phase 2: Now we have context!
Populate the session SessionMgr->>Factory: MakeSession(ctx, identity, backends) Factory->>Factory: Create clients for each backend + Note over Factory: Each client sends Initialize to
its backend, receives backend's
session ID (stored by client)
Factory captures session IDs
via client.SessionID() Factory->>Aggregator: AggregateCapabilities(ctx, clients) Aggregator-->>Factory: Capabilities (tools, resources, routing) - Factory->>Session: new Session(capabilities, clients) + Factory->>Session: new Session(capabilities, clients, backendSessionIDs) Session-->>Factory: session Factory-->>SessionMgr: session SessionMgr->>SessionMgr: Replace placeholder with real session @@ -193,18 +259,222 @@ sequenceDiagram Session-->>SessionMgr: capabilities SessionMgr->>SDK: AddSessionTools(sessionID, tools) SDK-->>Client: InitializeResult with Mcp-Session-Id header + + Note over Client,Session: Client stores session ID, uses in all subsequent requests ``` -**Key Flow (two-phase creation):** -1. SDK receives initialize request, needs session ID -2. **Phase 1**: SDK calls `Generate()` (no context) → SessionManager creates empty session placeholder, stores it, returns session ID -3. **Phase 2**: SDK calls `OnRegisterSession(sessionID, ctx)` (has context) → SessionManager calls `SessionFactory.MakeSession()` to create fully-formed session (capability discovery, client initialization) -4. SessionManager replaces empty placeholder with fully-formed session -5. SDK registers session's tools/resources via `AddSessionTools()` -6. Client receives InitializeResult with session ID header +**Key Flow (two-phase creation per client):** + +This flow executes **independently for each client** that connects to vMCP: + +1. **Client sends Initialize**: Client (e.g., Claude Desktop, VSCode) sends `POST /mcp` with JSON-RPC method `initialize` to vMCP endpoint +2. **Phase 1 - Generate unique session ID**: SDK calls `Generate()` (no context) → SessionManager creates a **new unique session ID** (e.g., UUID), stores empty placeholder, returns ID +3. 
**Phase 2 - Build full session**: SDK calls `OnRegisterSession(sessionID, ctx)` (has context) → SessionManager calls `SessionFactory.MakeSession()` to create fully-formed session with: + - Backend client initialization (one client per backend, each performs MCP `initialize` handshake with its backend; backends respond with their own session IDs which are stored by the client) + - Capability discovery (tools, resources, prompts) using the initialized clients + - Routing table construction +4. **Replace placeholder**: SessionManager replaces empty placeholder with fully-formed session +5. **Register capabilities**: SDK registers session's tools/resources via `AddSessionTools()` +6. **Return session ID to client**: Client receives `InitializeResult` with `Mcp-Session-Id: ` header + +**Critical point**: When Client A and Client B both send Initialize requests: +- Client A triggers `Generate()` → gets session ID `aaa-111` → gets Session A with its own backend clients +- Client B triggers `Generate()` → gets session ID `bbb-222` → gets Session B with its own backend clients +- Sessions are **completely isolated** - no shared state, no shared clients **Why two phases?** The SDK's `Generate()` has no context parameter (see "SDK Interface Constraint" section below), so we cannot create fully-formed sessions until `OnRegisterSession` hook provides request context. +#### Multi-Client Isolation + +To illustrate how multiple clients get isolated sessions: + +```mermaid +sequenceDiagram + participant CA as Client A
(Claude) + participant CB as Client B
(VSCode) + participant vMCP as vMCP Server
(Single Endpoint) + participant SessionMgr as SessionManager + participant SA as Session A + participant SB as Session B + + Note over CA,SB: Both clients connect to same vMCP endpoint + + CA->>vMCP: POST /mcp
(method: initialize) + vMCP->>SessionMgr: Generate() + SessionMgr->>SessionMgr: Create session ID "aaa-111" + SessionMgr->>SA: Create Session A with backend clients + SessionMgr-->>vMCP: sessionID "aaa-111" + vMCP-->>CA: Mcp-Session-Id: aaa-111 + + CB->>vMCP: POST /mcp
(method: initialize) + vMCP->>SessionMgr: Generate() + SessionMgr->>SessionMgr: Create session ID "bbb-222" + SessionMgr->>SB: Create Session B with backend clients + SessionMgr-->>vMCP: sessionID "bbb-222" + vMCP-->>CB: Mcp-Session-Id: bbb-222 + + Note over CA,SB: Clients use their session IDs in subsequent requests + + CA->>vMCP: POST /mcp
(method: tools/call)
Mcp-Session-Id: aaa-111 + vMCP->>SessionMgr: Get session "aaa-111" + SessionMgr->>SA: CallTool(...) + SA-->>CA: Tool result + + CB->>vMCP: POST /mcp
(method: tools/call)
Mcp-Session-Id: bbb-222 + vMCP->>SessionMgr: Get session "bbb-222" + SessionMgr->>SB: CallTool(...) + SB-->>CB: Tool result + + Note over CA,SB: Sessions are completely isolated +``` + +**Key points:** +- **Single endpoint serves all clients**: All clients POST to the same `/mcp` endpoint +- **Unique session per Initialize**: Each `InitializeRequest` creates a new session with unique ID +- **Session isolation**: Session A and Session B have separate backend clients, routing tables, and state +- **Session ID routing**: `Mcp-Session-Id` header routes requests to the correct session + +#### Session Routing Mechanism (How vMCP Associates Requests to Sessions) + +**Critical question**: With a single endpoint (`POST /mcp`) serving multiple clients, how does vMCP know which session each request belongs to? + +**Answer**: The `Mcp-Session-Id` HTTP header (per [MCP Streamable HTTP spec](https://spec.modelcontextprotocol.io/specification/2025-06-18/basic/transports/)). + +**Step-by-step flow:** + +1. **Initialize request (no session yet)**: + ```http + POST /mcp + Content-Type: application/json + # NO Mcp-Session-Id header + + {"jsonrpc": "2.0", "method": "initialize", ...} + ``` + + vMCP logic: + ```go + // Request has NO Mcp-Session-Id header → This is initialization + sessionID := sessionManager.Generate() // Create new session "aaa-111" + + // Return session ID to client + w.Header().Set("Mcp-Session-Id", sessionID) // "aaa-111" + w.Write(initializeResult) + ``` + +2. **Client receives and stores session ID**: + ```http + HTTP/1.1 200 OK + Mcp-Session-Id: aaa-111 # ← Client stores this + Content-Type: application/json + + {"result": {...}} + ``` + +3. 
**Subsequent request (with session ID)**: + ```http + POST /mcp + Mcp-Session-Id: aaa-111 # ← Client includes stored ID + Content-Type: application/json + + {"jsonrpc": "2.0", "method": "tools/call", "params": {...}} + ``` + + vMCP logic: + ```go + // Request HAS Mcp-Session-Id header → Look up existing session + sessionID := r.Header.Get("Mcp-Session-Id") // "aaa-111" + + session := sessionStorage.Get(sessionID) // Retrieve Session A + if session == nil { + return Error("session not found or expired") + } + + // Route to session's backend + result := session.CallTool(ctx, toolName, args) + w.Write(result) + ``` + +**How vMCP determines request type:** + +| Scenario | `Mcp-Session-Id` Header | vMCP Action | +|----------|------------------------|-------------| +| **First Initialize** | ❌ Not present | `Generate()` → Create new session → Return new ID | +| **Subsequent call** | ✅ Present (e.g., `aaa-111`) | `Get(sessionID)` → Retrieve session → Route request | +| **Invalid session ID** | ✅ Present but expired/invalid | Return `404 Session Not Found` | +| **Re-initialize after expire** | ❌ Not present (client deleted ID) | `Generate()` → Create new session | + +**Session storage and lookup:** + +```go +type SessionManager struct { + sessionStorage *transportsession.Manager // Maps sessionID → Session +} + +func (sm *SessionManager) HandleRequest(r *http.Request, w http.ResponseWriter) { + // Parse JSON-RPC request + var req struct { + JSONRPC string `json:"jsonrpc"` + Method string `json:"method"` + Params interface{} `json:"params"` + ID interface{} `json:"id"` + } + json.NewDecoder(r.Body).Decode(&req) + + sessionID := r.Header.Get("Mcp-Session-Id") + + if sessionID == "" { + // No session ID → Must be Initialize request + if req.Method != "initialize" { + http.Error(w, "Session required", 400) + return + } + newSessionID := sm.Generate() + // ... handle initialization ... 
+    w.Header().Set("Mcp-Session-Id", newSessionID)
+    return
+  }
+
+  // Session ID present → Look up existing session
+  session := sm.sessionStorage.Get(sessionID)
+  if session == nil {
+    http.Error(w, "Session not found or expired", 404)
+    return
+  }
+
+  // Route request to session based on JSON-RPC method
+  var result any
+  var err error
+  switch req.Method {
+  case "tools/call":
+    result, err = session.CallTool(req.Params)
+  case "resources/read":
+    result, err = session.ReadResource(req.Params)
+  case "prompts/get":
+    result, err = session.GetPrompt(req.Params)
+  default:
+    http.Error(w, "Unknown method", 400)
+    return
+  }
+  if err != nil {
+    http.Error(w, err.Error(), 500)
+    return
+  }
+  json.NewEncoder(w).Encode(result)
+}
+```
+
+**Parallel clients (completely independent):**
+
+```
+Time T0: Both clients send Initialize (no session headers)
+  Client A → POST /mcp (no header) → Generate() → Session "aaa-111"
+  Client B → POST /mcp (no header) → Generate() → Session "bbb-222"
+
+Time T1: Both clients make tool calls (with their session headers)
+  Client A → POST /mcp (Mcp-Session-Id: aaa-111) → Get("aaa-111") → Session A
+  Client B → POST /mcp (Mcp-Session-Id: bbb-222) → Get("bbb-222") → Session B
+```
+
+**Sessions never collide** because:
+- ✅ Session IDs are UUIDs (globally unique)
+- ✅ Each Generate() call creates NEW unique ID
+- ✅ Clients explicitly include their session ID in header
+- ✅ vMCP maps requests to sessions via header lookup
+
+**This is standard MCP Streamable HTTP protocol behavior** - vMCP doesn't invent this mechanism, it follows the spec.
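The header-based routing above can be exercised end to end with a minimal in-memory sketch. This is illustrative only: `memoryStore`, `handleMCP`, `InitializeThenCall`, and the test-only `X-Client` label are hypothetical names (not toolhive types), and a counter stands in for the UUIDs the RFC specifies.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// memoryStore is an illustrative stand-in for session storage: it maps
// session IDs to per-session state (here, just an owner label).
type memoryStore struct {
	mu       sync.Mutex
	nextID   int
	sessions map[string]string // sessionID -> owner label
}

func newMemoryStore() *memoryStore {
	return &memoryStore{sessions: make(map[string]string)}
}

// Generate creates a new unique session ID (a UUID in the RFC; a counter here).
func (s *memoryStore) Generate(owner string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	id := fmt.Sprintf("sess-%03d", s.nextID)
	s.sessions[id] = owner
	return id
}

func (s *memoryStore) Get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	owner, ok := s.sessions[id]
	return owner, ok
}

// handleMCP mirrors the routing rule above: no Mcp-Session-Id header means
// Initialize (create a session); otherwise look up the existing session.
func handleMCP(store *memoryStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("Mcp-Session-Id")
		if id == "" {
			newID := store.Generate(r.Header.Get("X-Client")) // X-Client: test-only label
			w.Header().Set("Mcp-Session-Id", newID)
			return
		}
		owner, ok := store.Get(id)
		if !ok {
			http.Error(w, "Session not found or expired", http.StatusNotFound)
			return
		}
		fmt.Fprint(w, owner)
	}
}

// InitializeThenCall simulates one client: initialize without a header, then
// send a follow-up request carrying the returned session ID.
func InitializeThenCall(url, client string) (string, string, error) {
	req, _ := http.NewRequest(http.MethodPost, url, nil)
	req.Header.Set("X-Client", client)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", "", err
	}
	resp.Body.Close()
	id := resp.Header.Get("Mcp-Session-Id")

	req2, _ := http.NewRequest(http.MethodPost, url, nil)
	req2.Header.Set("Mcp-Session-Id", id)
	resp2, err := http.DefaultClient.Do(req2)
	if err != nil {
		return "", "", err
	}
	defer resp2.Body.Close()
	body, _ := io.ReadAll(resp2.Body)
	return id, string(body), nil
}

func main() {
	srv := httptest.NewServer(handleMCP(newMemoryStore()))
	defer srv.Close()

	idA, ownerA, _ := InitializeThenCall(srv.URL, "claude")
	idB, ownerB, _ := InitializeThenCall(srv.URL, "vscode")
	// Distinct session IDs, each follow-up routed to its own session.
	fmt.Println(idA != idB, ownerA, ownerB)
}
```

Each initialize mints a fresh ID and each follow-up is routed purely by the header, so the two simulated clients never observe each other's state.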
+ ### Terminology Clarification This RFC uses the following terms consistently: @@ -234,6 +504,9 @@ type Session interface { Resources() []Resource Prompts() []Prompt + // Backend session tracking - for observability and debugging + BackendSessions() map[string]string // Returns map of backendID -> backendSessionID + // MCP operations - routing logic is encapsulated here CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) ReadResource(ctx context.Context, uri string) (*ResourceResult, error) @@ -282,6 +555,9 @@ type Session interface { Resources() []Resource Prompts() []Prompt + // Backend session tracking - for observability and debugging + BackendSessions() map[string]string // Returns map of backendID -> backendSessionID + // MCP operations - routing logic is encapsulated here CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) ReadResource(ctx context.Context, uri string) (*ResourceResult, error) @@ -296,7 +572,7 @@ type Session interface { - **Owns backend clients**: Session stores pre-initialized MCP clients in an internal map - **Encapsulates routing**: `CallTool()` looks up tool in routing table, routes to correct backend client - **Manageable lifetime**: `Close()` cleans up clients and any other resources. The SessionManager/caller is decoupled from what exactly happens on close() -- **Thread-safe**: All methods protected by internal RWMutex for concurrent access to prevent data races on internal state. Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. 
Note: The RWMutex prevents data races but does not prevent `CallTool()` from using a closed client after `Close()` completes—the session implementation must track closed state explicitly (see "Client Closed Mid-Request" in Error Handling). Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session +- **Thread-safe**: Internal RWMutex protects access to internal data structures (routing table, client map, closed flag) but is NOT held across network I/O—locks are acquired only while reading/mutating state, then released before invoking client methods. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies to prevent caller mutations. Internal maps and slices are never exposed directly. **Coordination with Close()**: Since locks are released before network I/O, the session uses **context cancellation** (not locks) to coordinate—operations check `ctx.Done()` before using clients, and `Close()` cancels the context. This prevents holding locks during long network operations while providing safe cleanup (see "Client Closed Mid-Request" in Error Handling for details). Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session **Separation of concerns**: The Session interface focuses on domain logic (capabilities, routing, client ownership). This RFC builds on the existing pluggable session storage architecture ([PR #1989](https://github.com/stacklok/toolhive/pull/1989)) which provides `Storage` interface and `Manager` for Redis/Valkey support. @@ -335,13 +611,15 @@ type SessionFactory interface { The default factory implementation follows this pattern: -1. **Create MCP clients**: Initialize clients for each backend **in parallel** +1. 
**Create MCP clients and capture backend session IDs**: Initialize clients for each backend **in parallel** - Factory creates one client per backend using existing client factory + - **Client initialization includes MCP handshake**: Each client sends `InitializeRequest` to its backend, and the backend responds with capabilities and its own `Mcp-Session-Id`. The client stores the session ID for protocol compliance (includes it in subsequent request headers). + - **Capture backend session IDs**: Factory also captures each backend's session ID (via `client.SessionID()`) for observability, storing them in a map to pass to the session - **Performance requirement**: Use parallel initialization (e.g., `errgroup` with bounded concurrency) to avoid sequential latency accumulation. Connection initialization (TCP handshake + TLS negotiation + MCP protocol handshake) can take tens to hundreds of milliseconds per backend depending on network latency and backend responsiveness. With 20 backends, sequential initialization could easily exceed acceptable session creation latency. - **Bounded concurrency**: Limit parallel goroutines (e.g., 10 concurrent initializations) to avoid resource exhaustion - **Per-backend timeout**: Apply context timeout (e.g., 5s per backend) so one slow backend doesn't block session creation - **Partial initialization**: If some backends fail, log warnings and continue with successfully initialized backends (failed backends not added to clients map) - - Clients are connection-ready but not yet used + - Clients are connection-ready and stateful (each maintains its backend session for protocol use) 2. 
**Capability discovery**: Pass clients to `aggregator.AggregateCapabilities(ctx, clients)` - Aggregator uses provided clients to query capabilities (tools, resources, prompts) from each backend @@ -355,8 +633,53 @@ The default factory implementation follows this pattern: - Routing table (maps tool names to backend workload IDs) - Tools, resources, prompts (aggregated from backends) - Pre-initialized clients map (keyed by workload ID) - same clients used for discovery + - Backend session IDs map (keyed by workload ID) - captured from `client.SessionID()` calls + +**Key Design Decision**: Clients are created once and threaded through: factory → aggregator → session. This eliminates redundant connection setup (TCP handshake + TLS + MCP initialization) while maintaining separation of concerns. + +**Example: Factory capturing backend session IDs**: -**Key Design Decision**: Clients are created once by the factory and threaded through: factory → aggregator (for discovery) → session (for reuse). This avoids redundant connection setup while maintaining clean separation of concerns (factory owns client creation, aggregator only queries capabilities). 
+```go +func (f *defaultSessionFactory) MakeSession(ctx context.Context, identity *auth.Identity, backends []Backend) (Session, error) { + clients := make(map[string]*Client) + backendSessions := make(map[string]string) // NEW: Track backend session IDs + + // Create clients in parallel (simplified for brevity) + for _, backend := range backends { + client, err := f.clientFactory.CreateClient(ctx, backend, identity) + if err != nil { + log.Warn("Failed to initialize backend %s: %v", backend.ID, err) + continue + } + + clients[backend.ID] = client + + // Capture backend session ID for observability + if sessionID := client.SessionID(); sessionID != "" { + backendSessions[backend.ID] = sessionID + log.Debug("Backend %s initialized with session %s", backend.ID, sessionID) + } + } + + // Aggregate capabilities using the clients + capabilities, err := f.aggregator.AggregateCapabilities(ctx, clients) + if err != nil { + return nil, fmt.Errorf("capability aggregation failed: %w", err) + } + + // Create session with clients AND backend session IDs + return &defaultSession{ + vmcpSessionID: uuid.New().String(), + identity: identity, + clients: clients, + backendSessions: backendSessions, // Pass the captured session IDs + routingTable: capabilities.RoutingTable, + tools: capabilities.Tools, + resources: capabilities.Resources, + prompts: capabilities.Prompts, + }, nil +} +``` **Updated Aggregator Interface**: @@ -366,21 +689,58 @@ The aggregator interface takes clients as input instead of creating them interna AggregateCapabilities(ctx context.Context, clients map[string]*Client) (*AggregationResult, error) ``` -**Rationale**: Separates client lifecycle (factory's concern) from capability discovery (aggregator's concern). The factory creates clients once, passes them to aggregator for discovery, then passes same clients to session for reuse. 
- -**Performance impact**: Reusing discovery clients eliminates redundant connection setup (TCP handshake + TLS negotiation + MCP initialization), avoiding connection overhead on every request (this overhead varies significantly based on network latency, TLS configuration, and backend responsiveness). +**Rationale**: Separates client lifecycle (factory's concern) from capability discovery (aggregator's concern). #### 3. Session Implementation **Internal structure** (`pkg/vmcp/session/default_session.go`): The default session implementation stores: -- Session ID and user identity +- **vMCP session ID** and user identity (this is the session ID clients use) - Routing table mapping tool/resource names to backend workload IDs - Discovered capabilities (tools, resources, prompts) - Pre-initialized backend clients map (keyed by workload ID) +- **Backend session IDs map** (keyed by workload ID) - explicit `[vMCP_session_id] -> [backend_session_ids]` mapping + - Stored separately from client objects for observability and lifecycle management + - Captured during client initialization when backends return `Mcp-Session-Id` headers + - Used for: logging, metrics, health checks, debugging, and explicit session cleanup + - Updated when clients are re-initialized (e.g., after backend session expiration) - RWMutex for thread-safe access (read lock for queries/calls, write lock for Close) +**Backend session ID lifecycle management:** + +The session maintains an explicit mapping of `[vMCP_session_id] -> {backend_id: backend_session_id}` throughout its lifetime: + +1. **Initialization** (`SessionFactory.MakeSession()`): + - Factory creates clients for each backend (which send `InitializeRequest`) + - Backends respond with `Mcp-Session-Id` headers + - Factory captures session IDs and passes them to session constructor + - Session stores the mapping: `backendSessions[backendID] = sessionID` + +2. 
**Runtime operations** (`CallTool`, `ReadResource`): + - Client objects include backend session IDs in request headers (handled by MCP client library) + - vMCP uses stored session IDs for logging, metrics, and observability + - Example log: `"Calling tool 'create_pr' on backend 'github' (backend_session: abc-123)"` + +3. **Re-initialization** (when backend session expires): + - Detect expiration via 404/"session expired" error + - Create new client (sends new `InitializeRequest`) + - Backend responds with new `Mcp-Session-Id` + - Update mapping: `backendSessions[backendID] = newSessionID` + - Log: `"Backend 'github' session re-initialized: old=abc-123, new=def-456"` + +4. **Cleanup** (`Close()`): + - Iterate through all clients and call `Close()` + - Closing clients implicitly terminates backend sessions (connection close) + - Clear backend session mapping: `backendSessions = nil` + - Log: `"Closed vMCP session xyz-789 with N backend sessions"` + +5. **Observability uses**: + - Health endpoints: `GET /health` returns active backend sessions per vMCP session + - Metrics: Track backend session churn rate, re-initialization frequency + - Debugging: Associate tool call errors with specific backend sessions + - Audit logs: Record which backend sessions were used for compliance + **Method behaviors:** - **`CallTool(ctx, name, args)`**: Looks up tool in routing table to find target backend, retrieves pre-initialized client from map, calls backend with original (un-prefixed) tool name. Tool names may be prefixed for conflict resolution (e.g., `github__create_pr`), but backend receives original name (`create_pr`). @@ -391,11 +751,11 @@ The default session implementation stores: **Thread safety**: Read operations (CallTool, ReadResource) use read locks; Close uses write lock. 
-**Health monitoring integration**: Sessions interact with the existing global health monitor (`pkg/vmcp/health`) to handle backend failures:
-- **Session creation**: `SessionFactory.MakeSession()` consults health monitor to exclude unhealthy/circuit-broken backends (don't create clients for known-bad backends)
-- **Mid-session failures**: When `CallTool()` fails (connection error, timeout), mark backend unhealthy in session-local state and delegate error reporting to global health monitor
-- **Recovery**: On subsequent calls to unhealthy backend, check global health monitor; if recovered, lazily recreate client
-- **Design approach**: Global health monitor maintains shared circuit breaker state across sessions. Individual sessions track local backend health (unhealthy flag) to avoid redundant connection attempts while allowing recovery when backends come back online. Implementation details deferred to Phase 1-2.
+**Session lifecycle**: Sessions own backend clients for their lifetime:
+- **Session creation**: `SessionFactory.MakeSession()` creates backend clients during initialization
+- **Mid-session failures**: If a backend client call fails, the error is returned to the caller. The session remains active.
+- **Session continuity**: The session remains active even if individual backend calls fail. Other backends continue operating normally.
+- **No automatic recovery**: Sessions do not attempt to recreate failed clients. The client must handle errors (retry, use a different tool, or reinitialize the session).

#### 4.
Storage Integration and Distributed Deployment @@ -412,8 +772,8 @@ Sessions are split into two layers with different lifecycles: | Layer | Contents | Storage | Lifetime | |-------|----------|---------|----------| -| **Metadata** | Session ID, timestamps, identity reference, backend IDs list | Serializable to Redis/Valkey via `Storage` interface | Persists across vMCP restarts, shared across instances | -| **Runtime** | MCP client objects, routing table, capabilities, closed flag | In-process memory only | Lives only while vMCP instance is running | +| **Metadata** | vMCP session ID, timestamps, identity reference, backend IDs list | Serializable to Redis/Valkey via `Storage` interface | Persists across vMCP restarts, shared across instances | +| **Runtime** | MCP client objects, backend session IDs map (`backendID -> sessionID`), routing table, capabilities, closed flag | In-process memory only | Lives only while vMCP instance is running | **How it works**: - The behavior-oriented `Session` embeds `transportsession.Session` (metadata layer) and owns MCP clients (runtime layer) @@ -433,7 +793,7 @@ Without sticky sessions (graceful degradation): - Subsequent requests to same instance reuse recreated clients - Trade-off: horizontal scaling flexibility vs temporary performance impact on instance switch -**Implementation detail**: See Phase 1 in Implementation Plan for how `vmcpSession` struct composes `transportsession.Session` (metadata) with runtime state (clients map, routing table). +**Implementation detail**: See Phase 1 in Implementation Plan for how `defaultSession` struct composes `transportsession.Session` (metadata) with runtime state (clients map, routing table). #### 5. 
Wiring into the MCP SDK @@ -551,7 +911,13 @@ With the two-phase creation pattern: - Client can query session but sees empty tool list - Client may delete session and re-initialize, or backend recovery may populate capabilities later -**Rationale**: The two-phase creation pattern (empty session + populate via hook) is necessary because the SDK's `Generate()` must return a session ID. Additionally, the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly. Instead of storing initialization errors that require checks in every method (`InitError()` pattern is error-prone - easy to forget checks when adding new methods), we create sessions with empty capabilities. Failed backends don't advertise tools/resources, so clients never try to call them. This is simpler and safer than requiring defensive `InitError()` checks throughout the codebase. +**Rationale**: The two-phase creation pattern (empty session + populate via hook) is necessary because the SDK's `Generate()` must return a session ID. Additionally, the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly. Instead of storing initialization errors that require checks in every method (`InitError()` pattern is error-prone - easy to forget checks when adding new methods), we create sessions with empty capabilities. Failed backends don't advertise tools/resources, so clients never try to call them. 
+ +**Error messages for empty-capability sessions**: If a client attempts to call a tool on a session with empty capabilities (due to complete initialization failure), the session returns a clear error: +- **NOT**: "Tool not found" (misleading - implies tool doesn't exist in general) +- **INSTEAD**: "Session has no available tools - backend initialization failed during session creation" (accurate - indicates setup failure) + +This is simpler and safer than requiring defensive `InitError()` checks throughout the codebase. **Partial Backend Initialization**: @@ -568,8 +934,77 @@ If some backends fail during `MakeSession()`: If a tool call fails after successful session creation: - Return error to client (existing behavior) - Client remains usable for subsequent requests -- No automatic client re-initialization (clients live for session lifetime) - Health monitoring tracks backend health (existing behavior) +- See "Mid-Session Backend Failures" below for recovery strategies + +**Mid-Session Backend Failures**: + +⚠️ **Critical scenario**: A session successfully initialized with 5 backend clients, but during the session lifetime, one backend crashes, times out, or terminates its connection while the other 4 backends remain healthy. + +**Example**: +``` +Session ABC-123 (30 minutes old): + ✅ Backend 1 (filesystem) - HEALTHY + ❌ Backend 2 (database) - CRASHED (connection lost) + ✅ Backend 3 (browser) - HEALTHY + ✅ Backend 4 (slack) - HEALTHY + ✅ Backend 5 (github) - HEALTHY + +Client calls tool on Backend 2 → Connection error +``` + +**Question**: Should the entire session be terminated, or can it continue with the remaining healthy backends? + +**Proposed Strategy: Simple Error Propagation** + +The session **continues operating** with remaining backends. If a backend client fails, the error is returned to the client. 
+ +**Implementation approach**: + +```go +func (s *Session) CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) { + backend := s.routingTable.Lookup(name) + client := s.clients[backend.ID] + + // Call backend client - if it fails, return the error + result, err := client.CallTool(ctx, name, arguments) + if err != nil { + return nil, fmt.Errorf("backend %s call failed: %w", backend.ID, err) + } + + return result, nil +} +``` + +**Behavior summary**: + +| Failure Type | Session Behavior | Client Action | +|--------------|------------------|---------------| +| **Single backend crashes** | Session continues with other backends | Returns error for that backend, client can retry or use other tools | +| **Multiple backends fail** | Session continues with remaining healthy backends | Returns errors for failed backends | +| **All backends fail** | Session remains active but all tool calls fail | Client should delete session and reinitialize | +| **Backend times out** | Return timeout error, mark backend unhealthy | Client can retry or switch to different tool | +| **Backend terminates (graceful)** | Detect via connection close, mark unavailable | Client receives error, can retry later | +| **Backend terminates (crash)** | Detect on next call (connection error) | Client receives error indicating backend unavailable | + +**Benefits of this approach**: +1. ✅ **Session resilience** - One backend failure doesn't kill entire session +2. ✅ **Simplicity** - No automatic retry/recovery logic complexity +3. ✅ **Stateful workflows preserved** - Healthy backends (e.g., browser session) remain unaffected +4. 
✅ **Clear error messages** - Client knows exactly which backend failed + +**Client experience**: +- **Call to failed backend**: Returns clear error "backend X unavailable: connection lost" +- **Retry behavior**: Client decides whether to retry, switch tools, or reinitialize +- **Other backends**: Continue working normally, unaffected by single backend failure + +**Alternative considered: Fail entire session** +- ❌ Requires client to reinitialize (loses all state) +- ❌ Kills healthy backend connections (e.g., browser session with open tabs) +- ❌ No automatic recovery (client must manually retry) +- ❌ Poor user experience (disrupts workflow) + +**Implementation timing**: Implementation Plan Phase 1 (error propagation only - no recovery logic). **Client Closed Mid-Request**: @@ -579,23 +1014,157 @@ Race condition exists: session may be terminated while a request is in flight: - **Timing issue**: Thread A checks `closed == false` → releases read lock → Thread B calls `Close()` (acquires write lock, closes clients, sets `closed = true`) → Thread A invokes client method on closed client - In-flight requests receive "client closed" error from MCP library when this race occurs -**Required mitigation (Phase 1 implementation)**: -- **Reference counting**: Session tracks in-flight operation count with atomic operations: - - `CallTool()`, `ReadResource()`, `GetPrompt()` atomically increment counter before using client, decrement after completion - - `Close()` sets `closed` flag, waits for counter to reach zero (with timeout, e.g., 30s), then closes clients - - Operations check `closed` flag with atomic load; if true, return "session closed" error without incrementing counter - - This eliminates the race window by ensuring `Close()` waits for all in-flight operations -- **Graceful degradation**: If `Close()` timeout expires with operations still in-flight, log warning and close clients anyway (in-flight operations get "client closed" errors) -- **Alternative approach**: Use 
context cancellation—`Close()` cancels session context, operations detect cancellation and abort before using clients +**Mitigation strategy**: + +- **Context cancellation**: Session has a context that's cancelled on `Close()` + - `CallTool()` checks `ctx.Done()` before using client + - If cancelled, return "session closed" error without touching client + - Acceptable race window: client might get "connection closed" error in rare cases +- **Graceful error handling**: MCP client library handles `Close()` on active connections gracefully (returns errors) +- **TTL extension**: Session storage extends TTL on every request (via existing `Get()` call which touches session), reducing likelihood of expiration during active use + +**Backend MCP Server Session Expiration**: + +⚠️ **Critical scenario**: Backend MCP servers have their **own session management** independent of vMCP. The backend might expire its session while vMCP's session is still active. + +**Example**: +``` +vMCP Session ABC-123 (created 30 minutes ago, TTL: 60 min) ✅ + ├─ vMCP Backend Client 1 → Backend MCP Server 1 + │ vMCP side: Connection alive, client reused ✅ + │ Backend side: Session expired (10 min TTL) ❌ + │ + │ Client calls tool → Backend returns: + │ HTTP 404 "session not found" or + │ JSON-RPC error "session expired" +``` + +**Root cause**: Backend MCP servers can: +- Have shorter session TTLs than vMCP (e.g., backend: 10 min, vMCP: 30 min) +- Restart and lose all session state +- Terminate sessions due to inactivity +- Use different session management strategies + +**Problem**: vMCP's client holds a reference to a **dead backend session**. The connection is alive, but the backend doesn't recognize the session ID. 
+
+**Proposed Solution: Automatic Backend Session Re-initialization**
+
+When vMCP detects backend session expiration (404 or "session expired" error), automatically re-initialize with that backend:
+
+```go
+func (s *Session) CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error) {
+    backend := s.routingTable.Lookup(name)
+    client := s.clients[backend.ID]
+
+    result, err := client.CallTool(ctx, name, arguments)
+
+    // Detect backend session expiration
+    if isSessionExpiredError(err) { // 404 or "session not found"
+        log.Warn("Backend %s session expired, re-initializing", backend.ID)
+
+        // Re-initialize the backend session
+        if reinitErr := s.reinitializeBackend(ctx, backend.ID); reinitErr != nil {
+            return nil, fmt.Errorf("failed to reinitialize backend %s: %w",
+                backend.ID, reinitErr)
+        }
+
+        // Retry the operation with the re-initialized session
+        client = s.clients[backend.ID] // Get updated client
+        return client.CallTool(ctx, name, arguments)
+    }
+
+    return result, err
+}
+
+func (s *Session) reinitializeBackend(ctx context.Context, backendID string) error {
+    s.mu.Lock()
+    defer s.mu.Unlock()
 
-**Additional mitigation**:
-- Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing likelihood of expiration during active use
+    backend := s.getBackendConfig(backendID)
 
-**Session Not Found**:
+    // Close old client (with expired backend session)
+    if oldClient := s.clients[backendID]; oldClient != nil {
+        oldClient.Close()
+    }
 
-If client uses expired/invalid session ID:
+    // Create NEW client (triggers new Initialize handshake with backend)
+    // This sends InitializeRequest to backend MCP server
+    newClient, err := s.clientFactory.CreateClient(ctx, backend, s.identity)
+    if err != nil {
+        return fmt.Errorf("client creation failed: %w", err)
+    }
+
+    // Backend responds with new session ID (in Mcp-Session-Id header)
+    // The client stores it for subsequent requests (protocol requirement)
+
+    // Capture the new backend session ID for observability
+    newBackendSessionID := newClient.SessionID() // Assuming client exposes this method
+
+    // Update session state
+    oldSessionID := s.backendSessions[backendID]
+    s.backendSessions[backendID] = newBackendSessionID
+    s.clients[backendID] = newClient
+
+    log.Info("Backend %s session re-initialized: old=%s, new=%s",
+        backendID, oldSessionID, newBackendSessionID)
+
+    return nil
+}
+```
+
+**Flow**: Detect expiration (404/"session expired") → Close old client → Create new client (new backend session) → Update mapping → Retry operation.
+
+**Trade-offs**:
+
+⚠️ **State loss on backend**: Re-initialization creates a **new backend session**, losing any stateful context:
+- **Playwright backends**: Browser tabs, cookies, auth state lost
+- **Database backends**: Open transactions, temporary tables lost
+- **File system backends**: Working directory, file handles lost
+
+**Mitigation strategies for stateful backends**:
+
+⚠️ **Problem**: Stateful backends (Playwright, databases) lose state on re-initialization. Most stateful resources (browser tabs, open transactions) cannot be serialized and restored.
+
+**Mitigations (best effort)**:
+
+1. **Optional keepalive** (prevents expiration):
+   - vMCP can periodically call low-cost operations on backends (e.g., `ping`, `listCapabilities`)
+   - Extends backend session TTL while vMCP session is active
+   - **Limitation**: Not all MCP servers support ping/low-cost operations
+   - **Approach**: Best effort - attempt keepalive, gracefully handle backends that don't support it
+   - **Configuration**: Enable per backend, configurable interval (default: 5 min)
+
+2. **Session TTL alignment**:
+   - Configure backend session TTLs longer than vMCP session TTL
+   - Reduces likelihood of backend expiration during active vMCP session
+
+3. 
**Graceful state loss**: + - When backend session expires, vMCP re-initializes automatically + - Return warning in tool result metadata: `backend_reinitialized: true` + - Client can decide whether to continue or restart workflow + +**Fundamental limitations**: +- Some state cannot be preserved (browser DOM, active transactions) +- Keepalive depends on backend support (best effort) +- Re-initialization is unavoidable on backend restart + +**Flow comparison**: + +| Scenario | vMCP Session | Backend Session | Behavior | +|----------|--------------|-----------------|----------| +| **Backend session expires** | ✅ Active | ❌ Expired | Automatic re-init, retry succeeds, backend state lost | +| **Backend server crashes** | ✅ Active | ❌ Gone | Automatic reconnect, retry succeeds (if backend recovered) | +| **vMCP session expires** | ❌ Expired | ✅ Active | Client must re-initialize vMCP session (top-level) | +| **Both sessions active** | ✅ Active | ✅ Active | Normal operation, state preserved | + +**Implementation timing**: Phase 2 (automatic backend session re-initialization with state loss awareness). + +**Session Not Found (vMCP-level)**: + +If client uses expired/invalid vMCP session ID: - Return clear error: "session not found" or "session expired" -- Client should re-initialize via `/mcp/initialize` endpoint to create new session +- Client should re-initialize via `POST /mcp` with `initialize` method to create new vMCP session ## Security Considerations @@ -662,7 +1231,7 @@ For initial implementation, we assume most backends use long-lived credentials ( - **IP address validation**: Optionally validate client IP hasn't changed (breaks mobile clients, proxies - use with caution) - **Session rotation**: Periodically rotate session IDs (e.g., every 10 minutes) to limit stolen session lifetime -**Implementation timing**: Token hash binding should be implemented during Implementation Plan Phase 2 (SessionManager implementation) before production rollout. 
Store `tokenHash` in session metadata, validate on each request via middleware. +**Implementation timing**: Token hash binding is **required** for Implementation Plan Phase 2 (SessionManager implementation) - this is a **blocking requirement** before production rollout. Without this mitigation, session hijacking is possible. Implementation: Store `tokenHash` in session metadata, validate on each request via middleware. ### Data Protection @@ -674,17 +1243,23 @@ For initial implementation, we assume most backends use long-lived credentials ( **Client Usage During Cleanup**: - Race condition exists: request may use client while session is being closed -- Mitigation: Session storage layer extends TTL on every request (via existing `Get()` call which touches session), reducing race window -- MCP client library handles `Close()` on active connections gracefully (returns errors) -- Handlers should catch and handle "client closed" errors appropriately -- Future Enhancement: Add reference counting to delay `Close()` until all in-flight requests complete +- **Mitigation**: Context cancellation + graceful error handling + - Session storage extends TTL on every request (reduces race window) + - Operations check context cancellation before using clients + - MCP client library handles `Close()` on active connections gracefully (returns errors) + - Handlers catch and handle "client closed" errors appropriately + - See "Error Handling → Client Closed Mid-Request" for detailed implementation **Resource Exhaustion & DoS Protection**: ⚠️ **Connection multiplication**: Session-scoped clients create N sessions × M backends = N×M backend connections. At scale (e.g., 500 sessions × 20 backends = 10,000 connections), this can lead to resource exhaustion or DoS. **Required mitigations**: -1. **Max concurrent sessions limit**: Implement configurable limit on active sessions (e.g., `TOOLHIVE_MAX_SESSIONS=1000`). 
Return clear error when limit reached: "maximum concurrent sessions exceeded, try again later" +1. **Max concurrent sessions limit**: Implement configurable limit on active sessions (e.g., `TOOLHIVE_MAX_SESSIONS=1000`). + - **Behavior when limit reached**: Immediately reject new `InitializeRequest` with HTTP 503 Service Unavailable + - **Error response**: `{"error": {"code": -32000, "message": "Maximum concurrent sessions (1000) exceeded. Please try again later or contact administrator."}}` + - **Client experience**: Client should retry with exponential backoff or notify user + - **No queueing**: Requests are rejected immediately (not queued) to prevent resource exhaustion 2. **Per-client session limits**: Track sessions per client identity/IP, enforce per-client limits (e.g., 10 sessions per client) to prevent single client from exhausting resources 3. **Aggressive session TTL**: Default 30-minute TTL with idle timeout (e.g., 5 minutes of inactivity) to reclaim unused sessions faster 4. **Connection pooling consideration**: Future enhancement to share connections across sessions for same backend+identity combination (out of scope for this RFC but noted for future work) @@ -804,6 +1379,34 @@ With the Session interface, you can unit test session routing logic without spin **After this RFC**: Session is a testable domain object. Instantiate it directly with mocked dependencies (clients, routing table), test the routing and client delegation logic in isolation. +## Alternatives Considered + +### Alternative: Rely on Backend MCP Server Session Management + +**Approach**: Instead of vMCP managing session-scoped clients, rely entirely on backend MCP servers to manage their own stateful sessions via the [MCP Streamable HTTP transport session management](https://spec.modelcontextprotocol.io/specification/2025-06-18/basic/transports/). 
+ +**How it would work**: +- Each backend MCP server returns `Mcp-Session-Id` header during initialization +- Backend manages its own session state (browser tabs, transactions, etc.) +- vMCP passes through session IDs without managing backend client lifecycle +- Backend responsible for session persistence and resumption + +**Why not chosen**: + +1. **Inconsistent backend support**: Not all MCP servers implement stateful sessions or session persistence. The MCP protocol makes sessions optional, and many backends are stateless (REST APIs, simple tools). + +2. **Protocol moving toward stateless-by-default**: The MCP community is actively working on [making the protocol stateless by default](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1442) to improve scalability and resilience. Relying on backend statefulness goes against this direction. + +3. **vMCP-specific aggregation needs**: vMCP aggregates multiple backends with capability discovery, conflict resolution, and routing logic that requires vMCP-level session management. Backend sessions don't help with vMCP's aggregation layer. + +4. **No guarantees for stateful backends**: Even backends that support sessions (like Playwright) may not persist state across restarts or provide resumption capabilities. vMCP would still need fallback mechanisms. + +5. **Complex coordination**: Managing per-backend session lifecycles (different TTLs, expiration strategies, restart behaviors) across multiple backends would be more complex than managing a single vMCP session with owned clients. + +6. **Scaling challenges**: Per [MCP scaling discussions](https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/102), stateful connections create bottlenecks for load balancing and enterprise deployments. vMCP sessions with sticky load balancing are more practical. + +**Conclusion**: Backend session management doesn't solve vMCP's core problems (per-request client creation, aggregation complexity). 
The proposed vMCP-level session architecture provides consistent behavior regardless of backend capabilities, with optional keepalive as best-effort mitigation for backends that do support stateful sessions. + ## Implementation Plan This implementation introduces new interfaces and gradually migrates from the current architecture to the proposed design. Phases 1-2 are purely additive (ensuring low risk and easy rollback), while Phase 3 switches over and removes old code. @@ -822,13 +1425,15 @@ This implementation introduces new interfaces and gradually migrates from the cu The `defaultSession` implementation: - **Embeds `transportsession.Session`** for metadata layer (ID, Type, Touch, GetData) - **Owns backend clients** in an internal map (runtime layer) +- **Tracks backend session IDs** in a separate map for observability - **Implements both interfaces**: storage-oriented `transportsession.Session` + behavior-oriented vMCP `Session` When `SessionFactory.MakeSession()` runs: 1. Create and initialize one MCP client per backend **in parallel** (with bounded concurrency, per-backend timeouts) -2. Pass clients to aggregator to discover capabilities from all backends -3. Create `transportsession.Session` for metadata layer (ID, timestamps) -4. Return behavior-oriented session that embeds transport session + owns clients map (same clients used for discovery) +2. Capture backend session IDs from clients (via `client.SessionID()`) for observability +3. Pass clients to aggregator to discover capabilities from all backends +4. Create `transportsession.Session` for metadata layer (ID, timestamps) +5. 
Return behavior-oriented session that embeds transport session + owns clients map + backend session IDs map **Testing**: - Unit tests for `Session` interface methods (CallTool, ReadResource, Close) @@ -875,7 +1480,7 @@ if config.SessionManagementV2 { - Integration tests with feature flag enabled: Create session via `SessionManager.Generate()`, verify session stored - Test tool handler routing: Call tool, verify session's `CallTool()` is invoked - Test session termination: Call `Terminate()`, verify session closed -- Test SDK integration: Initialize session via HTTP `/mcp/initialize`, verify tools registered +- Test SDK integration: Initialize session via `POST /mcp` (method: `initialize`), verify tools registered - Verify old code path still works when flag is disabled (no regressions) **Files Modified**: @@ -926,7 +1531,13 @@ if config.SessionManagementV2 { "event": "session_created", "session_id": "sess-123", "backends_initialized": 3, - "backends_failed": 1 + "backends_failed": 1, + "backend_sessions": { + "github": "backend-sess-abc", + "slack": "backend-sess-def", + "playwright": "backend-sess-ghi" + }, + "failed_backends": ["database"] } ``` - Add metrics: @@ -964,13 +1575,15 @@ Each phase is independently testable and can be rolled back: ### Testing Strategy **Unit Tests** (each phase): -- Session interface methods (CallTool, ReadResource, Close) +- Session interface methods (CallTool, ReadResource, Close, BackendSessions) - SessionFactory capability discovery and client initialization +- Backend session ID capture during client initialization +- Backend session ID updates during re-initialization - SessionManager SDK integration (Generate, Terminate) - Error handling (partial initialization, client closed, session not found) **Integration Tests** (Phase 2+): -- Session creation via HTTP `/mcp/initialize` endpoint +- Session creation via `POST /mcp` with `initialize` method - Tool calls routed to correct backends via session - Session expiration triggers 
client closure - Multiple tool calls reuse same clients (no reconnection overhead) @@ -981,6 +1594,24 @@ Each phase is independently testable and can be rolled back: - High-throughput scenarios (verify no connection leaks, reduced latency) - Session lifecycle (create, use, expire, cleanup) +**Future Enhancement: Multi-Client Conformance Test Suite** (Not implemented in this RFC): + +A comprehensive test suite to validate multi-client session behavior would be valuable, but this is better suited at the **MCP protocol level** rather than vMCP-specific implementation: + +- **Ideal approach**: Contribute to MCP specification with standardized multi-client conformance tests + - Test suite validates that servers correctly handle multiple concurrent clients + - Each client gets unique session ID via `Mcp-Session-Id` header + - Sessions are properly isolated (no cross-contamination) + - Session lifecycle (init, use, expire, cleanup) works correctly per client + - Would benefit the entire MCP ecosystem, not just vMCP + +- **vMCP reuse**: Once MCP protocol-level test suite exists, vMCP can: + - Run the conformance suite against vMCP implementation + - Extend with vMCP-specific scenarios (aggregation, routing, backend failures) + - Ensure vMCP behaves as a compliant MCP server + +- **Current approach**: Phase 3 E2E tests cover multi-client behavior sufficiently for initial implementation. Protocol-level test suite can be contributed later as ecosystem matures. 
+ ## Documentation **Architecture Documentation** (in `toolhive` repository): @@ -1019,6 +1650,7 @@ Each phase is independently testable and can be rolled back: ## References - [MCP Specification](https://modelcontextprotocol.io/specification) - Model Context Protocol specification +- [MCP Streamable HTTP Transport](https://spec.modelcontextprotocol.io/specification/2025-06-18/basic/transports/) - Streamable HTTP transport specification including session management via `Mcp-Session-Id` header - [toolhive#3062](https://github.com/stacklok/toolhive/issues/3062) - Original issue: Per-request client creation overhead - [toolhive#2417](https://github.com/stacklok/toolhive/issues/2417) - Session storage architecture and dual-layer model (metadata vs runtime state) - [toolhive#2852](https://github.com/stacklok/toolhive/issues/2852) - Missing integration tests below server level @@ -1068,6 +1700,7 @@ This decorator pattern allows the optimizer to be added without modifying core s | 2026-02-04 | @yrobla, @jerm-dro | Draft | Initial submission | | 2026-02-09 | @jerm-dro, Copilot | Under Review | Addressed PR #38 comments | | 2026-02-12 | @jerm-dro, @ChrisJBurns | Under Review | Major revisions: Integrated with existing transportsession.Manager architecture, added dual-layer storage model, addressed security concerns (session hijacking, credential exposure, resource exhaustion), clarified error handling (removed InitError pattern), added parallel client initialization, simplified SessionManager design | +| 2026-02-16 | @yrobla, Claude Sonnet 4.5 | Under Review | Multi-client session clarifications: Added explicit multi-client session model section, documented session routing mechanism via Mcp-Session-Id header, added mid-session backend failure handling with automatic recovery and circuit breaker, addressed backend MCP server session expiration with optional keepalive (best effort), added "Alternatives Considered" section explaining why not relying on backend session 
management, added future enhancement for MCP-level multi-client conformance test suite, clarified single endpoint serves multiple isolated sessions per MCP Streamable HTTP spec | ### Implementation Tracking