Commit 6eb788a

type contracts + impl plan created
1 parent 8c0a2e0 commit 6eb788a

2 files changed

+94
-0
lines changed

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Goal Description
Implement a "Day 2" reliability layer for the simstudioai/sim workflow engine by building a composable resilience interceptor/middleware pipeline around the MCP `executeTool` logic. The pipeline adds three things: a circuit breaker state machine, Zod-based schema enforcement for LLM-generated tool arguments, and telemetry for latency and failure analysis. It is designed for high-concurrency Node/TypeScript environments.

## User Review Required
- Please confirm that `apps/sim/lib/mcp/service.ts` is the correct core injection point for wrapping `executeTool`.
- Note on file paths: `apps/sim/lib/workflow/executor.ts` was not found, so `apps/sim/executor/execution/executor.ts` and `apps/sim/tools/workflow/executor.ts` were analyzed instead. Please confirm that intercepting `McpService`'s `executeTool` fits your architecture.
- Please confirm the schema enforcement approach: JSON Schemas are compiled to Zod validators and cached, either at MCP server discovery or lazily on first use, rather than being recompiled on every request.

## Proposed Changes

The implementation is split into discrete PRs/commits, one per part below, to keep the changes reviewable.

### Part 1: Telemetry Hooks
Implement the foundation for tracking.
*(Change Rationale: Moved to a middleware pattern instead of a monolithic proxy so that telemetry can be composed with the other layers.)*
#### [NEW] `apps/sim/lib/mcp/resilience/telemetry.ts`
- Implement a telemetry middleware that captures `latency_ms` and `failure_reason` (e.g., `TIMEOUT`, `VALIDATION_ERROR`, `API_500`) for each tool call.

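A minimal sketch of what such a telemetry middleware could look like. The stand-in types, the `TelemetryEvent` shape, the `classifyFailure` helper, and the `TOOL_ERROR`/`UNKNOWN` reasons are assumptions made for illustration; only `latency_ms`, `failure_reason`, and the example reason codes come from the plan.

```typescript
// Stand-in shapes so the sketch compiles on its own (assumption);
// the real types would come from '@/lib/mcp/types'.
interface McpToolResult {
  isError?: boolean
  content: Array<{ type: string; text: string }>
}
interface McpExecutionContext {
  toolCall: { name: string }
  serverId: string
}
type McpMiddlewareNext = (context: McpExecutionContext) => Promise<McpToolResult>

// Hypothetical event shape; only latency_ms and failure_reason are named in the plan.
interface TelemetryEvent {
  serverId: string
  toolName: string
  latency_ms: number
  failure_reason: string | undefined
}

// Hypothetical classifier mapping a thrown error to a reason code.
function classifyFailure(error: unknown): string {
  const message = error instanceof Error ? error.message : String(error)
  if (/timeout/i.test(message)) return 'TIMEOUT'
  if (/\b500\b/.test(message)) return 'API_500'
  return 'UNKNOWN'
}

class TelemetryMiddleware {
  private readonly emit: (event: TelemetryEvent) => void

  constructor(emit: (event: TelemetryEvent) => void) {
    this.emit = emit
  }

  async execute(context: McpExecutionContext, next: McpMiddlewareNext): Promise<McpToolResult> {
    const startedAt = Date.now()
    let failureReason: string | undefined
    try {
      const result = await next(context)
      if (result.isError) {
        // Assumption: validation failures are recognizable by their message prefix.
        const text = result.content[0]?.text ?? ''
        failureReason = text.startsWith('Schema validation failed') ? 'VALIDATION_ERROR' : 'TOOL_ERROR'
      }
      return result
    } catch (error) {
      failureReason = classifyFailure(error)
      throw error // telemetry observes failures; it does not swallow them
    } finally {
      // Runs on both success and failure, so every call produces exactly one event.
      this.emit({
        serverId: context.serverId,
        toolName: context.toolCall.name,
        latency_ms: Date.now() - startedAt,
        failure_reason: failureReason,
      })
    }
  }
}
```

Emitting from `finally` keeps the success and failure paths symmetric, and rethrowing leaves error handling to the layers around the middleware.
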
### Part 2: Circuit Breaker State Machine
Implement the state management logic.
*(Change Rationale: Added a `HALF-OPEN` concurrency lock (semaphore) so that only one probe reaches a recovering downstream server, preventing a "thundering herd". Documented that breaker state is local to each instance and held in an LRU cache to prevent memory leaks.)*
#### [NEW] `apps/sim/lib/mcp/resilience/circuit-breaker.ts`
- Implement the `CircuitBreaker` middleware with states `CLOSED`, `OPEN`, and `HALF-OPEN`.
- Handle failure thresholds, reset timeouts, and fail-fast behavior while the circuit is `OPEN`.
- **Concurrency Lock:** During `HALF-OPEN`, allow exactly **one** probe request through. All other concurrent requests fail fast until the probe resolves.
- **Memory & State:** Keep the `CircuitBreaker` registry in an LRU cache or scope it to the MCP connection, so that each breaker's lifecycle is bound to its connection and stale breakers do not leak memory. Breaker state is local to each process instance and is not shared across instances.

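The state machine and the single-probe gate described above can be sketched as follows. This is an illustrative standalone wrapper, not the planned middleware: the `BreakerConfig` fields, the injectable `now` clock, and the `CircuitOpenError` class are assumptions, and the LRU registry is left out.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF-OPEN'

// Hypothetical config shape; the plan only names thresholds and reset timeouts.
interface BreakerConfig {
  failureThreshold: number
  resetTimeoutMs: number
  now?: () => number // injectable clock, useful for tests
}

class CircuitOpenError extends Error {}

class CircuitBreaker {
  private state: BreakerState = 'CLOSED'
  private failures = 0
  private openedAt = 0
  private probeInFlight = false
  private readonly config: BreakerConfig
  private readonly now: () => number

  constructor(config: BreakerConfig) {
    this.config = config
    this.now = config.now ?? (() => Date.now())
  }

  getState(): BreakerState {
    return this.state
  }

  async run<T>(operation: () => Promise<T>): Promise<T> {
    // Everything before the first `await` runs synchronously, so the
    // check-and-set on `probeInFlight` cannot interleave with another caller.
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.config.resetTimeoutMs) {
        throw new CircuitOpenError('circuit is open')
      }
      this.state = 'HALF-OPEN' // cooldown elapsed: allow a probe
    }
    const isProbe = this.state === 'HALF-OPEN'
    if (isProbe) {
      if (this.probeInFlight) throw new CircuitOpenError('probe already in flight')
      this.probeInFlight = true
    }
    try {
      const result = await operation()
      if (isProbe) this.close()
      else if (this.state === 'CLOSED') this.failures = 0
      return result
    } catch (error) {
      if (isProbe) this.open()
      else if (this.state === 'CLOSED') {
        this.failures += 1
        if (this.failures >= this.config.failureThreshold) this.open()
      }
      throw error
    }
  }

  private open(): void {
    this.state = 'OPEN'
    this.openedAt = this.now()
    this.probeInFlight = false
  }

  private close(): void {
    this.state = 'CLOSED'
    this.failures = 0
    this.probeInFlight = false
  }
}
```

Results from requests that started while the circuit was `CLOSED` but settle after it has opened are ignored, so a late success cannot close the circuit on behalf of the probe.
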
### Part 3: Schema Validation
Implement the Zod validation logic for LLM-supplied tool arguments.
*(Change Rationale: Added schema compilation caching to avoid recompiling schemas on every request, which would be a CPU bottleneck, and return `isError: true` on validation failures so that the LLM can self-correct.)*
#### [NEW] `apps/sim/lib/mcp/resilience/schema-validator.ts`
- Enforce tool input schemas with `Zod`, implemented as a middleware.
- **Schema Caching:** Compile JSON Schemas to Zod schemas and cache them in a registry keyed by `toolId`, either during the initial discovery phase or lazily on first use. Flush cached validators in response to MCP lifecycle events (e.g., mid-session tool list updates).
- **LLM Self-Correction:** When Zod validation fails, do not throw an exception that would crash the workflow engine. Instead, intercept the validation error and return a formatted MCP execution result: `{ isError: true, content: [{ type: "text", text: "Schema validation failed: [Zod Error Details]" }] }`.

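The caching and self-correction behavior can be sketched without committing to a particular JSON Schema to Zod converter. In the sketch below, `Validator` is a stand-in that mirrors the result shape of Zod's `safeParse`, and `ValidatorCache`, `CompileFn`, and `validateArgs` are hypothetical names; the compile step is injected because the plan does not say which converter will be used.

```typescript
// Stand-in result shape (assumption); the real type would come from '@/lib/mcp/types'.
interface McpToolResult {
  isError?: boolean
  content: Array<{ type: 'text'; text: string }>
}

// Mirrors the success/failure shape returned by Zod's `safeParse`.
type ParseResult = { success: true } | { success: false; error: { message: string } }
interface Validator {
  safeParse(input: unknown): ParseResult
}

// Hypothetical compile step: JSON Schema in, validator out.
type CompileFn = (jsonSchema: unknown) => Validator

class ValidatorCache {
  private readonly validators = new Map<string, Validator>()
  private readonly compile: CompileFn

  constructor(compile: CompileFn) {
    this.compile = compile
  }

  // Compile lazily on first use, then reuse the cached validator.
  get(toolId: string, jsonSchema: unknown): Validator {
    let validator = this.validators.get(toolId)
    if (!validator) {
      validator = this.compile(jsonSchema)
      this.validators.set(toolId, validator)
    }
    return validator
  }

  // Call when the server reports a tool list update so stale schemas are dropped.
  flush(): void {
    this.validators.clear()
  }
}

// Returns null when the arguments are valid; otherwise an in-band error result
// that the LLM can read and react to, instead of a thrown exception.
function validateArgs(validator: Validator, args: unknown): McpToolResult | null {
  const parsed = validator.safeParse(args)
  if (parsed.success === false) {
    return {
      isError: true,
      content: [{ type: 'text', text: `Schema validation failed: ${parsed.error.message}` }],
    }
  }
  return null
}
```

A middleware built on this would return the error result directly and skip `next`, so an invalid call never reaches the MCP server.
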
### Part 4: Resilience Pipeline Integration
Wire the middlewares into tool execution through a pipeline instead of a monolithic proxy.
*(Change Rationale: Switched from a God Object proxy to a middleware pipeline to support granular, per-tool enablement.)*
#### [NEW] `apps/sim/lib/mcp/resilience/pipeline.ts`
- Implement a chain of responsibility (interceptor/middleware pipeline) for `executeTool`.
- Provide an API like `executeTool.use(telemetry).use(validate(cachedSchema)).use(circuitBreaker(config))` rather than a fixed sequence of steps hardcoded inside a rigid class.
- This composable architecture allows specific middlewares to be enabled or disabled per tool (e.g., untrusted vs. internal tools).

#### [MODIFY] `apps/sim/lib/mcp/service.ts`
- Update `mcpService.executeTool` to run requests through the configurable `ResiliencePipeline` rather than hardcoded proxy logic.

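One possible shape for the pipeline is a small class with a chainable `use` method. This is a sketch under assumptions: the types are simplified stand-ins for the contracts in this commit, and passing the real tool call as a `terminal` handler is an illustrative choice rather than something the plan specifies.

```typescript
// Simplified stand-ins for the type contracts (assumption).
interface McpToolResult {
  isError?: boolean
  content: Array<{ type: string; text: string }>
}
interface McpExecutionContext {
  toolCall: { name: string }
  serverId: string
}
type McpMiddlewareNext = (context: McpExecutionContext) => Promise<McpToolResult>
interface McpMiddleware {
  execute(context: McpExecutionContext, next: McpMiddlewareNext): Promise<McpToolResult>
}

class ResiliencePipeline {
  private readonly middlewares: McpMiddleware[] = []

  // Chainable registration: pipeline.use(a).use(b)
  use(middleware: McpMiddleware): this {
    this.middlewares.push(middleware)
    return this
  }

  // `terminal` performs the actual tool call once every middleware has passed it on.
  execute(context: McpExecutionContext, terminal: McpMiddlewareNext): Promise<McpToolResult> {
    // Fold from the right so the first registered middleware ends up outermost.
    const chain = this.middlewares.reduceRight<McpMiddlewareNext>(
      (next, middleware) => (ctx) => middleware.execute(ctx, next),
      terminal
    )
    return chain(context)
  }
}
```

Because each pipeline is just a list of middlewares, per-tool enablement can be expressed by building a different pipeline for each tool or trust level, for example one that omits schema validation for internal tools.
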
## Verification Plan
### Automated Tests
- Create a test suite that runs against a mock MCP server.
- Write tests in `apps/sim/lib/mcp/resilience/pipeline.test.ts` to assert:
  - The circuit breaker trips to `OPEN` on simulated `API_500` errors and transitions to `HALF-OPEN` after a cooldown.
  - **New Test:** `HALF-OPEN` allows exactly **one** of several simulated concurrent probe requests through.
  - **New Test:** Schema validation returns the standard `isError: true` result for invalid LLM arguments without executing the tool.
  - Telemetry correctly logs latency.

### Manual Verification
- Run tests that produce visual output demonstrating the circuit breaker "tripping" and "recovering".
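
For the mock MCP server, a scripted executor that fails a fixed number of times before succeeding is often enough to drive the breaker through its states. The helper below is hypothetical (`createFlakyExecutor` is not part of the codebase) and the result shape is a stand-in for `McpToolResult`.

```typescript
// Stand-in result shape (assumption).
interface McpToolResult {
  isError?: boolean
  content: Array<{ type: 'text'; text: string }>
}

// Hypothetical test helper: fails `failuresBeforeSuccess` times with a
// simulated API_500 error, then succeeds on every later call.
function createFlakyExecutor(failuresBeforeSuccess: number) {
  let calls = 0
  const execute = async (): Promise<McpToolResult> => {
    calls += 1
    if (calls <= failuresBeforeSuccess) {
      throw new Error('API_500: simulated upstream failure')
    }
    return { content: [{ type: 'text', text: `ok after ${calls} calls` }] }
  }
  return { execute, getCalls: () => calls }
}
```

Exposing the call count lets a test assert that a request was rejected without ever reaching the mock, which is what the fail-fast and validation tests need to check.
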
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
import type { McpToolCall, McpToolResult } from '@/lib/mcp/types'

/**
 * Context passed through the Resilience Pipeline
 */
export interface McpExecutionContext {
  toolCall: McpToolCall
  serverId: string
  userId: string
  workspaceId: string
  /**
   * Additional parameters passed directly by the executeTool caller
   */
  extraHeaders?: Record<string, string>
}

/**
 * Standardized function signature for invoking the NEXT component in the pipeline
 */
export type McpMiddlewareNext = (
  context: McpExecutionContext
) => Promise<McpToolResult>

/**
 * Interface that all Resilience Middlewares must implement
 */
export interface McpMiddleware {
  /**
   * Execute the middleware logic
   * @param context The current execution context
   * @param next The next middleware/tool in the chain
   */
  execute(
    context: McpExecutionContext,
    next: McpMiddlewareNext
  ): Promise<McpToolResult>
}
