Commit 53c3146

Author: StackMemory Bot (CLI)
feat(mcp): add trace event API + verification commands in harness
- Wire TraceEventStore into MCP server with query/stats/record handlers
- Add verification commands to multimodal harness (custom pass/fail checks)
- Deterministic critique now checks verification results
- Update MCP tool definitions + docs
1 parent 75e2048 commit 53c3146

7 files changed

Lines changed: 320 additions & 10 deletions

docs/mcp.md

Lines changed: 18 additions & 0 deletions
@@ -3,29 +3,36 @@
The `plan_and_code` MCP tool lets Claude Code trigger StackMemory’s multi‑agent flow silently and receive a single JSON result. It plans with Claude, implements with Codex or Claude, and critiques the result — with optional retry loops and context recording.

 ## What it does
+
 - Planner (Claude): generates a concise plan with acceptance criteria and risks.
 - Implementer (Codex/Claude): applies a focused change per step.
 - Critic (Claude): returns `{ approved, issues[], suggestions[] }` to gate retries.
+- Verification commands: optional task-specific repro/test commands run after each implementation attempt and included in the critic input.
 - Returns a single JSON payload: `{ plan, implementation, critique, iterations[] }`.
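
The flow listed above can be sketched as a small retry loop. This is a minimal illustration, not the harness's actual code: the `Agents` interface, `runPlanAndCode`, and the stub agents are hypothetical names, and the calls are synchronous only to keep the example short. The one behavior taken from the commit is that verification results feed into the critique, which gates the retry.

```typescript
interface Critique { approved: boolean; issues: string[]; suggestions: string[] }
interface VerificationResult { command: string; passed: boolean }
interface Iteration { attempt: number; verification: VerificationResult[]; critique: Critique }

// Hypothetical agent hooks; the real planner/implementer/critic are model calls.
interface Agents {
  plan(task: string): string;
  implement(plan: string, attempt: number): string;
  verify(command: string, attempt: number): boolean;
  critique(implementation: string, verification: VerificationResult[]): Critique;
}

function runPlanAndCode(
  task: string,
  agents: Agents,
  verificationCommands: string[],
  maxIters: number
) {
  const plan = agents.plan(task);
  const iterations: Iteration[] = [];
  let implementation = '';
  let critique: Critique = { approved: false, issues: [], suggestions: [] };

  for (let attempt = 1; attempt <= Math.max(1, maxIters); attempt++) {
    implementation = agents.implement(plan, attempt);
    // Run every verification command after each implementation attempt.
    const verification = verificationCommands.map((command) => ({
      command,
      passed: agents.verify(command, attempt),
    }));
    // The critic sees the verification results and decides whether to retry.
    critique = agents.critique(implementation, verification);
    iterations.push({ attempt, verification, critique });
    if (critique.approved) break;
  }
  return { plan, implementation, critique, iterations };
}

// Stub agents: verification fails on the first attempt and passes on the second.
const stubAgents: Agents = {
  plan: (task) => `plan for: ${task}`,
  implement: (plan, attempt) => `patch #${attempt} (${plan})`,
  verify: (_command, attempt) => attempt >= 2,
  critique: (_implementation, verification) => {
    const failed = verification.filter((v) => !v.passed);
    return {
      approved: failed.length === 0,
      issues: failed.map((v) => `verification failed: ${v.command}`),
      suggestions: [],
    };
  },
};

const outcome = runPlanAndCode('Fix replay drift', stubAgents, ['npm test'], 3);
```

With these stubs the loop stops after the second attempt, so `outcome.iterations` holds one rejected and one approved iteration.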

 ## Tool definition
+
 - name: `plan_and_code`
 - arguments:
   - `task` (string, required): short task description
   - `implementer` ("codex" | "claude", default: `codex`)
   - `maxIters` (number, default: `2`): retry loop iterations
   - `execute` (boolean, default: `false`): if `false`, implementer is dry‑run
+  - `verificationCommands` (string[], optional): repro/test commands that must pass after each implementation attempt
   - `record` (boolean, default: `false`): write plan/critique as simple context rows
   - `recordFrame` (boolean, default: `false`): write a real frame + anchors
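
For callers building the arguments in code, the list above might be typed roughly as follows. The interface name `PlanAndCodeArgs` and the helper `normalizeVerificationCommands` are illustrative assumptions; the normalization (coerce to string, trim, drop empties) mirrors what `getVerificationCommands` does for arrays in this commit's `server.ts` diff.

```typescript
// Illustrative type; field names and defaults are taken from the documented list.
interface PlanAndCodeArgs {
  task: string;
  implementer?: 'codex' | 'claude'; // default: 'codex'
  maxIters?: number; // default: 2
  execute?: boolean; // default: false (dry-run)
  verificationCommands?: string[]; // optional
  record?: boolean; // default: false
  recordFrame?: boolean; // default: false
}

// Coerce each entry to a string, trim it, and drop entries that end up empty.
function normalizeVerificationCommands(commands: unknown): string[] {
  if (!Array.isArray(commands)) return [];
  return commands.map((c) => String(c).trim()).filter(Boolean);
}

const exampleArgs: PlanAndCodeArgs = {
  task: 'Refactor config loader',
  verificationCommands: ['  npm test  ', ''],
};
const normalized = normalizeVerificationCommands(exampleArgs.verificationCommands);
// normalized is ['npm test']
```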

 ## Environment defaults
+
 If not specified in arguments, the MCP handler reads these env vars:
+
 - `STACKMEMORY_MM_PLANNER_MODEL` (e.g., `claude-sonnet-4-20250514`)
 - `STACKMEMORY_MM_REVIEWER_MODEL` (defaults to planner model if unset)
 - `STACKMEMORY_MM_IMPLEMENTER` (`codex` or `claude`)
 - `STACKMEMORY_MM_MAX_ITERS` (e.g., `3`)
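
The fallback order described above (explicit argument first, then env var, then a built-in default) could look like the sketch below. The function `resolveSettings`, the `HarnessOverrides` shape, and the `plannerModel`/`reviewerModel` argument names are assumptions made for illustration; only the env var names and the `maxIters` clamp (at least 1, else 2) come from this commit. The environment is passed in as a plain object so the example does not depend on Node's `process`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical argument overrides; the handler's real argument names may differ.
interface HarnessOverrides {
  plannerModel?: string;
  reviewerModel?: string;
  implementer?: string;
  maxIters?: number;
}

interface HarnessSettings {
  plannerModel: string | undefined;
  reviewerModel: string | undefined;
  implementer: 'codex' | 'claude';
  maxIters: number;
}

function resolveSettings(overrides: HarnessOverrides, env: Env): HarnessSettings {
  const plannerModel =
    overrides.plannerModel ?? env['STACKMEMORY_MM_PLANNER_MODEL'];
  // The reviewer falls back to the planner model when nothing else is set.
  const reviewerModel =
    overrides.reviewerModel ?? env['STACKMEMORY_MM_REVIEWER_MODEL'] ?? plannerModel;
  const implementerRaw = overrides.implementer ?? env['STACKMEMORY_MM_IMPLEMENTER'];
  const implementer = implementerRaw === 'claude' ? 'claude' : 'codex';
  // Same clamp as the handler in this commit: at least 1, defaulting to 2.
  const rawIters = Number(overrides.maxIters ?? env['STACKMEMORY_MM_MAX_ITERS'] ?? 2);
  const maxIters = Number.isFinite(rawIters) ? Math.max(1, rawIters) : 2;
  return { plannerModel, reviewerModel, implementer, maxIters };
}

const settings = resolveSettings(
  {},
  {
    STACKMEMORY_MM_PLANNER_MODEL: 'claude-sonnet-4-20250514',
    STACKMEMORY_MM_MAX_ITERS: '3',
  }
);
```

Here `settings.reviewerModel` ends up equal to the planner model, `settings.implementer` is `'codex'`, and `settings.maxIters` is `3`.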

 ## Example (MCP request)
+
 ```json
 {
   "method": "tools/call",
@@ -36,13 +43,17 @@ If not specified in arguments, the MCP handler reads these env vars:
       "implementer": "codex",
       "maxIters": 2,
       "execute": true,
+      "verificationCommands": [
+        "npx vitest run src/orchestrators/multimodal/__tests__/determinism.test.ts --reporter=dot"
+      ],
       "recordFrame": true
     }
   }
 }
 ```

 Response content is a single `text` item containing a JSON string:
+
 ```json
 {
   "ok": true,
@@ -58,6 +69,7 @@ Response content is a single `text` item containing a JSON string:
 ```
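
Because the result arrives as a JSON string inside a `text` item, a client has to parse it a second time. A minimal sketch, assuming a result shaped like the handlers in this commit return (`content` array plus `isError`); the helper name `parseToolPayload` and the sample payload are made up for illustration.

```typescript
interface ToolContentItem { type: string; text: string }
interface ToolCallResult { content: ToolContentItem[]; isError?: boolean }

// Extract the first content item and parse its text as JSON.
function parseToolPayload<T>(result: ToolCallResult): T {
  const first = result.content[0];
  if (first === undefined || first.type !== 'text') {
    throw new Error('expected a single text content item');
  }
  return JSON.parse(first.text) as T;
}

// Made-up sample; a real payload carries plan, implementation, critique, iterations.
const sampleResult: ToolCallResult = {
  content: [{ type: 'text', text: JSON.stringify({ ok: true }) }],
  isError: false,
};
const parsedPayload = parseToolPayload<{ ok: boolean }>(sampleResult);
```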

 ## Recording behavior
+
 - `record: true` writes two entries into `.stackmemory/context.db` (simple `contexts` table):
   - `Plan: <summary>` (importance 0.8)
   - `Critique: approved|needs_changes` (importance 0.6)
@@ -68,18 +80,22 @@ Response content is a single `text` item containing a JSON string:
 - Both modes are best‑effort. If the DB isn’t ready, handler returns JSON without failing.
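
Best-effort recording usually means the write is wrapped so that a failure cannot affect the response. The sketch below shows that pattern under assumed names (`recordBestEffort`, `respond`); it is not the handler's actual code, and the failing write only simulates a database that is not ready.

```typescript
interface RecordOutcome { recorded: boolean; reason?: string }

// Run the write and swallow any failure, reporting what happened.
function recordBestEffort(write: () => void): RecordOutcome {
  try {
    write();
    return { recorded: true };
  } catch (err) {
    return { recorded: false, reason: err instanceof Error ? err.message : String(err) };
  }
}

// The JSON response is produced whether or not the write succeeded.
function respond(result: object, write: () => void): string {
  recordBestEffort(write);
  return JSON.stringify(result);
}

const responseText = respond({ ok: true }, () => {
  throw new Error('database not ready'); // simulated failure
});
// responseText is '{"ok":true}'
```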

 ## Notes
+
 - Implementer `codex` calls `codex-sm` (must be on PATH). Use `--execute` in CLI, or `execute: true` in MCP, to actually run it; otherwise it’s a dry‑run.
 - Audit files are saved to `.stackmemory/build/spike-<timestamp>.json` to support review/debugging.
 - You can compare models:
   - Planner/critic: override with `STACKMEMORY_MM_PLANNER_MODEL` / `STACKMEMORY_MM_REVIEWER_MODEL`.
   - Implementer: set to `claude` to A/B against Codex, or keep `codex` (default).

 ## CLI equivalents (for quick checks)
+
 - Quiet JSON output:
   - `stackmemory build "Refactor config loader" --json`
   - `stackmemory skills spike --task "Refactor config loader" --json`
 - Execute implementer and record as frame:
   - `stackmemory skills spike --task "Refactor" --execute --max-iters 3 --json --record-frame`
+- Execute with a task-specific verification harness:
+  - `stackmemory build "Fix deterministic replay drift" --verify "npm run determinism:test" --execute`
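
The `--verify` flag is declared as repeatable in this commit's `skills.ts` diff, using an accumulator that appends each value to the previous list. The sketch below copies that accumulator and simulates an option parser encountering the flag twice, starting from the default `[]`. The second command, `npm run lint`, is a made-up example.

```typescript
// Same accumulator as `collectRepeatedOption` in the skills.ts hunk of this commit.
function collectRepeatedOption(value: string, previous: string[] = []): string[] {
  return [...previous, value];
}

// Simulates: --verify "npm run determinism:test" --verify "npm run lint"
const flagValues = ['npm run determinism:test', 'npm run lint'];
const collected = flagValues.reduce<string[]>(
  (acc, value) => collectRepeatedOption(value, acc),
  []
);
// collected is ['npm run determinism:test', 'npm run lint']
```

Returning a new array on every call, instead of mutating `previous`, keeps the shared default `[]` from accumulating values across separate parses.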

 ---

@@ -152,11 +168,13 @@ Response (content[0].text is a JSON string):
 ```

 Notes:
+
 - `recordFrame: true` creates a real StackMemory frame + anchors (plan summary, commands, issues, suggestions).
 - `execute: true` actually invokes the implementer; otherwise it’s a dry‑run.
 - Approval IDs are persisted to `.stackmemory/build/pending.json` so editor restarts don’t lose pending approvals.

 ### Optional helper tools
+
 - `plan_only`: Returns a plan JSON without running code.
 - `call_claude`: Calls Claude directly (prompt/model/system).
 - `call_codex`: Calls Codex via `codex-sm` (prompt/args/execute).

src/cli/commands/skills.ts

Lines changed: 14 additions & 0 deletions
@@ -56,6 +56,13 @@ function getVersion(): string {
   return _version;
 }

+function collectRepeatedOption(
+  value: string,
+  previous: string[] = []
+): string[] {
+  return [...previous, value];
+}
+
 // Type-safe environment variable access
 function _getEnv(key: string, defaultValue?: string): string {
   const value = process.env[key];
@@ -408,6 +415,12 @@ export function createSkillsCommand(): Command {
       false
     )
     .option('--audit-dir <path>', 'Persist spike results to directory')
+    .option(
+      '--verify <cmd>',
+      'Verification command to run after each implementation attempt; repeatable',
+      collectRepeatedOption,
+      []
+    )
     .option('--record-frame', 'Record as real frame with anchors', false)
     .option(
       '--record',
@@ -435,6 +448,7 @@ export function createSkillsCommand(): Command {
           maxIters: parseInt(options.maxIters),
           dryRun: !options.execute,
           auditDir: options.auditDir,
+          verificationCommands: options.verify,
           recordFrame: Boolean(options.recordFrame),
           record: Boolean(options.record),
         }

src/integrations/mcp/server.ts

Lines changed: 168 additions & 0 deletions
@@ -36,6 +36,8 @@ import { logger } from '../../core/monitoring/logger.js';
 import { isFeatureEnabled } from '../../core/config/feature-flags.js';
 import { ContentCache } from '../../core/cache/index.js';
 import type { CacheStats } from '../../core/cache/index.js';
+import { TraceEventStore } from '../../core/trace/trace-event-store.js';
+import type { TraceEvent } from '../../core/trace/trace-event.js';

 // Linear types - imported dynamically when needed
 type LinearTaskManager =
@@ -142,6 +144,8 @@ class LocalStackMemoryMCP {
   private crossSearchHandlers: CrossSearchHandlers;
   private pendingPlans: Map<string, any> = new Map();
   private contentCache: ContentCache;
+  private traceEventStore: TraceEventStore;
+  private sessionId: string;
   private sessionTokensSaved = 0;
   private sessionCacheHits = 0;
   private sessionCacheMisses = 0;
@@ -198,6 +202,10 @@ class LocalStackMemoryMCP {
     // Initialize content-hash cache for token deduplication
     this.contentCache = new ContentCache(this.db);

+    // Initialize ASI-shaped trace event store
+    this.traceEventStore = new TraceEventStore(this.db);
+    this.sessionId = uuidv4();
+
     // Initialize frame manager
     this.frameManager = new FrameManager(this.db, this.projectId);

@@ -343,6 +351,54 @@ class LocalStackMemoryMCP {
     };
   }

+  // ------------------------------------------------------------------
+  // Trace event handlers
+  // ------------------------------------------------------------------
+
+  private handleTraceEvents(args: Record<string, unknown>) {
+    const events = this.traceEventStore.query({
+      session_id: args.session_id as string | undefined,
+      operation: args.operation as string | undefined,
+      min_score: args.min_score as number | undefined,
+      has_feedback: args.has_feedback as boolean | undefined,
+      limit: (args.limit as number) ?? 50,
+    });
+    return {
+      content: [{ type: 'text', text: JSON.stringify(events) }],
+      isError: false,
+    };
+  }
+
+  private handleTraceEventStats(args: Record<string, unknown>) {
+    const stats = this.traceEventStore.getStats({
+      session_id: args.session_id as string | undefined,
+    });
+    return {
+      content: [{ type: 'text', text: JSON.stringify(stats) }],
+      isError: false,
+    };
+  }
+
+  private handleTraceEventAnnotate(args: Record<string, unknown>) {
+    const id = String(args.id ?? '');
+    if (!id) {
+      return {
+        content: [
+          { type: 'text', text: JSON.stringify({ error: 'id is required' }) },
+        ],
+        isError: true,
+      };
+    }
+    const ok = this.traceEventStore.annotate(id, {
+      score: args.score as number | undefined,
+      feedback: args.feedback as string | undefined,
+    });
+    return {
+      content: [{ type: 'text', text: JSON.stringify({ ok, id }) }],
+      isError: false,
+    };
+  }
+
   private findProjectRoot(): string {
     let dir = process.cwd();
     while (dir !== '/') {
@@ -571,6 +627,12 @@ class LocalStackMemoryMCP {
               description: 'Which agent implements code',
             },
             maxIters: { type: 'number', default: 2 },
+            verificationCommands: {
+              type: 'array',
+              items: { type: 'string' },
+              description:
+                'Optional repro/test commands that must pass after implementation',
+            },
             recordFrame: { type: 'boolean', default: true },
             execute: { type: 'boolean', default: true },
           },
@@ -1424,6 +1486,50 @@ class LocalStackMemoryMCP {
             required: ['content'],
           },
         },
+        // Trace event tools
+        {
+          name: 'trace_events',
+          description:
+            'Query ASI-shaped trace events. Filter by session, operation, min score, or feedback presence. Returns events with provenance, cost, and token data.',
+          inputSchema: {
+            type: 'object',
+            properties: {
+              session_id: { type: 'string' },
+              operation: { type: 'string' },
+              min_score: { type: 'number' },
+              has_feedback: { type: 'boolean' },
+              limit: { type: 'number' },
+            },
+          },
+        },
+        {
+          name: 'trace_event_stats',
+          description:
+            'Get aggregate trace event statistics: total tokens, cost, operation counts, host distribution.',
+          inputSchema: {
+            type: 'object',
+            properties: {
+              session_id: { type: 'string' },
+            },
+          },
+        },
+        {
+          name: 'trace_event_annotate',
+          description:
+            'Add a numeric score and/or textual feedback to a trace event. Used by GEPA-class optimizers.',
+          inputSchema: {
+            type: 'object',
+            properties: {
+              id: { type: 'string', description: 'Trace event ID' },
+              score: { type: 'number', description: 'Numeric score (0-1)' },
+              feedback: {
+                type: 'string',
+                description: 'Textual ASI feedback',
+              },
+            },
+            required: ['id'],
+          },
+        },
       ],
     };
   }
@@ -1791,6 +1897,19 @@ class LocalStackMemoryMCP {
             result = this.handleCacheLookup(args);
             break;

+          // Trace event tools
+          case 'trace_events':
+            result = this.handleTraceEvents(args);
+            break;
+
+          case 'trace_event_stats':
+            result = this.handleTraceEventStats(args);
+            break;
+
+          case 'trace_event_annotate':
+            result = this.handleTraceEventAnnotate(args);
+            break;
+
           default:
             throw new Error(`Unknown tool: ${name}`);
         }
@@ -1843,13 +1962,58 @@ class LocalStackMemoryMCP {

           // Add to trace detector
           this.traceDetector.addToolCall(toolCall);
+
+          // --- Record ASI-shaped trace event ---
+          try {
+            const traceEvent: TraceEvent = {
+              timestamp: new Date(startTime).toISOString(),
+              session_id: this.sessionId,
+              trace_id: callId,
+              tenant_id: 'local',
+              actor: {
+                host: process.env['STACKMEMORY_HOST'] || 'claude-code',
+                agent: 'stackmemory-mcp',
+                user: process.env['USER'] || 'anonymous',
+              },
+              operation: name,
+              inputs: args as Record<string, unknown>,
+              outputs: error
+                ? { error: error.message }
+                : ((result as Record<string, unknown>) ?? {}),
+              tokens_in: 0,
+              tokens_out: 0,
+              cost_usd: 0,
+              duration_ms: endTime - startTime,
+              error: error?.message,
+              provenance: {
+                sources: [{ type: 'tool', id: name }],
+                derivation: ['mcp-call'],
+                confidence: 1.0,
+              },
+            };
+            this.traceEventStore.record(traceEvent);
+          } catch {
+            // Trace recording is non-fatal
+          }
         }

         return result;
       }
     );
   }

+  private getVerificationCommands(args: any): string[] {
+    const commands = args.verificationCommands ?? args.verifyCommands;
+    if (Array.isArray(commands)) {
+      return commands.map((command) => String(command).trim()).filter(Boolean);
+    }
+    const single = args.verificationCommand ?? args.verifyCommand;
+    if (typeof single === 'string' && single.trim()) {
+      return [single.trim()];
+    }
+    return [];
+  }
+
   // Handle plan_and_code tool by invoking the mm harness
   private async handlePlanAndCode(args: any) {
     const { runSpike } =
@@ -1873,6 +2037,7 @@ class LocalStackMemoryMCP {
     const record = Boolean(args.record);
     const recordFrame = Boolean(args.recordFrame);
     const compact = Boolean(args.compact);
+    const verificationCommands = this.getVerificationCommands(args);

     const task = String(args.task || 'Plan and implement change');

@@ -1888,6 +2053,7 @@ class LocalStackMemoryMCP {
         maxIters: isFinite(maxIters) ? Math.max(1, maxIters) : 2,
         dryRun: !execute,
         auditDir: undefined,
+        verificationCommands,
         recordFrame,
       }
     );
@@ -2091,6 +2257,7 @@ class LocalStackMemoryMCP {
     );
     const recordFrame = args.recordFrame !== false; // default true
     const execute = args.execute !== false; // default true
+    const verificationCommands = this.getVerificationCommands(args);

     const result = await runSpike(
       { task: pending.task, repoPath: this.projectRoot },
@@ -2104,6 +2271,7 @@ class LocalStackMemoryMCP {
         implementer: implementer === 'claude' ? 'claude' : 'codex',
         maxIters: isFinite(maxIters) ? Math.max(1, maxIters) : 2,
         dryRun: !execute,
+        verificationCommands,
         recordFrame,
       }
     );
