diff --git a/README.md b/README.md index ffd67b4..fd1c1d7 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,28 @@ # OpsOrch Copilot +[![Version](https://img.shields.io/github/v/release/OpsOrch/opsorch-copilot)](https://github.com/OpsOrch/opsorch-copilot/releases) +[![License](https://img.shields.io/github/license/OpsOrch/opsorch-copilot)](https://github.com/OpsOrch/opsorch-copilot/blob/main/LICENSE) +[![CI](https://github.com/OpsOrch/opsorch-copilot/workflows/CI/badge.svg)](https://github.com/OpsOrch/opsorch-copilot/actions) +[![Node Version](https://img.shields.io/badge/node-%3E%3D20-brightgreen)](https://nodejs.org) + OpsOrch Copilot is the AI runtime for OpsOrch. It plans tool calls against `opsorch-mcp`, gathers evidence, and returns structured answers for the Console UI and other clients. Copilot never talks to OpsOrch Core directly. It only uses the MCP tools layer. +## Table of Contents + +- [Status](#status) +- [Quick Start](#quick-start) +- [What Copilot Does](#what-copilot-does) +- [Configuration](#configuration) +- [Architecture](#architecture) +- [HTTP API](#http-api) +- [Stack and Boundaries](#stack-and-boundaries) +- [Development](#development) +- [Testing](#testing) +- [Seeding the Database](#seeding-the-database) +- [License](#license) + ## Status - License: Apache-2.0 @@ -13,25 +32,36 @@ Copilot never talks to OpsOrch Core directly. It only uses the MCP tools layer. ## Quick Start -1. Start `opsorch-core` -2. Start `opsorch-mcp` -3. Start Copilot +### Prerequisites + +- Node.js 20+ +- Running `opsorch-core` instance (port 8080) +- Running `opsorch-mcp` instance (port 7070) + +### Installation and Startup ```bash cd opsorch-copilot npm install + +# Start with mock LLM (no API key required) MCP_URL=http://localhost:7070/mcp \ LLM_PROVIDER=mock \ npm run dev ``` -Health check: +The server will start on `http://localhost:6060`. 
+ +### Verify Installation +Health check: ```bash curl http://localhost:6060/health ``` -Chat request: +Expected response: `{"status":"ok"}` + +### Make Your First Request ```bash curl http://localhost:6060/chat \ @@ -39,75 +69,144 @@ curl http://localhost:6060/chat \ -d '{"message":"What incidents are active right now?"}' ``` +The response includes: +- `chatId` – Conversation identifier for follow-up questions +- `name` – Auto-generated conversation name +- `answer` – Structured answer with conclusion, evidence, and references + ## Configuration -Core runtime settings: +### Core Runtime Settings -- `PORT` - HTTP port for the Copilot API. Default: `6060` -- `MCP_URL` - MCP endpoint URL. Default: `http://localhost:7070/mcp` -- `LLM_PROVIDER` - `mock`, `openai`, `anthropic`, or `gemini`. Default: `mock` +| Variable | Default | Description | +|----------|---------|-------------| +| `PORT` | `6060` | HTTP port for the Copilot API | +| `MCP_URL` | `http://localhost:7070/mcp` | MCP endpoint URL | +| `LLM_PROVIDER` | `mock` | LLM provider: `mock`, `openai`, `anthropic`, or `gemini` | -Provider-specific settings: +### LLM Provider Settings -- `OPENAI_API_KEY` with optional `OPENAI_MODEL` and `OPENAI_BASE_URL` -- `ANTHROPIC_API_KEY` with optional `ANTHROPIC_MODEL` and `ANTHROPIC_BASE_URL` -- `GEMINI_API_KEY` with optional `GEMINI_MODEL` +**OpenAI:** +- `OPENAI_API_KEY` (required) +- `OPENAI_MODEL` (optional, default: `gpt-4o`) +- `OPENAI_BASE_URL` (optional, for custom endpoints) + +**Anthropic:** +- `ANTHROPIC_API_KEY` (required) +- `ANTHROPIC_MODEL` (optional, default: `claude-3-5-sonnet-20241022`) +- `ANTHROPIC_BASE_URL` (optional, for custom endpoints) -Conversation storage: +**Google Gemini:** +- `GEMINI_API_KEY` (required) +- `GEMINI_MODEL` (optional, default: `gemini-2.0-flash-exp`) -- `CONVERSATION_STORE_TYPE` - `memory` or `sqlite`. Default: `memory` -- `SQLITE_DB_PATH` - SQLite DB path when using `sqlite`. 
Default: `./data/conversations.db` +### Conversation Storage Settings -## What Copilot should do +| Variable | Default | Description | +|----------|---------|-------------| +| `CONVERSATION_STORE_TYPE` | `memory` | Storage backend: `memory` or `sqlite` | +| `SQLITE_DB_PATH` | `./data/conversations.db` | SQLite database file path (when using `sqlite`) | -- Retrieve recent/impactful incidents, surface their context, and include related PagerDuty alerts, linked Jira tickets, and nearby logs/metrics. -- Explain incident history and changes, e.g., "What was the trigger for the severity escalation?" by inspecting timelines and metadata. -- Find patterns, e.g., "Has this service had similar incidents recently?" by querying incidents filtered by service/time/severity. -- Correlate signals, e.g., "Is the spike in p95 latency correlated with CPU, memory, or traffic?" by querying metrics over the same window and comparing trends. -- Use messaging tools to share findings or timelines when needed. +## What Copilot Does -## Question coverage (examples) +Copilot answers operational questions by orchestrating MCP tool calls and synthesizing evidence: -- Basic understanding: summarize an incident; note changes right before start; infer likely root cause from logs/metrics; correlate with deploys; pull last N minutes of related logs. -- Context & relationships: list dependent services; find similar incidents for a service; relate to earlier incidents; identify severity escalation triggers. -- Causal analysis: match error signatures to past incidents; correlate latency spikes with CPU/memory/traffic; distinguish DB vs network vs code issues; compare against prior checkout failures. -- Metrics: explain CPU spikes and latency anomalies; surface metric anomalies for a service in a window; identify pods/nodes contributing most errors. -- Logs: query 500s for a service over a window; extract dominant/error patterns; list IPs with most failed requests; flag unusual log patterns. 
-- Correlation: align logs and metrics for a service; test hypotheses like memory leaks; find earliest signals of degradation. +- **Incident Analysis** – Retrieve recent/impactful incidents with context including related PagerDuty alerts, linked Jira tickets, and nearby logs/metrics +- **Incident History** – Explain incident changes and timelines, e.g., "What triggered the severity escalation?" +- **Pattern Detection** – Find similar incidents, e.g., "Has this service had similar incidents recently?" +- **Signal Correlation** – Correlate metrics, e.g., "Is the p95 latency spike correlated with CPU, memory, or traffic?" +- **Root Cause Analysis** – Match error signatures to past incidents and identify likely causes +- **Deployment Correlation** – Correlate incidents with recent deployments and code changes +- **Service Dependencies** – Discover service relationships and dependencies +- **Team Context** – Identify on-call teams and escalation paths +- **Messaging Integration** – Share findings via Slack or other messaging tools when needed -## Stack and boundaries +### Question Coverage Examples -- UI: `opsorch-console` -- Copilot runtime: this repo (LLM prompts, reasoning, tool selection loops) -- Tools: `opsorch-mcp` (typed MCP tools around OpsOrch Core) -- Source of truth: `opsorch-core` (incidents, logs, metrics, services, tickets, messaging) +**Basic Understanding:** +- Summarize an incident +- Note changes right before incident start +- Infer likely root cause from logs/metrics +- Correlate with recent deployments +- Pull last N minutes of related logs -## Development notes +**Context & Relationships:** +- List dependent services +- Find similar incidents for a service +- Relate to earlier incidents +- Identify severity escalation triggers -- MCP dev server default: `http://localhost:7070/mcp` -- Copilot communicates only via MCP tools; no direct Core calls. -- See `AGENTS.md` for the layered architecture overview. 
-- See `DESIGN.md` for capability-handler details. +**Causal Analysis:** +- Match error signatures to past incidents +- Correlate latency spikes with CPU/memory/traffic +- Distinguish DB vs network vs code issues +- Compare against prior failures -Core implementation areas: +**Metrics:** +- Explain CPU spikes and latency anomalies +- Surface metric anomalies for a service in a time window +- Identify pods/nodes contributing most errors -- `src/engine/` - planning, execution, follow-ups, references, synthesis -- `src/llms/` - LLM provider adapters -- `src/mcps/` - MCP client implementations -- `src/stores/` - in-memory and SQLite conversation stores -- `src/server.ts` - HTTP API +**Logs:** +- Query 500 errors for a service over a time window +- Extract dominant error patterns +- List IPs with most failed requests +- Flag unusual log patterns + +**Correlation:** +- Align logs and metrics for a service +- Test hypotheses like memory leaks +- Find earliest signals of degradation + +## Stack and Boundaries + +OpsOrch Copilot is part of a layered architecture: + +- **UI Layer** – `opsorch-console` (Next.js web UI) +- **AI Runtime** – `opsorch-copilot` (this repo) – LLM prompts, reasoning, tool orchestration +- **Tools Layer** – `opsorch-mcp` – Typed MCP tools wrapping Core APIs +- **Core Layer** – `opsorch-core` – Source of truth for incidents, logs, metrics, services, tickets, messaging +- **Adapters** – Provider-specific adapters (PagerDuty, Datadog, Jira, Slack, etc.) + +**Key Principle:** Copilot never talks to OpsOrch Core directly. All interactions go through the MCP tools layer, ensuring a clean separation of concerns and consistent tool-based interface. + +## Architecture + +Copilot implements a multi-step agentic reasoning loop that orchestrates LLM planning, tool execution, and answer synthesis: + +1. **Planning** – LLM analyzes the question and plans which MCP tools to call +2. 
**Execution** – Tools are called in parallel with retry logic and result caching +3. **Analysis** – Handlers extract entities, detect anomalies, and suggest follow-ups +4. **Refinement** – If needed, additional tool calls are planned based on results +5. **Synthesis** – Final answer is generated with evidence and structured references + +Key architectural components: + +- `CopilotEngine` – Main orchestration engine (max 3 iterations) +- `Planner` – LLM-based tool call planning with heuristic fallback +- `ToolRunner` – Parallel tool execution with caching and retry strategy +- `EntityExtractor` – Extracts IDs, timestamps, and references from results +- `ReferenceResolver` – Resolves pronouns like "that incident" to specific entities +- `FollowUpEngine` – Suggests intelligent next actions based on results +- `AnswerGenerator` – Synthesizes final answers with evidence +- `ConversationManager` – Manages multi-turn conversation history + +See `DESIGN.md` for detailed architecture documentation and `AGENTS.md` for the layered system overview. 
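The five-step loop above can be sketched as follows. This is a hypothetical illustration, not the real `CopilotEngine` API: `runReasoningLoop`, `plan`, and `execute` are stand-in names.

```typescript
// Hypothetical sketch of the plan → execute → refine loop (max 3 iterations).
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ToolResult = { name: string; result: unknown };

function runReasoningLoop(
  question: string,
  plan: (q: string, results: ToolResult[]) => ToolCall[],
  execute: (calls: ToolCall[]) => ToolResult[],
  maxIterations = 3,
): { results: ToolResult[]; iterations: number } {
  const results: ToolResult[] = [];
  let iterations = 0;
  while (iterations < maxIterations) {
    const calls = plan(question, results); // Planning / Refinement
    if (calls.length === 0) break; // Stop: planner has nothing left to ask for
    results.push(...execute(calls)); // Execution
    iterations++;
  }
  // Synthesis would consume `results` to build the final answer
  return { results, iterations };
}

// One planning round, then the planner declares itself done.
const out = runReasoningLoop(
  "What incidents are active right now?",
  (_q, results) =>
    results.length === 0 ? [{ name: "query-incidents", arguments: {} }] : [],
  (calls) => calls.map((c) => ({ name: c.name, result: [] })),
);
console.log(out.iterations); // 1
```

The stop condition (an empty plan) is what bounds the loop below `maxIterations` on simple questions.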
### Capability-Based Handler Architecture -Copilot uses a capability-based handler system organized around six core operational domains: +Copilot uses a capability-based handler system organized around nine core operational domains: -**Six Core Capabilities:** +**Nine Core Capabilities:** - `incident/` – Incident query and analysis - `alert/` – Alert monitoring and investigation - `log/` – Log search and analysis - `metric/` – Metrics query and correlation - `service/` – Service discovery and dependencies - `ticket/` – Ticket linking and management +- `deployment/` – Deployment tracking and correlation +- `orchestration/` – Workflow orchestration and automation +- `team/` – Team management and on-call schedules **Handler Types (11 total):** Each capability implements specialized handlers from this set: @@ -126,66 +225,162 @@ Each capability implements specialized handlers from this set: | **ServiceDiscovery** | Discovers available services from MCP | | **ServiceMatching** | Performs fuzzy matching of service names in questions | -**Engine Flow:** - -```mermaid -flowchart TD - Q[User Question] --> P[Planner] - P -->|LLM plans tools| TC[Tool Calls] - P -->|Fallback| HF[Heuristic Fallback] - HF --> TC - TC --> V[Validation Registry] - V -->|Valid| TE[Tool Execution] - V -->|Invalid| TC - TE --> EE[Entity Extraction] - EE --> RR[Reference Resolution] - RR --> FU[Follow-up Suggestion] - FU -->|More tools needed| P - FU -->|Done| SY[Synthesis] - SY --> A[Answer with Evidence] -``` +## Development -All handlers are registered in `capabilityRegistry.ts` and invoked by the engine during tool execution. 
+### Project Structure -### HTTP API (console/CLI integration) +``` +src/ +├── engine/ # Core orchestration and reasoning +│ ├── handlers/ # Capability-specific handlers +│ │ ├── incident/ # Incident analysis handlers +│ │ ├── alert/ # Alert monitoring handlers +│ │ ├── log/ # Log search handlers +│ │ ├── metric/ # Metrics analysis handlers +│ │ ├── service/ # Service discovery handlers +│ │ ├── ticket/ # Ticket management handlers +│ │ ├── deployment/ # Deployment tracking handlers +│ │ ├── orchestration/ # Workflow handlers +│ │ ├── team/ # Team management handlers +│ │ └── shared/ # Shared utilities +│ ├── copilotEngine.ts # Main orchestration engine +│ ├── planner.ts # LLM-based tool planning +│ ├── toolRunner.ts # Tool execution with retry logic +│ ├── entityExtractor.ts # Entity extraction from results +│ ├── referenceResolver.ts # Reference resolution +│ ├── followUpEngine.ts # Follow-up suggestion engine +│ └── answerGenerator.ts # Answer synthesis +├── llms/ # LLM provider adapters +├── mcps/ # MCP client implementations +├── stores/ # Conversation storage backends +└── server.ts # HTTP API server +``` -- Start server: `npm start` (env: `PORT` default 6060, `MCP_URL` default `http://localhost:7070/mcp`). -- `POST /chat` – body `{ "message": "", "chatId?": "" }` - - Response: `{ "chatId": "", "answer": { conclusion, evidence?, missing?, references?, chatId? } }` - - `answer.references` drives Console deep links and includes buckets for `incidents[]`, `services[]`, `tickets[]`, `alerts[]`, plus structured `metrics[]`/`logs[]` entries (each with expression + window) - - If `chatId` is not provided, the response includes one so callers can persist and reuse it. 
-- `GET /health` – liveness check: `{ "status": "ok" }` -- `GET /chats` – list saved conversations with previews and pagination -- `GET /chats/search?query=...` – search saved conversations -- `GET /chats/:id` – retrieve a single saved conversation +### Running Locally + +Start the full OpsOrch stack: + +1. **Start Core** (port 8080): + ```bash + cd ../opsorch-core && go run ./cmd/opsorch + ``` + +2. **Start MCP** (port 7070): + ```bash + cd ../opsorch-mcp && npm run dev + ``` + +3. **Start Copilot** (port 6060): + ```bash + cd opsorch-copilot + npm install + MCP_URL=http://localhost:7070/mcp \ + LLM_PROVIDER=mock \ + npm run dev + ``` + +4. **Start Console** (port 3000): + ```bash + cd ../opsorch-console && npm run dev + ``` + +### Available Scripts + +- `npm run dev` – Start development server with hot reload +- `npm start` – Start production server +- `npm test` – Run all tests +- `npm run type-check` – TypeScript type checking +- `npm run lint` – Lint code +- `npm run lint:fix` – Fix linting issues +- `npm run build` – Build for production +- `npm run seed` – Seed database with sample conversations + +### Environment Variables + +See the Configuration section above for all available environment variables. + +### HTTP API + +The Copilot server exposes a REST API for chat interactions and conversation management. + +**Endpoints:** + +- `POST /chat` – Submit a question and get an AI-generated answer + - Request body: `{ "message": "", "chatId?": "" }` + - Response: `{ "chatId": "", "name": "", "answer": { ... 
} }` + - The `answer` object includes: + - `conclusion` – Short summary answer + - `evidence` – Supporting data and findings + - `references` – Structured references for deep linking: + - `incidents[]` – Incident IDs + - `services[]` – Service names + - `tickets[]` – Ticket IDs + - `alerts[]` – Alert IDs + - `metrics[]` – Metric queries with `{expression, start, end, step}` + - `logs[]` – Log queries with `{query, start, end, service}` + - `missing` – Notes about unavailable data + - If `chatId` is omitted, a new conversation is created and its ID is returned + +- `GET /health` – Health check endpoint + - Response: `{ "status": "ok" }` + +- `GET /chats` – List all saved conversations with pagination + - Query parameters: + - `limit` (optional) – Maximum number of results to return + - `offset` (optional) – Number of results to skip (default: 0) + - Response: `{ "conversations": [...], "pagination": { total, offset, limit, hasMore } }` + - Each conversation includes: `chatId`, `name`, `createdAt`, `lastAccessedAt`, `turnCount`, `preview` + - Results are sorted by most recent access first + +- `GET /chats/search` – Search conversations by content + - Query parameters: + - `query` (required) – Search query string + - `limit` (optional) – Maximum number of results (default: 50) + - Response: `{ "query": "...", "limit": 50, "totalResults": N, "results": [...] 
}` + - Searches across conversation names, user messages, and assistant responses + +- `GET /chats/:id` – Retrieve a specific conversation by ID + - Response: `{ "conversation": { chatId, name, turns, createdAt, lastAccessedAt } }` + - Returns 404 if conversation not found or expired ### Conversation Storage Copilot supports two storage backends for conversation persistence: #### In-Memory Storage (Default) -- Conversations are stored in memory with LRU eviction + +Best for development and testing: +- Conversations stored in memory with LRU eviction - Data is lost on server restart - No configuration required +- Fast and lightweight + +```bash +# No configuration needed - this is the default +npm run dev +``` #### SQLite Storage + +Best for production and demos: - Conversations persist across server restarts - Stored in a local SQLite database file -- Maintains the same LRU eviction behavior as in-memory storage +- Same LRU eviction behavior as in-memory storage +- Supports full-text search across conversations **Configuration:** -Set the following environment variables to enable SQLite storage: - ```bash # Enable SQLite storage CONVERSATION_STORE_TYPE=sqlite # Optional: specify database file path (default: ./data/conversations.db) SQLITE_DB_PATH=/path/to/conversations.db + +npm run dev ``` -Docker example: +**Docker Example:** ```yaml services: @@ -200,7 +395,7 @@ volumes: copilot-data: ``` -Backup and recovery: +**Backup and Recovery:** For SQLite storage, regular backups of the database file are recommended: @@ -212,35 +407,83 @@ cp /path/to/conversations.db /path/to/backup/conversations-$(date +%Y%m%d).db cp /path/to/backup/conversations-20250122.db /path/to/conversations.db ``` +**Graceful Shutdown:** + +The server handles `SIGTERM` and `SIGINT` signals gracefully, ensuring the SQLite database is properly closed before exit. 
+ ## Testing +### Running Tests + ```bash +# Run all tests npm test + +# Type checking npm run type-check + +# Linting +npm run lint ``` -Coverage includes: +### Test Coverage + +Comprehensive test suites cover: + +**Engine & Orchestration:** +- CopilotEngine – Planning loop, iteration limits, multi-turn conversations +- Planner – LLM planning, JSON fallback, heuristic fallback +- ToolRunner – Tool execution, result normalization, error handling +- ParallelToolRunner – Concurrent execution, ordering, deduplication +- ResultCache – Cache hits/misses, invalidation +- EntityExtractor – Entity extraction from various tool result structures +- ReferenceResolver – Reference resolution with conversation history +- FollowUpEngine – Follow-up suggestion generation and deduplication +- ExecutionTracer – Trace creation, telemetry, and diagnostics + +**Capability Handlers:** +- Intent Classification – Pattern matching, service extraction, tool injection +- Entity Extraction – ID extraction, entity type detection, nested structure handling +- Scope Inference – Scope detection from context, intelligent parameterization +- Reference Handlers – Pronoun resolution, entity linking, temporal references +- Validation – Tool call validation and argument normalization +- Follow-up – Context-aware follow-up suggestions + +**Conversation Management:** +- ConversationManager – Turn storage, retrieval, LRU eviction +- ConversationStore – In-memory and SQLite persistence +- ConversationSearch – Full-text search, filtering, result ranking + +**Analysis & Synthesis:** +- CorrelationDetector – Correlation detection, root cause identification +- AnomalyDetector – Anomaly detection, trend analysis +- TimeWindowExpander – Window expansion, capping calculations +- AnswerFormatter – Evidence aggregation, reference formatting + +**Utilities:** +- ChatNamer – Conversation name generation and synthesis +- ServiceDiscovery – Service lookup and caching +- TimestampUtils – Timestamp parsing and formatting 
+- MetricUtils – Metric parsing and aggregation +- ToolsSchema – Tool schema validation + +### Testing Patterns -- planner and tool execution loops -- conversation history and storage backends -- MCP integration layers -- capability handlers and follow-ups -- HTTP API behavior +- **MockMcp** – Simulates MCP tool responses without network calls +- **Temporary SQLite databases** – Each SQLite test uses a temporary database file cleaned up after test runs +- **Conversation fixtures** – Pre-built conversation data for testing multi-turn flows +- **Tool result mocking** – Realistic tool responses for testing handlers and synthesis ### Integration Testing + Start the full stack for end-to-end testing: + 1. Start Core: `cd ../opsorch-core && go run ./cmd/opsorch` 2. Start MCP: `cd ../opsorch-mcp && npm run dev` 3. Start Copilot: `npm run dev` 4. Start Console: `cd ../opsorch-console && npm run dev` -Test via Console UI or direct API calls to `http://localhost:6060/chat` - -### Testing Patterns -- **MockMcp**: Simulates MCP tool responses without network calls -- **Temporary SQLite databases**: Each SQLite test uses a temporary database file cleaned up after test runs -- **Conversation fixtures**: Pre-built conversation data for testing multi-turn flows -- **Tool result mocking**: Realistic tool responses for testing handlers and synthesis +Test via Console UI at `http://localhost:3000` or direct API calls to `http://localhost:6060/chat` ## Seeding the Database @@ -250,9 +493,9 @@ To populate the database with realistic sample conversations for testing or demo npm run seed ``` -This will: -- Clear any existing conversations in the database -- Generate 30 realistic operational conversations covering various scenarios: +This command: +- Clears any existing conversations in the database +- Generates 30 realistic operational conversations covering various scenarios: - Incident investigations (high error rates, service outages) - Service health checks and monitoring - 
Performance issues (latency spikes, memory leaks) @@ -260,11 +503,13 @@ This will: - Deployment verifications - SSL certificate management - Rate limiting and cache issues -- Populate conversations with realistic tool results, timestamps, and entities -- Distribute conversations across the last 30 days +- Populates conversations with realistic tool results, timestamps, and entities +- Distributes conversations across the last 30 days The seed script uses the database path from `SQLITE_DB_PATH` environment variable or defaults to `./data/conversations.db`. +**Note:** Seeding requires SQLite storage. Set `CONVERSATION_STORE_TYPE=sqlite` before running the seed command. + ## License Apache-2.0. See [LICENSE](LICENSE). diff --git a/src/engine/answerGenerator.ts b/src/engine/answerGenerator.ts index 977b1ce..705edeb 100644 --- a/src/engine/answerGenerator.ts +++ b/src/engine/answerGenerator.ts @@ -163,6 +163,12 @@ function extractReferencesFromResults(results: ToolResult[]): CopilotReferences extractLogReferences(result, refs.logs); } + // Extract team IDs + if (result.name.includes('team')) { + refs.teams = refs.teams || []; + extractTeamIds(result.result, refs.teams); + } + // Extract orchestration plan IDs if (result.name.includes('orchestration')) { refs.orchestrationPlans = refs.orchestrationPlans || []; @@ -176,6 +182,7 @@ function extractReferencesFromResults(results: ToolResult[]): CopilotReferences if (refs.alerts) refs.alerts = [...new Set(refs.alerts)]; if (refs.deployments) refs.deployments = [...new Set(refs.deployments)]; if (refs.tickets) refs.tickets = [...new Set(refs.tickets)]; + if (refs.teams) refs.teams = [...new Set(refs.teams)]; // Metrics and logs are complex objects, dedupe by JSON string if (refs.metrics) refs.metrics = dedupeByJson(refs.metrics); if (refs.logs) refs.logs = dedupeByJson(refs.logs); @@ -326,6 +333,31 @@ function extractTicketIds(data: unknown, ids: string[]): void { } } +function extractTeamIds(data: unknown, ids: string[]): 
void {
+  if (Array.isArray(data)) {
+    for (const item of data) {
+      if (typeof item === 'object' && item !== null) {
+        const obj = item as Record<string, unknown>;
+        if ('id' in obj) ids.push(String(obj.id));
+        else if ('name' in obj) ids.push(String(obj.name));
+      }
+    }
+  } else if (typeof data === 'object' && data !== null) {
+    const obj = data as Record<string, unknown>;
+    if ('id' in obj) ids.push(String(obj.id));
+    else if ('name' in obj) ids.push(String(obj.name));
+    if (Array.isArray(obj.teams)) {
+      for (const team of obj.teams) {
+        if (typeof team === 'object' && team !== null) {
+          const t = team as Record<string, unknown>;
+          if ('id' in t) ids.push(String(t.id));
+          else if ('name' in t) ids.push(String(t.name));
+        }
+      }
+    }
+  }
+}
+
 /**
  * Extract metric references from query-metrics tool results.
  * Uses the tool arguments to build a MetricReference with deep-linking metadata.
diff --git a/src/engine/capabilityRegistry.ts b/src/engine/capabilityRegistry.ts
index d752d97..88be345 100644
--- a/src/engine/capabilityRegistry.ts
+++ b/src/engine/capabilityRegistry.ts
@@ -168,6 +168,7 @@ function createFollowUpRegistry(): FollowUpRegistry {
   registry.register("query-logs", logFollowUpHandler);
 
+  registry.register("describe-metrics", metricFollowUpHandler);
   registry.register("query-metrics", metricFollowUpHandler);
   registry.register("query-tickets", ticketFollowUpHandler);
@@ -183,7 +184,7 @@ function createFollowUpRegistry(): FollowUpRegistry {
   registry.register("get-orchestration-plan", orchestrationFollowUpHandler);
 
   console.log(
-    "[FollowUpRegistry] Registered 17 follow-up handlers for 9 capabilities",
+    "[FollowUpRegistry] Registered 18 follow-up handlers for 9 capabilities",
   );
 
   return registry;
diff --git a/src/engine/copilotEngine.ts b/src/engine/copilotEngine.ts
index 39def83..53cdb0e 100644
--- a/src/engine/copilotEngine.ts
+++ b/src/engine/copilotEngine.ts
@@ -324,6 +324,7 @@ export class CopilotEngine {
     const allExtractedEntities: Entity[] = [];
     let iteration = 0;
     let isFirstIteration = true;
+    let
hadPlannedCalls = false; // Step 4: Reasoning Loop while (iteration < this.maxIterations) { @@ -421,6 +422,11 @@ export class CopilotEngine { // Limit calls plannedCalls = this.limitToolCalls(plannedCalls); + // Track if any calls were ever planned (before toolRunner filtering) + if (plannedCalls.length > 0) { + hadPlannedCalls = true; + } + // B. Check Stop Condition if (plannedCalls.length === 0) { console.log( @@ -533,6 +539,14 @@ export class CopilotEngine { this.config.llm, ); + // If calls were planned but no results were collected, tool calls were likely + // skipped due to unresolved placeholder arguments (e.g. {{incidentId}}) + if (hadPlannedCalls && allResults.length === 0) { + answer.missing = answer.missing + ? [...answer.missing, 'tool outputs'] + : ['tool outputs']; + } + // Step 6: Create TurnExecutionTrace from ExecutionTrace const turnTrace: TurnExecutionTrace = { traceId: trace.traceId, diff --git a/src/engine/followUpEngine.ts b/src/engine/followUpEngine.ts index 41b1aa5..ad72a7c 100644 --- a/src/engine/followUpEngine.ts +++ b/src/engine/followUpEngine.ts @@ -29,6 +29,7 @@ export class FollowUpEngine { chatId, conversationHistory, userQuestion, + toolResults, ); try { @@ -66,12 +67,13 @@ export class FollowUpEngine { chatId: string, conversationHistory: ConversationTurn[], userQuestion: string, + currentResults: ToolResult[], ): HandlerContext { return { chatId, turnNumber: conversationHistory.length, conversationHistory, - toolResults: [result], + toolResults: currentResults, userQuestion, }; } diff --git a/src/engine/handlers/incident/scopeHandler.ts b/src/engine/handlers/incident/scopeHandler.ts index 228d08a..3a0cee9 100644 --- a/src/engine/handlers/incident/scopeHandler.ts +++ b/src/engine/handlers/incident/scopeHandler.ts @@ -156,6 +156,26 @@ export const incidentScopeInferenceHandler: ScopeHandler = async ( // Note: We no longer look at conversation history toolResults since they're not stored. 
// Scope from previous turns should be inferred from entities or the current turn's results. + if (!hasScope || !(scope.service && scope.environment && scope.team)) { + for (let i = context.conversationHistory.length - 1; i >= 0; i--) { + const turn = context.conversationHistory[i]; + if (turn.entities) { + for (const entity of turn.entities) { + if (entity.type === "service" && !scope.service) { + scope.service = entity.value; + hasScope = true; + } + if (entity.type === "team" && !scope.team) { + scope.team = entity.value; + hasScope = true; + } + } + } + if (scope.service && scope.environment && scope.team) { + break; + } + } + } return hasScope ? scope : null; }; diff --git a/src/engine/handlers/metric/followUpHandler.ts b/src/engine/handlers/metric/followUpHandler.ts index 9d419ae..3078ccd 100644 --- a/src/engine/handlers/metric/followUpHandler.ts +++ b/src/engine/handlers/metric/followUpHandler.ts @@ -9,7 +9,7 @@ */ import type { FollowUpHandler } from "../handlers.js"; -import type { ToolCall, JsonObject } from "../../../types.js"; +import type { ToolCall, ToolResult, JsonObject } from "../../../types.js"; import { generateSearchExpression } from "../logQueryParser.js"; import { HandlerUtils } from "../utils.js"; @@ -29,12 +29,170 @@ const LATENCY_METRIC_PATTERNS = [ "timeout", ]; +function extractMetricNames(result: ToolResult["result"]): string[] { + if (Array.isArray(result)) { + return result + .map((entry) => { + if (typeof entry === "string") return entry; + if (entry && typeof entry === "object" && "name" in entry) { + const name = (entry as JsonObject).name; + return typeof name === "string" ? 
name : undefined;
+        }
+        return undefined;
+      })
+      .filter((name): name is string => typeof name === "string" && name.length > 0);
+  }
+
+  if (result && typeof result === "object") {
+    const metrics = (result as JsonObject).metrics;
+    if (Array.isArray(metrics)) {
+      return extractMetricNames(metrics);
+    }
+  }
+
+  return [];
+}
+
+function shouldQueryDiscoveredMetrics(question: string): boolean {
+  const lower = question.toLowerCase();
+  const discoveryOnlyPatterns = [
+    "available metrics",
+    "list metrics",
+    "what metrics",
+    "which metrics",
+  ];
+  if (discoveryOnlyPatterns.some((pattern) => lower.includes(pattern))) {
+    return false;
+  }
+
+  const investigationTerms = [
+    "metric",
+    "metrics",
+    "cpu",
+    "memory",
+    "latency",
+    "p95",
+    "p99",
+    "throughput",
+    "request",
+    "error",
+  ];
+  const actionTerms = [
+    "check",
+    "show",
+    "inspect",
+    "analy",
+    "graph",
+    "trend",
+    "root cause",
+    "why",
+    "high",
+  ];
+
+  return (
+    investigationTerms.some((term) => lower.includes(term)) &&
+    actionTerms.some((term) => lower.includes(term))
+  );
+}
+
+function selectMetricName(question: string, metricNames: string[]): string | undefined {
+  const lower = question.toLowerCase();
+  const preferredTokens = [
+    "cpu",
+    "memory",
+    "latency",
+    "p99",
+    "p95",
+    "throughput",
+    "request",
+    "error",
+  ];
+
+  for (const token of preferredTokens) {
+    if (!lower.includes(token)) continue;
+    const match = metricNames.find((name) => name.toLowerCase().includes(token));
+    if (match) return match;
+  }
+
+  return metricNames[0];
+}
+
+function getIncidentTimeWindow(context: Parameters<FollowUpHandler>[0]): {
+  start: string;
+  end: string;
+} | null {
+  for (const result of context.toolResults) {
+    if (result.name !== "query-incidents" && result.name !== "get-incident") {
+      continue;
+    }
+
+    const incident = Array.isArray(result.result)
+      ?
result.result[0]
+ : result.result;
+ if (!incident || typeof incident !== "object") {
+ continue;
+ }
+
+ const incidentObject = incident as JsonObject;
+ const startValue = incidentObject.startTime ?? incidentObject.createdAt;
+ const endValue = incidentObject.endTime ?? incidentObject.updatedAt;
+ if (typeof startValue !== "string" && typeof endValue !== "string") {
+ continue;
+ }
+
+ const expanded = HandlerUtils.expandTimeWindow(
+ typeof startValue === "string" ? startValue : undefined,
+ typeof endValue === "string" ? endValue : undefined,
+ 15,
+ );
+ return {
+ start: expanded.start.toISOString(),
+ end: expanded.end.toISOString(),
+ };
+ }
+
+ return null;
+}
+
 export const metricFollowUpHandler: FollowUpHandler = async (
 context,
 toolResult,
 ): Promise<ToolCall[]> => {
 const suggestions: ToolCall[] = [];
 
+ if (toolResult.name === "describe-metrics") {
+ const metricNames = extractMetricNames(toolResult.result);
+ const scope = toolResult.arguments?.scope as JsonObject | undefined;
+ const service = scope?.service;
+
+ if (
+ typeof service === "string" &&
+ metricNames.length > 0 &&
+ shouldQueryDiscoveredMetrics(context.userQuestion) &&
+ !HandlerUtils.isDuplicateToolCall(context, "query-metrics", service)
+ ) {
+ const metricName = selectMetricName(context.userQuestion, metricNames);
+ if (metricName) {
+ const incidentWindow = getIncidentTimeWindow(context);
+ const end = incidentWindow?.end ?? new Date().toISOString();
+ const start = incidentWindow?.start ??
new Date(Date.now() - 60 * 60 * 1000).toISOString();
+
+ suggestions.push({
+ name: "query-metrics",
+ arguments: {
+ scope: { service },
+ expression: { metricName },
+ step: 60,
+ start,
+ end,
+ },
+ });
+ }
+ }
+
+ return suggestions;
+ }
+
 if (!toolResult.result || typeof toolResult.result !== "object") {
 return suggestions;
 }
@@ -155,4 +313,3 @@ export const metricFollowUpHandler: FollowUpHandler = async (
 
 return suggestions;
 };
-
diff --git a/src/engine/handlers/metric/queryBuilder.ts b/src/engine/handlers/metric/queryBuilder.ts
index 663c8f0..a88bc89 100644
--- a/src/engine/handlers/metric/queryBuilder.ts
+++ b/src/engine/handlers/metric/queryBuilder.ts
@@ -20,7 +20,7 @@ import { QueryBuilderHandler } from "../handlers.js";
 import { JsonObject } from "../../../types.js";
 
 export const metricQueryBuilder: QueryBuilderHandler = async (
- _context,
+ context,
 _toolName,
 naturalLanguage,
 ): Promise<JsonObject> => {
@@ -42,7 +42,30 @@ export const metricQueryBuilder: QueryBuilderHandler = async (
 } else if (lower.includes("request") || lower.includes("throughput")) {
 expression.metricName = "requests";
 }
- // Note: We don't guess names like "error_rate" that may not exist
+
+ // If no hint from natural language, try to use a discovered metric from describe-metrics results
+ if (!expression.metricName) {
+ for (const result of context.toolResults) {
+ if (result.name === "describe-metrics") {
+ const data = result.result;
+ const list = Array.isArray(data) ? data
+ : (data && typeof data === "object" && Array.isArray((data as Record<string, unknown>).metrics))
+ ? (data as Record<string, unknown>).metrics as unknown[]
+ : null;
+ if (list && list.length > 0) {
+ const first = list[0];
+ const name = typeof first === "string" ? first
+ : (first && typeof first === "object" && "name" in (first as object))
+ ?
String((first as Record<string, unknown>).name)
+ : undefined;
+ if (name) {
+ expression.metricName = name;
+ break;
+ }
+ }
+ }
+ }
+ }
 
 // MCP schema: start/end must be ISO 8601 datetime
 const now = new Date();
diff --git a/src/engine/handlers/metric/validationHandler.ts b/src/engine/handlers/metric/validationHandler.ts
index 70ef3c9..43a2990 100644
--- a/src/engine/handlers/metric/validationHandler.ts
+++ b/src/engine/handlers/metric/validationHandler.ts
@@ -87,68 +87,51 @@ export const metricValidationHandler: ValidationHandler = async (
 
 if (toolName === "query-metrics") {
 // Check if describe-metrics was called first for this scope
- const scope = toolArgs.scope as JsonObject | undefined;
- const service = scope?.service as string | undefined;
+ const scopeArg = toolArgs.scope as { service?: string } | undefined;
+ const service = scopeArg?.service;
+ const discoveredMetrics = getDiscoveredMetrics(context, service);
 
- const discovered = getDiscoveredMetrics(context, service);
-
- if (discovered === null) {
- // Reject the call - describe-metrics must be called first
- errors.push({
- field: "expression.metricName",
- message: `describe-metrics must be called first to discover available metrics${service ? ` for service '${service}'` : ''}`,
- code: "PREREQUISITE_NOT_MET",
- });
- console.log(`[MetricValidation] Rejecting query-metrics: describe-metrics not called for scope ${service || 'global'}`);
+ if (discoveredMetrics === null) {
 return {
 valid: false,
- errors,
+ errors: [
+ {
+ field: "prerequisite",
+ message: `describe-metrics must be called before query-metrics for scope '${service || "global"}'`,
+ code: "PREREQUISITE_NOT_MET",
+ },
+ ],
 replacementCall: {
 name: "describe-metrics",
- arguments: { scope: service ? { service } : null },
+ arguments: service ?
{ scope: { service } } : { scope: null }, }, }; } - // Strict validation if we have the list - if (Array.isArray(discovered)) { - const expr = toolArgs.expression as JsonObject; - const metricName = expr?.metricName as string; - if (metricName && !discovered.includes(metricName)) { - errors.push({ - field: "expression.metricName", - message: `Metric '${metricName}' not found in discovered metrics. Available: ${discovered.slice(0, 10).join(", ")}${discovered.length > 10 ? "..." : ""}`, - code: "INVALID_METRIC_NAME", - }); - console.log(`[MetricValidation] Rejecting query-metrics: '${metricName}' not in usage list.`); - - // Check if describe-metrics was just called (fresh) to avoid infinite loops - const isFresh = context.toolResults.some((result) => { - if (result.name !== "describe-metrics") return false; - const resultScope = result.arguments?.scope as JsonObject | undefined; - const resultService = resultScope?.service as string | undefined; - // Match if both are undefined/null OR both have the same service - return (service === undefined && resultService === undefined) || - (service !== undefined && resultService === service); - }); - - if (isFresh) { - console.log(`[MetricValidation] Not suggesting replacement because describe-metrics was already called in this turn.`); - return { - valid: false, - errors, - // Do NOT provide replacementCall -> drops the call, forces LLM (or fallback) to handle error - }; - } - - // Re-suggest describe-metrics to refresh the list/context + // If we have a specific list of discovered metrics, validate or auto-populate the metric name + if (Array.isArray(discoveredMetrics) && discoveredMetrics.length > 0) { + const expr = toolArgs.expression as { metricName?: string } | undefined; + if (expr?.metricName && !discoveredMetrics.includes(expr.metricName)) { return { valid: false, - errors, - replacementCall: { - name: "describe-metrics", - arguments: { scope: service ? 
{ service } : null }, - }, + errors: [ + { + field: "expression.metricName", + message: `Metric '${expr.metricName}' was not found in describe-metrics results. Available: ${discoveredMetrics.slice(0, 5).join(", ")}`, + code: "INVALID_METRIC_NAME", + }, + ], + // Do NOT suggest describe-metrics again if it was already called in this turn + replacementCall: undefined, + }; + } + // Auto-populate expression.metricName from discovered metrics if missing + if (!expr?.metricName) { + normalizedArgs.expression = { + ...(typeof normalizedArgs.expression === "object" && normalizedArgs.expression !== null + ? normalizedArgs.expression as JsonObject + : {}), + metricName: discoveredMetrics[0], }; } } diff --git a/src/engine/planRefiner.ts b/src/engine/planRefiner.ts index 3ce02cc..cf3955f 100644 --- a/src/engine/planRefiner.ts +++ b/src/engine/planRefiner.ts @@ -71,7 +71,9 @@ export class PlanRefiner { ); // Also run basic schema validation as a fallback/safety check - const schemaValidation = validateToolCall(call, tool); + // Use normalizedArgs if available (validation handler may have fixed the args) + const argsForSchemaValidation = validation.normalizedArgs ?? call.arguments; + const schemaValidation = validateToolCall({ ...call, arguments: argsForSchemaValidation }, tool); if (validation.valid && schemaValidation.valid) { // Use normalized (fixed) arguments if available @@ -94,7 +96,8 @@ export class PlanRefiner { } } - // Return replacement calls first, then valid calls + // If any replacements were generated, return replacements + other valid calls. + // Only the invalid call is replaced; other valid calls (e.g. query-logs) still run. 
const validCalls = validatedCalls.filter((v) => v.valid).map((v) => v.call); return [...replacementCalls, ...validCalls]; } diff --git a/src/engine/toolsSchema.ts b/src/engine/toolsSchema.ts index 1b3b0b2..0710d89 100644 --- a/src/engine/toolsSchema.ts +++ b/src/engine/toolsSchema.ts @@ -86,21 +86,24 @@ export function validateToolCall( const propSchema = properties[key] as JsonObject | undefined; if (!propSchema) continue; // Unknown property, skip - const expectedType = propSchema.type as string | undefined; + const rawExpectedType = propSchema.type as string | string[] | undefined; + const expectedTypes = Array.isArray(rawExpectedType) ? rawExpectedType : (rawExpectedType ? [rawExpectedType] : []); const actualType = Array.isArray(value) ? "array" : typeof value; // Special handling for integer: JavaScript typeof returns 'number' for all numbers - const typesMatch = - expectedType === actualType || - (expectedType === "integer" && actualType === "number"); + const typesMatch = expectedTypes.length === 0 || expectedTypes.some(t => + t === actualType || (t === "integer" && actualType === "number") + ); - if (expectedType && !typesMatch) { + if (!typesMatch) { errors.push( - `Field '${key}' has type ${actualType}, expected ${expectedType}`, + `Field '${key}' has type ${actualType}, expected ${expectedTypes.join(",")}`, ); continue; // Skip further validation if type is wrong } + const expectedType = expectedTypes.find(t => t !== "null"); + // Timestamp validation for common time fields if ( typeof value === "string" && @@ -195,8 +198,8 @@ export function validateToolCall( const rawValue = rawArgs[key]; const rawObj = typeof rawValue === "object" && - rawValue !== null && - !Array.isArray(rawValue) + rawValue !== null && + !Array.isArray(rawValue) ? 
(rawValue as JsonObject) : undefined; const nestedResult = validateObject(value as JsonObject, propSchema, rawObj); @@ -257,12 +260,17 @@ function validateObject( const propSchema = properties[key] as JsonObject | undefined; if (!propSchema) continue; - const expectedType = propSchema.type as string | undefined; + const rawExpectedType = propSchema.type as string | string[] | undefined; + const expectedTypes = Array.isArray(rawExpectedType) ? rawExpectedType : (rawExpectedType ? [rawExpectedType] : []); const actualType = Array.isArray(value) ? "array" : typeof value; - if (expectedType && expectedType !== actualType) { + const typesMatch = expectedTypes.length === 0 || expectedTypes.some(t => + t === actualType || (t === "integer" && actualType === "number") + ); + + if (!typesMatch) { errors.push( - `Field '${key}' has type ${actualType}, expected ${expectedType}`, + `Field '${key}' has type ${actualType}, expected ${expectedTypes.join(",")}`, ); } } diff --git a/src/llms/mock.ts b/src/llms/mock.ts index 0f73402..83dac14 100644 --- a/src/llms/mock.ts +++ b/src/llms/mock.ts @@ -1,91 +1,942 @@ - import { + JsonObject, LlmClient, LlmMessage, LlmResponse, Tool, ToolCall, } from "../types.js"; +import { HandlerUtils } from "../engine/handlers/utils.js"; + +type MockPhase = + | "planner" + | "json-planner" + | "refinement" + | "json-refinement" + | "synthesis"; + +type TimeWindow = { + start: string; + end: string; + step: number; +}; + +const SERVICE_STOP_WORDS = new Set([ + "show", + "find", + "list", + "check", + "investigate", + "query", + "look", + "what", + "why", + "where", + "when", + "recent", + "current", + "today", + "last", + "hour", + "hours", + "minute", + "minutes", + "for", + "with", + "from", + "about", + "into", + "during", + "around", + "errors", + "error", + "latency", + "cpu", + "memory", + "traffic", + "metric", + "metrics", + "logs", + "alerts", + "incidents", + "incident", + "service", + "services", + "ticket", + "tickets", + "deployment", + 
"deployments", + "team", + "teams", + "runbook", + "runbooks", + "question", + "plan", + "follow", + "concrete", + "arguments", + "returned", + "results", + "count", + "data", + "tool", + "calls", +]); -// Mock LLM that behaves like a real planner: inspects the user message and available tools, -// emits a structured plan (toolCalls) and stable-but-random IDs for conversation/response. export class MockLlm implements LlmClient { async chat(messages: LlmMessage[], tools: Tool[]): Promise { - // If no tools are supplied, we are in synthesis mode; return a structured answer instead of a plan. - if (!tools.length) { - const lastUser = messages.filter((m) => m.role === "user").pop(); - const summary = lastUser?.content?.includes("Tool results:") - ? "Synthesized answer from tool outputs." - : "Synthesized answer."; - return { - content: JSON.stringify({ - conclusion: summary, - evidence: ["mock evidence"], - confidence: 0.9, + const phase = detectPhase(messages, tools); + + switch (phase) { + case "planner": + case "refinement": + return planWithTools(messages, tools, phase === "refinement"); + case "json-planner": + case "json-refinement": + return planAsJson(messages); + case "synthesis": + default: + return synthesize(messages); + } + } +} + +function detectPhase(messages: LlmMessage[], tools: Tool[]): MockPhase { + const systemText = messages + .filter((message) => message.role === "system") + .map((message) => message.content) + .join("\n"); + + if (tools.length > 0) { + if (systemText.includes("(Refinement)")) return "refinement"; + return "planner"; + } + + if (systemText.includes("JSON Planning Mode")) return "json-planner"; + if (systemText.includes("JSON Refinement Mode")) return "json-refinement"; + return "synthesis"; +} + +function planWithTools( + messages: LlmMessage[], + tools: Tool[], + isRefinement: boolean, +): LlmResponse { + const availableTools = tools.map((tool) => tool.name); + const toolSet = new Set(availableTools); + const question = 
extractQuestion(messages); + const lowerQuestion = question.toLowerCase(); + const lastUser = getLastMessage(messages, "user")?.content ?? ""; + const window = inferTimeWindow(`${question}\n${lastUser}`); + const service = inferService(question) ?? inferService(lastUser); + const incidentId = inferIdentifier(question, /\binc-[a-z0-9-]+\b/i); + const ticketId = inferIdentifier(question, /\b(?:ticket|tkt)-[a-z0-9-]+\b/i); + const usedTools = collectUsedTools(messages); + const calls: ToolCall[] = []; + + const addCall = (name: string, args: JsonObject): void => { + if (!toolSet.has(name) || calls.some((call) => call.name === name)) return; + if (isRefinement && usedTools.has(name)) return; + calls.push({ name, arguments: args }); + }; + + const wantsIncidents = + /\b(incident|incidents|outage|outages|degraded|impact|impacts|sev\d|root cause)\b/i.test(question); + const wantsLogs = + /\b(log|logs|trace|traces|error|errors|500|timeout|timeouts|exception|exceptions)\b/i.test(question); + const wantsMetrics = + /\b(metric|latency|cpu|memory|traffic|throughput|rps|error rate)\b/i.test( + question, + ); + const wantsAlerts = + /\b(alert|alerts|page|pages|pagerduty|detector|detectors)\b/i.test(question); + const wantsServices = /\b(service|services)\b/i.test(question); + const wantsTickets = /\b(ticket|jira)\b/i.test(question); + const wantsDeployments = + /\b(deploy|deployment|release|rollout)\b/i.test(question); + const wantsTeams = /\b(team|owner|on-call|oncall|who owns|who is)\b/i.test(question); + const wantsRunbooks = + /\b(runbook|playbook|orchestration)\b/i.test(question) || + wantsIncidents || + ((wantsLogs || wantsMetrics) && service !== undefined); + const wantsStatus = + /\b(status|health|overview|how is|what.*state)\b/i.test(question) && + !wantsIncidents && !wantsLogs && !wantsMetrics; + const wantsChanges = + /\b(what changed|changes|diff|compare|regression)\b/i.test(question); + const isBroadInvestigation = + wantsIncidents && + (wantsLogs || wantsMetrics || 
lowerQuestion.includes("what happened")); + + if (wantsStatus && service) { + addCall( + "query-incidents", + compactObject({ + limit: 3, + severities: ["sev1", "sev2"], + service, + start: window.start, + end: window.end, + }), + ); + addCall( + "query-alerts", + compactObject({ + limit: 5, + start: window.start, + end: window.end, + scope: { service } as JsonObject, + }), + ); + if (toolSet.has("describe-metrics") && !usedTools.has("describe-metrics")) { + addCall( + "describe-metrics", + compactObject({ scope: { service } as JsonObject }), + ); + } + addCall( + "query-orchestration-plans", + compactObject({ query: `${service} incident` }), + ); + } + + if (wantsChanges) { + addCall( + "query-deployments", + compactObject({ + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + addCall( + "query-incidents", + compactObject({ + limit: 3, + severities: ["sev1", "sev2"], + service, + start: window.start, + end: window.end, + }), + ); + } + + if (wantsServices) { + addCall("query-services", service ? { query: service } : {}); + } + + if (wantsIncidents) { + addCall( + "query-incidents", + compactObject({ + limit: isBroadInvestigation ? 3 : 2, + severities: lowerQuestion.includes("sev1") ? ["sev1"] : ["sev1", "sev2"], + service, + start: window.start, + end: window.end, + }), + ); + } + + if (incidentId && toolSet.has("get-incident-timeline")) { + addCall("get-incident-timeline", { id: incidentId }); + } + + if (wantsAlerts) { + addCall( + "query-alerts", + compactObject({ + limit: 5, + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + } + + if (wantsMetrics) { + if (toolSet.has("describe-metrics") && !usedTools.has("describe-metrics")) { + addCall( + "describe-metrics", + compactObject({ + scope: service ? 
({ service } as JsonObject) : undefined, + }), + ); + } else { + addCall( + "query-metrics", + compactObject({ + expression: inferMetricExpression(lowerQuestion), + start: window.start, + end: window.end, + step: window.step, + scope: service ? ({ service } as JsonObject) : undefined, }), - toolCalls: [], + ); + } + } + + if (wantsLogs || (isRefinement && usedTools.has("query-incidents"))) { + addCall( + "query-logs", + compactObject({ + expression: { + search: inferLogSearch(lowerQuestion, service), + }, + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + } + + if (wantsTickets || ticketId) { + addCall( + "query-tickets", + compactObject({ + query: ticketId ?? service ?? "incident follow-up", + }), + ); + } + + if (wantsDeployments) { + addCall( + "query-deployments", + compactObject({ + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + } + + if (wantsTeams) { + addCall("query-teams", service ? { service } : {}); + } + + if (wantsRunbooks) { + addCall( + "query-orchestration-plans", + compactObject({ + query: service ? 
`${service} incident` : "incident mitigation", + }), + ); + } + + if (isRefinement) { + const refinementCalls = refinePlan(messages, tools, calls, service, window); + if (refinementCalls.length > 0) { + return { + content: "I found likely follow-up checks based on the previous tool results.", + toolCalls: refinementCalls, }; } + } - const user = messages.filter((m) => m.role === "user").pop(); - const text = (user?.content || "").toLowerCase(); - const toolNames = new Set(tools.map((t) => t.name)); + if (calls.length === 0) { + const fallbackTool = availableTools.find((name) => name !== "health"); + if (fallbackTool) { + addCall( + fallbackTool, + fallbackArguments(fallbackTool, service, window), + ); + } + } - const calls: ToolCall[] = []; + return { + content: buildPlannerNarrative(calls, question, isRefinement), + toolCalls: calls.slice(0, 5), + }; +} - // If asking about incidents/impact, fetch top 2 severe incidents. - if (text.includes("incident") || text.includes("impactful")) { - if (toolNames.has("query-incidents")) { - calls.push({ - name: "query-incidents", - arguments: { limit: 2, severities: ["sev1", "sev2"] }, - }); - } +function planAsJson(messages: LlmMessage[]): LlmResponse { + const availableTools = parseToolsFromMessages(messages); + const toolObjects = availableTools.map((name) => ({ name })); + const planned = planWithTools(messages, toolObjects, isJsonRefinement(messages)); + return { + content: JSON.stringify( + { + reasoning: "Selected concrete tools and arguments from the user request.", + toolCalls: planned.toolCalls ?? [], + }, + null, + 2, + ), + toolCalls: [], + }; +} + +function synthesize(messages: LlmMessage[]): LlmResponse { + const prompt = getLastMessage(messages, "user")?.content ?? 
""; + const toolResults = extractToolResultsFromPrompt(prompt); + const incidents = extractMatches(prompt, /\binc-[a-z0-9-]+\b/gi); + const tickets = extractMatches(prompt, /\b(?:ticket|tkt)-[a-z0-9-]+\b/gi); + const services = extractServiceMentions(prompt); + const alerts = extractMatches(prompt, /\balt-[a-z0-9-]+\b/gi); + const deployments = extractMatches(prompt, /\bdep-[a-z0-9-]+\b/gi); + const orchestrationPlans = extractMatches(prompt, /\bplan-[a-z0-9-]+\b/gi); + const teams = extractTeamMentions(prompt); + const evidence = buildEvidence(prompt, incidents, services, toolResults); + const conclusion = buildConclusion(prompt, services, incidents, orchestrationPlans, toolResults); + const response = { + conclusion, + evidence, + missing: evidence.length >= 2 ? [] : ["More tool data would improve confidence."], + actions: + orchestrationPlans.length > 0 + ? [ + { + type: "orchestration_plan" as const, + id: orchestrationPlans[0], + name: buildOrchestrationPlanName(toolResults, orchestrationPlans[0]), + reason: buildOrchestrationPlanReason(services, incidents), + }, + ] + : [], + references: compactObject({ + incidents, + services, + tickets, + alerts, + deployments, + teams, + orchestrationPlans, + }), + confidence: estimateConfidence(prompt, evidence.length), + }; + + return { + content: JSON.stringify(response, null, 2), + toolCalls: [], + }; +} + +function refinePlan( + messages: LlmMessage[], + tools: Tool[], + initialCalls: ToolCall[], + service: string | undefined, + window: TimeWindow, +): ToolCall[] { + const lastUser = getLastMessage(messages, "user")?.content ?? ""; + const usedTools = collectUsedTools(messages); + const toolSet = new Set(tools.map((tool) => tool.name)); + const calls = [...initialCalls]; + + // Extract entities from prior tool result JSON for targeted follow-ups. + // Prefer JSON-extracted services over regex-inferred ones since they come + // from actual tool output rather than prompt text parsing. 
+ const discoveredEntities = extractEntitiesFromToolResults(lastUser); + const effectiveService = discoveredEntities.services[0] ?? service; + + const pushIfAvailable = (name: string, args: JsonObject): void => { + if (!toolSet.has(name) || usedTools.has(name) || calls.some((call) => call.name === name)) { + return; } + calls.push({ name, arguments: args }); + }; - // If asking about logs, request recent error logs (placeholder window). - if (text.includes("log") && toolNames.has("query-logs")) { - calls.push({ - name: "query-logs", - arguments: { - query: "error OR 500", - start: new Date(Date.now() - 15 * 60 * 1000).toISOString(), - end: new Date().toISOString(), - }, - }); + const hasIncidentData = /query-incidents|get-incident-timeline|inc-[a-z0-9-]+/i.test(lastUser); + const hasLogData = /query-logs|error|errors|timeout|timeouts|exception|exceptions/i.test(lastUser); + const hasMetricData = /query-metrics|describe-metrics|latency|cpu|memory|rps/i.test(lastUser); + const hasAlertData = /query-alerts|pagerduty|alert/i.test(lastUser); + const hasDeploymentData = /query-deployments|dep-[a-z0-9-]+/i.test(lastUser); + + if (hasIncidentData && !hasLogData) { + pushIfAvailable( + "query-logs", + compactObject({ + expression: { search: inferLogSearch(lastUser.toLowerCase(), effectiveService) }, + start: window.start, + end: window.end, + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } + + if ((hasIncidentData || hasLogData) && !hasMetricData) { + if (toolSet.has("describe-metrics") && !usedTools.has("describe-metrics")) { + pushIfAvailable( + "describe-metrics", + compactObject({ + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } else { + pushIfAvailable( + "query-metrics", + compactObject({ + expression: { metricName: "latency_p95" }, + start: window.start, + end: window.end, + step: window.step, + scope: effectiveService ? 
({ service: effectiveService } as JsonObject) : undefined, + }), + ); } + } - // If asking about metrics/latency/cpu, request key series. - if ( - (text.includes("latency") || - text.includes("cpu") || - text.includes("memory")) && - toolNames.has("query-metrics") - ) { - calls.push({ - name: "query-metrics", - arguments: { - expression: "latency_p95, cpu_usage, memory_usage, rps", - start: new Date(Date.now() - 30 * 60 * 1000).toISOString(), - end: new Date().toISOString(), - step: 60, - }, + if ((hasIncidentData || hasLogData || hasMetricData) && !hasAlertData) { + pushIfAvailable( + "query-alerts", + compactObject({ + limit: 5, + start: window.start, + end: window.end, + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } + + // Deployments often correlate with incidents — check for recent deploys + if (hasIncidentData && !hasDeploymentData) { + pushIfAvailable( + "query-deployments", + compactObject({ + start: window.start, + end: window.end, + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } + + // Discover team ownership when a concrete service is found + if (effectiveService && !usedTools.has("query-teams")) { + pushIfAvailable( + "query-teams", + { service: effectiveService }, + ); + } + + if ( + (hasIncidentData || hasMetricData || hasLogData) && + !usedTools.has("query-orchestration-plans") + ) { + pushIfAvailable( + "query-orchestration-plans", + compactObject({ + query: effectiveService ? `${effectiveService} mitigation` : "incident mitigation", + }), + ); + } + + const enoughData = + [hasIncidentData, hasLogData, hasMetricData].filter(Boolean).length >= 2; + return enoughData ? [] : calls.slice(0, 5); +} + +function buildPlannerNarrative( + calls: ToolCall[], + question: string, + isRefinement: boolean, +): string { + if (calls.length === 0) { + return isRefinement + ? "The existing results appear sufficient, so I would stop tool use here." 
+ : `I could not infer a strong plan from "${question}", so I used a conservative fallback.`; + } + + const toolNames = calls.map((call) => call.name).join(", "); + return isRefinement + ? `I need one more pass to validate the hypothesis. Next tools: ${toolNames}.` + : `I would start with these concrete checks: ${toolNames}.`; +} + +function extractQuestion(messages: LlmMessage[]): string { + const lastUser = getLastMessage(messages, "user")?.content ?? ""; + const questionMatch = lastUser.match(/Question:\s*([\s\S]*?)\nTool results/i); + if (questionMatch) return questionMatch[1].trim(); + + const requestMatch = lastUser.match(/User request:\s*([\s\S]*?)\nReturn only JSON/i); + if (requestMatch) return requestMatch[1].trim(); + + return lastUser.trim(); +} + +function getLastMessage( + messages: LlmMessage[], + role: LlmMessage["role"], +): LlmMessage | undefined { + return [...messages].reverse().find((message) => message.role === role); +} + +function parseToolsFromMessages(messages: LlmMessage[]): string[] { + const systemText = messages + .filter((message) => message.role === "system") + .map((message) => message.content) + .join("\n"); + const matches = systemText.match(/(?:^|\n)(?:- |• )([a-z0-9-]+)/gim) ?? 
[]; + const tools = matches + .map((match) => match.replace(/(?:^|\n)(?:- |• )/, "").trim()) + .filter((name) => name.includes("-")); + return [...new Set(tools)]; +} + +function isJsonRefinement(messages: LlmMessage[]): boolean { + return messages.some( + (message) => + message.role === "system" && + message.content.includes("JSON Refinement Mode"), + ); +} + +function inferTimeWindow(text: string): TimeWindow { + const lower = text.toLowerCase(); + const now = new Date(); + + let minutes = 60; + if (/\b(last|past)\s+15\s*(m|min|minutes)\b/.test(lower)) minutes = 15; + else if (/\b(last|past)\s+30\s*(m|min|minutes)\b/.test(lower)) minutes = 30; + else if (/\b(last|past)\s+2\s*(h|hr|hour|hours)\b/.test(lower)) minutes = 120; + else if (/\b(today|current)\b/.test(lower)) minutes = 6 * 60; + else if (/\b(last|past)\s+24\s*(h|hr|hour|hours)\b/.test(lower)) minutes = 24 * 60; + + const start = new Date(now.getTime() - minutes * 60 * 1000).toISOString(); + return { + start, + end: now.toISOString(), + step: minutes <= 30 ? 
60 : 300, + }; +} + +function inferService(text: string): string | undefined { + // First extract all words and filter out stop words and known entity prefixes + const keywords = HandlerUtils.extractKeywords(text).filter((keyword) => { + // Ignore words that look like entity ID prefixes (inc-, dep-, alt-, plan-) + if (/^(inc|dep|alt|plan|tkt|ticket|alert|incident|deployment)[-0-9]*$/i.test(keyword)) { + return false; + } + return !SERVICE_STOP_WORDS.has(keyword); + }); + + // Then try to match a direct generic "in X" or "for X" pattern + const directMatch = text.match( + /\b(?:for|in|on|service|services)\s+([a-z][a-z0-9-]{2,})\b/i, + ); + if (directMatch) { + const candidate = directMatch[1].toLowerCase(); + // Only use if it survived the stop word / prefix filter + if (keywords.includes(candidate)) return candidate; + } + + return keywords.find((keyword) => /^[a-z][a-z0-9-]{2,}$/.test(keyword)); +} + +function inferIdentifier(text: string, pattern: RegExp): string | undefined { + const match = text.match(pattern); + return match?.[0]?.toLowerCase(); +} + +function inferMetricExpression(question: string): JsonObject { + if (question.includes("cpu")) return { metricName: "cpu_usage" }; + if (question.includes("memory")) return { metricName: "memory_usage" }; + if (question.includes("traffic") || question.includes("rps")) { + return { metricName: "request_rate" }; + } + if (question.includes("error rate")) return { metricName: "error_rate" }; + return { metricName: "latency_p95" }; +} + +function inferLogSearch(question: string, service?: string): string { + if (question.includes("timeout")) return service ? `${service} timeout` : "timeout"; + if (question.includes("500")) return service ? `${service} 500` : "500"; + if (question.includes("exception")) { + return service ? `${service} exception` : "exception"; + } + return service ? 
`${service} error OR timeout` : "error OR timeout OR 500";
+}
+
+function collectUsedTools(messages: LlmMessage[]): Set<string> {
+ const text = messages
+ .filter((message) => message.role !== "system")
+ .map((message) => message.content)
+ .join("\n");
+ const matches = text.match(/\b(?:query|get|describe)-[a-z0-9-]+\b/gi) ?? [];
+ return new Set(matches.map((match) => match.toLowerCase()));
+}
+
+function fallbackArguments(
+ toolName: string,
+ service: string | undefined,
+ window: TimeWindow,
+): JsonObject {
+ switch (toolName) {
+ case "query-incidents":
+ return compactObject({ limit: 2, service, start: window.start, end: window.end });
+ case "query-logs":
+ return compactObject({
+ expression: { search: inferLogSearch("", service) },
+ start: window.start,
+ end: window.end,
+ });
+ case "query-metrics":
+ return compactObject({
+ expression: { metricName: "latency_p95" },
+ start: window.start,
+ end: window.end,
+ step: window.step,
+ });
+ case "describe-metrics":
+ return compactObject({
+ scope: service ? ({ service } as JsonObject) : undefined,
 });
+ default:
+ return service ? { service } : {};
+ }
+}
+
+function compactObject(
+ value: Record<string, unknown>,
+): JsonObject {
+ const entries = Object.entries(value).filter(([, entry]) => entry !== undefined);
+ return Object.fromEntries(entries) as JsonObject;
+}
+
+function extractMatches(text: string, pattern: RegExp): string[] {
+ return [...new Set((text.match(pattern) ?? []).map((value) => value.toLowerCase()))];
+}
+
+function extractServiceMentions(text: string): string[] {
+ const matches = text.match(/\b([a-z][a-z0-9-]{2,})\s+service\b/gi) ??
[]; + const normalized = matches.map((match) => match.replace(/\s+service$/i, "").toLowerCase()); + const inferred = inferService(text); + // Also extract services from parsed tool result JSON + const toolResults = extractToolResultsFromPrompt(text); + const toolServices: string[] = []; + for (const tr of toolResults) { + if (!Array.isArray(tr.data)) continue; + for (const item of tr.data) { + if (typeof item === "object" && item !== null && "service" in item) { + const svc = String((item as Record<string, unknown>).service); + if (svc && svc !== "undefined") toolServices.push(svc.toLowerCase()); + } } + } + return [...new Set([...(inferred ? [inferred] : []), ...normalized, ...toolServices])]; +} - // Fallback: if no actionable calls, still return a noop plan. - const hasNonPlaceholderArgs = calls.some((c) => - Object.values(c.arguments).every( - (v) => typeof v !== "string" || !v.includes("{{"), - ), - ); - const responseText = hasNonPlaceholderArgs - ? "Mock planning complete." - : "Mock plan with placeholders."; +function extractTeamMentions(text: string): string[] { + const teams: string[] = []; + const toolResults = extractToolResultsFromPrompt(text); + for (const tr of toolResults) { + if (!tr.tool.includes("team")) continue; + const items = Array.isArray(tr.data) ? tr.data : [tr.data]; + for (const item of items) { + if (typeof item === "object" && item !== null) { + const obj = item as Record<string, unknown>; + const name = obj.name ??
obj.id; + if (typeof name === "string" && name) teams.push(name.toLowerCase()); + } + } + } + return [...new Set(teams)]; +} - return { - content: responseText, - toolCalls: calls, - }; +type ParsedToolResult = { tool: string; data: unknown }; + +function extractToolResultsFromPrompt(text: string): ParsedToolResult[] { + const results: ParsedToolResult[] = []; + // Match lines from the synthesis prompt like: + // "query-incidents: [{...}]" or "- query-incidents => [{...}]" + const linePattern = /(?:^|\n)(?:- )?([a-z][a-z0-9-]+)(?:\s*(?::|=>|returned)\s*)(.+)/gi; + let match: RegExpExecArray | null; + while ((match = linePattern.exec(text)) !== null) { + const tool = match[1].toLowerCase(); + const raw = match[2].trim(); + try { + const parsed = JSON.parse(raw); + results.push({ tool, data: parsed }); + } catch { + // Not valid JSON — skip + } } + return results; +} + +function extractEntitiesFromToolResults(text: string): { + services: string[]; + incidentIds: string[]; + statuses: string[]; +} { + const services: string[] = []; + const incidentIds: string[] = []; + const statuses: string[] = []; + + const processItem = (item: unknown): void => { + if (typeof item !== "object" || item === null) return; + const obj = item as Record<string, unknown>; + if (typeof obj.service === "string" && obj.service) { + services.push(obj.service.toLowerCase()); + } + if (typeof obj.id === "string" && /^inc-/i.test(obj.id)) { + incidentIds.push(obj.id.toLowerCase()); + } + if (typeof obj.status === "string" && obj.status) { + statuses.push(obj.status.toLowerCase()); + } + }; + + // Use the line-based parser for reliable JSON extraction from tool result lines + const toolResults = extractToolResultsFromPrompt(text); + for (const tr of toolResults) { + const items = Array.isArray(tr.data) ?
tr.data : [tr.data]; + for (const item of items) processItem(item); + } + + // Fallback: try to parse inline JSON arrays + const jsonArrayPattern = /\[.*?\]/gs; + let match: RegExpExecArray | null; + while ((match = jsonArrayPattern.exec(text)) !== null) { + try { + const parsed = JSON.parse(match[0]); + const items = Array.isArray(parsed) ? parsed : [parsed]; + for (const item of items) processItem(item); + } catch { + // Not valid JSON + } + } + return { + services: [...new Set(services)], + incidentIds: [...new Set(incidentIds)], + statuses: [...new Set(statuses)], + }; +} + +function buildEvidence( + prompt: string, + incidents: string[], + services: string[], + toolResults: ParsedToolResult[] = [], +): string[] { + const evidence: string[] = []; + + // Build evidence from parsed tool result data when available + for (const tr of toolResults) { + if (tr.tool.includes("incident") && Array.isArray(tr.data)) { + for (const item of tr.data) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + const parts = [`Incident ${obj.id ?? "unknown"}`]; + if (obj.status) parts.push(`status=${obj.status}`); + if (obj.severity) parts.push(`severity=${obj.severity}`); + if (obj.service) parts.push(`service=${obj.service}`); + evidence.push(`${parts.join(", ")}.`); + } + } + if (tr.tool.includes("alert") && Array.isArray(tr.data)) { + const count = tr.data.length; + if (count > 0) evidence.push(`${count} alert(s) found in the requested window.`); + } + if (tr.tool.includes("deployment") && Array.isArray(tr.data)) { + for (const item of tr.data) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + if (obj.id || obj.service) { + evidence.push(`Deployment ${obj.id ?? ""} for ${obj.service ??
"unknown service"} detected.`); + } + } + } + } + + // Fall back to regex-based evidence when no tool results are parsed + if (evidence.length === 0) { + if (incidents.length > 0) { + evidence.push(`Investigated incident ${incidents[0]} from tool output.`); + } + if (services.length > 0) { + evidence.push(`Observed service scope: ${services[0]}.`); + } + } + + if (/error|timeout|500/i.test(prompt)) { + evidence.push("Logs indicate errors or timeouts in the requested window."); + } + if (/latency|cpu|memory|metric/i.test(prompt)) { + evidence.push("Metrics were included in the evidence used for synthesis."); + } + + return evidence.slice(0, 6); +} + +function buildOrchestrationPlanName( + toolResults: ParsedToolResult[], + planId: string, +): string { + for (const tr of toolResults) { + if (!tr.tool.includes("orchestration")) continue; + const items = Array.isArray(tr.data) ? tr.data : [tr.data]; + for (const item of items) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + if (String(obj.id).toLowerCase() === planId) { + const name = obj.title ?? obj.name ?? obj.displayName; + if (typeof name === "string" && name) return name; + } + } + } + return "Recommended mitigation plan"; +} + +function buildOrchestrationPlanReason( + services: string[], + incidents: string[], +): string { + if (services.length > 0 && incidents.length > 0) { + return `Targets ${services[0]} where ${incidents[0]} is active.`; + } + if (services.length > 0) { + return `Mitigates operational issues observed in ${services[0]}.`; + } + return "It aligns with the incident signals already collected."; +} + +function buildConclusion( + prompt: string, + services: string[], + incidents: string[], + orchestrationPlans: string[], + toolResults: ParsedToolResult[] = [], +): string { + const serviceText = services[0] ?? "the relevant service"; + const runbookText = + orchestrationPlans.length > 0 + ?
` Recommended Action: Run orchestration plan ${orchestrationPlans[0]}.` + : ""; + + // Build a richer incident summary from parsed tool results + let incidentSummary = ""; + for (const tr of toolResults) { + if (!tr.tool.includes("incident") || !Array.isArray(tr.data)) continue; + for (const item of tr.data) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + const id = obj.id ?? "unknown"; + const status = obj.status ?? "unknown"; + const severity = obj.severity; + incidentSummary = severity + ? `Incident ${id} (${severity}, ${status}) appears central to the issue.` + : `Incident ${id} (${status}) appears central to the issue.`; + break; // Use the first incident for the summary + } + } + if (!incidentSummary && incidents.length > 0) { + incidentSummary = `Incident ${incidents[0]} appears central to the issue.`; + } + + if (/latency|cpu|memory|metric/i.test(prompt)) { + return `${serviceText} shows operational signals worth investigating further. ${incidentSummary}${runbookText}`.trim(); + } + + if (/service|incident|alert|log/i.test(prompt)) { + return `${serviceText} has enough collected evidence for a preliminary assessment. 
${incidentSummary}${runbookText}`.trim(); + } + + return `This is a synthesized mock answer based on the provided tool results.${runbookText}`.trim(); +} + +function estimateConfidence(prompt: string, evidenceCount: number): number { + let confidence = 0.62 + evidenceCount * 0.08; + if (/missing|unknown|no data/i.test(prompt)) confidence -= 0.1; + if (/incident|error|latency|cpu|memory|alert/i.test(prompt)) confidence += 0.05; + return Math.max(0.35, Math.min(0.93, Number(confidence.toFixed(2)))); } diff --git a/tests/copilotEngine.followups.test.ts b/tests/copilotEngine.followups.test.ts index e73fb5c..f88e2a9 100644 --- a/tests/copilotEngine.followups.test.ts +++ b/tests/copilotEngine.followups.test.ts @@ -107,12 +107,16 @@ test('emits references instead of links with ids and ranges for console', async async listTools() { console.log('StubMcp listTools called'); return [ + { name: 'describe-metrics' } as Tool, { name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool, ]; }, async callTool(call) { calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['latency_p95'] }; + } if (call.name === 'query-logs') { return { name: call.name, result: [{ id: 'log-1', message: 'error' }] } as ToolResult; } @@ -189,12 +193,16 @@ test('drills into incident timelines/logs/metrics when user asks for root cause' return [ { name: 'query-incidents' } as Tool, { name: 'get-incident-timeline' } as Tool, + { name: 'describe-metrics' } as Tool, { name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool, ]; }, async callTool(call): Promise<ToolResult> { calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['cpu_usage'] }; + } if (call.name === 'query-incidents') { return { name: call.name, diff --git a/tests/copilotEngine.planning.test.ts b/tests/copilotEngine.planning.test.ts index d001972..4ff2cdb 100644 --- a/tests/copilotEngine.planning.test.ts +++ b/tests/copilotEngine.planning.test.ts @@ -346,6 
+346,9 @@ test('validates invalid LLM calls and provides heuristic fallback', async () => const mcp: StubMcp = { async listTools() { return [ + { + name: 'describe-metrics', + } as Tool, { name: 'query-metrics', inputSchema: { @@ -367,6 +370,9 @@ test('validates invalid LLM calls and provides heuristic fallback', async () => }, async callTool(call) { calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['cpu_usage'] }; + } return { name: call.name, result: { metrics: [] } }; }, }; @@ -374,15 +380,13 @@ test('validates invalid LLM calls and provides heuristic fallback', async () => const engine = makeEngine(llm, mcp); await engine.answer('check cpu metrics'); - // New behavior: Invalid LLM call is caught, but heuristics inject a valid fallback - console.log('DEBUG: calls length:', calls.length); - if (calls.length > 0) console.log('DEBUG: first call:', calls[0]); - - assert.equal(calls.length, 1, 'heuristics should inject valid query-metrics call after filtering invalid LLM call'); - assert.equal(calls[0].name, 'query-metrics'); + // describe-metrics runs first (replacing the invalid query-metrics call), + // then heuristics inject a valid query-metrics call + const metricsCall = calls.find(c => c.name === 'query-metrics'); + assert.ok(metricsCall, 'heuristics should inject valid query-metrics call after filtering invalid LLM call'); // The heuristic-injected call should have all required fields - const args = calls[0].arguments as JsonObject; + const args = metricsCall!.arguments as JsonObject; assert.ok(args.expression, 'should have expression field'); assert.ok(typeof args.step === 'number', 'should have step field'); assert.ok(args.start, 'should have start field'); @@ -458,8 +462,15 @@ test('filters out internal diagnostic tools from LLM visibility', async () => { test('adds default logs and metrics calls when user explicitly asks for them', async () => { const llm: LlmClient = { - async chat(_messages: LlmMessage[] = 
[], tools: Tool[] = [], _opts?: { chatId?: string }) { + async chat(messages: LlmMessage[], tools: Tool[], _opts?: { chatId?: string }) { if (tools.length) { + // On follow-up, if describe-metrics results are visible, plan query-metrics + const hasDescribeMetricsResult = messages.some(m => + m.role === 'user' && typeof m.content === 'string' && m.content.includes('describe-metrics') + ); + if (hasDescribeMetricsResult) { + return { content: 'plan', toolCalls: [{ name: 'query-metrics', arguments: { expression: { metricName: 'cpu_usage' }, step: 60 } }], chatId: 'conv-default-logs' }; + } return { content: 'plan', toolCalls: [], chatId: 'conv-default-logs' }; } return { content: JSON.stringify({ conclusion: 'done' }), toolCalls: [], chatId: 'conv-default-logs' }; @@ -469,11 +480,14 @@ test('adds default logs and metrics calls when user explicitly asks for them', a const calls: ToolCall[] = []; const mcp: StubMcp = { async listTools() { - return [{ name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool]; + return [{ name: 'describe-metrics' } as Tool, { name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool]; }, async callTool(call) { console.log('Test callTool:', call.name); calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['cpu_usage'] }; + } return { name: call.name, result: { ok: true } }; }, }; @@ -483,9 +497,9 @@ test('adds default logs and metrics calls when user explicitly asks for them', a console.log('Calls length:', calls.length); console.log('Calls:', calls.map(c => c.name)); - assert.equal(calls.length, 2); - const toolNames = calls.map(c => c.name).sort(); - assert.deepEqual(toolNames, ['query-logs', 'query-metrics']); + const toolNames = calls.map(c => c.name); + assert.ok(toolNames.includes('query-logs'), 'should call query-logs'); + assert.ok(toolNames.includes('query-metrics'), 'should call query-metrics'); const logsCall = calls.find(c => c.name === 'query-logs'); const metricsCall = 
calls.find(c => c.name === 'query-metrics'); diff --git a/tests/correlationDetector.test.ts b/tests/correlationDetector.test.ts index 5e4a5ad..e5235dc 100644 --- a/tests/correlationDetector.test.ts +++ b/tests/correlationDetector.test.ts @@ -8,13 +8,15 @@ test('CorrelationDetector: extracts events from tool results', () => { const results: ToolResult[] = [ { name: 'query-logs', - result: [ - { timestamp: '2024-01-01T10:00:00Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:01Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:02Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:03Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:04Z', message: 'error' }, - ], + result: { + entries: [ + { timestamp: '2024-01-01T10:00:00Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:01Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:02Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:03Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:04Z', message: 'error', severity: 'error' }, + ] + }, }, ]; diff --git a/tests/engine/handlers/incident/referenceHandler.test.ts b/tests/engine/handlers/incident/referenceHandler.test.ts index e50dcaf..2ad215a 100644 --- a/tests/engine/handlers/incident/referenceHandler.test.ts +++ b/tests/engine/handlers/incident/referenceHandler.test.ts @@ -1,128 +1,81 @@ import assert from 'node:assert/strict'; import { test } from 'node:test'; import { incidentReferenceHandler } from '../../../../src/engine/handlers/incident/referenceHandler.js'; -import { HandlerContext } from '../../../../src/types.js'; +import { HandlerContext, ToolResult } from '../../../../src/types.js'; test('incidentReferenceHandler', async (t) => { - const context: HandlerContext = { + const createContext = (toolResults: ToolResult[] = []): HandlerContext => ({ chatId: 'test', turnNumber: 1, conversationHistory: [], - toolResults: [], + toolResults, 
userQuestion: 'test' - }; + }); await t.test('resolves ID from recent tool result', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'get-incident', - result: { id: 'INC-123', title: 'Test' }, - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'get-incident', + result: { id: 'INC-123', title: 'Test' }, + arguments: {} + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, ''); + const result = await incidentReferenceHandler(context, 'that incident'); assert.equal(result, 'INC-123'); }); await t.test('resolves ID from query array result', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'query-incidents', - result: [ - { id: 'INC-111' }, - { id: 'INC-222' } // 222 is last in array, but prominence is equal - ], - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'query-incidents', + result: [ + { id: 'INC-111' }, + { id: 'INC-222' } + ], + arguments: {} + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, ''); - assert.equal(result, 'INC-111'); // First one pushed is index 0. logic? - // Actually code pushes all. - // sort by recency and prominence. - // timestamp is same for all in one turn. - // stable sort? - // incidentEntities[0] is returned. - // It pushes in order. sorting might keep order if equal? - // Let's see what happens. 
+ const result = await incidentReferenceHandler(context, 'that incident'); + assert.equal(result, 'INC-111'); // First one in array }); await t.test('refines using variable in query', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'query-incidents', - result: [ - { id: 'INC-ABC' }, - { id: 'INC-XYZ' } - ], - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'query-incidents', + result: [ + { id: 'INC-ABC' }, + { id: 'INC-XYZ' } + ], + arguments: {} + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, 'incident INC-XYZ'); + const result = await incidentReferenceHandler(context, 'incident INC-XYZ'); assert.equal(result, 'INC-XYZ'); }); await t.test('returns null for mismatching object reference', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'get-incident', - result: { id: 'INC-123' }, - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'get-incident', + result: { id: 'INC-123' }, + arguments: {} + }]; + const context = createContext(toolResults); // User asks about "that service" but context has incident - const result = await incidentReferenceHandler(testContext, 'show that service details'); + const result = await incidentReferenceHandler(context, 'show that service details'); assert.equal(result, null); }); await t.test('extracts ID from tool arguments', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'get-incident', - result: null, // failed result maybe - arguments: { id: 'INC-FAILED' } - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 
'get-incident', + result: null, // failed result maybe + arguments: { id: 'INC-FAILED' } + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, ''); + const result = await incidentReferenceHandler(context, 'that incident'); assert.equal(result, 'INC-FAILED'); }); }); diff --git a/tests/engine/handlers/incident/scopeHandler.test.ts b/tests/engine/handlers/incident/scopeHandler.test.ts index 67df2ab..0e1fd18 100644 --- a/tests/engine/handlers/incident/scopeHandler.test.ts +++ b/tests/engine/handlers/incident/scopeHandler.test.ts @@ -75,14 +75,13 @@ test('incidentScopeInferenceHandler', async (t) => { const testContext = { ...context, conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', + userMessage: 'previous question', timestamp: Date.now(), - toolResults: [{ - name: 'query-incidents', - arguments: {}, - result: [{ id: '1', service: 'legacy-api', metadata: { environment: 'staging' } }] + entities: [{ + type: 'service' as const, + value: 'legacy-api', + extractedAt: Date.now(), + source: 'query-incidents' }] }], toolResults: [] @@ -92,7 +91,6 @@ test('incidentScopeInferenceHandler', async (t) => { assert.ok(result); assert.equal(result?.service, 'legacy-api'); - assert.equal(result?.environment, 'staging'); }); await t.test('ignores non-incident tools', async () => { diff --git a/tests/engine/handlers/metric/followUpHandler.test.ts b/tests/engine/handlers/metric/followUpHandler.test.ts index 0a58127..add382d 100644 --- a/tests/engine/handlers/metric/followUpHandler.test.ts +++ b/tests/engine/handlers/metric/followUpHandler.test.ts @@ -72,7 +72,7 @@ test('metricFollowUpHandler', async (t) => { // The previous file content shows: if (metricName.toLowerCase().includes("latency") || metricName.toLowerCase().includes("error")) // So cpu_usage should NOT trigger log query. 
// With enhanced LogQueryGenerator, cpu_usage should trigger 'cpu' related logs (and alerts) - assert.equal(suggestions.length, 2); + assert.equal(suggestions.length, 3); const logsSuggestion = suggestions.find(s => s.name === 'query-logs'); assert.ok(logsSuggestion); const logsArgs = logsSuggestion.arguments as unknown as LogQueryArgs; @@ -186,5 +186,58 @@ test('metricFollowUpHandler', async (t) => { const deploymentsSuggestion = suggestions.find(s => s.name === 'query-deployments'); assert.ok(!deploymentsSuggestion, 'should NOT duplicate query-deployments'); }); -}); + await t.test('should turn describe-metrics into query-metrics for investigative requests', async () => { + const context: HandlerContext = { + ...baseContext, + userQuestion: 'find the root cause and check cpu metrics', + toolResults: [{ + name: 'query-incidents', + arguments: { limit: 1 }, + result: [{ + id: 'INC-200', + service: 'svc-api', + startTime: '2024-01-01T00:00:00Z', + endTime: '2024-01-01T00:30:00Z', + }], + }], + }; + const result: ToolResult = { + name: 'describe-metrics', + arguments: { scope: { service: 'svc-api' } }, + result: ['cpu_usage', 'memory_usage'], + }; + + const suggestions = await metricFollowUpHandler(context, result); + const metricsSuggestion = suggestions.find(s => s.name === 'query-metrics'); + + assert.ok(metricsSuggestion, 'should suggest query-metrics after metric discovery'); + const args = metricsSuggestion!.arguments as { + scope: { service: string }; + expression: { metricName: string }; + start: string; + end: string; + step: number; + }; + assert.equal(args.scope.service, 'svc-api'); + assert.equal(args.expression.metricName, 'cpu_usage'); + assert.equal(args.step, 60); + assert.ok(Number.isFinite(Date.parse(args.start))); + assert.ok(Number.isFinite(Date.parse(args.end))); + }); + + await t.test('should not query metrics for discovery-only requests', async () => { + const context: HandlerContext = { + ...baseContext, + userQuestion: 'what metrics are 
available for svc-api', + }; + const result: ToolResult = { + name: 'describe-metrics', + arguments: { scope: { service: 'svc-api' } }, + result: ['cpu_usage', 'memory_usage'], + }; + + const suggestions = await metricFollowUpHandler(context, result); + assert.equal(suggestions.some(s => s.name === 'query-metrics'), false); + }); +}); diff --git a/tests/engine/handlers/team/referenceHandler.test.ts b/tests/engine/handlers/team/referenceHandler.test.ts index 8a79ea1..84bd833 100644 --- a/tests/engine/handlers/team/referenceHandler.test.ts +++ b/tests/engine/handlers/team/referenceHandler.test.ts @@ -37,13 +37,13 @@ function createTurnWithToolsAndEntities( } test("teamReferenceHandler", async (t) => { - const createContext = (conversationHistory: ConversationTurn[] = []): HandlerContext => + const createContext = (conversationHistory: ConversationTurn[] = [], toolResults: ToolResult[] = []): HandlerContext => ({ chatId: "chat", turnNumber: 1, userQuestion: "test", conversationHistory, - toolResults: [], + toolResults, }) as HandlerContext; await t.test("should return null when no team entities exist", async () => { @@ -53,39 +53,39 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should extract team from query-teams tool result", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [ { id: "team-velocity", name: "Velocity Team" }, { id: "team-platform", name: "Platform Team" } ] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-velocity"); // Should return first/most prominent }); await t.test("should extract team from get-team tool result", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "get-team", arguments: { id: "team-velocity" }, result: { id: 
"team-velocity", name: "Velocity Team" } - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "this team"); assert.equal(result, "team-velocity"); }); await t.test("should extract team from get-team-members tool arguments", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "get-team-members", arguments: { id: "team-velocity" }, result: [{ id: "user1", name: "John Doe" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-velocity"); @@ -103,15 +103,15 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should prioritize exact name matches in reference text", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [ { id: "team-velocity", name: "Velocity Team" }, { id: "team-platform", name: "Platform Team" } ] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "the platform team"); assert.equal(result, "team-platform"); @@ -130,7 +130,14 @@ test("teamReferenceHandler", async (t) => { result: { id: "team-recent", name: "Recent Team" } }], [], Date.now()); - const context = createContext([oldTurn, recentTurn]); + // Also add current tool results for immediate context + const toolResults: ToolResult[] = [{ + name: "get-team", + arguments: {}, + result: { id: "team-recent", name: "Recent Team" } + }]; + + const context = createContext([oldTurn, recentTurn], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-recent"); @@ -149,87 +156,87 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should 
extract team name from 'the velocity team' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [ { id: "team-velocity", name: "velocity" }, { id: "team-platform", name: "platform" } ] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "the velocity team"); assert.equal(result, "team-velocity"); }); await t.test("should extract team name from 'velocity team' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "velocity" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "velocity team"); assert.equal(result, "team-velocity"); }); await t.test("should extract team name from 'team velocity' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "velocity" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "team velocity"); assert.equal(result, "team-velocity"); }); await t.test("should extract team name from 'team-velocity' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "velocity" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "team-velocity"); assert.equal(result, "team-velocity"); }); await t.test("should return null for domain mismatch", async () 
=> { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that service"); assert.equal(result, null); }); await t.test("should return null for incident reference", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that incident"); assert.equal(result, null); }); await t.test("should allow team reference even with other entity words if 'team' is present", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team service"); assert.equal(result, "team-velocity"); @@ -259,36 +266,36 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should handle case insensitive matching", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "THE VELOCITY TEAM"); assert.equal(result, "team-velocity"); }); await t.test("should handle teams with no name gracefully", async () => { - const turn = 
createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity" }] // No name field - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-velocity"); }); await t.test("should handle invalid tool results gracefully", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: null - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, null); diff --git a/tests/engine/handlers/ticket/followUpHandler.test.ts b/tests/engine/handlers/ticket/followUpHandler.test.ts index d6c152d..77cd2c1 100644 --- a/tests/engine/handlers/ticket/followUpHandler.test.ts +++ b/tests/engine/handlers/ticket/followUpHandler.test.ts @@ -105,9 +105,11 @@ test('ticketFollowUpHandler', async (t) => { conversationHistory: [{ userMessage: 'previous question', timestamp: Date.now() - 1000, - toolResults: [{ - name: 'query-tickets', - result: [{ id: 'TICKET-1', title: 'Already seen ticket' }], + entities: [{ + type: 'ticket' as const, + value: 'TICKET-1', + extractedAt: Date.now() - 1000, + source: 'query-tickets' }] }] }; diff --git a/tests/engine/handlers/ticket/referenceHandler.test.ts b/tests/engine/handlers/ticket/referenceHandler.test.ts index 0f81484..a98b6b1 100644 --- a/tests/engine/handlers/ticket/referenceHandler.test.ts +++ b/tests/engine/handlers/ticket/referenceHandler.test.ts @@ -2,98 +2,74 @@ import assert from 'node:assert/strict'; import { test } from 'node:test'; import { ticketReferenceHandler } from '../../../../src/engine/handlers/ticket/referenceHandler.js'; -import { HandlerContext, ConversationTurn, ToolResult } from '../../../../src/types.js'; 
+import { HandlerContext, ToolResult } from '../../../../src/types.js'; test('ticketReferenceHandler', async (t) => { - // Helper to create context with history - const createCtx = (toolResults: ToolResult[] = [], refText = ''): HandlerContext => ({ + const createContext = (toolResults: ToolResult[] = []): HandlerContext => ({ chatId: 'test', turnNumber: 1, - userQuestion: refText, - // The handler looks at conversationHistory, not just current turn results - // usually. But here the logic iterates specific turns. - conversationHistory: [ - { - userMessage: 'prev question', - timestamp: 1000, - toolResults: toolResults - } as ConversationTurn - ], - toolResults: [], // current turn results + userQuestion: 'test', + conversationHistory: [], + toolResults, }); await t.test('should return null if no tickets in history', async () => { - const ctx = createCtx(); + const ctx = createContext(); const ref = await ticketReferenceHandler(ctx, 'that ticket'); assert.equal(ref, null); }); await t.test('should return most recent ticket from history', async () => { - - // Logic sorts by timestamp descending. - // In the same turn, prominence is 1.0 for all. - // But the handler implementation pushes them in order. - // Wait, the handler sorts: - // ticketEntities.sort((a, b) => ... b.timestamp - a.timestamp) - // If they have same timestamp (turn timestamp), it's stable sort or undefined order? - // Let's check implementation: - // timestamp: turn.timestamp || Date.now() - // If they are in the same turn, they have same timestamp. - // It's likely the order in array matters differently or they are equal. - // Actually, normally "most recent" implies time. - // If logic doesn't distinguish intra-turn time, it might return the first one pushed? - // Let's test single ticket first to be sure. 
- - const singleResult: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [{ id: 'TICKET-1', title: 'one' }] }]; - const ctx1 = createCtx(singleResult); - const ref = await ticketReferenceHandler(ctx1, 'that ticket'); + const ctx = createContext(toolResults); + const ref = await ticketReferenceHandler(ctx, 'that ticket'); assert.equal(ref, 'TICKET-1'); }); await t.test('should resolve specific ticket ID in reference text', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [ { id: 'TICKET-A', title: 'A' }, { id: 'TICKET-B', title: 'B' } ] }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); // User explicitly asks for B const ref = await ticketReferenceHandler(ctx, 'check ticket-b please'); assert.equal(ref, 'TICKET-B'); }); await t.test('should return null if domain does not match (e.g. incident)', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [{ id: 'TICKET-1', title: 'one' }] }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); const ref = await ticketReferenceHandler(ctx, 'show that incident'); // "incident" in text -> mismatch unless "ticket" also in text assert.equal(ref, null); }); await t.test('should still resolve if domain match (e.g. 
ticket)', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [{ id: 'TICKET-1', title: 'one' }] }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); const ref = await ticketReferenceHandler(ctx, 'show that ticket'); assert.equal(ref, 'TICKET-1'); }); await t.test('should handle get-ticket single result', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'get-ticket', result: { id: 'TICKET-SINGLE', title: 'one' } }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); const ref = await ticketReferenceHandler(ctx, 'details on this'); assert.equal(ref, 'TICKET-SINGLE'); }); diff --git a/tests/engine/planRefiner.test.ts b/tests/engine/planRefiner.test.ts index 1ba4af5..a216cb5 100644 --- a/tests/engine/planRefiner.test.ts +++ b/tests/engine/planRefiner.test.ts @@ -356,7 +356,7 @@ test('ScopeInferer: uses conversation history for scope inference', async () => const inferer = new ScopeInferer(); // Simulate a follow-up question where previous turn had incident context - const conversationHistory: ConversationTurn[] = [createTurnWithTools([{ + const turn = createTurnWithTools([{ name: 'query-incidents', arguments: {}, result: [{ @@ -365,7 +365,17 @@ test('ScopeInferer: uses conversation history for scope inference', async () => status: 'open', severity: 'sev3' }] - }])]; + }]); + + // Explicitly add entities to simulate extraction from the tool result + turn.entities = [{ + type: 'service', + value: 'svc-realtime', + source: 'test', + extractedAt: Date.now() + }]; + + const conversationHistory: ConversationTurn[] = [turn]; const inference = await inferer.inferScope( 'any alerts', // follow-up question diff --git a/tests/followUpEngine.test.ts b/tests/followUpEngine.test.ts index a0be864..e5aea2d 100644 --- a/tests/followUpEngine.test.ts +++ b/tests/followUpEngine.test.ts @@ -11,17 +11,16 @@ 
test('FollowUpEngine', async (t) => {
     const results: ToolResult[] = [
       {
         name: 'query-incidents',
-        result: { incidents: [] },
+        result: { incidents: [] }, // Empty incidents array
         arguments: { service: 'payment-api' },
       },
     ];
-
-    const refined = await engine.applyFollowUps(results, 'test-chat', [], 'Show incidents');
-    // Should not include duplicate
-    assert.strictEqual(refined.length, 0);
+    const refined = await engine.applyFollowUps(results, 'test-chat', [], 'Show incidents');
+    // Should not re-run query-incidents since it was already executed this turn
+    assert.strictEqual(refined.filter(call => call.name === 'query-incidents').length, 0);
   });
 
   await t.test('applyFollowUps generates follow-up suggestions for incidents', async () => {
diff --git a/tests/mockLlm.test.ts b/tests/mockLlm.test.ts
new file mode 100644
index 0000000..0201b89
--- /dev/null
+++ b/tests/mockLlm.test.ts
@@ -0,0 +1,256 @@
+import assert from "node:assert/strict";
+import test from "node:test";
+import { MockLlm } from "../src/llms/mock.js";
+import { LlmMessage, Tool } from "../src/types.js";
+import {
+  buildFinalAnswerPrompt,
+  buildJsonPlannerPrompt,
+  buildPlannerPrompt,
+  buildRefinementPrompt,
+  buildToolContext,
+} from "../src/prompts.js";
+
+test("mock llm creates concrete multi-tool plans for broad investigations", async () => {
+  const llm = new MockLlm();
+  const tools: Tool[] = [
+    { name: "query-incidents" },
+    { name: "query-logs" },
+    { name: "describe-metrics" },
+    { name: "query-metrics" },
+    { name: "query-orchestration-plans" },
+  ];
+  const messages: LlmMessage[] = [
+    { role: "system", content: buildPlannerPrompt(buildToolContext(tools)) },
+    {
+      role: "user",
+      content: "Investigate payments latency and errors from the last 30 minutes",
+    },
+  ];
+
+  const response = await llm.chat(messages, tools);
+
+  assert.ok((response.toolCalls?.length ?? 
0) >= 3);
+  assert.equal(response.toolCalls?.[0]?.name, "describe-metrics");
+  assert.ok(response.toolCalls?.some((call) => call.name === "query-logs"));
+  assert.ok(
+    response.toolCalls?.some((call) => call.name === "query-orchestration-plans"),
+  );
+
+  const logsCall = response.toolCalls?.find((call) => call.name === "query-logs");
+  assert.equal(typeof logsCall?.arguments.start, "string");
+  assert.equal(typeof logsCall?.arguments.end, "string");
+  assert.match(String(logsCall?.arguments.start), /^\d{4}-\d{2}-\d{2}T/);
+});
+
+test("mock llm emits parseable JSON plans in json planning mode", async () => {
+  const llm = new MockLlm();
+  const messages: LlmMessage[] = [
+    {
+      role: "system",
+      content: buildJsonPlannerPrompt(
+        ["query-incidents", "query-logs", "query-alerts"].map((name) => `- ${name}`).join("\n"),
+      ),
+    },
+    {
+      role: "user",
+      content: "User request: Show recent incidents and related logs for checkout\nReturn only JSON.",
+    },
+  ];
+
+  const response = await llm.chat(messages, []);
+  const parsed = JSON.parse(response.content) as {
+    reasoning: string;
+    toolCalls: Array<{ name: string; arguments: Record<string, unknown> }>;
+  };
+
+  assert.equal(typeof parsed.reasoning, "string");
+  assert.ok(parsed.toolCalls.length >= 2);
+  assert.ok(parsed.toolCalls.some((call) => call.name === "query-incidents"));
+  assert.ok(parsed.toolCalls.some((call) => call.name === "query-logs"));
+});
+
+test("mock llm suggests follow-up tools from prior results", async () => {
+  const llm = new MockLlm();
+  const tools: Tool[] = [
+    { name: "query-incidents" },
+    { name: "query-logs" },
+    { name: "describe-metrics" },
+    { name: "query-metrics" },
+    { name: "query-alerts" },
+  ];
+  const messages: LlmMessage[] = [
+    { role: "system", content: buildRefinementPrompt(buildToolContext(tools), 1) },
+    {
+      role: "user",
+      content:
+        "Question: Investigate payments incident\n" +
+        "Tool results (count=1):\n" +
+        "query-incidents returned [{\"id\":\"inc-123\",\"service\":\"payments\"}]\n" 
+ + "Plan follow-up tool calls with concrete arguments.", + }, + ]; + + const response = await llm.chat(messages, tools); + + assert.ok((response.toolCalls?.length ?? 0) >= 1); + assert.ok(response.toolCalls?.some((call) => call.name === "query-logs")); + assert.ok( + response.toolCalls?.some( + (call) => call.name === "describe-metrics" || call.name === "query-metrics", + ), + ); +}); + +test("mock llm returns structured synthesis output with references", async () => { + const llm = new MockLlm(); + const messages: LlmMessage[] = [ + { role: "system", content: buildFinalAnswerPrompt() }, + { + role: "user", + content: + "Question: Investigate payments incident\n" + + "Tool results:\n" + + "- query-incidents => [{\"id\":\"inc-123\",\"service\":\"payments\"}]\n" + + "- query-orchestration-plans => [{\"id\":\"plan-42\",\"name\":\"payments recovery\"}]", + }, + ]; + + const response = await llm.chat(messages, []); + const parsed = JSON.parse(response.content) as { + conclusion: string; + evidence: string[]; + references: { incidents?: string[]; services?: string[]; orchestrationPlans?: string[] }; + actions: Array<{ type: string; id?: string }>; + confidence: number; + }; + + assert.match(parsed.conclusion, /payments|plan-42|incident/i); + assert.ok(parsed.evidence.length >= 1); + assert.deepEqual(parsed.references.incidents, ["inc-123"]); + assert.ok(parsed.references.orchestrationPlans?.includes("plan-42")); + assert.equal(parsed.actions[0]?.type, "orchestration_plan"); + assert.equal(parsed.confidence > 0.6, true); +}); + +test("mock llm synthesis parses structured tool results for richer evidence", async () => { + const llm = new MockLlm(); + const messages: LlmMessage[] = [ + { role: "system", content: buildFinalAnswerPrompt() }, + { + role: "user", + content: + "Question: Investigate payments incident\n" + + "Tool Results:\n" + + 'query-incidents: [{"id":"inc-456","service":"payments","status":"active","severity":"sev1"}]\n' + + '- query-orchestration-plans => 
[{"id":"plan-99","name":"payments recovery"}]', + }, + ]; + + const response = await llm.chat(messages, []); + const parsed = JSON.parse(response.content) as { + conclusion: string; + evidence: string[]; + references: { incidents?: string[]; services?: string[]; orchestrationPlans?: string[] }; + actions: Array<{ type: string; id?: string; name?: string; reason?: string }>; + }; + + // Evidence should reference actual data from tool results + assert.ok(parsed.evidence.some((e) => e.includes("inc-456"))); + assert.ok(parsed.evidence.some((e) => e.includes("status=active") || e.includes("severity=sev1"))); + // Services should be discovered from tool result JSON + assert.ok(parsed.references.services?.includes("payments")); + // Orchestration action should use the plan name from tool results + assert.ok(parsed.actions.length > 0); + assert.equal(parsed.actions[0]?.name, "payments recovery"); + // Conclusion should include richer incident context + assert.match(parsed.conclusion, /inc-456|sev1|active/i); +}); + +test("mock llm refinement extracts service from prior tool results", async () => { + const llm = new MockLlm(); + const tools: Tool[] = [ + { name: "query-incidents" }, + { name: "query-logs" }, + { name: "describe-metrics" }, + { name: "query-metrics" }, + { name: "query-alerts" }, + { name: "query-deployments" }, + { name: "query-teams" }, + ]; + const messages: LlmMessage[] = [ + { role: "system", content: buildRefinementPrompt(buildToolContext(tools), 1) }, + { + role: "user", + content: + "Question: Investigate incident\n" + + "Tool results (count=1):\n" + + 'query-incidents returned [{"id":"inc-789","service":"checkout","status":"active"}]\n' + + "Plan follow-up tool calls with concrete arguments.", + }, + ]; + + const response = await llm.chat(messages, tools); + + // Follow-up should scope to the discovered service "checkout" + const logCall = response.toolCalls?.find((c) => c.name === "query-logs"); + assert.ok(logCall, "Should suggest query-logs 
follow-up"); + assert.deepEqual(logCall?.arguments.scope, { service: "checkout" }); + + // Should add deployment follow-up when incident data is present + assert.ok( + response.toolCalls?.some((c) => c.name === "query-deployments"), + "Should suggest query-deployments follow-up", + ); + + // Should add teams follow-up when service is discovered + assert.ok( + response.toolCalls?.some((c) => c.name === "query-teams"), + "Should suggest query-teams follow-up for discovered service", + ); +}); + +test("mock llm handles status/health queries", async () => { + const llm = new MockLlm(); + const tools: Tool[] = [ + { name: "query-incidents" }, + { name: "query-alerts" }, + { name: "describe-metrics" }, + { name: "query-orchestration-plans" }, + ]; + const messages: LlmMessage[] = [ + { role: "system", content: buildPlannerPrompt(buildToolContext(tools)) }, + { + role: "user", + content: "What is the status of the payments service?", + }, + ]; + + const response = await llm.chat(messages, tools); + + // Status queries should trigger incidents, alerts, and metrics checks + assert.ok(response.toolCalls?.some((c) => c.name === "query-incidents")); + assert.ok(response.toolCalls?.some((c) => c.name === "query-alerts")); + assert.ok(response.toolCalls?.some((c) => c.name === "describe-metrics")); +}); + +test("mock llm handles change detection queries", async () => { + const llm = new MockLlm(); + const tools: Tool[] = [ + { name: "query-incidents" }, + { name: "query-deployments" }, + { name: "query-logs" }, + ]; + const messages: LlmMessage[] = [ + { role: "system", content: buildPlannerPrompt(buildToolContext(tools)) }, + { + role: "user", + content: "What changed in the last hour for the checkout service?", + }, + ]; + + const response = await llm.chat(messages, tools); + + // Change queries should prioritize deployments + assert.ok(response.toolCalls?.some((c) => c.name === "query-deployments")); + assert.ok(response.toolCalls?.some((c) => c.name === "query-incidents")); 
+}); diff --git a/tests/referenceResolver.test.ts b/tests/referenceResolver.test.ts index e9d759b..1213d93 100644 --- a/tests/referenceResolver.test.ts +++ b/tests/referenceResolver.test.ts @@ -11,24 +11,27 @@ test('ReferenceResolver: resolves "that incident" reference', async () => { entities: new Map(), }; + // Create a simple conversation history with the new format const conversationHistory = [ { userMessage: 'show me incidents', assistantResponse: 'Here are the incidents', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-999', title: 'Test incident' } - ] + type: 'incident' as const, + value: 'INC-999', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-incidents' } - ], - timestamp: Date.now() + ] } ]; const resolutions = await resolver.resolveReferences('What caused that incident?', context, conversationHistory); + // The resolver should identify the reference pattern and resolve it using entities assert.ok(resolutions.has('that incident')); assert.equal(resolutions.get('that incident'), 'INC-999'); }); @@ -44,15 +47,16 @@ test('ReferenceResolver: resolves "this service" reference', async () => { { userMessage: 'show me services', assistantResponse: 'Here are the services', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-services', - result: [ - { name: 'payment-api', status: 'healthy' } - ] + type: 'service' as const, + value: 'payment-api', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-services' } - ], - timestamp: Date.now() + ] } ]; @@ -74,15 +78,16 @@ test('ReferenceResolver: resolves "since then" time reference', async () => { { userMessage: 'show me incident timeline', assistantResponse: 'Here is the timeline', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'get-incident-timeline', - result: [ - { at: baseTime, kind: 'incident started', body: 'Incident began' } - ] + type: 'timestamp' as const, + value: baseTime, + 
prominence: 1.0, + extractedAt: Date.now(), + source: 'get-incident-timeline' } - ], - timestamp: Date.now() + ] } ]; @@ -128,28 +133,30 @@ test('ReferenceResolver: returns most recent entity when multiple exist', async { userMessage: 'show me incidents', assistantResponse: 'Here are the incidents', - toolResults: [ + timestamp: now - 1000, + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-100', title: 'Old incident' } - ] + type: 'incident' as const, + value: 'INC-100', + prominence: 1.0, + extractedAt: now - 1000, + source: 'query-incidents' } - ], - timestamp: now - 1000 + ] }, { userMessage: 'show me more incidents', assistantResponse: 'Here are more', - toolResults: [ + timestamp: now, + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-200', title: 'Recent incident' } - ] + type: 'incident' as const, + value: 'INC-200', + prominence: 1.0, + extractedAt: now, + source: 'query-incidents' } - ], - timestamp: now + ] } ]; @@ -182,15 +189,16 @@ test('ReferenceResolver: handles "before that" time reference', async () => { { userMessage: 'show me incident timeline', assistantResponse: 'Here is the timeline', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'get-incident-timeline', - result: [ - { at: baseTime, kind: 'incident started', body: 'Incident began' } - ] + type: 'timestamp' as const, + value: baseTime, + prominence: 1.0, + extractedAt: Date.now(), + source: 'get-incident-timeline' } - ], - timestamp: Date.now() + ] } ]; @@ -217,28 +225,30 @@ test('ReferenceResolver: handles multiple references in one question', async () { userMessage: 'show me incidents', assistantResponse: 'Here they are', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-999', title: 'Critical incident' } - ] + type: 'incident' as const, + value: 'INC-999', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-incidents' } - ], - timestamp: Date.now() + ] }, { 
userMessage: 'and services', assistantResponse: 'Here are services', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-services', - result: [ - { name: 'payment-api', status: 'healthy' } - ] + type: 'service' as const, + value: 'payment-api', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-services' } - ], - timestamp: Date.now() + ] } ]; @@ -261,29 +271,42 @@ test('ReferenceResolver: uses prominence as tiebreaker when timestamps are equal entities: new Map(), }; - // Test that when we have multiple incidents in one result, we get one of them + // Test that when we have multiple incidents in one result, we get the most prominent one const conversationHistory = [ { userMessage: 'show me incidents', assistantResponse: 'Here are the incidents', - toolResults: [ + timestamp: now, + entities: [ { - name: 'query-incidents', - result: [ - { id: 'inc-002', title: 'Minor incident' }, - { id: 'inc-005', title: 'Major incident' }, - { id: 'inc-008', title: 'Medium incident' } - ] + type: 'incident' as const, + value: 'inc-002', + prominence: 0.5, + extractedAt: now, + source: 'query-incidents' + }, + { + type: 'incident' as const, + value: 'inc-005', + prominence: 0.9, // Highest prominence + extractedAt: now, + source: 'query-incidents' + }, + { + type: 'incident' as const, + value: 'inc-008', + prominence: 0.7, + extractedAt: now, + source: 'query-incidents' } - ], - timestamp: now + ] } ]; const resolutions = await resolver.resolveReferences('tell me more about that incident', context, conversationHistory); assert.ok(resolutions.has('that incident')); - // Should pick one of the incidents (handlers return the first one found) + // Should pick the incident with highest prominence const resolved = resolutions.get('that incident'); - assert.ok(resolved === 'inc-002' || resolved === 'inc-005' || resolved === 'inc-008'); + assert.equal(resolved, 'inc-005'); });