diff --git a/README.md b/README.md index ffd67b4..fd1c1d7 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,28 @@ # OpsOrch Copilot +[![Version](https://img.shields.io/github/v/release/OpsOrch/opsorch-copilot)](https://github.com/OpsOrch/opsorch-copilot/releases) +[![License](https://img.shields.io/github/license/OpsOrch/opsorch-copilot)](https://github.com/OpsOrch/opsorch-copilot/blob/main/LICENSE) +[![CI](https://github.com/OpsOrch/opsorch-copilot/workflows/CI/badge.svg)](https://github.com/OpsOrch/opsorch-copilot/actions) +[![Node Version](https://img.shields.io/badge/node-%3E%3D20-brightgreen)](https://nodejs.org) + OpsOrch Copilot is the AI runtime for OpsOrch. It plans tool calls against `opsorch-mcp`, gathers evidence, and returns structured answers for the Console UI and other clients. Copilot never talks to OpsOrch Core directly. It only uses the MCP tools layer. +## Table of Contents + +- [Status](#status) +- [Quick Start](#quick-start) +- [What Copilot Does](#what-copilot-does) +- [Configuration](#configuration) +- [Architecture](#architecture) +- [HTTP API](#http-api) +- [Stack and Boundaries](#stack-and-boundaries) +- [Development](#development) +- [Testing](#testing) +- [Seeding the Database](#seeding-the-database) +- [License](#license) + ## Status - License: Apache-2.0 @@ -13,25 +32,36 @@ Copilot never talks to OpsOrch Core directly. It only uses the MCP tools layer. ## Quick Start -1. Start `opsorch-core` -2. Start `opsorch-mcp` -3. Start Copilot +### Prerequisites + +- Node.js 20+ +- Running `opsorch-core` instance (port 8080) +- Running `opsorch-mcp` instance (port 7070) + +### Installation and Startup ```bash cd opsorch-copilot npm install + +# Start with mock LLM (no API key required) MCP_URL=http://localhost:7070/mcp \ LLM_PROVIDER=mock \ npm run dev ``` -Health check: +The server will start on `http://localhost:6060`. 
+ +### Verify Installation +Health check: ```bash curl http://localhost:6060/health ``` -Chat request: +Expected response: `{"status":"ok"}` + +### Make Your First Request ```bash curl http://localhost:6060/chat \ @@ -39,75 +69,144 @@ curl http://localhost:6060/chat \ -d '{"message":"What incidents are active right now?"}' ``` +The response includes: +- `chatId` – Conversation identifier for follow-up questions +- `name` – Auto-generated conversation name +- `answer` – Structured answer with conclusion, evidence, and references + ## Configuration -Core runtime settings: +### Core Runtime Settings -- `PORT` - HTTP port for the Copilot API. Default: `6060` -- `MCP_URL` - MCP endpoint URL. Default: `http://localhost:7070/mcp` -- `LLM_PROVIDER` - `mock`, `openai`, `anthropic`, or `gemini`. Default: `mock` +| Variable | Default | Description | +|----------|---------|-------------| +| `PORT` | `6060` | HTTP port for the Copilot API | +| `MCP_URL` | `http://localhost:7070/mcp` | MCP endpoint URL | +| `LLM_PROVIDER` | `mock` | LLM provider: `mock`, `openai`, `anthropic`, or `gemini` | -Provider-specific settings: +### LLM Provider Settings -- `OPENAI_API_KEY` with optional `OPENAI_MODEL` and `OPENAI_BASE_URL` -- `ANTHROPIC_API_KEY` with optional `ANTHROPIC_MODEL` and `ANTHROPIC_BASE_URL` -- `GEMINI_API_KEY` with optional `GEMINI_MODEL` +**OpenAI:** +- `OPENAI_API_KEY` (required) +- `OPENAI_MODEL` (optional, default: `gpt-4o`) +- `OPENAI_BASE_URL` (optional, for custom endpoints) + +**Anthropic:** +- `ANTHROPIC_API_KEY` (required) +- `ANTHROPIC_MODEL` (optional, default: `claude-3-5-sonnet-20241022`) +- `ANTHROPIC_BASE_URL` (optional, for custom endpoints) -Conversation storage: +**Google Gemini:** +- `GEMINI_API_KEY` (required) +- `GEMINI_MODEL` (optional, default: `gemini-2.0-flash-exp`) -- `CONVERSATION_STORE_TYPE` - `memory` or `sqlite`. Default: `memory` -- `SQLITE_DB_PATH` - SQLite DB path when using `sqlite`. 
Default: `./data/conversations.db` +### Conversation Storage Settings -## What Copilot should do +| Variable | Default | Description | +|----------|---------|-------------| +| `CONVERSATION_STORE_TYPE` | `memory` | Storage backend: `memory` or `sqlite` | +| `SQLITE_DB_PATH` | `./data/conversations.db` | SQLite database file path (when using `sqlite`) | -- Retrieve recent/impactful incidents, surface their context, and include related PagerDuty alerts, linked Jira tickets, and nearby logs/metrics. -- Explain incident history and changes, e.g., "What was the trigger for the severity escalation?" by inspecting timelines and metadata. -- Find patterns, e.g., "Has this service had similar incidents recently?" by querying incidents filtered by service/time/severity. -- Correlate signals, e.g., "Is the spike in p95 latency correlated with CPU, memory, or traffic?" by querying metrics over the same window and comparing trends. -- Use messaging tools to share findings or timelines when needed. +## What Copilot Does -## Question coverage (examples) +Copilot answers operational questions by orchestrating MCP tool calls and synthesizing evidence: -- Basic understanding: summarize an incident; note changes right before start; infer likely root cause from logs/metrics; correlate with deploys; pull last N minutes of related logs. -- Context & relationships: list dependent services; find similar incidents for a service; relate to earlier incidents; identify severity escalation triggers. -- Causal analysis: match error signatures to past incidents; correlate latency spikes with CPU/memory/traffic; distinguish DB vs network vs code issues; compare against prior checkout failures. -- Metrics: explain CPU spikes and latency anomalies; surface metric anomalies for a service in a window; identify pods/nodes contributing most errors. -- Logs: query 500s for a service over a window; extract dominant/error patterns; list IPs with most failed requests; flag unusual log patterns. 
-- Correlation: align logs and metrics for a service; test hypotheses like memory leaks; find earliest signals of degradation. +- **Incident Analysis** – Retrieve recent/impactful incidents with context including related PagerDuty alerts, linked Jira tickets, and nearby logs/metrics +- **Incident History** – Explain incident changes and timelines, e.g., "What triggered the severity escalation?" +- **Pattern Detection** – Find similar incidents, e.g., "Has this service had similar incidents recently?" +- **Signal Correlation** – Correlate metrics, e.g., "Is the p95 latency spike correlated with CPU, memory, or traffic?" +- **Root Cause Analysis** – Match error signatures to past incidents and identify likely causes +- **Deployment Correlation** – Correlate incidents with recent deployments and code changes +- **Service Dependencies** – Discover service relationships and dependencies +- **Team Context** – Identify on-call teams and escalation paths +- **Messaging Integration** – Share findings via Slack or other messaging tools when needed -## Stack and boundaries +### Question Coverage Examples -- UI: `opsorch-console` -- Copilot runtime: this repo (LLM prompts, reasoning, tool selection loops) -- Tools: `opsorch-mcp` (typed MCP tools around OpsOrch Core) -- Source of truth: `opsorch-core` (incidents, logs, metrics, services, tickets, messaging) +**Basic Understanding:** +- Summarize an incident +- Note changes right before incident start +- Infer likely root cause from logs/metrics +- Correlate with recent deployments +- Pull last N minutes of related logs -## Development notes +**Context & Relationships:** +- List dependent services +- Find similar incidents for a service +- Relate to earlier incidents +- Identify severity escalation triggers -- MCP dev server default: `http://localhost:7070/mcp` -- Copilot communicates only via MCP tools; no direct Core calls. -- See `AGENTS.md` for the layered architecture overview. 
-- See `DESIGN.md` for capability-handler details. +**Causal Analysis:** +- Match error signatures to past incidents +- Correlate latency spikes with CPU/memory/traffic +- Distinguish DB vs network vs code issues +- Compare against prior failures -Core implementation areas: +**Metrics:** +- Explain CPU spikes and latency anomalies +- Surface metric anomalies for a service in a time window +- Identify pods/nodes contributing most errors -- `src/engine/` - planning, execution, follow-ups, references, synthesis -- `src/llms/` - LLM provider adapters -- `src/mcps/` - MCP client implementations -- `src/stores/` - in-memory and SQLite conversation stores -- `src/server.ts` - HTTP API +**Logs:** +- Query 500 errors for a service over a time window +- Extract dominant error patterns +- List IPs with most failed requests +- Flag unusual log patterns + +**Correlation:** +- Align logs and metrics for a service +- Test hypotheses like memory leaks +- Find earliest signals of degradation + +## Stack and Boundaries + +OpsOrch Copilot is part of a layered architecture: + +- **UI Layer** – `opsorch-console` (Next.js web UI) +- **AI Runtime** – `opsorch-copilot` (this repo) – LLM prompts, reasoning, tool orchestration +- **Tools Layer** – `opsorch-mcp` – Typed MCP tools wrapping Core APIs +- **Core Layer** – `opsorch-core` – Source of truth for incidents, logs, metrics, services, tickets, messaging +- **Adapters** – Provider-specific adapters (PagerDuty, Datadog, Jira, Slack, etc.) + +**Key Principle:** Copilot never talks to OpsOrch Core directly. All interactions go through the MCP tools layer, ensuring a clean separation of concerns and consistent tool-based interface. + +## Architecture + +Copilot implements a multi-step agentic reasoning loop that orchestrates LLM planning, tool execution, and answer synthesis: + +1. **Planning** – LLM analyzes the question and plans which MCP tools to call +2. 
**Execution** – Tools are called in parallel with retry logic and result caching +3. **Analysis** – Handlers extract entities, detect anomalies, and suggest follow-ups +4. **Refinement** – If needed, additional tool calls are planned based on results +5. **Synthesis** – Final answer is generated with evidence and structured references + +Key architectural components: + +- `CopilotEngine` – Main orchestration engine (max 3 iterations) +- `Planner` – LLM-based tool call planning with heuristic fallback +- `ToolRunner` – Parallel tool execution with caching and retry strategy +- `EntityExtractor` – Extracts IDs, timestamps, and references from results +- `ReferenceResolver` – Resolves pronouns like "that incident" to specific entities +- `FollowUpEngine` – Suggests intelligent next actions based on results +- `AnswerGenerator` – Synthesizes final answers with evidence +- `ConversationManager` – Manages multi-turn conversation history + +See `DESIGN.md` for detailed architecture documentation and `AGENTS.md` for the layered system overview. 
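The five-step loop above can be sketched as follows. This is a hypothetical illustration, not the real `CopilotEngine` API: `runReasoningLoop`, `plan`, and `execute` are stand-in names.

```typescript
// Hypothetical sketch of the plan → execute → refine loop (max 3 iterations).
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ToolResult = { name: string; result: unknown };

function runReasoningLoop(
  question: string,
  plan: (q: string, results: ToolResult[]) => ToolCall[],
  execute: (calls: ToolCall[]) => ToolResult[],
  maxIterations = 3,
): { results: ToolResult[]; iterations: number } {
  const results: ToolResult[] = [];
  let iterations = 0;
  while (iterations < maxIterations) {
    const calls = plan(question, results); // Planning / Refinement
    if (calls.length === 0) break; // Stop: planner has nothing left to ask for
    results.push(...execute(calls)); // Execution
    iterations++;
  }
  // Synthesis would consume `results` to build the final answer
  return { results, iterations };
}

// One planning round, then the planner declares itself done.
const out = runReasoningLoop(
  "What incidents are active right now?",
  (_q, results) =>
    results.length === 0 ? [{ name: "query-incidents", arguments: {} }] : [],
  (calls) => calls.map((c) => ({ name: c.name, result: [] })),
);
console.log(out.iterations); // 1
```

The stop condition (an empty plan) is what bounds the loop below `maxIterations` on simple questions.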
### Capability-Based Handler Architecture -Copilot uses a capability-based handler system organized around six core operational domains: +Copilot uses a capability-based handler system organized around nine core operational domains: -**Six Core Capabilities:** +**Nine Core Capabilities:** - `incident/` – Incident query and analysis - `alert/` – Alert monitoring and investigation - `log/` – Log search and analysis - `metric/` – Metrics query and correlation - `service/` – Service discovery and dependencies - `ticket/` – Ticket linking and management +- `deployment/` – Deployment tracking and correlation +- `orchestration/` – Workflow orchestration and automation +- `team/` – Team management and on-call schedules **Handler Types (11 total):** Each capability implements specialized handlers from this set: @@ -126,66 +225,162 @@ Each capability implements specialized handlers from this set: | **ServiceDiscovery** | Discovers available services from MCP | | **ServiceMatching** | Performs fuzzy matching of service names in questions | -**Engine Flow:** - -```mermaid -flowchart TD - Q[User Question] --> P[Planner] - P -->|LLM plans tools| TC[Tool Calls] - P -->|Fallback| HF[Heuristic Fallback] - HF --> TC - TC --> V[Validation Registry] - V -->|Valid| TE[Tool Execution] - V -->|Invalid| TC - TE --> EE[Entity Extraction] - EE --> RR[Reference Resolution] - RR --> FU[Follow-up Suggestion] - FU -->|More tools needed| P - FU -->|Done| SY[Synthesis] - SY --> A[Answer with Evidence] -``` +## Development -All handlers are registered in `capabilityRegistry.ts` and invoked by the engine during tool execution. 
+### Project Structure -### HTTP API (console/CLI integration) +``` +src/ +├── engine/ # Core orchestration and reasoning +│ ├── handlers/ # Capability-specific handlers +│ │ ├── incident/ # Incident analysis handlers +│ │ ├── alert/ # Alert monitoring handlers +│ │ ├── log/ # Log search handlers +│ │ ├── metric/ # Metrics analysis handlers +│ │ ├── service/ # Service discovery handlers +│ │ ├── ticket/ # Ticket management handlers +│ │ ├── deployment/ # Deployment tracking handlers +│ │ ├── orchestration/ # Workflow handlers +│ │ ├── team/ # Team management handlers +│ │ └── shared/ # Shared utilities +│ ├── copilotEngine.ts # Main orchestration engine +│ ├── planner.ts # LLM-based tool planning +│ ├── toolRunner.ts # Tool execution with retry logic +│ ├── entityExtractor.ts # Entity extraction from results +│ ├── referenceResolver.ts # Reference resolution +│ ├── followUpEngine.ts # Follow-up suggestion engine +│ └── answerGenerator.ts # Answer synthesis +├── llms/ # LLM provider adapters +├── mcps/ # MCP client implementations +├── stores/ # Conversation storage backends +└── server.ts # HTTP API server +``` -- Start server: `npm start` (env: `PORT` default 6060, `MCP_URL` default `http://localhost:7070/mcp`). -- `POST /chat` – body `{ "message": "", "chatId?": "" }` - - Response: `{ "chatId": "", "answer": { conclusion, evidence?, missing?, references?, chatId? } }` - - `answer.references` drives Console deep links and includes buckets for `incidents[]`, `services[]`, `tickets[]`, `alerts[]`, plus structured `metrics[]`/`logs[]` entries (each with expression + window) - - If `chatId` is not provided, the response includes one so callers can persist and reuse it. 
-- `GET /health` – liveness check: `{ "status": "ok" }` -- `GET /chats` – list saved conversations with previews and pagination -- `GET /chats/search?query=...` – search saved conversations -- `GET /chats/:id` – retrieve a single saved conversation +### Running Locally + +Start the full OpsOrch stack: + +1. **Start Core** (port 8080): + ```bash + cd ../opsorch-core && go run ./cmd/opsorch + ``` + +2. **Start MCP** (port 7070): + ```bash + cd ../opsorch-mcp && npm run dev + ``` + +3. **Start Copilot** (port 6060): + ```bash + cd opsorch-copilot + npm install + MCP_URL=http://localhost:7070/mcp \ + LLM_PROVIDER=mock \ + npm run dev + ``` + +4. **Start Console** (port 3000): + ```bash + cd ../opsorch-console && npm run dev + ``` + +### Available Scripts + +- `npm run dev` – Start development server with hot reload +- `npm start` – Start production server +- `npm test` – Run all tests +- `npm run type-check` – TypeScript type checking +- `npm run lint` – Lint code +- `npm run lint:fix` – Fix linting issues +- `npm run build` – Build for production +- `npm run seed` – Seed database with sample conversations + +### Environment Variables + +See the Configuration section above for all available environment variables. + +### HTTP API + +The Copilot server exposes a REST API for chat interactions and conversation management. + +**Endpoints:** + +- `POST /chat` – Submit a question and get an AI-generated answer + - Request body: `{ "message": "", "chatId?": "" }` + - Response: `{ "chatId": "", "name": "", "answer": { ... 
} }` + - The `answer` object includes: + - `conclusion` – Short summary answer + - `evidence` – Supporting data and findings + - `references` – Structured references for deep linking: + - `incidents[]` – Incident IDs + - `services[]` – Service names + - `tickets[]` – Ticket IDs + - `alerts[]` – Alert IDs + - `metrics[]` – Metric queries with `{expression, start, end, step}` + - `logs[]` – Log queries with `{query, start, end, service}` + - `missing` – Notes about unavailable data + - If `chatId` is omitted, a new conversation is created and its ID is returned + +- `GET /health` – Health check endpoint + - Response: `{ "status": "ok" }` + +- `GET /chats` – List all saved conversations with pagination + - Query parameters: + - `limit` (optional) – Maximum number of results to return + - `offset` (optional) – Number of results to skip (default: 0) + - Response: `{ "conversations": [...], "pagination": { total, offset, limit, hasMore } }` + - Each conversation includes: `chatId`, `name`, `createdAt`, `lastAccessedAt`, `turnCount`, `preview` + - Results are sorted by most recent access first + +- `GET /chats/search` – Search conversations by content + - Query parameters: + - `query` (required) – Search query string + - `limit` (optional) – Maximum number of results (default: 50) + - Response: `{ "query": "...", "limit": 50, "totalResults": N, "results": [...] 
}` + - Searches across conversation names, user messages, and assistant responses + +- `GET /chats/:id` – Retrieve a specific conversation by ID + - Response: `{ "conversation": { chatId, name, turns, createdAt, lastAccessedAt } }` + - Returns 404 if conversation not found or expired ### Conversation Storage Copilot supports two storage backends for conversation persistence: #### In-Memory Storage (Default) -- Conversations are stored in memory with LRU eviction + +Best for development and testing: +- Conversations stored in memory with LRU eviction - Data is lost on server restart - No configuration required +- Fast and lightweight + +```bash +# No configuration needed - this is the default +npm run dev +``` #### SQLite Storage + +Best for production and demos: - Conversations persist across server restarts - Stored in a local SQLite database file -- Maintains the same LRU eviction behavior as in-memory storage +- Same LRU eviction behavior as in-memory storage +- Supports full-text search across conversations **Configuration:** -Set the following environment variables to enable SQLite storage: - ```bash # Enable SQLite storage CONVERSATION_STORE_TYPE=sqlite # Optional: specify database file path (default: ./data/conversations.db) SQLITE_DB_PATH=/path/to/conversations.db + +npm run dev ``` -Docker example: +**Docker Example:** ```yaml services: @@ -200,7 +395,7 @@ volumes: copilot-data: ``` -Backup and recovery: +**Backup and Recovery:** For SQLite storage, regular backups of the database file are recommended: @@ -212,35 +407,83 @@ cp /path/to/conversations.db /path/to/backup/conversations-$(date +%Y%m%d).db cp /path/to/backup/conversations-20250122.db /path/to/conversations.db ``` +**Graceful Shutdown:** + +The server handles `SIGTERM` and `SIGINT` signals gracefully, ensuring the SQLite database is properly closed before exit. 
+ ## Testing +### Running Tests + ```bash +# Run all tests npm test + +# Type checking npm run type-check + +# Linting +npm run lint ``` -Coverage includes: +### Test Coverage + +Comprehensive test suites cover: + +**Engine & Orchestration:** +- CopilotEngine – Planning loop, iteration limits, multi-turn conversations +- Planner – LLM planning, JSON fallback, heuristic fallback +- ToolRunner – Tool execution, result normalization, error handling +- ParallelToolRunner – Concurrent execution, ordering, deduplication +- ResultCache – Cache hits/misses, invalidation +- EntityExtractor – Entity extraction from various tool result structures +- ReferenceResolver – Reference resolution with conversation history +- FollowUpEngine – Follow-up suggestion generation and deduplication +- ExecutionTracer – Trace creation, telemetry, and diagnostics + +**Capability Handlers:** +- Intent Classification – Pattern matching, service extraction, tool injection +- Entity Extraction – ID extraction, entity type detection, nested structure handling +- Scope Inference – Scope detection from context, intelligent parameterization +- Reference Handlers – Pronoun resolution, entity linking, temporal references +- Validation – Tool call validation and argument normalization +- Follow-up – Context-aware follow-up suggestions + +**Conversation Management:** +- ConversationManager – Turn storage, retrieval, LRU eviction +- ConversationStore – In-memory and SQLite persistence +- ConversationSearch – Full-text search, filtering, result ranking + +**Analysis & Synthesis:** +- CorrelationDetector – Correlation detection, root cause identification +- AnomalyDetector – Anomaly detection, trend analysis +- TimeWindowExpander – Window expansion, capping calculations +- AnswerFormatter – Evidence aggregation, reference formatting + +**Utilities:** +- ChatNamer – Conversation name generation and synthesis +- ServiceDiscovery – Service lookup and caching +- TimestampUtils – Timestamp parsing and formatting 
+- MetricUtils – Metric parsing and aggregation +- ToolsSchema – Tool schema validation + +### Testing Patterns -- planner and tool execution loops -- conversation history and storage backends -- MCP integration layers -- capability handlers and follow-ups -- HTTP API behavior +- **MockMcp** – Simulates MCP tool responses without network calls +- **Temporary SQLite databases** – Each SQLite test uses a temporary database file cleaned up after test runs +- **Conversation fixtures** – Pre-built conversation data for testing multi-turn flows +- **Tool result mocking** – Realistic tool responses for testing handlers and synthesis ### Integration Testing + Start the full stack for end-to-end testing: + 1. Start Core: `cd ../opsorch-core && go run ./cmd/opsorch` 2. Start MCP: `cd ../opsorch-mcp && npm run dev` 3. Start Copilot: `npm run dev` 4. Start Console: `cd ../opsorch-console && npm run dev` -Test via Console UI or direct API calls to `http://localhost:6060/chat` - -### Testing Patterns -- **MockMcp**: Simulates MCP tool responses without network calls -- **Temporary SQLite databases**: Each SQLite test uses a temporary database file cleaned up after test runs -- **Conversation fixtures**: Pre-built conversation data for testing multi-turn flows -- **Tool result mocking**: Realistic tool responses for testing handlers and synthesis +Test via Console UI at `http://localhost:3000` or direct API calls to `http://localhost:6060/chat` ## Seeding the Database @@ -250,9 +493,9 @@ To populate the database with realistic sample conversations for testing or demo npm run seed ``` -This will: -- Clear any existing conversations in the database -- Generate 30 realistic operational conversations covering various scenarios: +This command: +- Clears any existing conversations in the database +- Generates 30 realistic operational conversations covering various scenarios: - Incident investigations (high error rates, service outages) - Service health checks and monitoring - 
Performance issues (latency spikes, memory leaks) @@ -260,11 +503,13 @@ This will: - Deployment verifications - SSL certificate management - Rate limiting and cache issues -- Populate conversations with realistic tool results, timestamps, and entities -- Distribute conversations across the last 30 days +- Populates conversations with realistic tool results, timestamps, and entities +- Distributes conversations across the last 30 days The seed script uses the database path from `SQLITE_DB_PATH` environment variable or defaults to `./data/conversations.db`. +**Note:** Seeding requires SQLite storage. Set `CONVERSATION_STORE_TYPE=sqlite` before running the seed command. + ## License Apache-2.0. See [LICENSE](LICENSE). diff --git a/src/engine/answerGenerator.ts b/src/engine/answerGenerator.ts index 977b1ce..705edeb 100644 --- a/src/engine/answerGenerator.ts +++ b/src/engine/answerGenerator.ts @@ -163,6 +163,12 @@ function extractReferencesFromResults(results: ToolResult[]): CopilotReferences extractLogReferences(result, refs.logs); } + // Extract team IDs + if (result.name.includes('team')) { + refs.teams = refs.teams || []; + extractTeamIds(result.result, refs.teams); + } + // Extract orchestration plan IDs if (result.name.includes('orchestration')) { refs.orchestrationPlans = refs.orchestrationPlans || []; @@ -176,6 +182,7 @@ function extractReferencesFromResults(results: ToolResult[]): CopilotReferences if (refs.alerts) refs.alerts = [...new Set(refs.alerts)]; if (refs.deployments) refs.deployments = [...new Set(refs.deployments)]; if (refs.tickets) refs.tickets = [...new Set(refs.tickets)]; + if (refs.teams) refs.teams = [...new Set(refs.teams)]; // Metrics and logs are complex objects, dedupe by JSON string if (refs.metrics) refs.metrics = dedupeByJson(refs.metrics); if (refs.logs) refs.logs = dedupeByJson(refs.logs); @@ -326,6 +333,31 @@ function extractTicketIds(data: unknown, ids: string[]): void { } } +function extractTeamIds(data: unknown, ids: string[]): 
void {
+  if (Array.isArray(data)) {
+    for (const item of data) {
+      if (typeof item === 'object' && item !== null) {
+        const obj = item as Record<string, unknown>;
+        if ('id' in obj) ids.push(String(obj.id));
+        else if ('name' in obj) ids.push(String(obj.name));
+      }
+    }
+  } else if (typeof data === 'object' && data !== null) {
+    const obj = data as Record<string, unknown>;
+    if ('id' in obj) ids.push(String(obj.id));
+    else if ('name' in obj) ids.push(String(obj.name));
+    if (Array.isArray(obj.teams)) {
+      for (const team of obj.teams) {
+        if (typeof team === 'object' && team !== null) {
+          const t = team as Record<string, unknown>;
+          if ('id' in t) ids.push(String(t.id));
+          else if ('name' in t) ids.push(String(t.name));
+        }
+      }
+    }
+  }
+}
+
 /**
  * Extract metric references from query-metrics tool results.
  * Uses the tool arguments to build a MetricReference with deep-linking metadata.
diff --git a/src/engine/capabilityRegistry.ts b/src/engine/capabilityRegistry.ts
index d752d97..88be345 100644
--- a/src/engine/capabilityRegistry.ts
+++ b/src/engine/capabilityRegistry.ts
@@ -168,6 +168,7 @@ function createFollowUpRegistry(): FollowUpRegistry {
   registry.register("query-logs", logFollowUpHandler);
 
+  registry.register("describe-metrics", metricFollowUpHandler);
   registry.register("query-metrics", metricFollowUpHandler);
   registry.register("query-tickets", ticketFollowUpHandler);
@@ -183,7 +184,7 @@ function createFollowUpRegistry(): FollowUpRegistry {
   registry.register("get-orchestration-plan", orchestrationFollowUpHandler);
 
   console.log(
-    "[FollowUpRegistry] Registered 17 follow-up handlers for 9 capabilities",
+    "[FollowUpRegistry] Registered 18 follow-up handlers for 9 capabilities",
   );
 
   return registry;
diff --git a/src/engine/copilotEngine.ts b/src/engine/copilotEngine.ts
index 39def83..53cdb0e 100644
--- a/src/engine/copilotEngine.ts
+++ b/src/engine/copilotEngine.ts
@@ -324,6 +324,7 @@ export class CopilotEngine {
     const allExtractedEntities: Entity[] = [];
     let iteration = 0;
     let isFirstIteration = true;
+    let
hadPlannedCalls = false; // Step 4: Reasoning Loop while (iteration < this.maxIterations) { @@ -421,6 +422,11 @@ export class CopilotEngine { // Limit calls plannedCalls = this.limitToolCalls(plannedCalls); + // Track if any calls were ever planned (before toolRunner filtering) + if (plannedCalls.length > 0) { + hadPlannedCalls = true; + } + // B. Check Stop Condition if (plannedCalls.length === 0) { console.log( @@ -533,6 +539,14 @@ export class CopilotEngine { this.config.llm, ); + // If calls were planned but no results were collected, tool calls were likely + // skipped due to unresolved placeholder arguments (e.g. {{incidentId}}) + if (hadPlannedCalls && allResults.length === 0) { + answer.missing = answer.missing + ? [...answer.missing, 'tool outputs'] + : ['tool outputs']; + } + // Step 6: Create TurnExecutionTrace from ExecutionTrace const turnTrace: TurnExecutionTrace = { traceId: trace.traceId, diff --git a/src/engine/followUpEngine.ts b/src/engine/followUpEngine.ts index 41b1aa5..ad72a7c 100644 --- a/src/engine/followUpEngine.ts +++ b/src/engine/followUpEngine.ts @@ -29,6 +29,7 @@ export class FollowUpEngine { chatId, conversationHistory, userQuestion, + toolResults, ); try { @@ -66,12 +67,13 @@ export class FollowUpEngine { chatId: string, conversationHistory: ConversationTurn[], userQuestion: string, + currentResults: ToolResult[], ): HandlerContext { return { chatId, turnNumber: conversationHistory.length, conversationHistory, - toolResults: [result], + toolResults: currentResults, userQuestion, }; } diff --git a/src/engine/handlers/incident/scopeHandler.ts b/src/engine/handlers/incident/scopeHandler.ts index 228d08a..3a0cee9 100644 --- a/src/engine/handlers/incident/scopeHandler.ts +++ b/src/engine/handlers/incident/scopeHandler.ts @@ -156,6 +156,26 @@ export const incidentScopeInferenceHandler: ScopeHandler = async ( // Note: We no longer look at conversation history toolResults since they're not stored. 
// Scope from previous turns should be inferred from entities or the current turn's results. + if (!hasScope || !(scope.service && scope.environment && scope.team)) { + for (let i = context.conversationHistory.length - 1; i >= 0; i--) { + const turn = context.conversationHistory[i]; + if (turn.entities) { + for (const entity of turn.entities) { + if (entity.type === "service" && !scope.service) { + scope.service = entity.value; + hasScope = true; + } + if (entity.type === "team" && !scope.team) { + scope.team = entity.value; + hasScope = true; + } + } + } + if (scope.service && scope.environment && scope.team) { + break; + } + } + } return hasScope ? scope : null; }; diff --git a/src/engine/handlers/metric/followUpHandler.ts b/src/engine/handlers/metric/followUpHandler.ts index 9d419ae..3078ccd 100644 --- a/src/engine/handlers/metric/followUpHandler.ts +++ b/src/engine/handlers/metric/followUpHandler.ts @@ -9,7 +9,7 @@ */ import type { FollowUpHandler } from "../handlers.js"; -import type { ToolCall, JsonObject } from "../../../types.js"; +import type { ToolCall, ToolResult, JsonObject } from "../../../types.js"; import { generateSearchExpression } from "../logQueryParser.js"; import { HandlerUtils } from "../utils.js"; @@ -29,12 +29,170 @@ const LATENCY_METRIC_PATTERNS = [ "timeout", ]; +function extractMetricNames(result: ToolResult["result"]): string[] { + if (Array.isArray(result)) { + return result + .map((entry) => { + if (typeof entry === "string") return entry; + if (entry && typeof entry === "object" && "name" in entry) { + const name = (entry as JsonObject).name; + return typeof name === "string" ? 
name : undefined;
+        }
+        return undefined;
+      })
+      .filter((name): name is string => typeof name === "string" && name.length > 0);
+  }
+
+  if (result && typeof result === "object") {
+    const metrics = (result as JsonObject).metrics;
+    if (Array.isArray(metrics)) {
+      return extractMetricNames(metrics);
+    }
+  }
+
+  return [];
+}
+
+function shouldQueryDiscoveredMetrics(question: string): boolean {
+  const lower = question.toLowerCase();
+  const discoveryOnlyPatterns = [
+    "available metrics",
+    "list metrics",
+    "what metrics",
+    "which metrics",
+  ];
+  if (discoveryOnlyPatterns.some((pattern) => lower.includes(pattern))) {
+    return false;
+  }
+
+  const investigationTerms = [
+    "metric",
+    "metrics",
+    "cpu",
+    "memory",
+    "latency",
+    "p95",
+    "p99",
+    "throughput",
+    "request",
+    "error",
+  ];
+  const actionTerms = [
+    "check",
+    "show",
+    "inspect",
+    "analy",
+    "graph",
+    "trend",
+    "root cause",
+    "why",
+    "high",
+  ];
+
+  return (
+    investigationTerms.some((term) => lower.includes(term)) &&
+    actionTerms.some((term) => lower.includes(term))
+  );
+}
+
+function selectMetricName(question: string, metricNames: string[]): string | undefined {
+  const lower = question.toLowerCase();
+  const preferredTokens = [
+    "cpu",
+    "memory",
+    "latency",
+    "p99",
+    "p95",
+    "throughput",
+    "request",
+    "error",
+  ];
+
+  for (const token of preferredTokens) {
+    if (!lower.includes(token)) continue;
+    const match = metricNames.find((name) => name.toLowerCase().includes(token));
+    if (match) return match;
+  }
+
+  return metricNames[0];
+}
+
+function getIncidentTimeWindow(context: Parameters<FollowUpHandler>[0]): {
+  start: string;
+  end: string;
+} | null {
+  for (const result of context.toolResults) {
+    if (result.name !== "query-incidents" && result.name !== "get-incident") {
+      continue;
+    }
+
+    const incident = Array.isArray(result.result)
+      ?
result.result[0]
+ : result.result;
+ if (!incident || typeof incident !== "object") {
+ continue;
+ }
+
+ const incidentObject = incident as JsonObject;
+ const startValue = incidentObject.startTime ?? incidentObject.createdAt;
+ const endValue = incidentObject.endTime ?? incidentObject.updatedAt;
+ if (typeof startValue !== "string" && typeof endValue !== "string") {
+ continue;
+ }
+
+ const expanded = HandlerUtils.expandTimeWindow(
+ typeof startValue === "string" ? startValue : undefined,
+ typeof endValue === "string" ? endValue : undefined,
+ 15,
+ );
+ return {
+ start: expanded.start.toISOString(),
+ end: expanded.end.toISOString(),
+ };
+ }
+
+ return null;
+}
+
 export const metricFollowUpHandler: FollowUpHandler = async (
 context,
 toolResult,
 ): Promise<ToolCall[]> => {
 const suggestions: ToolCall[] = [];
 
+ if (toolResult.name === "describe-metrics") {
+ const metricNames = extractMetricNames(toolResult.result);
+ const scope = toolResult.arguments?.scope as JsonObject | undefined;
+ const service = scope?.service;
+
+ if (
+ typeof service === "string" &&
+ metricNames.length > 0 &&
+ shouldQueryDiscoveredMetrics(context.userQuestion) &&
+ !HandlerUtils.isDuplicateToolCall(context, "query-metrics", service)
+ ) {
+ const metricName = selectMetricName(context.userQuestion, metricNames);
+ if (metricName) {
+ const incidentWindow = getIncidentTimeWindow(context);
+ const end = incidentWindow?.end ?? new Date().toISOString();
+ const start = incidentWindow?.start ??
new Date(Date.now() - 60 * 60 * 1000).toISOString();
+
+ suggestions.push({
+ name: "query-metrics",
+ arguments: {
+ scope: { service },
+ expression: { metricName },
+ step: 60,
+ start,
+ end,
+ },
+ });
+ }
+ }
+
+ return suggestions;
+ }
+
 if (!toolResult.result || typeof toolResult.result !== "object") {
 return suggestions;
 }
@@ -155,4 +313,3 @@ export const metricFollowUpHandler: FollowUpHandler = async (
 
 return suggestions;
 };
-
diff --git a/src/engine/handlers/metric/queryBuilder.ts b/src/engine/handlers/metric/queryBuilder.ts
index 663c8f0..a88bc89 100644
--- a/src/engine/handlers/metric/queryBuilder.ts
+++ b/src/engine/handlers/metric/queryBuilder.ts
@@ -20,7 +20,7 @@ import { QueryBuilderHandler } from "../handlers.js";
 import { JsonObject } from "../../../types.js";
 
 export const metricQueryBuilder: QueryBuilderHandler = async (
- _context,
+ context,
 _toolName,
 naturalLanguage,
 ): Promise<JsonObject> => {
@@ -42,7 +42,30 @@ export const metricQueryBuilder: QueryBuilderHandler = async (
 } else if (lower.includes("request") || lower.includes("throughput")) {
 expression.metricName = "requests";
 }
- // Note: We don't guess names like "error_rate" that may not exist
+
+ // If no hint from natural language, try to use a discovered metric from describe-metrics results
+ if (!expression.metricName) {
+ for (const result of context.toolResults) {
+ if (result.name === "describe-metrics") {
+ const data = result.result;
+ const list = Array.isArray(data) ? data
+ : (data && typeof data === "object" && Array.isArray((data as Record<string, unknown>).metrics))
+ ? (data as Record<string, unknown>).metrics as unknown[]
+ : null;
+ if (list && list.length > 0) {
+ const first = list[0];
+ const name = typeof first === "string" ? first
+ : (first && typeof first === "object" && "name" in (first as object))
+ ?
String((first as Record<string, unknown>).name)
+ : undefined;
+ if (name) {
+ expression.metricName = name;
+ break;
+ }
+ }
+ }
+ }
+ }
 
 // MCP schema: start/end must be ISO 8601 datetime
 const now = new Date();
diff --git a/src/engine/handlers/metric/validationHandler.ts b/src/engine/handlers/metric/validationHandler.ts
index 70ef3c9..43a2990 100644
--- a/src/engine/handlers/metric/validationHandler.ts
+++ b/src/engine/handlers/metric/validationHandler.ts
@@ -87,68 +87,51 @@ export const metricValidationHandler: ValidationHandler = async (
 
 if (toolName === "query-metrics") {
 // Check if describe-metrics was called first for this scope
- const scope = toolArgs.scope as JsonObject | undefined;
- const service = scope?.service as string | undefined;
+ const scopeArg = toolArgs.scope as { service?: string } | undefined;
+ const service = scopeArg?.service;
+ const discoveredMetrics = getDiscoveredMetrics(context, service);
 
- const discovered = getDiscoveredMetrics(context, service);
-
- if (discovered === null) {
- // Reject the call - describe-metrics must be called first
- errors.push({
- field: "expression.metricName",
- message: `describe-metrics must be called first to discover available metrics${service ? ` for service '${service}'` : ''}`,
- code: "PREREQUISITE_NOT_MET",
- });
- console.log(`[MetricValidation] Rejecting query-metrics: describe-metrics not called for scope ${service || 'global'}`);
+ if (discoveredMetrics === null) {
 return {
 valid: false,
- errors,
+ errors: [
+ {
+ field: "prerequisite",
+ message: `describe-metrics must be called before query-metrics for scope '${service || "global"}'`,
+ code: "PREREQUISITE_NOT_MET",
+ },
+ ],
 replacementCall: {
 name: "describe-metrics",
- arguments: { scope: service ? { service } : null },
+ arguments: service ?
{ scope: { service } } : { scope: null }, }, }; } - // Strict validation if we have the list - if (Array.isArray(discovered)) { - const expr = toolArgs.expression as JsonObject; - const metricName = expr?.metricName as string; - if (metricName && !discovered.includes(metricName)) { - errors.push({ - field: "expression.metricName", - message: `Metric '${metricName}' not found in discovered metrics. Available: ${discovered.slice(0, 10).join(", ")}${discovered.length > 10 ? "..." : ""}`, - code: "INVALID_METRIC_NAME", - }); - console.log(`[MetricValidation] Rejecting query-metrics: '${metricName}' not in usage list.`); - - // Check if describe-metrics was just called (fresh) to avoid infinite loops - const isFresh = context.toolResults.some((result) => { - if (result.name !== "describe-metrics") return false; - const resultScope = result.arguments?.scope as JsonObject | undefined; - const resultService = resultScope?.service as string | undefined; - // Match if both are undefined/null OR both have the same service - return (service === undefined && resultService === undefined) || - (service !== undefined && resultService === service); - }); - - if (isFresh) { - console.log(`[MetricValidation] Not suggesting replacement because describe-metrics was already called in this turn.`); - return { - valid: false, - errors, - // Do NOT provide replacementCall -> drops the call, forces LLM (or fallback) to handle error - }; - } - - // Re-suggest describe-metrics to refresh the list/context + // If we have a specific list of discovered metrics, validate or auto-populate the metric name + if (Array.isArray(discoveredMetrics) && discoveredMetrics.length > 0) { + const expr = toolArgs.expression as { metricName?: string } | undefined; + if (expr?.metricName && !discoveredMetrics.includes(expr.metricName)) { return { valid: false, - errors, - replacementCall: { - name: "describe-metrics", - arguments: { scope: service ? 
{ service } : null }, - }, + errors: [ + { + field: "expression.metricName", + message: `Metric '${expr.metricName}' was not found in describe-metrics results. Available: ${discoveredMetrics.slice(0, 5).join(", ")}`, + code: "INVALID_METRIC_NAME", + }, + ], + // Do NOT suggest describe-metrics again if it was already called in this turn + replacementCall: undefined, + }; + } + // Auto-populate expression.metricName from discovered metrics if missing + if (!expr?.metricName) { + normalizedArgs.expression = { + ...(typeof normalizedArgs.expression === "object" && normalizedArgs.expression !== null + ? normalizedArgs.expression as JsonObject + : {}), + metricName: discoveredMetrics[0], }; } } diff --git a/src/engine/planRefiner.ts b/src/engine/planRefiner.ts index 3ce02cc..cf3955f 100644 --- a/src/engine/planRefiner.ts +++ b/src/engine/planRefiner.ts @@ -71,7 +71,9 @@ export class PlanRefiner { ); // Also run basic schema validation as a fallback/safety check - const schemaValidation = validateToolCall(call, tool); + // Use normalizedArgs if available (validation handler may have fixed the args) + const argsForSchemaValidation = validation.normalizedArgs ?? call.arguments; + const schemaValidation = validateToolCall({ ...call, arguments: argsForSchemaValidation }, tool); if (validation.valid && schemaValidation.valid) { // Use normalized (fixed) arguments if available @@ -94,7 +96,8 @@ export class PlanRefiner { } } - // Return replacement calls first, then valid calls + // If any replacements were generated, return replacements + other valid calls. + // Only the invalid call is replaced; other valid calls (e.g. query-logs) still run. 
const validCalls = validatedCalls.filter((v) => v.valid).map((v) => v.call); return [...replacementCalls, ...validCalls]; } diff --git a/src/engine/toolsSchema.ts b/src/engine/toolsSchema.ts index 1b3b0b2..0710d89 100644 --- a/src/engine/toolsSchema.ts +++ b/src/engine/toolsSchema.ts @@ -86,21 +86,24 @@ export function validateToolCall( const propSchema = properties[key] as JsonObject | undefined; if (!propSchema) continue; // Unknown property, skip - const expectedType = propSchema.type as string | undefined; + const rawExpectedType = propSchema.type as string | string[] | undefined; + const expectedTypes = Array.isArray(rawExpectedType) ? rawExpectedType : (rawExpectedType ? [rawExpectedType] : []); const actualType = Array.isArray(value) ? "array" : typeof value; // Special handling for integer: JavaScript typeof returns 'number' for all numbers - const typesMatch = - expectedType === actualType || - (expectedType === "integer" && actualType === "number"); + const typesMatch = expectedTypes.length === 0 || expectedTypes.some(t => + t === actualType || (t === "integer" && actualType === "number") + ); - if (expectedType && !typesMatch) { + if (!typesMatch) { errors.push( - `Field '${key}' has type ${actualType}, expected ${expectedType}`, + `Field '${key}' has type ${actualType}, expected ${expectedTypes.join(",")}`, ); continue; // Skip further validation if type is wrong } + const expectedType = expectedTypes.find(t => t !== "null"); + // Timestamp validation for common time fields if ( typeof value === "string" && @@ -195,8 +198,8 @@ export function validateToolCall( const rawValue = rawArgs[key]; const rawObj = typeof rawValue === "object" && - rawValue !== null && - !Array.isArray(rawValue) + rawValue !== null && + !Array.isArray(rawValue) ? 
(rawValue as JsonObject) : undefined; const nestedResult = validateObject(value as JsonObject, propSchema, rawObj); @@ -257,12 +260,17 @@ function validateObject( const propSchema = properties[key] as JsonObject | undefined; if (!propSchema) continue; - const expectedType = propSchema.type as string | undefined; + const rawExpectedType = propSchema.type as string | string[] | undefined; + const expectedTypes = Array.isArray(rawExpectedType) ? rawExpectedType : (rawExpectedType ? [rawExpectedType] : []); const actualType = Array.isArray(value) ? "array" : typeof value; - if (expectedType && expectedType !== actualType) { + const typesMatch = expectedTypes.length === 0 || expectedTypes.some(t => + t === actualType || (t === "integer" && actualType === "number") + ); + + if (!typesMatch) { errors.push( - `Field '${key}' has type ${actualType}, expected ${expectedType}`, + `Field '${key}' has type ${actualType}, expected ${expectedTypes.join(",")}`, ); } } diff --git a/src/llms/mock.ts b/src/llms/mock.ts index 0f73402..83dac14 100644 --- a/src/llms/mock.ts +++ b/src/llms/mock.ts @@ -1,91 +1,942 @@ - import { + JsonObject, LlmClient, LlmMessage, LlmResponse, Tool, ToolCall, } from "../types.js"; +import { HandlerUtils } from "../engine/handlers/utils.js"; + +type MockPhase = + | "planner" + | "json-planner" + | "refinement" + | "json-refinement" + | "synthesis"; + +type TimeWindow = { + start: string; + end: string; + step: number; +}; + +const SERVICE_STOP_WORDS = new Set([ + "show", + "find", + "list", + "check", + "investigate", + "query", + "look", + "what", + "why", + "where", + "when", + "recent", + "current", + "today", + "last", + "hour", + "hours", + "minute", + "minutes", + "for", + "with", + "from", + "about", + "into", + "during", + "around", + "errors", + "error", + "latency", + "cpu", + "memory", + "traffic", + "metric", + "metrics", + "logs", + "alerts", + "incidents", + "incident", + "service", + "services", + "ticket", + "tickets", + "deployment", + 
"deployments", + "team", + "teams", + "runbook", + "runbooks", + "question", + "plan", + "follow", + "concrete", + "arguments", + "returned", + "results", + "count", + "data", + "tool", + "calls", +]); -// Mock LLM that behaves like a real planner: inspects the user message and available tools, -// emits a structured plan (toolCalls) and stable-but-random IDs for conversation/response. export class MockLlm implements LlmClient { async chat(messages: LlmMessage[], tools: Tool[]): Promise { - // If no tools are supplied, we are in synthesis mode; return a structured answer instead of a plan. - if (!tools.length) { - const lastUser = messages.filter((m) => m.role === "user").pop(); - const summary = lastUser?.content?.includes("Tool results:") - ? "Synthesized answer from tool outputs." - : "Synthesized answer."; - return { - content: JSON.stringify({ - conclusion: summary, - evidence: ["mock evidence"], - confidence: 0.9, + const phase = detectPhase(messages, tools); + + switch (phase) { + case "planner": + case "refinement": + return planWithTools(messages, tools, phase === "refinement"); + case "json-planner": + case "json-refinement": + return planAsJson(messages); + case "synthesis": + default: + return synthesize(messages); + } + } +} + +function detectPhase(messages: LlmMessage[], tools: Tool[]): MockPhase { + const systemText = messages + .filter((message) => message.role === "system") + .map((message) => message.content) + .join("\n"); + + if (tools.length > 0) { + if (systemText.includes("(Refinement)")) return "refinement"; + return "planner"; + } + + if (systemText.includes("JSON Planning Mode")) return "json-planner"; + if (systemText.includes("JSON Refinement Mode")) return "json-refinement"; + return "synthesis"; +} + +function planWithTools( + messages: LlmMessage[], + tools: Tool[], + isRefinement: boolean, +): LlmResponse { + const availableTools = tools.map((tool) => tool.name); + const toolSet = new Set(availableTools); + const question = 
extractQuestion(messages); + const lowerQuestion = question.toLowerCase(); + const lastUser = getLastMessage(messages, "user")?.content ?? ""; + const window = inferTimeWindow(`${question}\n${lastUser}`); + const service = inferService(question) ?? inferService(lastUser); + const incidentId = inferIdentifier(question, /\binc-[a-z0-9-]+\b/i); + const ticketId = inferIdentifier(question, /\b(?:ticket|tkt)-[a-z0-9-]+\b/i); + const usedTools = collectUsedTools(messages); + const calls: ToolCall[] = []; + + const addCall = (name: string, args: JsonObject): void => { + if (!toolSet.has(name) || calls.some((call) => call.name === name)) return; + if (isRefinement && usedTools.has(name)) return; + calls.push({ name, arguments: args }); + }; + + const wantsIncidents = + /\b(incident|incidents|outage|outages|degraded|impact|impacts|sev\d|root cause)\b/i.test(question); + const wantsLogs = + /\b(log|logs|trace|traces|error|errors|500|timeout|timeouts|exception|exceptions)\b/i.test(question); + const wantsMetrics = + /\b(metric|latency|cpu|memory|traffic|throughput|rps|error rate)\b/i.test( + question, + ); + const wantsAlerts = + /\b(alert|alerts|page|pages|pagerduty|detector|detectors)\b/i.test(question); + const wantsServices = /\b(service|services)\b/i.test(question); + const wantsTickets = /\b(ticket|jira)\b/i.test(question); + const wantsDeployments = + /\b(deploy|deployment|release|rollout)\b/i.test(question); + const wantsTeams = /\b(team|owner|on-call|oncall|who owns|who is)\b/i.test(question); + const wantsRunbooks = + /\b(runbook|playbook|orchestration)\b/i.test(question) || + wantsIncidents || + ((wantsLogs || wantsMetrics) && service !== undefined); + const wantsStatus = + /\b(status|health|overview|how is|what.*state)\b/i.test(question) && + !wantsIncidents && !wantsLogs && !wantsMetrics; + const wantsChanges = + /\b(what changed|changes|diff|compare|regression)\b/i.test(question); + const isBroadInvestigation = + wantsIncidents && + (wantsLogs || wantsMetrics || 
lowerQuestion.includes("what happened")); + + if (wantsStatus && service) { + addCall( + "query-incidents", + compactObject({ + limit: 3, + severities: ["sev1", "sev2"], + service, + start: window.start, + end: window.end, + }), + ); + addCall( + "query-alerts", + compactObject({ + limit: 5, + start: window.start, + end: window.end, + scope: { service } as JsonObject, + }), + ); + if (toolSet.has("describe-metrics") && !usedTools.has("describe-metrics")) { + addCall( + "describe-metrics", + compactObject({ scope: { service } as JsonObject }), + ); + } + addCall( + "query-orchestration-plans", + compactObject({ query: `${service} incident` }), + ); + } + + if (wantsChanges) { + addCall( + "query-deployments", + compactObject({ + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + addCall( + "query-incidents", + compactObject({ + limit: 3, + severities: ["sev1", "sev2"], + service, + start: window.start, + end: window.end, + }), + ); + } + + if (wantsServices) { + addCall("query-services", service ? { query: service } : {}); + } + + if (wantsIncidents) { + addCall( + "query-incidents", + compactObject({ + limit: isBroadInvestigation ? 3 : 2, + severities: lowerQuestion.includes("sev1") ? ["sev1"] : ["sev1", "sev2"], + service, + start: window.start, + end: window.end, + }), + ); + } + + if (incidentId && toolSet.has("get-incident-timeline")) { + addCall("get-incident-timeline", { id: incidentId }); + } + + if (wantsAlerts) { + addCall( + "query-alerts", + compactObject({ + limit: 5, + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + } + + if (wantsMetrics) { + if (toolSet.has("describe-metrics") && !usedTools.has("describe-metrics")) { + addCall( + "describe-metrics", + compactObject({ + scope: service ? 
({ service } as JsonObject) : undefined, + }), + ); + } else { + addCall( + "query-metrics", + compactObject({ + expression: inferMetricExpression(lowerQuestion), + start: window.start, + end: window.end, + step: window.step, + scope: service ? ({ service } as JsonObject) : undefined, }), - toolCalls: [], + ); + } + } + + if (wantsLogs || (isRefinement && usedTools.has("query-incidents"))) { + addCall( + "query-logs", + compactObject({ + expression: { + search: inferLogSearch(lowerQuestion, service), + }, + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + } + + if (wantsTickets || ticketId) { + addCall( + "query-tickets", + compactObject({ + query: ticketId ?? service ?? "incident follow-up", + }), + ); + } + + if (wantsDeployments) { + addCall( + "query-deployments", + compactObject({ + start: window.start, + end: window.end, + scope: service ? ({ service } as JsonObject) : undefined, + }), + ); + } + + if (wantsTeams) { + addCall("query-teams", service ? { service } : {}); + } + + if (wantsRunbooks) { + addCall( + "query-orchestration-plans", + compactObject({ + query: service ? 
`${service} incident` : "incident mitigation", + }), + ); + } + + if (isRefinement) { + const refinementCalls = refinePlan(messages, tools, calls, service, window); + if (refinementCalls.length > 0) { + return { + content: "I found likely follow-up checks based on the previous tool results.", + toolCalls: refinementCalls, }; } + } - const user = messages.filter((m) => m.role === "user").pop(); - const text = (user?.content || "").toLowerCase(); - const toolNames = new Set(tools.map((t) => t.name)); + if (calls.length === 0) { + const fallbackTool = availableTools.find((name) => name !== "health"); + if (fallbackTool) { + addCall( + fallbackTool, + fallbackArguments(fallbackTool, service, window), + ); + } + } - const calls: ToolCall[] = []; + return { + content: buildPlannerNarrative(calls, question, isRefinement), + toolCalls: calls.slice(0, 5), + }; +} - // If asking about incidents/impact, fetch top 2 severe incidents. - if (text.includes("incident") || text.includes("impactful")) { - if (toolNames.has("query-incidents")) { - calls.push({ - name: "query-incidents", - arguments: { limit: 2, severities: ["sev1", "sev2"] }, - }); - } +function planAsJson(messages: LlmMessage[]): LlmResponse { + const availableTools = parseToolsFromMessages(messages); + const toolObjects = availableTools.map((name) => ({ name })); + const planned = planWithTools(messages, toolObjects, isJsonRefinement(messages)); + return { + content: JSON.stringify( + { + reasoning: "Selected concrete tools and arguments from the user request.", + toolCalls: planned.toolCalls ?? [], + }, + null, + 2, + ), + toolCalls: [], + }; +} + +function synthesize(messages: LlmMessage[]): LlmResponse { + const prompt = getLastMessage(messages, "user")?.content ?? 
""; + const toolResults = extractToolResultsFromPrompt(prompt); + const incidents = extractMatches(prompt, /\binc-[a-z0-9-]+\b/gi); + const tickets = extractMatches(prompt, /\b(?:ticket|tkt)-[a-z0-9-]+\b/gi); + const services = extractServiceMentions(prompt); + const alerts = extractMatches(prompt, /\balt-[a-z0-9-]+\b/gi); + const deployments = extractMatches(prompt, /\bdep-[a-z0-9-]+\b/gi); + const orchestrationPlans = extractMatches(prompt, /\bplan-[a-z0-9-]+\b/gi); + const teams = extractTeamMentions(prompt); + const evidence = buildEvidence(prompt, incidents, services, toolResults); + const conclusion = buildConclusion(prompt, services, incidents, orchestrationPlans, toolResults); + const response = { + conclusion, + evidence, + missing: evidence.length >= 2 ? [] : ["More tool data would improve confidence."], + actions: + orchestrationPlans.length > 0 + ? [ + { + type: "orchestration_plan" as const, + id: orchestrationPlans[0], + name: buildOrchestrationPlanName(toolResults, orchestrationPlans[0]), + reason: buildOrchestrationPlanReason(services, incidents), + }, + ] + : [], + references: compactObject({ + incidents, + services, + tickets, + alerts, + deployments, + teams, + orchestrationPlans, + }), + confidence: estimateConfidence(prompt, evidence.length), + }; + + return { + content: JSON.stringify(response, null, 2), + toolCalls: [], + }; +} + +function refinePlan( + messages: LlmMessage[], + tools: Tool[], + initialCalls: ToolCall[], + service: string | undefined, + window: TimeWindow, +): ToolCall[] { + const lastUser = getLastMessage(messages, "user")?.content ?? ""; + const usedTools = collectUsedTools(messages); + const toolSet = new Set(tools.map((tool) => tool.name)); + const calls = [...initialCalls]; + + // Extract entities from prior tool result JSON for targeted follow-ups. + // Prefer JSON-extracted services over regex-inferred ones since they come + // from actual tool output rather than prompt text parsing. 
+ const discoveredEntities = extractEntitiesFromToolResults(lastUser); + const effectiveService = discoveredEntities.services[0] ?? service; + + const pushIfAvailable = (name: string, args: JsonObject): void => { + if (!toolSet.has(name) || usedTools.has(name) || calls.some((call) => call.name === name)) { + return; } + calls.push({ name, arguments: args }); + }; - // If asking about logs, request recent error logs (placeholder window). - if (text.includes("log") && toolNames.has("query-logs")) { - calls.push({ - name: "query-logs", - arguments: { - query: "error OR 500", - start: new Date(Date.now() - 15 * 60 * 1000).toISOString(), - end: new Date().toISOString(), - }, - }); + const hasIncidentData = /query-incidents|get-incident-timeline|inc-[a-z0-9-]+/i.test(lastUser); + const hasLogData = /query-logs|error|errors|timeout|timeouts|exception|exceptions/i.test(lastUser); + const hasMetricData = /query-metrics|describe-metrics|latency|cpu|memory|rps/i.test(lastUser); + const hasAlertData = /query-alerts|pagerduty|alert/i.test(lastUser); + const hasDeploymentData = /query-deployments|dep-[a-z0-9-]+/i.test(lastUser); + + if (hasIncidentData && !hasLogData) { + pushIfAvailable( + "query-logs", + compactObject({ + expression: { search: inferLogSearch(lastUser.toLowerCase(), effectiveService) }, + start: window.start, + end: window.end, + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } + + if ((hasIncidentData || hasLogData) && !hasMetricData) { + if (toolSet.has("describe-metrics") && !usedTools.has("describe-metrics")) { + pushIfAvailable( + "describe-metrics", + compactObject({ + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } else { + pushIfAvailable( + "query-metrics", + compactObject({ + expression: { metricName: "latency_p95" }, + start: window.start, + end: window.end, + step: window.step, + scope: effectiveService ? 
({ service: effectiveService } as JsonObject) : undefined, + }), + ); } + } - // If asking about metrics/latency/cpu, request key series. - if ( - (text.includes("latency") || - text.includes("cpu") || - text.includes("memory")) && - toolNames.has("query-metrics") - ) { - calls.push({ - name: "query-metrics", - arguments: { - expression: "latency_p95, cpu_usage, memory_usage, rps", - start: new Date(Date.now() - 30 * 60 * 1000).toISOString(), - end: new Date().toISOString(), - step: 60, - }, + if ((hasIncidentData || hasLogData || hasMetricData) && !hasAlertData) { + pushIfAvailable( + "query-alerts", + compactObject({ + limit: 5, + start: window.start, + end: window.end, + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } + + // Deployments often correlate with incidents — check for recent deploys + if (hasIncidentData && !hasDeploymentData) { + pushIfAvailable( + "query-deployments", + compactObject({ + start: window.start, + end: window.end, + scope: effectiveService ? ({ service: effectiveService } as JsonObject) : undefined, + }), + ); + } + + // Discover team ownership when a concrete service is found + if (effectiveService && !usedTools.has("query-teams")) { + pushIfAvailable( + "query-teams", + { service: effectiveService }, + ); + } + + if ( + (hasIncidentData || hasMetricData || hasLogData) && + !usedTools.has("query-orchestration-plans") + ) { + pushIfAvailable( + "query-orchestration-plans", + compactObject({ + query: effectiveService ? `${effectiveService} mitigation` : "incident mitigation", + }), + ); + } + + const enoughData = + [hasIncidentData, hasLogData, hasMetricData].filter(Boolean).length >= 2; + return enoughData ? [] : calls.slice(0, 5); +} + +function buildPlannerNarrative( + calls: ToolCall[], + question: string, + isRefinement: boolean, +): string { + if (calls.length === 0) { + return isRefinement + ? "The existing results appear sufficient, so I would stop tool use here." 
+ : `I could not infer a strong plan from "${question}", so I used a conservative fallback.`; + } + + const toolNames = calls.map((call) => call.name).join(", "); + return isRefinement + ? `I need one more pass to validate the hypothesis. Next tools: ${toolNames}.` + : `I would start with these concrete checks: ${toolNames}.`; +} + +function extractQuestion(messages: LlmMessage[]): string { + const lastUser = getLastMessage(messages, "user")?.content ?? ""; + const questionMatch = lastUser.match(/Question:\s*([\s\S]*?)\nTool results/i); + if (questionMatch) return questionMatch[1].trim(); + + const requestMatch = lastUser.match(/User request:\s*([\s\S]*?)\nReturn only JSON/i); + if (requestMatch) return requestMatch[1].trim(); + + return lastUser.trim(); +} + +function getLastMessage( + messages: LlmMessage[], + role: LlmMessage["role"], +): LlmMessage | undefined { + return [...messages].reverse().find((message) => message.role === role); +} + +function parseToolsFromMessages(messages: LlmMessage[]): string[] { + const systemText = messages + .filter((message) => message.role === "system") + .map((message) => message.content) + .join("\n"); + const matches = systemText.match(/(?:^|\n)(?:- |• )([a-z0-9-]+)/gim) ?? 
[]; + const tools = matches + .map((match) => match.replace(/(?:^|\n)(?:- |• )/, "").trim()) + .filter((name) => name.includes("-")); + return [...new Set(tools)]; +} + +function isJsonRefinement(messages: LlmMessage[]): boolean { + return messages.some( + (message) => + message.role === "system" && + message.content.includes("JSON Refinement Mode"), + ); +} + +function inferTimeWindow(text: string): TimeWindow { + const lower = text.toLowerCase(); + const now = new Date(); + + let minutes = 60; + if (/\b(last|past)\s+15\s*(m|min|minutes)\b/.test(lower)) minutes = 15; + else if (/\b(last|past)\s+30\s*(m|min|minutes)\b/.test(lower)) minutes = 30; + else if (/\b(last|past)\s+2\s*(h|hr|hour|hours)\b/.test(lower)) minutes = 120; + else if (/\b(today|current)\b/.test(lower)) minutes = 6 * 60; + else if (/\b(last|past)\s+24\s*(h|hr|hour|hours)\b/.test(lower)) minutes = 24 * 60; + + const start = new Date(now.getTime() - minutes * 60 * 1000).toISOString(); + return { + start, + end: now.toISOString(), + step: minutes <= 30 ? 
60 : 300, + }; +} + +function inferService(text: string): string | undefined { + // First extract all words and filter out stop words and known entity prefixes + const keywords = HandlerUtils.extractKeywords(text).filter((keyword) => { + // Ignore words that look like entity ID prefixes (inc-, dep-, alt-, plan-) + if (/^(inc|dep|alt|plan|tkt|ticket|alert|incident|deployment)[-0-9]*$/i.test(keyword)) { + return false; + } + return !SERVICE_STOP_WORDS.has(keyword); + }); + + // Then try to match a direct generic "in X" or "for X" pattern + const directMatch = text.match( + /\b(?:for|in|on|service|services)\s+([a-z][a-z0-9-]{2,})\b/i, + ); + if (directMatch) { + const candidate = directMatch[1].toLowerCase(); + // Only use if it survived the stop word / prefix filter + if (keywords.includes(candidate)) return candidate; + } + + return keywords.find((keyword) => /^[a-z][a-z0-9-]{2,}$/.test(keyword)); +} + +function inferIdentifier(text: string, pattern: RegExp): string | undefined { + const match = text.match(pattern); + return match?.[0]?.toLowerCase(); +} + +function inferMetricExpression(question: string): JsonObject { + if (question.includes("cpu")) return { metricName: "cpu_usage" }; + if (question.includes("memory")) return { metricName: "memory_usage" }; + if (question.includes("traffic") || question.includes("rps")) { + return { metricName: "request_rate" }; + } + if (question.includes("error rate")) return { metricName: "error_rate" }; + return { metricName: "latency_p95" }; +} + +function inferLogSearch(question: string, service?: string): string { + if (question.includes("timeout")) return service ? `${service} timeout` : "timeout"; + if (question.includes("500")) return service ? `${service} 500` : "500"; + if (question.includes("exception")) { + return service ? `${service} exception` : "exception"; + } + return service ? 
`${service} error OR timeout` : "error OR timeout OR 500";
+}
+
+function collectUsedTools(messages: LlmMessage[]): Set<string> {
+ const text = messages
+ .filter((message) => message.role !== "system")
+ .map((message) => message.content)
+ .join("\n");
+ const matches = text.match(/\b(?:query|get|describe)-[a-z0-9-]+\b/gi) ?? [];
+ return new Set(matches.map((match) => match.toLowerCase()));
+}
+
+function fallbackArguments(
+ toolName: string,
+ service: string | undefined,
+ window: TimeWindow,
+): JsonObject {
+ switch (toolName) {
+ case "query-incidents":
+ return compactObject({ limit: 2, service, start: window.start, end: window.end });
+ case "query-logs":
+ return compactObject({
+ expression: { search: inferLogSearch("", service) },
+ start: window.start,
+ end: window.end,
+ });
+ case "query-metrics":
+ return compactObject({
+ expression: { metricName: "latency_p95" },
+ start: window.start,
+ end: window.end,
+ step: window.step,
+ });
+ case "describe-metrics":
+ return compactObject({
+ scope: service ? ({ service } as JsonObject) : undefined,
 });
+ default:
+ return service ? { service } : {};
+ }
+}
+
+function compactObject(
+ value: Record<string, unknown>,
+): JsonObject {
+ const entries = Object.entries(value).filter(([, entry]) => entry !== undefined);
+ return Object.fromEntries(entries) as JsonObject;
+}
+
+function extractMatches(text: string, pattern: RegExp): string[] {
+ return [...new Set((text.match(pattern) ?? []).map((value) => value.toLowerCase()))];
+}
+
+function extractServiceMentions(text: string): string[] {
+ const matches = text.match(/\b([a-z][a-z0-9-]{2,})\s+service\b/gi) ??
[]; + const normalized = matches.map((match) => match.replace(/\s+service$/i, "").toLowerCase()); + const inferred = inferService(text); + // Also extract services from parsed tool result JSON + const toolResults = extractToolResultsFromPrompt(text); + const toolServices: string[] = []; + for (const tr of toolResults) { + if (!Array.isArray(tr.data)) continue; + for (const item of tr.data) { + if (typeof item === "object" && item !== null && "service" in item) { + const svc = String((item as Record<string, unknown>).service); + if (svc && svc !== "undefined") toolServices.push(svc.toLowerCase()); + } } + } + return [...new Set([...(inferred ? [inferred] : []), ...normalized, ...toolServices])]; +} - // Fallback: if no actionable calls, still return a noop plan. - const hasNonPlaceholderArgs = calls.some((c) => - Object.values(c.arguments).every( - (v) => typeof v !== "string" || !v.includes("{{"), - ), - ); - const responseText = hasNonPlaceholderArgs - ? "Mock planning complete." - : "Mock plan with placeholders."; +function extractTeamMentions(text: string): string[] { + const teams: string[] = []; + const toolResults = extractToolResultsFromPrompt(text); + for (const tr of toolResults) { + if (!tr.tool.includes("team")) continue; + const items = Array.isArray(tr.data) ? tr.data : [tr.data]; + for (const item of items) { + if (typeof item === "object" && item !== null) { + const obj = item as Record<string, unknown>; + const name = obj.name ??
obj.id; + if (typeof name === "string" && name) teams.push(name.toLowerCase()); + } + } + } + return [...new Set(teams)]; +} - return { - content: responseText, - toolCalls: calls, - }; +type ParsedToolResult = { tool: string; data: unknown }; + +function extractToolResultsFromPrompt(text: string): ParsedToolResult[] { + const results: ParsedToolResult[] = []; + // Match lines from the synthesis prompt like: + // "query-incidents: [{...}]" or "- query-incidents => [{...}]" + const linePattern = /(?:^|\n)(?:- )?([a-z][a-z0-9-]+)(?:\s*(?::|=>|returned)\s*)(.+)/gi; + let match: RegExpExecArray | null; + while ((match = linePattern.exec(text)) !== null) { + const tool = match[1].toLowerCase(); + const raw = match[2].trim(); + try { + const parsed = JSON.parse(raw); + results.push({ tool, data: parsed }); + } catch { + // Not valid JSON — skip + } } + return results; +} + +function extractEntitiesFromToolResults(text: string): { + services: string[]; + incidentIds: string[]; + statuses: string[]; +} { + const services: string[] = []; + const incidentIds: string[] = []; + const statuses: string[] = []; + + const processItem = (item: unknown): void => { + if (typeof item !== "object" || item === null) return; + const obj = item as Record<string, unknown>; + if (typeof obj.service === "string" && obj.service) { + services.push(obj.service.toLowerCase()); + } + if (typeof obj.id === "string" && /^inc-/i.test(obj.id)) { + incidentIds.push(obj.id.toLowerCase()); + } + if (typeof obj.status === "string" && obj.status) { + statuses.push(obj.status.toLowerCase()); + } + }; + + // Use the line-based parser for reliable JSON extraction from tool result lines + const toolResults = extractToolResultsFromPrompt(text); + for (const tr of toolResults) { + const items = Array.isArray(tr.data) ?
tr.data : [tr.data]; + for (const item of items) processItem(item); + } + + // Fallback: try to parse inline JSON arrays + const jsonArrayPattern = /\[.*?\]/gs; + let match: RegExpExecArray | null; + while ((match = jsonArrayPattern.exec(text)) !== null) { + try { + const parsed = JSON.parse(match[0]); + const items = Array.isArray(parsed) ? parsed : [parsed]; + for (const item of items) processItem(item); + } catch { + // Not valid JSON + } + } + return { + services: [...new Set(services)], + incidentIds: [...new Set(incidentIds)], + statuses: [...new Set(statuses)], + }; +} + +function buildEvidence( + prompt: string, + incidents: string[], + services: string[], + toolResults: ParsedToolResult[] = [], +): string[] { + const evidence: string[] = []; + + // Build evidence from parsed tool result data when available + for (const tr of toolResults) { + if (tr.tool.includes("incident") && Array.isArray(tr.data)) { + for (const item of tr.data) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + const parts = [`Incident ${obj.id ?? "unknown"}`]; + if (obj.status) parts.push(`status=${obj.status}`); + if (obj.severity) parts.push(`severity=${obj.severity}`); + if (obj.service) parts.push(`service=${obj.service}`); + evidence.push(`${parts.join(", ")}.`); + } + } + if (tr.tool.includes("alert") && Array.isArray(tr.data)) { + const count = tr.data.length; + if (count > 0) evidence.push(`${count} alert(s) found in the requested window.`); + } + if (tr.tool.includes("deployment") && Array.isArray(tr.data)) { + for (const item of tr.data) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + if (obj.id || obj.service) { + evidence.push(`Deployment ${obj.id ?? ""} for ${obj.service ??
"unknown service"} detected.`); + } + } + } + } + + // Fall back to regex-based evidence when no tool results are parsed + if (evidence.length === 0) { + if (incidents.length > 0) { + evidence.push(`Investigated incident ${incidents[0]} from tool output.`); + } + if (services.length > 0) { + evidence.push(`Observed service scope: ${services[0]}.`); + } + } + + if (/error|timeout|500/i.test(prompt)) { + evidence.push("Logs indicate errors or timeouts in the requested window."); + } + if (/latency|cpu|memory|metric/i.test(prompt)) { + evidence.push("Metrics were included in the evidence used for synthesis."); + } + + return evidence.slice(0, 6); +} + +function buildOrchestrationPlanName( + toolResults: ParsedToolResult[], + planId: string, +): string { + for (const tr of toolResults) { + if (!tr.tool.includes("orchestration")) continue; + const items = Array.isArray(tr.data) ? tr.data : [tr.data]; + for (const item of items) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + if (String(obj.id).toLowerCase() === planId) { + const name = obj.title ?? obj.name ?? obj.displayName; + if (typeof name === "string" && name) return name; + } + } + } + return "Recommended mitigation plan"; +} + +function buildOrchestrationPlanReason( + services: string[], + incidents: string[], +): string { + if (services.length > 0 && incidents.length > 0) { + return `Targets ${services[0]} where ${incidents[0]} is active.`; + } + if (services.length > 0) { + return `Mitigates operational issues observed in ${services[0]}.`; + } + return "It aligns with the incident signals already collected."; +} + +function buildConclusion( + prompt: string, + services: string[], + incidents: string[], + orchestrationPlans: string[], + toolResults: ParsedToolResult[] = [], +): string { + const serviceText = services[0] ?? "the relevant service"; + const runbookText = + orchestrationPlans.length > 0 + ?
` Recommended Action: Run orchestration plan ${orchestrationPlans[0]}.` + : ""; + + // Build a richer incident summary from parsed tool results + let incidentSummary = ""; + for (const tr of toolResults) { + if (!tr.tool.includes("incident") || !Array.isArray(tr.data)) continue; + for (const item of tr.data) { + if (typeof item !== "object" || item === null) continue; + const obj = item as Record<string, unknown>; + const id = obj.id ?? "unknown"; + const status = obj.status ?? "unknown"; + const severity = obj.severity; + incidentSummary = severity + ? `Incident ${id} (${severity}, ${status}) appears central to the issue.` + : `Incident ${id} (${status}) appears central to the issue.`; + break; // Use the first incident for the summary + } + } + if (!incidentSummary && incidents.length > 0) { + incidentSummary = `Incident ${incidents[0]} appears central to the issue.`; + } + + if (/latency|cpu|memory|metric/i.test(prompt)) { + return `${serviceText} shows operational signals worth investigating further. ${incidentSummary}${runbookText}`.trim(); + } + + if (/service|incident|alert|log/i.test(prompt)) { + return `${serviceText} has enough collected evidence for a preliminary assessment. 
${incidentSummary}${runbookText}`.trim(); + } + + return `This is a synthesized mock answer based on the provided tool results.${runbookText}`.trim(); +} + +function estimateConfidence(prompt: string, evidenceCount: number): number { + let confidence = 0.62 + evidenceCount * 0.08; + if (/missing|unknown|no data/i.test(prompt)) confidence -= 0.1; + if (/incident|error|latency|cpu|memory|alert/i.test(prompt)) confidence += 0.05; + return Math.max(0.35, Math.min(0.93, Number(confidence.toFixed(2)))); } diff --git a/tests/copilotEngine.followups.test.ts b/tests/copilotEngine.followups.test.ts index e73fb5c..f88e2a9 100644 --- a/tests/copilotEngine.followups.test.ts +++ b/tests/copilotEngine.followups.test.ts @@ -107,12 +107,16 @@ test('emits references instead of links with ids and ranges for console', async async listTools() { console.log('StubMcp listTools called'); return [ + { name: 'describe-metrics' } as Tool, { name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool, ]; }, async callTool(call) { calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['latency_p95'] }; + } if (call.name === 'query-logs') { return { name: call.name, result: [{ id: 'log-1', message: 'error' }] } as ToolResult; } @@ -189,12 +193,16 @@ test('drills into incident timelines/logs/metrics when user asks for root cause' return [ { name: 'query-incidents' } as Tool, { name: 'get-incident-timeline' } as Tool, + { name: 'describe-metrics' } as Tool, { name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool, ]; }, async callTool(call): Promise<ToolResult> { calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['cpu_usage'] }; + } if (call.name === 'query-incidents') { return { name: call.name, diff --git a/tests/copilotEngine.planning.test.ts b/tests/copilotEngine.planning.test.ts index d001972..4ff2cdb 100644 --- a/tests/copilotEngine.planning.test.ts +++ b/tests/copilotEngine.planning.test.ts @@ -346,6 
+346,9 @@ test('validates invalid LLM calls and provides heuristic fallback', async () => const mcp: StubMcp = { async listTools() { return [ + { + name: 'describe-metrics', + } as Tool, { name: 'query-metrics', inputSchema: { @@ -367,6 +370,9 @@ test('validates invalid LLM calls and provides heuristic fallback', async () => }, async callTool(call) { calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['cpu_usage'] }; + } return { name: call.name, result: { metrics: [] } }; }, }; @@ -374,15 +380,13 @@ test('validates invalid LLM calls and provides heuristic fallback', async () => const engine = makeEngine(llm, mcp); await engine.answer('check cpu metrics'); - // New behavior: Invalid LLM call is caught, but heuristics inject a valid fallback - console.log('DEBUG: calls length:', calls.length); - if (calls.length > 0) console.log('DEBUG: first call:', calls[0]); - - assert.equal(calls.length, 1, 'heuristics should inject valid query-metrics call after filtering invalid LLM call'); - assert.equal(calls[0].name, 'query-metrics'); + // describe-metrics runs first (replacing the invalid query-metrics call), + // then heuristics inject a valid query-metrics call + const metricsCall = calls.find(c => c.name === 'query-metrics'); + assert.ok(metricsCall, 'heuristics should inject valid query-metrics call after filtering invalid LLM call'); // The heuristic-injected call should have all required fields - const args = calls[0].arguments as JsonObject; + const args = metricsCall!.arguments as JsonObject; assert.ok(args.expression, 'should have expression field'); assert.ok(typeof args.step === 'number', 'should have step field'); assert.ok(args.start, 'should have start field'); @@ -458,8 +462,15 @@ test('filters out internal diagnostic tools from LLM visibility', async () => { test('adds default logs and metrics calls when user explicitly asks for them', async () => { const llm: LlmClient = { - async chat(_messages: LlmMessage[] = 
[], tools: Tool[] = [], _opts?: { chatId?: string }) { + async chat(messages: LlmMessage[], tools: Tool[], _opts?: { chatId?: string }) { if (tools.length) { + // On follow-up, if describe-metrics results are visible, plan query-metrics + const hasDescribeMetricsResult = messages.some(m => + m.role === 'user' && typeof m.content === 'string' && m.content.includes('describe-metrics') + ); + if (hasDescribeMetricsResult) { + return { content: 'plan', toolCalls: [{ name: 'query-metrics', arguments: { expression: { metricName: 'cpu_usage' }, step: 60 } }], chatId: 'conv-default-logs' }; + } return { content: 'plan', toolCalls: [], chatId: 'conv-default-logs' }; } return { content: JSON.stringify({ conclusion: 'done' }), toolCalls: [], chatId: 'conv-default-logs' }; @@ -469,11 +480,14 @@ test('adds default logs and metrics calls when user explicitly asks for them', a const calls: ToolCall[] = []; const mcp: StubMcp = { async listTools() { - return [{ name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool]; + return [{ name: 'describe-metrics' } as Tool, { name: 'query-logs' } as Tool, { name: 'query-metrics' } as Tool]; }, async callTool(call) { console.log('Test callTool:', call.name); calls.push(call); + if (call.name === 'describe-metrics') { + return { name: call.name, result: ['cpu_usage'] }; + } return { name: call.name, result: { ok: true } }; }, }; @@ -483,9 +497,9 @@ test('adds default logs and metrics calls when user explicitly asks for them', a console.log('Calls length:', calls.length); console.log('Calls:', calls.map(c => c.name)); - assert.equal(calls.length, 2); - const toolNames = calls.map(c => c.name).sort(); - assert.deepEqual(toolNames, ['query-logs', 'query-metrics']); + const toolNames = calls.map(c => c.name); + assert.ok(toolNames.includes('query-logs'), 'should call query-logs'); + assert.ok(toolNames.includes('query-metrics'), 'should call query-metrics'); const logsCall = calls.find(c => c.name === 'query-logs'); const metricsCall = 
calls.find(c => c.name === 'query-metrics'); diff --git a/tests/correlationDetector.test.ts b/tests/correlationDetector.test.ts index 5e4a5ad..e5235dc 100644 --- a/tests/correlationDetector.test.ts +++ b/tests/correlationDetector.test.ts @@ -8,13 +8,15 @@ test('CorrelationDetector: extracts events from tool results', () => { const results: ToolResult[] = [ { name: 'query-logs', - result: [ - { timestamp: '2024-01-01T10:00:00Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:01Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:02Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:03Z', message: 'error' }, - { timestamp: '2024-01-01T10:00:04Z', message: 'error' }, - ], + result: { + entries: [ + { timestamp: '2024-01-01T10:00:00Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:01Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:02Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:03Z', message: 'error', severity: 'error' }, + { timestamp: '2024-01-01T10:00:04Z', message: 'error', severity: 'error' }, + ] + }, }, ]; diff --git a/tests/engine/handlers/incident/referenceHandler.test.ts b/tests/engine/handlers/incident/referenceHandler.test.ts index e50dcaf..2ad215a 100644 --- a/tests/engine/handlers/incident/referenceHandler.test.ts +++ b/tests/engine/handlers/incident/referenceHandler.test.ts @@ -1,128 +1,81 @@ import assert from 'node:assert/strict'; import { test } from 'node:test'; import { incidentReferenceHandler } from '../../../../src/engine/handlers/incident/referenceHandler.js'; -import { HandlerContext } from '../../../../src/types.js'; +import { HandlerContext, ToolResult } from '../../../../src/types.js'; test('incidentReferenceHandler', async (t) => { - const context: HandlerContext = { + const createContext = (toolResults: ToolResult[] = []): HandlerContext => ({ chatId: 'test', turnNumber: 1, conversationHistory: [], - toolResults: [], + toolResults, 
userQuestion: 'test' - }; + }); await t.test('resolves ID from recent tool result', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'get-incident', - result: { id: 'INC-123', title: 'Test' }, - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'get-incident', + result: { id: 'INC-123', title: 'Test' }, + arguments: {} + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, ''); + const result = await incidentReferenceHandler(context, 'that incident'); assert.equal(result, 'INC-123'); }); await t.test('resolves ID from query array result', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'query-incidents', - result: [ - { id: 'INC-111' }, - { id: 'INC-222' } // 222 is last in array, but prominence is equal - ], - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'query-incidents', + result: [ + { id: 'INC-111' }, + { id: 'INC-222' } + ], + arguments: {} + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, ''); - assert.equal(result, 'INC-111'); // First one pushed is index 0. logic? - // Actually code pushes all. - // sort by recency and prominence. - // timestamp is same for all in one turn. - // stable sort? - // incidentEntities[0] is returned. - // It pushes in order. sorting might keep order if equal? - // Let's see what happens. 
+ const result = await incidentReferenceHandler(context, 'that incident'); + assert.equal(result, 'INC-111'); // First one in array }); await t.test('refines using variable in query', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'query-incidents', - result: [ - { id: 'INC-ABC' }, - { id: 'INC-XYZ' } - ], - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'query-incidents', + result: [ + { id: 'INC-ABC' }, + { id: 'INC-XYZ' } + ], + arguments: {} + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, 'incident INC-XYZ'); + const result = await incidentReferenceHandler(context, 'incident INC-XYZ'); assert.equal(result, 'INC-XYZ'); }); await t.test('returns null for mismatching object reference', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'get-incident', - result: { id: 'INC-123' }, - arguments: {} - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 'get-incident', + result: { id: 'INC-123' }, + arguments: {} + }]; + const context = createContext(toolResults); // User asks about "that service" but context has incident - const result = await incidentReferenceHandler(testContext, 'show that service details'); + const result = await incidentReferenceHandler(context, 'show that service details'); assert.equal(result, null); }); await t.test('extracts ID from tool arguments', async () => { - const testContext = { - ...context, - conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', - timestamp: Date.now(), - toolResults: [{ - name: 'get-incident', - result: null, // failed result maybe - arguments: { id: 'INC-FAILED' } - }] - }] - }; + const toolResults: ToolResult[] = [{ + name: 
'get-incident', + result: null, // failed result maybe + arguments: { id: 'INC-FAILED' } + }]; + const context = createContext(toolResults); - const result = await incidentReferenceHandler(testContext, ''); + const result = await incidentReferenceHandler(context, 'that incident'); assert.equal(result, 'INC-FAILED'); }); }); diff --git a/tests/engine/handlers/incident/scopeHandler.test.ts b/tests/engine/handlers/incident/scopeHandler.test.ts index 67df2ab..0e1fd18 100644 --- a/tests/engine/handlers/incident/scopeHandler.test.ts +++ b/tests/engine/handlers/incident/scopeHandler.test.ts @@ -75,14 +75,13 @@ test('incidentScopeInferenceHandler', async (t) => { const testContext = { ...context, conversationHistory: [{ - role: 'assistant', - content: '', - userMessage: '', + userMessage: 'previous question', timestamp: Date.now(), - toolResults: [{ - name: 'query-incidents', - arguments: {}, - result: [{ id: '1', service: 'legacy-api', metadata: { environment: 'staging' } }] + entities: [{ + type: 'service' as const, + value: 'legacy-api', + extractedAt: Date.now(), + source: 'query-incidents' }] }], toolResults: [] @@ -92,7 +91,6 @@ test('incidentScopeInferenceHandler', async (t) => { assert.ok(result); assert.equal(result?.service, 'legacy-api'); - assert.equal(result?.environment, 'staging'); }); await t.test('ignores non-incident tools', async () => { diff --git a/tests/engine/handlers/metric/followUpHandler.test.ts b/tests/engine/handlers/metric/followUpHandler.test.ts index 0a58127..add382d 100644 --- a/tests/engine/handlers/metric/followUpHandler.test.ts +++ b/tests/engine/handlers/metric/followUpHandler.test.ts @@ -72,7 +72,7 @@ test('metricFollowUpHandler', async (t) => { // The previous file content shows: if (metricName.toLowerCase().includes("latency") || metricName.toLowerCase().includes("error")) // So cpu_usage should NOT trigger log query. 
// With enhanced LogQueryGenerator, cpu_usage should trigger 'cpu' related logs (and alerts) - assert.equal(suggestions.length, 2); + assert.equal(suggestions.length, 3); const logsSuggestion = suggestions.find(s => s.name === 'query-logs'); assert.ok(logsSuggestion); const logsArgs = logsSuggestion.arguments as unknown as LogQueryArgs; @@ -186,5 +186,58 @@ test('metricFollowUpHandler', async (t) => { const deploymentsSuggestion = suggestions.find(s => s.name === 'query-deployments'); assert.ok(!deploymentsSuggestion, 'should NOT duplicate query-deployments'); }); -}); + await t.test('should turn describe-metrics into query-metrics for investigative requests', async () => { + const context: HandlerContext = { + ...baseContext, + userQuestion: 'find the root cause and check cpu metrics', + toolResults: [{ + name: 'query-incidents', + arguments: { limit: 1 }, + result: [{ + id: 'INC-200', + service: 'svc-api', + startTime: '2024-01-01T00:00:00Z', + endTime: '2024-01-01T00:30:00Z', + }], + }], + }; + const result: ToolResult = { + name: 'describe-metrics', + arguments: { scope: { service: 'svc-api' } }, + result: ['cpu_usage', 'memory_usage'], + }; + + const suggestions = await metricFollowUpHandler(context, result); + const metricsSuggestion = suggestions.find(s => s.name === 'query-metrics'); + + assert.ok(metricsSuggestion, 'should suggest query-metrics after metric discovery'); + const args = metricsSuggestion!.arguments as { + scope: { service: string }; + expression: { metricName: string }; + start: string; + end: string; + step: number; + }; + assert.equal(args.scope.service, 'svc-api'); + assert.equal(args.expression.metricName, 'cpu_usage'); + assert.equal(args.step, 60); + assert.ok(Number.isFinite(Date.parse(args.start))); + assert.ok(Number.isFinite(Date.parse(args.end))); + }); + + await t.test('should not query metrics for discovery-only requests', async () => { + const context: HandlerContext = { + ...baseContext, + userQuestion: 'what metrics are 
available for svc-api', + }; + const result: ToolResult = { + name: 'describe-metrics', + arguments: { scope: { service: 'svc-api' } }, + result: ['cpu_usage', 'memory_usage'], + }; + + const suggestions = await metricFollowUpHandler(context, result); + assert.equal(suggestions.some(s => s.name === 'query-metrics'), false); + }); +}); diff --git a/tests/engine/handlers/team/referenceHandler.test.ts b/tests/engine/handlers/team/referenceHandler.test.ts index 8a79ea1..84bd833 100644 --- a/tests/engine/handlers/team/referenceHandler.test.ts +++ b/tests/engine/handlers/team/referenceHandler.test.ts @@ -37,13 +37,13 @@ function createTurnWithToolsAndEntities( } test("teamReferenceHandler", async (t) => { - const createContext = (conversationHistory: ConversationTurn[] = []): HandlerContext => + const createContext = (conversationHistory: ConversationTurn[] = [], toolResults: ToolResult[] = []): HandlerContext => ({ chatId: "chat", turnNumber: 1, userQuestion: "test", conversationHistory, - toolResults: [], + toolResults, }) as HandlerContext; await t.test("should return null when no team entities exist", async () => { @@ -53,39 +53,39 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should extract team from query-teams tool result", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [ { id: "team-velocity", name: "Velocity Team" }, { id: "team-platform", name: "Platform Team" } ] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-velocity"); // Should return first/most prominent }); await t.test("should extract team from get-team tool result", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "get-team", arguments: { id: "team-velocity" }, result: { id: 
"team-velocity", name: "Velocity Team" } - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "this team"); assert.equal(result, "team-velocity"); }); await t.test("should extract team from get-team-members tool arguments", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "get-team-members", arguments: { id: "team-velocity" }, result: [{ id: "user1", name: "John Doe" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-velocity"); @@ -103,15 +103,15 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should prioritize exact name matches in reference text", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [ { id: "team-velocity", name: "Velocity Team" }, { id: "team-platform", name: "Platform Team" } ] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "the platform team"); assert.equal(result, "team-platform"); @@ -130,7 +130,14 @@ test("teamReferenceHandler", async (t) => { result: { id: "team-recent", name: "Recent Team" } }], [], Date.now()); - const context = createContext([oldTurn, recentTurn]); + // Also add current tool results for immediate context + const toolResults: ToolResult[] = [{ + name: "get-team", + arguments: {}, + result: { id: "team-recent", name: "Recent Team" } + }]; + + const context = createContext([oldTurn, recentTurn], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-recent"); @@ -149,87 +156,87 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should 
extract team name from 'the velocity team' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [ { id: "team-velocity", name: "velocity" }, { id: "team-platform", name: "platform" } ] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "the velocity team"); assert.equal(result, "team-velocity"); }); await t.test("should extract team name from 'velocity team' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "velocity" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "velocity team"); assert.equal(result, "team-velocity"); }); await t.test("should extract team name from 'team velocity' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "velocity" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "team velocity"); assert.equal(result, "team-velocity"); }); await t.test("should extract team name from 'team-velocity' pattern", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "velocity" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "team-velocity"); assert.equal(result, "team-velocity"); }); await t.test("should return null for domain mismatch", async () 
=> { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that service"); assert.equal(result, null); }); await t.test("should return null for incident reference", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that incident"); assert.equal(result, null); }); await t.test("should allow team reference even with other entity words if 'team' is present", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team service"); assert.equal(result, "team-velocity"); @@ -259,36 +266,36 @@ test("teamReferenceHandler", async (t) => { }); await t.test("should handle case insensitive matching", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity", name: "Velocity Team" }] - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "THE VELOCITY TEAM"); assert.equal(result, "team-velocity"); }); await t.test("should handle teams with no name gracefully", async () => { - const turn = 
createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: [{ id: "team-velocity" }] // No name field - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, "team-velocity"); }); await t.test("should handle invalid tool results gracefully", async () => { - const turn = createTurnWithToolsAndEntities([{ + const toolResults: ToolResult[] = [{ name: "query-teams", arguments: {}, result: null - }]); - const context = createContext([turn]); + }]; + const context = createContext([], toolResults); const result = await teamReferenceHandler(context, "that team"); assert.equal(result, null); diff --git a/tests/engine/handlers/ticket/followUpHandler.test.ts b/tests/engine/handlers/ticket/followUpHandler.test.ts index d6c152d..77cd2c1 100644 --- a/tests/engine/handlers/ticket/followUpHandler.test.ts +++ b/tests/engine/handlers/ticket/followUpHandler.test.ts @@ -105,9 +105,11 @@ test('ticketFollowUpHandler', async (t) => { conversationHistory: [{ userMessage: 'previous question', timestamp: Date.now() - 1000, - toolResults: [{ - name: 'query-tickets', - result: [{ id: 'TICKET-1', title: 'Already seen ticket' }], + entities: [{ + type: 'ticket' as const, + value: 'TICKET-1', + extractedAt: Date.now() - 1000, + source: 'query-tickets' }] }] }; diff --git a/tests/engine/handlers/ticket/referenceHandler.test.ts b/tests/engine/handlers/ticket/referenceHandler.test.ts index 0f81484..a98b6b1 100644 --- a/tests/engine/handlers/ticket/referenceHandler.test.ts +++ b/tests/engine/handlers/ticket/referenceHandler.test.ts @@ -2,98 +2,74 @@ import assert from 'node:assert/strict'; import { test } from 'node:test'; import { ticketReferenceHandler } from '../../../../src/engine/handlers/ticket/referenceHandler.js'; -import { HandlerContext, ConversationTurn, ToolResult } from '../../../../src/types.js'; 
+import { HandlerContext, ToolResult } from '../../../../src/types.js'; test('ticketReferenceHandler', async (t) => { - // Helper to create context with history - const createCtx = (toolResults: ToolResult[] = [], refText = ''): HandlerContext => ({ + const createContext = (toolResults: ToolResult[] = []): HandlerContext => ({ chatId: 'test', turnNumber: 1, - userQuestion: refText, - // The handler looks at conversationHistory, not just current turn results - // usually. But here the logic iterates specific turns. - conversationHistory: [ - { - userMessage: 'prev question', - timestamp: 1000, - toolResults: toolResults - } as ConversationTurn - ], - toolResults: [], // current turn results + userQuestion: 'test', + conversationHistory: [], + toolResults, }); await t.test('should return null if no tickets in history', async () => { - const ctx = createCtx(); + const ctx = createContext(); const ref = await ticketReferenceHandler(ctx, 'that ticket'); assert.equal(ref, null); }); await t.test('should return most recent ticket from history', async () => { - - // Logic sorts by timestamp descending. - // In the same turn, prominence is 1.0 for all. - // But the handler implementation pushes them in order. - // Wait, the handler sorts: - // ticketEntities.sort((a, b) => ... b.timestamp - a.timestamp) - // If they have same timestamp (turn timestamp), it's stable sort or undefined order? - // Let's check implementation: - // timestamp: turn.timestamp || Date.now() - // If they are in the same turn, they have same timestamp. - // It's likely the order in array matters differently or they are equal. - // Actually, normally "most recent" implies time. - // If logic doesn't distinguish intra-turn time, it might return the first one pushed? - // Let's test single ticket first to be sure. 
- - const singleResult: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [{ id: 'TICKET-1', title: 'one' }] }]; - const ctx1 = createCtx(singleResult); - const ref = await ticketReferenceHandler(ctx1, 'that ticket'); + const ctx = createContext(toolResults); + const ref = await ticketReferenceHandler(ctx, 'that ticket'); assert.equal(ref, 'TICKET-1'); }); await t.test('should resolve specific ticket ID in reference text', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [ { id: 'TICKET-A', title: 'A' }, { id: 'TICKET-B', title: 'B' } ] }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); // User explicitly asks for B const ref = await ticketReferenceHandler(ctx, 'check ticket-b please'); assert.equal(ref, 'TICKET-B'); }); await t.test('should return null if domain does not match (e.g. incident)', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [{ id: 'TICKET-1', title: 'one' }] }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); const ref = await ticketReferenceHandler(ctx, 'show that incident'); // "incident" in text -> mismatch unless "ticket" also in text assert.equal(ref, null); }); await t.test('should still resolve if domain match (e.g. 
ticket)', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'query-tickets', result: [{ id: 'TICKET-1', title: 'one' }] }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); const ref = await ticketReferenceHandler(ctx, 'show that ticket'); assert.equal(ref, 'TICKET-1'); }); await t.test('should handle get-ticket single result', async () => { - const results: ToolResult[] = [{ + const toolResults: ToolResult[] = [{ name: 'get-ticket', result: { id: 'TICKET-SINGLE', title: 'one' } }]; - const ctx = createCtx(results); + const ctx = createContext(toolResults); const ref = await ticketReferenceHandler(ctx, 'details on this'); assert.equal(ref, 'TICKET-SINGLE'); }); diff --git a/tests/engine/planRefiner.test.ts b/tests/engine/planRefiner.test.ts index 1ba4af5..a216cb5 100644 --- a/tests/engine/planRefiner.test.ts +++ b/tests/engine/planRefiner.test.ts @@ -356,7 +356,7 @@ test('ScopeInferer: uses conversation history for scope inference', async () => const inferer = new ScopeInferer(); // Simulate a follow-up question where previous turn had incident context - const conversationHistory: ConversationTurn[] = [createTurnWithTools([{ + const turn = createTurnWithTools([{ name: 'query-incidents', arguments: {}, result: [{ @@ -365,7 +365,17 @@ test('ScopeInferer: uses conversation history for scope inference', async () => status: 'open', severity: 'sev3' }] - }])]; + }]); + + // Explicitly add entities to simulate extraction from the tool result + turn.entities = [{ + type: 'service', + value: 'svc-realtime', + source: 'test', + extractedAt: Date.now() + }]; + + const conversationHistory: ConversationTurn[] = [turn]; const inference = await inferer.inferScope( 'any alerts', // follow-up question diff --git a/tests/followUpEngine.test.ts b/tests/followUpEngine.test.ts index a0be864..e5aea2d 100644 --- a/tests/followUpEngine.test.ts +++ b/tests/followUpEngine.test.ts @@ -11,17 +11,16 @@ 
test('FollowUpEngine', async (t) => {
     const results: ToolResult[] = [
       {
         name: 'query-incidents',
-        result: { incidents: [] },
+        result: { incidents: [] }, // Empty incidents array
         arguments: { service: 'payment-api' },
       },
     ];
-
-    const refined = await engine.applyFollowUps(results, 'test-chat', [], 'Show incidents');
-    // Should not include duplicate
-    assert.strictEqual(refined.length, 0);
+    const refined = await engine.applyFollowUps(results, 'test-chat', [], 'Show incidents');
+    // Should not re-run query-incidents since it was already executed this turn
+    assert.strictEqual(refined.filter(call => call.name === 'query-incidents').length, 0);
   });
 
   await t.test('applyFollowUps generates follow-up suggestions for incidents', async () => {
diff --git a/tests/mockLlm.test.ts b/tests/mockLlm.test.ts
new file mode 100644
index 0000000..0201b89
--- /dev/null
+++ b/tests/mockLlm.test.ts
@@ -0,0 +1,256 @@
+import assert from "node:assert/strict";
+import test from "node:test";
+import { MockLlm } from "../src/llms/mock.js";
+import { LlmMessage, Tool } from "../src/types.js";
+import {
+  buildFinalAnswerPrompt,
+  buildJsonPlannerPrompt,
+  buildPlannerPrompt,
+  buildRefinementPrompt,
+  buildToolContext,
+} from "../src/prompts.js";
+
+test("mock llm creates concrete multi-tool plans for broad investigations", async () => {
+  const llm = new MockLlm();
+  const tools: Tool[] = [
+    { name: "query-incidents" },
+    { name: "query-logs" },
+    { name: "describe-metrics" },
+    { name: "query-metrics" },
+    { name: "query-orchestration-plans" },
+  ];
+  const messages: LlmMessage[] = [
+    { role: "system", content: buildPlannerPrompt(buildToolContext(tools)) },
+    {
+      role: "user",
+      content: "Investigate payments latency and errors from the last 30 minutes",
+    },
+  ];
+
+  const response = await llm.chat(messages, tools);
+
+  assert.ok((response.toolCalls?.length ?? 
0) >= 3);
+  assert.equal(response.toolCalls?.[0]?.name, "describe-metrics");
+  assert.ok(response.toolCalls?.some((call) => call.name === "query-logs"));
+  assert.ok(
+    response.toolCalls?.some((call) => call.name === "query-orchestration-plans"),
+  );
+
+  const logsCall = response.toolCalls?.find((call) => call.name === "query-logs");
+  assert.equal(typeof logsCall?.arguments.start, "string");
+  assert.equal(typeof logsCall?.arguments.end, "string");
+  assert.match(String(logsCall?.arguments.start), /^\d{4}-\d{2}-\d{2}T/);
+});
+
+test("mock llm emits parseable JSON plans in json planning mode", async () => {
+  const llm = new MockLlm();
+  const messages: LlmMessage[] = [
+    {
+      role: "system",
+      content: buildJsonPlannerPrompt(
+        ["query-incidents", "query-logs", "query-alerts"].map((name) => `- ${name}`).join("\n"),
+      ),
+    },
+    {
+      role: "user",
+      content: "User request: Show recent incidents and related logs for checkout\nReturn only JSON.",
+    },
+  ];
+
+  const response = await llm.chat(messages, []);
+  const parsed = JSON.parse(response.content) as {
+    reasoning: string;
+    toolCalls: Array<{ name: string; arguments: Record<string, unknown> }>;
+  };
+
+  assert.equal(typeof parsed.reasoning, "string");
+  assert.ok(parsed.toolCalls.length >= 2);
+  assert.ok(parsed.toolCalls.some((call) => call.name === "query-incidents"));
+  assert.ok(parsed.toolCalls.some((call) => call.name === "query-logs"));
+});
+
+test("mock llm suggests follow-up tools from prior results", async () => {
+  const llm = new MockLlm();
+  const tools: Tool[] = [
+    { name: "query-incidents" },
+    { name: "query-logs" },
+    { name: "describe-metrics" },
+    { name: "query-metrics" },
+    { name: "query-alerts" },
+  ];
+  const messages: LlmMessage[] = [
+    { role: "system", content: buildRefinementPrompt(buildToolContext(tools), 1) },
+    {
+      role: "user",
+      content:
+        "Question: Investigate payments incident\n" +
+        "Tool results (count=1):\n" +
+        "query-incidents returned [{\"id\":\"inc-123\",\"service\":\"payments\"}]\n" 
+ + "Plan follow-up tool calls with concrete arguments.", + }, + ]; + + const response = await llm.chat(messages, tools); + + assert.ok((response.toolCalls?.length ?? 0) >= 1); + assert.ok(response.toolCalls?.some((call) => call.name === "query-logs")); + assert.ok( + response.toolCalls?.some( + (call) => call.name === "describe-metrics" || call.name === "query-metrics", + ), + ); +}); + +test("mock llm returns structured synthesis output with references", async () => { + const llm = new MockLlm(); + const messages: LlmMessage[] = [ + { role: "system", content: buildFinalAnswerPrompt() }, + { + role: "user", + content: + "Question: Investigate payments incident\n" + + "Tool results:\n" + + "- query-incidents => [{\"id\":\"inc-123\",\"service\":\"payments\"}]\n" + + "- query-orchestration-plans => [{\"id\":\"plan-42\",\"name\":\"payments recovery\"}]", + }, + ]; + + const response = await llm.chat(messages, []); + const parsed = JSON.parse(response.content) as { + conclusion: string; + evidence: string[]; + references: { incidents?: string[]; services?: string[]; orchestrationPlans?: string[] }; + actions: Array<{ type: string; id?: string }>; + confidence: number; + }; + + assert.match(parsed.conclusion, /payments|plan-42|incident/i); + assert.ok(parsed.evidence.length >= 1); + assert.deepEqual(parsed.references.incidents, ["inc-123"]); + assert.ok(parsed.references.orchestrationPlans?.includes("plan-42")); + assert.equal(parsed.actions[0]?.type, "orchestration_plan"); + assert.equal(parsed.confidence > 0.6, true); +}); + +test("mock llm synthesis parses structured tool results for richer evidence", async () => { + const llm = new MockLlm(); + const messages: LlmMessage[] = [ + { role: "system", content: buildFinalAnswerPrompt() }, + { + role: "user", + content: + "Question: Investigate payments incident\n" + + "Tool Results:\n" + + 'query-incidents: [{"id":"inc-456","service":"payments","status":"active","severity":"sev1"}]\n' + + '- query-orchestration-plans => 
[{"id":"plan-99","name":"payments recovery"}]', + }, + ]; + + const response = await llm.chat(messages, []); + const parsed = JSON.parse(response.content) as { + conclusion: string; + evidence: string[]; + references: { incidents?: string[]; services?: string[]; orchestrationPlans?: string[] }; + actions: Array<{ type: string; id?: string; name?: string; reason?: string }>; + }; + + // Evidence should reference actual data from tool results + assert.ok(parsed.evidence.some((e) => e.includes("inc-456"))); + assert.ok(parsed.evidence.some((e) => e.includes("status=active") || e.includes("severity=sev1"))); + // Services should be discovered from tool result JSON + assert.ok(parsed.references.services?.includes("payments")); + // Orchestration action should use the plan name from tool results + assert.ok(parsed.actions.length > 0); + assert.equal(parsed.actions[0]?.name, "payments recovery"); + // Conclusion should include richer incident context + assert.match(parsed.conclusion, /inc-456|sev1|active/i); +}); + +test("mock llm refinement extracts service from prior tool results", async () => { + const llm = new MockLlm(); + const tools: Tool[] = [ + { name: "query-incidents" }, + { name: "query-logs" }, + { name: "describe-metrics" }, + { name: "query-metrics" }, + { name: "query-alerts" }, + { name: "query-deployments" }, + { name: "query-teams" }, + ]; + const messages: LlmMessage[] = [ + { role: "system", content: buildRefinementPrompt(buildToolContext(tools), 1) }, + { + role: "user", + content: + "Question: Investigate incident\n" + + "Tool results (count=1):\n" + + 'query-incidents returned [{"id":"inc-789","service":"checkout","status":"active"}]\n' + + "Plan follow-up tool calls with concrete arguments.", + }, + ]; + + const response = await llm.chat(messages, tools); + + // Follow-up should scope to the discovered service "checkout" + const logCall = response.toolCalls?.find((c) => c.name === "query-logs"); + assert.ok(logCall, "Should suggest query-logs 
follow-up"); + assert.deepEqual(logCall?.arguments.scope, { service: "checkout" }); + + // Should add deployment follow-up when incident data is present + assert.ok( + response.toolCalls?.some((c) => c.name === "query-deployments"), + "Should suggest query-deployments follow-up", + ); + + // Should add teams follow-up when service is discovered + assert.ok( + response.toolCalls?.some((c) => c.name === "query-teams"), + "Should suggest query-teams follow-up for discovered service", + ); +}); + +test("mock llm handles status/health queries", async () => { + const llm = new MockLlm(); + const tools: Tool[] = [ + { name: "query-incidents" }, + { name: "query-alerts" }, + { name: "describe-metrics" }, + { name: "query-orchestration-plans" }, + ]; + const messages: LlmMessage[] = [ + { role: "system", content: buildPlannerPrompt(buildToolContext(tools)) }, + { + role: "user", + content: "What is the status of the payments service?", + }, + ]; + + const response = await llm.chat(messages, tools); + + // Status queries should trigger incidents, alerts, and metrics checks + assert.ok(response.toolCalls?.some((c) => c.name === "query-incidents")); + assert.ok(response.toolCalls?.some((c) => c.name === "query-alerts")); + assert.ok(response.toolCalls?.some((c) => c.name === "describe-metrics")); +}); + +test("mock llm handles change detection queries", async () => { + const llm = new MockLlm(); + const tools: Tool[] = [ + { name: "query-incidents" }, + { name: "query-deployments" }, + { name: "query-logs" }, + ]; + const messages: LlmMessage[] = [ + { role: "system", content: buildPlannerPrompt(buildToolContext(tools)) }, + { + role: "user", + content: "What changed in the last hour for the checkout service?", + }, + ]; + + const response = await llm.chat(messages, tools); + + // Change queries should prioritize deployments + assert.ok(response.toolCalls?.some((c) => c.name === "query-deployments")); + assert.ok(response.toolCalls?.some((c) => c.name === "query-incidents")); 
+}); diff --git a/tests/referenceResolver.test.ts b/tests/referenceResolver.test.ts index e9d759b..1213d93 100644 --- a/tests/referenceResolver.test.ts +++ b/tests/referenceResolver.test.ts @@ -11,24 +11,27 @@ test('ReferenceResolver: resolves "that incident" reference', async () => { entities: new Map(), }; + // Create a simple conversation history with the new format const conversationHistory = [ { userMessage: 'show me incidents', assistantResponse: 'Here are the incidents', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-999', title: 'Test incident' } - ] + type: 'incident' as const, + value: 'INC-999', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-incidents' } - ], - timestamp: Date.now() + ] } ]; const resolutions = await resolver.resolveReferences('What caused that incident?', context, conversationHistory); + // The resolver should identify the reference pattern and resolve it using entities assert.ok(resolutions.has('that incident')); assert.equal(resolutions.get('that incident'), 'INC-999'); }); @@ -44,15 +47,16 @@ test('ReferenceResolver: resolves "this service" reference', async () => { { userMessage: 'show me services', assistantResponse: 'Here are the services', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-services', - result: [ - { name: 'payment-api', status: 'healthy' } - ] + type: 'service' as const, + value: 'payment-api', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-services' } - ], - timestamp: Date.now() + ] } ]; @@ -74,15 +78,16 @@ test('ReferenceResolver: resolves "since then" time reference', async () => { { userMessage: 'show me incident timeline', assistantResponse: 'Here is the timeline', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'get-incident-timeline', - result: [ - { at: baseTime, kind: 'incident started', body: 'Incident began' } - ] + type: 'timestamp' as const, + value: baseTime, + 
prominence: 1.0, + extractedAt: Date.now(), + source: 'get-incident-timeline' } - ], - timestamp: Date.now() + ] } ]; @@ -128,28 +133,30 @@ test('ReferenceResolver: returns most recent entity when multiple exist', async { userMessage: 'show me incidents', assistantResponse: 'Here are the incidents', - toolResults: [ + timestamp: now - 1000, + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-100', title: 'Old incident' } - ] + type: 'incident' as const, + value: 'INC-100', + prominence: 1.0, + extractedAt: now - 1000, + source: 'query-incidents' } - ], - timestamp: now - 1000 + ] }, { userMessage: 'show me more incidents', assistantResponse: 'Here are more', - toolResults: [ + timestamp: now, + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-200', title: 'Recent incident' } - ] + type: 'incident' as const, + value: 'INC-200', + prominence: 1.0, + extractedAt: now, + source: 'query-incidents' } - ], - timestamp: now + ] } ]; @@ -182,15 +189,16 @@ test('ReferenceResolver: handles "before that" time reference', async () => { { userMessage: 'show me incident timeline', assistantResponse: 'Here is the timeline', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'get-incident-timeline', - result: [ - { at: baseTime, kind: 'incident started', body: 'Incident began' } - ] + type: 'timestamp' as const, + value: baseTime, + prominence: 1.0, + extractedAt: Date.now(), + source: 'get-incident-timeline' } - ], - timestamp: Date.now() + ] } ]; @@ -217,28 +225,30 @@ test('ReferenceResolver: handles multiple references in one question', async () { userMessage: 'show me incidents', assistantResponse: 'Here they are', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-incidents', - result: [ - { id: 'INC-999', title: 'Critical incident' } - ] + type: 'incident' as const, + value: 'INC-999', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-incidents' } - ], - timestamp: Date.now() + ] }, { 
userMessage: 'and services', assistantResponse: 'Here are services', - toolResults: [ + timestamp: Date.now(), + entities: [ { - name: 'query-services', - result: [ - { name: 'payment-api', status: 'healthy' } - ] + type: 'service' as const, + value: 'payment-api', + prominence: 1.0, + extractedAt: Date.now(), + source: 'query-services' } - ], - timestamp: Date.now() + ] } ]; @@ -261,29 +271,42 @@ test('ReferenceResolver: uses prominence as tiebreaker when timestamps are equal entities: new Map(), }; - // Test that when we have multiple incidents in one result, we get one of them + // Test that when we have multiple incidents in one result, we get the most prominent one const conversationHistory = [ { userMessage: 'show me incidents', assistantResponse: 'Here are the incidents', - toolResults: [ + timestamp: now, + entities: [ { - name: 'query-incidents', - result: [ - { id: 'inc-002', title: 'Minor incident' }, - { id: 'inc-005', title: 'Major incident' }, - { id: 'inc-008', title: 'Medium incident' } - ] + type: 'incident' as const, + value: 'inc-002', + prominence: 0.5, + extractedAt: now, + source: 'query-incidents' + }, + { + type: 'incident' as const, + value: 'inc-005', + prominence: 0.9, // Highest prominence + extractedAt: now, + source: 'query-incidents' + }, + { + type: 'incident' as const, + value: 'inc-008', + prominence: 0.7, + extractedAt: now, + source: 'query-incidents' } - ], - timestamp: now + ] } ]; const resolutions = await resolver.resolveReferences('tell me more about that incident', context, conversationHistory); assert.ok(resolutions.has('that incident')); - // Should pick one of the incidents (handlers return the first one found) + // Should pick the incident with highest prominence const resolved = resolutions.get('that incident'); - assert.ok(resolved === 'inc-002' || resolved === 'inc-005' || resolved === 'inc-008'); + assert.equal(resolved, 'inc-005'); });