OpsOrch Copilot

OpsOrch Copilot is the AI runtime for OpsOrch. It plans tool calls against opsorch-mcp, gathers evidence, and returns structured answers for the Console UI and other clients.

Copilot never talks to OpsOrch Core directly. It only uses the MCP tools layer.

Status

License: Apache-2.0
Runtime: Node.js 20+
Transport: HTTP API
LLM providers: mock, openai, anthropic, gemini

Quick Start

Prerequisites

Node.js 20+
A running opsorch-core instance (port 8080)
A running opsorch-mcp instance (port 7070)

Installation and Startup

```
cd opsorch-copilot
npm install

# Start with mock LLM (no API key required)
MCP_URL=http://localhost:7070/mcp \
LLM_PROVIDER=mock \
npm run dev
```

The server will start on http://localhost:6060.

Verify Installation

Health check:

```
curl http://localhost:6060/health
```

Expected response: {"status":"ok"}

Make Your First Request

```
curl http://localhost:6060/chat \
  -H 'Content-Type: application/json' \
  -d '{"message":"What incidents are active right now?"}'
```

The response includes:

chatId – Conversation identifier for follow-up questions
name – Auto-generated conversation name
answer – Structured answer with conclusion, evidence, and references
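For programmatic access, a minimal TypeScript client might look like the sketch below. It assumes the request and response shapes documented in the HTTP API section and Node.js 20+ (which provides a global `fetch`). The names `buildChatRequest`, `parseChatResponse`, and `ask` are illustrative, not part of Copilot, and the fetch function is passed in so the client can be exercised without a running server.

```typescript
// Hypothetical client sketch for POST /chat; not part of the Copilot codebase.
interface ChatAnswer {
  conclusion?: string;
  evidence?: unknown;
  references?: Record<string, unknown>;
  missing?: unknown;
}

interface ChatResponse {
  chatId: string;
  name: string;
  answer: ChatAnswer;
}

// Minimal subset of the fetch API this sketch needs, so it can be stubbed in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Omitting chatId starts a new conversation; passing it continues an existing one.
function buildChatRequest(message: string, chatId?: string) {
  const payload: { message: string; chatId?: string } = { message };
  if (chatId) payload.chatId = chatId;
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Checks the fields this client relies on before trusting the payload.
function parseChatResponse(json: unknown): ChatResponse {
  const r = json as Partial<ChatResponse> | null;
  if (!r || typeof r.chatId !== "string" || typeof r.answer !== "object" || r.answer === null) {
    throw new Error("unexpected /chat response shape");
  }
  return { chatId: r.chatId, name: r.name ?? "", answer: r.answer };
}

async function ask(
  fetchImpl: FetchLike,
  baseUrl: string,
  message: string,
  chatId?: string,
): Promise<ChatResponse> {
  const res = await fetchImpl(`${baseUrl}/chat`, buildChatRequest(message, chatId));
  if (!res.ok) throw new Error(`POST /chat failed with HTTP ${res.status}`);
  return parseChatResponse(await res.json());
}
```

For example, `const first = await ask(fetch, "http://localhost:6060", "What incidents are active right now?")` starts a conversation, and passing `first.chatId` as the fourth argument asks a follow-up question in the same conversation.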

Configuration

Core Runtime Settings

| Variable | Default | Description |
| --- | --- | --- |
| `PORT` | `6060` | HTTP port for the Copilot API |
| `MCP_URL` | `http://localhost:7070/mcp` | MCP endpoint URL |
| `LLM_PROVIDER` | `mock` | LLM provider: `mock`, `openai`, `anthropic`, or `gemini` |

LLM Provider Settings

OpenAI:

OPENAI_API_KEY (required)
OPENAI_MODEL (optional, default: gpt-4o)
OPENAI_BASE_URL (optional, for custom endpoints)

Anthropic:

ANTHROPIC_API_KEY (required)
ANTHROPIC_MODEL (optional, default: claude-3-5-sonnet-20241022)
ANTHROPIC_BASE_URL (optional, for custom endpoints)

Google Gemini:

GEMINI_API_KEY (required)
GEMINI_MODEL (optional, default: gemini-2.0-flash-exp)
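The rules above (a required API key per provider, plus optional model and base-URL overrides with the listed defaults) can be expressed as a small resolver. This is a hypothetical sketch, not Copilot's actual configuration loader: `resolveLlmConfig` and the `PROVIDERS` table are illustrative names, and Gemini gets no base-URL override because none is listed.

```typescript
// Hypothetical resolver for the documented LLM provider variables.
type Provider = "mock" | "openai" | "anthropic" | "gemini";

interface LlmConfig {
  provider: Provider;
  apiKey?: string;
  model?: string;
  baseUrl?: string;
}

type Env = Record<string, string | undefined>;

// Variable prefix and default model per provider, as listed above.
const PROVIDERS: Record<Exclude<Provider, "mock">, { prefix: string; defaultModel: string; hasBaseUrl: boolean }> = {
  openai: { prefix: "OPENAI", defaultModel: "gpt-4o", hasBaseUrl: true },
  anthropic: { prefix: "ANTHROPIC", defaultModel: "claude-3-5-sonnet-20241022", hasBaseUrl: true },
  gemini: { prefix: "GEMINI", defaultModel: "gemini-2.0-flash-exp", hasBaseUrl: false },
};

function resolveLlmConfig(env: Env): LlmConfig {
  const provider = (env.LLM_PROVIDER ?? "mock") as Provider;
  // The mock provider needs no credentials.
  if (provider === "mock") return { provider };
  const spec = PROVIDERS[provider];
  if (!spec) throw new Error(`unknown LLM_PROVIDER: ${provider}`);
  const apiKey = env[`${spec.prefix}_API_KEY`];
  if (!apiKey) throw new Error(`${spec.prefix}_API_KEY is required when LLM_PROVIDER=${provider}`);
  return {
    provider,
    apiKey,
    model: env[`${spec.prefix}_MODEL`] ?? spec.defaultModel,
    baseUrl: spec.hasBaseUrl ? env[`${spec.prefix}_BASE_URL`] : undefined,
  };
}
```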

Conversation Storage Settings

| Variable | Default | Description |
| --- | --- | --- |
| `CONVERSATION_STORE_TYPE` | `memory` | Storage backend: `memory` or `sqlite` |
| `SQLITE_DB_PATH` | `./data/conversations.db` | SQLite database file path (when using `sqlite`) |
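Likewise, the core runtime and storage variables could be loaded with their documented defaults as sketched below. `loadRuntimeConfig` is a hypothetical helper, and rejecting unknown store types and invalid ports up front is a design choice of this sketch, not documented Copilot behavior.

```typescript
// Hypothetical loader for the core runtime and storage variables.
type StoreType = "memory" | "sqlite";

interface RuntimeConfig {
  port: number;
  mcpUrl: string;
  storeType: StoreType;
  sqliteDbPath: string;
}

function isStoreType(value: string): value is StoreType {
  return value === "memory" || value === "sqlite";
}

// Applies the documented defaults and fails fast on invalid values.
function loadRuntimeConfig(env: Record<string, string | undefined>): RuntimeConfig {
  const port = Number(env.PORT ?? "6060");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  const storeType = env.CONVERSATION_STORE_TYPE ?? "memory";
  if (!isStoreType(storeType)) {
    throw new Error(`CONVERSATION_STORE_TYPE must be "memory" or "sqlite", got "${storeType}"`);
  }
  return {
    port,
    mcpUrl: env.MCP_URL ?? "http://localhost:7070/mcp",
    storeType,
    // Only used when storeType is "sqlite".
    sqliteDbPath: env.SQLITE_DB_PATH ?? "./data/conversations.db",
  };
}
```

At startup, `loadRuntimeConfig(process.env)` would apply these defaults.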

What Copilot Does

Copilot answers operational questions by orchestrating MCP tool calls and synthesizing evidence:

Incident Analysis – Retrieve recent/impactful incidents with context including related PagerDuty alerts, linked Jira tickets, and nearby logs/metrics
Incident History – Explain incident changes and timelines, e.g., "What triggered the severity escalation?"
Pattern Detection – Find similar incidents, e.g., "Has this service had similar incidents recently?"
Signal Correlation – Correlate metrics, e.g., "Is the p95 latency spike correlated with CPU, memory, or traffic?"
Root Cause Analysis – Match error signatures to past incidents and identify likely causes
Deployment Correlation – Correlate incidents with recent deployments and code changes
Service Dependencies – Discover service relationships and dependencies
Team Context – Identify on-call teams and escalation paths
Messaging Integration – Share findings via Slack or other messaging tools when needed

Question Coverage Examples

Basic Understanding:

Summarize an incident
Note changes made just before the incident started
Infer likely root cause from logs/metrics
Correlate with recent deployments
Pull last N minutes of related logs

Context & Relationships:

List dependent services
Find similar incidents for a service
Relate to earlier incidents
Identify severity escalation triggers

Causal Analysis:

Match error signatures to past incidents
Correlate latency spikes with CPU/memory/traffic
Distinguish DB vs network vs code issues
Compare against prior failures

Metrics:

Explain CPU spikes and latency anomalies
Surface metric anomalies for a service in a time window
Identify pods/nodes contributing most errors

Logs:

Query 500 errors for a service over a time window
Extract dominant error patterns
List IPs with most failed requests
Flag unusual log patterns

Correlation:

Align logs and metrics for a service
Test hypotheses like memory leaks
Find earliest signals of degradation
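Correlation questions such as "Is the p95 latency spike correlated with CPU, memory, or traffic?" can be approached by computing the Pearson correlation coefficient between time-aligned metric series. The sketch below illustrates that technique only: it assumes the series are already resampled to the same timestamps, and `pearson` and `rankCorrelates` are hypothetical names rather than the CorrelationDetector's API.

```typescript
// Pearson correlation of two equally long, time-aligned series.
// Returns NaN when either series is constant (zero variance).
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("series must be aligned and have at least two points");
  }
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
}

// Ranks candidate signals by the strength (absolute value) of their correlation with the target.
function rankCorrelates(
  target: number[],
  candidates: Record<string, number[]>,
): Array<{ name: string; r: number }> {
  return Object.entries(candidates)
    .map(([name, series]) => ({ name, r: pearson(target, series) }))
    .filter((c) => !Number.isNaN(c.r)) // drop flat series
    .sort((a, b) => Math.abs(b.r) - Math.abs(a.r));
}
```

A coefficient near +1 or -1 indicates a strong linear relationship; it flags a candidate cause to investigate rather than proving one.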

Stack and Boundaries

OpsOrch Copilot is part of a layered architecture:

UI Layer – opsorch-console (Next.js web UI)
AI Runtime – opsorch-copilot (this repo) – LLM prompts, reasoning, tool orchestration
Tools Layer – opsorch-mcp – Typed MCP tools wrapping Core APIs
Core Layer – opsorch-core – Source of truth for incidents, logs, metrics, services, tickets, messaging
Adapters – Provider-specific adapters (PagerDuty, Datadog, Jira, Slack, etc.)

Key Principle: Copilot never talks to OpsOrch Core directly. All interactions go through the MCP tools layer, which keeps concerns cleanly separated and gives Copilot a consistent, tool-based interface.

Architecture

Copilot implements a multi-step agentic reasoning loop that orchestrates LLM planning, tool execution, and answer synthesis:

Planning – LLM analyzes the question and plans which MCP tools to call
Execution – Tools are called in parallel with retry logic and result caching
Analysis – Handlers extract entities, detect anomalies, and suggest follow-ups
Refinement – If needed, additional tool calls are planned based on results
Synthesis – Final answer is generated with evidence and structured references
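The five steps can be sketched as a bounded loop. This simplified, synchronous illustration assumes hypothetical `Planner`, `runTool`, and `synthesize` functions and folds the analysis step into the planner; the real engine runs tools asynchronously and in parallel. The cap of 3 iterations mirrors the CopilotEngine limit listed below.

```typescript
// Simplified sketch of a plan/execute/refine/synthesize loop; all types are hypothetical.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface Evidence {
  call: ToolCall;
  result: unknown;
}

// Returns the next tool calls to make, or an empty array when the evidence suffices.
type Planner = (question: string, evidence: Evidence[]) => ToolCall[];

const MAX_ITERATIONS = 3;

function answerQuestion(
  question: string,
  plan: Planner,
  runTool: (call: ToolCall) => unknown,
  synthesize: (question: string, evidence: Evidence[]) => string,
): { answer: string; iterations: number } {
  const evidence: Evidence[] = [];
  let iterations = 0;
  // Planning and refinement share one loop: each pass sees all evidence gathered so far.
  while (iterations < MAX_ITERATIONS) {
    const calls = plan(question, evidence);
    if (calls.length === 0) break; // planner is satisfied; stop refining
    iterations++;
    for (const call of calls) {
      evidence.push({ call, result: runTool(call) });
    }
  }
  // Synthesis always runs, even when the iteration cap is reached.
  return { answer: synthesize(question, evidence), iterations };
}
```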

Key architectural components:

CopilotEngine – Main orchestration engine (max 3 iterations)
Planner – LLM-based tool call planning with heuristic fallback
ToolRunner – Parallel tool execution with caching and retry strategy
EntityExtractor – Extracts IDs, timestamps, and references from results
ReferenceResolver – Resolves pronouns like "that incident" to specific entities
FollowUpEngine – Suggests intelligent next actions based on results
AnswerGenerator – Synthesizes final answers with evidence
ConversationManager – Manages multi-turn conversation history
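The ToolRunner's combination of parallel execution, retries, and caching might be structured as in the sketch below. Everything here is an assumption for illustration: `callTool` stands in for the MCP client, the cache key is a JSON serialization of the tool name and arguments, and the retry policy (three attempts with exponential backoff from 100 ms) is not taken from Copilot's source.

```typescript
// Hypothetical ToolRunner sketch: parallel execution, retry with backoff, promise caching.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type CallTool = (call: ToolCall) => Promise<unknown>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a failing call, waiting baseDelayMs, then 2x, 4x, ... between attempts.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 100): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}

// Runs calls concurrently; identical calls share one cached promise.
// Note: argument key order affects the cache key in this sketch.
function createToolRunner(callTool: CallTool, cache = new Map<string, Promise<unknown>>()) {
  return (calls: ToolCall[]): Promise<unknown[]> =>
    Promise.all(
      calls.map((call) => {
        const key = JSON.stringify([call.name, call.args]);
        let pending = cache.get(key);
        if (!pending) {
          pending = withRetry(() => callTool(call));
          cache.set(key, pending);
          // Do not keep failures in the cache, so a later call can try again.
          pending.catch(() => cache.delete(key));
        }
        return pending;
      }),
    );
}
```

Caching the in-flight promise, rather than only the resolved value, also deduplicates identical calls issued in the same batch.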

See DESIGN.md for detailed architecture documentation and AGENTS.md for the layered system overview.

Capability-Based Handler Architecture

Copilot uses a capability-based handler system organized around nine core operational domains:

Nine Core Capabilities:

incident/ – Incident query and analysis
alert/ – Alert monitoring and investigation
log/ – Log search and analysis
metric/ – Metrics query and correlation
service/ – Service discovery and dependencies
ticket/ – Ticket linking and management
deployment/ – Deployment tracking and correlation
orchestration/ – Workflow orchestration and automation
team/ – Team management and on-call schedules

Handler Types (11 total): Each capability implements specialized handlers from this set:

| Handler Type | Purpose |
| --- | --- |
| Intent | Classifies user intent for the capability |
| Entity | Extracts structured entities (IDs, timestamps) from tool results |
| Follow-up | Suggests intelligent next actions based on results |
| Validation | Validates tool call arguments and normalizes them |
| Scope | Infers query scope (service, environment, team) from context |
| Reference | Resolves pronouns like "that incident" to specific entity IDs |
| Correlation | Detects correlations between events (incidents, logs, metrics) |
| Anomaly | Detects anomalies in metric time series data |
| QueryBuilder | Constructs tool-specific queries from natural language |
| ServiceDiscovery | Discovers available services from MCP |
| ServiceMatching | Performs fuzzy matching of service names in questions |
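As an illustration of the ServiceMatching row, fuzzy matching of service names could combine exact matching with Levenshtein edit distance to tolerate typos. This is a hypothetical sketch: `matchServices`, the tokenization, and the length-scaled distance budget are assumptions, and the actual handler may use a different strategy.

```typescript
// Dynamic-programming Levenshtein distance (insertions, deletions, substitutions).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Lowercases and replaces everything except letters, digits, and hyphens with spaces.
const normalizeName = (s: string) => s.toLowerCase().replace(/[^a-z0-9-]+/g, " ").trim();

// Hypothetical matcher: returns the known services mentioned in a question, tolerating small typos.
function matchServices(question: string, knownServices: string[], maxDistance = 2): string[] {
  const tokens = normalizeName(question).split(/\s+/).filter(Boolean);
  const padded = ` ${tokens.join(" ")} `;
  const matches: string[] = [];
  for (const service of knownServices) {
    const target = normalizeName(service);
    // Short names (e.g. "api") must match exactly; longer names tolerate more typos.
    const budget = Math.min(maxDistance, Math.floor(target.length / 4));
    const exact = padded.includes(` ${target} `);
    if (exact || tokens.some((t) => levenshtein(t, target) <= budget)) {
      matches.push(service);
    }
  }
  return matches;
}
```

Scaling the budget with name length keeps short names from matching ordinary words that happen to be one or two edits away.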

Development

Project Structure

```
src/
├── engine/              # Core orchestration and reasoning
│   ├── handlers/        # Capability-specific handlers
│   │   ├── incident/    # Incident analysis handlers
│   │   ├── alert/       # Alert monitoring handlers
│   │   ├── log/         # Log search handlers
│   │   ├── metric/      # Metrics analysis handlers
│   │   ├── service/     # Service discovery handlers
│   │   ├── ticket/      # Ticket management handlers
│   │   ├── deployment/  # Deployment tracking handlers
│   │   ├── orchestration/ # Workflow handlers
│   │   ├── team/        # Team management handlers
│   │   └── shared/      # Shared utilities
│   ├── copilotEngine.ts # Main orchestration engine
│   ├── planner.ts       # LLM-based tool planning
│   ├── toolRunner.ts    # Tool execution with retry logic
│   ├── entityExtractor.ts # Entity extraction from results
│   ├── referenceResolver.ts # Reference resolution
│   ├── followUpEngine.ts # Follow-up suggestion engine
│   └── answerGenerator.ts # Answer synthesis
├── llms/                # LLM provider adapters
├── mcps/                # MCP client implementations
├── stores/              # Conversation storage backends
└── server.ts            # HTTP API server
```

Running Locally

Start the full OpsOrch stack:

Start Core (port 8080):

```
cd ../opsorch-core && go run ./cmd/opsorch
```

Start MCP (port 7070):
```
cd ../opsorch-mcp && npm run dev
```

Start Copilot (port 6060):

```
cd opsorch-copilot
npm install
MCP_URL=http://localhost:7070/mcp \
LLM_PROVIDER=mock \
npm run dev
```

Start Console (port 3000):
```
cd ../opsorch-console && npm run dev
```

Available Scripts

npm run dev – Start development server with hot reload
npm start – Start production server
npm test – Run all tests
npm run type-check – TypeScript type checking
npm run lint – Lint code
npm run lint:fix – Fix linting issues
npm run build – Build for production
npm run seed – Seed database with sample conversations

Environment Variables

See the Configuration section above for all available environment variables.

HTTP API

The Copilot server exposes a REST API for chat interactions and conversation management.

Endpoints:

POST /chat – Submit a question and get an AI-generated answer
- Request body: { "message": "<question>", "chatId": "<conversation-id>" }, where chatId is optional
- Response: { "chatId": "<id>", "name": "<conversation-name>", "answer": { ... } }
- The answer object includes:
  - conclusion – Short summary answer
  - evidence – Supporting data and findings
  - references – Structured references for deep linking:
    - incidents[] – Incident IDs
    - services[] – Service names
    - tickets[] – Ticket IDs
    - alerts[] – Alert IDs
    - metrics[] – Metric queries with {expression, start, end, step}
    - logs[] – Log queries with {query, start, end, service}
  - missing – Notes about unavailable data
- If chatId is omitted, a new conversation is created and its ID is returned
GET /health – Health check endpoint
- Response: { "status": "ok" }
GET /chats – List all saved conversations with pagination
- Query parameters:
  - limit (optional) – Maximum number of results to return
  - offset (optional) – Number of results to skip (default: 0)
- Response: { "conversations": [...], "pagination": { total, offset, limit, hasMore } }
- Each conversation includes: chatId, name, createdAt, lastAccessedAt, turnCount, preview
- Results are sorted by most recent access first
GET /chats/search – Search conversations by content
- Query parameters:
  - query (required) – Search query string
  - limit (optional) – Maximum number of results (default: 50)
- Response: { "query": "...", "limit": 50, "totalResults": N, "results": [...] }
- Searches across conversation names, user messages, and assistant responses
GET /chats/:id – Retrieve a specific conversation by ID
- Response: { "conversation": { chatId, name, turns, createdAt, lastAccessedAt } }
- Returns 404 if conversation not found or expired
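To retrieve every saved conversation, a client can keep requesting `GET /chats` and advance `offset` until `pagination.hasMore` is false. The sketch below assumes the response shapes listed above; `chatsUrl`, `searchUrl`, and `listAllChats` are hypothetical helpers, and the page fetcher is injected so the paging logic runs without a server.

```typescript
// Hypothetical helpers for the /chats endpoints; shapes follow the API description above.
interface ChatSummary {
  chatId: string;
  name: string;
  [key: string]: unknown;
}

interface ChatsPage {
  conversations: ChatSummary[];
  pagination: { total: number; offset: number; limit: number; hasMore: boolean };
}

function chatsUrl(baseUrl: string, limit: number, offset: number): string {
  const url = new URL("/chats", baseUrl);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("offset", String(offset));
  return url.toString();
}

// URLSearchParams takes care of encoding spaces and special characters in the query.
function searchUrl(baseUrl: string, query: string, limit = 50): string {
  const url = new URL("/chats/search", baseUrl);
  url.searchParams.set("query", query);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Walks the pages in order until the server reports no more results.
async function listAllChats(
  fetchPage: (url: string) => Promise<ChatsPage>,
  baseUrl: string,
  pageSize = 20,
): Promise<ChatSummary[]> {
  const all: ChatSummary[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(chatsUrl(baseUrl, pageSize, offset));
    all.push(...page.conversations);
    if (!page.pagination.hasMore || page.conversations.length === 0) return all;
    offset += page.conversations.length;
  }
}
```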

Conversation Storage

Copilot supports two storage backends for conversation persistence:

In-Memory Storage (Default)

Best for development and testing:

Conversations stored in memory with LRU eviction
Data is lost on server restart
No configuration required
Fast and lightweight

```
# No configuration needed - this is the default
npm run dev
```
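The LRU eviction mentioned above can be illustrated with a JavaScript `Map`, which iterates keys in insertion order: re-inserting a key on each access moves it to the end, so the first key is always the least recently used. `LruConversationCache` is a hypothetical class, and its capacity is an arbitrary parameter because the store's actual limit is not documented here.

```typescript
// Hypothetical LRU cache: Map preserves insertion order, so the first key is the least recently used.
class LruConversationCache<V> {
  private readonly entries = new Map<string, V>();

  constructor(private readonly capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
  }

  get(chatId: string): V | undefined {
    if (!this.entries.has(chatId)) return undefined;
    const value = this.entries.get(chatId) as V;
    // Re-insert to mark this conversation as most recently used.
    this.entries.delete(chatId);
    this.entries.set(chatId, value);
    return value;
  }

  set(chatId: string, value: V): void {
    this.entries.delete(chatId);
    this.entries.set(chatId, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used conversation (first key in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```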

SQLite Storage

Best for production and demos:

Conversations persist across server restarts
Stored in a local SQLite database file
Same LRU eviction behavior as in-memory storage
Supports full-text search across conversations

Configuration:

```
# Enable SQLite storage
export CONVERSATION_STORE_TYPE=sqlite

# Optional: specify database file path (default: ./data/conversations.db)
export SQLITE_DB_PATH=/path/to/conversations.db

npm run dev
```

Docker Compose example:

```
services:
  copilot:
    image: opsorch-copilot:latest
    environment:
      - CONVERSATION_STORE_TYPE=sqlite
      - SQLITE_DB_PATH=/data/conversations.db
    volumes:
      - copilot-data:/data
volumes:
  copilot-data:
```

Backup and Recovery:

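For SQLite storage, back up the database file regularly. Copy the file while Copilot is stopped (the server closes the database cleanly on shutdown); copying a database that is being written to can produce an inconsistent backup:

```
# Backup
cp /path/to/conversations.db /path/to/backup/conversations-$(date +%Y%m%d).db

# Restore
cp /path/to/backup/conversations-20250122.db /path/to/conversations.db
```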

Graceful Shutdown:

The server handles SIGTERM and SIGINT signals gracefully, ensuring the SQLite database is properly closed before exit.

Testing

Running Tests

```
# Run all tests
npm test

# Type checking
npm run type-check

# Linting
npm run lint
```

Test Coverage

The test suites cover:

Engine & Orchestration:

CopilotEngine – Planning loop, iteration limits, multi-turn conversations
Planner – LLM planning, JSON fallback, heuristic fallback
ToolRunner – Tool execution, result normalization, error handling
ParallelToolRunner – Concurrent execution, ordering, deduplication
ResultCache – Cache hits/misses, invalidation
EntityExtractor – Entity extraction from various tool result structures
ReferenceResolver – Reference resolution with conversation history
FollowUpEngine – Follow-up suggestion generation and deduplication
ExecutionTracer – Trace creation, telemetry, and diagnostics

Capability Handlers:

Intent Classification – Pattern matching, service extraction, tool injection
Entity Extraction – ID extraction, entity type detection, nested structure handling
Scope Inference – Scope detection from context, intelligent parameterization
Reference Handlers – Pronoun resolution, entity linking, temporal references
Validation – Tool call validation and argument normalization
Follow-up – Context-aware follow-up suggestions

Conversation Management:

ConversationManager – Turn storage, retrieval, LRU eviction
ConversationStore – In-memory and SQLite persistence
ConversationSearch – Full-text search, filtering, result ranking

Analysis & Synthesis:

CorrelationDetector – Correlation detection, root cause identification
AnomalyDetector – Anomaly detection, trend analysis
TimeWindowExpander – Window expansion, capping calculations
AnswerFormatter – Evidence aggregation, reference formatting

Utilities:

ChatNamer – Conversation name generation and synthesis
ServiceDiscovery – Service lookup and caching
TimestampUtils – Timestamp parsing and formatting
MetricUtils – Metric parsing and aggregation
ToolsSchema – Tool schema validation

Testing Patterns

MockMcp – Simulates MCP tool responses without network calls
Temporary SQLite databases – Each SQLite test uses a temporary database file cleaned up after test runs
Conversation fixtures – Pre-built conversation data for testing multi-turn flows
Tool result mocking – Realistic tool responses for testing handlers and synthesis
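A test double in the spirit of MockMcp might look like the sketch below: it serves canned results keyed by tool name and records each call so a test can assert on what the engine requested. The `McpClient` interface and its method names are assumptions for illustration, not Copilot's actual interfaces.

```typescript
// Hypothetical MCP client interface the engine would depend on.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface McpClient {
  listTools(): Promise<string[]>;
  callTool(call: ToolCall): Promise<unknown>;
}

// In-memory fake: no network, canned responses, and a log of every call made.
class MockMcp implements McpClient {
  readonly calls: ToolCall[] = [];

  constructor(private readonly fixtures: Record<string, (args: Record<string, unknown>) => unknown>) {}

  async listTools(): Promise<string[]> {
    return Object.keys(this.fixtures);
  }

  async callTool(call: ToolCall): Promise<unknown> {
    this.calls.push(call);
    const fixture = this.fixtures[call.name];
    if (!fixture) throw new Error(`MockMcp: no fixture for tool "${call.name}"`);
    return fixture(call.args);
  }
}
```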

Integration Testing

Start the full stack for end-to-end testing:

Start Core: cd ../opsorch-core && go run ./cmd/opsorch
Start MCP: cd ../opsorch-mcp && npm run dev
Start Copilot: npm run dev
Start Console: cd ../opsorch-console && npm run dev

Test through the Console UI at http://localhost:3000, or call http://localhost:6060/chat directly.

Seeding the Database

To populate the database with realistic sample conversations for testing or demo purposes:

```
npm run seed
```

This command:

Clears any existing conversations in the database
Generates 30 realistic operational conversations covering various scenarios:
- Incident investigations (high error rates, service outages)
- Service health checks and monitoring
- Performance issues (latency spikes, memory leaks)
- Database and infrastructure problems
- Deployment verifications
- SSL certificate management
- Rate limiting and cache issues
Populates conversations with realistic tool results, timestamps, and entities
Distributes conversations across the last 30 days

The seed script writes to the database path set in the SQLITE_DB_PATH environment variable, defaulting to ./data/conversations.db.

Note: Seeding requires SQLite storage. Set CONVERSATION_STORE_TYPE=sqlite before running the seed command.

License

Apache-2.0. See LICENSE.
