13 changes: 13 additions & 0 deletions astro.config.mjs
@@ -78,6 +78,19 @@ export default defineConfig({
    { label: 'OpenShift Setup', slug: 'deployment/openshift-setup' },
  ],
},
{
  label: 'MCP',
  items: [
    { label: 'Overview', slug: 'mcp' },
    { label: 'Installation', slug: 'mcp/installation' },
    { label: 'Quick Start', slug: 'mcp/quickstart' },
    { label: 'Configuration', slug: 'mcp/configuration' },
    { label: 'Tool Reference', slug: 'mcp/tools' },
    { label: 'Resource Reference', slug: 'mcp/resources' },
    { label: 'Prompt Reference', slug: 'mcp/prompts' },
    { label: 'Troubleshooting', slug: 'mcp/troubleshooting' },
  ],
},
{
  label: 'Adapters',
  items: [
169 changes: 169 additions & 0 deletions src/content/docs/mcp/configuration.md
@@ -0,0 +1,169 @@
---
title: "Configuration"
---

The EvalHub MCP server can be configured through CLI flags, a YAML configuration file, or environment variables. When multiple sources set the same value, **CLI flags take highest precedence**, followed by the config file, then environment variables.

## CLI Flags

```
evalhub-mcp [flags]
```

| Flag | Default | Description |
|------|---------|-------------|
| `--transport` | `stdio` | Transport mode: `stdio`, `http`, or `http-sse` |
| `--host` | `localhost` | Bind address for HTTP transports |
| `--port` | `3001` | Port for HTTP transports |
| `--config` | — | Path to YAML configuration file |
| `--insecure` | `false` | Skip TLS certificate verification for the EvalHub backend |
| `--tls-cert` | — | Path to TLS certificate file (for HTTPS on the MCP server) |
| `--tls-key` | — | Path to TLS private key file (for HTTPS on the MCP server) |
| `--version` | — | Print version and exit |

Both `--tls-cert` and `--tls-key` must be provided together. When set, the HTTP server listens over HTTPS.
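A wrapper script can catch a half-configured TLS pair before the server starts. The following is a minimal sketch of such a pre-flight check; the function name `check_tls_pair` and its error messages are hypothetical and are not part of `evalhub-mcp`, whose own error output is not documented here.

```bash
# Hypothetical pre-flight check, not part of evalhub-mcp itself.
# Succeeds when both TLS paths are set or both are empty; fails otherwise.
# Usage: check_tls_pair <cert-path> <key-path>
check_tls_pair() {
  if [ -n "$1" ] && [ -z "$2" ]; then
    echo "error: --tls-cert was given without --tls-key" >&2
    return 1
  fi
  if [ -z "$1" ] && [ -n "$2" ]; then
    echo "error: --tls-key was given without --tls-cert" >&2
    return 1
  fi
  return 0
}
```

The check only looks at whether the two values are present together; it does not verify that the files exist or that the key matches the certificate.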

## Configuration File

Pass `--config <path>` to load settings from a YAML file:

```yaml
# evalhub-mcp.yaml
base_url: https://evalhub.apps.my-cluster.example.com
token: <your-api-token>
tenant: my-team
transport: http
host: 0.0.0.0
port: 3001
insecure: false
```

## Environment Variables

| Variable | Description |
|----------|-------------|
| `EVALHUB_BASE_URL` | EvalHub backend API URL |
| `EVALHUB_TOKEN` | Authentication token |
| `EVALHUB_TENANT` | Tenant identifier |
| `EVALHUB_TRANSPORT` | Transport mode (`stdio`, `http`, `http-sse`) |
| `EVALHUB_HOST` | HTTP bind address |
| `EVALHUB_PORT` | HTTP port |
| `EVALHUB_INSECURE` | Skip TLS verification for EvalHub backend (`true`/`false`) |
| `EVALHUB_TLS_CERT_FILE` | Path to TLS certificate |
| `EVALHUB_TLS_KEY_FILE` | Path to TLS private key |
| `EVALHUB_LIST_PAGE_LIMIT` | Default page size for list resources |

## Precedence

When the same setting is specified in multiple places:

1. **CLI flags** (highest priority)
2. **YAML config file** (if `--config` is used)
3. **Environment variables** (lowest priority)

For example, if `EVALHUB_TRANSPORT=http` is set as an environment variable but you run `evalhub-mcp --transport stdio`, the server uses stdio.
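The resolution order can be pictured as "first non-empty value wins". The sketch below illustrates that rule only; it is not the server's implementation. The function `resolve_setting` is hypothetical, and treating an empty string as "not set" is an assumption.

```bash
# Hypothetical helper illustrating the documented precedence order.
# Prints the first non-empty value among its arguments.
# Usage: resolve_setting <cli-flag-value> <config-file-value> <env-value> <default>
resolve_setting() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"   # CLI flag (highest priority)
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"   # YAML config file
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"   # environment variable
  else
    printf '%s\n' "$4"   # built-in default
  fi
}
```

With this helper, `resolve_setting stdio "" http stdio` prints `stdio`, matching the example above where the `--transport stdio` flag overrides `EVALHUB_TRANSPORT=http`.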

## Kubernetes Operator

When EvalHub is deployed via the TrustyAI operator, the MCP server is configured through the `spec.mcp` section of the EvalHub custom resource:

```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: EvalHub
metadata:
  name: evalhub
  namespace: my-namespace
spec:
  replicas: 1
  mcp:
    enabled: true
    replicas: 1
    transport: http
    image: quay.io/evalhub/evalhub-mcp:latest
    authSecret: mcp-auth-token
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
    env:
      - name: LOG_LEVEL
        value: "debug"
```

### Operator MCP Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `false` | Enable MCP server deployment |
| `replicas` | int | `1` | Number of MCP server replicas |
| `transport` | string | `http` | Client-facing transport (`http` or `http-sse`) |
| `evalHubTransport` | string | `http` | Transport for internal EvalHub API calls |
| `image` | string | `quay.io/evalhub/evalhub-mcp:latest` | Container image override |
| `authSecret` | string | — | Kubernetes Secret containing a `token` key for EvalHub API auth |
| `resources` | ResourceRequirements | 100m/128Mi request, 500m/256Mi limit | Container resource requests and limits |
| `env` | []EnvVar | — | Additional environment variables |

### What the Operator Creates

When `spec.mcp.enabled` is `true`, the operator automatically creates:

- **Deployment** (`<name>-mcp`): Runs the MCP server container with health checks
- **Service** (`<name>-mcp`): ClusterIP service on port 8443
- **ConfigMap** (`<name>-mcp-config`): Server configuration YAML
- **Route** (OpenShift only, `<name>-mcp`): Edge-terminated TLS route for external access

On OpenShift, TLS certificates are provisioned automatically by the platform's service serving certificate signing.

### Checking MCP Status

```bash
kubectl get evalhub <name> -o jsonpath='{.status.mcp}'
```

The status includes:
- `phase`: `Pending`, `Ready`, `Error`, or `Disabled`
- `ready`: Whether the MCP deployment is available
- `url`: Internal service URL
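In a deployment script it can be useful to wait until the phase becomes `Ready`. The following is a rough sketch of such a polling loop, built on the same `kubectl` JSONPath query as above. The function name `wait_for_mcp_ready`, the default attempt count, and the delay are hypothetical choices, and the path `.status.mcp.phase` is assumed from the status fields listed above.

```bash
# Hypothetical helper: polls the MCP phase of an EvalHub resource.
# Usage: wait_for_mcp_ready <evalhub-name> [max-attempts] [sleep-seconds]
wait_for_mcp_ready() {
  name="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  phase=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(kubectl get evalhub "$name" -o jsonpath='{.status.mcp.phase}')"
    case "$phase" in
      Ready)
        echo "MCP server is ready"
        return 0
        ;;
      Error|Disabled)
        # Neither phase is expected to resolve by waiting longer.
        echo "MCP server is in phase: $phase" >&2
        return 1
        ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for MCP server (last phase: ${phase:-unknown})" >&2
  return 1
}
```

The function returns a non-zero status on `Error`, `Disabled`, or timeout, so it can gate later steps, for example `wait_for_mcp_ready evalhub && echo "continue"`.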

## Example Configurations

### Local Development

```bash
export EVALHUB_BASE_URL="http://localhost:8080"
export EVALHUB_TOKEN="dev-token"
export EVALHUB_TENANT="default"

evalhub-mcp --transport stdio
```

### Shared Team Server

```yaml
# team-mcp.yaml
base_url: https://evalhub.apps.cluster.example.com
token: <team-service-account-token>
tenant: team-a
transport: http
host: 0.0.0.0
port: 3001
```

```bash
evalhub-mcp --config team-mcp.yaml
```

### Secure Production Server

```bash
evalhub-mcp \
  --transport http \
  --host 0.0.0.0 \
  --port 8443 \
  --tls-cert /etc/tls/server.crt \
  --tls-key /etc/tls/server.key \
  --config /etc/evalhub-mcp/config.yaml
```
62 changes: 62 additions & 0 deletions src/content/docs/mcp/index.md
@@ -0,0 +1,62 @@
---
title: "MCP Overview"
---

The EvalHub MCP server implements the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP), enabling AI coding assistants such as Claude Code, VS Code with GitHub Copilot, and other MCP-compatible clients to interact with EvalHub directly from a conversation.

## What is MCP?

MCP is an open standard that lets AI assistants connect to external tools and data sources through a unified protocol. Instead of manually copying commands or switching between terminal windows, your AI assistant can submit evaluations, check job status, browse benchmarks, and follow structured evaluation workflows — all through natural language.

## What the EvalHub MCP Server Provides

### Tools

Actions the AI assistant can execute on your behalf:

| Tool | Description |
|------|-------------|
| `submit_evaluation` | Submit a new model evaluation job with benchmarks or a collection |
| `get_job_status` | Check job progress, state, and per-benchmark status |
| `cancel_job` | Cancel a running or pending evaluation job |

### Resources

Read-only data the assistant can query using `evalhub://` URIs:

| Resource | URI | Description |
|----------|-----|-------------|
| Providers | `evalhub://providers` | List evaluation providers and their benchmarks |
| Benchmarks | `evalhub://benchmarks` | Browse benchmarks, filter by label |
| Collections | `evalhub://collections` | List pre-defined benchmark collections |
| Jobs | `evalhub://jobs` | List evaluation jobs, filter by status |
| Server Version | `evalhub://server/version` | Server build and version metadata |

All list resources support pagination (`?limit=N&offset=N`). Benchmarks support label filtering (`?label=rag&label=safety`). Jobs support status filtering (`?status=running`).
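To make the query syntax concrete, here is a small sketch that assembles a resource URI from a resource name and `key=value` pairs. The helper `evalhub_uri` is hypothetical and exists only for illustration. It performs no URL encoding, and whether filters and pagination can be combined in a single URI is an assumption.

```bash
# Hypothetical helper: builds an evalhub:// resource URI with query parameters.
# Usage: evalhub_uri <resource> [key=value ...]
evalhub_uri() {
  uri="evalhub://$1"
  shift
  sep='?'
  for pair in "$@"; do
    # First parameter is joined with '?', the rest with '&'.
    uri="${uri}${sep}${pair}"
    sep='&'
  done
  printf '%s\n' "$uri"
}
```

For example, `evalhub_uri benchmarks label=rag label=safety` prints `evalhub://benchmarks?label=rag&label=safety`, and `evalhub_uri jobs status=running` prints `evalhub://jobs?status=running`.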

### Prompts

Structured conversation templates that guide the assistant through complex workflows:

| Prompt | Description |
|--------|-------------|
| `edd_workflow` | Evaluation-Driven Development cycle: Define, Measure, Iterate |
| `evaluate_model` | Step-by-step model evaluation from discovery to results |
| `compare_runs` | Compare metrics across two or more evaluation jobs |

## Transport Modes

The MCP server supports multiple transport modes for different deployment scenarios:

| Mode | Flag | Use Case |
|------|------|----------|
| **stdio** | `--transport stdio` | Local development. The AI client launches the server as a subprocess. |
| **Streamable HTTP** | `--transport http` | Remote or shared deployments. The server runs as a standalone HTTP service. |
| **Legacy HTTP+SSE** | `--transport http-sse` | Older MCP clients that don't support Streamable HTTP. |

## Next Steps

- [Install the MCP server](/mcp/installation/) on your platform
- Follow the [Quick Start](/mcp/quickstart/) to connect your AI assistant in under 5 steps
- Browse the [Tool](/mcp/tools/), [Resource](/mcp/resources/), and [Prompt](/mcp/prompts/) references
- See [Configuration](/mcp/configuration/) for all available options
108 changes: 108 additions & 0 deletions src/content/docs/mcp/installation.md
@@ -0,0 +1,108 @@
---
title: "Installation"
---

import { Tabs, TabItem } from '@astrojs/starlight/components';

The `evalhub-mcp` binary is a standalone server that connects AI assistants to EvalHub. It is available for macOS (Intel and Apple Silicon), Linux (amd64 and arm64), and as a container image.

## Prerequisites

- An EvalHub instance (running locally or on a cluster) with a reachable API endpoint
- An authentication token for your EvalHub tenant
- An MCP-compatible AI client ([Claude Code](https://docs.anthropic.com/en/docs/claude-code), [VS Code with GitHub Copilot](https://code.visualstudio.com/), or another MCP client)

## Install the Binary

<Tabs>
<TabItem label="Homebrew (macOS / Linux)">

```bash
brew install evalhub-mcp
```

Verify:

```bash
evalhub-mcp --version
```

</TabItem>
<TabItem label="GitHub Releases">

Download the binary for your platform from [GitHub Releases](https://github.com/eval-hub/eval-hub/releases):

```bash
# macOS (Apple Silicon)
curl -Lo evalhub-mcp https://github.com/eval-hub/eval-hub/releases/latest/download/evalhub-mcp-darwin-arm64

# macOS (Intel)
curl -Lo evalhub-mcp https://github.com/eval-hub/eval-hub/releases/latest/download/evalhub-mcp-darwin-amd64

# Linux (amd64)
curl -Lo evalhub-mcp https://github.com/eval-hub/eval-hub/releases/latest/download/evalhub-mcp-linux-amd64

# Linux (arm64)
curl -Lo evalhub-mcp https://github.com/eval-hub/eval-hub/releases/latest/download/evalhub-mcp-linux-arm64
```
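If you would rather not pick the file by hand, the asset name can be derived from `uname`. This is a minimal sketch under the assumption that release assets follow the `evalhub-mcp-<os>-<arch>` naming shown above; the function `evalhub_mcp_asset` is hypothetical.

```bash
# Hypothetical helper: maps `uname -s` and `uname -m` output to a release asset name.
# Usage: evalhub_mcp_asset <uname -s output> <uname -m output>
evalhub_mcp_asset() {
  case "$1" in
    Darwin) os=darwin ;;
    Linux)  os=linux ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64)  arch=amd64 ;;
    arm64|aarch64) arch=arm64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  printf 'evalhub-mcp-%s-%s\n' "$os" "$arch"
}

# Example use (not run here):
#   asset="$(evalhub_mcp_asset "$(uname -s)" "$(uname -m)")"
#   curl -Lo evalhub-mcp "https://github.com/eval-hub/eval-hub/releases/latest/download/$asset"
```

The function fails with a non-zero status on platforms outside the four listed above rather than guessing an asset name.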

Make it executable and move it to your PATH:

```bash
chmod +x evalhub-mcp
sudo mv evalhub-mcp /usr/local/bin/
```

Verify:

```bash
evalhub-mcp --version
```

</TabItem>
<TabItem label="Build from Source">

Requires Go 1.25 or later.

```bash
git clone https://github.com/eval-hub/eval-hub.git
cd eval-hub
make build-mcp
```

The binary is placed in `./bin/evalhub-mcp`. Move it to your PATH:

```bash
sudo mv ./bin/evalhub-mcp /usr/local/bin/
```

</TabItem>
</Tabs>

## Kubernetes / OpenShift Deployment

If EvalHub is managed by the TrustyAI operator, the MCP server can be deployed alongside it, as its own Deployment, by enabling it in the EvalHub custom resource:

```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: EvalHub
metadata:
  name: evalhub
spec:
  replicas: 1
  mcp:
    enabled: true
    replicas: 1
```

The operator creates a Deployment, Service, ConfigMap, and (on OpenShift) a Route for the MCP server automatically. See [Configuration](/mcp/configuration/#kubernetes-operator) for all available fields.

## Using the EvalHub CLI as an MCP Server

If you already have the [EvalHub CLI](/guides/cli/) installed and configured, you can use it as an MCP server directly without installing `evalhub-mcp` separately:

```bash
claude mcp add evalhub -- evalhub --profile <profile-name> mcp
```

This uses the CLI's built-in `mcp` subcommand with an existing CLI profile for authentication. See the [Quick Start](/mcp/quickstart/) for the full setup flow using either approach.