Summary
Kagent currently operates as a single-cluster deployment: the UI talks to one controller (port 8083), which proxies requests to agent pods in the same cluster. This proposal introduces a hub and spoke architecture that enables a central UI to manage agents across multiple Kubernetes clusters.
Motivation
Organizations running Kubernetes across multiple clusters (multi-cloud, on-prem, hybrid) need a unified management plane for their AI agents. Today, each cluster requires its own kagent UI instance with no cross-cluster visibility or management.
A hub and spoke model would provide:
- Unified view of agents, conversations, and tools across all clusters
- Single pane of glass for operations teams
- Cross-cluster agent communication via A2A protocol
- Centralized auth with a shared OIDC provider (aligns with EP-476 Dex plans)
- Spoke autonomy — each cluster continues to function independently if the hub is unavailable
Proposed Architecture
               ┌───────────────────────────┐
               │       Hub (Central)       │
               │   ┌───────────────────┐   │
               │   │    Next.js UI     │   │
               │   │  (cluster-aware)  │   │
               │   └─────────┬─────────┘   │
               │             │             │
               │   ┌─────────▼─────────┐   │
               │   │  Federation API   │   │
               │   │ (new Go service)  │   │
               │   └──┬──────┬──────┬──┘   │
               │      │      │      │      │
               │    ┌─▼──┐ ┌─▼──┐ ┌─▼──┐   │
               │    │ DB │ │Auth│ │Reg.│   │
               │    └────┘ └────┘ └────┘   │
               └──────┼──────┼──────┼──────┘
                      │      │      │
         ┌────────────┘      │      └────────────┐
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Spoke: Cluster A│ │ Spoke: Cluster B│ │ Spoke: Cluster C│
│ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Controller  │ │ │ │ Controller  │ │ │ │ Controller  │ │
│ │ (port 8083) │ │ │ │ (port 8083) │ │ │ │ (port 8083) │ │
│ └──────┬──────┘ │ │ └──────┬──────┘ │ │ └──────┬──────┘ │
│ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │
│ │ Agent Pods  │ │ │ │ Agent Pods  │ │ │ │ Agent Pods  │ │
│ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Key Components
1. Federation API (New Go Service)
A lightweight Go service deployed in the hub cluster:
- Spoke Registry — stores cluster endpoints, credentials, and health status
- Request Routing — proxies API calls to the correct spoke controller based on cluster context
- Aggregation — merges agent/conversation/tool lists from all spokes for unified views
- Auth Gateway — single OIDC/auth layer (shared Dex instance per EP-476)
- Health Monitoring — periodic health checks on spoke controllers
API design wraps existing controller endpoints with a cluster prefix:
GET /api/clusters # list registered spokes
GET /api/clusters/{clusterId}/agents # agents on a specific spoke
POST /api/clusters/{clusterId}/a2a/{ns}/{name} # A2A proxy to specific spoke
GET /api/agents # aggregated view across all spokes
2. UI Changes (Cluster-Aware Next.js)
Modify the existing Next.js UI to support multi-cluster context:
- Cluster Selector — top-level nav element to switch cluster context
- Aggregated Views — agent list, conversations, tools show data across all clusters with cluster badges
- Server Actions — update `ui/src/app/actions/` to route through the federation API with cluster context
- A2A Streaming — update `KagentA2AClient` to include cluster context when proxying SSE streams

Key files to modify:
- `ui/src/app/actions/` — all server actions get a `clusterId` parameter
- `ui/src/components/` — add cluster context/selector components
- `ui/src/lib/a2aClient.ts` — route A2A calls through the federation API
- New `ui/src/app/clusters/` — cluster management pages
3. Spoke Agent
A small agent deployed alongside the existing controller in each spoke cluster:
- Registers itself with the hub on startup
- Exposes a secure endpoint for the hub to reach the local controller
- Handles mTLS or token-based auth for hub-to-spoke communication
- Reports cluster metadata (version, capacity, agent count)
Connectivity Options
| Approach | When to Use |
|---|---|
| Direct (Hub → Spoke Ingress) | Spokes are reachable from hub network; expose controller via Ingress/Gateway with mTLS |
| Tunnel (Spoke → Hub) | Spokes behind firewalls; spoke agent opens persistent gRPC/WebSocket tunnel to hub |
| Hybrid | Mix based on network topology |
The tunnel approach is more practical for real-world multi-cloud/on-prem deployments since spokes often can't expose inbound ports.
Design Principles
- Hub has no agents — purely a management/routing plane. Agents always run in spokes. Keeps the blast radius local.
- Spokes remain autonomous — if the hub goes down, each spoke still works independently with its local controller. Critical for resilience.
- Leverage existing A2A protocol — the current A2A JSON-RPC + SSE streaming model works as-is. The federation API adds a routing layer on top.
- Database strategy — hub gets its own PostgreSQL for cluster registry + aggregated metadata. Spokes keep their own SQLite/Postgres. No shared database across clusters.
Implementation Phases
Phase 1 — Federation API + Cluster Registry
- New Go module under `go/federation/` using existing `go/api/types`
- Spoke registration (CRD or DB-backed registry)
- Proxy layer that forwards requests to spoke controllers
- Health check loop
Phase 2 — UI Multi-Cluster Support
- Cluster context provider in React (extend existing Zustand store)
- Cluster selector component
- Update all server actions to accept cluster context
- Aggregated list views with cluster filtering
Phase 3 — Spoke Agent + Secure Connectivity
- Helm chart for spoke agent deployment
- mTLS or OIDC token exchange between hub and spokes
- Tunnel mode for restricted network environments
Phase 4 — Cross-Cluster Agent Communication
- A2A routing across clusters (Agent in Cluster A talks to Agent in Cluster B)
- Federation API becomes the A2A router for cross-cluster messages
- Cross-cluster addressing scheme: `agent-name.namespace.cluster-id`