Summary
Kagent currently operates as a single-cluster deployment: the UI talks to one controller (port 8083), which proxies requests to agent pods in the same cluster. This proposal introduces a hub and spoke architecture that enables a central UI to manage agents across multiple Kubernetes clusters.
Motivation
Organizations running Kubernetes across multiple clusters (multi-cloud, on-prem, hybrid) need a unified management plane for their AI agents. Today, each cluster requires its own kagent UI instance with no cross-cluster visibility or management.
A hub and spoke model would provide:
- Unified view of agents, conversations, and tools across all clusters
- Single pane of glass for operations teams
- Cross-cluster agent communication via A2A protocol
- Centralized auth with a shared OIDC provider (aligns with EP-476 Dex plans)
- Spoke autonomy — each cluster continues to function independently if the hub is unavailable
Proposed Architecture
               ┌───────────────────────────┐
               │       Hub (Central)       │
               │   ┌───────────────────┐   │
               │   │    Next.js UI     │   │
               │   │  (cluster-aware)  │   │
               │   └─────────┬─────────┘   │
               │             │             │
               │   ┌─────────▼─────────┐   │
               │   │  Federation API   │   │
               │   │ (new Go service)  │   │
               │   └──┬──────┬──────┬──┘   │
               │      │      │      │      │
               │    ┌─▼──┐ ┌─▼──┐ ┌─▼──┐   │
               │    │ DB │ │Auth│ │Reg.│   │
               │    └────┘ └────┘ └────┘   │
               └──────┼──────┼──────┼──────┘
                      │      │      │
         ┌────────────┘      │      └────────────┐
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Spoke: Cluster A│ │ Spoke: Cluster B│ │ Spoke: Cluster C│
│ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Controller  │ │ │ │ Controller  │ │ │ │ Controller  │ │
│ │ (port 8083) │ │ │ │ (port 8083) │ │ │ │ (port 8083) │ │
│ └──────┬──────┘ │ │ └──────┬──────┘ │ │ └──────┬──────┘ │
│ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │
│ │ Agent Pods  │ │ │ │ Agent Pods  │ │ │ │ Agent Pods  │ │
│ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Key Components
1. Federation API (New Go Service)
A lightweight Go service deployed in the hub cluster:
- Spoke Registry — stores cluster endpoints, credentials, and health status
- Request Routing — proxies API calls to the correct spoke controller based on cluster context
- Aggregation — merges agent/conversation/tool lists from all spokes for unified views
- Auth Gateway — single OIDC/auth layer (shared Dex instance per EP-476)
- Health Monitoring — periodic health checks on spoke controllers
API design wraps existing controller endpoints with a cluster prefix:
GET /api/clusters # list registered spokes
GET /api/clusters/{clusterId}/agents # agents on a specific spoke
POST /api/clusters/{clusterId}/a2a/{ns}/{name} # A2A proxy to specific spoke
GET /api/agents # aggregated view across all spokes
2. UI Changes (Cluster-Aware Next.js)
Modify the existing Next.js UI to support multi-cluster context:
- Cluster Selector — top-level nav element to switch cluster context
- Aggregated Views — agent list, conversations, tools show data across all clusters with cluster badges
- Server Actions — update `ui/src/app/actions/` to route through the federation API with cluster context
- A2A Streaming — update `KagentA2AClient` to include cluster context when proxying SSE streams

Key files to modify:
- `ui/src/app/actions/` — all server actions get a `clusterId` parameter
- `ui/src/components/` — add cluster context/selector components
- `ui/src/lib/a2aClient.ts` — route A2A calls through the federation API
- New `ui/src/app/clusters/` — cluster management pages
3. Spoke Agent
A small agent deployed alongside the existing controller in each spoke cluster:
- Registers itself with the hub on startup
- Exposes a secure endpoint for the hub to reach the local controller
- Handles mTLS or token-based auth for hub-to-spoke communication
- Reports cluster metadata (version, capacity, agent count)
Connectivity Options
| Approach | When to Use |
|---|---|
| Direct (Hub → Spoke Ingress) | Spokes are reachable from hub network; expose controller via Ingress/Gateway with mTLS |
| Tunnel (Spoke → Hub) | Spokes behind firewalls; spoke agent opens persistent gRPC/WebSocket tunnel to hub |
| Hybrid | Mix based on network topology |
The tunnel approach is more practical for real-world multi-cloud/on-prem deployments since spokes often can't expose inbound ports.
Design Principles
- Hub has no agents — purely a management/routing plane. Agents always run in spokes. Keeps the blast radius local.
- Spokes remain autonomous — if the hub goes down, each spoke still works independently with its local controller. Critical for resilience.
- Leverage existing A2A protocol — the current A2A JSON-RPC + SSE streaming model works as-is. The federation API adds a routing layer on top.
- Database strategy — hub gets its own PostgreSQL for cluster registry + aggregated metadata. Spokes keep their own SQLite/Postgres. No shared database across clusters.
Implementation Phases
Phase 1 — Federation API + Cluster Registry
- New Go module under `go/federation/` using existing `go/api/types`
- Spoke registration (CRD or DB-backed registry)
- Proxy layer that forwards requests to spoke controllers
- Health check loop
Phase 2 — UI Multi-Cluster Support
- Cluster context provider in React (extend existing Zustand store)
- Cluster selector component
- Update all server actions to accept cluster context
- Aggregated list views with cluster filtering
Phase 3 — Spoke Agent + Secure Connectivity
- Helm chart for spoke agent deployment
- mTLS or OIDC token exchange between hub and spokes
- Tunnel mode for restricted network environments
Phase 4 — Cross-Cluster Agent Communication
- A2A routing across clusters (Agent in Cluster A talks to Agent in Cluster B)
- Federation API becomes the A2A router for cross-cluster messages
- Cross-cluster addressing scheme: `agent-name.namespace.cluster-id`