Hub and Spoke Model for Multi-Cluster UI Management #1490

@jsonmp-k8

Summary

Kagent currently operates as a single-cluster deployment — the UI talks to one controller (port 8083) which proxies to agent pods in the same cluster. This proposal introduces a hub and spoke architecture that enables a central UI to manage agents across multiple Kubernetes clusters.

Motivation

Organizations running Kubernetes across multiple clusters (multi-cloud, on-prem, hybrid) need a unified management plane for their AI agents. Today, each cluster requires its own kagent UI instance with no cross-cluster visibility or management.

A hub and spoke model would provide:

  • Unified view of agents, conversations, and tools across all clusters
  • Single pane of glass for operations teams
  • Cross-cluster agent communication via A2A protocol
  • Centralized auth with a shared OIDC provider (aligns with EP-476 Dex plans)
  • Spoke autonomy — each cluster continues to function independently if the hub is unavailable

Proposed Architecture

                          ┌──────────────────────────┐
                          │       Hub (Central)      │
                          │  ┌────────────────────┐  │
                          │  │     Next.js UI     │  │
                          │  │   (cluster-aware)  │  │
                          │  └─────────┬──────────┘  │
                          │            │             │
                          │  ┌─────────▼──────────┐  │
                          │  │   Federation API   │  │
                          │  │  (new Go service)  │  │
                          │  └──┬──────┬──────┬───┘  │
                          │     │      │      │      │
                          │  ┌──▼─┐ ┌──▼─┐ ┌──▼─┐    │
                          │  │ DB │ │Auth│ │Reg.│    │
                          │  └────┘ └────┘ └────┘    │
                          └─────┼──────┼──────┼──────┘
                                │      │      │
               ┌────────────────┘      │      └────────────────┐
               ▼                       ▼                       ▼
    ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
    │  Spoke: Cluster A   │ │  Spoke: Cluster B   │ │  Spoke: Cluster C   │
    │   ┌─────────────┐   │ │   ┌─────────────┐   │ │   ┌─────────────┐   │
    │   │  Controller │   │ │   │  Controller │   │ │   │  Controller │   │
    │   │ (port 8083) │   │ │   │ (port 8083) │   │ │   │ (port 8083) │   │
    │   └──────┬──────┘   │ │   └──────┬──────┘   │ │   └──────┬──────┘   │
    │   ┌──────▼──────┐   │ │   ┌──────▼──────┐   │ │   ┌──────▼──────┐   │
    │   │  Agent Pods │   │ │   │  Agent Pods │   │ │   │  Agent Pods │   │
    │   └─────────────┘   │ │   └─────────────┘   │ │   └─────────────┘   │
    └─────────────────────┘ └─────────────────────┘ └─────────────────────┘

Key Components

1. Federation API (New Go Service)

A lightweight Go service deployed in the hub cluster:

  • Spoke Registry — stores cluster endpoints, credentials, and health status
  • Request Routing — proxies API calls to the correct spoke controller based on cluster context
  • Aggregation — merges agent/conversation/tool lists from all spokes for unified views
  • Auth Gateway — single OIDC/auth layer (shared Dex instance per EP-476)
  • Health Monitoring — periodic health checks on spoke controllers

The API design wraps the existing controller endpoints with a cluster prefix:

GET  /api/clusters                               # list registered spokes
GET  /api/clusters/{clusterId}/agents            # agents on a specific spoke
POST /api/clusters/{clusterId}/a2a/{ns}/{name}   # A2A proxy to specific spoke
GET  /api/agents                                 # aggregated view across all spokes
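As a rough sketch, the routing layer could be a thin reverse proxy keyed on the clusterId path segment. All names below (splitClusterPath, newRouter) are hypothetical, not existing kagent code:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// splitClusterPath extracts the clusterId from a federation path and rewrites
// the remainder to the path the spoke controller expects, e.g.
// /api/clusters/cluster-a/agents -> ("cluster-a", "/api/agents").
func splitClusterPath(path string) (clusterID, rest string) {
	trimmed := strings.TrimPrefix(path, "/api/clusters/")
	parts := strings.SplitN(trimmed, "/", 2)
	if len(parts) == 2 {
		return parts[0], "/api/" + parts[1]
	}
	return parts[0], "/api/"
}

// newRouter proxies /api/clusters/{clusterId}/... to the spoke controller
// registered under that id; the registry would be DB-backed in practice.
func newRouter(registry map[string]*url.URL) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		clusterID, rest := splitClusterPath(r.URL.Path)
		target, ok := registry[clusterID]
		if !ok {
			http.Error(w, "unknown cluster: "+clusterID, http.StatusNotFound)
			return
		}
		r.URL.Path = rest // spoke sees its normal single-cluster path
		httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
	})
}

func main() {
	id, rest := splitClusterPath("/api/clusters/cluster-a/agents")
	fmt.Printf("cluster=%s spokePath=%s\n", id, rest)
}
```

One caveat for the A2A routes: since they stream SSE, the proxy must not buffer responses; with httputil.ReverseProxy a negative FlushInterval forces an immediate flush after each write.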

2. UI Changes (Cluster-Aware Next.js)

Modify the existing Next.js UI to support multi-cluster context:

  • Cluster Selector — top-level nav element to switch cluster context
  • Aggregated Views — agent list, conversations, tools show data across all clusters with cluster badges
  • Server Actions — update ui/src/app/actions/ to route through the federation API with cluster context
  • A2A Streaming — update KagentA2AClient to include cluster context when proxying SSE streams

Key files to modify:

  • ui/src/app/actions/ — all server actions get a clusterId parameter
  • ui/src/components/ — add cluster context/selector components
  • ui/src/lib/a2aClient.ts — route A2A calls through federation API
  • New ui/src/app/clusters/ — cluster management pages

3. Spoke Agent

A small agent deployed alongside the existing controller in each spoke cluster:

  • Registers itself with the hub on startup
  • Exposes a secure endpoint for the hub to reach the local controller
  • Handles mTLS or token-based auth for hub-to-spoke communication
  • Reports cluster metadata (version, capacity, agent count)

Connectivity Options

Approach                       When to Use
─────────────────────────────  ─────────────────────────────────────────────────
Direct (Hub → Spoke Ingress)   Spokes are reachable from the hub network; expose
                               the controller via Ingress/Gateway with mTLS
Tunnel (Spoke → Hub)           Spokes behind firewalls; the spoke agent opens a
                               persistent gRPC/WebSocket tunnel to the hub
Hybrid                         Mix of both, based on network topology

The tunnel approach is more practical for real-world multi-cloud/on-prem deployments since spokes often can't expose inbound ports.

Design Principles

  1. Hub has no agents — purely a management/routing plane. Agents always run in spokes. Keeps the blast radius local.
  2. Spokes remain autonomous — if the hub goes down, each spoke still works independently with its local controller. Critical for resilience.
  3. Leverage existing A2A protocol — the current A2A JSON-RPC + SSE streaming model works as-is. The federation API adds a routing layer on top.
  4. Database strategy — hub gets its own PostgreSQL for cluster registry + aggregated metadata. Spokes keep their own SQLite/Postgres. No shared database across clusters.

Implementation Phases

Phase 1 — Federation API + Cluster Registry

  • New Go module under go/federation/ using existing go/api/ types
  • Spoke registration (CRD or DB-backed registry)
  • Proxy layer that forwards requests to spoke controllers
  • Health check loop

Phase 2 — UI Multi-Cluster Support

  • Cluster context provider in React (extend existing Zustand store)
  • Cluster selector component
  • Update all server actions to accept cluster context
  • Aggregated list views with cluster filtering

Phase 3 — Spoke Agent + Secure Connectivity

  • Helm chart for spoke agent deployment
  • mTLS or OIDC token exchange between hub and spokes
  • Tunnel mode for restricted network environments

Phase 4 — Cross-Cluster Agent Communication

  • A2A routing across clusters (Agent in Cluster A talks to Agent in Cluster B)
  • Federation API becomes the A2A router for cross-cluster messages
  • Cross-cluster addressing scheme: agent-name.namespace.cluster-id
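The addressing format above comes straight from the proposal; a minimal parser for it could look like the following (validation rules are assumptions, relying on Kubernetes names not containing dots):

```go
package main

import (
	"fmt"
	"strings"
)

// AgentAddress is the proposed cross-cluster form: agent-name.namespace.cluster-id.
type AgentAddress struct {
	Agent, Namespace, ClusterID string
}

// parseAgentAddress splits an address into its three components.
func parseAgentAddress(addr string) (AgentAddress, error) {
	parts := strings.Split(addr, ".")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return AgentAddress{}, fmt.Errorf("want agent-name.namespace.cluster-id, got %q", addr)
	}
	return AgentAddress{Agent: parts[0], Namespace: parts[1], ClusterID: parts[2]}, nil
}

func main() {
	a, err := parseAgentAddress("billing-agent.prod.cluster-b")
	if err != nil {
		panic(err)
	}
	// The federation API would route this A2A call to cluster-b's controller,
	// matching the POST /api/clusters/{clusterId}/a2a/{ns}/{name} route.
	fmt.Printf("route to /api/clusters/%s/a2a/%s/%s\n", a.ClusterID, a.Namespace, a.Agent)
}
```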
