feat: add term-llm Hub and delegation #801

Open
sam-saffron-jarvis wants to merge 9 commits into SamSaffron:main from sam-saffron-jarvis:feat/serve-hub-v1

@sam-saffron-jarvis sam-saffron-jarvis commented Jun 12, 2026

Contributor

What

Adds term-llm Hub as a single dashboard/proxy over multiple term-llm web nodes, plus opt-in cross-node delegation.

Hub v1 includes:

  • term-llm serve hub
  • Hub bearer auth (--auth bearer, --token, TERM_LLM_HUB_TOKEN) with --auth none restricted to loopback
  • node discovery from static config, the local store, and contain workspaces
  • /node/<id>/... node UI proxy with server-side bearer-token injection
  • Back to Hub UI context injection
  • node health/status API with dashboard diagnostics
  • direct and reverse/private node connection modes
  • cross-node delegation APIs
  • hub_delegate and hub_check_delegation tools
  • jobs-v2 delegated run creation/trigger/status/cancel
  • delegation reconnect/recovery hardening for partial job/trigger failures
  • delegation ledger/audit trail
  • target-node sidebar sessions titled like Delegation from jarvis-home
  • Hub Delegations dashboard panel with returned artifact previews

Safety model

Delegation is default-off and opt-in per node:

delegation:
  enabled: true

Without delegation.enabled: true, a node can neither originate nor accept delegated work.

Accepting delegated work still requires a target workdir:

delegation:
  enabled: true
  accept_from: [jarvis-home]
  workdir: /work

Existing narrowing knobs remain available:

  • to
  • accept_from
  • allowed_agents
  • allowed_models
  • max_in_flight

But the user-facing safety story is the simple switch: delegation participation is off until the operator enables it.
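
The switch-plus-narrowing model above can be sketched as a single admission check. This is a minimal illustration, not the PR's code: the `DelegationPolicy` struct, `CanDelegate`, and the rule that an empty `to` or `accept_from` list means "no narrowing" are all assumptions made for the example.

```go
package main

import "fmt"

// DelegationPolicy mirrors the YAML keys shown above. The Go names are
// hypothetical and not taken from the PR.
type DelegationPolicy struct {
	Enabled    bool
	To         []string // origin side: targets this node may delegate to
	AcceptFrom []string // target side: origins this node accepts work from
	Workdir    string   // required before a node accepts delegated work
}

// allow treats an empty list as "no narrowing". That default is an
// assumption for this sketch.
func allow(list []string, id string) bool {
	if len(list) == 0 {
		return true
	}
	for _, v := range list {
		if v == id {
			return true
		}
	}
	return false
}

// CanDelegate returns nil when origin may hand work to target.
func CanDelegate(originID string, origin DelegationPolicy, targetID string, target DelegationPolicy) error {
	switch {
	case !origin.Enabled:
		return fmt.Errorf("origin %s: delegation not enabled", originID)
	case !target.Enabled:
		return fmt.Errorf("target %s: delegation not enabled", targetID)
	case target.Workdir == "":
		return fmt.Errorf("target %s: no workdir configured", targetID)
	case !allow(origin.To, targetID):
		return fmt.Errorf("origin %s: target %s not in to", originID, targetID)
	case !allow(target.AcceptFrom, originID):
		return fmt.Errorf("target %s: origin %s not in accept_from", targetID, originID)
	}
	return nil
}

func main() {
	home := DelegationPolicy{Enabled: true}
	artist := DelegationPolicy{Enabled: true, AcceptFrom: []string{"jarvis-home"}, Workdir: "/work"}
	fmt.Println(CanDelegate("jarvis-home", home, "artist", artist))
	fmt.Println(CanDelegate("artist", artist, "jarvis-home", home))
}
```

The disabled and missing-workdir cases are checked first, so the simple switch always wins over the narrowing knobs.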

Hub auth

Hub now has first-party bearer auth:

TERM_LLM_HUB_TOKEN=... term-llm serve hub --auth bearer

  • --auth bearer is the default.
  • --auth none is allowed only on loopback.
  • Public/non-loopback binds require Hub auth.
  • /api/connect remains node-authenticated so reverse nodes can connect with their node token.
  • Node-originated delegation calls keep using node auth (X-Term-LLM-Node-ID + node bearer token), not the operator Hub token.

This keeps Hub auth simple: one operator token now; no users/RBAC/OAuth in this PR.
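
For readers who want the rules above in code form, here is a rough sketch of the two checks: refusing `--auth none` on a non-loopback bind, and comparing a bearer token in constant time. The function names (`isLoopbackBind`, `validateAuth`, `tokenMatches`) are hypothetical, and the constant-time comparison is a common practice rather than something the PR states it uses.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"strings"
)

// isLoopbackBind reports whether a host:port listen address is loopback-only.
// An empty host (":8080") binds every interface and so is not loopback.
func isLoopbackBind(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// validateAuth applies the rule "--auth none is allowed only on loopback".
func validateAuth(mode, addr string) error {
	switch mode {
	case "bearer":
		return nil
	case "none":
		if isLoopbackBind(addr) {
			return nil
		}
		return fmt.Errorf("--auth none is only allowed on loopback binds, got %q", addr)
	default:
		return fmt.Errorf("unknown auth mode %q", mode)
	}
}

// tokenMatches checks an Authorization header value against the Hub token.
func tokenMatches(authHeader, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	got := strings.TrimPrefix(authHeader, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

func main() {
	fmt.Println(validateAuth("none", "127.0.0.1:8080"))
	fmt.Println(validateAuth("none", "0.0.0.0:8080"))
	fmt.Println(tokenMatches("Bearer s3cret", "s3cret"))
}
```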

Reverse/private nodes

Adds reverse connection mode for private nodes where the Hub has a public address but cannot reach the node directly.

Hub config:

nodes:
  - id: artist
    name: Artist
    connection: reverse
    base_path: /chat
    token: <artist-token>
    delegation:
      enabled: true
      accept_from: [jarvis-home]
      workdir: /work

Private node:

term-llm serve web jobs \
  --base-path /chat \
  --token "$ARTIST_TOKEN" \
  --hub-url https://hub.example.com \
  --hub-node-id artist \
  --hub-connect reverse

The node dials GET /api/connect as a websocket using its node id + bearer token. Hub requests then use the same node record and policy path as direct nodes; only the transport changes. Both ends send websocket pings every 20s, require pong/read activity within 60s, and use write deadlines so silent half-open sockets are detected and the node reconnects instead of sitting stale.
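
The half-open detection described above comes down to a read deadline that is pushed forward on every sign of life. The sketch below shows that mechanism only, under loud assumptions: it uses a plain `net.Pipe` and one-byte "pings" instead of websocket ping/pong frames, and scales the 20s/60s timings down to milliseconds so it runs quickly. The helper names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// watch reads from conn and extends the read deadline after every successful
// read. It returns the number of reads and the error that ended the loop.
func watch(conn net.Conn, readWait time.Duration) (int, error) {
	buf := make([]byte, 1)
	reads := 0
	for {
		conn.SetReadDeadline(time.Now().Add(readWait))
		if _, err := conn.Read(buf); err != nil {
			return reads, err
		}
		reads++
	}
}

// sendPings writes count one-byte pings, each under a write deadline, then
// goes silent without closing the connection (a simulated half-open peer).
func sendPings(conn net.Conn, interval, writeWait time.Duration, count int) {
	for i := 0; i < count; i++ {
		conn.SetWriteDeadline(time.Now().Add(writeWait))
		if _, err := conn.Write([]byte{'p'}); err != nil {
			return
		}
		time.Sleep(interval)
	}
}

// simulateHalfOpen wires both ends together and reports how many pings were
// seen and whether the watcher ended on a deadline timeout.
func simulateHalfOpen(pings, intervalMs, waitMs int) (int, bool) {
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()
	interval := time.Duration(intervalMs) * time.Millisecond
	wait := time.Duration(waitMs) * time.Millisecond
	go sendPings(a, interval, wait, pings)
	reads, err := watch(b, wait)
	return reads, errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	reads, timedOut := simulateHalfOpen(3, 20, 60)
	fmt.Println(reads, timedOut)
}
```

In a real node the timeout branch is where the reconnect would start. The peer never closes the socket here, so the deadline is the only signal.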

Reverse transport now streams responses in bounded chunks over the websocket, with cancellation propagation. Direct-node streams still use the normal reverse proxy path.
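
A minimal sketch of bounded chunking with cancellation, assuming a hypothetical `send` callback that stands in for writing one websocket frame. The chunk size, function names, and frame shape are illustrative and not taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// streamChunks copies body to send in frames of at most maxChunk bytes and
// stops as soon as ctx is cancelled. The slice passed to send is reused
// between calls, so send must copy it if it keeps the data.
func streamChunks(ctx context.Context, body io.Reader, maxChunk int, send func([]byte) error) error {
	buf := make([]byte, maxChunk)
	for {
		if err := ctx.Err(); err != nil {
			return err // cancellation propagates to the caller
		}
		n, err := body.Read(buf)
		if n > 0 {
			if sendErr := send(buf[:n]); sendErr != nil {
				return sendErr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// chunkString is a small helper for the demo: it streams s and collects the
// frames as strings.
func chunkString(s string, maxChunk int) []string {
	var frames []string
	_ = streamChunks(context.Background(), strings.NewReader(s), maxChunk, func(b []byte) error {
		frames = append(frames, string(b)) // string() copies the reused buffer
		return nil
	})
	return frames
}

func main() {
	fmt.Println(chunkString("abcdefgh", 3))

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the client going away before streaming starts
	err := streamChunks(ctx, strings.NewReader("abc"), 3, func([]byte) error { return nil })
	fmt.Println(err)
}
```

Because the buffer is fixed at `maxChunk`, memory per stream stays bounded regardless of response size.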

This keeps the architecture simple:

  • direct nodes: Hub → node HTTP
  • reverse nodes: node → Hub websocket, Hub sends request frames over it
  • same /node/<id>/... proxy
  • same delegation APIs
  • same token injection/redaction model
  • no offline queue or second delegation API

Dashboard diagnostics

Node cards now surface token-safe diagnostics for common bad setups:

  • reverse node disconnected
  • missing node bearer token
  • delegation enabled without a workdir
  • delegation accept policy present but no jobs capability detected
  • obvious origin/target policy mismatches

The point is to show the rake before someone steps on it.
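
As a rough illustration, the single-node checks in that list can be written as a pure function over node state. `NodeInfo` and `Diagnose` are hypothetical names, and the cross-node policy mismatch check is left out because it needs more than one node record. Only token presence is passed in, so the output cannot leak a token value.

```go
package main

import "fmt"

// NodeInfo is a hypothetical, token-free view of one node.
type NodeInfo struct {
	Connection        string // "direct" or "reverse"
	Connected         bool   // reverse websocket currently attached
	HasToken          bool   // presence only, never the token itself
	DelegationEnabled bool
	AcceptFrom        []string
	Workdir           string
	HasJobs           bool // jobs capability seen in the health check
}

// Diagnose returns human-readable warnings for common bad setups.
func Diagnose(n NodeInfo) []string {
	var out []string
	if n.Connection == "reverse" && !n.Connected {
		out = append(out, "reverse node disconnected")
	}
	if !n.HasToken {
		out = append(out, "missing node bearer token")
	}
	if n.DelegationEnabled && n.Workdir == "" {
		out = append(out, "delegation enabled without a workdir")
	}
	if len(n.AcceptFrom) > 0 && !n.HasJobs {
		out = append(out, "delegation accept policy present but no jobs capability detected")
	}
	return out
}

func main() {
	artist := NodeInfo{
		Connection:        "reverse",
		HasToken:          true,
		DelegationEnabled: true,
		AcceptFrom:        []string{"jarvis-home"},
		HasJobs:           true,
	}
	for _, d := range Diagnose(artist) {
		fmt.Println(d)
	}
}
```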

Delegation recovery hardening

Partial target failures are now more recoverable:

  • delegation records are persisted before target job creation/trigger where possible
  • known jobs without known runs stay refreshable instead of becoming dead terminal errors
  • trigger-response-loss can recover by polling target runs
  • cancel refreshes/reuses recovered run IDs
  • stale transport errors clear when a later refresh finds the real run state

Still no offline queue. Offline nodes fail fast; reconnect lets subsequent refresh/cancel paths recover known work.
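
One way to picture the trigger-response-loss case: the job ID was persisted, the trigger response never arrived, and a later refresh polls the target for runs of that job. The sketch below assumes a hypothetical `listRuns` callback and record shape, and it simply adopts the most recent run. The PR's actual matching rules may differ.

```go
package main

import "fmt"

// Delegation is a hypothetical persisted ledger record.
type Delegation struct {
	JobID     string
	RunID     string
	State     string
	LastError string
}

// Run is a hypothetical view of one run on the target node.
type Run struct {
	ID    string
	State string
}

// Refresh polls the target for runs of a known job. A transport error leaves
// the record refreshable; a later success clears the stale error.
func Refresh(d *Delegation, listRuns func(jobID string) ([]Run, error)) {
	if d.JobID == "" {
		return // nothing was created on the target, nothing to recover
	}
	runs, err := listRuns(d.JobID)
	if err != nil {
		d.LastError = err.Error() // not terminal: the next refresh may succeed
		return
	}
	if len(runs) == 0 {
		return // job exists but has not run yet; stay refreshable
	}
	latest := runs[len(runs)-1] // assumption: the newest run is listed last
	d.RunID, d.State, d.LastError = latest.ID, latest.State, ""
}

func main() {
	// The trigger response was lost: job known, run unknown, error recorded.
	d := &Delegation{JobID: "job-1", LastError: "trigger: connection reset"}
	Refresh(d, func(string) ([]Run, error) {
		return []Run{{ID: "run-7", State: "running"}}, nil
	})
	fmt.Println(d.RunID, d.State, d.LastError == "")
}
```

A cancel path could call the same refresh first and then reuse the recovered `RunID`.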

Node-anywhere docs

Docs cover both modes:

  • direct nodes can run on the same machine, in Docker/contain, in a VM, on a cloud runner or remote server, or over a private network/tunnel
  • reverse nodes can run behind NAT/firewalls with no inbound port, as long as they can dial the Hub

Demo

Standalone isolated demo video, using two clean node homes/session stores rather than Sam's real Jarvis setup:

  • Jarvis Home asks Artist to draw a picture
  • Hub dashboard shows delegated work
  • Artist sidebar shows Delegation from jarvis-home
  • back to Hub after success
  • original Jarvis Home session shows the returned image

Video artifact:

/chat/files/term-llm-hub-standalone-demo.mp4

Tests

go test ./cmd ./internal/hub
go test ./...
go build ./...

Live smoke:

Hub bearer auth rejects unauthenticated /api/nodes
Hub configured with connection: reverse
Private node started with --hub-connect reverse
GET /api/nodes shows connection=reverse and connected status
GET /node/private/healthz succeeds through the reverse websocket

Adds 'term-llm serve hub', a launcher/control plane fronting many term-llm
serves (nodes), as a more ambitious successor to the serve-gateway prototype.

internal/hub (new): Node abstraction + resolvers
- Node: identity, URL/base path, bearer token, source
- Registry over pluggable resolvers with precedence dedupe and soft failures
- Static config resolver (YAML/JSON, --config), contain workspace resolver
  (via new contain.ReadWebConfig), local JSON store (0600) for UI-added nodes
- Concurrent health prober: reachability, latency, agent/version/capabilities
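
A sketch of the concurrent prober idea, with the network call abstracted behind a hypothetical `probe` callback so the example stays self-contained. The `Health` struct and `probeAll` are illustrative names. The real prober also reports agent/version/capabilities, which this omits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Health is a hypothetical per-node probe result.
type Health struct {
	Reachable bool
	Latency   time.Duration
	Err       string
}

// probeError is a tiny error type so the demo needs no extra imports.
type probeError string

func (e probeError) Error() string { return string(e) }

// probeAll runs probe for every node ID concurrently and collects results.
// A failing node is recorded as unreachable; it never aborts the others.
func probeAll(ids []string, probe func(id string) error) map[string]Health {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]Health, len(ids))
	)
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			start := time.Now()
			err := probe(id)
			h := Health{Reachable: err == nil, Latency: time.Since(start)}
			if err != nil {
				h.Err = err.Error()
			}
			mu.Lock()
			out[id] = h
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return out
}

func main() {
	res := probeAll([]string{"jarvis-home", "artist"}, func(id string) error {
		if id == "artist" {
			return probeError("connection refused")
		}
		return nil
	})
	fmt.Println(res["jarvis-home"].Reachable, res["artist"].Reachable, res["artist"].Err)
}
```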

serve hub command:
- Polished dashboard (embedded template/CSS): node cards with status dot,
  latency, source badge, capabilities, Open action, Add Node modal with
  Test connection, remove for local nodes; term-llm Hub branding
- /node/<id>/* reverse proxy: injects node token server-side, strips client
  Authorization/Cookie/X-Api-Key and forwarding headers, no env proxy on the
  backend transport, SSE-safe timeouts, rebases <base>/TERM_LLM_UI_PREFIX
  onto the mount, rejects encoded separators and dot-dot traversal
- Loopback-only bind until hub auth exists (documented TODOs: auth,
  self-registration, mTLS, host-based routing)
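
The credential handling in the proxy bullet can be sketched roughly as below. The exact list of forwarding headers and the decision to also reject encoded dots are assumptions for this example. Only Authorization/Cookie/X-Api-Key stripping, server-side token injection, and rejection of encoded separators and dot-dot traversal come from the description above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// strippedHeaders lists client-supplied headers removed before proxying.
// The forwarding entries are an assumed, not authoritative, set.
var strippedHeaders = []string{
	"Authorization", "Cookie", "X-Api-Key",
	"Forwarded", "X-Forwarded-For", "X-Forwarded-Host", "X-Forwarded-Proto",
}

// sanitizeHeaders drops client credentials and injects the node token.
func sanitizeHeaders(h http.Header, nodeToken string) {
	for _, name := range strippedHeaders {
		h.Del(name)
	}
	if nodeToken != "" {
		h.Set("Authorization", "Bearer "+nodeToken)
	}
}

// unsafePath inspects the raw, still-escaped path and rejects encoded
// separators, encoded dots (a conservative extra), and ".." segments.
func unsafePath(rawPath string) bool {
	lower := strings.ToLower(rawPath)
	for _, enc := range []string{"%2f", "%5c", "%2e"} {
		if strings.Contains(lower, enc) {
			return true
		}
	}
	for _, seg := range strings.Split(rawPath, "/") {
		if seg == ".." {
			return true
		}
	}
	return false
}

func main() {
	h := http.Header{}
	h.Set("Authorization", "Bearer client-supplied")
	h.Set("Cookie", "session=abc")
	sanitizeHeaders(h, "node-token")
	fmt.Println(h.Get("Authorization"), h.Get("Cookie") == "")

	fmt.Println(unsafePath("/node/artist/chat/"), unsafePath("/node/artist/..%2Fsecret"))
}
```

Checking the raw path rather than the decoded one matters here: once `%2F` is decoded it is indistinguishable from a real separator.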

Hub integration in serve web:
- Proxy injects window.TERM_LLM_HUB into node HTML; the web UI sidebar now
  renders a Back to Hub link just below the Widgets entry when present
- Native flags --hub-url/--hub-node-id/--hub-node-name for direct hub
  awareness without the proxy
- healthz reports agent/version/capabilities to bearer-authenticated callers
  so the hub dashboard can display them

contain: new env helpers (EnvPath, ReadEnvFile, ReadWebConfig)

Tests: hub resolvers/registry/store/prober, contain env parsing, proxy token
injection + credential stripping, HTML rebase + hub context injection, node
API token non-leakage, add/test/remove flow, JS sidebar back-to-hub tests.

Docs: new Hub guide.
@sam-saffron-jarvis sam-saffron-jarvis changed the title from "feat: add term-llm Hub" to "feat: add term-llm Hub and delegation" Jun 14, 2026
@sam-saffron-jarvis sam-saffron-jarvis force-pushed the feat/serve-hub-v1 branch 2 times, most recently from 8038e2e to 04a121c Compare June 14, 2026 02:16