[arch] Design discussion: shared pluggable service for cross-agent state (feedback, sessions, traces) #112

## Background

The v0.12.0 enterprise feature track added several stateful surfaces to BaseAgent's server layer:

- `POST/GET /v1/sessions` — conversation persistence
- `GET /v1/traces` — trace inspection
- `POST/GET/PATCH /v1/feedback` — user feedback collection
- `GET /metrics` — Prometheus

All four follow the same shape: BaseAgent owns a pluggable store (Null / SQLite / Postgres), the server layer exposes REST endpoints, and the gateway-template proxies through. This was the right call when most deployments had one or two agents.

In multi-agent deployments (e.g. 10 agents fronted by a single gateway and UI), per-agent ownership starts to chafe:

- **Duplication** — 10 Postgres pools, 10 schema migrations, 10 housekeeping loops, all writing the same tables
- **Fan-out for cross-agent queries** — "show me all thumbs-down feedback this week" requires the dashboard to hit 10 endpoints and merge client-side, OR query the shared Postgres directly out-of-band (the schema becomes a de-facto API)
- **Schema becomes a contract** — once N agents write the same table, schema changes need coordinated rollouts
- **No auth boundary** between agents sharing storage
- **Sessions are conceptually cross-agent** — a user talks to "the system," not to agent #4. Today there's no clean way to follow a conversation that gets routed to different agents
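
To make the fan-out cost concrete, here is a minimal sketch of the client-side merge a dashboard would need today. The record fields (`rating`, `created_at`), the agent names, and the injected `fetch` callable are assumptions for illustration; the real `/v1/feedback` payload may look different.

```python
from typing import Callable, Iterable

# Hypothetical record shape; the real /v1/feedback payload may differ.
Record = dict


def fan_out_feedback(
    agents: Iterable[str],
    fetch: Callable[[str], list],
    rating: str = "down",
) -> list:
    """Query every agent, tag each record with its source, merge client-side."""
    merged = []
    for agent in agents:
        for record in fetch(agent):  # one request per agent
            if record.get("rating") == rating:
                merged.append({**record, "agent": agent})
    # Newest first; ISO-8601 UTC strings sort lexicographically.
    merged.sort(key=lambda r: r["created_at"], reverse=True)
    return merged


# Stand-in for GET /v1/feedback on each agent (made-up data, no network).
_FAKE = {
    "agent-1": [{"rating": "down", "created_at": "2024-01-02T00:00:00Z"}],
    "agent-2": [
        {"rating": "up", "created_at": "2024-01-03T00:00:00Z"},
        {"rating": "down", "created_at": "2024-01-04T00:00:00Z"},
    ],
}

rows = fan_out_feedback(_FAKE, lambda agent: _FAKE[agent])
```

Every caller that wants a cross-agent view has to repeat this loop, plus the error handling and pagination it leaves out. A shared service would reduce it to one filtered query.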

## What to design

Open question: what does the shape of a shared 'agent platform' service look like, and which surfaces move there?

Initial options to discuss (not a decision, a starting point):

1. **Status quo + documentation.** Document that multi-agent deployments should point all agents at the same Postgres and treat the shared schema as a stable join point. Cheapest, but the rough edges remain.

2. **Full extraction.** A new FastAPI service (working name: `fipsagents-platform`) owns sessions + traces + feedback. BaseAgent becomes a thin client. Gateway routes `/v1/sessions`, `/v1/traces`, `/v1/feedback` to the platform service rather than fanning out to per-agent endpoints. One Postgres pool, one REST surface, one dashboard backend.

3. **Partial extraction.** Move feedback + sessions (genuinely cross-agent) but leave traces in BaseAgent, shipping to an OTel collector (the industry-standard answer, already partially done via `OTELTraceStore`). Fewer moving parts, and it addresses the highest-value duplication.

4. **Something else.** Maybe BaseAgent keeps everything but grows a 'remote store' adapter for each — same code, configurable backend (in-process vs HTTP). Lets a deployer choose per-feature without forcing a topology.
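
As a starting point for discussing option 4, a 'remote store' adapter could look roughly like the sketch below. Everything here is an assumption made for illustration: the two-method `FeedbackStore` interface, the `backend` config key, the `make_feedback_store` factory, and the injected `Transport` callable standing in for an HTTP client. The real ABCs in BaseAgent will differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional

# (method, path, payload) -> decoded JSON body. Injected so the sketch runs
# without a network; a real adapter would wrap an HTTP client here.
Transport = Callable[[str, str, dict], Any]


class FeedbackStore(ABC):
    """Assumed minimal interface; the real ABC likely has more methods."""

    @abstractmethod
    def save(self, record: dict) -> int: ...

    @abstractmethod
    def find(self, **filters: Any) -> list: ...


class InProcessFeedbackStore(FeedbackStore):
    """Keeps rows in memory; stands in for the SQLite/Postgres stores."""

    def __init__(self) -> None:
        self._rows = []

    def save(self, record: dict) -> int:
        row = {**record, "id": len(self._rows) + 1}
        self._rows.append(row)
        return row["id"]

    def find(self, **filters: Any) -> list:
        return [r for r in self._rows
                if all(r.get(k) == v for k, v in filters.items())]


class RemoteFeedbackStore(FeedbackStore):
    """Same interface, but every call goes over the injected transport."""

    def __init__(self, transport: Transport) -> None:
        self._send = transport

    def save(self, record: dict) -> int:
        return self._send("POST", "/v1/feedback", record)["id"]

    def find(self, **filters: Any) -> list:
        return self._send("GET", "/v1/feedback", filters)


def make_feedback_store(config: dict,
                        transport: Optional[Transport] = None) -> FeedbackStore:
    """Pick a backend from config; the 'backend' key is hypothetical."""
    if config.get("backend") == "remote":
        if transport is None:
            raise ValueError("remote backend needs a transport")
        return RemoteFeedbackStore(transport)
    return InProcessFeedbackStore()


# Demo: a fake platform service backed by an in-process store.
_platform = InProcessFeedbackStore()


def fake_transport(method: str, path: str, payload: dict) -> Any:
    if method == "POST":
        return {"id": _platform.save(payload)}
    return _platform.find(**payload)


store = make_feedback_store({"backend": "remote"}, fake_transport)
new_id = store.save({"agent": "agent-4", "rating": "down"})
found = store.find(rating="down")
```

The agent code only ever sees `FeedbackStore`, so a deployer could switch topology per feature through config. The trade-off is that the remote variant adds latency and failure modes that the in-process stores never had to handle.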

## Things to think about during the discussion

- **Migration story.** The longer we wait, the more deployments depend on the per-agent endpoints. Cheap to do now while there's effectively one production user; later, any migration becomes user-visible
- **Memory** is intentionally NOT in this list — `self.memory` is per-agent by design, and MemoryHub already provides the centralized option
- **Metrics** is also separate — Prometheus scrape targets are inherently per-pod, that's fine
- **Auth** — if multiple agents share a backend, who's allowed to write what? Today there's no model for this
- **Deployment friction** — every service we extract is another Helm chart, another readiness probe, another thing for ops to think about. Worth it iff the cross-agent benefits land
- **Pluggability shape** — same `FeedbackStore`/`SessionStore`/`TraceStore` ABCs we have today, just running in a different process? Or a different abstraction entirely?
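
To give the auth question something concrete to react to, one possible shape is a default-deny grant table keyed by caller, surface, and action. This is only a sketch for discussion, not a proposal from the track so far; the `AccessPolicy` class, the caller names, and the read/write split are all hypothetical.

```python
SURFACES = {"feedback", "sessions", "traces"}
ACTIONS = {"read", "write"}


class AccessPolicy:
    """Default-deny table of (caller, surface, action) grants."""

    def __init__(self) -> None:
        self._grants = set()

    def allow(self, caller: str, surface: str, action: str) -> None:
        if surface not in SURFACES or action not in ACTIONS:
            raise ValueError(f"unknown grant: {surface}/{action}")
        self._grants.add((caller, surface, action))

    def is_allowed(self, caller: str, surface: str, action: str) -> bool:
        # Anything not explicitly granted is refused.
        return (caller, surface, action) in self._grants


policy = AccessPolicy()
policy.allow("agent-4", "feedback", "write")    # agents write their own feedback
policy.allow("dashboard", "feedback", "read")   # dashboard only reads

can_agent_write = policy.is_allowed("agent-4", "feedback", "write")
can_dashboard_write = policy.is_allowed("dashboard", "feedback", "write")
can_agent_read_traces = policy.is_allowed("agent-4", "traces", "read")
```

A table like this does not answer how callers are authenticated or whether an agent may touch records written by another agent. Those are the parts the discussion still needs to settle.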

## Out of scope for this issue

This is a **design discussion** issue, not an implementation. The goal is to come out with a written architecture decision (in `docs/architecture.md` or similar) that we can point at when implementing.

Captured during the v0.12.0 feedback feature track. Conversation context: the per-agent ownership felt fine for one or two agents but the smell got louder once we considered the 10-agent case.
