CSA Swarm Platform

A multi-user, cloud-native platform that simulates a structured design session between AI-powered Cloud Solution Architect (CSA) personas and a Director CSA — producing grounded, enterprise-grade architecture deliverables for the Oil & Gas Energy (OGE) observability domain.

What It Does

The platform replaces a manual, asynchronous process where a PM has to interview multiple CSAs and synthesize their inputs. Instead, a PM describes a design problem in plain language, and a swarm of specialized AI agents debates it in real time — each agent advocating from a distinct domain perspective — before a Director agent synthesizes the outcome. The session ends with AI-generated, audit-ready deliverables scoped to what was actually discussed.

End-to-End User Workflow

1. Sign In (Entra ID / MSAL)
       ↓
2. Create a Session (name it, choose a model)
       ↓
3. Upload Grounding Context  ← optional but powerful
   (customer interview transcripts, PDFs, URLs, GitHub repos)
       ↓
4. Configure Agent Personas  ← optional
   (edit CSA display names, domains, analytical lenses, system prompts
    or auto-generate a new persona from a transcript using Bootstrap)
       ↓
5. Run Debate Rounds  ← core experience
   (PM types a question / design prompt → agents respond live via SSE;
    all prior rounds are visible inline with timestamps)
       ↓
6. Generate Deliverables
   (Architecture Recommendation, Project Plan, Technical Specs, Roadmap, Diagram;
    all rounds and deliverables are persisted in Cosmos DB per user per session)

System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                           Browser (Next.js 14)                          │
│       Sessions │ Context │ Agent Config │ Debate │ Deliverables        │
└───────────────────────────────┬─────────────────────────────────────────┘
                                │ HTTPS + Entra ID Bearer JWT
                                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    FastAPI Backend (Azure Container Apps)                │
│                                                                         │
│  /api/sessions        – session CRUD                                    │
│  /api/sessions/:id/context    – grounding source management             │
│  /api/sessions/:id/agent-config/bootstrap  – LLM persona generation     │
│  /api/sessions/:id/rounds     – debate orchestration  [SSE stream]      │
│  /api/sessions/:id/recommendations – deliverable generation [SSE stream]│
└─────┬──────────────────────────────────────────┬───────────────────────┘
      │ DefaultAzureCredential (Managed Identity) │
      ▼                                           ▼
┌─────────────┐                         ┌─────────────────────────────────┐
│  Cosmos DB  │                         │  Azure AI Services              │
│  NoSQL      │                         │  (Azure OpenAI endpoint)        │
│  (private   │                         │  gpt-4o / gpt-4.1 / o4-mini    │
│   endpoint) │                         │  Microsoft Agent Framework SDK  │
└─────────────┘                         └─────────────────────────────────┘

Key infrastructure decisions:

Both Container Apps run inside a VNet-injected Container Apps Environment, reaching Cosmos DB over a private endpoint — publicNetworkAccess is disabled on Cosmos.
All credentials are eliminated via Azure Managed Identity + RBAC; no secrets in app settings.
Authentication is Microsoft Entra ID only: the frontend uses MSAL and sends a Bearer JWT; the backend validates it against the Entra tenant's JWKS endpoint.

Components

Layer	Technology	Purpose
Frontend	Next.js 14 (App Router), TypeScript, Tailwind	SPA UI; consumes SSE streams for live agent output
Backend API	Python 3.12, FastAPI	Auth, session management, orchestration entry points
Debate Workflow	`workflows/debate_workflow.py`	Parallel CSA dispatch + sequential Director synthesis
Agent Layer	`agents/role_agents.py` + Microsoft Agent Framework	Creates Azure OpenAI Assistants per role, enforces grounding constraints
Swarm Orchestrator	`swarm/orchestrator.py`	Deliverable generation using debate transcript as source of truth
Context Loader	`swarm/context_loader.py`	Parses uploaded files (PDF, DOCX, TXT, MD) and fetches URLs/GitHub repos
Persistence	Azure Cosmos DB NoSQL (serverless)	Multi-user session isolation; single container, partition key: `/session_id`
Identity	Azure Managed Identity + Entra ID	Zero-secret architecture end-to-end
Infrastructure	Bicep (ACA, Cosmos, ACR, AI Services, VNet)	Fully reproducible with `azd up`

The Agentic Framework and Debate Process

This is the core of the platform. Understanding it is key to understanding why the outputs are trustworthy.

What is the Microsoft Agent Framework here?

Each agent is an OpenAI Assistant (Azure OpenAI Assistants API) created on demand for a single debate round via AzureOpenAIChatClient.as_agent(). The framework manages the thread lifecycle, tool calling surface, and streaming. A fresh assistant instance is created per round per role — there is no shared state between rounds at the agent level. History is instead passed explicitly as structured context in each user message.

Role Configuration (`config/roles.yaml`)

Agents are not generic AI personas. Each one is loaded from roles.yaml, which defines:

display_name — the label shown in the UI (e.g., CSA – Observability & Standardization)
domain — the technical specialty the agent grounds into (e.g., Oil & Gas operational monitoring)
lens — the analytical frame the agent argues from (e.g., Operational standardization and efficiency)
system_prompt — a detailed instruction block that defines output format, decision heuristics, and precisely what the agent is and is not allowed to say

The system prompt is prepended with a non-negotiable grounding constraint that prohibits the agent from citing any vendor, tool, or product not explicitly present in either the session grounding context or official Microsoft documentation. This prevents hallucination at the persona level.

The Grounding Context Block

Before any debate round runs, the PM can upload grounding sources — customer interview transcripts, architecture PDFs, Azure documentation URLs, or entire GitHub repositories. These are stored in Cosmos DB per session and assembled into a single grounding_block string. This block is injected into every CSA user message at the top of each round, giving all agents the same factual foundation.

A Single Debate Round — Step by Step

PM types: "What should the baseline Azure monitoring architecture look like?"
                              │
                              ▼
           run_round_streaming() is called
                              │
                              ├── loads all prior rounds from Cosmos (round history)
                              ├── loads grounding_block from Cosmos
                              └── merges session-level agent_config overrides on top of roles.yaml defaults
                              │
            ┌─────────────────┴──────────────────────────────┐
            │         Phase 1 — CSAs run in PARALLEL          │
            │                                                  │
            │  asyncio.gather(                                 │
            │    _run_csa("csa_1"),   ← CSA Observability    │
            │    _run_csa("csa_2"),   ← CSA IT Operations    │
            │    _run_csa("csa_3"),   ← CSA Customer Voice   │
            │  )                                               │
            │                                                  │
            │  Each CSA receives the same user message:        │
            │    [grounding_block]                             │
            │    [prior round summaries, truncated to 600 ch.] │
            │    PM: <pm_message>                              │
            │                                                  │
            │  Each CSA responds from its own domain + lens.   │
            │  Disagreements with other CSAs are explicit and  │
            │  must be grounded in stated facts.               │
            └──────────────────┬─────────────────────────────-┘
                               │  all CSA responses collected
                               ▼
            ┌──────────────────────────────────────────────────┐
            │       Phase 2 — Director CSA runs SEQUENTIALLY    │
            │                                                   │
            │  Director receives:                               │
            │    [prior round summaries]                        │
            │    PM message this round: <pm_message>            │
            │    === CSA RESPONSES THIS ROUND ===               │
            │      CSA Observability   (full text)              │
            │      CSA IT Operations   (full text)              │
            │      CSA Customer Voice  (full text)              │
            │    === END CSA RESPONSES ===                      │
            │    "As Director CSA, please synthesize..."        │
            │                                                   │
            │  Director response is STREAMED token-by-token     │
            │  to the browser via SSE as it generates.          │
            └──────────────────┬────────────────────────────────┘
                               │  round_complete event emitted
                               ▼
                  Round saved to Cosmos DB

Why Run CSAs in Parallel?

Each CSA has a distinct domain and lens. They are not meant to see each other's answers in real time — that would anchor them to the first response. Running them in parallel with asyncio.gather ensures each CSA argues independently from their grounding, producing genuinely different perspectives for the Director to synthesize. The Director is specifically designed to surface disagreements, flag unaddressed gaps, and choose viability over consensus.

SSE Streaming — How the UI Stays Live

The backend does not wait for all agents to finish before responding. It uses FastAPI's StreamingResponse with text/event-stream and yields events as they occur:

Event Type	When	What It Contains
`csa_complete`	After each CSA finishes	Full response text + display metadata
`dir_chunk`	On every token from Director	Incremental text chunk
`round_complete`	After Director finishes	Full structured round object
`error`	On any exception	Error message
`[DONE]`	Always last	Sentinel to close the stream

The frontend consumes these via a custom streamDebateRound() function and renders each CSA card as its event arrives, and streams the Director response character-by-character.

Role Customisation and Bootstrap

Session-level role overrides can be applied without changing roles.yaml. A PM can:

Manually edit display_name, domain, lens, or system_prompt for any role via the Agent Config page
Bootstrap a new persona from a real meeting transcript — the backend sends the transcript to the same LLM with a meta-prompt that extracts domain, lens, and a full structured system prompt, then returns a draft for PM review before saving it to the session

This means the swarm can be tuned to a specific customer reality within minutes, without any code changes.

Deliverable Generation

After one or more debate rounds, the PM can generate any of the following documents:

Document Type	Content
Architecture Recommendation	Executive summary, agreed principles, layered architecture by component, open risks, next steps
Project Plan	Phased timeline (honours any explicit duration stated in the debate), milestones, resource requirements
Technical Specifications	Per-component specs, data flow, integration table, non-functional requirements, decisions log
Detailed Roadmap	Short / mid / long-term workstreams derived from debate outcomes
Risk Register	Identified risks with likelihood, impact, and mitigation owners
GSA Compliance Assessment	Mapping of the proposed architecture to GSA / federal compliance controls
User Stories & Tasks	Importable Azure DevOps and GitHub Projects CSV with Epics, User Stories, and Tasks

All documents are generated by the Synthesis Orchestrator (swarm/orchestrator.py), which sends the full round history as context to the model under the same grounding constraint: every statement in the document must trace back to what was said in the debate or to official Microsoft documentation. The output is streamed to the browser and persisted to Cosmos DB.

Data Model (Cosmos DB)

All data lives in a single container, partitioned by session_id. Documents are discriminated by a type field:

Type	Key Fields
`session`	`session_id`, `user_id`, `title`, `agent_config`
`round`	`session_id`, `round_number`, `pm_message`, `csa_responses{}`, `dir_response{}`
`recommendation`	`session_id`, `doc_type`, `content`
`grounding`	`session_id`, `position`, `source_type`, `content`, `pinned`

Each user only sees their own sessions — all queries filter by user_id derived from the validated JWT.

Security Model

Concern	Approach
Authentication	Entra ID MSAL (frontend) + JWT validation against JWKS (backend)
No credentials in config	All Azure SDK calls use `DefaultAzureCredential` (Managed Identity in ACA)
Cosmos DB network isolation	Private endpoint; `publicNetworkAccess: Disabled`
Cosmos DB data plane auth	RBAC (`Cosmos DB Built-in Data Contributor`) — no account keys
Azure OpenAI auth	RBAC (`Cognitive Services OpenAI Contributor`) — no API keys
Container image pull	Managed Identity with `AcrPull` — no registry credentials
Grounding constraint enforcement	Hardcoded preamble in every agent's system prompt

See SECURITY.md for the full threat model and how to report vulnerabilities.

Quick Start — Deploy to Azure

The entire platform can be deployed end-to-end with the Azure Developer CLI (azd).

Prerequisites

Azure subscription with permission to create resource groups and role assignments
Azure CLI (az) and Azure Developer CLI (azd)
Docker (used by azd to build container images)
An Azure region with gpt-4o capacity (e.g., eastus2, westus3, swedencentral)
Your own Microsoft Entra ID app registration for SSO. This repo does not ship a shared client ID — you must create one in your own tenant before deploying. Configure it as:
- Supported account types: single-tenant (or as required by your organisation)
- Platform: Single-page application (SPA)
- Redirect URIs: http://localhost:3000 for local dev, plus your eventual frontend Container App URL after the first azd up
- API → Expose an API: an Application ID URI and a delegated scope (e.g. access_as_user); the postprovision script will set the URI to the deployed backend FQDN automatically
- Note the Application (client) ID and Directory (tenant) ID — you will pass these to azd env set below

Deploy

# 1. Authenticate
az login
azd auth login

# 2. Initialise an azd environment (you'll be prompted for a name and location)
azd env new <env-name>
azd env set ENTRA_CLIENT_ID <your-app-registration-client-id>
azd env set ENTRA_TENANT_ID <your-aad-tenant-id>

# 3. Provision infrastructure and deploy both apps
azd up

azd up will:

Provision all Azure resources via Bicep (infra/main.bicep) — VNet, Container Apps Environment, ACR, Cosmos DB (private endpoint), AI Services + gpt-4o deployment, Log Analytics, Key Vault, Storage, AI Foundry hub/project.
Build the backend and frontend Docker images and push them to the new ACR.
Deploy both Container Apps with managed identities and the necessary RBAC role assignments.
Run the postprovision hook in scripts/ to set the Entra ID app registration identifierUris to the deployed backend URL.

When complete, azd prints the frontend URL. Sign in with an account in your tenant to begin a session.

Local Development

# Backend (Python 3.12)
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env       # then fill in COSMOS_ENDPOINT, FOUNDRY_OPENAI_ENDPOINT
uvicorn main:app --reload  # http://localhost:8000

# Frontend (Node.js 20+)
cd frontend
npm install
cp .env.local.example .env.local
npm run dev                # http://localhost:3000

For local dev the backend uses DefaultAzureCredential, so az login against an account that has the Cognitive Services OpenAI User and Cosmos DB Built-in Data Contributor roles on the deployed resources.

Set AUTH_ENABLED=false in .env to bypass JWT validation for fast iteration.

Tear Down

azd down --purge

Repository Layout

agents/      Per-role agent factory (Microsoft Agent Framework wrapper)
api/         FastAPI app, routes, JWT auth
config/      Settings loader and roles.yaml (CSA persona definitions)
frontend/    Next.js 14 App Router UI
infra/       Bicep templates (main + cosmos + containerapp modules)
scripts/     azd lifecycle hooks (e.g. set-app-identifier-uri.sh)
swarm/       Cosmos store, context loader, deliverable orchestrator
workflows/   Debate workflow (parallel CSAs + sequential Director)
Dockerfile.backend / Dockerfile.frontend
azure.yaml   azd service map

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for the workflow. For security issues, please follow the disclosure process in SECURITY.md.

License

Released under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CSA Swarm Platform

What It Does

End-to-End User Workflow

System Architecture

Components

The Agentic Framework and Debate Process

What is the Microsoft Agent Framework here?

Role Configuration (`config/roles.yaml`)

The Grounding Context Block

A Single Debate Round — Step by Step

Why Run CSAs in Parallel?

SSE Streaming — How the UI Stays Live

Role Customisation and Bootstrap

Deliverable Generation

Data Model (Cosmos DB)

Security Model

Quick Start — Deploy to Azure

Prerequisites

Deploy

Local Development

Tear Down

Repository Layout

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
.github		.github
.vscode		.vscode
agents		agents
api		api
config		config
frontend		frontend
infra		infra
scripts		scripts
swarm		swarm
workflows		workflows
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile.backend		Dockerfile.backend
Dockerfile.frontend		Dockerfile.frontend
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
azure.yaml		azure.yaml
docker-compose.yml		docker-compose.yml
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

CSA Swarm Platform

What It Does

End-to-End User Workflow

System Architecture

Components

The Agentic Framework and Debate Process

What is the Microsoft Agent Framework here?

Role Configuration (config/roles.yaml)

The Grounding Context Block

A Single Debate Round — Step by Step

Why Run CSAs in Parallel?

SSE Streaming — How the UI Stays Live

Role Customisation and Bootstrap

Deliverable Generation

Data Model (Cosmos DB)

Security Model

Quick Start — Deploy to Azure

Prerequisites

Deploy

Local Development

Tear Down

Repository Layout

Contributing

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Role Configuration (`config/roles.yaml`)

Packages