A multi-user, cloud-native platform that simulates a structured design session between AI-powered Cloud Solution Architect (CSA) personas and a Director CSA — producing grounded, enterprise-grade architecture deliverables for the Oil & Gas Energy (OGE) observability domain.
The platform replaces a manual, asynchronous process where a PM has to interview multiple CSAs and synthesize their inputs. Instead, a PM describes a design problem in plain language, and a swarm of specialized AI agents debates it in real time — each agent advocating from a distinct domain perspective — before a Director agent synthesizes the outcome. The session ends with AI-generated, audit-ready deliverables scoped to what was actually discussed.
1. Sign In (Entra ID / MSAL)
↓
2. Create a Session (name it, choose a model)
↓
3. Upload Grounding Context ← optional but powerful
(customer interview transcripts, PDFs, URLs, GitHub repos)
↓
4. Configure Agent Personas ← optional
(edit CSA display names, domains, analytical lenses, system prompts
or auto-generate a new persona from a transcript using Bootstrap)
↓
5. Run Debate Rounds ← core experience
(PM types a question / design prompt → agents respond live via SSE;
all prior rounds are visible inline with timestamps)
↓
6. Generate Deliverables
(Architecture Recommendation, Project Plan, Technical Specs, Roadmap, Diagram;
all rounds and deliverables are persisted in Cosmos DB per user per session)
┌─────────────────────────────────────────────────────────────────────────┐
│ Browser (Next.js 14) │
│ Sessions │ Context │ Agent Config │ Debate │ Deliverables │
└───────────────────────────────┬─────────────────────────────────────────┘
│ HTTPS + Entra ID Bearer JWT
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ FastAPI Backend (Azure Container Apps) │
│ │
│ /api/sessions – session CRUD │
│ /api/sessions/:id/context – grounding source management │
│ /api/sessions/:id/agent-config/bootstrap – LLM persona generation │
│ /api/sessions/:id/rounds – debate orchestration [SSE stream] │
│ /api/sessions/:id/recommendations – deliverable generation [SSE stream]│
└─────┬──────────────────────────────────────────┬───────────────────────┘
│ DefaultAzureCredential (Managed Identity) │
▼ ▼
┌─────────────┐ ┌─────────────────────────────────┐
│ Cosmos DB │ │ Azure AI Services │
│ NoSQL │ │ (Azure OpenAI endpoint) │
│ (private │ │ gpt-4o / gpt-4.1 / o4-mini │
│ endpoint) │ │ Microsoft Agent Framework SDK │
└─────────────┘ └─────────────────────────────────┘
Key infrastructure decisions:
- Both Container Apps run inside a VNet-injected Container Apps Environment, reaching Cosmos DB over a private endpoint —
publicNetworkAccessis disabled on Cosmos. - All credentials are eliminated via Azure Managed Identity + RBAC; no secrets in app settings.
- Authentication is Microsoft Entra ID only: the frontend uses MSAL and sends a Bearer JWT; the backend validates it against the Entra tenant's JWKS endpoint.
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 14 (App Router), TypeScript, Tailwind | SPA UI; consumes SSE streams for live agent output |
| Backend API | Python 3.12, FastAPI | Auth, session management, orchestration entry points |
| Debate Workflow | workflows/debate_workflow.py |
Parallel CSA dispatch + sequential Director synthesis |
| Agent Layer | agents/role_agents.py + Microsoft Agent Framework |
Creates Azure OpenAI Assistants per role, enforces grounding constraints |
| Swarm Orchestrator | swarm/orchestrator.py |
Deliverable generation using debate transcript as source of truth |
| Context Loader | swarm/context_loader.py |
Parses uploaded files (PDF, DOCX, TXT, MD) and fetches URLs/GitHub repos |
| Persistence | Azure Cosmos DB NoSQL (serverless) | Multi-user session isolation; single container, partition key: /session_id |
| Identity | Azure Managed Identity + Entra ID | Zero-secret architecture end-to-end |
| Infrastructure | Bicep (ACA, Cosmos, ACR, AI Services, VNet) | Fully reproducible with azd up |
This is the core of the platform. Understanding it is key to understanding why the outputs are trustworthy.
Each agent is an OpenAI Assistant (Azure OpenAI Assistants API) created on demand for a single debate round via AzureOpenAIChatClient.as_agent(). The framework manages the thread lifecycle, tool calling surface, and streaming. A fresh assistant instance is created per round per role — there is no shared state between rounds at the agent level. History is instead passed explicitly as structured context in each user message.
Agents are not generic AI personas. Each one is loaded from roles.yaml, which defines:
display_name— the label shown in the UI (e.g., CSA – Observability & Standardization)domain— the technical specialty the agent grounds into (e.g., Oil & Gas operational monitoring)lens— the analytical frame the agent argues from (e.g., Operational standardization and efficiency)system_prompt— a detailed instruction block that defines output format, decision heuristics, and precisely what the agent is and is not allowed to say
The system prompt is prepended with a non-negotiable grounding constraint that prohibits the agent from citing any vendor, tool, or product not explicitly present in either the session grounding context or official Microsoft documentation. This prevents hallucination at the persona level.
Before any debate round runs, the PM can upload grounding sources — customer interview transcripts, architecture PDFs, Azure documentation URLs, or entire GitHub repositories. These are stored in Cosmos DB per session and assembled into a single grounding_block string. This block is injected into every CSA user message at the top of each round, giving all agents the same factual foundation.
PM types: "What should the baseline Azure monitoring architecture look like?"
│
▼
run_round_streaming() is called
│
├── loads all prior rounds from Cosmos (round history)
├── loads grounding_block from Cosmos
└── merges session-level agent_config overrides on top of roles.yaml defaults
│
┌─────────────────┴──────────────────────────────┐
│ Phase 1 — CSAs run in PARALLEL │
│ │
│ asyncio.gather( │
│ _run_csa("csa_1"), ← CSA Observability │
│ _run_csa("csa_2"), ← CSA IT Operations │
│ _run_csa("csa_3"), ← CSA Customer Voice │
│ ) │
│ │
│ Each CSA receives the same user message: │
│ [grounding_block] │
│ [prior round summaries, truncated to 600 ch.] │
│ PM: <pm_message> │
│ │
│ Each CSA responds from its own domain + lens. │
│ Disagreements with other CSAs are explicit and │
│ must be grounded in stated facts. │
└──────────────────┬─────────────────────────────-┘
│ all CSA responses collected
▼
┌──────────────────────────────────────────────────┐
│ Phase 2 — Director CSA runs SEQUENTIALLY │
│ │
│ Director receives: │
│ [prior round summaries] │
│ PM message this round: <pm_message> │
│ === CSA RESPONSES THIS ROUND === │
│ CSA Observability (full text) │
│ CSA IT Operations (full text) │
│ CSA Customer Voice (full text) │
│ === END CSA RESPONSES === │
│ "As Director CSA, please synthesize..." │
│ │
│ Director response is STREAMED token-by-token │
│ to the browser via SSE as it generates. │
└──────────────────┬────────────────────────────────┘
│ round_complete event emitted
▼
Round saved to Cosmos DB
Each CSA has a distinct domain and lens. They are not meant to see each other's answers in real time — that would anchor them to the first response. Running them in parallel with asyncio.gather ensures each CSA argues independently from their grounding, producing genuinely different perspectives for the Director to synthesize. The Director is specifically designed to surface disagreements, flag unaddressed gaps, and choose viability over consensus.
The backend does not wait for all agents to finish before responding. It uses FastAPI's StreamingResponse with text/event-stream and yields events as they occur:
| Event Type | When | What It Contains |
|---|---|---|
csa_complete |
After each CSA finishes | Full response text + display metadata |
dir_chunk |
On every token from Director | Incremental text chunk |
round_complete |
After Director finishes | Full structured round object |
error |
On any exception | Error message |
[DONE] |
Always last | Sentinel to close the stream |
The frontend consumes these via a custom streamDebateRound() function and renders each CSA card as its event arrives, and streams the Director response character-by-character.
Session-level role overrides can be applied without changing roles.yaml. A PM can:
- Manually edit
display_name,domain,lens, orsystem_promptfor any role via the Agent Config page - Bootstrap a new persona from a real meeting transcript — the backend sends the transcript to the same LLM with a meta-prompt that extracts domain, lens, and a full structured system prompt, then returns a draft for PM review before saving it to the session
This means the swarm can be tuned to a specific customer reality within minutes, without any code changes.
After one or more debate rounds, the PM can generate any of the following documents:
| Document Type | Content |
|---|---|
| Architecture Recommendation | Executive summary, agreed principles, layered architecture by component, open risks, next steps |
| Project Plan | Phased timeline (honours any explicit duration stated in the debate), milestones, resource requirements |
| Technical Specifications | Per-component specs, data flow, integration table, non-functional requirements, decisions log |
| Detailed Roadmap | Short / mid / long-term workstreams derived from debate outcomes |
| Risk Register | Identified risks with likelihood, impact, and mitigation owners |
| GSA Compliance Assessment | Mapping of the proposed architecture to GSA / federal compliance controls |
| User Stories & Tasks | Importable Azure DevOps and GitHub Projects CSV with Epics, User Stories, and Tasks |
All documents are generated by the Synthesis Orchestrator (swarm/orchestrator.py), which sends the full round history as context to the model under the same grounding constraint: every statement in the document must trace back to what was said in the debate or to official Microsoft documentation. The output is streamed to the browser and persisted to Cosmos DB.
All data lives in a single container, partitioned by session_id. Documents are discriminated by a type field:
| Type | Key Fields |
|---|---|
session |
session_id, user_id, title, agent_config |
round |
session_id, round_number, pm_message, csa_responses{}, dir_response{} |
recommendation |
session_id, doc_type, content |
grounding |
session_id, position, source_type, content, pinned |
Each user only sees their own sessions — all queries filter by user_id derived from the validated JWT.
| Concern | Approach |
|---|---|
| Authentication | Entra ID MSAL (frontend) + JWT validation against JWKS (backend) |
| No credentials in config | All Azure SDK calls use DefaultAzureCredential (Managed Identity in ACA) |
| Cosmos DB network isolation | Private endpoint; publicNetworkAccess: Disabled |
| Cosmos DB data plane auth | RBAC (Cosmos DB Built-in Data Contributor) — no account keys |
| Azure OpenAI auth | RBAC (Cognitive Services OpenAI Contributor) — no API keys |
| Container image pull | Managed Identity with AcrPull — no registry credentials |
| Grounding constraint enforcement | Hardcoded preamble in every agent's system prompt |
See SECURITY.md for the full threat model and how to report vulnerabilities.
The entire platform can be deployed end-to-end with the Azure Developer CLI (azd).
- Azure subscription with permission to create resource groups and role assignments
- Azure CLI (
az) and Azure Developer CLI (azd) - Docker (used by
azdto build container images) - An Azure region with gpt-4o capacity (e.g.,
eastus2,westus3,swedencentral) - Your own Microsoft Entra ID app registration for SSO. This repo does not ship a shared client ID — you must create one in your own tenant before deploying. Configure it as:
- Supported account types: single-tenant (or as required by your organisation)
- Platform: Single-page application (SPA)
- Redirect URIs:
http://localhost:3000for local dev, plus your eventual frontend Container App URL after the firstazd up - API → Expose an API: an
Application ID URIand a delegated scope (e.g.access_as_user); thepostprovisionscript will set the URI to the deployed backend FQDN automatically - Note the Application (client) ID and Directory (tenant) ID — you will pass these to
azd env setbelow
# 1. Authenticate
az login
azd auth login
# 2. Initialise an azd environment (you'll be prompted for a name and location)
azd env new <env-name>
azd env set ENTRA_CLIENT_ID <your-app-registration-client-id>
azd env set ENTRA_TENANT_ID <your-aad-tenant-id>
# 3. Provision infrastructure and deploy both apps
azd upazd up will:
- Provision all Azure resources via Bicep (
infra/main.bicep) — VNet, Container Apps Environment, ACR, Cosmos DB (private endpoint), AI Services + gpt-4o deployment, Log Analytics, Key Vault, Storage, AI Foundry hub/project. - Build the backend and frontend Docker images and push them to the new ACR.
- Deploy both Container Apps with managed identities and the necessary RBAC role assignments.
- Run the
postprovisionhook in scripts/ to set the Entra ID app registrationidentifierUristo the deployed backend URL.
When complete, azd prints the frontend URL. Sign in with an account in your tenant to begin a session.
# Backend (Python 3.12)
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # then fill in COSMOS_ENDPOINT, FOUNDRY_OPENAI_ENDPOINT
uvicorn main:app --reload # http://localhost:8000
# Frontend (Node.js 20+)
cd frontend
npm install
cp .env.local.example .env.local
npm run dev # http://localhost:3000For local dev the backend uses DefaultAzureCredential, so az login against an account that has the Cognitive Services OpenAI User and Cosmos DB Built-in Data Contributor roles on the deployed resources.
Set AUTH_ENABLED=false in .env to bypass JWT validation for fast iteration.
azd down --purgeagents/ Per-role agent factory (Microsoft Agent Framework wrapper)
api/ FastAPI app, routes, JWT auth
config/ Settings loader and roles.yaml (CSA persona definitions)
frontend/ Next.js 14 App Router UI
infra/ Bicep templates (main + cosmos + containerapp modules)
scripts/ azd lifecycle hooks (e.g. set-app-identifier-uri.sh)
swarm/ Cosmos store, context loader, deliverable orchestrator
workflows/ Debate workflow (parallel CSAs + sequential Director)
Dockerfile.backend / Dockerfile.frontend
azure.yaml azd service map
Issues and pull requests are welcome. See CONTRIBUTING.md for the workflow. For security issues, please follow the disclosure process in SECURITY.md.
Released under the MIT License.