- Derived from: Blueflame-Spec-v3-ACAR.md
- Stack: TypeScript / Next.js full-stack
- Agent tooling: Claude Code
- Sprint window: February 10 – March 15, 2026 (5 weeks)
This PRD translates the Blueflame spec into 10 buildable systems. Each system has: a purpose statement, user stories, technical requirements, acceptance criteria (traceable to the spec), and dependencies on other systems.
The companion tasks.yaml breaks each system into atomic tasks sized for Claude Code (~1 PR each).
| # | System | Spec Sections | Sprint | Priority |
|---|---|---|---|---|
| S1 | Project Scaffold & Infrastructure | 5, 8, 19 | Week 1 | P0 |
| S2 | Authentication & RBAC | 5, 10 (Stage 4), 15 | Week 1 | P0 |
| S3 | Cosmos DB Data Layer | 13 | Week 1–2 | P0 |
| S4 | Chat Interface & Designer Agent | 10 (Stage 1), 11.1 | Week 2 | P0 |
| S5 | Spec Engine (Output Spec lifecycle) | 10 (Stages 2–4), 17 | Week 2 | P0 |
| S6 | Planning Engine (Derivation + Authorization) | 10 (Stages 3–4) | Week 2–3 | P0 |
| S7 | Agent Swarm (Foundry Agent Service) | 6, 7 | Week 3 | P0 |
| S8 | GitHub Integration (Agentic DevOps) | 9 | Week 3 | P0 |
| S9 | Budget System & Partial Execution | 16 | Week 4 | P1 |
| S10 | Observability Dashboard | 18 | Week 4 | P1 |
| S11 | CI/CD Failure Intelligence | 10.3, 11 | Week 5 | P0 |
| S12 | ACAR σ-Routing (Multi-Provider) | 3, 6 | Week 6 | P0 |
| S13 | Enterprise Governance (Tracing + Compliance) | 6, 15, 23 | Week 6 | P0 |
| S14 | Spec Delta Detection | 17 | Week 6 | P1 |
| S15 | CI/CD Templates & Security Constraints | 7.2, 9, 14 | Week 6 | P0 |
| S16 | Enterprise Budgeting & MS Integration | 16, 23 | Week 7 | P0 |
Completed: S1–S16 (all MVP + enterprise systems), visual animations, demo wiring, UI redesign, SCR governance + delta execution, 12-phase gap resolution.
In Progress: Demo recording + submission.
Deferred: S16-004 (SignalR migration), S16-005 (AppInsights SDK).
Stand up the monorepo, Azure resources, and deployment pipeline so that all subsequent systems have a working foundation.
- As a developer, I can clone the repo, run `npm install`, and start both frontend and backend locally.
- As a developer, I can deploy to Azure with a single command or CI push.
Monorepo Structure:
```
blueflame/
├── apps/
│   ├── web/          # Next.js frontend (Azure Static Web Apps)
│   └── api/          # Express/Fastify API (Azure Container Apps)
├── packages/
│   ├── shared/       # TypeScript types, schemas, constants
│   ├── cosmos/       # Cosmos DB client + repository pattern
│   ├── foundry/      # Microsoft Foundry SDK wrappers
│   └── github-app/   # GitHub App + Octokit client
├── infra/            # Bicep/ARM templates for Azure
├── blueflame/        # Spec artifacts (output-spec schema, etc.)
├── tasks.yaml        # This task plan
├── turbo.json
├── package.json
└── tsconfig.base.json
```
Infrastructure (Azure):
- Cosmos DB (NoSQL, serverless) — 7 containers per spec Section 13.2
- Azure Blob Storage (Hot tier)
- Azure Static Web Apps (Standard)
- Azure Container Apps (Consumption)
- Azure SignalR Service (Standard)
- Azure Key Vault (Standard)
- Azure Monitor + Log Analytics workspace
Tooling:
- Turborepo for monorepo management
- TypeScript strict mode throughout
- Biome for linting/formatting
- Vitest for unit tests
- Playwright for E2E (demo recording)
- `npm install && npm run dev` starts both web (port 3000) and api (port 4000)
- `npm run deploy` provisions Azure resources via Bicep and deploys
- All 7 Cosmos DB containers exist with correct partition keys
- SignalR connection established from web to api
- Key Vault accessible from api with managed identity
- CI pipeline (GitHub Actions) runs lint + test on PR
None — this is the foundation.
Implement Azure Entra ID SSO and the 4-tier RBAC model (Viewer, Editor, Authorizer, Admin) that gates every action in the system.
- As a user, I can sign in with my Microsoft account via Entra ID.
- As an Admin, I can assign roles to team members.
- As a Viewer, I can see specs and run status but cannot authorize execution.
- As an Authorizer, I can approve plan.lock and trigger agent execution.
- Next.js middleware using `@azure/msal-node` for server-side auth
- MSAL React (`@azure/msal-react`) for client-side auth state
- Entra ID App Registration with redirect URIs for local + deployed
- 4 App Roles defined in the manifest: `Blueflame.Viewer`, `Blueflame.Editor`, `Blueflame.Authorizer`, `Blueflame.Admin`
- API middleware that validates JWT + extracts role claims
- Role-based UI rendering (disable Authorize button for Viewers/Editors)
- Entra Agent ID setup for agent identities (4 agents × project-scoped)
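The authorization gate can be expressed as one pure check shared by the API middleware and the UI. A minimal sketch, assuming the App Role names above arrive in a `roles` claim on the validated JWT (the middleware shape itself is not shown):

```typescript
// Role names follow the App Role manifest defined above.
type Role =
  | 'Blueflame.Viewer'
  | 'Blueflame.Editor'
  | 'Blueflame.Authorizer'
  | 'Blueflame.Admin';

// Roles allowed to trigger the Authorize action (Stage 4 gate).
const AUTHORIZE_ROLES: Role[] = ['Blueflame.Authorizer', 'Blueflame.Admin'];

// Pure check reused by the API (403 on failure) and the UI
// (disable the Authorize button for Viewers/Editors).
function canAuthorize(claims: { roles?: string[] }): boolean {
  return (claims.roles ?? []).some((r) => AUTHORIZE_ROLES.includes(r as Role));
}
```

Keeping the check pure means the same logic backs both the 403 response and the disabled-button rendering, so the two can never drift apart.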
- User can sign in via Microsoft SSO
- JWT contains role claims
- API rejects unauthorized requests with 403
- Authorize action requires `Blueflame.Authorizer` or `Blueflame.Admin` role
- Agent identities created with scoped permissions
- S1 (infrastructure must exist)
Implement the repository pattern for all 7 Cosmos DB containers with TypeScript types, CRUD operations, and change feed support.
- As a system, I can create, read, update, and query documents in all 7 containers.
- As a system, I can listen to change feed events for real-time state transitions.
Package: packages/cosmos/
Containers + Types:
| Container | Partition Key | TypeScript Interface |
|---|---|---|
| specs | /projectId | OutputSpec (with version history, SHA-256 hash) |
| plans | /runId | TaskPlan, PRD |
| locks | /runId | PlanLock (immutable after creation) |
| runs | /projectId | Run (state machine: PENDING → AUTHORIZED → EXECUTING → PAUSED → COMPLETED/FAILED/PARTIAL) |
| agents | /runId | AgentState (status, tokens, σ, cost) |
| constraints | /projectId | Constraint (per Section 14.2 schema) |
| documents | /projectId | UploadedDocument (metadata + Blob ref) |
Key behaviors:
- `PlanLock` has no update method — create-only (immutability enforced in code)
- `OutputSpec` stores version array; `freezeSpec()` computes SHA-256 and marks immutable
- `Run` state transitions enforced via state machine (no rollback)
- Change feed processor for `runs` and `agents` containers → emits events to SignalR
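The Run state machine above can be enforced with a simple transition table. A sketch, assuming this edge set derived from the states listed in the `runs` container row (the exact edges are not spelled out in this section):

```typescript
// States from the Run row in the container table above.
type RunState =
  | 'PENDING' | 'AUTHORIZED' | 'EXECUTING'
  | 'PAUSED' | 'COMPLETED' | 'FAILED' | 'PARTIAL';

// Allowed transitions; terminal states map to an empty list (no rollback).
const TRANSITIONS: Record<RunState, RunState[]> = {
  PENDING: ['AUTHORIZED'],
  AUTHORIZED: ['EXECUTING'],
  EXECUTING: ['PAUSED', 'COMPLETED', 'FAILED', 'PARTIAL'],
  PAUSED: ['EXECUTING', 'PARTIAL', 'FAILED'],
  COMPLETED: [],
  FAILED: [],
  PARTIAL: [],
};

// The repository calls this before persisting any status change.
function assertTransition(from: RunState, to: RunState): void {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Invalid Run transition: ${from} -> ${to}`);
  }
}
```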
- All 7 container repositories with full CRUD (except PlanLock: create + read only)
- TypeScript interfaces match spec Section 13–14 schemas
- SHA-256 hash computation for OutputSpec freeze
- State machine for Run with transition validation (rejects invalid transitions)
- Change feed processor emits events
- Unit tests for all repositories (≥80% coverage)
- S1 (Cosmos DB must be provisioned)
Build the conversational design UI (Stage 1) and the Designer agent that elicits requirements through dialogue.
- As a user, I can describe what I want to build in a chat interface.
- As a user, I see the Designer agent ask clarifying questions in real-time (streaming).
- As a user, I see a "Generate Spec" button when the Designer has enough context.
Frontend (apps/web/):
- Chat panel (left side of split view) with message history
- Real-time streaming via SignalR (not polling)
- Markdown rendering for agent responses
- "Generate Spec" action button (appears after sufficient context)
- Typing indicators during agent response
Backend (apps/api/):
- `/api/chat` endpoint — streams Designer agent responses via SignalR
- Foundry Agent Service SDK integration (`@azure/ai-projects` or REST)
- Designer agent system prompt: requirement elicitation, ambiguity detection, progressive structuring
- Conversation memory: stored in Cosmos DB (or Foundry Enhanced Memory)
Foundry (packages/foundry/):
- Foundry project client wrapper
- Agent creation helper (Designer role)
- Model configuration: multi-provider (Azure OpenAI, Anthropic, Google, OpenAI Direct) — configurable per agent role via model registry
- User can type message and receive streaming response
- Agent asks at least 2 clarifying questions before offering to generate spec
- Messages persist across page refresh
- SignalR streaming works (not chunked HTTP)
- System prompt follows spec Section 10 (Stage 1) requirements
- S1 (frontend + backend running)
- S2 (user must be authenticated)
- S3 (conversation storage)
Build the Output Spec lifecycle: generation from conversation, editing, versioning, freezing, and delta detection.
- As a user, I see the generated Output Spec in a side panel alongside chat.
- As a user, I can edit the spec using a Monaco editor with YAML schema validation.
- As a user, I can accept the spec (freezes it for planning).
- As a user, I can edit a frozen spec to create a new version (triggers delta detection).
Frontend:
- Split view: chat (left) + spec editor (right)
- Monaco Editor with YAML language support and custom schema validation
- Spec status indicators: DRAFT → ACCEPTED → FROZEN
- Version history dropdown (v1, v2, ...)
- Diff view between spec versions
Backend:
- `/api/specs` — CRUD for OutputSpec documents
- `/api/specs/:id/freeze` — compute SHA-256, mark immutable, increment version
- `/api/specs/:id/delta` — compute diff between two versions (Section 17.1)
- Spec generation: Foundry Agent call that transforms conversation into YAML OutputSpec
Schema (in packages/shared/):
```typescript
interface OutputSpec {
  id: string;
  projectId: string;
  version: number;
  status: 'DRAFT' | 'ACCEPTED' | 'FROZEN';
  hash?: string; // SHA-256, set on freeze
  deliverables: Deliverable[];
  acceptance_criteria: AcceptanceCriterion[];
  constraints: { must: string[]; must_not: string[]; };
  non_goals: string[];
  risks: Risk[];
  definition_of_done: string;
  inherited_constraints: string[]; // from constraint registry
  created_at: string;
  updated_at: string;
}
```
- Spec generated from conversation context
- Monaco editor renders and validates YAML
- Freeze computes SHA-256 and prevents further edits to that version
- New edit on frozen spec creates v(n+1) with DRAFT status
- Delta endpoint returns field-by-field diff between versions
- Spec versions queryable with full history
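The freeze hash can be sketched in a few lines with Node's built-in crypto. This is a minimal illustration, assuming the spec is hashed as its JSON serialization — the section does not fix a canonicalization, so a real implementation should sort keys first so semantically identical specs hash identically:

```typescript
import { createHash } from 'node:crypto';

// Computes the SHA-256 digest stored on OutputSpec.hash at freeze time.
// NOTE: plain JSON.stringify is key-order sensitive; a production version
// should serialize canonically (sorted keys) before hashing.
function specHash(spec: unknown): string {
  return createHash('sha256').update(JSON.stringify(spec)).digest('hex');
}
```

The hex digest is what `PlanLock.specHash` later references, which is why determinism of the serialization matters.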
- S3 (specs container)
- S4 (conversation context needed for generation)
Build the derivation pipeline (Stage 3) and authorization gate (Stage 4): from approved spec to locked execution plan.
- As a user, I see a generated task plan with dependency DAG after accepting the spec.
- As a user, I can review estimated costs per task (σ-informed).
- As an Authorizer, I can set a budget ceiling and approve the plan (creates plan.lock).
Backend:
- `/api/plans/derive` — Planner agent decomposes spec into task DAG
- `/api/plans/:id/authorize` — creates immutable plan.lock.json (requires Authorizer role)
- Planner agent (Foundry): takes frozen spec, outputs TaskPlan YAML
- σ estimation: for each task, Planner estimates complexity (mock σ in Week 2; real σ via N=3 sampling in Week 3)
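One way to turn N=3 sampling into a σ value is to take the spread of the three token estimates relative to their mean. A sketch under that assumption — the normalization (coefficient of variation) is a choice this PRD does not mandate:

```typescript
// σ from repeated sampling: run the same estimation prompt N times (N=3 per
// the plan above) and measure how much the token estimates disagree.
function sigmaFromSamples(tokenEstimates: number[]): number {
  const n = tokenEstimates.length;
  const mean = tokenEstimates.reduce((a, b) => a + b, 0) / n;
  // Sample variance (n - 1 denominator), then normalize by the mean so σ is
  // comparable across small and large tasks.
  const variance =
    tokenEstimates.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  return mean === 0 ? 0 : Math.sqrt(variance) / mean;
}
```

Agreement across samples (σ ≈ 0) signals a well-understood task; high σ flags tasks whose cost estimates deserve a wider budget margin.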
Frontend:
- Task plan view: table with task ID, description, dependencies, estimated cost, agent role
- DAG visualization (simple: use `dagre` or `elkjs` for layout)
- Budget input (USD) + Authorize button (gated by role)
- Authorization confirmation modal
Schema:
```typescript
interface TaskPlan {
  id: string;
  runId: string;
  specId: string;
  specHash: string;
  tasks: Task[];
  total_estimated_cost: number;
}

interface Task {
  task_id: string;
  description: string;
  acceptance_criteria_ids: string[];
  dependencies: string[];
  agent_role: 'builder' | 'verifier';
  estimated_tokens: number;
  estimated_cost: number;
  sigma: number; // σ estimate
  parallelizable: boolean;
}

interface PlanLock {
  id: string;
  runId: string;
  specHash: string;
  tasks: Task[];
  budget_ceiling: number;
  per_agent_limit: number;
  constraint_snapshot: Constraint[];
  authorized_by: string;
  authorized_at: string;
}
```
- Planner agent generates task DAG from frozen spec
- Tasks have dependency ordering and acceptance criteria links
- DAG rendered in UI
- Authorization creates immutable PlanLock in Cosmos DB
- Authorization rejected if user lacks Authorizer role
- PlanLock references correct spec hash
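The dependency-ordering rule the orchestrator must respect can be stated as one predicate: a task is ready when every ID in its `dependencies` array is complete. A minimal sketch against the `Task` shape above (trimmed to the two relevant fields):

```typescript
// Trimmed view of the Task interface above — only the fields the
// scheduling predicate needs.
interface TaskLite {
  task_id: string;
  dependencies: string[];
}

// Tasks eligible to start: not yet completed, all dependencies completed.
function readyTasks(tasks: TaskLite[], completed: Set<string>): TaskLite[] {
  return tasks.filter(
    (t) =>
      !completed.has(t.task_id) &&
      t.dependencies.every((d) => completed.has(d)),
  );
}
```

Calling this after every task completion event yields the next wave of spawnable Builders, which is exactly the "no task starts before deps complete" criterion in S7.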
- S3 (plans + locks containers)
- S5 (frozen spec required)
- S2 (RBAC for authorization gate)
Deploy the 5 agent roles (Planner, Builder, Verifier, Explainer, Fixer) in Foundry Agent Service with orchestration via Foundry Workflows. Each agent role has a configurable default model+provider pair, supporting multi-provider routing (Azure OpenAI, Anthropic, Google, OpenAI Direct).
- As a system, I can spawn Builder agents that create branches and write code.
- As a system, I can spawn Verifier agents that trigger CI and evaluate results.
- As a system, I can spawn Explainer agents that generate PR descriptions.
- As a user, I see agent execution progress in real-time.
Agent Definitions (packages/foundry/agents/):
| Agent | System Prompt Focus | Tools (MCP) | Default Model (Configurable) |
|---|---|---|---|
| Planner | Task decomposition, DAG, σ estimation | GitHub API (read), Cosmos (read), Foundry IQ | o1 (Azure) — fallback: Claude Opus 4.6 |
| Builder | Code implementation, branch/PR creation | GitHub API (write), Foundry IQ, Code Server MCP | Claude Sonnet 4.5 (Anthropic) — fallback: Codex / GPT-4o |
| Verifier | Test execution, constraint validation | GitHub Actions (trigger), Test Runner MCP, Cosmos (constraints) | GPT-4o (Azure) — fallback: Gemini 2.5 Pro |
| Explainer | Root cause analysis, PR descriptions, run summaries | Foundry Tracing (read), Cosmos (read), GitHub Diff API | GPT-4o (Azure) — fallback: Claude Opus 4.6 |
| Fixer | CI/CD failure analysis, remediation planning | ADO REST API (read), GitHub API (read), Foundry IQ, Cosmos (failures) | GPT-4o + Claude Sonnet 4.5 (multi-provider) |
Orchestration (apps/api/services/orchestrator.ts):
- Workflow engine: receives authorized PlanLock, spawns agents per task DAG
- Phase 1 (Sequential): Planner finalizes task assignments
- Phase 2 (Parallel): Builders spawn for independent tasks; Verifiers follow each Builder
- Phase 3 (Sequential): Explainer generates consolidated summary
- State tracked in `agents` container; emitted to SignalR
- A2A handoff: Builder completion → Verifier start (pass branch ref + task context)
Real-time updates:
- Agent status changes → Cosmos change feed → SignalR → dashboard
- All 5 agents deployed in Foundry Agent Service (Planner, Builder, Verifier, Explainer, Fixer)
- Builder creates branch, writes files, opens PR
- Verifier triggers GitHub Action and receives results
- Explainer generates PR description with spec traceability
- Orchestrator respects dependency DAG (no task starts before deps complete)
- Agent state streamed to frontend via SignalR
- Agent execution stoppable by user (interrupt)
- S3 (agents + runs containers)
- S6 (PlanLock required to spawn)
- S8 (GitHub integration for branch/PR/Actions)
Implement the GitHub App, branch strategy, PR workflow, and Actions integration that forms the agentic DevOps loop.
- As a user, I can connect my GitHub repo to Blueflame.
- As a Builder agent, I can create branches and open PRs.
- As a Verifier agent, I can trigger GitHub Actions and receive CI results.
- As a user, I review agent-generated PRs in GitHub's standard UI.
GitHub App (packages/github-app/):
- GitHub App registration with permissions: contents (write), pull_requests (write), actions (write), checks (read)
- Octokit client wrapper with installation token management
- Branch operations: `createBranch(runId, taskId)`, `commitFiles(branch, files)`, `createPR(branch, title, body)`
- Actions operations: `triggerWorkflow(branch)`, `getWorkflowRunResult(runId)`
Branch Strategy:
```
main (or target branch)
└── blueflame/run-{runId}                 # run branch
    ├── blueflame/run-{runId}/task-001    # task branch (Builder)
    ├── blueflame/run-{runId}/task-002
    └── blueflame/run-{runId}/task-003
```
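The naming convention above reduces to two small helpers. A sketch — the three-digit zero-padding is an assumption read off the example tree:

```typescript
// Run branch: blueflame/run-{runId}
function runBranch(runId: string): string {
  return `blueflame/run-${runId}`;
}

// Task branch nested under the run branch, zero-padded to match the
// task-001 style shown in the tree above.
function taskBranch(runId: string, taskNum: number): string {
  return `${runBranch(runId)}/task-${String(taskNum).padStart(3, '0')}`;
}
```

Centralizing this in `packages/github-app/` keeps Builders and the webhook handler agreeing on the convention the acceptance criteria check.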
Webhook Handler (apps/api/webhooks/github.ts):
- Receives: `workflow_run.completed`, `pull_request.reviewed`, `check_run.completed`
- Routes events to orchestrator for Verifier feedback loop
PR Template:
- Links to spec (spec hash, acceptance criteria IDs)
- Agent-generated description (from Explainer)
- Constraint compliance summary
- Cost report
- GitHub App installable on user's repo
- Builder can create branch + commit files + open PR
- Verifier can trigger GitHub Action workflow
- Webhook receives CI results and routes to Verifier agent
- PR description includes spec traceability
- Branch naming follows `blueflame/run-{id}/task-{n}` convention
- Agent never has merge permission (branch protection enforced)
- S1 (GitHub App registered)
- S7 (agents need GitHub operations)
Implement cost tracking, budget enforcement, graceful pause, and partial result handling.
- As a user, I see real-time cost burn-down during execution.
- As a user, I receive a warning at 80% budget.
- As a user, execution pauses at 95% and I can choose: resume, accept partial, or abandon.
Budget Monitor (apps/api/services/budget-monitor.ts):
- Tracks cumulative cost per run (token usage × model pricing)
- Emits WARNING event at 80% threshold → SignalR → UI alert
- Emits PAUSE event at 95% → orchestrator pauses agent spawning
- Completed tasks: PRs remain open
- In-progress tasks: draft PRs with [PARTIAL] label
- Not-started tasks: marked DEFERRED
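The two-threshold logic is small enough to sketch directly; per the acceptance criteria the 80%/95% cut-offs are defaults, not constants:

```typescript
type BudgetEvent = 'OK' | 'WARNING' | 'PAUSE';

// Evaluate cumulative spend against the PlanLock budget ceiling.
// Thresholds are configurable; 0.8 and 0.95 are the defaults named above.
function budgetEvent(
  spent: number,
  ceiling: number,
  warnAt = 0.8,
  pauseAt = 0.95,
): BudgetEvent {
  const ratio = spent / ceiling;
  if (ratio >= pauseAt) return 'PAUSE';  // orchestrator stops spawning agents
  if (ratio >= warnAt) return 'WARNING'; // SignalR → UI alert
  return 'OK';
}
```

The monitor re-evaluates this after every token-usage update, so a single large completion can jump straight from OK to PAUSE.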
Resume Flow:
- User provides additional budget → new PlanLock addendum → orchestrator resumes
Frontend:
- Cost burn-down progress bar (green → yellow → red)
- Pause modal with 3 choices: Resume / Accept Partial / Abandon
- Cost tracked per agent per task per run
- Warning at 80%, pause at 95% (thresholds configurable)
- Partial results produce draft PRs with correct labels
- Resume creates PlanLock addendum and continues execution
- Abandon cleans up branches
- Run state transitions: EXECUTING → PAUSED → RESUMED | PARTIAL_COMPLETE | ABANDONED
- S7 (agent execution must be running)
- S3 (runs container for state tracking)
Build the real-time dashboard that serves as the primary Stage 5 interface and the visual centerpiece of the demo.
- As a user, I see live agent statuses (running/blocked/paused/complete).
- As a user, I see the task DAG with completion progress.
- As a user, I see cost burn-down against budget.
- As a user, I see constraint evaluation results (green/red).
- As a user, I see a live action stream of agent decisions.
Frontend (apps/web/components/dashboard/):
- Agent status cards (4 agents, color-coded by state)
- DAG progress view (nodes light up as tasks complete)
- Cost burn-down chart (recharts or similar)
- Constraint results panel (pass/fail per criterion)
- Live action stream (scrolling log of agent tool calls, powered by SignalR)
- Budget alert overlay
- Run summary panel (post-completion)
Data Source:
- All data via SignalR from Cosmos change feed
- No polling — fully event-driven
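The change-feed-to-dashboard path amounts to mapping each changed document onto a SignalR message. A sketch under assumed shapes — the `ChangeDoc` fields and channel names here are illustrative, not the actual wire format:

```typescript
// Illustrative subset of a changed document from the agents/runs containers.
interface ChangeDoc {
  id: string;
  runId?: string;
  status?: string;
}

// Map a change-feed document to the event the dashboard subscribes to.
// Channel names ('agent-status' / 'run-status') are assumptions.
function toDashboardEvent(container: 'agents' | 'runs', doc: ChangeDoc) {
  return {
    channel: container === 'agents' ? 'agent-status' : 'run-status',
    payload: { id: doc.id, runId: doc.runId, status: doc.status },
    at: new Date().toISOString(),
  };
}
```

Because every UI update originates from a persisted Cosmos write, the dashboard can never show state the database does not hold — which is what makes the no-polling design safe for demo recording.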
- Dashboard updates in real-time (< 1s latency)
- Agent states reflect actual Foundry agent status
- DAG shows correct dependency relationships and completion
- Cost numbers match actual token usage
- Action stream shows agent tool calls with timestamps
- Works during demo recording (no flicker, no missing updates)
- S7 (agents must emit state)
- S9 (budget data for cost display)
- S1 (SignalR connection)
- Blob upload UI, Foundry IQ indexing, Ingestion agent
- Can be high-fidelity simulated for demo
- GitHub repo indexing via Foundry IQ, constraint auto-extraction
- Can be described in demo, not shown live
- CRUD UI for project-level constraints, loading flow
- Core data model exists in S3; UI is the deferred part
- Semantic diff engine, impact mapping, rebuild orchestration
- MUST be functional for Workflow 6 demo — prioritize backend logic in Week 4
- Foundry Content Safety + Protected Material Detection integration
- Configuration-level; lower implementation effort
Implement the Spec-Freeze Doctrine: "A frozen spec is law. Changing it is a governance event, not a chat edit." Changes to frozen specs follow a formal Spec Change Request (SCR) workflow with impact analysis and delta execution (patch the existing plan, only re-execute affected tasks — never a fresh run).
- As a user, when a spec is frozen, I can request a formal change via SCR (not a direct edit).
- As a user, I see a DiffPack showing exactly what changed between spec versions.
- As a user, I see a color-coded Impact Map (PRESERVE/REBUILD/NEW/REMOVE) per task.
- As an Authorizer, I can approve or reject SCRs with full audit trail.
- As a user, approved SCRs trigger delta execution that preserves completed work.
Shared Types (packages/shared/src/types/scr.ts):
- `SCRStatus`: OPEN → IMPACT_ANALYZED → APPROVED → REJECTED → DEFERRED → EXECUTING → COMPLETED
- `SCRSeverity`: PATCH (clarification) / MINOR (additive) / MAJOR (breaking)
- `DiffPack` + `DiffPackItem`: field-level diffs with affected criteria IDs
- `TaskPatch` + `TaskPatchEntry`: add/update/invalidate/cancel with DiffPack citation
- `BaselineSnapshot`: captures run state before delta execution
- `SpecChangeRequest`: full SCR document with governance metadata
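The SCR lifecycle can be guarded the same way as the Run state machine. A sketch — the edge set here is an assumption (e.g. that APPROVED/REJECTED/DEFERRED all branch from IMPACT_ANALYZED rather than forming a linear chain), since the status list above only names the states:

```typescript
type SCRStatus =
  | 'OPEN' | 'IMPACT_ANALYZED' | 'APPROVED'
  | 'REJECTED' | 'DEFERRED' | 'EXECUTING' | 'COMPLETED';

// Assumed lifecycle edges; REJECTED and COMPLETED are terminal.
const SCR_EDGES: Record<SCRStatus, SCRStatus[]> = {
  OPEN: ['IMPACT_ANALYZED'],
  IMPACT_ANALYZED: ['APPROVED', 'REJECTED', 'DEFERRED'],
  APPROVED: ['EXECUTING'],
  REJECTED: [],
  DEFERRED: ['IMPACT_ANALYZED'], // a deferred SCR can be re-analyzed later
  EXECUTING: ['COMPLETED'],
  COMPLETED: [],
};

function canTransitionSCR(from: SCRStatus, to: SCRStatus): boolean {
  return SCR_EDGES[from].includes(to);
}
```

Gating `approveSCR()`/`rejectSCR()` on this guard is what makes the audit trail trustworthy: no SCR can silently skip impact analysis.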
SCR Service (apps/api/src/services/scr-service.ts):
- `createSCR()` — Creates new spec version via `editFrozenSpec()`, runs delta analysis, builds DiffPack
- `approveSCR()` — Generates TaskPatch, transitions to APPROVED
- `rejectSCR()` — Transitions to REJECTED with reason
- `executeDeltaRun()` — Captures baseline, applies TaskPatch, resumes execution
API Routes (apps/api/src/routes/scr.ts):
- POST `/api/scr` — Create SCR
- GET `/api/scr/:scrId` — Get SCR
- GET `/api/scr/project/:projectId` — List SCRs by project
- PUT `/api/scr/:scrId/approve` — Approve
- PUT `/api/scr/:scrId/reject` — Reject
- POST `/api/scr/:scrId/execute` — Execute delta run
Orchestrator (apps/api/src/services/orchestrator.ts):
- `applyTaskPatch()` — Patches existing run plan: invalidate → reset to PENDING, cancel → DEFERRED, add → append new tasks
Task Executor (apps/api/src/services/task-executor.ts):
- Patch Mode: Builder agents get constrained prompts during delta execution ("Only modify files related to cited DiffPack items")
Frontend (apps/web/components/spec/SCRPanel.tsx):
- Multi-step UI: idle → editing → reviewing (DiffPack + Impact Map) → approved (TaskPatch summary) → executing
- Reuses existing `DeltaImpactMap.tsx` component
- SCR can be created from a frozen spec with reason
- DiffPack shows field-level changes with severity
- Impact Map shows per-task PRESERVE/REBUILD/NEW/REMOVE
- Approve/reject transitions with audit logging
- Delta execution patches plan and only re-executes affected tasks
- Completed work preserved in BaselineSnapshot
- Every TaskPatchEntry cites a DiffPackItem.id
- S5 (spec freeze), S7 (orchestrator), S14 (delta detection)
- All API endpoints return structured errors: `{ error: string, code: string, details?: any }`
- Agent failures: caught by orchestrator, logged, run marked FAILED with partial results preserved
- Cosmos transient failures: retry with exponential backoff (built into `@azure/cosmos`)
- Unit tests: Vitest, ≥80% coverage on `packages/*`
- Integration tests: Vitest with Cosmos emulator for data layer
- E2E: Playwright for critical paths (auth → chat → spec → authorize)
- Agent tests: mock Foundry responses for deterministic testing
- `.env.local` for local dev (Cosmos emulator, local Foundry endpoint)
- `.env.production` populated from Key Vault
- All secrets via `@azure/identity` `DefaultAzureCredential`