An agentic SDLC harness that transforms requirements into merged PRs with forensic evidence. For: Platform engineers and staff+ operators who need repeatable, auditable automation (not demos).
Docs: docs/INDEX.md | Getting started: docs/GETTING_STARTED.md | Golden runs: docs/GOLDEN_RUNS.md | Roadmap: docs/ROADMAP_3_0.md | Changelog: CHANGELOG.md
The job is moving up the stack. Again.
Punchcards → Assembly → High-level languages → Now
Each transition followed the same pattern: what was once skilled craft becomes mechanical, and humans move to higher-leverage work. Programmers stopped managing memory addresses. Then stopped thinking in registers. Now they stop grinding on first-draft implementation.
The shift: Models write nearly-working code at 1,000+ tokens/second. The bottleneck isn't generation—it's trust. Can a human review and trust the output in 30 minutes instead of spending a week doing it themselves?
Flow Studio addresses that constraint. It produces forensic evidence alongside code. You audit the evidence, escalate verification at hotspots where doubt exists, and ship—or bounce it back for another iteration.
The machine does the implementation. You do the architecture, the intent, the judgment.
Just like every transition before.
Current Version: v3.0.0-rc.1 (Release Candidate)
See RELEASE_NOTES_3_0_0_RC1.md for details on the V3 "Intelligent Factory" update.
This release is stable for evaluation. It includes the new Pack System, Stepwise Orchestrator, and Resilient DB.
Flow Studio orchestrates 9 flows that transform a requirement into a merged PR and manage the lifecycle:
| Flow | Transformation | Output |
|---|---|---|
| 1: Signal | Raw input → structured problem | Requirements, BDD scenarios, risk assessment |
| 2: Plan | Requirements → architecture | ADR, contracts, work plan, test plan |
| 3: Build | Plan → working code | Implementation + tests via adversarial loops |
| 4: Review | Draft PR → Ready PR | Harvest feedback, apply fixes |
| 5: Gate | Code → merge decision | Audit receipts, policy check, recommendation |
| 6: Deploy | Approved → production | Merge, verify health, audit trail |
| 7: Wisdom | Artifacts → learnings | Pattern detection, feedback loops |
| 8: Reset | Clean state | Infrastructure wipe, cache purge |
| 9: Demo | Simulation | Walkthrough of the Stepwise Orchestrator |
Each flow produces receipts (proof of execution) and evidence (test results, coverage, lint output). Kill the process anytime—resume from the last checkpoint with zero data loss.
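The checkpoint contract behind "kill anytime, resume with zero data loss" can be sketched as an atomic write (temp file plus rename). The function names and state shape here are illustrative assumptions, not the kernel's actual API:

```python
import json
import os
import tempfile

def write_checkpoint(path: str, state: dict) -> None:
    # Write to a temp file, then rename: a kill mid-write can never
    # leave a half-written checkpoint behind (illustrative sketch).
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename on POSIX and Windows

def resume(path: str) -> dict:
    # Resume from the last committed checkpoint, or start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"flow": 1, "step": 0}
```

Killing the process loses at most the in-flight step; everything committed before it replays identically.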
| Approach | Cost | Output |
|---|---|---|
| Developer implements feature | 5 days of salary | Code you hope works |
| Flow Studio runs overnight | ~$30 compute | Code + tests + receipts + evidence panel + hotspot list |
The receipts are the product. The code is a side effect.
Requirements:
- Python 3.13+
- uv (required)
- GNU Make (Linux/macOS: included; Windows: use WSL2 or MSYS2)
- Node.js 20+ (optional, for UI development)
Install:
git clone https://github.com/EffortlessMetrics/flow-studio-swarm.git
cd flow-studio-swarm
uv sync --extra dev

Verify:

make dev-check # Should pass all checks
make demo-run # Populate example artifacts
make flow-studio # Start UI → http://localhost:5000

Open: http://localhost:5000/?run=demo-health-check&mode=operator
You'll see:
- Left: 9 flows (Signal → Plan → Build → Review → Gate → Deploy → Wisdom → Reset → Demo)
- Center: Step graph showing what ran and what it produced
- Right: Evidence, artifacts, and agent details
The demo shows a complete run—all nine flows indexed, with artifacts captured. Click around. This is what "done" looks like.
Every completed run produces:
| Artifact | Purpose |
|---|---|
| Receipts | Forensic proof of what happened—commands run, exit codes, timing |
| Evidence panel | Multi-metric dashboard (tests, coverage, lint, security) that resists gaming |
| Hotspots | The 3-8 files a reviewer should actually look at |
| Bounded diff | The change itself, with clear scope |
| Explicit unknowns | What wasn't measured, what's still risky |
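A receipt is small and mechanical: command, exit code, timing, bounded output. A minimal sketch (the field names and helper below are assumptions, not the system's real schema):

```python
import subprocess
import sys
import time

def run_with_receipt(cmd: list[str]) -> dict:
    # Capture the forensic facts about one command execution
    # (hypothetical helper; real receipts carry more fields).
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": cmd,
        "exit_code": proc.returncode,
        "duration_s": round(time.monotonic() - start, 3),
        "stdout_tail": proc.stdout[-2000:],  # bounded evidence, not a full log
    }

receipt = run_with_receipt([sys.executable, "-c", "print('42 tests passed')"])
```

The point of the shape: every field is measured, none is reported by the agent.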
A reviewer answers three questions in under 5 minutes:
- Does evidence exist and is it fresh?
- Does the panel of metrics agree?
- Where would I escalate verification?
If the evidence holds up, approve. If it contradicts itself, investigate. Either way, the system did the grinding.
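Those three questions are mechanical enough to sketch in code. The evidence schema below (generated_at, panel, hotspots) is hypothetical, chosen only to make the decision order concrete:

```python
import time

def quick_review(evidence: dict, max_age_s: float = 24 * 3600) -> str:
    # Question 1: does evidence exist and is it fresh?
    if not evidence or time.time() - evidence["generated_at"] > max_age_s:
        return "investigate: evidence missing or stale"
    # Question 2: does the panel of metrics agree?
    if not all(evidence["panel"].values()):
        return "investigate: panel metrics disagree"
    # Question 3: where would I escalate verification?
    if evidence["hotspots"]:
        return "escalate: start with the hotspot files"
    return "approve"
```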
Don't anthropomorphize the AI as a "copilot." View it as a manufacturing plant.
| Component | Role | Behavior |
|---|---|---|
| Python Kernel | Factory Foreman | Deterministic. Manages time, disk, budget. Never guesses. |
| Agents | Enthusiastic Interns | Brilliant, tireless. Will claim success to please you. Need boundaries. |
| Disk | Ledger | If it isn't written, it didn't happen. |
| Receipts | Audit Trail | The actual product. |
The foreman's job:
- Don't ask interns if they succeeded—measure the bolt
- Don't give them everything—curate what they need
- Don't trust their prose—trust their receipts
This is why the system doesn't rely on agent claims and runs forensic scanners. Exit codes don't lie. Git diffs don't hallucinate.
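In code, "measure the bolt" means the agent's claim never enters the decision; only the measured exit code does. A sketch with hypothetical names:

```python
import subprocess
import sys

def step_passed(agent_claims_success: bool, check_cmd: list[str]) -> bool:
    # The claim is accepted as input but deliberately ignored;
    # only the measured exit code decides (illustrative sketch).
    del agent_claims_success
    return subprocess.run(check_cmd).returncode == 0

# An enthusiastic claim of success cannot make a failing check pass:
result = step_passed(True, [sys.executable, "-c", "raise SystemExit(1)"])
```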
Three planes, cleanly separated:
| Plane | Component | What it does |
|---|---|---|
| Control | Python kernel | State machine, budgets, atomic disk commits |
| Execution | Claude Agent SDK | Autonomous work in a sandbox |
| Projection | DuckDB | Queryable index for the UI |
The kernel is deterministic. The agent is stochastic. The database is ephemeral (rebuildable from the event journal).
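Because the database is only a projection, dropping it costs nothing: replaying the journal rebuilds it. A toy sketch (the event shapes are hypothetical, and the real projection is DuckDB, not a dict):

```python
import json

def rebuild_projection(journal_lines: list[str]) -> dict:
    # Replay append-only events into a fresh queryable index.
    steps: dict = {}
    for line in journal_lines:
        event = json.loads(line)
        if event["type"] == "step_started":
            steps[event["step"]] = {"status": "running"}
        elif event["type"] == "step_finished":
            steps[event["step"]] = {"status": "done", "exit_code": event["exit_code"]}
    return steps

journal = [
    '{"type": "step_started", "step": "build"}',
    '{"type": "step_finished", "step": "build", "exit_code": 0}',
]
```

Replay is deterministic: the same journal always yields the same index, so the index itself never needs a backup.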
Step lifecycle:
- Work — Agent executes with full autonomy
- Finalize — Structured handoff envelope extracted from hot context
- Route — Next step determined from forensic evidence, not prose
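Routing on forensic evidence rather than prose can be sketched as a pure function of the receipt (the field names and thresholds here are assumptions for illustration):

```python
def route(receipt: dict) -> str:
    # The router never reads the agent's summary; only measured fields.
    if receipt["tests_exit_code"] != 0:
        return "bounce_to_author"      # a critic found a real failure
    if receipt["coverage"] < 0.80:
        return "request_more_tests"
    return "advance"
```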
Flow Studio orchestrates work in repos of any language. It's implemented in Python (kernel) and TypeScript (UI).
| Principle | What it means |
|---|---|
| Forensics over narrative | Trust the git diff, the test log, the receipt. Not the agent's claim. |
| Verification is the product | Output is code + the evidence needed to trust it. |
| Steps, not sessions | Each step has one job in fresh context. No 100k-token confusion. |
| Adversarial loops | Critics find problems. Authors fix them. They never agree to be nice. |
| Resumable by default | Kill anytime. Resume from last checkpoint. Zero data loss. |
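The adversarial-loop principle reduces to a small control structure: the critic must come back empty before the draft advances. The callables and the toy fix-one-problem-per-round behavior below are illustrative, not the system's actual agents:

```python
def adversarial_loop(author, critic, max_rounds: int = 3):
    # Author drafts; critic attacks; repeat until the critic finds
    # nothing or the round budget runs out. They never agree to be nice.
    draft = author(None)
    for _ in range(max_rounds):
        problems = critic(draft)
        if not problems:
            return draft, "clean"
        draft = author(problems)
    return draft, "budget_exhausted"

# Toy roles: the author fixes one problem per round; the critic keeps
# objecting until the draft reaches quality level 2.
def toy_author(feedback):
    return 0 if feedback is None else feedback["draft"] + 1

def toy_critic(draft):
    return None if draft >= 2 else {"draft": draft}
```

The budget matters: without it, a stubborn critic and a stuck author would loop forever, which is exactly the failure mode the kernel's budgets exist to cap.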
make dev-check # Validate swarm health (run before commits)
make selftest # Full 16-step validation
make kernel-smoke # Fast kernel check (~300ms)
make stepwise-sdlc-stub # Zero-cost demo run
make help # All commands

| Time | Document | What you'll learn |
|---|---|---|
| 10 min | GETTING_STARTED.md | Run the demo, see it work |
| 20 min | TOUR_20_MIN.md | Understand the full system |
| 5 min | MARKET_SNAPSHOT.md | Why this approach, why now |
| Topic | Document |
|---|---|
| Flow Studio UI | FLOW_STUDIO.md |
| Stepwise execution | STEPWISE_BACKENDS.md |
| Reviewing PRs | REVIEWING_PRS.md |
| Adopting for your repo | ADOPTION_PLAYBOOK.md |
| Full reference | CLAUDE.md |
| Topic | Document |
|---|---|
| The AgOps philosophy | AGOPS_MANIFESTO.md — Steven Zimmerman |
| What this system is | TRUST_COMPILER.md |
| 15 lessons learned | META_LEARNINGS.md |
| 12 emergent laws | EMERGENT_PHYSICS.md |
Flow Studio is built for teams who want evidence, replayability, and change control around agentic work. You'll have a better time if:
- You can run Python locally and in CI (the repo assumes uv)
- You have an opinionated place to store run artifacts (logs, snapshots, traces)
- You have a stance on secrets (source, access, rotation)
- You treat "automation with guardrails" as a feature: validation, schemas, and CI checks
- You have a human review path when the system is uncertain
Start with a known-good reference execution: GOLDEN_RUNS.md.
See ADOPTION_PLAYBOOK.md for the full onboarding guide.
Contributions are welcome. Before submitting:
- Run make dev-check to validate the swarm
- Run make selftest for full validation
- Follow existing patterns in swarm/ and .claude/
See CLAUDE.md for the full reference on how the system works.
Something broken? Open an issue.
- EffortlessMetrics/demo-swarm — Portable .claude/ swarm pack for your own repo
Apache-2.0 or MIT