EffortlessMetrics/flow-studio-swarm

 
 

Flow Studio

An agentic SDLC harness that transforms requirements into merged PRs with forensic evidence. For: Platform engineers and staff+ operators who need repeatable, auditable automation (not demos).

Docs: docs/INDEX.md | Getting started: docs/GETTING_STARTED.md | Golden runs: docs/GOLDEN_RUNS.md | Roadmap: docs/ROADMAP_3_0.md | Changelog: CHANGELOG.md


Why This Exists

The job is moving up the stack. Again.

Punchcards → Assembly → High-level languages → Now

Each transition followed the same pattern: what was once skilled craft becomes mechanical, and humans move to higher-leverage work. Programmers stopped managing memory addresses. Then stopped thinking in registers. Now they stop grinding on first-draft implementation.

The shift: Models write nearly-working code at 1,000+ tokens/second. The bottleneck isn't generation—it's trust. Can a human review and trust the output in 30 minutes instead of spending a week doing it themselves?

Flow Studio addresses that constraint. It produces forensic evidence alongside code. You audit the evidence, escalate verification at hotspots where doubt exists, and ship—or bounce it back for another iteration.

The machine does the implementation. You do the architecture, the intent, the judgment.

Just like every transition before.


Status

Current Version: v3.0.0-rc.1 (Release Candidate)

See RELEASE_NOTES_3_0_0_RC1.md for details on the V3 "Intelligent Factory" update.

This release is stable for evaluation. It includes the new Pack System, Stepwise Orchestrator, and Resilient DB.


What Flow Studio Does

Flow Studio orchestrates 9 flows. The first six take a requirement to a merged PR; the remaining three handle learning, cleanup, and demonstration:

| Flow | Transformation | Output |
|------|----------------|--------|
| 1: Signal | Raw input → structured problem | Requirements, BDD scenarios, risk assessment |
| 2: Plan | Requirements → architecture | ADR, contracts, work plan, test plan |
| 3: Build | Plan → working code | Implementation + tests via adversarial loops |
| 4: Review | Draft PR → Ready PR | Harvest feedback, apply fixes |
| 5: Gate | Code → merge decision | Audit receipts, policy check, recommendation |
| 6: Deploy | Approved → production | Merge, verify health, audit trail |
| 7: Wisdom | Artifacts → learnings | Pattern detection, feedback loops |
| 8: Reset | Clean state | Infrastructure wipe, cache purge |
| 9: Demo | Simulation | Walkthrough of the Stepwise Orchestrator |

Each flow produces receipts (proof of execution) and evidence (test results, coverage, lint output). Kill the process anytime—resume from the last checkpoint with zero data loss.
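
The resume behavior can be pictured with a minimal sketch. It assumes a hypothetical JSON checkpoint file that lists completed steps; the names `save_checkpoint`, `load_checkpoint`, and `run_flow`, and the file layout, are illustrative and not Flow Studio's actual format.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, completed: list[str]) -> None:
    """Write the checkpoint atomically: temp file first, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"completed": completed}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp, path)  # atomic when source and target share a filesystem


def load_checkpoint(path: Path) -> list[str]:
    """Return the completed step names, or an empty list on a first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text())["completed"]


def run_flow(steps: dict, path: Path) -> list[str]:
    """Run steps in order, skipping any the checkpoint already records.

    `steps` maps a step name to a zero-argument callable (hypothetical shape).
    Returns the names of the steps executed in this invocation.
    """
    completed = load_checkpoint(path)
    executed = []
    for name, action in steps.items():
        if name in completed:
            continue  # finished before the process was killed
        action()
        completed.append(name)
        save_checkpoint(path, completed)  # persist after every step
        executed.append(name)
    return executed
```

Because the checkpoint is rewritten after every step and swapped in with a rename, a kill between steps leaves either the old or the new file on disk, never a half-written one.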


The Economics

| Approach | Cost | Output |
|----------|------|--------|
| Developer implements feature | 5 days of salary | Code you hope works |
| Flow Studio runs overnight | ~$30 compute | Code + tests + receipts + evidence panel + hotspot list |

The receipts are the product. The code is a side effect.


Installation

Requirements:

  • Python 3.13+
  • uv (required)
  • GNU Make (Linux/macOS: usually preinstalled or available from the system package manager; Windows: use WSL2 or MSYS2)
  • Node.js 20+ (optional, for UI development)

Install:

git clone https://github.com/EffortlessMetrics/flow-studio-swarm.git
cd flow-studio-swarm
uv sync --extra dev

Verify:

make dev-check    # Should pass all checks

Quick Start

make demo-run       # Populate example artifacts
make flow-studio    # Start UI → http://localhost:5000

Open: http://localhost:5000/?run=demo-health-check&mode=operator

You'll see:

  • Left: 9 flows (Signal → Plan → Build → Review → Gate → Deploy → Wisdom → Reset → Demo)
  • Center: Step graph showing what ran and what it produced
  • Right: Evidence, artifacts, and agent details

The demo shows a complete run—all nine flows indexed, with artifacts captured. Click around. This is what "done" looks like.


What You Get

Every completed run produces:

| Artifact | Purpose |
|----------|---------|
| Receipts | Forensic proof of what happened: commands run, exit codes, timing |
| Evidence panel | Multi-metric dashboard (tests, coverage, lint, security) that resists gaming |
| Hotspots | The 3-8 files a reviewer should actually look at |
| Bounded diff | The change itself, with clear scope |
| Explicit unknowns | What wasn't measured, what's still risky |

A reviewer answers three questions in under 5 minutes:

  1. Does evidence exist and is it fresh?
  2. Does the panel of metrics agree?
  3. Where would I escalate verification?

If the evidence is fresh and the metrics agree, approve. If they contradict each other, investigate the hotspots first. The system did the grinding.
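
The first two questions can be checked mechanically. The sketch below is one way to picture that, under stated assumptions: the `Metric` record, the `triage` function, and the freshness window are hypothetical, not part of Flow Studio's API. The third question (where to escalate) stays with the reviewer and the hotspot list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Metric:
    name: str              # e.g. "tests", "coverage", "lint", "security"
    passed: bool
    measured_at: datetime


def triage(metrics: list[Metric], now: datetime, max_age: timedelta) -> str:
    """Return "approve" or "investigate" from evidence existence, freshness, and agreement."""
    if not metrics:
        return "investigate"  # no evidence exists
    if any(now - m.measured_at > max_age for m in metrics):
        return "investigate"  # at least one measurement is stale
    if not all(m.passed for m in metrics):
        return "investigate"  # the panel disagrees or fails outright
    return "approve"


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
panel = [
    Metric("tests", True, now - timedelta(minutes=5)),
    Metric("lint", False, now - timedelta(minutes=5)),
]
print(triage(panel, now, max_age=timedelta(hours=1)))  # investigate
```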


The Mental Model

Don't anthropomorphize the AI as a "copilot." View it as a manufacturing plant.

| Component | Role | Behavior |
|-----------|------|----------|
| Python Kernel | Factory Foreman | Deterministic. Manages time, disk, budget. Never guesses. |
| Agents | Enthusiastic Interns | Brilliant, tireless. Will claim success to please you. Need boundaries. |
| Disk | Ledger | If it isn't written, it didn't happen. |
| Receipts | Audit Trail | The actual product. |

The foreman's job:

  • Don't ask interns if they succeeded—measure the bolt
  • Don't give them everything—curate what they need
  • Don't trust their prose—trust their receipts

This is why the system runs forensic scanners instead of relying on agent claims. Exit codes don't lie. Git diffs don't hallucinate.
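
Measuring the bolt can be as simple as recording what a command actually did rather than what anyone says it did. The following is a minimal sketch, assuming a hypothetical receipt shape (command, exit code, duration, output tail); Flow Studio's real receipt schema may differ.

```python
import subprocess
import sys
import time


def run_with_receipt(command: list[str]) -> dict:
    """Run a command and return a receipt built only from observed facts."""
    start = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": result.returncode,
        "duration_s": round(time.monotonic() - start, 3),
        "stdout_tail": result.stdout[-2000:],  # keep the receipt small
    }


# A child process that exits with status 3, standing in for a failing test run.
receipt = run_with_receipt([sys.executable, "-c", "raise SystemExit(3)"])
print(receipt["exit_code"])  # 3
```

The receipt never includes a "succeeded" field supplied by the worker; success is derived from the exit code.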


Architecture

Three planes, cleanly separated:

| Plane | Component | What it does |
|-------|-----------|--------------|
| Control | Python kernel | State machine, budgets, atomic disk commits |
| Execution | Claude Agent SDK | Autonomous work in a sandbox |
| Projection | DuckDB | Queryable index for the UI |

The kernel is deterministic. The agent is stochastic. The database is ephemeral (rebuildable from the event journal).
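
"Rebuildable from the event journal" means the index holds no state of its own: delete it, replay the journal, and the same rows come back. The sketch below illustrates the idea with the standard library's `sqlite3` standing in for DuckDB so that it runs without extra dependencies; the JSON-lines event shape and the `rebuild_index` name are assumptions.

```python
import json
import sqlite3


def rebuild_index(journal_lines: list[str]) -> sqlite3.Connection:
    """Replay a JSON-lines journal into a fresh in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE steps (run_id TEXT, step TEXT, status TEXT)")
    for line in journal_lines:
        if not line.strip():
            continue  # tolerate blank lines in the journal
        event = json.loads(line)
        db.execute(
            "INSERT INTO steps VALUES (?, ?, ?)",
            (event["run_id"], event["step"], event["status"]),
        )
    db.commit()
    return db


# Hypothetical journal contents for illustration.
journal = [
    '{"run_id": "demo", "step": "signal", "status": "ok"}',
    '{"run_id": "demo", "step": "plan", "status": "ok"}',
]
db = rebuild_index(journal)
print(db.execute("SELECT COUNT(*) FROM steps").fetchone()[0])  # 2
```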

Step lifecycle:

  1. Work — Agent executes with full autonomy
  2. Finalize — Structured handoff envelope extracted from hot context
  3. Route — Next step determined from forensic evidence, not prose
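
The Route step above can be sketched as a function that reads only measured fields from the handoff envelope. The `Envelope` fields and the `route` outcomes here are hypothetical, chosen to show the principle rather than to mirror the kernel's real schema.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Hypothetical handoff envelope: measured facts plus the agent's own summary."""
    step: str
    tests_exit_code: int
    files_changed: int
    agent_summary: str  # prose; deliberately ignored by the router


def route(envelope: Envelope) -> str:
    """Pick the next action from measured facts only."""
    if envelope.files_changed == 0:
        return "retry"    # nothing was produced, whatever the summary says
    if envelope.tests_exit_code != 0:
        return "fix"      # tests failed; send the work back
    return "advance"


claimed_success = Envelope("build", tests_exit_code=1, files_changed=4,
                           agent_summary="All tests pass!")
print(route(claimed_success))  # fix
```

The router never parses `agent_summary`, so an optimistic summary cannot change the outcome.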

Flow Studio orchestrates work in repos of any language. It's implemented in Python (kernel) and TypeScript (UI).


Key Principles

| Principle | What it means |
|-----------|---------------|
| Forensics over narrative | Trust the git diff, the test log, the receipt. Not the agent's claim. |
| Verification is the product | Output is code + the evidence needed to trust it. |
| Steps, not sessions | Each step has one job in fresh context. No 100k-token confusion. |
| Adversarial loops | Critics find problems. Authors fix them. They never agree to be nice. |
| Resumable by default | Kill anytime. Resume from last checkpoint. Zero data loss. |
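
An adversarial loop can be pictured as a bounded alternation between a critic and an author. This is a minimal sketch under stated assumptions: `critic` and `author` are hypothetical callables standing in for agents, and the round limit stands in for the kernel's budget.

```python
from typing import Callable


def adversarial_loop(
    draft: str,
    critic: Callable[[str], list[str]],
    author: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Alternate critic and author until no issues remain or the budget runs out.

    Returns the final draft and any issues still open.
    """
    issues = critic(draft)
    for _ in range(max_rounds):
        if not issues:
            break
        draft = author(draft, issues)
        issues = critic(draft)
    return draft, issues


# Toy stand-ins: the critic flags leftover TODO markers, the author removes them.
def toy_critic(text: str) -> list[str]:
    return ["contains TODO"] if "TODO" in text else []


def toy_author(text: str, issues: list[str]) -> str:
    return text.replace("TODO", "").strip()


print(adversarial_loop("done TODO", toy_critic, toy_author))  # ('done', [])
```

Returning the open issues alongside the draft keeps an exhausted budget visible instead of letting the loop silently report success.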

Commands

make dev-check          # Validate swarm health (run before commits)
make selftest           # Full 16-step validation
make kernel-smoke       # Fast kernel check (~300ms)
make stepwise-sdlc-stub # Zero-cost demo run
make help               # All commands

Documentation

Get Started

| Time | Document | What you'll learn |
|------|----------|-------------------|
| 10 min | GETTING_STARTED.md | Run the demo, see it work |
| 20 min | TOUR_20_MIN.md | Understand the full system |
| 5 min | MARKET_SNAPSHOT.md | Why this approach, why now |

Go Deeper

| Topic | Document |
|-------|----------|
| Flow Studio UI | FLOW_STUDIO.md |
| Stepwise execution | STEPWISE_BACKENDS.md |
| Reviewing PRs | REVIEWING_PRS.md |
| Adopting for your repo | ADOPTION_PLAYBOOK.md |
| Full reference | CLAUDE.md |

Philosophy

| Topic | Document |
|-------|----------|
| The AgOps philosophy | AGOPS_MANIFESTO.md (Steven Zimmerman) |
| What this system is | TRUST_COMPILER.md |
| 15 lessons learned | META_LEARNINGS.md |
| 12 emergent laws | EMERGENT_PHYSICS.md |

Are You Ready to Adopt This?

Flow Studio is built for teams who want evidence, replayability, and change control around agentic work. You'll have a better time if:

  • You can run Python locally and in CI (the repo assumes uv)
  • You have an opinionated place to store run artifacts (logs, snapshots, traces)
  • You have a stance on secrets (source, access, rotation)
  • You treat "automation with guardrails" as a feature: validation, schemas, and CI checks
  • You have a human review path when the system is uncertain

Start with a known-good reference execution: GOLDEN_RUNS.md.

See ADOPTION_PLAYBOOK.md for the full onboarding guide.


Contributing

Contributions are welcome. Before submitting:

  1. Run make dev-check to validate the swarm
  2. Run make selftest for full validation
  3. Follow existing patterns in swarm/ and .claude/

See CLAUDE.md for the full reference on how the system works.

Something broken? Open an issue.


License

Apache-2.0 or MIT
