Dark Factory
A Go CLI built for Claude Code that orchestrates autonomous AI agents to implement GitHub issues, review their own work, and merge — without human intervention.
Documentation · Getting Started · Releases
The hard part of software engineering isn't typing code — it's deciding what to build and how it fits. Dark Factory keeps those decisions with humans. Engineers write the roadmap, define architecture layers, design conventions, and author issue specs. Agents operate within those constraints. The harness is the design.
This is a collaborative architecture tool, not a "throw a ticket at an AI and hope for the best" system. The adversarial review model reinforces this: a separate reviewer agent checks whether the code respects the architecture a human defined, follows conventions a human wrote, and meets acceptance criteria a human specified. Every judgment call that shapes a codebase stays with the humans who understand it.
Dark Factory has been built entirely by its own agent pipeline — every feature
was implemented, reviewed, and merged by godark run. The humans write specs
and design harnesses; the agents write code.
Homebrew (macOS):
brew install peter-stratton/dark-factory/godark
Go install:
go install github.com/peter-stratton/dark-factory/cmd/godark@latest
Binary download: grab a pre-built binary from GitHub Releases.
Dark Factory is built for Claude Code and GitHub. The architecture is designed around Claude Code's specific capabilities — session resumption, CLAUDE.md as a control surface, slash command skills, and sandboxed execution.
| Layer | Supported |
|---|---|
| AI agent | Claude Code (Anthropic) |
| Version control | GitHub |
- Three-agent pipeline: implementer, quality reviewer, and functional reviewer are independent agents with isolated permissions; reviewers literally cannot edit files
- Specification-driven quality gates: human-authored scenario specs define "done"; the functional reviewer generates ephemeral integration tests from specs instead of rubber-stamping the diff
- Architecture-as-code enforcement: machine-readable layer definitions validated by `godark vet`; reviewers check architectural compliance, not just correctness
- Structured agent dialogue: implementer posts reasoning as PR comments, reviewers challenge it; the PR thread is an auditable record of adversarial design review
- Full run observability: local web dashboard with review chain timelines, quality flags, tool traces, and agent dialogue history for every issue
- Harness engineering lifecycle: scaffold, validate, and enforce project constraints with `godark new`, `godark init`, `godark vet`, and six harness types
- Auto-detected multi-language support: detects project type from marker files and configures the sandbox, build, and test commands automatically
- Fully sandboxed agent runs by default: agents execute inside ephemeral Docker containers with no access to the host filesystem or network beyond what's explicitly configured
- Single binary, runs on a laptop: no infrastructure fleet, no MCP server farm; just a Go binary and Docker
Given a GitHub repo and a milestone, godark runs a three-agent development loop:
- Fetch open issues from the milestone, sorted by priority (`p1` → `p2` → `p3` → unlabeled)
- Resolve dependencies: issues declare `Blocked by: #N` or `Depends on: #N` in their body; skip any whose dependencies are still open
- Implementer: Claude Code implements the issue, writes unit tests, and opens a PR
- Guard rails: verify the PR exists, contains `Closes #N`, and didn't touch protected files
- Quality reviewer: a separate Claude Code instance audits the PR for security, performance, and code quality issues; if it requests changes, the implementer retries before functional review begins
- Functional reviewer: another Claude Code instance reviews the PR against human-authored scenario specs, generates ephemeral integration tests, and approves or requests changes
- Retry loop: if either reviewer rejects, the implementer reads the review comments and pushes fixes (max N retries per gate)
- Merge or escalate: approved PRs are squash-merged; failed PRs are labeled `needs-human-review`
- Punchlist: for each merged PR, a tool-less punchlist agent generates 3-5 concrete manual acceptance tests (specific config values, commands, expected outcomes) rendered as checkboxes alongside the existing punchlist output
- Repeat: move to the next unblocked issue
# New project
godark new my-project --repo owner/my-project
# Existing project
godark init --repo owner/my-project
Then open the project in Claude Code and use the built-in skills to define your architecture, conventions, and roadmap. See the Getting Started guide for a full walkthrough.
Full documentation is available at godarkfactory.com:
- Getting Started: installation, setup, and tutorial
- CLI Reference: all commands, flags, and usage examples
- Configuration: `godark.yaml` deep dive
- Skills: slash commands for roadmaps, planning, issues, and more
- Licensing & Adoption: commercial use, data privacy, and FAQ
Each completed phase has a practical overview with real-world examples showing
what was built and how users experience it. These live in
docs/phase-overviews/:
| Phase | Overview |
|---|---|
| 1 | Skeleton & Orchestration — CLI scaffold, config, deps, dry-run |
| 2 | Quality & Vetting — godark vet validation framework |
| 3 | Docker Sandbox — container isolation, auth, cloning |
| 4 | Agent Execution — implementer, reviewer, guard rails, retry loop |
| 5 | Agent SDK Migration — SDK wrapper, role permissions, session resumption |
| 6 | Multi-Language Support — auto-detect, runtime config, pluggable Dockerfiles |
| 7 | Review Quality & Dashboard — run data, quality flags, web dashboard |
| 8 | Harness Engineering — harness templates, godark new, vet architecture |
| 9 | Harness-Aware Agent Execution — harness injection, dialogue, enforcement |
| 10 | Deterministic Verification Pipeline — verify step, auto-fix, bash deny-list |
| 11 | Run Analysis & Prompt Feedback — godark analyze, trends, prompt gaps |
| 12 | Complex Project Support — multi-module, codegen, secrets, CI checks |
| 13 | Human-in-the-Loop Review — graduated auto-merge, watch command, risk classifier, notifications |
| 14 | Bounded Concurrency — wave-barrier dispatcher, RunMode, serial post-wave merge, rate-limit batching, per-issue logs |
| 15 | Deferred — Server Mode & Centralized Operation |
| 16 | Public Release — ELv2 license, GoReleaser, Homebrew tap, release workflow, CONTRIBUTING.md |
| 17 | Configurable Base Branch — base branch config, PR targeting, prompt safety, run data tracking |
| 18 | Adaptive Agent Loop — recon agent, hybrid retry strategy, handoff context |
| 19 | Spring Cleaning — unified verdict parsing, typed constants, shared helpers, CLI consolidation |
| 20 | Terminal UI — Bubble Tea TUI, progress reporter, adaptive colors, hybrid output mode |
| 21 | Analytics Persistence — SQLite stats store, retry recovery rate, cost/duration breakdown, repo stats, flag-based prompt gaps |
| 23 | Watch & Daemon Mode — shared watch package, daemon mode, external merge detection, watch TUI and dashboard |
| 24 | Container Resource Tracking — Docker stats capture, per-step memory/CPU, analyze output, dashboard columns, host mode |
| 25 | Docker Socket Mount & Compose Lifecycle — compose config, socket mount, up/down lifecycle, env forwarding, doctor checks |
| 22 | Analytics Overhaul — first-pass rate, wasted cost, failure reasons, per-repo breakdown, sprint report command |
| 26 | Merge Coordinator Agent - dedicated conflict resolver, per-issue and rollup integration, telemetry, dashboard step |
| 27 | Agent Efficiency & Resilience - per-role judge thresholds, benign kill handling, model overrides, handoff context, generalized recon |
| 28 | Container Health Judge — real-time log streaming, idle/thrash/transport rules, container retry, intervention flow |
| 29 | Complete CLI Migration - delete Python runner, simplify Run(), remove --no-sandbox, unconditional Docker, test migration |
| 30 | Spec Tightening - GIVEN/WHEN/THEN validation, phase-scoped vet, spec delta generation, pipeline integration |
| 31 | Planner Agent - structured implementation plans, non-blocking pipeline step, implementer prompt injection, model override |
| 32 | Decision Flow Tracing - trace ID generation, SQLite persistence, godark trace CLI, dashboard copy button, TUI column |
| 33 | Semi-Structured Review - semi-formal reviewer prompt, config toggle, consistency quality gate, automatic re-run on contradiction |
To generate an overview for a newly completed phase, use /godark-create-phase-overview <phase-number>.
go build -o bin/godark ./cmd/godark
go test ./...
See docs/roadmap/ for the full development roadmap.
Dark Factory is licensed under the Elastic License 2.0. It is free for commercial use; the main restriction is that you can't offer it to others as a hosted or managed service. See the Licensing & Adoption page for details.