peter-stratton/dark-factory

     _            _           __            _
  __| | __ _ _ __| | __      / _| __ _  ___| |_ ___  _ __ _   _
 / _` |/ _` | '__| |/ /_____| |_ / _` |/ __| __/ _ \| '__| | | |
| (_| | (_| | |  |   <______|  _| (_| | (__| || (_) | |  | |_| |
 \__,_|\__,_|_|  |_|\_\     |_|  \__,_|\___|\__\___/|_|   \__, |
                                                           |___/

A Go CLI built for Claude Code that orchestrates autonomous AI agents to implement GitHub issues, review their own work, and merge — without human intervention.

Documentation · Getting Started · Releases

Philosophy

The hard part of software engineering isn't typing code — it's deciding what to build and how it fits. Dark Factory keeps those decisions with humans. Engineers write the roadmap, define architecture layers, design conventions, and author issue specs. Agents operate within those constraints. The harness is the design.

This is a collaborative architecture tool, not a "throw a ticket at an AI and hope for the best" system. The adversarial review model reinforces this: a separate reviewer agent checks whether the code respects the architecture a human defined, follows conventions a human wrote, and meets acceptance criteria a human specified. Every judgment call that shapes a codebase stays with the humans who understand it.

Dark Factory has been built entirely by its own agent pipeline — every feature was implemented, reviewed, and merged by godark run. The humans write specs and design harnesses; the agents write code.

Install

Homebrew (macOS):

brew install peter-stratton/dark-factory/godark

Go install:

go install github.com/peter-stratton/dark-factory/cmd/godark@latest

Binary download: grab a pre-built binary from GitHub Releases.

Platform support

Dark Factory is built for Claude Code and GitHub. The architecture is designed around Claude Code's specific capabilities — session resumption, CLAUDE.md as a control surface, slash command skills, and sandboxed execution.

Layer            Supported
AI agent         Claude Code (Anthropic)
Version control  GitHub

Features

  • Three-agent pipeline — implementer, quality reviewer, and functional reviewer are independent agents with isolated permissions; reviewers literally cannot edit files
  • Specification-driven quality gates — human-authored scenario specs define "done"; the functional reviewer generates ephemeral integration tests from the specs rather than rubber-stamping the diff
  • Architecture-as-code enforcement — machine-readable layer definitions validated by godark vet; reviewers check architectural compliance, not just correctness
  • Structured agent dialogue — implementer posts reasoning as PR comments, reviewers challenge it; the PR thread is an auditable record of adversarial design review
  • Full run observability — local web dashboard with review chain timelines, quality flags, tool traces, and agent dialogue history for every issue
  • Harness engineering lifecycle — scaffold, validate, and enforce project constraints with godark new, godark init, godark vet, and six harness types
  • Auto-detected multi-language support — detects project type from marker files and configures the sandbox, build, and test commands automatically
  • Fully sandboxed agent runs by default — agents execute inside ephemeral Docker containers with no access to the host filesystem or network beyond what's explicitly configured
  • Single binary, runs on a laptop — no infrastructure fleet, no MCP server farm; just a Go binary and Docker
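
The marker-file detection mentioned in the list above can be pictured with a small Go sketch. Everything in it is an assumption made for illustration: the `Profile` type, the `Detect` and `DetectDir` functions, and the table of markers and test commands are hypothetical and are not taken from godark's source. It only shows the general technique of checking for well-known files in a fixed order and picking the first match.

```go
package main

import (
	"os"
	"path/filepath"
)

// Profile is a hypothetical record of what detection would configure.
type Profile struct {
	Marker  string // file whose presence identifies the project type
	Kind    string // short language name
	TestCmd string // conventional test command for that ecosystem
}

// profiles is checked in order; the first marker found wins.
// The entries are illustrative, not godark's actual table.
var profiles = []Profile{
	{Marker: "go.mod", Kind: "go", TestCmd: "go test ./..."},
	{Marker: "Cargo.toml", Kind: "rust", TestCmd: "cargo test"},
	{Marker: "package.json", Kind: "node", TestCmd: "npm test"},
	{Marker: "pyproject.toml", Kind: "python", TestCmd: "pytest"},
}

// Detect returns the first profile whose marker exists according to the
// supplied lookup function. Injecting the lookup keeps the matching logic
// testable without touching the filesystem.
func Detect(exists func(name string) bool) (Profile, bool) {
	for _, p := range profiles {
		if exists(p.Marker) {
			return p, true
		}
	}
	return Profile{}, false
}

// DetectDir applies Detect to regular files at the top level of dir.
func DetectDir(dir string) (Profile, bool) {
	return Detect(func(name string) bool {
		info, err := os.Stat(filepath.Join(dir, name))
		return err == nil && info.Mode().IsRegular()
	})
}
```

Calling `DetectDir(".")` from a repository root would return the matching profile, or `false` when none of the listed markers is present. Checking markers in a fixed order makes the result deterministic for repositories that contain more than one marker file; how godark itself resolves that case is not stated here.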

How it works

Given a GitHub repo and a milestone, godark runs a three-agent development loop:

  1. Fetch open issues from the milestone, sorted by priority (p1 → p2 → p3 → unlabeled)
  2. Resolve dependencies — issues declare Blocked by: #N or Depends on: #N in their body; skip any whose dependencies are still open
  3. Implementer — Claude Code implements the issue, writes unit tests, and opens a PR
  4. Guard rails — verify the PR exists, contains Closes #N, and didn't touch protected files
  5. Quality reviewer — a separate Claude Code instance audits the PR for security, performance, and code quality issues; if it requests changes, the implementer retries before functional review begins
  6. Functional reviewer — another Claude Code instance reviews the PR against human-authored scenario specs, generates ephemeral integration tests, and approves or requests changes
  7. Retry loop — if either reviewer rejects, the implementer reads the review comments and pushes fixes (max N retries per gate)
  8. Merge or escalate — approved PRs are squash-merged; failed PRs are labeled needs-human-review
  9. Punchlist — for each merged PR, a tool-less punchlist agent generates 3-5 concrete manual acceptance tests (specific config values, commands, expected outcomes) rendered as checkboxes alongside the existing punchlist output
  10. Repeat — move to the next unblocked issue
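
Steps 1 and 2 above (priority ordering and dependency resolution) can be sketched in Go as follows. This is a minimal illustration, not godark's implementation: the `Issue` type and the `Dependencies`, `priorityRank`, and `NextRunnable` functions are hypothetical names, the priority labels are assumed to be spelled exactly `p1`, `p2`, and `p3`, and a dependency is treated as unresolved only if it appears in the same list of open issues.

```go
package main

import (
	"regexp"
	"sort"
	"strconv"
)

// Issue is a hypothetical, pared-down view of a GitHub issue.
type Issue struct {
	Number int
	Labels []string
	Body   string
}

// depPattern matches "Blocked by: #12" or "Depends on: #12", ignoring case.
// It captures one issue number per marker.
var depPattern = regexp.MustCompile(`(?i)(?:blocked by|depends on):\s*#(\d+)`)

// Dependencies returns the issue numbers declared in an issue body.
func Dependencies(body string) []int {
	var deps []int
	for _, m := range depPattern.FindAllStringSubmatch(body, -1) {
		if n, err := strconv.Atoi(m[1]); err == nil {
			deps = append(deps, n)
		}
	}
	return deps
}

// priorityOrder is an assumed mapping from label name to sort rank.
var priorityOrder = map[string]int{"p1": 1, "p2": 2, "p3": 3}

// priorityRank returns the best (lowest) rank among an issue's labels.
// Issues without a priority label rank after p3.
func priorityRank(labels []string) int {
	rank := len(priorityOrder) + 1
	for _, l := range labels {
		if r, ok := priorityOrder[l]; ok && r < rank {
			rank = r
		}
	}
	return rank
}

// NextRunnable drops issues that depend on a still-open issue and returns
// the rest sorted by priority, keeping input order within a priority.
func NextRunnable(open []Issue) []Issue {
	openSet := make(map[int]bool, len(open))
	for _, is := range open {
		openSet[is.Number] = true
	}
	var ready []Issue
	for _, is := range open {
		blocked := false
		for _, d := range Dependencies(is.Body) {
			if openSet[d] {
				blocked = true
				break
			}
		}
		if !blocked {
			ready = append(ready, is)
		}
	}
	sort.SliceStable(ready, func(i, j int) bool {
		return priorityRank(ready[i].Labels) < priorityRank(ready[j].Labels)
	})
	return ready
}
```

Under these assumptions, an issue whose body says `Depends on: #1` is skipped while issue 1 is still open and becomes eligible on a later pass once issue 1 has been merged and closed, which matches the "repeat" step of the loop.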

Quick start

# New project
godark new my-project --repo owner/my-project

# Existing project
godark init --repo owner/my-project

Then open the project in Claude Code and use the built-in skills to define your architecture, conventions, and roadmap. See the Getting Started guide for a full walkthrough.

Documentation

Full documentation is available at godarkfactory.com.

Phase overviews

Each completed phase has a practical overview with real-world examples showing what was built and how users experience it. These live in docs/phase-overviews/:

Phase Overview
1 Skeleton & Orchestration — CLI scaffold, config, deps, dry-run
2 Quality & Vetting — godark vet validation framework
3 Docker Sandbox — container isolation, auth, cloning
4 Agent Execution — implementer, reviewer, guard rails, retry loop
5 Agent SDK Migration — SDK wrapper, role permissions, session resumption
6 Multi-Language Support — auto-detect, runtime config, pluggable Dockerfiles
7 Review Quality & Dashboard — run data, quality flags, web dashboard
8 Harness Engineering — harness templates, godark new, vet architecture
9 Harness-Aware Agent Execution — harness injection, dialogue, enforcement
10 Deterministic Verification Pipeline — verify step, auto-fix, bash deny-list
11 Run Analysis & Prompt Feedback — godark analyze, trends, prompt gaps
12 Complex Project Support — multi-module, codegen, secrets, CI checks
13 Human-in-the-Loop Review — graduated auto-merge, watch command, risk classifier, notifications
14 Bounded Concurrency — wave-barrier dispatcher, RunMode, serial post-wave merge, rate-limit batching, per-issue logs
15 Deferred — Server Mode & Centralized Operation
16 Public Release — ELv2 license, GoReleaser, Homebrew tap, release workflow, CONTRIBUTING.md
17 Configurable Base Branch — base branch config, PR targeting, prompt safety, run data tracking
18 Adaptive Agent Loop — recon agent, hybrid retry strategy, handoff context
19 Spring Cleaning — unified verdict parsing, typed constants, shared helpers, CLI consolidation
20 Terminal UI — Bubble Tea TUI, progress reporter, adaptive colors, hybrid output mode
21 Analytics Persistence — SQLite stats store, retry recovery rate, cost/duration breakdown, repo stats, flag-based prompt gaps
22 Analytics Overhaul — first-pass rate, wasted cost, failure reasons, per-repo breakdown, sprint report command
23 Watch & Daemon Mode — shared watch package, daemon mode, external merge detection, watch TUI and dashboard
24 Container Resource Tracking — Docker stats capture, per-step memory/CPU, analyze output, dashboard columns, host mode
25 Docker Socket Mount & Compose Lifecycle — compose config, socket mount, up/down lifecycle, env forwarding, doctor checks
26 Merge Coordinator Agent - dedicated conflict resolver, per-issue and rollup integration, telemetry, dashboard step
27 Agent Efficiency & Resilience - per-role judge thresholds, benign kill handling, model overrides, handoff context, generalized recon
28 Container Health Judge — real-time log streaming, idle/thrash/transport rules, container retry, intervention flow
29 Complete CLI Migration - delete Python runner, simplify Run(), remove --no-sandbox, unconditional Docker, test migration
30 Spec Tightening - GIVEN/WHEN/THEN validation, phase-scoped vet, spec delta generation, pipeline integration
31 Planner Agent - structured implementation plans, non-blocking pipeline step, implementer prompt injection, model override
32 Decision Flow Tracing - trace ID generation, SQLite persistence, godark trace CLI, dashboard copy button, TUI column
33 Semi-Structured Review - semi-formal reviewer prompt, config toggle, consistency quality gate, automatic re-run on contradiction

To generate an overview for a newly completed phase, use /godark-create-phase-overview <phase-number>.

Building

go build -o bin/godark ./cmd/godark
go test ./...

Status

See docs/roadmap/ for the full development roadmap.

License

Dark Factory is licensed under the Elastic License 2.0. It is free for commercial use; the main restriction is that you can't provide it to others as a hosted or managed service. See the Licensing & Adoption page for details.

About

Human constraints, interactive planning, autonomous execution.
