CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Prerequisites

Python 3.11+
Node.js 20+ (for UI development)
Claude Code CLI

Project Overview

This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions using a two-agent pattern:

Initializer Agent - First session reads an app spec and creates features in a SQLite database
Coding Agent - Subsequent sessions implement features one by one, marking them as passing

Commands

Quick Start (Recommended)

# Windows - launches CLI menu
start.bat

# macOS/Linux
./start.sh

# Launch Web UI (serves pre-built React app)
start_ui.bat      # Windows
./start_ui.sh     # macOS/Linux

Python Backend (Manual)

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the main CLI launcher
python start.py

# Run agent directly for a project (use absolute path or registered name)
python autonomous_agent_demo.py --project-dir C:/Projects/my-app
python autonomous_agent_demo.py --project-dir my-app  # if registered

# YOLO mode: rapid prototyping without browser testing
python autonomous_agent_demo.py --project-dir my-app --yolo

# Parallel mode: run multiple agents concurrently (1-5 agents)
python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3

YOLO Mode (Rapid Prototyping)

YOLO mode skips browser and regression testing for faster feature iteration:

# CLI
python autonomous_agent_demo.py --project-dir my-app --yolo

# UI: Toggle the lightning bolt button before starting the agent

What's different in YOLO mode:

No regression testing (skips feature_get_for_regression)
No Playwright MCP server (browser automation disabled)
Features marked passing after lint/type-check succeeds
Faster iteration for prototyping

What's the same:

Lint and type-check still run to verify code compiles
Feature MCP server for tracking progress
All other development tools available

When to use: Early prototyping when you want to quickly scaffold features without verification overhead. Switch back to standard mode for production-quality development.

React UI (in ui/ directory)

cd ui
npm install
npm run dev      # Development server (hot reload)
npm run build    # Production build (required for start_ui.bat)
npm run lint     # Run ESLint

Note: The start_ui.bat and start_ui.sh scripts serve the pre-built UI from ui/dist/. After making UI changes, run npm run build in the ui/ directory.

Testing

Python

ruff check .                      # Lint
mypy .                            # Type check
python test_security.py           # Security unit tests (136 tests)
python test_security_integration.py  # Integration tests (9 tests)

React UI

cd ui
npm run lint          # ESLint
npm run build         # Type check + build
npm run test:e2e      # Playwright end-to-end tests
npm run test:e2e:ui   # Playwright tests with UI

Code Quality

Configuration in pyproject.toml:

ruff: Line length 120, Python 3.11 target
mypy: Strict return type checking, ignores missing imports

Architecture

Core Python Modules

start.py - CLI launcher with project creation/selection menu
autonomous_agent_demo.py - Entry point for running the agent
agent.py - Agent session loop using Claude Agent SDK
client.py - ClaudeSDKClient configuration with security hooks and MCP servers
security.py - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
prompts.py - Prompt template loading with project-specific fallback
progress.py - Progress tracking, database queries, webhook notifications
registry.py - Project registry for mapping names to paths (cross-platform)
parallel_orchestrator.py - Concurrent agent execution with dependency-aware scheduling
api/dependency_resolver.py - Cycle detection (Kahn's algorithm + DFS) and dependency validation
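
The cycle detection mentioned for api/dependency_resolver.py can be illustrated with Kahn's algorithm. This is a minimal sketch, not the module's actual code: the function name has_cycle and the dict-of-lists input shape are assumptions made for the example.

```python
from collections import deque


def has_cycle(dependencies: dict[int, list[int]]) -> bool:
    """Return True if the dependency graph contains a cycle.

    `dependencies` maps a feature id to the ids it depends on
    (hypothetical input shape, chosen for this sketch).
    """
    # Collect every node, including ids that only appear as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # in_degree[n] counts the dependencies of n that are not yet resolved.
    in_degree = {n: 0 for n in nodes}
    dependents: dict[int, list[int]] = {n: [] for n in nodes}
    for feature, deps in dependencies.items():
        for dep in deps:
            in_degree[feature] += 1
            dependents[dep].append(feature)

    # Repeatedly remove nodes with no unresolved dependencies.
    queue = deque(n for n, degree in in_degree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in dependents[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Any node that was never removed sits on (or behind) a cycle.
    return visited != len(nodes)
```

A check like this would typically run before a new dependency edge is saved, so that a cyclic edge can be rejected up front.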

Project Registry

Projects can be stored in any directory. The registry maps project names to paths using SQLite:

All platforms: ~/.autocoder/registry.db

The registry uses:

SQLite database with SQLAlchemy ORM
POSIX path format (forward slashes) for cross-platform compatibility
SQLite's built-in transaction handling for concurrency safety

Server API (server/)

The FastAPI server provides REST endpoints for the UI:

server/routers/projects.py - Project CRUD with registry integration
server/routers/features.py - Feature management
server/routers/agent.py - Agent control (start/stop/pause/resume)
server/routers/filesystem.py - Filesystem browser API with security controls
server/routers/spec_creation.py - WebSocket for interactive spec creation

Feature Management

Features are stored in SQLite (features.db) via SQLAlchemy. The agent interacts with features through an MCP server:

mcp_server/feature_mcp.py - MCP server exposing feature management tools
api/database.py - SQLAlchemy models (Feature table with priority, category, name, description, steps, passes, dependencies)

MCP tools available to the agent:

feature_get_stats - Progress statistics
feature_get_next - Get highest-priority pending feature (respects dependencies)
feature_claim_next - Atomically claim next available feature (for parallel mode)
feature_get_for_regression - Random passing features for regression testing
feature_mark_passing - Mark feature complete
feature_skip - Move feature to end of queue (for external blockers only)
feature_create_bulk - Initialize all features (used by initializer)
feature_add_dependency - Add dependency between features (with cycle detection)
feature_remove_dependency - Remove a dependency

Feature Behavior & Precedence

Important: After initialization, the feature database becomes the authoritative source of truth for what the agent should build. This has specific implications:

Refactoring features override the original spec. If a refactoring feature says "migrate to TypeScript" but app_spec.txt said "use JavaScript", the feature takes precedence. The original spec is a starting point; features represent evolved requirements.
The current codebase state is not a constraint. If the code is currently in JavaScript but a feature says "migrate to TypeScript", the agent's job is to change it. The current state is the problem being solved, not an excuse to skip.
All feature categories are mandatory. Features come in three categories:
- functional - New functionality to build
- style - UI/UX requirements
- refactoring - Code improvements and migrations
All three categories are equally mandatory; refactoring features are not optional.
Skipping is for external blockers only. The feature_skip tool should only be used for genuine external blockers (missing API credentials, unavailable services, hardware limitations). Internal issues like "code doesn't exist" or "this is a big change" are not valid skip reasons.

Example: Adding a feature "Migrate frontend from JavaScript to TypeScript" will cause the agent to convert all .js/.jsx files to .ts/.tsx, regardless of what the original spec said about the tech stack.

React UI (ui/)

Tech stack: React 19, TypeScript, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout)
src/App.tsx - Main app with project selection, kanban board, agent controls
src/hooks/useWebSocket.ts - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
src/hooks/useProjects.ts - React Query hooks for API calls
src/lib/api.ts - REST API client
src/lib/types.ts - TypeScript type definitions

Key components:

AgentMissionControl.tsx - Dashboard showing active agents with mascots (Spark, Fizz, Octo, Hoot, Buzz)
DependencyGraph.tsx - Interactive node graph visualization with dagre layout
CelebrationOverlay.tsx - Confetti animation on feature completion
FolderBrowser.tsx - Server-side filesystem browser for project folder selection

Keyboard shortcuts (press ? for help):

D - Toggle debug panel
G - Toggle Kanban/Graph view
N - Add new feature
A - Toggle AI assistant
, - Open settings

Project Structure for Generated Apps

Projects can be stored in any directory (registered in ~/.autocoder/registry.db). Each project contains:

prompts/app_spec.txt - Application specification (XML format)
prompts/initializer_prompt.md - First session prompt
prompts/coding_prompt.md - Continuation session prompt
features.db - SQLite database with feature test cases
.agent.lock - Lock file to prevent multiple agent instances
.autocoder/allowed_commands.yaml - Project-specific bash command allowlist (optional)
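
One common way to implement a lock file such as .agent.lock is exclusive file creation. The sketch below shows that technique only; the function names are hypothetical, and the project may well use a different mechanism (for example PID checks or stale-lock cleanup).

```python
import os
from pathlib import Path

LOCK_NAME = ".agent.lock"


def acquire_lock(project_dir: Path) -> bool:
    """Try to create the lock file; return False if it already exists."""
    lock_path = project_dir / LOCK_NAME
    try:
        # O_EXCL makes creation fail if the file is already present,
        # so only one process can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(project_dir: Path) -> None:
    """Remove the lock file if present."""
    (project_dir / LOCK_NAME).unlink(missing_ok=True)
```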

Security Model

Defense-in-depth approach configured in client.py:

OS-level sandbox for bash commands
Filesystem restricted to project directory only
Bash commands validated using hierarchical allowlist system

Extra Read Paths (Cross-Project File Access)

The agent can optionally read files from directories outside the project folder via the EXTRA_READ_PATHS environment variable. This enables referencing documentation, shared libraries, or other projects.

Configuration:

# Single path
EXTRA_READ_PATHS=/Users/me/docs

# Multiple paths (comma-separated)
EXTRA_READ_PATHS=/Users/me/docs,/opt/shared-libs,/Volumes/Data/reference

Security Controls:

All paths are validated before being granted read access:

Must be absolute paths (not relative)
Must exist and be directories
Paths are canonicalized via Path.resolve() to prevent .. traversal attacks
Sensitive directories are blocked (see blocklist below)
Only Read, Glob, and Grep operations are allowed (no Write/Edit)

Blocked Sensitive Directories:

The following paths (relative to the home directory) are always blocked:

.ssh, .aws, .azure, .kube - Cloud/SSH credentials
.gnupg, .gpg, .password-store - Encryption keys
.docker, .config/gcloud - Container/cloud configs
.npmrc, .pypirc, .netrc - Package manager credentials

Example Output:

Created security settings at /path/to/project/.claude_settings.json
   - Sandbox enabled (OS-level bash isolation)
   - Filesystem restricted to: /path/to/project
   - Extra read paths (validated): /Users/me/docs, /opt/shared-libs

Per-Project Allowed Commands

The agent's bash command access is controlled through a hierarchical configuration system:

Command Hierarchy (highest to lowest priority):

Hardcoded Blocklist (security.py) - NEVER allowed (dd, sudo, shutdown, etc.)
Org Blocklist (~/.autocoder/config.yaml) - Cannot be overridden by projects
Org Allowlist (~/.autocoder/config.yaml) - Available to all projects
Global Allowlist (security.py) - Default commands (npm, git, curl, etc.)
Project Allowlist (.autocoder/allowed_commands.yaml) - Project-specific commands

Project Configuration:

Each project can define custom allowed commands in .autocoder/allowed_commands.yaml:

version: 1
commands:
  # Exact command names
  - name: swift
    description: Swift compiler

  # Prefix wildcards (matches swiftc, swiftlint, swiftformat)
  - name: swift*
    description: All Swift development tools

  # Local project scripts
  - name: ./scripts/build.sh
    description: Project build script

Organization Configuration:

System administrators can set org-wide policies in ~/.autocoder/config.yaml:

version: 1

# Commands available to ALL projects
allowed_commands:
  - name: jq
    description: JSON processor

# Commands blocked across ALL projects (cannot be overridden)
blocked_commands:
  - aws        # Prevent accidental cloud operations
  - kubectl    # Block production deployments

Pattern Matching:

Exact: swift matches only swift
Wildcard: swift* matches swift, swiftc, swiftlint, etc.
Scripts: ./scripts/build.sh matches the script by name from any directory
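
To make the hierarchy and the three pattern types concrete, here is a minimal sketch. The function names, the keyword parameters, and the shortened blocklist/allowlist sets are assumptions for illustration; the real logic lives in security.py and may parse commands differently.

```python
import os

# Only the entries named in this document; the real lists are longer.
HARDCODED_BLOCKLIST = {"dd", "sudo", "shutdown"}
GLOBAL_ALLOWLIST = {"npm", "git", "curl"}


def matches(pattern: str, command: str) -> bool:
    """Apply the exact, prefix-wildcard, and script matching rules."""
    if "/" in pattern:
        # Script entry: compare by file name, from any directory.
        return os.path.basename(pattern) == os.path.basename(command)
    if pattern.endswith("*"):
        # Prefix wildcard: swift* matches swift, swiftc, swiftlint, ...
        return command.startswith(pattern[:-1])
    return pattern == command


def is_allowed(command, *, org_blocked=(), org_allowed=(), project_allowed=()):
    """Check a command name against the hierarchy, blocklists first."""
    base = os.path.basename(command)
    # Levels 1 and 2: blocklists win over every allowlist.
    if base in HARDCODED_BLOCKLIST or base in org_blocked:
        return False
    # Levels 3 to 5: any allowlist match is sufficient.
    patterns = [*org_allowed, *GLOBAL_ALLOWLIST, *project_allowed]
    return any(matches(pattern, command) for pattern in patterns)
```

Checking the blocklists before any allowlist is what guarantees that a project entry such as sudo, or a broad wildcard, cannot re-enable a blocked command.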

Limits:

Maximum 100 commands per project config
Blocklisted commands (sudo, dd, shutdown, etc.) can NEVER be allowed
Org-level blocked commands cannot be overridden by project configs

Files:

security.py - Command validation logic and hardcoded blocklist
test_security.py - Unit tests for security system (136 tests)
test_security_integration.py - Integration tests with real hooks (9 tests)
TEST_SECURITY.md - Quick testing reference guide
examples/project_allowed_commands.yaml - Project config example (all commented by default)
examples/org_config.yaml - Org config example (all commented by default)
examples/README.md - Comprehensive guide with use cases, testing, and troubleshooting
PHASE3_SPEC.md - Specification for mid-session approval feature (future enhancement)

Ollama Local Models (Optional)

Run coding agents using local models via Ollama v0.14.0+:

Install Ollama: https://ollama.com
Start Ollama: ollama serve
Pull a coding model: ollama pull qwen3-coder

Configure .env:

ANTHROPIC_BASE_URL=http://localhost:11434
ANTHROPIC_AUTH_TOKEN=ollama
API_TIMEOUT_MS=3000000
ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-coder
ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3-coder
ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3-coder

Run autocoder normally - it will use your local Ollama models

Recommended coding models:

qwen3-coder - Good balance of speed and capability
deepseek-coder-v2 - Strong coding performance
codellama - Meta's code-focused model

Model tier mapping:

Use the same model for all tiers, or map different models per capability level
Larger models (70B+) work best for Opus tier
Smaller models (7B-20B) work well for Haiku tier

Known limitations:

Smaller context windows than Claude (model-dependent)
Extended context beta disabled (not supported by Ollama)
Performance depends on local hardware (GPU recommended)

Claude Code Integration

.claude/commands/create-spec.md - /create-spec slash command for interactive spec creation
.claude/skills/frontend-design/SKILL.md - Skill for distinctive UI design
.claude/templates/ - Prompt templates copied to new projects
examples/ - Configuration examples and documentation for security settings

Key Patterns

Prompt Loading Fallback Chain

Project-specific: {project_dir}/prompts/{name}.md
Base template: .claude/templates/{name}.template.md
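
A minimal sketch of this fallback chain, assuming a hypothetical load_prompt function that receives both directories explicitly (prompts.py may resolve them differently):

```python
from pathlib import Path


def load_prompt(name: str, project_dir: Path, templates_dir: Path) -> str:
    """Return the project-specific prompt if present, else the base template."""
    candidates = [
        project_dir / "prompts" / f"{name}.md",
        templates_dir / f"{name}.template.md",
    ]
    for path in candidates:
        if path.is_file():
            return path.read_text(encoding="utf-8")
    raise FileNotFoundError(f"No prompt found for {name!r} in {candidates}")
```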

Agent Session Flow

Check if features.db has features (determines initializer vs coding agent)
Create ClaudeSDKClient with security settings
Send prompt and stream response
Auto-continue with 3-second delay between sessions
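
The first and last steps of this flow can be sketched without the SDK. Everything named here is an assumption for illustration: the features table name, the choose_agent and run_sessions functions, the max_sessions parameter, and the run_session callback that stands in for creating the client and streaming a response.

```python
import sqlite3
import time
from pathlib import Path
from typing import Callable


def choose_agent(project_dir: Path) -> str:
    """Return "initializer" if features.db has no features, else "coding"."""
    db_path = project_dir / "features.db"
    if not db_path.exists():
        return "initializer"
    conn = sqlite3.connect(db_path)
    try:
        try:
            # Hypothetical table name; the real schema is in api/database.py.
            count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
        except sqlite3.OperationalError:
            count = 0  # table not created yet
    finally:
        conn.close()
    return "coding" if count else "initializer"


def run_sessions(
    project_dir: Path,
    run_session: Callable[[str], None],
    max_sessions: int,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run sessions back to back, pausing between them."""
    for index in range(max_sessions):
        run_session(choose_agent(project_dir))
        if index < max_sessions - 1:
            sleep(delay_seconds)
```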

Real-time UI Updates

The UI receives updates via WebSocket (/ws/projects/{project_name}):

progress - Test pass counts (passing, in_progress, total)
agent_status - Running/paused/stopped/crashed
log - Agent output lines with optional featureId/agentIndex for attribution
feature_update - Feature status changes
agent_update - Multi-agent state updates (thinking/working/testing/success/error) with mascot names

Parallel Mode

When running with --parallel, the orchestrator:

Spawns multiple Claude agents as subprocesses (up to --max-concurrency)
Each agent claims features atomically via feature_claim_next
Features blocked by unmet dependencies are skipped
Browser contexts are isolated per agent using the --isolated flag
AgentTracker parses agent output and emits agent_update messages for the UI

Process Limits (Parallel Mode)

The orchestrator enforces strict bounds on concurrent processes:

MAX_PARALLEL_AGENTS = 5 - Maximum concurrent coding agents
MAX_TOTAL_AGENTS = 10 - Hard limit on total agents (coding + testing)
Testing agents are capped at max_concurrency (same as coding agents)
The total never exceeds 11 Python processes (1 orchestrator + 5 coding + 5 testing)
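
The arithmetic behind these bounds can be checked with a short sketch. The two constants come from the list above; the process_upper_bounds function and its return shape are hypothetical, and clamping to a minimum of one agent is an assumption.

```python
MAX_PARALLEL_AGENTS = 5
MAX_TOTAL_AGENTS = 10


def process_upper_bounds(max_concurrency: int) -> dict[str, int]:
    """Compute upper bounds on process counts for a requested concurrency."""
    # Clamp the requested concurrency to the allowed range.
    coding = max(1, min(max_concurrency, MAX_PARALLEL_AGENTS))
    # Testing agents are capped at the same concurrency, within the hard total.
    testing = min(coding, MAX_TOTAL_AGENTS - coding)
    return {
        "coding": coding,
        "testing": testing,
        # One extra process for the orchestrator itself.
        "total_processes": 1 + coding + testing,
    }
```

With --max-concurrency 3 this gives at most 3 coding and 3 testing agents, or 7 processes including the orchestrator; any request above 5 is clamped and yields the stated ceiling of 11.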

Design System

The UI uses a neobrutalism design with Tailwind CSS v4:

CSS variables defined in ui/src/styles/globals.css via @theme directive
Custom animations: animate-slide-in, animate-pulse-neo, animate-shimmer
Color tokens: --color-neo-pending (yellow), --color-neo-progress (cyan), --color-neo-done (green)
