Skip to content

Latest commit

 

History

History
427 lines (315 loc) · 16.3 KB

File metadata and controls

427 lines (315 loc) · 16.3 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Prerequisites

  • Python 3.11+
  • Node.js 20+ (for UI development)
  • Claude Code CLI

Project Overview

This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions using a two-agent pattern:

  1. Initializer Agent - First session reads an app spec and creates features in a SQLite database
  2. Coding Agent - Subsequent sessions implement features one by one, marking them as passing

Commands

Quick Start (Recommended)

# Windows - launches CLI menu
start.bat

# macOS/Linux
./start.sh

# Launch Web UI (serves pre-built React app)
start_ui.bat      # Windows
./start_ui.sh     # macOS/Linux

Python Backend (Manual)

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the main CLI launcher
python start.py

# Run agent directly for a project (use absolute path or registered name)
python autonomous_agent_demo.py --project-dir C:/Projects/my-app
python autonomous_agent_demo.py --project-dir my-app  # if registered

# YOLO mode: rapid prototyping without browser testing
python autonomous_agent_demo.py --project-dir my-app --yolo

# Parallel mode: run multiple agents concurrently (1-5 agents)
python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3

YOLO Mode (Rapid Prototyping)

YOLO mode skips all testing for faster feature iteration:

# CLI
python autonomous_agent_demo.py --project-dir my-app --yolo

# UI: Toggle the lightning bolt button before starting the agent

What's different in YOLO mode:

  • No regression testing (skips feature_get_for_regression)
  • No Playwright MCP server (browser automation disabled)
  • Features marked passing after lint/type-check succeeds
  • Faster iteration for prototyping

What's the same:

  • Lint and type-check still run to verify code compiles
  • Feature MCP server for tracking progress
  • All other development tools available

When to use: Early prototyping when you want to quickly scaffold features without verification overhead. Switch back to standard mode for production-quality development.

React UI (in ui/ directory)

cd ui
npm install
npm run dev      # Development server (hot reload)
npm run build    # Production build (required for start_ui.bat)
npm run lint     # Run ESLint

Note: The start_ui.bat script serves the pre-built UI from ui/dist/. After making UI changes, run npm run build in the ui/ directory.

Testing

Python

ruff check .                      # Lint
mypy .                            # Type check
python test_security.py           # Security unit tests (136 tests)
python test_security_integration.py  # Integration tests (9 tests)

React UI

cd ui
npm run lint          # ESLint
npm run build         # Type check + build
npm run test:e2e      # Playwright end-to-end tests
npm run test:e2e:ui   # Playwright tests with UI

Code Quality

Configuration in pyproject.toml:

  • ruff: Line length 120, Python 3.11 target
  • mypy: Strict return type checking, ignores missing imports

Architecture

Core Python Modules

  • start.py - CLI launcher with project creation/selection menu
  • autonomous_agent_demo.py - Entry point for running the agent
  • agent.py - Agent session loop using Claude Agent SDK
  • client.py - ClaudeSDKClient configuration with security hooks and MCP servers
  • security.py - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
  • prompts.py - Prompt template loading with project-specific fallback
  • progress.py - Progress tracking, database queries, webhook notifications
  • registry.py - Project registry for mapping names to paths (cross-platform)
  • parallel_orchestrator.py - Concurrent agent execution with dependency-aware scheduling
  • api/dependency_resolver.py - Cycle detection (Kahn's algorithm + DFS) and dependency validation

Project Registry

Projects can be stored in any directory. The registry maps project names to paths using SQLite:

  • All platforms: ~/.autocoder/registry.db

The registry uses:

  • SQLite database with SQLAlchemy ORM
  • POSIX path format (forward slashes) for cross-platform compatibility
  • SQLite's built-in transaction handling for concurrency safety

Server API (server/)

The FastAPI server provides REST endpoints for the UI:

  • server/routers/projects.py - Project CRUD with registry integration
  • server/routers/features.py - Feature management
  • server/routers/agent.py - Agent control (start/stop/pause/resume)
  • server/routers/filesystem.py - Filesystem browser API with security controls
  • server/routers/spec_creation.py - WebSocket for interactive spec creation

Feature Management

Features are stored in SQLite (features.db) via SQLAlchemy. The agent interacts with features through an MCP server:

  • mcp_server/feature_mcp.py - MCP server exposing feature management tools
  • api/database.py - SQLAlchemy models (Feature table with priority, category, name, description, steps, passes, dependencies)

MCP tools available to the agent:

  • feature_get_stats - Progress statistics
  • feature_get_next - Get highest-priority pending feature (respects dependencies)
  • feature_claim_next - Atomically claim next available feature (for parallel mode)
  • feature_get_for_regression - Random passing features for regression testing
  • feature_mark_passing - Mark feature complete
  • feature_skip - Move feature to end of queue (for external blockers only)
  • feature_create_bulk - Initialize all features (used by initializer)
  • feature_add_dependency - Add dependency between features (with cycle detection)
  • feature_remove_dependency - Remove a dependency

Feature Behavior & Precedence

Important: After initialization, the feature database becomes the authoritative source of truth for what the agent should build. This has specific implications:

  1. Refactoring features override the original spec. If a refactoring feature says "migrate to TypeScript" but app_spec.txt said "use JavaScript", the feature takes precedence. The original spec is a starting point; features represent evolved requirements.

  2. The current codebase state is not a constraint. If the code is currently in JavaScript but a feature says "migrate to TypeScript", the agent's job is to change it. The current state is the problem being solved, not an excuse to skip.

  3. All feature categories are mandatory. Features come in three categories:

    • functional - New functionality to build
    • style - UI/UX requirements
    • refactoring - Code improvements and migrations

    All categories are equally mandatory. Refactoring features are not optional.

  4. Skipping is for external blockers only. The feature_skip tool should only be used for genuine external blockers (missing API credentials, unavailable services, hardware limitations). Internal issues like "code doesn't exist" or "this is a big change" are not valid skip reasons.

Example: Adding a feature "Migrate frontend from JavaScript to TypeScript" will cause the agent to convert all .js/.jsx files to .ts/.tsx, regardless of what the original spec said about the tech stack.

React UI (ui/)

  • Tech stack: React 19, TypeScript, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout)
  • src/App.tsx - Main app with project selection, kanban board, agent controls
  • src/hooks/useWebSocket.ts - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
  • src/hooks/useProjects.ts - React Query hooks for API calls
  • src/lib/api.ts - REST API client
  • src/lib/types.ts - TypeScript type definitions

Key components:

  • AgentMissionControl.tsx - Dashboard showing active agents with mascots (Spark, Fizz, Octo, Hoot, Buzz)
  • DependencyGraph.tsx - Interactive node graph visualization with dagre layout
  • CelebrationOverlay.tsx - Confetti animation on feature completion
  • FolderBrowser.tsx - Server-side filesystem browser for project folder selection

Keyboard shortcuts (press ? for help):

  • D - Toggle debug panel
  • G - Toggle Kanban/Graph view
  • N - Add new feature
  • A - Toggle AI assistant
  • , - Open settings

Project Structure for Generated Apps

Projects can be stored in any directory (registered in ~/.autocoder/registry.db). Each project contains:

  • prompts/app_spec.txt - Application specification (XML format)
  • prompts/initializer_prompt.md - First session prompt
  • prompts/coding_prompt.md - Continuation session prompt
  • features.db - SQLite database with feature test cases
  • .agent.lock - Lock file to prevent multiple agent instances
  • .autocoder/allowed_commands.yaml - Project-specific bash command allowlist (optional)

Security Model

Defense-in-depth approach configured in client.py:

  1. OS-level sandbox for bash commands
  2. Filesystem restricted to project directory only
  3. Bash commands validated using hierarchical allowlist system

Extra Read Paths (Cross-Project File Access)

The agent can optionally read files from directories outside the project folder via the EXTRA_READ_PATHS environment variable. This enables referencing documentation, shared libraries, or other projects.

Configuration:

# Single path
EXTRA_READ_PATHS=/Users/me/docs

# Multiple paths (comma-separated)
EXTRA_READ_PATHS=/Users/me/docs,/opt/shared-libs,/Volumes/Data/reference

Security Controls:

All paths are validated before being granted read access:

  • Must be absolute paths (not relative)
  • Must exist and be directories
  • Paths are canonicalized via Path.resolve() to prevent .. traversal attacks
  • Sensitive directories are blocked (see blocklist below)
  • Only Read, Glob, and Grep operations are allowed (no Write/Edit)

Blocked Sensitive Directories:

The following directories (relative to home) are always blocked:

  • .ssh, .aws, .azure, .kube - Cloud/SSH credentials
  • .gnupg, .gpg, .password-store - Encryption keys
  • .docker, .config/gcloud - Container/cloud configs
  • .npmrc, .pypirc, .netrc - Package manager credentials

Example Output:

Created security settings at /path/to/project/.claude_settings.json
   - Sandbox enabled (OS-level bash isolation)
   - Filesystem restricted to: /path/to/project
   - Extra read paths (validated): /Users/me/docs, /opt/shared-libs

Per-Project Allowed Commands

The agent's bash command access is controlled through a hierarchical configuration system:

Command Hierarchy (highest to lowest priority):

  1. Hardcoded Blocklist (security.py) - NEVER allowed (dd, sudo, shutdown, etc.)
  2. Org Blocklist (~/.autocoder/config.yaml) - Cannot be overridden by projects
  3. Org Allowlist (~/.autocoder/config.yaml) - Available to all projects
  4. Global Allowlist (security.py) - Default commands (npm, git, curl, etc.)
  5. Project Allowlist (.autocoder/allowed_commands.yaml) - Project-specific commands

Project Configuration:

Each project can define custom allowed commands in .autocoder/allowed_commands.yaml:

version: 1
commands:
  # Exact command names
  - name: swift
    description: Swift compiler

  # Prefix wildcards (matches swiftc, swiftlint, swiftformat)
  - name: swift*
    description: All Swift development tools

  # Local project scripts
  - name: ./scripts/build.sh
    description: Project build script

Organization Configuration:

System administrators can set org-wide policies in ~/.autocoder/config.yaml:

version: 1

# Commands available to ALL projects
allowed_commands:
  - name: jq
    description: JSON processor

# Commands blocked across ALL projects (cannot be overridden)
blocked_commands:
  - aws        # Prevent accidental cloud operations
  - kubectl    # Block production deployments

Pattern Matching:

  • Exact: swift matches only swift
  • Wildcard: swift* matches swift, swiftc, swiftlint, etc.
  • Scripts: ./scripts/build.sh matches the script by name from any directory

Limits:

  • Maximum 100 commands per project config
  • Blocklisted commands (sudo, dd, shutdown, etc.) can NEVER be allowed
  • Org-level blocked commands cannot be overridden by project configs

Files:

  • security.py - Command validation logic and hardcoded blocklist
  • test_security.py - Unit tests for security system (136 tests)
  • test_security_integration.py - Integration tests with real hooks (9 tests)
  • TEST_SECURITY.md - Quick testing reference guide
  • examples/project_allowed_commands.yaml - Project config example (all commented by default)
  • examples/org_config.yaml - Org config example (all commented by default)
  • examples/README.md - Comprehensive guide with use cases, testing, and troubleshooting
  • PHASE3_SPEC.md - Specification for mid-session approval feature (future enhancement)

Ollama Local Models (Optional)

Run coding agents using local models via Ollama v0.14.0+:

  1. Install Ollama: https://ollama.com
  2. Start Ollama: ollama serve
  3. Pull a coding model: ollama pull qwen3-coder
  4. Configure .env:
    ANTHROPIC_BASE_URL=http://localhost:11434
    ANTHROPIC_AUTH_TOKEN=ollama
    API_TIMEOUT_MS=3000000
    ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-coder
    ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3-coder
    ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3-coder
    
  5. Run autocoder normally - it will use your local Ollama models

Recommended coding models:

  • qwen3-coder - Good balance of speed and capability
  • deepseek-coder-v2 - Strong coding performance
  • codellama - Meta's code-focused model

Model tier mapping:

  • Use the same model for all tiers, or map different models per capability level
  • Larger models (70B+) work best for Opus tier
  • Smaller models (7B-20B) work well for Haiku tier

Known limitations:

  • Smaller context windows than Claude (model-dependent)
  • Extended context beta disabled (not supported by Ollama)
  • Performance depends on local hardware (GPU recommended)

Claude Code Integration

  • .claude/commands/create-spec.md - /create-spec slash command for interactive spec creation
  • .claude/skills/frontend-design/SKILL.md - Skill for distinctive UI design
  • .claude/templates/ - Prompt templates copied to new projects
  • examples/ - Configuration examples and documentation for security settings

Key Patterns

Prompt Loading Fallback Chain

  1. Project-specific: {project_dir}/prompts/{name}.md
  2. Base template: .claude/templates/{name}.template.md

Agent Session Flow

  1. Check if features.db has features (determines initializer vs coding agent)
  2. Create ClaudeSDKClient with security settings
  3. Send prompt and stream response
  4. Auto-continue with 3-second delay between sessions

Real-time UI Updates

The UI receives updates via WebSocket (/ws/projects/{project_name}):

  • progress - Test pass counts (passing, in_progress, total)
  • agent_status - Running/paused/stopped/crashed
  • log - Agent output lines with optional featureId/agentIndex for attribution
  • feature_update - Feature status changes
  • agent_update - Multi-agent state updates (thinking/working/testing/success/error) with mascot names

Parallel Mode

When running with --parallel, the orchestrator:

  1. Spawns multiple Claude agents as subprocesses (up to --max-concurrency)
  2. Each agent claims features atomically via feature_claim_next
  3. Features blocked by unmet dependencies are skipped
  4. Browser contexts are isolated per agent using --isolated flag
  5. AgentTracker parses output and emits agent_update messages for UI

Process Limits (Parallel Mode)

The orchestrator enforces strict bounds on concurrent processes:

  • MAX_PARALLEL_AGENTS = 5 - Maximum concurrent coding agents
  • MAX_TOTAL_AGENTS = 10 - Hard limit on total agents (coding + testing)
  • Testing agents are capped at max_concurrency (same as coding agents)
  • Total process count never exceeds 11 Python processes (1 orchestrator + 5 coding + 5 testing)

Design System

The UI uses a neobrutalism design with Tailwind CSS v4:

  • CSS variables defined in ui/src/styles/globals.css via @theme directive
  • Custom animations: animate-slide-in, animate-pulse-neo, animate-shimmer
  • Color tokens: --color-neo-pending (yellow), --color-neo-progress (cyan), --color-neo-done (green)