GitHub - maxritter/pilot-shell: Claude Code is powerful. Pilot Shell makes it reliable. Start a task, grab a coffee, come back to production-grade code. Tests enforced. Context preserved. Quality automated.

Claude Code is powerful. Pilot Shell makes it reliable.

Start a task, grab a coffee, come back to production-grade code.
Tests enforced. Context preserved. Quality automated.

⭐ Star this repo · 🌐 Website · 🔔 Follow for updates · 📋 Changelog · 📄 License

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

Works on macOS, Linux, and Windows (WSL2).

Why I Built This

I'm Max, a senior IT freelancer from Germany. My clients hire me to ship production-quality code — tested, typed, formatted, and reviewed.

Claude Code writes code fast. But without structure, it skips tests, loses context, and produces inconsistent results — especially on complex, established codebases where there are real conventions to follow and real regressions to catch. I tried other frameworks. Most of them add complexity — dozens of agents, elaborate scaffolding, thousands of lines of instruction files — but the output doesn't get better. You just burn more tokens, wait longer, and deal with more things breaking.

So I built Pilot Shell. Instead of adding process on top, it bakes quality into every interaction. Linting, formatting, and type checking run as enforced hooks on every edit. TDD is mandatory, not suggested. Context is preserved across sessions. Every rule exists because I hit a real problem: a bug that slipped through, a regression that shouldn't have happened, a session where Claude cut corners and nobody caught it.

This isn't a vibe coding tool, it's true agentic engineering, made simple. You install it in any existing project, run pilot, then /sync to learn your codebase. The guardrails are just there. The end result is that you can walk away — start a /spec task, approve the plan, go grab a coffee. When you come back, the work is tested, verified, formatted, and ready to ship.

Getting Started

Prerequisites

Claude Subscription: Solo developers should choose Max 5x for moderate usage or Max 20x for heavy usage. Teams and companies should use Team Premium which provides 6.25x usage per member plus SSO, admin tools, and billing management. Using the API instead may lead to much higher costs.

Installation

Works with any existing project. Pilot Shell doesn't scaffold or restructure your code — it installs alongside your project and adapts to your conventions. cd into your project folder, then run:

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

Choose your environment:

Local Installation — Install directly on your system using Homebrew. Works on macOS, Linux, and Windows (WSL2).
Dev Container — Pre-configured, isolated environment with all tools ready. No system conflicts and works on any OS.

After installation, run pilot or ccp in your project folder to start Pilot Shell.

What the installer does

8-step installer with progress tracking, rollback on failure, and idempotent re-runs:

Prerequisites — Checks Homebrew, Node.js, Python 3.12+, uv, git
Dependencies — Installs Vexor, playwright-cli, Claude Code, property-based testing tools
Shell integration — Auto-configures bash, fish, and zsh with pilot alias
Config & Claude files — Sets up .claude/ plugin, rules, commands, hooks, MCP servers
VS Code extensions — Installs recommended extensions for your stack
Dev Container — Auto-setup with all tools pre-configured
Automated updater — Checks for updates on launch with release notes and one-key upgrade
Cross-platform — macOS, Linux, Windows (WSL2)

Installing a specific version or uninstalling

Specific version (see releases):

export VERSION=7.0.2
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

Uninstall — removes the Pilot binary, plugin files, managed commands/rules, settings and shell aliases:

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstall.sh | bash

How It Works

/spec — Spec-Driven Development

Features, bug fixes, refactoring — describe it and /spec handles the rest. Auto-detects the task type and adapts the flow.

pilot
> /spec "Add user authentication with OAuth and JWT tokens"

Plan  →  Approve  →  Implement (TDD)  →  Verify  →  Done
                                            ↑         ↓
                                            └── Loop──┘

Plan Phase

Explores entire codebase with semantic search (Vexor)
Asks clarifying questions before committing to a design
Writes detailed spec to docs/plans/ with scope, tasks, and definition of done
Plan-verifier sub-agent independently validates completeness
Waits for your approval — you can edit the plan first

Implement Phase

Creates an isolated git worktree — main branch stays clean
Implements each task with strict TDD (RED → GREEN → REFACTOR)
Quality hooks auto-lint, format, and type-check every edit
Runs full test suite after each task to catch regressions early

Verify Phase

Runs full test suite — unit, integration, and E2E
Executes actual program to verify real-world behavior (not just tests)
Three review sub-agents in parallel: compliance, quality, and goal verification
Auto-fixes all findings, re-verifies until clean
Squash merges worktree back to main branch on success

Quick Mode

Just chat — no plan, no approval gate. Quality hooks and TDD enforcement still apply. Best for small tasks and exploration.

Other Commands

Command	What it does
`/sync`	Explores your codebase, discovers conventions, builds a search index, updates project rules. Run once initially, then anytime your project changes.
`/learn`	Captures non-obvious discoveries as reusable skills. Triggers automatically or on demand.
`/vault`	Shares rules, commands, and skills across your team via a private Git repository.

Extensibility

Create your own rules, commands, and skills in .claude/ — all plain markdown. Add MCP servers in .mcp.json and run /sync to generate docs.

Type	Loaded	Best for
Rules	Every session, or conditionally by file type	Guidelines Claude should always follow
Commands	On demand via `/command`	Specific workflows or multi-step tasks
Skills	On demand, created via `/learn`	Reusable knowledge from past sessions

Demo

A full-stack project — created from scratch with a single prompt, then extended with 3 features built in parallel using /spec. Every line of code was planned, implemented, tested, and verified entirely by AI. Zero manual code edits, zero bug fixes by a human. Watch the demo and browse the code →

Under the Hood

The Hooks Pipeline

Hooks fire automatically across the entire lifecycle — formatting, linting, type checking, TDD enforcement, context preservation, and memory capture. Every file edit triggers quality checks. Every session start restores state. Every session end persists context.

All hooks by lifecycle event

SessionStart (on startup, clear, or compact)

Hook	Type	What it does
Memory loader	Blocking	Loads persistent context from Pilot Shell Console memory
`post_compact_restore.py`	Blocking	After auto-compaction: re-injects active plan, task state, and context
Session tracker	Async	Initializes user message tracking for the session

PreToolUse (before search, web, or task tools)

Hook	Type	What it does
`tool_redirect.py`	Blocking	Blocks WebSearch/WebFetch (MCP alternatives exist), EnterPlanMode/ExitPlanMode (/spec conflict). Hints vexor for semantic Grep patterns.

PostToolUse (after every Write / Edit / MultiEdit)

Hook	Type	What it does
`file_checker.py`	Blocking	Dispatches to language-specific checkers: Python (ruff + basedpyright), TypeScript (Prettier + ESLint + tsc), Go (gofmt + golangci-lint). Auto-fixes formatting.
`tdd_enforcer.py`	Non-blocking	Checks if implementation files were modified without failing tests first. Shows reminder to write tests. Excludes test files, docs, config, TSX, and infrastructure.
`context_monitor.py`	Non-blocking	Monitors context usage. Warns at ~80% (informational) and ~90%+ (caution). Prompts `/learn` at key thresholds.
Memory observer	Async	Captures development observations to persistent memory.

PreCompact (before auto-compaction)

Hook	Type	What it does
`pre_compact.py`	Blocking	Captures Pilot Shell state (active plan, task list, key context) to persistent memory before compaction fires.

Stop (when Claude tries to finish)

Hook	Type	What it does
`spec_stop_guard.py`	Blocking	If an active spec exists with PENDING or COMPLETE status, blocks stopping. Forces verification to complete before the session can end.
Session summarizer	Async	Saves session observations to persistent memory for future sessions.

SessionEnd (when the session closes)

Hook	Type	What it does
`session_end.py`	Blocking	Stops the worker daemon when no other Pilot Shell sessions are active. Sends real-time dashboard notification.

Context Preservation

Pilot Shell preserves context automatically across compaction boundaries. Before compaction fires, hooks capture the active plan, task list, and key decisions to persistent memory. After compaction, hooks restore everything — work continues exactly where it left off. Multiple sessions can run in parallel on the same project without interference.

How the effective context display works

Claude Code reserves ~16.5% of the context window as a compaction buffer, triggering auto-compaction at ~83.5% raw usage. Pilot Shell rescales this to an effective 0–100% range so the status bar fills naturally to 100% right before compaction fires. A ▓ buffer indicator at the end of the bar shows the reserved zone. The context monitor warns at ~80% effective (informational) and ~90%+ effective (caution) — no confusing raw percentages.

Smart Model Routing

Opus for planning and verification — where reasoning quality matters most. Sonnet for implementation — where a clear spec makes fast execution predictable. All model assignments are configurable per-component via the Pilot Shell Console settings.

Phase-by-phase breakdown

Phase	Default	Why
Planning	Opus	Exploring your codebase, designing architecture, and writing the spec requires deep reasoning. A good plan is the foundation of everything.
Plan Verification	Opus	Catching gaps, missing edge cases, and requirement mismatches before implementation saves expensive rework.
Implementation	Sonnet	With a solid plan, writing code is straightforward. Sonnet is fast, cost-effective, and produces high-quality code when guided by a clear spec.
Code Verification	Opus	Independent code review against the plan requires the same reasoning depth as planning — catching subtle bugs, logic errors, and spec deviations.

Choose between Sonnet 4.6 and Opus 4.6 for the main session, each command, and sub-agents. A global "Extended Context (1M)" toggle enables the 1M token context window across all models simultaneously. Note: 1M context models require a Max (20x) or Enterprise subscription — not available to all users.

Built-in Rules & Standards

Production-tested best practices loaded into every session. Core rules cover workflow, testing, verification, debugging, and tools. Coding standards activate conditionally by file type. These aren't suggestions, they're enforced.

Core Workflow

task-and-workflow.md — Task management, /spec orchestration, deviation handling
testing.md — TDD workflow, test strategy, coverage requirements
verification.md — Execution verification, completion requirements

Development Practices

development-practices.md — Project policies, debugging methodology, git rules
context-continuation.md — Auto-compaction and context management protocol
pilot-memory.md — Persistent memory workflow, online learning triggers

Tools

research-tools.md — Context7, grep-mcp, web search, GitHub CLI
cli-tools.md — Pilot CLI, Vexor semantic search
playwright-cli.md — Browser automation for E2E UI testing

Collaboration

team-vault.md — Team Vault asset sharing via sx

Coding Standards (activated by file type)

Standard	Activates On	Coverage
Python	`*.py`	uv, pytest, ruff, basedpyright, type hints
TypeScript	`.ts`, `.tsx`, `.js`, `.jsx`	npm/pnpm, Jest, ESLint, Prettier, React patterns
Go	`*.go`	Modules, testing, formatting, error handling
Frontend	`.tsx`, `.jsx`, `.html`, `.vue`, `*.css`	Components, CSS, accessibility, responsive design
Backend	`/models/`, `/routes/`, `/api/`, etc.	API design, data models, query optimization, migrations

MCP Servers

MCP servers provide external context in every session — library docs, persistent memory, web search, GitHub code search, and web page fetching.

All servers

Server	Purpose
lib-docs	Library documentation lookup — get API docs for any dependency
mem-search	Persistent memory search — recall context from past sessions
web-search	Web search via DuckDuckGo, Bing, and Exa
grep-mcp	GitHub code search — find real-world usage patterns across repos
web-fetch	Web page fetching — read documentation, APIs, references

Language Servers (LSP)

Real-time diagnostics and go-to-definition for Python (basedpyright), TypeScript (vtsls), and Go (gopls). Auto-installed, auto-configured via .lsp.json, and auto-restart on crash.

Pilot Shell Console

A local web dashboard at localhost:41777 with 7 views: workspace dashboard, spec progress tracking, browsable memory, session history, token usage analytics, team vault, and per-component model settings. Real-time notifications via SSE when Claude needs your input or a spec phase completes.

All views

View	What it shows
Dashboard	Workspace status, active sessions, spec progress, git info, recent activity
Specifications	All spec plans with task progress, phase tracking, and iteration history
Memories	Browsable observations — decisions, discoveries, bugfixes — with type filters and search
Sessions	Active and past sessions with observation counts and duration
Usage	Daily token costs, model routing breakdown, and usage trends
Vault	Shared team assets with version tracking
Settings	Model selection per command/sub-agent, extended context toggle

Pilot Shell CLI

The pilot binary (~/.pilot/bin/pilot) manages sessions, worktrees, licensing, and context. Run pilot or ccp to start Claude with Pilot Shell enhancements. All commands support --json for structured output.

Session & Context

Command	Purpose
`pilot`	Start Claude with Pilot Shell enhancements, auto-update, and license check
`pilot run [args...]`	Same as above, with optional flags (e.g., `--skip-update-check`)
`pilot check-context --json`	Get current context usage percentage
`pilot register-plan <path> <status>`	Associate a plan file with the current session
`pilot sessions [--json]`	Show count of active Pilot Shell sessions

Worktree Isolation

Command	Purpose
`pilot worktree create --json <slug>`	Create isolated git worktree for safe experimentation
`pilot worktree detect --json <slug>`	Check if a worktree already exists
`pilot worktree diff --json <slug>`	List changed files in the worktree
`pilot worktree sync --json <slug>`	Squash merge worktree changes back to base branch
`pilot worktree cleanup --json <slug>`	Remove worktree and branch when done
`pilot worktree status --json`	Show active worktree info for current session

License & Auth

Command	Purpose
`pilot activate <key>`	Activate a license key on this machine
`pilot deactivate`	Deactivate license on this machine
`pilot status [--json]`	Show current license status
`pilot verify [--json]`	Verify license (used by hooks)
`pilot trial --check [--json]`	Check trial eligibility
`pilot trial --start [--json]`	Start a trial

What Users Say

"I stopped reviewing every line Claude writes. The hooks catch formatting and type errors automatically, TDD catches logic errors, and the spec verifier catches everything else. I review the plan, approve it, and the output is production-grade."

"Other frameworks I tried added so much overhead that half my tokens went to the system itself. Pilot Shell is lean — quick mode has zero scaffolding, and even /spec only adds structure where it matters. More of my context goes to actual work."

"The persistent memory changed everything. I can pick up a project after a week and Claude already knows my architecture decisions, the bugs we fixed, and why we chose certain patterns. No more re-explaining the same context every session."

License

Pilot Shell is source-available under a commercial license. See the LICENSE file for full terms.

Tier	Seats	Includes
Solo	1	All features, continuous updates, bug reports via GitHub Issues
Team	Multi	Solo + multiple seats, priority email support, feature requests

Details and licensing at pilot-shell.com.

FAQ

Does Pilot Shell send my code or data to external services?

No code, files, prompts, project data, or personal information ever leaves your machine through Pilot Shell. All development tools — vector search (Vexor), persistent memory (Pilot Shell Console), session state, and quality hooks — run entirely locally.

Pilot Shell makes external calls only for licensing. Here is the complete list:

When	Where	What is sent
License validation (once per 24h)	`api.polar.sh`	License key, organization ID
License activation (once)	`api.polar.sh`	License key, machine fingerprint
Trial start (once)	`pilot-shell.com`	Hashed hardware fingerprint

That's it — three calls total, each sent at most once (validation re-checks daily). No OS, no architecture, no Python version, no locale, no analytics, no heartbeats. The validation result is cached locally, and Pilot Shell works fully offline for up to 7 days between checks. Beyond these licensing calls, the only external communication is between Claude Code and Anthropic's API — using your own subscription or API key.

Is Pilot Shell enterprise-compliant for data privacy?

Yes. Your source code, project files, and development context never leave your machine through Pilot Shell. The only external calls are license validation (daily, license key only) and one-time activation/trial start (machine fingerprint only). No OS info, no version strings, no analytics, no telemetry. Enterprises using Claude Code with their own API key or Anthropic Enterprise subscription can add Pilot Shell without changing their data compliance posture.

What are the licenses of Pilot Shell's dependencies?

All external tools and dependencies that Pilot Shell installs and uses are open source with permissive licenses (MIT, Apache 2.0, BSD). This includes ruff, basedpyright, Prettier, ESLint, gofmt, uv, Vexor, playwright-cli, and all MCP servers. No copyleft or restrictive-licensed dependencies are introduced into your environment.

Do I need a separate Anthropic subscription?

Yes. Pilot Shell enhances Claude Code — it doesn't replace it. You need an active Claude subscription — Max 5x or 20x for solo developers, or Team Premium for teams and companies. Using the Anthropic API directly is also possible but may lead to much higher costs. Pilot Shell adds quality automation on top of whatever Claude Code access you already have.

Does Pilot Shell work with Codex, Gemini CLI, OpenCode, or other AI coding tools?

No. Pilot Shell is built exclusively for Claude Code. Every hook, rule, command, and workflow is engineered specifically for Claude's tool-use protocol, prompt format, and session lifecycle. Pilot Shell also only supports Claude Sonnet 4.6 and Claude Opus 4.6 — these are the models that produce the best results, and every rule and prompt is optimized for their behavior. Supporting other tools or models would mean compromising on quality, which is the opposite of what Pilot Shell is designed to do.

Does Pilot Shell work with existing projects?

Yes — that's the primary use case. Pilot Shell doesn't scaffold or restructure your code. You install it, run /sync, and it explores your codebase to discover your tech stack, conventions, and patterns. From there, every session has full context about your project. The more complex and established your codebase, the more value Pilot Shell adds — quality hooks catch regressions, persistent memory preserves decisions across sessions, and /spec plans features against your real architecture.

Does Pilot Shell work with any programming language?

Pilot Shell's quality hooks (auto-formatting, linting, type checking) currently support Python, TypeScript/JavaScript, and Go out of the box. TDD enforcement, spec-driven development, persistent memory, context preservation hooks, and all rules and standards work with any language that Claude Code supports. You can add custom hooks for additional languages.

Can I use Pilot Shell on multiple projects?

Yes. Pilot Shell installs once and works across all your projects. Each project can have its own .claude/ rules, custom skills, and MCP servers. Run /sync in each project to generate project-specific documentation and standards.

Can I add my own rules, commands, and skills?

Yes. Create your own in your project's .claude/ folder — rules, commands, and skills are all plain markdown files. Your project-level assets are loaded alongside Pilot Shell's built-in defaults and take precedence when they overlap. /sync auto-discovers your codebase patterns and generates project-specific rules for you. /learn extracts reusable knowledge from sessions into custom skills. Hooks can be extended for additional languages. Use /vault to share your custom assets across your team.

Changelog

See the full changelog at pilot.openchangelog.com.

Contributing

Pull Requests — New features, improvements, and bug fixes are welcome. You can improve Pilot Shell with Pilot Shell — a self-improving loop where your contributions make the tool that makes contributions better.

Bug Reports — Found a bug? Open an issue on GitHub.

Claude Code is powerful. Pilot Shell makes it reliable.

Name		Name	Last commit message	Last commit date
Latest commit History 1,034 Commits
.devcontainer		.devcontainer
.githooks		.githooks
.github		.github
.vscode		.vscode
console		console
docs		docs
installer		installer
launcher		launcher
pilot		pilot
.coderabbit.yaml		.coderabbit.yaml
.gitattributes		.gitattributes
.gitignore		.gitignore
.python-version		.python-version
.releaserc.json		.releaserc.json
.trivyignore		.trivyignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
cliff.toml		cliff.toml
install.sh		install.sh
pyproject.toml		pyproject.toml
uninstall.sh		uninstall.sh
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claude Code is powerful. Pilot Shell makes it reliable.

Why I Built This

Getting Started

Prerequisites

Installation

How It Works

/spec — Spec-Driven Development

Quick Mode

Other Commands

Extensibility

Demo

Under the Hood

The Hooks Pipeline

SessionStart (on startup, clear, or compact)

PreToolUse (before search, web, or task tools)

PostToolUse (after every Write / Edit / MultiEdit)

PreCompact (before auto-compaction)

Stop (when Claude tries to finish)

SessionEnd (when the session closes)

Context Preservation

Smart Model Routing

Built-in Rules & Standards

MCP Servers

Language Servers (LSP)

Pilot Shell Console

Pilot Shell CLI

What Users Say

License

FAQ

Changelog

Contributing

About

Uh oh!

Releases 66

Contributors 8

Languages

License

maxritter/pilot-shell

Folders and files

Latest commit

History

Repository files navigation

Claude Code is powerful. Pilot Shell makes it reliable.

Why I Built This

Getting Started

Prerequisites

Installation

How It Works

/spec — Spec-Driven Development

Quick Mode

Other Commands

Extensibility

Demo

Under the Hood

The Hooks Pipeline

SessionStart (on startup, clear, or compact)

PreToolUse (before search, web, or task tools)

PostToolUse (after every Write / Edit / MultiEdit)

PreCompact (before auto-compaction)

Stop (when Claude tries to finish)

SessionEnd (when the session closes)

Context Preservation

Smart Model Routing

Built-in Rules & Standards

MCP Servers

Language Servers (LSP)

Pilot Shell Console

Pilot Shell CLI

What Users Say

License

FAQ

Changelog

Contributing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 66

Contributors 8

Languages