Commit c73e266

fix: vexor test mocking, site responsiveness, and README consolidation
- Add missing _is_vexor_local_functional mock to installer tests (CI fix)
- Add tests for _get_uv_tool_vexor_bin and _is_vexor_local_functional
- Add mock audit rule to testing.md to prevent unmocked dependency issues
- Move Smart Model Routing from Usage to Under the Hood (site + README)
- Merge Why I Built This and Why This Approach Works into single section
- Fix install command box wrapping on mobile
- Fix NavBar tablet responsiveness (desktop nav at lg: breakpoint)
1 parent 5a6c320 commit c73e266

7 files changed

Lines changed: 215 additions & 74 deletions

README.md

Lines changed: 36 additions & 48 deletions
@@ -28,11 +28,13 @@ curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.
 
 ## Why I Built This
 
-I'm a senior IT freelancer from Germany. My clients hire me to ship production-quality code — tested, typed, formatted, and reviewed. When something goes into production under my name, quality isn't optional.
+I'm Max, a senior IT freelancer from Germany. My clients hire me to ship production-quality code — tested, typed, formatted, and reviewed. When something goes into production under my name, quality isn't optional.
 
-Claude Code writes code fast. But without structure, it skips tests, loses context, and produces inconsistent results — especially on complex, established codebases where there are real conventions to follow and real regressions to catch. I tried other frameworks — they burned tokens on bloated prompts without adding real value. Some added process without enforcement. Others were prompt templates that Claude ignored when context got tight. None made Claude reliably produce production-grade code.
+Claude Code writes code fast. But without structure, it skips tests, loses context, and produces inconsistent results — especially on complex, established codebases where there are real conventions to follow and real regressions to catch. I tried other frameworks. Most of them add complexity — dozens of agents, elaborate scaffolding, thousands of lines of instruction files — but the output doesn't get better. You just burn more tokens, wait longer, and deal with more things breaking.
 
-So I built Pilot Shell. Instead of adding process on top, it bakes quality into every interaction. Linting, formatting, and type checking run as enforced hooks on every edit. TDD is mandatory, not suggested. Context is monitored and preserved across sessions. Every piece of work goes through verification before it's marked done.
+So I built Pilot Shell. Instead of adding process on top, it bakes quality into every interaction. Linting, formatting, and type checking run as enforced hooks on every edit. TDD is mandatory, not suggested. Context is preserved across sessions. Every rule exists because I hit a real problem: a bug that slipped through, a regression that shouldn't have happened, a session where Claude cut corners and nobody caught it.
+
+This isn't a vibe coding tool, it's true agentic engineering, made simple. You install it in any existing project, run `pilot`, then `/sync` to learn your codebase. The guardrails are just there. The end result is that you can walk away — start a `/spec` task, approve the plan, go grab a coffee. When you come back, the work is tested, verified, formatted, and ready to ship.
 
 ---
 
@@ -61,7 +63,7 @@ Each `/spec` prompt one-shotted a complete feature — plan, TDD implementation,
 | Writes code, skips tests | TDD enforced — RED, GREEN, REFACTOR on every feature |
 | No quality checks | Hooks auto-lint, format, type-check on every file edit |
 | Context degrades mid-task | Hooks preserve and restore state across compaction cycles |
-| Every session starts fresh | Persistent memory across sessions via Pilot Shell Console |
+| Every session starts fresh | Persistent memory across sessions via Pilot Shell Console |
 | Hope it works | Verifier sub-agents perform code review before marking complete |
 | No codebase knowledge | Production-tested rules loaded into every session |
 | Generic suggestions | Coding standards activated conditionally by file type |
@@ -71,20 +73,6 @@ Each `/spec` prompt one-shotted a complete feature — plan, TDD implementation,
 
 ---
 
-## Why This Approach Works
-
-There are other AI coding frameworks out there. I tried them. They add complexity — dozens of agents, elaborate scaffolding, thousands of lines of instruction files — but the output doesn't improve proportionally. More machinery burns more tokens, increases latency, and creates more failure modes. Complexity is not a feature.
-
-**Pilot Shell optimizes for output quality, not system complexity.** The rules are minimal and focused. There's no big learning curve, no project scaffolding to set up, no state files to manage. You install it in any existing project — no matter how complex — run `pilot`, then `/sync` to learn your codebase, and the quality guardrails are just there — hooks, TDD, type checking, formatting — enforced automatically on every edit, in every session.
-
-This isn't a vibe coding tool. It's built for developers who ship to production and need code that actually works. Every rule in the system comes from daily professional use: real bugs caught, real regressions prevented, real sessions where the AI cut corners and the hooks stopped it. The rules are continuously refined based on what measurably improves output.
-
-**The result: you can actually walk away.** Start a `/spec` task, approve the plan, then go grab a coffee. When you come back, the work is done — tested, verified, formatted, and ready to ship. Hooks preserve state across compaction cycles, persistent memory carries context between sessions, quality hooks catch every mistake along the way, and verifier agents review the code before marking it complete. No babysitting required.
-
-The system stays fast because it stays simple. Quick mode is direct execution with zero overhead — no sub-agents, no plan files, no directory scaffolding. You describe the task and it gets done. `/spec` adds structure only when you need it: plan verification, TDD enforcement, independent code review, automated quality checks. Both modes share the same quality hooks. Both modes benefit from persistent memory and hooks that preserve state across compaction.
-
----
-
 ## Getting Started
 
 ### Prerequisites
@@ -231,21 +219,6 @@ Discuss → Plan → Approve → Implement → Verify → Done
 
 </details>
 
-### Smart Model Routing
-
-Pilot Shell uses the right model for each phase — Opus where reasoning quality matters most, Sonnet where speed and cost matter:
-
-| Phase | Default | Why |
-| --------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Planning** | Opus | Exploring your codebase, designing architecture, and writing the spec requires deep reasoning. A good plan is the foundation of everything. |
-| **Plan Verification** | Opus | Catching gaps, missing edge cases, and requirement mismatches before implementation saves expensive rework. |
-| **Implementation** | Sonnet | With a solid plan, writing code is straightforward. Sonnet is fast, cost-effective, and produces high-quality code when guided by a clear spec. |
-| **Code Verification** | Opus | Independent code review against the plan requires the same reasoning depth as planning — catching subtle bugs, logic errors, and spec deviations. |
-
-**The insight:** Implementation is the easy part when the plan is good and verification is thorough. Pilot Shell invests reasoning power where it has the highest impact — planning and verification — and uses fast execution where a clear spec makes quality predictable.
-
-**Configurable:** All model assignments are configurable per-component via the Pilot Shell Console settings. Choose between Sonnet 4.6 and Opus 4.6 for the main session, each command, and sub-agents. A global "Extended Context (1M)" toggle enables the 1M token context window across all models simultaneously. **Note:** 1M context models require a Max (20x) or Enterprise subscription — not available to all users.
-
 ### Quick Mode
 
 Just chat. No plan file, no approval gate. All quality hooks and TDD enforcement still apply. Best for small tasks, exploration, and quick questions.
@@ -285,12 +258,12 @@ The `pilot` binary (`~/.pilot/bin/pilot`) manages sessions, worktrees, licensing
 <details>
 <summary><b>Session & Context</b></summary>
 
-| Command | Purpose |
-| ------------------------------------- | -------------------------------------------------------------------- |
+| Command | Purpose |
+| ------------------------------------- | -------------------------------------------------------------------------- |
 | `pilot` | Start Claude with Pilot Shell enhancements, auto-update, and license check |
-| `pilot run [args...]` | Same as above, with optional flags (e.g., `--skip-update-check`) |
-| `pilot check-context --json` | Get current context usage percentage |
-| `pilot register-plan <path> <status>` | Associate a plan file with the current session |
+| `pilot run [args...]` | Same as above, with optional flags (e.g., `--skip-update-check`) |
+| `pilot check-context --json` | Get current context usage percentage |
+| `pilot register-plan <path> <status>` | Associate a plan file with the current session |
 | `pilot sessions [--json]` | Show count of active Pilot Shell sessions |
 
 </details>
@@ -353,7 +326,7 @@ Add your own MCP servers in `.mcp.json`. Run `/sync` after adding servers to gen
 
 | Hook | Type | What it does |
 | ------------------------- | -------- | ---------------------------------------------------------------------- |
-| Memory loader | Blocking | Loads persistent context from Pilot Shell Console memory |
+| Memory loader | Blocking | Loads persistent context from Pilot Shell Console memory |
 | `post_compact_restore.py` | Blocking | After auto-compaction: re-injects active plan, task state, and context |
 | Session tracker | Async | Initializes user message tracking for the session |
 
@@ -376,8 +349,8 @@ After **every single file edit**, these hooks fire:
 
 #### PreCompact (before auto-compaction)
 
-| Hook | Type | What it does |
-| ---------------- | -------- | -------------------------------------------------------------------------------------------------------- |
+| Hook | Type | What it does |
+| ---------------- | -------- | -------------------------------------------------------------------------------------------------------------- |
 | `pre_compact.py` | Blocking | Captures Pilot Shell state (active plan, task list, key context) to persistent memory before compaction fires. |
 
 #### Stop (when Claude tries to finish)
@@ -389,8 +362,8 @@ After **every single file edit**, these hooks fire:
 
 #### SessionEnd (when the session closes)
 
-| Hook | Type | What it does |
-| ---------------- | -------- | -------------------------------------------------------------------------------------------------------- |
+| Hook | Type | What it does |
+| ---------------- | -------- | -------------------------------------------------------------------------------------------------------------- |
 | `session_end.py` | Blocking | Stops the worker daemon when no other Pilot Shell sessions are active. Sends real-time dashboard notification. |
 
 ### Context Preservation
@@ -404,6 +377,21 @@ Pilot Shell preserves context automatically across compaction boundaries:
 
 **Effective context display:** Claude Code reserves ~16.5% of the context window as a compaction buffer, triggering auto-compaction at ~83.5% raw usage. Pilot Shell rescales this to an **effective 0–100% range** so the status bar fills naturally to 100% right before compaction fires. A `` buffer indicator at the end of the bar shows the reserved zone. The context monitor warns at ~80% effective (informational) and ~90%+ effective (caution) — no confusing raw percentages.
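
Reviewer note: the rescaling described in the paragraph above can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: it assumes a plain linear mapping in which the ~83.5% raw threshold corresponds to 100% effective, and the names `effective_percent`, `warning_level`, and `COMPACTION_THRESHOLD_RAW` are hypothetical, not taken from the Pilot Shell source.

```python
# Hypothetical sketch: names and constants are assumptions, not Pilot Shell's code.
COMPACTION_THRESHOLD_RAW = 83.5  # raw % at which auto-compaction is said to fire


def effective_percent(raw_percent: float) -> float:
    """Linearly rescale raw context usage so the raw threshold maps to 100%."""
    scaled = raw_percent / COMPACTION_THRESHOLD_RAW * 100.0
    # Clamp so the status bar never shows less than 0% or more than 100%.
    return max(0.0, min(100.0, scaled))


def warning_level(effective: float) -> str:
    """Classify effective usage using the ~80% and ~90% levels from the text."""
    if effective >= 90.0:
        return "caution"
    if effective >= 80.0:
        return "info"
    return "ok"


if __name__ == "__main__":
    for raw in (20.0, 70.0, 80.0):
        eff = effective_percent(raw)
        print(f"raw={raw:.1f}% effective={eff:.1f}% level={warning_level(eff)}")
```

Under this assumed mapping, a user who sees 100% on the bar is at roughly 83.5% raw usage, which is why the display lines up with the moment compaction fires.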
 
+### Smart Model Routing
+
+Pilot Shell uses the right model for each phase — Opus where reasoning quality matters most, Sonnet where speed and cost matter:
+
+| Phase | Default | Why |
+| --------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Planning** | Opus | Exploring your codebase, designing architecture, and writing the spec requires deep reasoning. A good plan is the foundation of everything. |
+| **Plan Verification** | Opus | Catching gaps, missing edge cases, and requirement mismatches before implementation saves expensive rework. |
+| **Implementation** | Sonnet | With a solid plan, writing code is straightforward. Sonnet is fast, cost-effective, and produces high-quality code when guided by a clear spec. |
+| **Code Verification** | Opus | Independent code review against the plan requires the same reasoning depth as planning — catching subtle bugs, logic errors, and spec deviations. |
+
+**The insight:** Implementation is the easy part when the plan is good and verification is thorough. Pilot Shell invests reasoning power where it has the highest impact — planning and verification — and uses fast execution where a clear spec makes quality predictable.
+
+**Configurable:** All model assignments are configurable per-component via the Pilot Shell Console settings. Choose between Sonnet 4.6 and Opus 4.6 for the main session, each command, and sub-agents. A global "Extended Context (1M)" toggle enables the 1M token context window across all models simultaneously. **Note:** 1M context models require a Max (20x) or Enterprise subscription — not available to all users.
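
Reviewer note: the routing table and the per-component override described in the added section can be pictured as a small lookup. The sketch below is illustrative only. The phase keys and model labels mirror the table, but `DEFAULT_ROUTING`, `resolve_model`, and the shape of the overrides are assumptions, not the actual Pilot Shell Console settings format.

```python
# Hypothetical sketch: the phases and defaults mirror the README table, but the
# data structure and function are illustrative assumptions.
from typing import Dict, Optional

DEFAULT_ROUTING: Dict[str, str] = {
    "planning": "opus",
    "plan_verification": "opus",
    "implementation": "sonnet",
    "code_verification": "opus",
}


def resolve_model(phase: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the model for a phase, letting a per-component override win."""
    if phase not in DEFAULT_ROUTING:
        raise KeyError(f"unknown phase: {phase}")
    if overrides and phase in overrides:
        return overrides[phase]
    return DEFAULT_ROUTING[phase]


if __name__ == "__main__":
    print(resolve_model("implementation"))
    print(resolve_model("implementation", {"implementation": "opus"}))
```

Keeping the defaults in one table and applying overrides on lookup matches the README's framing: defaults encode the "reasoning where it matters" choice, and users change only the components they care about.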
+
 ### Built-in Rules & Standards
 
 Production-tested best practices loaded into **every session**. These aren't suggestions — they're enforced standards. Coding standards activate conditionally by file type.
@@ -533,11 +521,11 @@ Details and licensing at [pilot-shell.com](https://pilot-shell.com).
 
 Pilot Shell makes external calls **only for licensing**. Here is the complete list:
 
-| When | Where | What is sent |
-| --------------------------------- | ------------------ | ---------------------------------- |
-| License validation (once per 24h) | `api.polar.sh` | License key, organization ID |
-| License activation (once) | `api.polar.sh` | License key, machine fingerprint |
-| Trial start (once) | `pilot-shell.com` | Hashed hardware fingerprint |
+| When | Where | What is sent |
+| --------------------------------- | ----------------- | -------------------------------- |
+| License validation (once per 24h) | `api.polar.sh` | License key, organization ID |
+| License activation (once) | `api.polar.sh` | License key, machine fingerprint |
+| Trial start (once) | `pilot-shell.com` | Hashed hardware fingerprint |
 
 That's it — three calls total, each sent at most once (validation re-checks daily). No OS, no architecture, no Python version, no locale, no analytics, no heartbeats. The validation result is cached locally, and Pilot Shell works fully offline for up to 7 days between checks. Beyond these licensing calls, the only external communication is between Claude Code and Anthropic's API — using your own subscription or API key.
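
Reviewer note: the caching behaviour stated above (re-validate daily, keep working offline for up to 7 days) can be read as a simple decision on the age of the last successful check. The following is a minimal sketch of that reading. The function `license_action`, its return labels, and both constants are hypothetical, and the real client may treat boundaries and failures differently.

```python
# Hypothetical sketch of the caching policy described in the README text; the
# function, labels, and constants are assumptions for illustration only.
from datetime import datetime, timedelta

REVALIDATE_AFTER = timedelta(hours=24)  # "once per 24h"
OFFLINE_GRACE = timedelta(days=7)       # "up to 7 days between checks"


def license_action(last_validated: datetime, now: datetime, online: bool) -> str:
    """Decide what to do given the age of the cached validation result."""
    age = now - last_validated
    if age < REVALIDATE_AFTER:
        return "use_cache"            # fresh enough, no network call
    if online:
        return "revalidate"           # daily re-check against the license API
    if age <= OFFLINE_GRACE:
        return "use_cache_offline"    # stale but still inside the grace window
    return "require_validation"       # grace window exhausted


if __name__ == "__main__":
    checked = datetime(2025, 1, 1, 12, 0)
    print(license_action(checked, checked + timedelta(hours=3), online=True))
    print(license_action(checked, checked + timedelta(days=3), online=False))
    print(license_action(checked, checked + timedelta(days=8), online=False))
```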

docs/site/src/components/DeepDiveSection.tsx

Lines changed: 64 additions & 0 deletions
@@ -13,6 +13,7 @@ import {
   Layers,
   Cpu,
   RefreshCw,
+  Route,
 } from "lucide-react";
 import { useInView } from "@/hooks/use-in-view";
 
@@ -174,6 +175,7 @@ const mcpServers = [
 const DeepDiveSection = () => {
   const [headerRef, headerInView] = useInView<HTMLDivElement>();
   const [hooksRef, hooksInView] = useInView<HTMLDivElement>();
+  const [routingRef, routingInView] = useInView<HTMLDivElement>();
   const [rulesRef, rulesInView] = useInView<HTMLDivElement>();
   const [mcpRef, mcpInView] = useInView<HTMLDivElement>();
 
@@ -276,6 +278,68 @@ const DeepDiveSection = () => {
           </div>
         </div>
 
+        {/* Smart Model Routing */}
+        <div
+          ref={routingRef}
+          className={`mb-16 ${routingInView ? "animate-fade-in-up" : "opacity-0"}`}
+        >
+          <div className="flex items-center gap-3 mb-8">
+            <div className="w-10 h-10 bg-violet-400/10 rounded-xl flex items-center justify-center">
+              <Route className="h-5 w-5 text-violet-400" />
+            </div>
+            <div>
+              <h3 className="text-2xl font-bold text-foreground">
+                Smart Model Routing
+              </h3>
+              <p className="text-sm text-muted-foreground">
+                The right model for each phase — reasoning power where it
+                matters most
+              </p>
+            </div>
+          </div>
+
+          <div className="grid md:grid-cols-2 gap-4 mb-4">
+            <div className="rounded-2xl p-5 border border-violet-400/30 bg-violet-400/5 backdrop-blur-sm">
+              <div className="flex items-center gap-2 mb-3">
+                <span className="text-sm font-mono font-semibold text-violet-400 bg-violet-400/10 px-3 py-1 rounded-lg">
+                  OPUS
+                </span>
+                <span className="text-sm text-muted-foreground">
+                  Planning & Verification
+                </span>
+              </div>
+              <p className="text-xs text-muted-foreground leading-relaxed">
+                Exploring your codebase, designing architecture, catching gaps,
+                and reviewing code against the plan. Deep reasoning prevents
+                expensive rework.
+              </p>
+            </div>
+            <div className="rounded-2xl p-5 border border-primary/30 bg-primary/5 backdrop-blur-sm">
+              <div className="flex items-center gap-2 mb-3">
+                <span className="text-sm font-mono font-semibold text-primary bg-primary/10 px-3 py-1 rounded-lg">
+                  SONNET
+                </span>
+                <span className="text-sm text-muted-foreground">
+                  Implementation
+                </span>
+              </div>
+              <p className="text-xs text-muted-foreground leading-relaxed">
+                With a solid plan, writing code is straightforward. Fast,
+                cost-effective, and produces high-quality code when guided by a
+                clear spec.
+              </p>
+            </div>
+          </div>
+
+          <div className="rounded-2xl p-4 border border-border/30 bg-card/20 backdrop-blur-sm">
+            <p className="text-xs text-muted-foreground text-center">
+              Implementation is the easy part when the plan is good and
+              verification is thorough. All model assignments are configurable
+              per-component via the Pilot Shell Console settings.
+            </p>
+          </div>
+        </div>
+
         {/* Rules System */}
         <div
           ref={rulesRef}
