A tipple is the sorting-and-rail-switching station of a coal mine — this one switches rails for prompts across models.
A model/effort router for Claude Code — delegate a task you can do but that is large and cheap down to a cheaper tier to save tokens, and hand a task beyond your reach up to a stronger tier for quality.
Benchmark · Contributing · Changelog · Security · Privacy · Releases
Part of TheColliery — siblings: CoalMine (quality canaries) · CoalBoard (consensus & debate board).
Caution
Claude Code only. CoalTipple's routing only actuates where an agent can pick a spawned worker's model + effort. Today that is Claude Code. Antigravity does NOT work -- its subagents inherit the parent's model (no per-spawn model parameter, no separate effort knob), so routing cannot actuate there. Other platforms (Codex, Cursor, ...) are under monthly review.
A tipple is the sorting-and-rail-switching station of a coal mine. This tool switches rails for prompts across models (alongside CoalMine).
You are main. CoalTipple decides, per task, whether to:
| Direction | When | Why |
|---|---|---|
| delegate-DOWN | Task is mechanical and large | A cheaper tier does the bulk → saves tokens |
| escalate-UP | Task is beyond the current tier's competence | A stronger tier does it right → protects quality |
| stay (route OFF) | Task is small / no valid ranking | Bypasses routing to prevent overhead |
Routing logic lives inside SKILL.md — the model reads and routes natively. No background daemon.
- Claude Code (validated live across the 2.1.x line): Built Claude-Code-first and run end-to-end across all model tiers (Haiku, Sonnet, Opus). Routing degrades safe on any CC version — an unfamiliar model classifies strong, a failed spawn falls, and the platform resolves each alias to its current best model at spawn-time (the ranking is the alias floor + pins — nothing to enumerate).
- Routing actuates on Claude Code only: CT needs a platform where an agent can pick a spawned worker's model + effort. CC's Agent/Task tool takes a
modelparam -- that is the requirement. - Subagent-capable != qualifies: a platform can spawn workers yet give the agent no model choice (e.g. Antigravity, where the worker inherits the parent's model). There CT does not cleanly self-degrade -- a weak main hallucinates a delegate-down it cannot perform -- so CT is gated to CC. Other platforms (Cursor, Codex) are under monthly review.
claude plugin marketplace add TheColliery/CoalTipple
claude plugin install coaltipple@coaltipple
# Restart Claude Code to load the /coaltipple commands (stats | off | memory)Optional per-project config override: <project>/.claude/.coaltipple.json.
node scripts/verify.mjs # validates config, schemas, plugin files
node scripts/test.mjs # runs zero-dependency unit testsRouting adjusts two independent knobs (always raise effort before tier):
| Knob | Axis | Scale |
|---|---|---|
| TIER | correctness — which model | Coarse (low < mid < heavy < reasoning) |
| EFFORT | size — output volume / iteration | Fine-grained (low → max) |
- TIER tracks difficulty/sensitivity; EFFORT tracks output size. A short cryptographic function wants a high tier but low effort. A large mechanical template wants a cheap tier but high effort.
qualityBar (0–100, default 60) defines the acceptable quality threshold:
- The task's grade picks the starting tier (cheapest possible).
- The worker runs, and output is verified against the task contract.
- Passes → done. Fails → climb one rung. Out of attempts/fails hard → jump to top tier.
- Tune
qualityBarby risk: raise (~85) for critical logic; lower (~45) for quick drafts.
- No Down-Delegation for Sensitive Tasks: Cryptography, auth, payments, and security paths are forced to the
heavytier based on keywords. They never fall to cheap tiers, even under quota limits. - Overhead Floor: Tasks below
delegateMinLines(default 120) stay on main to avoid spawn overhead. - Prose Preservation: User-facing writing and translation stay on main to protect voice.
- Verify, Do Not Eyeball: Output merges require passing objective checks (
qaOnMerge: strict/standard/off). - Workers are Leaves: By policy a worker is given a bounded task contract and returns to main rather than spawning its own workers — routing stays depth-0 whether or not the platform allows nesting.
- Isolation: Uses git worktree-isolation (or local
.claude/.coaltipple/proposed/sandbox withstate.jsonjournaling) to protect files from mid-run failures. - Rate Limits: Automatically falls back to the next available tier on limit-hits, but never below a sensitive task's minimum tier.
- Side Effects: Commands with external side-effects (e.g. bash mutations, commits) are never delegated.
The Lock guarantees CoalTipple is only ever in one of two states: routing correctly or routing off.
- Always Buildable: The ranking is the alias floor
haiku < sonnet < opus(→low/mid/heavy, reasoning =opus) overlaid with yourmodelTierspins — a constant, no enumeration. Unknown models default toheavy. - Validity-Gated: Checks ranking schema, hash, and completeness before writing.
- Fails Safe: Bypasses routing if the model ranking is broken.
- Spawn-Time Resolution: The platform resolves each alias to its current best model at spawn-time, and a failed spawn falls to the next available tier — so the floor never goes stale and there is no refresh cadence.
Workers start context-fresh. A memory anchor file gives a fresh worker project context.
- If
contextFilesis empty, CoalTipple auto-loadsCLAUDE.md/AGENTS.md. - Offers once to set up an anchor on new projects. Manage manually via
/coaltipple memory.
Ships zero-config with optimal defaults in .coaltipple.json. Precedence: project override → global config → schema default.
Key settings (see scripts/lib/config-schema.mjs for the full SSoT schema):
| Key | Type | Default | What it does |
|---|---|---|---|
enableRouting |
Boolean | true |
Master routing switch |
mode |
Enum | auto |
Direction: delegation (down) | escalation (up) | auto | off |
qualityBar |
Integer 0–100 | 60 |
Quality threshold for the staircase |
maxTotalAttempts |
Integer 1–5 | 2 |
Staircase attempt budget before jumping to top tier |
delegateMinLines |
Integer | 120 |
Minimum lines threshold for down-delegation |
qaOnMerge |
Enum | standard |
Merge verification rigor (strict | standard | off) |
modelTiers |
Object | unset | Optional pins overlaying the alias floor ({ tier: "model" }) — the one human override for a model the agent cannot see |
node scripts/configure.mjs --list # show merged config
node scripts/configure.mjs --qualityBar 85 # update global qualityBar
node scripts/configure.mjs --project --mode delegation # write project override
node scripts/configure.mjs --help # view all schema-driven flagsWe evaluate the final output correctness after the main escalates one rung, and the token savings of delegating mechanical bulk down — each dated, on small honest samples, never inlined here so a copied number cannot drift.
- Output quality (per-tier matrix): on objectively-verifiable tasks every tier (Haiku → Sonnet → Opus) delivers correct → delegate-down preserves quality (the cost saving is free); on high-precision sensitive work the mid tier reproducibly errs (Sonnet collapses the legal "to the extent" carve-out 2/3 of the time, Opus holds) → escalate-up / never-delegate-sensitive-down is data-justified (measured on v1.0.20, 2026-06-22; small samples + the honest method caveats — incl. per-cell judge-variance — are in the record).
- Routing savings: delegating a big mechanical task down from Opus to a cheaper tier ran ~70–75% cheaper — holding only above the
delegateMinLinesfloor and never on sensitive work (measured 2026-06-19, version-sensitive rates).
Full harnesses, per-task scoring, and the dated figures live in the series umbrella: TheColliery/.github/benchmarks/CoalTipple (output-quality RESULTS · routing savings).
CoalTipple shares its engineering doctrine with CoalMine and CoalBoard: Phoenix-13 hooks (zero-dependency, no network, fail-silent, no child processes, deterministic), single-source-of-truth config schemas, and a strict no-overkill discipline. Install one and it stands alone; install all and they compose without conflict.
MIT License. See LICENSE for details.