How Cortex grew from a routing table to a cognitive layer — with all the wrong turns, crashes, and course corrections along the way.
This isn't a polished product history. It's an honest changelog of building infrastructure while using it daily. Every decision has context. Every pivot had a reason.
When: Early February 2026
Claude Code ships with a plugin ecosystem. I installed liberally: 95+ plugins active, 14 MCP servers, dozens of hooks. Docker helpers, Kubernetes managers, Terraform — none relevant to a Greek lawyer doing legal work.
One plugin alone (claude-octopus, a 30-skill persona system) injected ~4,000 tokens per session through CLAUDE.md. It required Codex and Gemini CLI binaries that weren't even installed on my machine.
The question that started everything: I have all these tools. How does Claude know which one to use?
It didn't. Claude guessed based on description competition. Sometimes it picked the right MCP server. Often it didn't. And every session started from zero — no memory of what worked yesterday.
Lesson: Existence ≠ function. Having 95 plugins installed is not the same as having a system.
When: Pre-February 2026 (on Claude Desktop)
The first Cortex wasn't built for Claude Code at all. It was a meta-orchestrator for Cowork (Claude Desktop's collaborative surface):
- Router: 10-step intent scoring pipeline. Trigger matching, anti-trigger veto, confidence tiers (autoload ≥0.8, suggest 0.6–0.79, silent <0.6), cooldown enforcement, shadow registry for uninstalled plugins.
- Memory: Real MCP server (Node.js, sql.js SQLite, 11 tools).
- Analyst: Progressive-depth analysis engine.
The registry existed in three forms: registry.md (human-editable), registry.json (compiled, SHA-256 hashed), digest.md (~30 lines for cheap context injection). Scoring: trigger×0.4 + domain×0.3 + entity×0.2 + context×0.1.
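As a minimal sketch (the weights and confidence tiers come from the description above; the function names are hypothetical), the scoring and tiering step could look like:

```python
# Hypothetical reconstruction of the scoring step; component scores are
# assumed to be normalized to [0, 1] before weighting.
WEIGHTS = {"trigger": 0.4, "domain": 0.3, "entity": 0.2, "context": 0.1}

def route_score(components: dict) -> float:
    """Weighted sum: trigger*0.4 + domain*0.3 + entity*0.2 + context*0.1."""
    return sum(WEIGHTS[k] * components.get(k, 0.0) for k in WEIGHTS)

def tier(score: float) -> str:
    """Map a confidence score to the autoload / suggest / silent tiers."""
    if score >= 0.8:
        return "autoload"
    if score >= 0.6:
        return "suggest"
    return "silent"

# 0.4*1.0 + 0.3*0.9 + 0.2*0.5 + 0.1*0.2 = 0.79 -> just under the autoload gate
print(tier(route_score({"trigger": 1.0, "domain": 0.9, "entity": 0.5, "context": 0.2})))  # suggest
```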
All dependencies (@modelcontextprotocol/sdk, sql.js, zod) were platform-agnostic. The core logic was portable. What wasn't: the plugin manifest format, skill activation mechanism, hook system, and command syntax — all Cowork-specific.
When: February 13–18, 2026
Feb 13: Searched for any existing routing infrastructure in Claude Code. Nothing. Blank slate — no registries, no routing, no MCP servers configured.
Feb 17: Reverse-engineered the entire Cowork Cortex codebase (26 files). Documented the architecture for porting. Designed a 4-Layer Memory Stack, placing Cortex as Layer 4: the intelligent router that sends queries to the correct memory layer.
Feb 18: First install. Cortex v1.0.0 dropped into ~/.claude/plugins/cache/local/cortex/1.0.0/. Three skills, seven commands, one MCP server.
First gotcha: The plugin cache existed on disk, but Cortex was NOT registered in installed_plugins.json. Cache ≠ active. Classic existence-check failure.
Same day: MCP server load test passed. The SQLite-backed memory server worked fine over stdio. But the routing layer — the whole reason Cortex existed — didn't activate.
When: February 20, 2026
This is the most significant architectural decision in Cortex's evolution.
Three options evaluated:
- Pure MCP — Clean, but requires explicit tool calls. No automatic routing.
- Pure hooks — Automatic, but hooks can't see conversation context. Limited error handling.
- Hybrid — `UserPromptSubmit` hook for automatic lightweight routing + MCP server for interactive queries.
The hybrid won on paper. But during implementation planning, three Cowork features were permanently scrapped:
- MCP bridge scanner — Too complex for the value delivered.
- LLM calls in the routing hot path — Latency killer. Unacceptable in a hook that fires on every user message.
- Context budget management — Unnecessary with the passive approach.
What was built instead: 4 passive query tools. No active routing. No interception.
- `cortex_digest` → pre-computed 596-token capability summary
- `cortex_search` → keyword matching against trigger arrays (~5ms)
- `cortex_lookup` → full entry details by ID
- `cortex_status` → registry health check
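A passive tool like cortex_search reduces to a keyword intersection. The sketch below is an illustrative reconstruction, not the actual server code; the registry entries are invented:

```python
# Illustrative sketch of passive trigger matching (not the real server code).
# Each registry entry is assumed to carry an id and a trigger keyword array.
REGISTRY = [
    {"id": "legal.contracts", "triggers": ["contract", "clause", "agreement"]},
    {"id": "devops.docker", "triggers": ["docker", "container", "image"]},
]

def cortex_search(query: str) -> list[str]:
    """Return entry ids whose trigger keywords appear in the query."""
    words = set(query.lower().split())
    return [e["id"] for e in REGISTRY if words & set(e["triggers"])]

print(cortex_search("draft a contract clause"))  # ['legal.contracts']
```

No interception, no scoring: the model calls the tool when it wants a capability lookup, which is the whole passive bet.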
The reasoning: "Defer active routing until data proves Claude's native matching insufficient." The bet was that Claude Code's built-in description competition was good enough, and active routing would only be added if metrics showed failure.
That data never materialized. Passive routing worked. The active routing phase was never built.
What was abandoned from the original design:
| Feature | Why Dropped |
|---|---|
| Active routing via UserPromptSubmit hook | Claude's native routing proved sufficient |
| LLM calls in routing hot path | Latency — scrapped permanently |
| MCP bridge scanner | Too complex for the value |
| Context budget management | Unnecessary with passive approach |
| `cortex_record` / `cortex_outcome` / `cortex_learn` (DB write-back) | Deferred, never needed |
| Cooldown system / polite interruption protocol | Deferred |
| Shadow `recommend_install` flow | Not applicable in Claude Code |
When: February 20, 2026
The registry had never been compiled. The database was empty. Despite MEMORY.md claiming 163 capabilities, the actual count was 39.
What was wrong:
- `compile.js` existed but had never been run
- `registry.json` was empty/stale
- MEMORY.md had a fictional count (163)
- Legal entries were Cowork-era corporate templates (NDA, GDPR) — wrong for Greek civil law
Fix: Registry expanded from 39 → 53 entries (43 installed + 10 shadow). Legal entries flagged for replacement with Greek law entries.
Same day, worse discovery: Two Cortex MCP servers were running simultaneously:
- The old plugin-managed server at `~/.claude/plugins/cache/local/cortex/1.0.0/`
- The new standalone server at `~/.claude/cortex/server/index.js`
`cortex.memory.db` contained: 1 session, 43 capability metrics, 23 preferences, 4 anti-patterns, 0 routing decisions. The router logic existed only as pseudocode specs.
Lesson: Audit before building. The v0.6 → v1.0 roadmap was created: fix what's broken (v0.7), close the loops (v0.8), add domain intelligence (v0.9), achieve closed-loop learning (v1.0).
When: February 20–24, 2026
The learning loop had a PostToolUse observer that logged tool calls to `observations.jsonl`. It accumulated thousands of observations. But the loop was never closed — data flowed in and piled up, and nothing evolved from it.
Feb 24 audit found the pipeline was 50% broken:
- Observer config set to `enabled: false`
- Thousands of observations collected, never analyzed
- All instinct directories empty
- 41.7% duplicate entries — the observe hook was registered on BOTH PreToolUse and PostToolUse. PreToolUse events have no output. Pure waste.
Root cause: settings.json had observe.sh on both hook events.
Fix: PreToolUse hook removed. 270 tool_start ghost entries cleaned. Each tool now fires exactly one observation.
Feb 25: Second bug in observe.sh. Line 98 wrote None for the input field on PostToolUse even though the data was in the hook JSON. Fix: input captured properly, enabling analyze.py's file path extraction.
When: February 26, 2026 (B1/C1 phase of the evolution master plan)
Deep audit of analyze.py (495 lines, 14 functions) revealed why patterns never promoted:
- Error confidence capped at 0.8 (line 214: `min(0.3 + retries * 0.03, 0.8)`). Promotion gate was 0.9. Mathematical impossibility.
- File pattern confidence flat at 0.4. Also never promotes.
- Clustering: designed but zero code after `build_instincts()` at line 246.
- `auto_evolve: false` — the curator was explicitly disabled.
- 4 CLI commands documented in SKILL.md, 0 built.
The C1 fix:
- Config loading wired up properly
- Confidence caps raised (error: 0.8 → 0.95, file: 0.4 → dynamic)
- Jaccard + union-find clustering implemented
- `promote_to_rules` / `decay_stale_instincts` pipeline built
- `auto_evolve: true` activated
- 16/16 tests passing
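The Jaccard + union-find clustering step can be sketched as follows; the 0.5 similarity threshold and the token-set inputs are assumptions, not values from the real analyze.py:

```python
# Sketch of the C1 clustering step: Jaccard similarity over token sets,
# union-find to merge similar observations into clusters.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(items: list[set], threshold: float = 0.5) -> list[int]:
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if jaccard(items[i], items[j]) >= threshold:
                parent[find(i)] = find(j)  # union the two clusters
    return [find(i) for i in range(len(items))]

labels = cluster([{"read", "file"}, {"read", "file", "edit"}, {"bash", "git"}])
# first two overlap heavily (Jaccard 2/3) and merge; the third stands alone
```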
First auto-promotion: Tool chain patterns (Bash→Bash→Bash seen 120x, Read→Read→Read seen 38x) promoted to ~/.claude/rules/generated-rules.md — automatically loaded every session.
When: February–March 2026
The memory plugin (claude-mem) shipped with ChromaDB for vector search. For weeks, everyone assumed it worked.
The discovery: ChromaDB stored thousands of embeddings at 384 dimensions. The MCP search interface didn't use them. Every search went through FTS5 keyword matching only. The vector layer was structurally present and functionally dead.
The decision (D4.1 in the evolution coordination doc):
Three options:
- Fix ChromaDB integration
- Replace with sqlite-vec
- Migrate to external service
sqlite-vec won: same SQLite file (no separate process), native FTS5 hybrid, 1024-dim BGE-M3 embeddings for proper multilingual support (the 384-dim ChromaDB vectors were too small for Greek text anyway — dimension upgrade would require re-embedding everything regardless).
ChromaDB: thousands of embeddings deleted. Archived at ~/.claude-mem/archive/.
sqlite-vec: hundreds of vectors at 1024-dim, over a thousand semantic links via KNN (cosine > 0.75), temporal decay per collection (30d for memory, 90d for cases, 365d for templates, null for evergreen).
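How those semantic links might be derived can be sketched with a brute-force cosine scan. The 0.75 threshold is from the text; the toy 3-dim vectors stand in for the real 1024-dim BGE-M3 embeddings, and sqlite-vec's KNN replaces the O(n²) loop in production:

```python
import math

# Sketch: link any two stored vectors whose cosine similarity exceeds 0.75.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_links(vectors: dict, threshold: float = 0.75):
    ids = list(vectors)
    return [
        (ids[i], ids[j])
        for i in range(len(ids))
        for j in range(i + 1, len(ids))
        if cosine(vectors[ids[i]], vectors[ids[j]]) > threshold
    ]

links = semantic_links({"obs1": [1.0, 0.1, 0.0],
                        "obs2": [0.9, 0.2, 0.1],
                        "obs3": [0.0, 0.0, 1.0]})
print(links)  # [('obs1', 'obs2')]
```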
When: February–March 2026
Before building a custom RAG pipeline, 20+ existing MCP servers were evaluated: GNO, mcp-local-rag, gnosis-mcp, RAGLite, kb-mcp-server, knowledge-mcp (LightRAG), and others.
None satisfied all constraints: DOCX support + Greek language + hybrid search + local-only + MCP protocol.
Fatal gaps:
- gnosis-mcp: no DOCX (eliminates hundreds of legal templates)
- GNO: requires Bun runtime
- RAGLite: requires Pandoc
- All cloud options: disqualified (legal documents stay local, non-negotiable)
The embedding model journey:
nomic-embed-text-v2-moe (768-dim) → demoted for weak Greek → BGE-M3 (1024-dim, 8K context, best multilingual) became the winner. The Greek-specific stsb-xlm-r-greek-transfer was considered but killed by 512-token context limit (too short for legal documents).
Zone-based chunking insight (late addition): 60% of legal template tokens are waste — placeholder dots, identical headers, near-duplicate forms. Hundreds of power-of-attorney forms are 70–85% identical. Solution: Strip+Tag+Embed — segment into HEADER/PARTIES/FACTS/LAW/REQUEST/BOILERPLATE zones, then embed only the semantic zones, each with a metadata prefix. Raw corpus of millions of tokens → ~5M tokens after cleaning.
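A sketch of the Strip+Tag+Embed idea, with purely illustrative zone labels already attached to the input (real zone detection is the hard part and is not shown):

```python
# Sketch: keep only semantic zones, normalize whitespace, and prefix each
# surviving chunk with document/zone metadata before embedding.
SEMANTIC_ZONES = {"PARTIES", "FACTS", "LAW", "REQUEST"}

def prepare_for_embedding(segments: list[tuple[str, str]], doc_id: str) -> list[str]:
    """segments: (zone, text) pairs; returns metadata-prefixed chunks."""
    chunks = []
    for zone, text in segments:
        if zone not in SEMANTIC_ZONES:      # HEADER / BOILERPLATE are stripped
            continue
        cleaned = " ".join(text.split())    # normalize whitespace
        chunks.append(f"[{doc_id}|{zone}] {cleaned}")
    return chunks

chunks = prepare_for_embedding(
    [("HEADER", "POWER OF ATTORNEY"),
     ("PARTIES", "Grantor: Alpha  Grantee: Beta"),
     ("BOILERPLATE", "This document is executed in duplicate.")],
    doc_id="poa-042",
)
print(chunks)  # ['[poa-042|PARTIES] Grantor: Alpha Grantee: Beta']
```

The metadata prefix keeps the zone recoverable at retrieval time even after the boilerplate that identified it is gone.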
The Ollama crash (still unresolved): Everything depended on Ollama for local embeddings. Ollama 0.17.x introduced MLX Metal initialization that crashes unconditionally on M1 Air — SIGABRT before serving. CPU fallback also fails (MLX init is unconditional). Current: 49% vectorized (roughly half the chunks). The remaining 51% blocked on a broken binary.
When: March 2, 2026, 2:29 AM
macOS OOM killed 4 concurrent Claude Code sessions. 4 × ~1.5GB = ~6GB on an 8GB M1 Air.
What the crash revealed about the system:
The healthcheck reported 51/51 PASS while three critical systems were non-functional:
- claude-mem MCP search: returning empty — `chromaStrategy=null`, no FTS5 fallback
- Ollama: SIGABRT on launch (MLX crash)
- PreCompact hook: reading a `conversation` key that doesn't exist (actual: `transcript_path`). Had never worked. Passed healthcheck.
"You have existence checks, not functional tests." — SRE expert, post-crash analysis
Data casualty: A Google Drive restore was running mid-crash. DCIM photos recovered (4,314 files confirmed). Munich trip photos: 0 files, unrecoverable unless restored from Google's trash before ~April 1 purge.
The non-destructive rules codified from this crash:
- Never delete without verified recovery path
- Verify identity by CONTENT (hash), not just name/path
- Context matters — folder structure carries meaning
- Override requires explicit per-action user confirmation
- A plan is a hypothesis, not authority — re-verify at execution time
The context rot discovery (same session cluster): Softmax attention dilution at high token counts = structural amnesia. The fix: re-inject critical rules near the recency position every N turns (300 tokens, forced eval snippet, exploiting recency bias against RoPE geometric decay). Cost: zero engineering.
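The countermeasure is simple enough to sketch. The turn interval and rules text below are illustrative, grounded only in the non-destructive rules listed above:

```python
# Sketch of the context-rot countermeasure: every N turns, append the critical
# rules near the end of the message list (the recency position).
CRITICAL_RULES = ("RULES: never delete without a verified recovery path; "
                  "verify identity by content hash, not name.")

def build_prompt(history: list[str], turn: int, n: int = 5) -> list[str]:
    messages = list(history)
    if turn % n == 0:                     # re-inject on every Nth turn
        messages.append(CRITICAL_RULES)   # lands at the recency position
    return messages

print(build_prompt(["user: hi", "assistant: hello"], turn=5)[-1][:6])  # RULES:
```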
When: February 26, 2026
After 82+ sessions of organic growth, the system had accumulated 5 memory layers, 12 MCP servers, 6 hooks, 36 plugins, RAG at 49%, broken plugin hooks, and `auto_evolve: false`.
The evolution master plan (starry-chasing-flask.md): ~38 hours across 12–15 sessions.
Key planning pivot: The original approach was parallel plan-then-implement cycles. Changed to strict sequential: all 5 B-phase planning sessions complete before any C-phase implementation begins. Reason: early implementation was making decisions that conflicted with later planning insights.
What each phase added:
| Phase | Focus | Key Deliverables |
|---|---|---|
| B1/C1 | Learning Loop | Fixed confidence caps, added clustering, promotion pipeline, temporal decay, auto_evolve: true |
| B2/C2 | Session Lifecycle | Stop hook, enhanced PreCompact, /refresh-streams skill, rotation automation |
| B3/C3 | Personas | Evaluated 29 octopus personas (kept all — zero passive cost), created Greek legal persona, /persona command |
| B4/C4 | Memory Architecture | ChromaDB → sqlite-vec, observation_links table, temporal decay per collection, semantic KNN linking |
| B5/C5 | Cross-Layer Search | Unified Search MCP: Promise.allSettled + 2s timeout/layer, RRF k=60, Jaccard MMR λ=0.7, 5-min cache |
| C5.1 | Audit & Fixes | Injection detection (5 regex patterns), XML memory wrapping, memory result trust boundaries |
All phases complete as of March 2026.
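The C5 fusion step (RRF with k=60) can be illustrated in a few lines; the layer names and document ids are invented, and the MMR diversity pass is omitted:

```python
# Illustrative Reciprocal Rank Fusion over per-layer ranked result lists, k=60.
def rrf_fuse(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in ranked_lists.values():
        for rank, doc in enumerate(results):
            # standard RRF: each appearance contributes 1 / (k + rank), 1-based
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse({
    "claude-mem": ["obs-12", "obs-7", "obs-3"],
    "obsidian":   ["note-a", "obs-7"],
    "cortex":     ["cap-legal", "obs-12"],
})
print(fused[0])  # obs-12: two hits, one at rank 0, beats obs-7's two rank-1 hits
```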
When: February 25, 2026
OpenClaw: 100K+ stars, 675K LOC TypeScript, 23 lifecycle hooks, 52 skills. Analyzed for stealable patterns.
Adopted immediately:
- Injection detection: 5 regexes blocking "ignore instructions", XML tags, tool invocation patterns in memory retrieval paths
- Memory XML wrapping: `<relevant-memories>` tag + "untrusted historical data" label + HTML entity escaping
- Temporal decay formula: `score *= exp(-ln(2)/30 * ageInDays)` — adopted verbatim (evergreen files exempt)
- detect-secrets baseline for the `~/.claude` git repo
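The adopted formula gives memories a 30-day half-life. A worked sketch (the function name and evergreen flag are illustrative; the per-collection half-lives from C4 drop in via the parameter):

```python
import math

# The adopted decay formula: score *= exp(-ln(2)/30 * ageInDays).
# ln(2)/half_life makes the half-life explicit; evergreen entries skip decay.
def decay(score: float, age_in_days: float, evergreen: bool = False,
          half_life_days: float = 30.0) -> float:
    if evergreen:
        return score
    return score * math.exp(-math.log(2) / half_life_days * age_in_days)

print(round(decay(1.0, 30), 3))         # 0.5  (one half-life)
print(round(decay(1.0, 60), 3))         # 0.25 (two half-lives)
print(decay(1.0, 365, evergreen=True))  # 1.0
```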
Deferred:
- Progressive disclosure (SKILL.md < 500 lines + `references/` on-demand)
- Embedding cache (SQLite LRU)
Gap comparison at time of analysis vs. now:
| Feature | Feb 25 State | Current State |
|---|---|---|
| Hooks | 6 | 8 |
| Temporal decay | None | Implemented (C4) |
| Injection detection | None | 5 patterns (C5.1) |
| MMR diversity | None | Jaccard λ=0.7 (C5) |
| XML memory wrapping | None | Implemented (C5.1) |
| Progressive disclosure | Full skill load | Still full load |
| Auto-capture | Manual | Partially automated (PreCompact) |
When: February 25, 2026 (Chapter 1), March 4, 2026 (Chapters 2–3)
The brain metaphor wasn't part of the original design. It was discovered after the system was already built.
Reading Buschman et al. (2025) and Artem Kirsanov's explainer ("Why the Brain Doesn't Start From Scratch"), I realized: the system I'd built for practical reasons — routing, suppression of irrelevant context, pattern promotion, session lifecycle — mapped almost perfectly to known brain architecture.
The recognition was convergent, not deliberate. Same constraints (limited working memory, need for context routing, value of not starting from scratch) → same solutions.
Chapter 1 (Composition) was already built: Cortex routing = thalamic relay, domain rules = gain control suppression, agent delegation = dynamic routing ("railroad switch" in the Buschman paper).
Chapter 2 (Consolidation) was the identified gap. An archive quality audit proved it empirically:
- Feb 24–26 done blocks (with consolidation): narrative + decisions + reasoning, 8–9/10
- Mar 1–3 done blocks (crisis mode, no ritual): pure checklists, 4–5/10
The /bye skill was designed to fill this gap: hippocampal replay → session summary, episodic→semantic transfer → claude-mem persistence, pruning → session rotation.
Chapter 3 (Activation) existed already via Synergatis + SessionStart hooks, but wasn't framed as such until the three-chapter model crystallized.
Looking across 100+ sessions of building this system, five patterns recur:
1. Existence checks masquerading as functional tests. ChromaDB "working" but unused. Healthcheck passing while 3 systems broken. Observer "running" but never evolving. The system consistently accumulated the appearance of functionality before the reality of it.
2. Organic growth followed by consolidation crisis. 82+ sessions of additive work created extraordinary breadth and significant debt. The evolution master plan was the consolidation response. The system had to be planned after being built.
3. Plans survived crashes; execution didn't. Every crashed session had a plan file on disk. The recovery protocol: resume = read plan + execute, no re-planning. Persistent state on disk is the only reliable thing.
4. The AuDHD hyperfocus trap. Session overreach was consistent: 4 crashed sessions each had 3–5 major tasks running concurrently. The ADHD coach framing: "The overreach is hyperfocus-within-scope (gravity, not escape). Fix the plumbing before adding more fixtures."
5. Convergent evolution with neuroscience. The brain metaphor wasn't imposed — it was discovered. The same constraints produce the same solutions whether you're a primate prefrontal cortex or a CLI tool managing 14 MCP servers.
Living system (daily production):
- 60+ registry entries across 12 namespaces
- 8 hooks across full Claude Code lifecycle
- Learning loop: hundreds of observations, dozens of instincts, auto-promotion active
- 5-layer memory: MEMORY.md → claude-mem (hundreds of observations + hundreds of vectors + over a thousand semantic links) → Obsidian (dozens of notes) → Cortex registry → Unified Search (RRF fusion)
- RAG: thousands of docs and chunks, partially vectorized (blocked on Ollama crash)
- 12 Greek law tools in production
- Dashboard: 16 sections, dual-tier polling, standalone PWA
This repo (v0.1.0):
- Working `cortex init` CLI (end-to-end initialization)
- 4-tool MCP server (raw JSON-RPC, zero dependencies)
- 3 installable hooks (session-start, observe, prompt-router)
- Starter capability registry (5 entries — your production system will grow from here)
- Architecture documentation + config schema + domain pack format spec
Next: Extract the learning loop (`analyze.py` → Node.js). Generalize hooks to use `$CORTEX_HOME`. Add tests. Collapse 9 package stubs to ~4 active packages.
Every wrong turn taught something. The crashes were the best audits. The system that exists today is the sum of everything that broke.