Deep technical documentation for MAP (Modular Agentic Planner) implementation.
Research Foundation: Nature Communications research (2025) — 74% improvement in planning tasks
- Architecture Overview
- Agent Specifications
- MCP Integration
- Customization Guide
- Template Maintenance
- Context Engineering
MAP Framework implements cognitive architecture inspired by prefrontal cortex functions, orchestrating 11 specialized agents for software development with automatic quality validation.
Key Design Principle: Each slash surface has its own unique workflow with different agent sequences. There is no single "standard" workflow. Most orchestration logic lives in .claude/commands/map-*.md; /map-learn is maintained skill-first in .claude/skills/map-learn/SKILL.md so the learning workflow has a single source of truth.
┌─────────────────────────────────────────────────────────────────┐
│ SLASH COMMANDS │
│ Each command orchestrates its own unique agent sequence │
└───────────────────┬─────────────────────────────────────────────┘
│
┌───────────────┼───────────────────────────────────────┐
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│EFFICIENT│ │ TDD │ │ DEBUG │ │ DEBATE │ │ REVIEW │ │ FAST │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ WORKFLOW-SPECIFIC SEQUENCES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ /map-efficient (⭐ RECOMMENDED): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ TaskDecomposer → For each subtask: │ │
│ │ ├─ Standard: Actor → Monitor → [Predictor if risky] │ │
│ │ └─ Self-MoA: 3×Actor → 3×Monitor → Synthesizer → Mon. │ │
│ │ No Evaluator. Learning via /map-learn (optional) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-tdd (test-first development): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ TaskDecomposer → For each subtask: │ │
│ │ TEST_WRITER (tests from spec) → TEST_FAIL_GATE (Red) │ │
│ │ → Actor (code_only) → Monitor → [Predictor if risky] │ │
│ │ Tests written BEFORE implementation. 8 phases. │ │
│ │ Single-subtask: /map-tdd ST-001 (TDD for one subtask) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-task (single subtask execution): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Runs one subtask from existing plan (no decomposition). │ │
│ │ Usage: /map-task ST-001 │ │
│ │ Requires: /map-plan completed first. │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-debug (debugging-specific): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ TaskDecomposer → For each step: │ │
│ │ Investigation: Actor (analyze) → Monitor │ │
│ │ Fix: Actor → Monitor → Predictor → Evaluator │ │
│ │ Includes both investigation AND implementation phases │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-review (interactive 4-section): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ git diff analysis │ │
│ │ → [Monitor + Predictor + Evaluator] (all 3 parallel) │ │
│ │ → Interactive: Architecture → Quality → Tests → Perf │ │
│ │ → Verdict: PROCEED / REVISE / BLOCK │ │
│ │ --ci mode: batch report, no interaction │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-fast (⚠️ minimal, low-risk only): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ TaskDecomposer → Actor → Monitor │ │
│ │ No Predictor, no Evaluator, no learning │ │
│ │ Max 3 iterations. Use only for small, low-risk changes │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-release (7-phase release workflow): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Phase 1: 12 validation gates (tests, lint, CI, etc.) │ │
│ │ Phase 2: Version determination (user decides bump type) │ │
│ │ Phase 3: Execute bump-version.sh │ │
│ │ Phase 4: Push tag (⚠️ IRREVERSIBLE) │ │
│ │ Phase 5: Monitor CI/CD, create GitHub release │ │
│ │ Phase 6: Verify PyPI availability + installation test │ │
│ │ Phase 7: Summary │ │
│ │ No agents. Bash scripts + GitHub CLI orchestration │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ /map-learn (post-workflow learning): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Reflector → Verification │ │
│ │ Standalone command. Run AFTER any workflow completes. │ │
│ │ Extracts patterns from workflow outcomes. │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ RESEARCH-AGENT (on-demand in any workflow): │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Heavy codebase reading with compressed output │ │
│ │ Called conditionally when context gathering needed │ │
│ │ Runs in isolation to avoid polluting main context │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Command-Driven Workflow:
- Most orchestration logic is implemented in slash command prompts (.claude/commands/map-*.md)
- /map-learn is the exception: its canonical implementation lives in .claude/skills/map-learn/SKILL.md instead of a duplicate command file (it is NOT a separate agent file)
- When you run /map-efficient, the command prompt coordinates the workflow by calling agents sequentially via the Task tool
Workflow Stages:
1. Task Decomposition (TaskDecomposer)
   - Receives high-level goal
   - Breaks it into atomic subtasks
   - Estimates complexity and dependencies
   - Outputs structured task plan
2. Implementation Loop (per subtask)
   - Code Generation (Actor): generates the solution
   - Validation (Monitor): checks quality, security, correctness
   - Feedback Loop: if validation fails, return to Actor with feedback (max 3-5 iterations)
3. Impact Analysis (Predictor)
   - Analyzes change ripple effects across the codebase
   - Identifies affected components
   - Flags potential breaking changes
4. Quality Scoring (Evaluator)
   - Rates the solution on multiple dimensions
   - Functionality, security, testability, maintainability
   - Scores 0-10, approval threshold >7.0
5. Learning Cycle (Reflector)
   - Extracts patterns from successes and failures
   - Enables continuous improvement
Sequential Execution:
- Each agent receives structured input from previous agent
- Agents communicate via JSON output format
- Orchestrator enforces strict agent ordering
Error Handling:
- Actor-Monitor feedback loops limited to 3-5 iterations
- Infinite loop detection at orchestrator level
- Graceful degradation if agent fails
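The bounded Actor-Monitor loop described above can be sketched as follows. This is a minimal illustration: `call_actor`, `call_monitor`, and the dictionary shapes are hypothetical stand-ins, not the framework's actual API.

```python
# Minimal sketch of the Actor -> Monitor feedback loop with an iteration cap.
# call_actor / call_monitor are hypothetical stand-ins for the real agent calls.

MAX_ITERATIONS = 5  # the document specifies a 3-5 iteration limit

def run_feedback_loop(subtask, call_actor, call_monitor):
    """Run Actor -> Monitor until validation passes or the cap is hit."""
    feedback = None
    for attempt in range(1, MAX_ITERATIONS + 1):
        actor_output = call_actor(subtask, feedback=feedback)
        monitor_output = call_monitor(actor_output)
        if monitor_output["validation_passed"]:
            return {"status": "validated", "attempts": attempt, "output": actor_output}
        # Validation failed: feed Monitor's findings back to the Actor.
        feedback = monitor_output["feedback"]
    # Cap reached: degrade gracefully instead of looping forever.
    return {"status": "escalated", "attempts": MAX_ITERATIONS, "feedback": feedback}
```

The cap is what turns a potential infinite Actor-Monitor ping-pong into an explicit escalation the orchestrator can act on.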
State Management:
- Workflow checkpoint stored in .map/progress.md (YAML frontmatter + markdown)
- Task plan stored in .map/<branch>/task_plan_*.md
- Workflow logs in .map/workflow_logs/
- Metrics tracked in .claude/metrics/agent_metrics.jsonl
MAP Framework stores workflow artifacts in the .map/ directory. All artifacts follow JSON schemas defined in src/mapify_cli/schemas.py.
For branch-scoped workflows, MAP also keeps .map/<branch>/artifact_manifest.json as the high-level stage ledger for:
workflow_fit, spec, plan, test_contract, implementation, review, verification, learn_handoff
Targeted TDD flows additionally persist test_contract_<subtask>.md and test_handoff_<subtask>.json. Those artifacts are what let /map-task ST-001 resume implementation from a clean red-phase handoff instead of reusing the full test-authoring context.
Purpose: Track workflow state including terminal status and early termination.
Written by: src/mapify_cli/workflow_state.py (WorkflowState class)
Schema: STATE_ARTIFACT_SCHEMA in src/mapify_cli/schemas.py
Example:
{
"workflow": "map-efficient",
"terminal_status": "complete",
"ended_early": null,
"subtasks": [
{
"id": "ST-001",
"title": "Create User model",
"status": "complete",
"validation_criteria": [
"Model includes email field",
"Password hashing implemented"
]
},
{
"id": "ST-002",
"title": "Implement login endpoint",
"status": "complete",
"validation_criteria": []
}
]
}

Early Termination Example:
{
"workflow": "map-efficient",
"terminal_status": "won't_do",
"ended_early": {
"by_user": true,
"reason": "User requested early termination",
"at_subtask_id": "ST-003"
},
"subtasks": [
{
"id": "ST-001",
"title": "Create User model",
"status": "complete",
"validation_criteria": []
},
{
"id": "ST-002",
"title": "Implement login endpoint",
"status": "won't_do",
"validation_criteria": []
}
]
}

Terminal Status Values:

| Status | Description |
|---|---|
| `pending` | Workflow not started or in progress |
| `complete` | All subtasks completed successfully |
| `blocked` | Workflow blocked by unresolved issue |
| `won't_do` | Workflow terminated early by user |
| `superseded` | Workflow replaced by newer workflow |
Purpose: Machine-readable record of hook verification checks for CI/CD integration.
Written by: src/mapify_cli/verification_recorder.py (record_verification_result function)
Schema: VERIFICATION_RESULTS_SCHEMA in src/mapify_cli/schemas.py
Example:
{
"overall": "fail",
"recipes": [
{
"id": "check_ruff",
"status": "pass",
"summary": "ruff passed",
"duration_ms": 1200
},
{
"id": "check_secrets",
"status": "skipped",
"summary": "No staged files to check",
"duration_ms": 50,
"skip_reason": "No files were staged for commit"
},
{
"id": "check_mypy",
"status": "fail",
"summary": "mypy failed",
"duration_ms": 3500
}
]
}

Overall Status Aggregation:

| Condition | Overall Status |
|---|---|
| ANY recipe is `fail` | `fail` |
| ALL recipes are `pass` | `pass` |
| Otherwise | `unknown` |

Recipe Status Values:

| Status | Description |
|---|---|
| `pass` | Check completed successfully |
| `fail` | Check found problems |
| `skipped` | Check intentionally skipped (see `skip_reason`) |
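The aggregation rule can be written directly as a small function. The recipe dictionaries follow the example shape shown earlier; anything beyond the `status` field is ignored here.

```python
def aggregate_overall(recipes):
    """Aggregate per-recipe statuses into the overall verification status.

    Rules from the aggregation table: any fail -> "fail"; all pass -> "pass";
    otherwise (e.g. a mix of pass and skipped) -> "unknown".
    """
    statuses = [r["status"] for r in recipes]
    if any(s == "fail" for s in statuses):
        return "fail"
    if statuses and all(s == "pass" for s in statuses):
        return "pass"
    return "unknown"
```

Note that under this rule the earlier example, which contains a failing `check_mypy` recipe, must aggregate to `"fail"`.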
Purpose: Project metadata for language detection and suggested checks.
Written by: src/mapify_cli/repo_insight.py (create_repo_insight function)
Schema: REPO_INSIGHT_SCHEMA in src/mapify_cli/schemas.py
Example:
{
"language": "python",
"suggested_checks": [
"make check",
"pytest tests/test_template_sync.py -v",
"make sync-templates"
],
"key_dirs": [
"src",
"tests",
".claude"
]
}

Language Values:

| Language | Detection Marker |
|---|---|
| `python` | pyproject.toml, setup.py, requirements.txt |
| `typescript` | tsconfig.json (takes precedence over package.json) |
| `javascript` | package.json |
| `go` | go.mod |
| `rust` | Cargo.toml |
| `unknown` | No marker files found |
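Marker-based detection might look like the sketch below. The only ordering the table guarantees is that tsconfig.json is checked before package.json; the precedence among unrelated markers here is an assumption for illustration, and the real logic lives in src/mapify_cli/repo_insight.py.

```python
from pathlib import Path

# Marker files checked in order. tsconfig.json comes before package.json so
# TypeScript wins over JavaScript, per the table; other ordering is assumed.
_MARKERS = [
    ("typescript", ["tsconfig.json"]),
    ("javascript", ["package.json"]),
    ("python", ["pyproject.toml", "setup.py", "requirements.txt"]),
    ("go", ["go.mod"]),
    ("rust", ["Cargo.toml"]),
]

def detect_language(repo_root):
    """Return the detected language for a repo root, or "unknown"."""
    root = Path(repo_root)
    for language, markers in _MARKERS:
        if any((root / m).is_file() for m in markers):
            return language
    return "unknown"
```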
Constraints:
- key_dirs: maximum 5 entries
- All key_dirs paths are relative (no leading /)
- suggested_checks filtered based on available tools (e.g., make commands only if Makefile exists)
All JSON schemas are defined in src/mapify_cli/schemas.py:
| Schema Constant | Artifact File | JSON Schema Draft |
|---|---|---|
| STATE_ARTIFACT_SCHEMA | state_<branch>.json | 2020-12 |
| VERIFICATION_RESULTS_SCHEMA | verification_results_<branch>.json | 2020-12 |
| REPO_INSIGHT_SCHEMA | repo_insight_<branch>.json | 2020-12 |
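Production validation goes through the draft 2020-12 schemas in src/mapify_cli/schemas.py (e.g. via a JSON Schema validator). As a lightweight illustration of the expected top-level shape of a state artifact, a hand-rolled structural check might look like this; the key and status sets are taken from the examples above:

```python
import json

REQUIRED_STATE_KEYS = {"workflow", "terminal_status", "ended_early", "subtasks"}
TERMINAL_STATUSES = {"pending", "complete", "blocked", "won't_do", "superseded"}

def check_state_artifact(text):
    """Return a list of problems found in a state_<branch>.json payload.

    Illustrative only: the authoritative check is STATE_ARTIFACT_SCHEMA.
    """
    problems = []
    data = json.loads(text)
    missing = REQUIRED_STATE_KEYS - data.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if data.get("terminal_status") not in TERMINAL_STATUSES:
        problems.append(f"bad terminal_status: {data.get('terminal_status')!r}")
    for sub in data.get("subtasks", []):
        if not {"id", "status"} <= sub.keys():
            problems.append(f"subtask missing id/status: {sub}")
    return problems
```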
MAP Framework provides multiple workflow variants with different agent orchestration strategies:
Agent Sequence: TaskDecomposer → [conditional ResearchAgent] → (Actor → Monitor → [conditional Predictor]) per subtask → FinalVerifier
With Self-MoA (--self-moa flag OR high risk/complexity): TaskDecomposer → [conditional ResearchAgent] → (3×Actor parallel → 3×Monitor parallel → Synthesizer → final Monitor → [conditional Predictor]) per subtask → FinalVerifier
Optimizations:
1. Conditional Predictor (token savings)
   - Only called if TaskDecomposer assigns risk_level='high'/'medium'
   - OR if Monitor sets escalation_required=true
   - Low-risk subtasks (simple CRUD, UI updates) skip impact analysis
2. Evaluator Skipped (token savings)
   - Monitor provides sufficient validation for most tasks
   - Evaluator's 6-dimension scoring rarely changes the proceed/reject decision
   - Quality still ensured by Monitor's comprehensive checks
3. Learning is a deferred closeout via /map-learn
   - Workflow does NOT include Reflector inline
   - Completion writes learning-handoff.md/.json under .map/<branch>/
   - Completion also updates learning-metrics.json with repeated learned-rule violation signals when current findings overlap existing learned rules
   - Separation keeps workflows fast while preserving the context needed for later learning
Token Usage: Baseline for production workflows
Learning: Deferred via /map-learn, powered by branch-scoped learning handoff artifacts and learning-effectiveness metrics
Quality Gates: Essential agents (Monitor, conditional Predictor)
Technical Details:
# Conditional Predictor Logic (Orchestrator)
for subtask in subtasks:
actor_output = call_actor(subtask)
monitor_output = call_monitor(actor_output)
if monitor_output.valid:
# Only call Predictor if high risk
if (subtask.risk_level in ['high', 'medium'] or
monitor_output.escalation_required):
predictor_output = call_predictor(actor_output)
# Apply changes
apply_code_changes(actor_output)
# At end: write branch-scoped learning handoff, record repeated-rule signals, then suggest /map-learn
write_learning_handoff(...)
print("Run /map-learn now, or later from the generated handoff")

Use for:
- Production code where token costs matter (RECOMMENDED)
- Well-understood features (standard CRUD, APIs, UI)
- Iterative development with frequent workflows
- Any task where /map-fast feels too risky
Agent Sequence: TaskDecomposer → (Actor → Monitor) per subtask
Agents SKIPPED:
- ❌ Predictor (no impact analysis)
- ❌ Evaluator (no quality scoring)
- ❌ Reflector (no lesson extraction)
Token Usage: 50-60% of baseline
Learning: None (defeats MAP's purpose)
Quality Gates: Basic only (Monitor validation)
Architectural Consequences:
- Knowledge base remains static (no continuous improvement)
- Breaking changes undetected (no Predictor)
- Security/performance issues may slip through (no Evaluator)
- Same mistakes repeated (no Reflector)
Use ONLY for:
- Small, low-risk changes with clear acceptance criteria
- Localized fixes with minimal blast radius
Avoid for:
- Security-sensitive functionality
- Broad refactors or multi-module changes
- High uncertainty requirements
Technical Details (debate workflow):
# Debate Workflow Orchestrator Logic
for subtask in subtasks:
# Phase 1: Generate 3 variants in parallel
variants = parallel_execute([
call_actor(subtask, approach_focus="security"),
call_actor(subtask, approach_focus="performance"),
call_actor(subtask, approach_focus="simplicity")
])
# Phase 2: Validate all variants in parallel
validations = parallel_execute([
call_monitor(variants[0]),
call_monitor(variants[1]),
call_monitor(variants[2])
])
# Phase 3: Debate-Arbiter cross-evaluation + synthesis (Opus)
# DebateArbiter both evaluates AND synthesizes in single call
arbiter_output = call_debate_arbiter(
variants=variants,
validations=validations,
model="claude-opus-4-5"
)
# arbiter_output includes: comparison_matrix, decision_rationales,
# synthesis_reasoning, and synthesized code
# Phase 4: Final validation and impact analysis
final_monitor = call_monitor(arbiter_output.synthesized_code)
if final_monitor.valid:
if subtask.risk_level in ['high', 'medium']:
predictor_output = call_predictor(arbiter_output)
apply_code_changes(arbiter_output.synthesized_code)

Trade-offs:
- Pro: Maximum solution quality through variant exploration
- Pro: Discovers optimal patterns for knowledge base
- Pro: Arbiter reasoning provides detailed decision documentation
- Con: Higher token cost (3× Actor + Opus arbiter)
- Con: Longer execution time (parallel but still 3× work)
Agent Sequence: TaskDecomposer → For each step: Actor → Monitor → Predictor → Evaluator
Debugging-Specific Features:
1. Pre-Analysis Phase
   - Identify affected files via Grep/Glob
2. Step Types (defined by TaskDecomposer):
   - investigation: Analyze code, logs, reproduce issue (Actor read-only)
   - fix: Implement solution (Actor generates code changes)
   - verification: Test fix, check for regressions
3. Full Agent Pipeline for Fixes
   - Unlike /map-efficient, debugging fixes go through ALL agents
   - Predictor checks for similar issues elsewhere in the codebase
   - Evaluator verifies fix quality and edge case coverage
Token Usage: 70-80% of baseline
Learning: Optional via /map-learn
Quality Gates: All agents for fixes, reduced for investigation
Use for:
- Bug fixes and issue resolution
- Root cause analysis
- Regression debugging
Agent Sequence: git diff → [Monitor + Predictor + Evaluator] (all 3 parallel) → Interactive 4-section presentation → Verdict
Review-Specific Features:
- No TaskDecomposer - Reviews current branch changes as-is
- Parallel Agent Launch - 3 agents launched in a single message
- Interactive 4-Section Presentation:
- Architecture (primary: Predictor — breaking changes, affected components)
- Code Quality (primary: Monitor — correctness, maintainability issues)
- Tests (primary: Monitor — testability, coverage gaps)
- Performance (primary: Monitor — performance issues, cross-ref Predictor risk)
- Review Section Protocol — each section presents top N issues (BIG=4, SMALL=1) with options and tradeoffs, user picks resolution via AskUserQuestion
- BIG/SMALL mode — user selects review depth at start
- CI/Auto mode (--ci / --auto flag) — batch report with no interaction, auto-selects recommended options
- Verdict Logic:
- PROCEED: Monitor approved + valid AND Evaluator proceed
- REVISE: Monitor needs_revision OR Evaluator improve
- BLOCK: Monitor rejected OR Evaluator reconsider OR security/functionality < 5 OR (Predictor high risk + breaking changes)
- Priority: BLOCK > REVISE > PROCEED
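The verdict rules above can be expressed as a small function. The field names (`monitor["status"]`, `evaluator["recommendation"]`, the score keys) are illustrative stand-ins for the agents' actual output schemas; the priority ordering BLOCK > REVISE > PROCEED is from the list above.

```python
def review_verdict(monitor, evaluator, predictor):
    """Compute the /map-review verdict from the three agents' outputs.

    Field names are illustrative; the rules mirror the verdict logic above,
    checking BLOCK conditions first, then REVISE, then PROCEED.
    """
    block = (
        monitor["status"] == "rejected"
        or evaluator["recommendation"] == "reconsider"
        or evaluator["scores"]["security"] < 5
        or evaluator["scores"]["functionality"] < 5
        or (predictor["risk_level"] == "high" and predictor["breaking_changes"])
    )
    if block:
        return "BLOCK"
    if monitor["status"] == "needs_revision" or evaluator["recommendation"] == "improve":
        return "REVISE"
    if monitor["status"] == "approved" and evaluator["recommendation"] == "proceed":
        return "PROCEED"
    return "REVISE"  # conservative default when signals are mixed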
Token Usage: ~15-25K tokens (parallel agents + interactive 4-section presentation; --ci mode ~12-15K)
Learning: Optional via /map-learn
Quality Gates: All 3 review agents
Use for:
- Pre-commit code review
- PR review automation
- Quality gate before merge
- CI pipeline integration (--ci mode)
Workflow: 7 sequential phases with validation gates (no AI agents)
Phases:
- Pre-release validation (12 gates: tests, lint, CI, security, CHANGELOG)
- Version determination (user chooses bump type)
- Execute bump-version.sh (updates pyproject.toml, CHANGELOG, creates tag)
- Push tag (⚠️ IRREVERSIBLE - triggers CI/CD)
- Monitor CI/CD, create GitHub release
- Verify PyPI availability + installation test
- Summary
Unique Characteristics:
- No AI agents - bash scripts + GitHub CLI orchestration
- User confirmation required before irreversible tag push
- Rollback procedures documented for each failure scenario
Use for:
- Package releases to PyPI
- Version bumping with full validation
Agent Sequence: Reflector → Verification
Standalone Learning:
- Run AFTER any workflow completes (not during)
- Extracts patterns from Actor/Monitor/Predictor outputs
Token Usage: 5-8K tokens (depends on workflow size)

When to use:
- After /map-efficient completes with valuable patterns
- After /map-debug reveals debugging techniques
- Retroactively for /map-fast workflows
Typical token consumption per subtask (estimated):
| Agent | Prompt | Output | Total | Notes |
|---|---|---|---|---|
| TaskDecomposer | 1.5K | 1K | 2.5K | One-time (not per subtask) |
| Actor | 2K | 3-4K | 5-6K | Largest consumer (full file content) |
| Monitor | 1.5K | 1K | 2.5K | Always included |
| Predictor | 1.5K | 1K | 2.5K | Conditional in /map-efficient, always in /map-debug |
| Evaluator | 2K | 1K | 3K | Only in /map-debug, /map-review |
| Reflector | 2K | 1K | 3K | Only via /map-learn |
| Synthesizer | 2K | 3K | 5K | /map-efficient Self-MoA only |
| ResearchAgent | 2K | 4K | 6K | Heavy codebase reading, on-demand in any workflow |
Per-subtask totals:
- /map-efficient (standard): ~9-12K tokens (baseline)
- /map-efficient (Self-MoA): ~25-30K tokens (3× Actor + Synthesizer)
- /map-fast: ~8-10K tokens (minimal, no learning)
- /map-debug: ~15-20K tokens (full pipeline with Evaluator)
- /map-review: ~15-25K tokens (parallel agents + interactive 4-section presentation; --ci mode ~12-15K)
For 5-subtask workflow:
- /map-efficient: ~45-60K tokens (learning optional via /map-learn: +5-8K)
- /map-fast: ~40-50K tokens (no learning support)
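The per-workflow totals follow from the per-agent table. A rough estimator using midpoints of the table's ranges (the 50% predictor rate is an illustrative assumption, not a measured figure) reproduces the quoted baseline:

```python
# Rough cost estimator from the per-agent table above, in thousands of tokens
# (midpoints of the quoted ranges; these are the document's estimates).
AGENT_COST_K = {
    "task_decomposer": 2.5,  # one-time, not per subtask
    "actor": 5.5,
    "monitor": 2.5,
    "predictor": 2.5,
}

def estimate_efficient_k(n_subtasks, predictor_rate=0.5):
    """Estimate /map-efficient tokens (K): decompose once, then per subtask
    Actor + Monitor, with Predictor on the assumed fraction of risky subtasks."""
    per_subtask = AGENT_COST_K["actor"] + AGENT_COST_K["monitor"]
    per_subtask += predictor_rate * AGENT_COST_K["predictor"]
    return AGENT_COST_K["task_decomposer"] + n_subtasks * per_subtask

# 5 subtasks with half needing impact analysis:
# 2.5 + 5 * (5.5 + 2.5 + 1.25) = 48.75K, inside the quoted ~45-60K range.
```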
See USAGE.md - Workflow Variants for detailed decision guide, real-world examples, and cost analysis.
Problem: Long command files (995 lines, ~5.4K tokens) cause attention dilution → Claude skips critical workflow steps like research and self-audit (20% compliance rate).
Solution: State-machine orchestration + PreToolUse hook injection
┌─────────────────────────────────────────────────────────────┐
│ PreToolUse Hook (workflow-context-injector.py) │
│ • Reads: .map/<branch>/step_state.json │
│ • Injects: ~150 token reminder before EVERY tool call │
│ • Shows: Current step, progress, mandatory next action │
│ • Non-blocking: Always allows tool execution │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ map-efficient.md (~1.75K tokens, down from ~5.4K) │
│ 1. Get next step instruction (map_orchestrator.py) │
│ 2. Route to executor (Actor/Monitor/etc) │
│ 3. Execute step │
│ 4. Validate completion → Update state │
│ 5. Recurse if more steps; else complete │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ State Machine (.map/scripts/map_orchestrator.py) │
│ • 8 step phases (DECOMPOSE → SUBTASK_APPROVAL + 2 TDD) │
│ • State file: .map/<branch>/step_state.json │
│ • Enforces: Sequential execution, no step skipping │
│ • CLI: get_next_step, validate_step, initialize, │
│ monitor_failed, wave_monitor_failed, skip_step, │
│ set_waves, get_wave_step, advance_wave, + more │
└─────────────────────────────────────────────────────────────┘
Pattern borrowed from ralph-loop's build_loop_context(): Inject small, frequent reminders rather than upfront instructions.
Hook Output Example:
╔═══════════════════════════════════════════════════════════╗
║ MAP WORKFLOW CHECKPOINT ║
╠═══════════════════════════════════════════════════════════╣
║ Current Step: 2.2 - RESEARCH
║ Progress: Subtask 1/5
║ Completed: 1.0_DECOMPOSE, 1.5_INIT_PLAN, 1.6_INIT_STATE
║
║ ⚠️ MANDATORY NEXT ACTION:
║ Call research-agent BEFORE Actor
╚═══════════════════════════════════════════════════════════╝
Injected into system prompt before EVERY tool call → Claude cannot "forget" the current step.
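A minimal sketch of the injection side of this hook follows. The real implementation is workflow-context-injector.py; the step_state.json field names used here (current_step, subtask_index, completed_steps, mandatory_next_action) are illustrative assumptions, not the file's confirmed schema.

```python
import json
from pathlib import Path

def build_checkpoint(state_path):
    """Build the compact checkpoint banner injected before each tool call.

    Returns None (non-blocking, no reminder) when the state file is absent.
    Field names are assumptions for illustration.
    """
    path = Path(state_path)
    if not path.is_file():
        return None
    state = json.loads(path.read_text())
    lines = [
        "MAP WORKFLOW CHECKPOINT",
        f"Current Step: {state['current_step']}",
        f"Progress: Subtask {state['subtask_index']}/{state['subtask_total']}",
        f"Completed: {', '.join(state['completed_steps'])}",
    ]
    if state.get("mandatory_next_action"):
        lines.append(f"MANDATORY NEXT ACTION: {state['mandatory_next_action']}")
    return "\n".join(lines)
```

The key design point is that the banner is rebuilt from the state file on every tool call, so it always reflects the current step rather than the (possibly stale) instructions at the top of the conversation.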
| Metric | Before (v1.x) | After (v2.0.0) |
|---|---|---|
| Step compliance | ~20% | ~85% (predicted) |
| Command file tokens | ~5,400 | ~1,750 |
| Research skip rate | 80% | ~5% (predicted) |
| Self-audit skip rate | 90% | ~10% (predicted) |
| User interventions | ~3 per workflow | ~0.3 (predicted) |
| Hook latency | N/A | <100ms |
- Before: 5,400 tokens per invocation × 10 invocations = 54,000 tokens
- After: 1,750 tokens + (150 hook tokens × 50 tool calls) = 9,250 tokens
- Net savings: ~83% reduction despite hook overhead
8 Step Phases (6 standard + 2 TDD):
- 1.0 DECOMPOSE - task-decomposer agent
- 1.5 INIT_PLAN - Generate task_plan.md
- 1.55 REVIEW_PLAN - User approval checkpoint
- 1.56 CHOOSE_MODE - Auto-skipped (always batch mode)
- 1.6 INIT_STATE - Create step_state.json
- 2.2 RESEARCH - research-agent (conditional)
- 2.25 TEST_WRITER - TDD: write tests from spec (TDD mode only, auto-skipped otherwise)
- 2.26 TEST_FAIL_GATE - TDD: verify tests fail without impl (TDD mode only)
- 2.3 ACTOR - Actor agent implementation (code-only in TDD mode)
- 2.4 MONITOR - Monitor validation (retry up to 5 times)
State File:
- step_state.json: single source of truth for step sequencing, hook injection, and gate enforcement
Breaking Change: /map-efficient now requires Python state machine.
User Action:
# Update MAP Framework installation
mapify init # Regenerates .claude/ with new hooks and scripts
# Existing workflows continue automatically
# No manual migration needed for in-progress workflows

For Custom Workflows:
If you modified .claude/commands/map-efficient.md, you must manually integrate state machine calls:
- Replace monolithic step logic with map_orchestrator.py CLI calls
- See template: src/mapify_cli/templates/commands/map-efficient.md
Responsibility: Break high-level goals into atomic, executable subtasks.
Input:
{
"goal": "implement user authentication with JWT tokens",
"context": {
"language": "Python",
"framework": "Flask",
"existing_files": ["app.py", "models.py"]
}
}

Output:
{
"subtasks": [
{
"id": "auth_001",
"description": "Create User model with password hashing",
"estimated_complexity": "medium",
"dependencies": []
},
{
"id": "auth_002",
"description": "Implement /login endpoint with JWT generation",
"estimated_complexity": "high",
"dependencies": ["auth_001"]
}
]
}

Key Behaviors:
- Each subtask should be completable in <100 lines of code
- Explicit dependency tracking
- Complexity estimation (low/medium/high)
- Considers existing codebase structure
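The explicit dependency tracking means subtasks can be scheduled before execution. A minimal ordering sketch using Kahn's algorithm, over the output shape shown above (each subtask has an "id" and a "dependencies" list), might look like this; it is illustrative, not the framework's scheduler:

```python
from collections import deque

def dependency_order(subtasks):
    """Order subtasks so every dependency runs first (Kahn's algorithm).

    Raises ValueError if the dependency graph contains a cycle.
    """
    pending = {s["id"]: set(s["dependencies"]) for s in subtasks}
    by_id = {s["id"]: s for s in subtasks}
    ready = deque(sid for sid, deps in pending.items() if not deps)
    ordered = []
    while ready:
        sid = ready.popleft()
        ordered.append(by_id[sid])
        del pending[sid]
        # Unblock any subtask that was only waiting on the one just scheduled.
        for other, deps in pending.items():
            deps.discard(sid)
            if not deps and other not in ready:
                ready.append(other)
    if pending:
        raise ValueError(f"dependency cycle among: {sorted(pending)}")
    return ordered
```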
Responsibility: Generate code and solutions for subtasks.
Input:
{
"subtask_description": "Implement /login endpoint with JWT generation",
"language": "Python",
"framework": "Flask",
"existing_patterns": ["impl-0042: Use bcrypt for password hashing"],
"feedback": "Missing error handling for invalid credentials"
}

Output Structure:
- Approach (2-3 sentences)
- Code Changes (complete implementations, no ellipsis)
- Trade-offs (alternatives considered, decisions made)
- Testing Considerations (critical test cases)
- Used Patterns (pattern IDs applied)
Key Behaviors:
- Fetches current docs for external libraries (via deepwiki)
- Explicit error handling required (no silent failures)
- Complete code, not sketches or placeholders
- Security-first approach for auth/data access
MCP Tool Usage:
mcp__deepwiki__read_wiki_contents: Get current library/project documentation
Responsibility: Validate code quality, security, and correctness.
Input: Actor's complete output (approach, code, trade-offs, tests)
Output:
{
"validation_passed": false,
"issues": [
{
"severity": "critical",
"category": "security",
"description": "Password not hashed before storage",
"suggested_fix": "Use bcrypt.hashpw() before db.session.add()"
}
],
"feedback": "Add password hashing using bcrypt library. Import bcrypt at top of file."
}

Validation Criteria:
- ✅ Error handling present (no silent failures)
- ✅ Security best practices (OWASP Top 10 compliance)
- ✅ File scope respected (no out-of-scope modifications)
- ✅ Code completeness (no ellipsis/placeholders)
- ✅ Dependency justification (if new deps added)
Key Behaviors:
- Severity classification: critical/major/minor
- Specific, actionable feedback
- Checks against project coding standards
Responsibility: Analyze change impact across codebase.
Input: Actor's code changes
Output:
{
"impact_analysis": {
"affected_files": ["app.py", "models.py", "tests/test_auth.py"],
"breaking_changes": false,
"risk_level": "medium",
"ripple_effects": [
{
"component": "User API",
"effect": "New endpoint requires documentation update",
"action_required": "Update API docs"
}
]
}
}

Analysis Dimensions:
- File dependencies (imports, function calls)
- API contract changes
- Database schema modifications
- Configuration requirements
- Test coverage gaps
Model Used: Sonnet (impact analysis requires complex reasoning)
Responsibility: Score solution quality on multiple dimensions.
Input: Actor's output + Predictor's impact analysis
Output:
{
"scores": {
"functionality": 9,
"security": 8,
"testability": 7,
"maintainability": 8,
"overall": 8.0
},
"approved": true,
"rationale": "Strong implementation with proper error handling. Consider adding integration tests."
}

Scoring Rubric (0-10):
- Functionality: Does it solve the problem completely?
- Security: OWASP compliance, input validation, secure defaults
- Testability: Can it be easily tested? Clear test cases provided?
- Maintainability: Clear code, good naming, documented trade-offs
Approval Threshold: >7.0 overall score
Model Used: Sonnet (evaluation requires nuanced judgment)
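The overall score and approval decision can be sketched as below. The plain mean matches the example output above (9+8+7+8 yields 8.0), but the real Evaluator may weight dimensions differently; treat the formula as an assumption.

```python
def evaluate(scores, threshold=7.0):
    """Combine the four dimension scores and apply the approval threshold.

    Assumes overall = unweighted mean of the dimensions, consistent with the
    example output above; approval requires strictly exceeding the threshold.
    """
    dims = ["functionality", "security", "testability", "maintainability"]
    overall = sum(scores[d] for d in dims) / len(dims)
    return {"overall": round(overall, 1), "approved": overall > threshold}
```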
Responsibility: Extract lessons from successes and failures.
Input: Complete workflow context (Actor, Monitor, Predictor, Evaluator outputs)
Output:
{
"patterns_extracted": [
{
"pattern_id": "auth_jwt_001",
"category": "implementation",
"content": "Use bcrypt for password hashing with work factor 12",
"when_to_use": "User authentication with password storage",
"trade_offs": "Slower than SHA256 but much more secure",
"code_snippet": "hashed = bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt(12))"
}
]
}

Key Behaviors:
- Extracts both successful patterns and failure lessons
- Contextualizes lessons (when to apply, when to avoid)
- Links to specific workflow outcomes
MCP Tool Usage:
mcp__sequential-thinking__sequentialthinking: Structure reasoning process
Responsibility: Check documentation completeness and correctness.
Input: Documentation files + related code
Output:
{
"completeness_score": 8,
"issues": [
{
"file": "API.md",
"issue": "Missing error response format for 401 Unauthorized",
"suggested_fix": "Add example JSON response for 401 errors"
}
]
}

Validation Criteria:
- ✅ API endpoints documented with request/response examples
- ✅ Error codes and responses documented
- ✅ Configuration options explained
- ✅ Examples match actual code behavior
Responsibility: Merge best elements from multiple Actor variants in Self-MoA (Mixture of Agents) workflows.
Input: Multiple Actor variants (typically 3) with different optimization focuses + DebateArbiter guidance
Output:
{
"synthesized_solution": {
"approach": "Hybrid approach combining security validation from v1, performance optimization from v2, and clear structure from v3",
"code_changes": "// Complete merged implementation",
"trade_offs": "Decision points resolved based on arbiter analysis",
"testing_considerations": "Merged test cases covering all variants' scenarios",
"decisions_resolved": [
{
"decision": "Error handling strategy",
"variants": {
"v1_security": "Comprehensive validation with detailed errors",
"v2_performance": "Fast-fail with minimal overhead",
"v3_simplicity": "Standard try-catch blocks"
},
"chosen": "v1_security with v2_performance optimizations",
"rationale": "Arbiter recommended comprehensive validation is critical; optimized by caching validation results"
}
]
}
}

Key Behaviors:
- Analyzes decision points from all variants
- Resolves conflicts using DebateArbiter guidance
- Preserves best practices from each variant
- Creates coherent unified solution (not patchwork)
- Documents synthesis rationale for learning
Model Used: Sonnet (requires strong reasoning for synthesis)
Usage Context: Invoked in /map-efficient --self-moa workflow for multi-variant synthesis
Responsibility: Heavy codebase reading with context isolation and compressed output for Actor/Monitor consumption.
Input:
{
"research_goal": "Find all authentication implementations",
"file_patterns": ["**/*auth*.py", "**/*login*.js"],
"symbols": ["authenticate", "login", "verify_token"],
"intent": "locate|understand|pattern|impact"
}

Output:
{
"relevant_locations": [
{
"file": "app/auth/jwt.py",
"lines": [45, 67],
"signatures": ["def verify_token(token: str) -> User"],
"description": "JWT token validation with expiration check"
}
],
"patterns_found": [
"All auth functions use bcrypt for password hashing",
"Token refresh logic in separate module (app/auth/refresh.py)"
],
"confidence": 0.85
}

Key Behaviors:
- Reads multiple files without polluting Actor context
- Compresses findings to essential information
- Provides file locations and signatures (not full code)
- Returns confidence score for search completeness
- Enables Actor to Read() only necessary files
Model Used: Sonnet (requires understanding code semantics)
Usage Context: Called by Actor when implementing features that integrate with existing code
Performance:
- Reads 10-50 files per invocation
- Outputs compressed summary (<2K tokens)
- Prevents Actor context bloat (would be 20-50K tokens if Actor read directly)
Responsibility: Adversarial verifier applying the "Four-Eyes Principle" — verifies the ENTIRE task goal is achieved, not just individual subtasks. Catches premature completion and hallucinated success.
Input:
{
"original_goal": "From .map/<branch>/task_plan_<branch>.md",
"acceptance_criteria": "From task plan table",
"completed_subtasks": "From progress_<branch>.md checkboxes",
"validation_criteria": "From orchestrator"
}
Output:
{
"verdict": "PASS",
"confidence": 0.95,
"criteria_met": ["All acceptance criteria verified"],
"root_cause": null,
"recommendation": "COMPLETE"
}
Verification Process:
- Read original goal and acceptance criteria from .map/ checkpoint files
- Verify each acceptance criterion against actual file state (Read, Grep, Bash)
- Run tests if specified in validation criteria
- Apply root cause analysis if verification fails
- Return verdict: PASS → COMPLETE, FAIL → RE_DECOMPOSE or ESCALATE
Model Used: Sonnet (adversarial verification requires strong reasoning)
Usage Context: Mandatory final step in /map-efficient and invoked by /map-check
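The orchestrator's handling of this verdict can be sketched as follows; `route_verdict` is an illustrative helper, not the framework's actual API:

```python
# Sketch of how an orchestrator might route the verifier verdict shown
# above; route_verdict() is illustrative, not the framework's actual API.
def route_verdict(result: dict) -> str:
    """Map a goal-verifier verdict to the orchestrator's next action."""
    if result.get("verdict") == "PASS":
        return "COMPLETE"
    # On FAIL, the recommendation distinguishes recoverable plan issues
    # (RE_DECOMPOSE) from problems that need a human (ESCALATE).
    rec = result.get("recommendation")
    return rec if rec in ("RE_DECOMPOSE", "ESCALATE") else "ESCALATE"
```

The default to ESCALATE ensures a malformed verifier output never silently marks a task complete.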
MAP uses MCP (Model Context Protocol) servers for enhanced capabilities beyond base Claude Code functionality.
| MCP Server | Purpose | Required For | Performance Notes |
|---|---|---|---|
| sequential-thinking | Chain-of-thought reasoning | Complex problem solving | Medium latency (~1-3s) |
| deepwiki | GitHub repository analysis | Research phase | Medium latency (~3-7s) |
MCP servers are configured differently depending on the usage context:
File: .claude/mcp_config.json
{
"mcp_servers": {
"sequential-thinking": {
"enabled": true,
"description": "Chain-of-thought reasoning for complex problems"
},
"deepwiki": {
"enabled": true,
"description": "GitHub repository analysis and documentation"
}
}
}
**WHEN using external libraries or researching projects:**
1. Read wiki structure:
- Tool: mcp__deepwiki__read_wiki_structure
- Input: Repository owner/name (e.g., "pallets/flask")
2. Read wiki contents:
- Tool: mcp__deepwiki__read_wiki_contents
- Parameters: repo_name, page path
3. Use docs for:
- API signature verification
- Best practices
- Deprecation warnings
Commonly Available:
- sequential-thinking (reasoning)
May Require Installation:
- deepwiki (check Claude Code documentation)
To verify availability:
# Inside Claude Code session
/tools list
Latency Budget (per subtask):
- deepwiki docs: ~3-7s per fetch (Actor: 1-2 fetches)
- Total overhead: ~3-7s per subtask
Optimization Strategies:
- Batch similar searches where possible
- Enable MCP caching when available (Phase 2 roadmap)
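MCP caching is still on the Phase 2 roadmap; as a rough illustration of the idea, a fetch-level memoization cache might look like this (the function name and behavior are hypothetical, not an existing MAP component):

```python
from functools import lru_cache

# Hypothetical sketch: memoize documentation fetches so a repeated lookup
# within a workflow skips the ~3-7s deepwiki round trip.
@lru_cache(maxsize=128)
def fetch_wiki_page(repo: str, page: str) -> str:
    # Stand-in for an expensive deepwiki fetch.
    return f"docs for {repo}/{page}"

fetch_wiki_page("pallets/flask", "index")  # first call pays full latency
fetch_wiki_page("pallets/flask", "index")  # repeat call served from cache
```

A real implementation would also need TTL-based invalidation, since repository docs change between sessions.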
Agent prompts are located in .claude/agents/*.md and use Handlebars template syntax for dynamic context injection.
✅ You CAN modify:
- Instructions and examples
- MCP tool usage guidance
- Output format specifications
- Domain-specific requirements
- Validation criteria
- Decision frameworks
Example:
# Add to Monitor agent:
## Additional Security Checks
- OWASP Top 10 compliance required
- All user inputs must be sanitized
- No hardcoded credentials allowed
- SQL queries must use parameterized statements
❌ You CANNOT remove:
- Template variables: {{language}}, {{project_name}}, {{framework}}
- Conditional blocks: {{#if existing_patterns}}...{{/if}}
- Context sections: {{subtask_description}}, {{feedback}}
Why they're critical:
- Orchestrator fills these at runtime with project context
- Removing them breaks multi-language support and feedback loops
- Git pre-commit hook validates their presence (see Hooks Integration)
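To see why these variables matter, runtime injection can be imagined as a small renderer. This simplified sketch (not the orchestrator's actual code) handles only plain variables and {{#if}} blocks:

```python
import re

# Simplified illustration of runtime context injection; the orchestrator's
# real Handlebars handling is richer than this.
def render(template: str, ctx: dict) -> str:
    # Resolve {{#if var}}...{{/if}} blocks: keep body only when var is truthy.
    def if_block(m):
        return m.group(2) if ctx.get(m.group(1)) else ""
    out = re.sub(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", if_block, template, flags=re.S)
    # Substitute simple {{variable}} placeholders.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(ctx.get(m.group(1), "")), out)

print(render("Lang: {{language}}{{#if framework}}, FW: {{framework}}{{/if}}",
             {"language": "Python", "framework": "Flask"}))
```

Deleting {{language}} from a template would silently drop the project context here, which is exactly the failure the pre-commit hook guards against.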
Available in all agents:
Actor-specific:
Monitor-specific:
Reflector-specific:
MAP Framework uses intelligent model selection to balance quality and cost.
Current Configuration:
| Agent | Model | Rationale |
|---|---|---|
| TaskDecomposer | sonnet-4-5 | Quality-critical: task planning |
| Actor | sonnet-4-5 | Quality-critical: code generation |
| Monitor | sonnet-4-5 | Quality-critical: validation |
| Predictor | sonnet-4-5 | Impact analysis requires complex reasoning |
| Evaluator | sonnet-4-5 | Evaluation requires nuanced judgment |
| Reflector | sonnet-4-5 | Quality-critical: pattern extraction |
| DocumentationReviewer | sonnet-4-5 | Quality-critical: doc validation |
| Synthesizer | sonnet-4-5 | Quality-critical: variant synthesis |
| DebateArbiter | opus-4-5 | Highest quality: cross-variant reasoning |
| ResearchAgent | sonnet-4-5 | Quality-critical: codebase understanding |
Override Model Per Agent:
Edit .claude/agents/{agent}.md frontmatter:
---
model: claude-sonnet-4-5 # or claude-haiku-3-5
---
Cost vs Quality Trade-offs:
- All Sonnet/Opus (current): Highest quality, Opus only for DebateArbiter
- Downgrade to Haiku: Lower cost, risk of quality degradation in analysis and scoring
Recommended:
- Keep on Sonnet: TaskDecomposer, Actor, Monitor, Reflector, DocumentationReviewer, Synthesizer, ResearchAgent
- Keep on Opus: DebateArbiter (cross-variant reasoning requires highest quality)
- Safe to downgrade to Haiku: Predictor, Evaluator (only if cost reduction is the priority; accept some quality risk)
Use Case: Add domain-specific agent (e.g., SecurityAuditor, PerformanceOptimizer)
Steps:
1. Create agent file:
   touch .claude/agents/security-auditor.md
2. Add YAML frontmatter:
   ---
   version: 1.0.0
   model: claude-sonnet-4-5
   last_updated: 2025-10-23
   ---
3. Define agent role and context:
   # IDENTITY
   You are a security auditor specializing in OWASP Top 10 vulnerabilities.
   ## CONTEXT
   - **Project**: {{project_name}}
   - **Language**: {{language}}
   - **Framework**: {{framework}}
4. Define output format:
   ## OUTPUT FORMAT
   ```json
   {
     "vulnerabilities": [
       {
         "severity": "critical|high|medium|low",
         "owasp_category": "A01:2021 - Broken Access Control",
         "description": "...",
         "suggested_fix": "...",
         "references": ["..."]
       }
     ]
   }
   ```
5. Update orchestration: Edit .claude/commands/map-efficient.md to call the new agent:
   ## After Monitor validates:
   **6. Security Audit** (SecurityAuditor):
   - Call: Task(subagent_type="security-auditor", input=actor_output)
   - Verify no critical vulnerabilities
Common Customizations:
1. Add project-specific coding standards: Edit Actor agent:
   ## PROJECT STANDARDS
   - Use TypeScript strict mode
   - All functions require JSDoc comments
   - Max function length: 50 lines
   - Prefer functional programming patterns
2. Add custom validation rules: Edit Monitor agent:
   ## CUSTOM VALIDATION
   - [ ] All API endpoints have rate limiting
   - [ ] Database queries use connection pooling
   - [ ] Logs use structured JSON format
3. Integrate with CI/CD: Edit Evaluator agent:
   ## CI/CD INTEGRATION
   **After approval:**
   - Run: `npm run lint`
   - Run: `npm test`
   - Run: `npm run build`
   - Only approve if all checks pass
Access project context:
Pass custom variables:
In orchestrator prompt:
Task(
subagent_type="security-auditor",
input={
"code": actor_output,
"compliance_level": "{{compliance_level}}" # Custom variable
}
)
In agent template: reference the variable as {{compliance_level}}.
Automated Linter:
python scripts/lint-agent-templates.py
Checks performed:
- ✅ YAML frontmatter completeness (version, last_updated, changelog)
- ✅ Required sections present (mcp_integration, context, examples)
- ✅ Template variable syntax (
{{variable}}- no spaces) - ✅ XML tag matching (
<section></section>) - ✅ MCP tool description consistency
- ✅ Output format specifications
Example output:
✅ actor.md - PASSED
✅ monitor.md - PASSED
❌ predictor.md - FAILED
- Missing section: <mcp_integration>
- Unmatched tag: </examples>
- Invalid template variable: {{ language }} (has spaces)
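A minimal sketch of two of these checks (template-variable spacing and XML tag matching) might look like this; the real scripts/lint-agent-templates.py is more thorough:

```python
import re

# Sketch of the kind of checks the template linter performs; illustrative
# only, not the actual scripts/lint-agent-templates.py implementation.
def lint_template(text: str) -> list[str]:
    errors = []
    # Template variables must not contain spaces: {{ language }} is invalid.
    for var in re.findall(r"\{\{[^}]*\}\}", text):
        if var.startswith("{{#") or var.startswith("{{/"):
            continue  # block helpers like {{#if x}} legitimately contain spaces
        if " " in var:
            errors.append(f"Invalid template variable: {var} (has spaces)")
    # Every opened XML-style section tag must be closed.
    for tag in set(re.findall(r"<(\w+)>", text)):
        if text.count(f"<{tag}>") != text.count(f"</{tag}>"):
            errors.append(f"Unmatched tag: <{tag}>")
    return errors
```

Running this over the failing predictor.md example above would surface both the spaced variable and the unmatched tag.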
Automatic validation before commits:
Located at: .git/hooks/pre-commit
Prevents commits if:
- Template variables removed from agents
- Critical sections deleted (feedback, context)
- Massive deletions (>500 lines) without review
Example block:
❌ BLOCKED: Agent file is missing critical template variables!
File: .claude/agents/actor.md
Missing templates:
- {{language}}
- {{#if existing_patterns}}
These template variables are used by Orchestrator for context injection.
See .claude/agents/README.md for details.
To bypass (emergency only):
git commit --no-verify -m "message"
Version Metadata:
All agent templates include:
---
version: 2.0.0
last_updated: 2025-10-17
changelog: .claude/agents/CHANGELOG.md
---
Version Scheme (Semantic Versioning):
- Major (X.0.0): Breaking changes (template variable removal, output format changes)
- Minor (2.X.0): New features (new MCP tool integration, new sections)
- Patch (2.0.X): Bug fixes, clarifications, typo fixes
Changelog:
Agent template changes are tracked in the project's main CHANGELOG.md.
Example entry:
## [4.0.0] - 2025-01-14
### Breaking Changes
- Actor: Changed output format to include `used_patterns` array
### Fixed
- Monitor: Clarified validation criteria for error handling
Centralized MCP guidance is embedded directly in agent templates:
Contents:
- Common MCP tool usage patterns
- Decision frameworks for tool selection
- Agent-specific MCP integration guidelines
- Best practices and anti-patterns
- Troubleshooting common issues
Usage: Each agent template contains its own MCP Tool Selection Matrix with:
- Conditions for when to use each tool
- Query patterns for effective searches
- Skip conditions to avoid unnecessary calls
When to update agent templates:
- Research insights: New papers on prompt engineering, context engineering
- Performance degradation: Monitor approval rate drops, Evaluator scores decline
- New MCP tools: Additional capabilities become available
- User feedback: Agents consistently make same mistakes
Update Process:
1. Analyze metrics:
   python scripts/analyze-metrics.py
   # Check: approval rate, iteration count, quality scores
2. Identify root cause:
   - Low Monitor approval → Actor needs better guidance
   - High iteration count → Monitor giving unclear feedback
   - Low Evaluator scores → Evaluator rubric too strict/loose
3. Update template:
   - Add examples of correct behavior
   - Clarify ambiguous instructions
   - Update MCP tool usage patterns
4. Validate:
   python scripts/lint-agent-templates.py
5. Test:
   - Run /map-efficient on a known task
   - Compare metrics before/after
   - Ensure no regressions
6. Document:
   - Update version and last_updated in frontmatter
   - Add entry to CHANGELOG.md
   - Update MCP Tool Selection Matrix in agent template if tool usage changed
Rollback if needed:
git checkout HEAD~1 .claude/agents/actor.md
MAP Framework applies cutting-edge context engineering principles for AI agents, based on research from Manus.im and academic papers.
Problem: On long tasks (5+ subtasks), models lose focus and forget goals as context window fills.
Solution: Attention focus mechanism — .map/progress.md is updated before each step, keeping goals "fresh" in the context window.
Mechanism:
1. TaskDecomposer creates initial plan:
   # Task: feat_auth
   ## Goal: Implement JWT authentication
   ## Subtasks:
   - [ ] 1/5: Create User model
   - [ ] 2/5: Implement login endpoint
   - [ ] 3/5: Add token validation middleware
   - [ ] 4/5: Add refresh token logic
   - [ ] 5/5: Write integration tests
2. Orchestrator updates before each subtask:
   # Current Task: feat_auth
   ## Progress: 2/5 completed
   - [✓] 1/5: Create User model
   - [→] 2/5: Implement login endpoint (CURRENT, Iteration 2)
     - Last error: Missing JWT import
   - [☐] 3/5: Add token validation middleware
   - [☐] 4/5: Add refresh token logic
   - [☐] 5/5: Write integration tests
3. Actor receives the updated plan in its context before each step.
Implementation:
Workflow state is managed through file-based persistence in .map/ directory:
- .map/progress.md - Workflow checkpoint (YAML frontmatter + markdown body)
- .map/<branch>/task_plan_*.md - Task decomposition with validation criteria
- .map/dev_docs/context.md - Project context
- .map/dev_docs/tasks.md - Task checklist
Benefits:
- ✅ +20-30% success rate on complex tasks (5+ subtasks)
- ✅ -20-30% token usage (prevents re-explaining context)
- ✅ +50% observability (clear progress tracking)
- ✅ Error context persistence (retry loops retain error history)
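The recitation update itself can be sketched in a few lines; this is an illustrative version, not the actual RecitationManager:

```python
from pathlib import Path

# Illustrative sketch of the recitation update the orchestrator performs
# before each subtask; the real RecitationManager is more elaborate.
def write_progress(path: Path, task: str, subtasks: list[dict], current: int) -> None:
    done = sum(s["done"] for s in subtasks)
    lines = [f"# Current Task: {task}", f"## Progress: {done}/{len(subtasks)} completed"]
    for i, s in enumerate(subtasks, start=1):
        marker = "✓" if s["done"] else ("→" if i == current else "☐")
        suffix = " (CURRENT)" if i == current else ""
        lines.append(f"- [{marker}] {i}/{len(subtasks)}: {s['title']}{suffix}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")

write_progress(Path("/tmp/progress.md"), "feat_auth",
               [{"title": "Create User model", "done": True},
                {"title": "Implement login endpoint", "done": False}], current=2)
```

Rewriting the file before every step is what keeps the goal at the "fresh" end of the context window.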
Problem: When a plan has 10+ subtasks, injecting the entire plan and all logs wastes tokens and dilutes attention on the current step.
Solution: Two-layer "active window" injection that shows only relevant context:
1. Hook layer (workflow-context-injector.py PreToolUse hook):
   - Fires on every Edit/Write/significant Bash command
   - Injects ≤500 char reminder: goal + current subtask title + progress
   - Uses load_goal_and_title() to extract goal from task_plan.md and title from blueprint.json
   - Graceful fallback to original format when blueprint missing
2. Actor prompt layer (map-efficient.md ACTOR phase):
   - Fires once per subtask when Actor agent is spawned
   - Injects structured <map_context> block (target: ≤4,000 tokens, best-effort) containing:
     - # Goal — one sentence from task_plan.md
     - # Current Subtask — full AAG contract, affected files, validation criteria
     - # Plan Overview — all subtasks as one-liners with [x]/[ ]/[>>] status markers
     - # Upstream Results — only results from dependency subtasks (from step_state.json subtask_results)
     - # Repo Delta — files changed since last subtask (via git diff from last_subtask_commit_sha)
   - Built by build_context_block() in map_step_runner.py
Key data sources:
- blueprint.json — subtask metadata (deps, files, criteria). Single source of truth.
- step_state.json — subtask_results dict (per-subtask files_changed + status), last_subtask_commit_sha
- task_plan.md — goal text only (never parsed for structured data)
Benefits:
- 30-60% fewer tokens in system prompt on long workflows
- Actor focuses on current subtask criteria, not future steps
- Dependency results passed explicitly — no re-reading completed files
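A simplified sketch of the Actor-layer injection: the real build_context_block() in map_step_runner.py reads blueprint.json and step_state.json from disk, whereas this illustration passes both in as plain dicts:

```python
# Simplified sketch of <map_context> assembly; the actual
# build_context_block() in map_step_runner.py is more complete.
def build_context_block(goal: str, blueprint: list[dict], state: dict, current_id: int) -> str:
    cur = next(s for s in blueprint if s["id"] == current_id)
    overview = []
    for s in blueprint:
        mark = "[x]" if s["id"] in state["completed"] else ("[>>]" if s["id"] == current_id else "[ ]")
        overview.append(f"{mark} {s['id']}: {s['title']}")
    # Only results from dependency subtasks are injected, never the full log.
    upstream = {d: state["subtask_results"].get(d) for d in cur.get("deps", [])}
    return "\n".join([
        f"# Goal\n{goal}",
        f"# Current Subtask\n{cur['title']} (criteria: {cur['criteria']})",
        "# Plan Overview\n" + "\n".join(overview),
        f"# Upstream Results\n{upstream}",
    ])

block = build_context_block(
    "Implement JWT authentication",
    [{"id": 1, "title": "User model", "criteria": "tests pass"},
     {"id": 2, "title": "Login endpoint", "criteria": "returns JWT", "deps": [1]}],
    {"completed": [1], "subtask_results": {1: {"files_changed": ["app/models/user.py"]}}},
    current_id=2,
)
```

Restricting upstream results to declared dependencies is what lets long plans stay under the token budget.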
Problem: Context compaction (conversation history clearing) would normally lose workflow state, forcing restart from scratch.
Solution: File-based persistence architecture where all workflow state persists to disk, surviving compaction.
Architecture:
Filesystem (persists forever) Conversation Memory (clears on compaction)
───────────────────────────── ─────────────────────────────────────────
.map/
├── current_plan.json ← Structured state
│ ├── task_id, goal ← NEVER lost
│ ├── subtasks[]
│ │ ├── id, description
│ │ ├── status (pending/in_progress/completed)
│ │ ├── iterations, errors
│ │ └── depends_on[]
│ └── current_subtask_id
│
├── progress.md ← Workflow checkpoint
│ ├── YAML frontmatter (machine state)
│ └── Markdown body (human-readable)
│
├── task_plan_*.md ← Task decomposition
│ └── Subtasks with validation criteria
│
└── dev_docs/
├── context.md ← Project-specific context
└── tasks.md ← Auto-generated task list
Persistence Mechanism:
1. Automatic Saves (every workflow step):
   - Status changes automatically update .map/progress.md
   - WorkflowState class handles serialization/deserialization
2. Recovery Workflow (after compaction):
   User: /map-resume
   Claude: ## Found Incomplete Workflow
   Progress: 3/5 completed
   Resume from last checkpoint? [Y/n]
   User: Y
   Claude: Resuming workflow from ST-004...
   [continues Actor→Monitor loop]
Why This Works:
| Storage Type | Compaction Effect | MAP's Choice |
|---|---|---|
| Conversation memory | ❌ Cleared | Not used for state |
| File system (.map/) | ✅ Persists | Used for all state |
| Automatic updates | ✅ Always current | No manual checkpointing |
Comparison to Manual Approaches:
- Manual checkpointing (e.g., "/update-dev-docs"): Requires user to remember command before compaction. Risk of forgetting.
- MAP's approach: Automatic persistence with optional checkpoint command for guidance. Zero cognitive load.
Benefits:
- ✅ Zero data loss - All progress persists across compactions
- ✅ Automatic - No manual checkpointing required
- ✅ Always current - Files update on every status change
- ✅ Cross-session - Resume in any new conversation
Implementation:
- Checkpoint: .map/progress.md (YAML frontmatter + markdown body)
- Task plan: .map/<branch>/task_plan_*.md (subtask decomposition with validation criteria)
- Recovery: /map-resume command (detects checkpoint and offers to resume)
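A minimal sketch of the checkpoint round-trip; the real WorkflowState class in src/mapify_cli/workflow_state.py uses YAML frontmatter, while this illustration uses JSON frontmatter to stay dependency-free:

```python
import json
from pathlib import Path

# Illustrative checkpoint save/load; the shipped WorkflowState uses YAML
# frontmatter, not the JSON shown here.
class WorkflowState:
    def __init__(self, task_id: str, completed: int, total: int):
        self.task_id, self.completed, self.total = task_id, completed, total

    def save(self, path: Path) -> None:
        # Frontmatter holds machine state; the body stays human-readable.
        front = json.dumps({"task_id": self.task_id,
                            "completed": self.completed, "total": self.total})
        path.write_text(f"---\n{front}\n---\nProgress: {self.completed}/{self.total}\n")

    @classmethod
    def load(cls, path: Path) -> "WorkflowState":
        front = path.read_text().split("---")[1].strip()
        return cls(**json.loads(front))

state = WorkflowState("feat_auth", completed=3, total=5)
state.save(Path("/tmp/progress_demo.md"))
restored = WorkflowState.load(Path("/tmp/progress_demo.md"))
```

Because the state round-trips through the file, a brand-new conversation can reconstruct exactly where the workflow stopped.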
Problem: Manual recovery (Phase 1) requires users to reference checkpoint files after compaction, adding cognitive load and causing 60% workflow abandonment rate.
Solution: /map-resume command detects .map/progress.md checkpoint and offers to resume incomplete workflow with a simple Y/n prompt.
Architecture:
User runs /map-resume command
↓
Command checks .map/progress.md existence
↓
[Checkpoint exists?]
↓ Yes
Parse YAML frontmatter for workflow state
↓
Display progress summary:
- Task plan
- Completed subtasks (with checkmarks)
- Remaining subtasks
↓
Prompt: "Resume from last checkpoint? [Y/n]"
↓
[User confirms?]
↓ Yes
Load task plan from .map/<branch>/task_plan_*.md
↓
Continue Actor→Monitor loop for remaining subtasks
↓
[Workflow continues from checkpoint]
Implementation:
| Component | Location | Purpose |
|---|---|---|
| Resume command | .claude/commands/map-resume.md | User-facing recovery workflow |
| WorkflowState class | src/mapify_cli/workflow_state.py | Checkpoint serialization/deserialization |
| Checkpoint file | .map/progress.md | YAML frontmatter + markdown progress |
| Task plan | .map/<branch>/task_plan_*.md | Subtask decomposition with validation |
| Unit tests | tests/test_workflow_state.py | WorkflowState logic coverage |
Execution Flow:
1. User runs /map-resume - Explicit recovery command (no auto-injection)
2. Command checks checkpoint - Tests if .map/progress.md exists
3. YAML frontmatter parsed - WorkflowState.load() extracts machine state
4. Progress summary displayed - Shows completed/remaining subtasks
5. User confirms Y/n - Simple prompt, Y resumes, n clears checkpoint
6. Task plan loaded - Full decomposition with validation criteria
7. Workflow resumes - Actor→Monitor loop continues from last incomplete subtask
Security Validation (Defense-in-Depth):
All validation layers use AND logic - checkpoint must pass all 4 layers to be injected.
Layer 1: Path Traversal Prevention
Rationale: Prevent attackers from injecting arbitrary files (e.g., ../../../etc/passwd)
Implementation:
# Resolve to absolute path (handles .., symlinks)
resolved = Path(file_path).resolve()
base_path = Path(".map").resolve()
# Security check: Ensure resolved path is within .map/
if not resolved.is_relative_to(base_path):
    return {"valid": False, "error": "Path traversal detected"}
Rejects:
- Absolute paths outside .map/
- Symlinks pointing outside .map/
- Relative paths with ../ escaping .map/
Layer 2: Size Bomb Protection
Rationale: Prevent memory exhaustion attacks via multi-GB files
Implementation:
MAX_FILE_SIZE_BYTES = 256 * 1024 # 256KB
# Check size BEFORE reading into memory
size_bytes = file_path.stat().st_size
if size_bytes > MAX_FILE_SIZE_BYTES:
    size_kb = size_bytes // 1024
    return {"valid": False, "error": f"File too large: {size_kb}KB exceeds 256KB limit"}
Performance: File size check completes in <0.05s without loading file content
Layer 3: UTF-8 Validation
Rationale: Prevent binary file injection (executables, images, malformed text)
Implementation:
# Strict UTF-8 decoding - raises UnicodeDecodeError on invalid bytes
content = file_path.read_text(encoding='utf-8', errors='strict')
Rejects:
- Binary files (executables, images)
- Non-UTF-8 encoded text
- Files with invalid byte sequences
Layer 4: Content Sanitization
Rationale: Prevent terminal injection via ANSI escape codes and control characters
Implementation:
# Regex strips control characters except newlines (\n) and tabs (\t)
CONTROL_CHAR_PATTERN = re.compile(r'[\x00-\x08\x0b-\x0d\x0e-\x1f\x7f\u0080-\u009f\u2028\u2029]')
sanitized = CONTROL_CHAR_PATTERN.sub('', content)
Removes:
- NULL bytes (\x00)
- ANSI escape codes (\x1b[...)
- Carriage returns (\r) for terminal safety
- Unicode control characters (\u2028, \u2029)
Preserves:
- Newlines (\n) - Required for markdown formatting
- Tabs (\t) - Required for code indentation
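Assembled from the snippets above, the four AND-gated layers could be combined into a single validator like this sketch (not the shipped implementation):

```python
import re
from pathlib import Path

MAX_FILE_SIZE_BYTES = 256 * 1024  # 256KB
CONTROL_CHAR_PATTERN = re.compile(
    r'[\x00-\x08\x0b-\x0d\x0e-\x1f\x7f\u0080-\u009f\u2028\u2029]')

# Sketch only: stitches the four layers from the snippets above into one
# AND-gated check; the shipped validator is more thorough.
def validate_checkpoint(file_path: str) -> dict:
    # Layer 1: path traversal prevention
    resolved = Path(file_path).resolve()
    base_path = Path(".map").resolve()
    if not resolved.is_relative_to(base_path):
        return {"valid": False, "error": "Path traversal detected"}
    # Layer 2: size bomb protection (check size BEFORE reading)
    if resolved.stat().st_size > MAX_FILE_SIZE_BYTES:
        return {"valid": False, "error": "File too large"}
    # Layer 3: strict UTF-8 validation
    try:
        content = resolved.read_text(encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return {"valid": False, "error": "Invalid UTF-8"}
    # Layer 4: strip control characters (newlines and tabs survive)
    return {"valid": True, "content": CONTROL_CHAR_PATTERN.sub("", content)}

# Demo (illustrative): a checkpoint inside .map/ passes; /etc/passwd does not.
Path(".map").mkdir(exist_ok=True)
Path(".map/demo.md").write_text("hi\x00there\n", encoding="utf-8")
ok = validate_checkpoint(".map/demo.md")
blocked = validate_checkpoint("/etc/passwd")
```

Ordering matters: the size check runs before any read, so a multi-GB file is rejected without touching memory.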
Bash Hook Limitations:
Claude Code hooks run in subprocess with restricted capabilities:
| Capability | Available? | Workaround |
|---|---|---|
| MCP tool access | ❌ No | Hooks can't call MCP tools like sequential-thinking |
| Python imports | ❌ No | Must call separate Python script via subprocess |
| Async operations | ❌ No | Synchronous execution only (5s timeout) |
| External scripts | ✅ Yes | Can call python3, jq, bash utilities |
| Filesystem access | ✅ Yes | Direct read/write to .map/ directory |
Why no MCP tools? Hooks execute in isolated subprocess without access to Claude Code's MCP server connections. Use helpers for complex logic.
Performance Characteristics:
| Metric | Typical | Maximum | Notes |
|---|---|---|---|
| Total execution time | <0.5s | 5s | Hook timeout enforced by Claude Code |
| Validation overhead | ~0.1s | 0.2s | 4-layer security checks |
| File I/O | <0.05s | 0.1s | Read 256KB checkpoint file |
| JSON parsing | <0.01s | 0.02s | Parse validator output with jq |
Test Results (64 total tests):
- ✅ 41 unit tests (validation logic) - 95% coverage
- ✅ 23 integration tests (end-to-end hook) - All pass
- ✅ Security tests: Path traversal, size bombs, control characters, UTF-8 errors
- ✅ Performance tests: <0.5s for 5KB checkpoint, <1s for 256KB checkpoint
Integration with .map/ Persistence:
Without Recovery vs With /map-resume:
Without Recovery With /map-resume
──────────────── ────────────────
Context exhausted Context exhausted
↓ ↓
Workflow state lost .map/progress.md persists
↓ ↓
Start over from scratch User runs /map-resume
↓ ↓
Re-explain everything Checkpoint parsed
↓ ↓
[Workflow abandoned] Progress summary shown
↓
User confirms Y/n
↓
[Workflow continues]
Key Differences:
| Aspect | Phase 1 (Manual) | Phase 2 (Automatic) |
|---|---|---|
| User action required | ✅ Yes (copy/paste paths) | ❌ No (zero-touch) |
| Cognitive load | Medium (remember 3 file paths) | Zero (invisible) |
| Error prone | Yes (typos, wrong files) | No (validated automatically) |
| Workflow abandonment | ~30% (users forget) | ~5% (edge cases only) |
| Time to resume | 30-60s (manual steps) | 0s (instant) |
Benefits:
- ✅ Zero cognitive load - Users never think about compaction recovery
- ✅ Seamless UX - Invisible to users, "just works" experience
- ✅ Secure by design - 4-layer validation prevents all known attack vectors
- ✅ Always current - Reads latest checkpoint (auto-saved by Phase 1)
- ✅ Non-blocking - Hook failures don't prevent session start (exit 0)
- ✅ Observable - Logs to stderr for debugging ([session-start] ...)
- ✅ Tested - 64 tests with >90% coverage
Failure Modes & Handling:
All failures are non-blocking - hook returns {"continue": true} and logs error to stderr:
| Failure Scenario | Hook Behavior | User Impact |
|---|---|---|
| No checkpoint file | Skip injection, continue | None (new session, expected) |
| Validator script missing | Skip injection, continue | None (fallback to Phase 1 manual) |
| Path traversal detected | Reject file, continue | None (security protection) |
| File too large (>256KB) | Reject file, continue | None (size bomb protection) |
| Invalid UTF-8 encoding | Reject file, continue | None (binary file protection) |
| Control characters found | Sanitize + inject | None (transparent cleanup) |
| Validator crashes | Skip injection, continue | None (error logged to stderr) |
Design Principle: Session start must always succeed. Security validation prevents injection of malicious content, but never blocks users from starting new sessions.
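The table above reduces to one rule: log the failure to stderr and continue anyway. A sketch of that wrapper shape (names illustrative, not the hook's actual code):

```python
import sys

# Sketch of the "never block session start" rule: any failure is logged
# to stderr and the hook still reports continue=true.
def run_hook(inject) -> dict:
    try:
        context = inject()
        return {"continue": True, "context": context}
    except Exception as exc:  # validator crash, missing file, etc.
        print(f"[session-start] skipped injection: {exc}", file=sys.stderr)
        return {"continue": True}

def broken_injector():
    raise FileNotFoundError(".map/progress.md missing")

# Session start proceeds in both the success and failure cases.
result = run_hook(broken_injector)
```

Note the deliberate asymmetry: security checks can reject the checkpoint's content, but nothing in this path can reject the session.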
References:
- User research: Reddit feedback analysis showing 60% manual recovery confusion rate
- Implementation: Phase 2 addresses Monitor finding: "Missing compaction recovery workflow docs"
Problem: Debugging failed workflows requires manual correlation of agent outputs.
Solution: Structured logging with workflow context in .map/workflow_logs/.
Log Format:
Note: subtask_id is an integer (not string) matching the id field from TaskDecomposer output. TaskDecomposer generates subtask IDs as sequential integers: 1, 2, 3, etc.
{
"task_id": "feat_auth_20251023_143022",
"goal": "Implement JWT authentication",
"start_time": "2025-10-23T14:30:22Z",
"subtasks": [
{
"subtask_id": 1,
"description": "Create User model",
"status": "completed",
"iterations": 1,
"agents": {
"actor": {
"start_time": "2025-10-23T14:30:25Z",
"end_time": "2025-10-23T14:31:10Z",
"duration_seconds": 45,
"output_summary": "Generated User model with password hashing"
},
"monitor": {
"validation_passed": true,
"issues": []
},
"evaluator": {
"overall_score": 8.5,
"approved": true
}
}
}
]
}
Implementation:
- Class: MapWorkflowLogger (246 lines)
- Location: scripts/utils/map_workflow_logger.py
- API:
  logger = MapWorkflowLogger(task_id, goal)
  logger.start_subtask(subtask_id, description)
  logger.log_agent_output(agent_name, output)
  logger.complete_subtask(subtask_id, status="completed")
  logger.finalize()
Benefits:
- ✅ Post-mortem analysis of failures
- ✅ Performance benchmarking per agent
- ✅ Audit trail for compliance
- ✅ Metrics dashboard input
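Given the log schema above, a post-mortem script can aggregate per-agent timings and scores; summarize() is a hypothetical helper, though the field names follow the example log:

```python
# Post-mortem sketch over the log schema shown above; summarize() is a
# hypothetical helper, but the field names match the example log.
def summarize(log: dict) -> dict:
    durations = [st["agents"]["actor"]["duration_seconds"] for st in log["subtasks"]]
    scores = [st["agents"]["evaluator"]["overall_score"] for st in log["subtasks"]]
    return {
        "subtasks": len(log["subtasks"]),
        "avg_actor_seconds": sum(durations) / len(durations),
        "avg_evaluator_score": sum(scores) / len(scores),
    }

sample = {"subtasks": [{"agents": {"actor": {"duration_seconds": 45},
                                   "evaluator": {"overall_score": 8.5}}}]}
summary = summarize(sample)
```

The same aggregation over many log files would feed the metrics-dashboard use case listed above.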
Problem: Verbose agent outputs waste tokens without adding value.
Changes:
1. Monitor: Reduced validation output verbosity (-9.6% tokens)
   - Before: Full code review with line-by-line feedback
   - After: Issue summaries with severity and category
2. Evaluator: Structured scoring format
   - Before: Prose explanation of scores
   - After: JSON scores + brief rationale
Results:
- ✅ 9.6% overall token reduction (Monitor, Evaluator)
- ✅ Maintained validation quality (no decrease in approval rates)
- ✅ Faster parsing of agent outputs
Phase 1 ✅ COMPLETED (2025-10-18):
- RecitationManager (482 lines): Recitation Pattern for focus
- MapWorkflowLogger (246 lines): Detailed workflow logging
- Pattern limit=5: Limit retrieved patterns
- Template Optimization: Optimize verbose outputs (-9.6% tokens)
Phase 1 Results:
- ✅ 9.6% reduction in token usage (Monitor, Evaluator templates)
- ✅ Documentation-driven orchestration architecture
- ✅ 728 lines of new infrastructure
Phase 2 (Prioritized):
- Checkpoints (high impact) — Workflow resumption after interruption
- MCP caching (medium-high) — Latency reduction for MCP servers
- Keyword+semantic search (medium) — Hybrid retrieval accuracy
- Pattern variation (low-medium) — Few-shot bias reduction
Phase 3-4: Parallelism, auto-testing, temperature per agent
Research Foundation:
Target KPIs:
- Monitor approval rate: >80% first try (current: varies by task complexity)
- Evaluator scores: average >7.0/10 (approval threshold)
- Iteration count: <3 per subtask (indicates clear feedback)
- Knowledge growth: increasing high-quality patterns over time
Tracking:
# View metrics dashboard
python scripts/analyze-metrics.py
# Check specific workflow
cat .map/workflow_logs/feat_auth_20251023_143022.json | jq '.subtasks[].agents.evaluator.overall_score'
- MAP Paper - Nature Communications
- Context Engineering for AI Agents (Manus.im)
- Claude Code Documentation
For usage examples and best practices, see USAGE.md. For installation and setup, see README.md.