Plan Mode Integration - Implementation Milestones

Implementation Milestones

Version: 1.0
Based On: PRD.md and TDD.md (Plan Mode Integration)
Target: Deepen Plan Mode integration with quality guardrails and rich context
Prerequisites: Phases 0-6 complete (v2.9.20+)


Table of Contents

  1. Executive Summary
  2. Phase 7: Foundation - Plan Mode Detection & Hook Infrastructure
  3. Phase 8: MCP Context Integration
  4. Phase 9: Quality Guardrails Framework
  5. Phase 10: Auto-Revision System
  6. Phase 11: Prompt Augmentation & Exploration Hints
  7. Phase 12: Plan QA Verification
  8. Phase 13: Polish, Testing & Documentation
  9. Phase 14: MCP Server Enhancement - Testing Foundation
  10. Phase 15: MCP Server Code Quality & Documentation
  11. Phase 16: MCP Server CI/CD Completion
  12. Phase 17: MCP Server Error Handling & Security
  13. Phase 18: MCP Server Resilience & Resource Management
  14. Phase 19: MCP Server Structured Logging
  15. Appendix: Rule Specifications

Executive Summary

Vision

Enhance Claude Code's Plan Mode to produce implementation plans that are thorough, context-aware, and aligned with project best practices. The system proactively catches architectural/design issues during planning and ensures plans include necessary tasks (testing, documentation) while avoiding redundant or ill-advised changes.

Key Design Decisions

  1. Hook into Claude Code's Plan Mode - Use hooks/MCP to enhance native Plan Mode behavior
  2. Auto-revise plans - Automatically add missing tasks before showing to user
  3. Full MCP context - Code index, design docs, and issue tracking (Linear, GitHub)
  4. Hybrid parallelism - Inject prompts that guide Claude Code's sub-agent exploration

Current State (Prerequisites Complete)

| Component | Status | Notes |
| --- | --- | --- |
| CLI Infrastructure | v2.9.20 | Full claude-indexer with hooks |
| MCP Server | Complete | 8+ tools, streaming responses |
| Hook Framework | Complete | SessionStart, UserPromptSubmit, PreToolUse, PostToolUse |
| Memory Guard v4.3 | Complete | 27 pattern checks |
| UI Consistency | Complete | 15+ rules, 3-tier architecture |

Success Metrics (from PRD)

  • Plan Completeness: >90% of plans include test/doc tasks when appropriate
  • Duplicate Detection: Existing code reuse suggested in >80% of applicable cases
  • Plan Approval Rate: <10% of plans require user revision before approval
  • User Confidence: Reviewers have minimal additional pointers to add
  • Performance: <100ms overhead for plan augmentation

Phase 7: Foundation - Plan Mode Detection & Hook Infrastructure

Goal: Establish the infrastructure for detecting Plan Mode activation and intercepting plan generation.

Milestone 7.1: Plan Mode Detection

Objective: Reliably detect when Claude Code enters Plan Mode.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 7.1.1 | Create hooks/plan_mode_detector.py with pattern detection | HIGH | DONE |
| 7.1.2 | Implement explicit marker detection (@agent-plan, @plan) | HIGH | DONE |
| 7.1.3 | Add planning keyword detection (confidence scoring) | MEDIUM | DONE |
| 7.1.4 | Implement environment variable detection (CLAUDE_PLAN_MODE) | MEDIUM | DONE |
| 7.1.5 | Create session state tracking for Plan Mode persistence | MEDIUM | DONE |
| 7.1.6 | Add unit tests for detection accuracy (>95% target) | HIGH | DONE |

Detection Patterns:

# Explicit markers (1.0 confidence)
EXPLICIT_PATTERNS = r'@agent-plan|@plan\b|--plan\b|plan\s*mode'

# Planning keywords (0.7 confidence)
PLANNING_KEYWORDS = r'\b(create|make|write|design|implement)\s+(a\s+)?plan\b'
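
The two pattern tiers and the environment variable could be combined into a single confidence score roughly as follows. This is a minimal sketch, not the contents of hooks/plan_mode_detector.py: the names `detect_plan_mode` and `DetectionResult`, the accepted values of `CLAUDE_PLAN_MODE`, and case-insensitive matching are assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Mapping

# Patterns copied from the milestone; IGNORECASE is an assumption.
EXPLICIT_PATTERNS = re.compile(
    r'@agent-plan|@plan\b|--plan\b|plan\s*mode', re.IGNORECASE
)
PLANNING_KEYWORDS = re.compile(
    r'\b(create|make|write|design|implement)\s+(a\s+)?plan\b', re.IGNORECASE
)


@dataclass(frozen=True)
class DetectionResult:  # hypothetical result type
    is_plan_mode: bool
    confidence: float
    source: str  # "env", "explicit", "keyword", or "none"


def detect_plan_mode(
    prompt: str, env: Mapping[str, str] | None = None
) -> DetectionResult:
    """Check the strongest signal first and return on the first match."""
    env = env or {}
    # Assumed truthy values; the real detector may accept others.
    if env.get("CLAUDE_PLAN_MODE", "").lower() in {"1", "true"}:
        return DetectionResult(True, 1.0, "env")
    if EXPLICIT_PATTERNS.search(prompt):
        return DetectionResult(True, 1.0, "explicit")
    if PLANNING_KEYWORDS.search(prompt):
        return DetectionResult(True, 0.7, "keyword")
    return DetectionResult(False, 0.0, "none")
```

Passing the environment as an argument rather than reading `os.environ` directly keeps the function easy to unit test, which matters for the per-method tests listed below.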

Testing Requirements:

  • Unit tests for each detection method
  • Test with real Claude Code Plan Mode sessions
  • Verify <10ms detection latency

Success Criteria:

  • >95% detection accuracy for Plan Mode (achieved: 100% on 30-case benchmark)
  • <10ms detection latency (achieved: <1ms average)
  • Zero false positives on non-plan prompts

Milestone 7.2: Hook Infrastructure Extension

Objective: Extend existing hook system to support Plan Mode interception.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 7.2.1 | Extend UserPromptSubmit hook for Plan Mode detection | HIGH | DONE |
| 7.2.2 | Create plan context injection mechanism | HIGH | DONE |
| 7.2.3 | Implement hook chaining for plan augmentation | MEDIUM | DONE |
| 7.2.4 | Add Plan Mode state to SessionContext | MEDIUM | DONE |
| 7.2.5 | Create PlanModeContext dataclass for state tracking | MEDIUM | DONE |
| 7.2.6 | Update .claude/settings.json template for Plan hooks | LOW | DONE |

Implementation Details (Milestone 7.2 Complete):

  • Created claude_indexer/hooks/planning/ package with:
    • guidelines.py - PlanningGuidelinesGenerator (<20ms)
    • exploration.py - ExplorationHintsGenerator (<30ms)
    • injector.py - PlanContextInjector (<50ms total)
  • Modified hooks/prompt_handler.py to inject guidelines and hints
  • Configuration via CLAUDE_PLAN_MODE_CONFIG env var or CLAUDE_PLAN_MODE_COMPACT

Hook Flow:

UserPromptSubmit Hook
       |
       +---> Plan Mode Detection
       |        |
       |        +---> (Yes) -> Inject Planning Guidelines
       |        |               |
       |        |               +---> Generate Exploration Hints
       |        |               |
       |        |               +---> Enable Plan QA Verification
       |        |
       |        +---> (No) -> Pass through unchanged
       |
       +---> Continue to Claude
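
The flow above can be sketched as a single handler. Everything here is illustrative: `SessionState`, `on_user_prompt_submit`, the placeholder guideline and hint text, and the choice to prepend the injected context to the prompt are assumptions, not the behavior of hooks/prompt_handler.py.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

PLAN_MARKER = re.compile(
    r'@agent-plan|@plan\b|--plan\b|plan\s*mode', re.IGNORECASE
)


@dataclass
class SessionState:  # hypothetical stand-in for SessionContext.plan_mode
    plan_mode: bool = False
    qa_enabled: bool = False


def planning_guidelines() -> str:
    # Placeholder text; the real generator builds this from a template.
    return "=== PLANNING QUALITY GUIDELINES ==="


def exploration_hints(prompt: str) -> str:
    # Placeholder text; the real generator derives hints from the prompt.
    return f"=== EXPLORATION HINTS ===\nSearch the index for: {prompt[:60]}"


def on_user_prompt_submit(prompt: str, state: SessionState) -> str:
    """Return the text forwarded to Claude for this turn."""
    if not (state.plan_mode or PLAN_MARKER.search(prompt)):
        return prompt  # (No) branch: pass through unchanged
    # (Yes) branch: remember Plan Mode for later turns and enable QA.
    state.plan_mode = True
    state.qa_enabled = True
    return "\n\n".join(
        [planning_guidelines(), exploration_hints(prompt), prompt]
    )
```

Because the state object is updated on the first detected turn, follow-up prompts without an explicit marker still receive the injected context.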

Testing Requirements:

  • Test hook invocation order
  • Test state persistence across turns
  • Verify non-blocking behavior

Documentation:

  • Update HOOKS.md with Plan Mode hooks
  • Add configuration examples

Success Criteria:

  • Hooks execute in correct order
  • State persists correctly (via SessionContext.plan_mode)
  • <50ms total hook overhead (guidelines <20ms, hints <30ms)

Phase 8: MCP Context Integration

Goal: Enable Plan Mode to query rich context via MCP - code index, design docs, and issue trackers.

Milestone 8.1: Design Document Indexing

Objective: Index and search design documents (PRD, TDD, ADR, specs).

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 8.1.1 | Create new EntityTypes: SPEC, PRD, TDD, ADR, REQUIREMENT | HIGH | DONE |
| 8.1.2 | Implement DesignDocParser in claude_indexer/analysis/ | HIGH | DONE |
| 8.1.3 | Add document type detection (patterns in filename/content) | HIGH | DONE |
| 8.1.4 | Implement section extraction from markdown | MEDIUM | DONE |
| 8.1.5 | Extract individual requirements as separate entities | MEDIUM | DONE |
| 8.1.6 | Create relations between docs and code components | MEDIUM | DONE |
| 8.1.7 | Add design doc paths to configuration | LOW | DONE |

Entity Types:

class EntityType(Enum):
    # Existing types...
    SPEC = "spec"                    # Design specifications
    PRD = "prd"                      # Product requirements documents
    TDD = "tdd"                      # Technical design documents
    ADR = "adr"                      # Architecture decision records
    REQUIREMENT = "requirement"      # Individual requirements from specs

DesignDocParser:

class DesignDocParser(CodeParser):
    """Parser for design documents (PRD, TDD, ADR, specs)."""

    DOC_TYPE_PATTERNS = {
        "prd": [r"product\s+requirements?\s+document", r"^prd"],
        "tdd": [r"technical\s+design\s+document", r"^tdd"],
        "adr": [r"architecture\s+decision\s+record", r"^adr"],
        "spec": [r"specification", r"^spec"],
    }

    def parse(self, file_path: Path) -> ParserResult:
        # Extract sections, requirements, decisions
        ...
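
One way the `DOC_TYPE_PATTERNS` table could drive type detection is sketched below. The function name `detect_doc_type`, the filename-before-content precedence, and the 2000-character content window are assumptions for illustration; the actual parser may weigh the signals differently.

```python
from __future__ import annotations

import re
from pathlib import Path

# Copied from DesignDocParser above.
DOC_TYPE_PATTERNS = {
    "prd": [r"product\s+requirements?\s+document", r"^prd"],
    "tdd": [r"technical\s+design\s+document", r"^tdd"],
    "adr": [r"architecture\s+decision\s+record", r"^adr"],
    "spec": [r"specification", r"^spec"],
}


def _first_match(text: str) -> str | None:
    """Return the first doc type whose patterns match the text."""
    for doc_type, patterns in DOC_TYPE_PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return doc_type
    return None


def detect_doc_type(
    file_path: Path, content: str, head_chars: int = 2000
) -> str | None:
    """Try the filename first, then the start of the content."""
    # Assumption: a filename hit outranks any mention in the body, so a
    # TDD that cites the PRD is still classified as a TDD.
    by_name = _first_match(file_path.stem.lower())
    if by_name is not None:
        return by_name
    return _first_match(content[:head_chars].lower())
```

Checking the filename first avoids misclassifying documents that merely reference another document type in their introduction.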

Configuration (in unified_config.py):

class DesignDocsConfig(BaseModel):
    enabled: bool = True
    paths: list[str] = ["docs/", "specs/", "design/", "*.md"]
    doc_patterns: dict[str, str] = {
        "prd": "**/PRD*.md",
        "tdd": "**/TDD*.md",
        "adr": "**/adr/*.md",
    }

Testing Requirements:

  • Test document type detection with various formats
  • Test section extraction from real PRD/TDD files
  • Test requirement entity creation
  • Verify relations to code components

Success Criteria:

  • Auto-detect document type with >90% accuracy
  • Extract sections and requirements correctly
  • Create searchable entities for all design docs

Implementation Details (Milestone 8.1 Complete):

  • Created claude_indexer/analysis/design_doc_parser.py with DesignDocParser
  • Added 5 new EntityTypes: SPEC, PRD, TDD, ADR, REQUIREMENT
  • Added DesignDocsConfig to claude_indexer/config/unified_config.py
  • Document type detection via filename patterns and content patterns
  • Section extraction respects configurable max_section_depth
  • Requirement extraction supports RFC 2119 (MUST/SHALL/SHOULD/MAY), [REQ-XXX], and numbered lists
  • Relations created between documents, sections, and requirements
  • Tests: 30 unit tests covering all functionality
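
The three requirement styles named above (RFC 2119 keywords, `[REQ-XXX]` tags, numbered lists) could be recognized line by line as in this sketch. The `Requirement` type, the `kind` labels, and the tag-before-keyword precedence are assumptions for illustration, not the parser's actual output format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# RFC 2119 keywords are matched in upper case only, as the RFC writes them.
RFC2119 = re.compile(r'\b(MUST|SHALL|SHOULD|MAY)\b')
TAGGED = re.compile(r'\[(REQ-[A-Za-z0-9]+)\]')
NUMBERED = re.compile(r'^\d+\.\s+\S')


@dataclass(frozen=True)
class Requirement:  # hypothetical record type
    kind: str   # "tagged", "rfc2119", or "numbered"
    text: str
    line: int   # 1-based line number in the source document


def extract_requirements(markdown: str) -> list[Requirement]:
    """Classify each non-empty line; skip lines that match no style."""
    found: list[Requirement] = []
    for number, raw in enumerate(markdown.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if TAGGED.search(line):
            kind = "tagged"
        elif RFC2119.search(line):
            kind = "rfc2119"
        elif NUMBERED.match(line):
            kind = "numbered"
        else:
            continue
        found.append(Requirement(kind, line, number))
    return found
```

Keeping the line number on each record is what would allow results to be returned sorted by position in the document.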

Milestone 8.2: New MCP Tools for Documents

Objective: Add MCP tools for searching and retrieving design documents.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 8.2.1 | Add search_docs tool to MCP server | HIGH | DONE |
| 8.2.2 | Add get_doc tool for full document retrieval | HIGH | DONE |
| 8.2.3 | Implement docTypes filtering (prd, tdd, adr, spec) | MEDIUM | DONE |
| 8.2.4 | Add section-specific retrieval | LOW | DONE |
| 8.2.5 | Create TypeScript interfaces for doc tools | MEDIUM | DONE |
| 8.2.6 | Add validation for doc tool inputs | MEDIUM | DONE |

MCP Tool: search_docs:

{
  name: "search_docs",
  description: "Search design documents, specifications, PRDs, and ADRs",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      docTypes: {
        type: "array",
        items: { type: "string" },
        description: "Filter: prd, tdd, spec, adr"
      },
      limit: { type: "number", default: 10 }
    },
    required: ["query"]
  }
}
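
The input schema above implies a handful of checks before a search runs. The sketch below mirrors them in Python for readability; the real validators live in the TypeScript server, and the function name, error messages, and the rejection of blank queries are assumptions.

```python
from __future__ import annotations

from typing import Any

VALID_DOC_TYPES = {"prd", "tdd", "spec", "adr"}


def validate_search_docs_request(args: dict[str, Any]) -> dict[str, Any]:
    """Return normalized arguments or raise ValueError."""
    query = args.get("query")
    # Assumption: a blank query is rejected, not just a missing one.
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")

    doc_types = args.get("docTypes", [])
    if not isinstance(doc_types, list) or any(
        not isinstance(t, str) or t not in VALID_DOC_TYPES for t in doc_types
    ):
        raise ValueError("docTypes must be a list of: prd, tdd, spec, adr")

    limit = args.get("limit", 10)  # schema default
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, (int, float)) or limit < 1:
        raise ValueError("limit must be a positive number")

    return {"query": query, "docTypes": doc_types, "limit": int(limit)}
```

A call with only `{"query": "auth flow"}` comes back with an empty `docTypes` list and the default `limit` of 10 filled in.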

MCP Tool: get_doc:

{
  name: "get_doc",
  description: "Retrieve full content of a specific design document",
  inputSchema: {
    type: "object",
    properties: {
      docId: { type: "string", description: "Document ID or file path" },
      section: { type: "string", description: "Optional: specific section" }
    },
    required: ["docId"]
  }
}

Testing Requirements:

  • Test search with various queries
  • Test filtering by document type
  • Test full document retrieval
  • Benchmark search latency (<50ms)

Success Criteria:

  • Both tools implemented and tested
  • Filter by document type works correctly
  • <50ms search latency

Implementation Details (Milestone 8.2 Complete):

  • Created DocSearchResult and DocContent interfaces in mcp-qdrant-memory/src/types.ts
  • Added SearchDocsRequest and GetDocRequest interfaces in validation.ts
  • Implemented validateSearchDocsRequest and validateGetDocRequest validators
  • Added searchDocs() and getDoc() methods to QdrantPersistence class
  • Added search_docs and get_doc tool definitions to MCP server
  • Supports docTypes filtering: prd, tdd, adr, spec
  • Section-specific retrieval via section parameter
  • Multi-project support via collection parameter
  • Returns sections sorted by line number, requirements with type classification

Milestone 8.3: Issue Tracker Integration

Objective: Enable querying tickets from Linear and GitHub Issues.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 8.3.1 | Create claude_indexer/integrations/ package | HIGH | DONE |
| 8.3.2 | Implement LinearClient with GraphQL queries | HIGH | DONE |
| 8.3.3 | Implement GitHubIssuesClient with REST API | HIGH | DONE |
| 8.3.4 | Create TicketEntity data model | HIGH | DONE |
| 8.3.5 | Add search_tickets MCP tool | HIGH | DONE |
| 8.3.6 | Add get_ticket MCP tool with comments/PRs | HIGH | DONE |
| 8.3.7 | Implement authentication configuration | MEDIUM | DONE |
| 8.3.8 | Add rate limiting and caching | MEDIUM | DONE |
| 8.3.9 | Create ticket sync service (background) | LOW | DEFERRED |

TicketEntity Data Model:

@dataclass
class TicketEntity:
    id: str                           # e.g., "AVO-123", "github#456"
    source: str                       # "linear", "github"
    title: str
    description: str
    status: str                       # "open", "in_progress", "done"
    assignee: str | None
    labels: list[str]
    priority: str | None
    acceptance_criteria: list[str]    # Extracted requirements
    linked_prs: list[str]             # PR references

    def to_entity(self) -> Entity:
        """Convert to standard Entity for indexing."""
        ...

LinearClient:

class LinearClient:
    BASE_URL = "https://api.linear.app/graphql"

    async def search_issues(
        self,
        query: str,
        status: list[str] | None = None,
        labels: list[str] | None = None,
        limit: int = 20
    ) -> list[TicketEntity]:
        ...

    async def get_issue(self, identifier: str) -> TicketEntity:
        ...

MCP Tool: search_tickets:

{
  name: "search_tickets",
  description: "Search issue tracker for relevant tickets (Linear, GitHub)",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string" },
      status: { type: "array", items: { type: "string" } },
      labels: { type: "array", items: { type: "string" } },
      source: { type: "string", enum: ["linear", "github", "all"] }
    },
    required: ["query"]
  }
}

Configuration:

class LinearConfig(BaseModel):
    enabled: bool = False
    api_key: str = ""  # From LINEAR_API_KEY env var
    team_id: str = ""

class GitHubIssuesConfig(BaseModel):
    enabled: bool = False
    token: str = ""  # From GITHUB_TOKEN env var
    owner: str = ""
    repo: str = ""
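
The comments above say the credentials come from `LINEAR_API_KEY` and `GITHUB_TOKEN`. A minimal sketch of that lookup follows, using plain dataclasses as stand-ins for the Pydantic models; the loader functions and the `ready` property are hypothetical helpers, not part of unified_config.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass
class LinearConfig:  # dataclass stand-in for the Pydantic model
    enabled: bool = False
    api_key: str = ""
    team_id: str = ""

    @property
    def ready(self) -> bool:
        # Hypothetical helper: usable only when opted in AND a key exists.
        return self.enabled and bool(self.api_key)


@dataclass
class GitHubIssuesConfig:  # dataclass stand-in for the Pydantic model
    enabled: bool = False
    token: str = ""
    owner: str = ""
    repo: str = ""

    @property
    def ready(self) -> bool:
        return self.enabled and bool(self.token)


def linear_config_from_env(
    env: Mapping[str, str], enabled: bool = False, team_id: str = ""
) -> LinearConfig:
    """Fill the API key from the environment; never from the config file."""
    return LinearConfig(
        enabled=enabled, api_key=env.get("LINEAR_API_KEY", ""), team_id=team_id
    )


def github_config_from_env(
    env: Mapping[str, str], enabled: bool = False, owner: str = "", repo: str = ""
) -> GitHubIssuesConfig:
    return GitHubIssuesConfig(
        enabled=enabled, token=env.get("GITHUB_TOKEN", ""), owner=owner, repo=repo
    )
```

Keeping `enabled` as an explicit opt-in, separate from whether a key happens to be set, matches the `enabled: bool = False` defaults shown above.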

Testing Requirements:

  • Test Linear API integration (mock responses)
  • Test GitHub API integration (mock responses)
  • Test authentication handling
  • Test rate limiting

Documentation:

  • API key configuration guide
  • Ticket search examples

Success Criteria:

  • Both integrations work with API keys
  • Search returns relevant tickets
  • Acceptance criteria extracted correctly

Milestone 8.4: Plan Mode Tool Access Control

Objective: Ensure Plan Mode only has read-only access to MCP tools.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 8.4.1 | Create PlanModeGuard class in MCP server | HIGH | DONE |
| 8.4.2 | Define allowed tools list for Plan Mode | HIGH | DONE |
| 8.4.3 | Define blocked tools list (write operations) | HIGH | DONE |
| 8.4.4 | Integrate guard into MCP request handler | HIGH | DONE |
| 8.4.5 | Add set_plan_mode internal tool | MEDIUM | DONE |
| 8.4.6 | Create clear error messages for blocked tools | MEDIUM | DONE |

PlanModeGuard:

const PLAN_MODE_ALLOWED = [
  // Read-only code memory
  "search_similar", "read_graph", "get_implementation",
  // Read-only documents
  "search_docs", "get_doc",
  // Read-only tickets
  "search_tickets", "get_ticket",
];

const PLAN_MODE_BLOCKED = [
  // Write operations
  "create_entities", "create_relations", "add_observations",
  "delete_entities", "delete_observations", "delete_relations",
];

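class PlanModeGuard {
  private isPlanMode: boolean = false;

  setPlanMode(enabled: boolean): void {
    this.isPlanMode = enabled;
  }

  // Outside Plan Mode every tool is allowed. Inside Plan Mode only the
  // write operations listed in PLAN_MODE_BLOCKED are rejected.
  isToolAllowed(toolName: string): boolean {
    if (!this.isPlanMode) return true;
    return !PLAN_MODE_BLOCKED.includes(toolName);
  }
}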
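
For readers following the Python side, the same guard logic can be sketched as below. The class mirrors the TypeScript shape above, but the `check` method, the error payload fields, and the accepted `CLAUDE_PLAN_MODE` values are assumptions for illustration.

```python
from __future__ import annotations

import os
from typing import Mapping

PLAN_MODE_BLOCKED = frozenset({
    "create_entities", "create_relations", "add_observations",
    "delete_entities", "delete_observations", "delete_relations",
})


class PlanModeGuard:
    def __init__(self, enabled: bool = False) -> None:
        self._plan_mode = enabled

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> "PlanModeGuard":
        # Assumed truthy values for CLAUDE_PLAN_MODE.
        env = os.environ if env is None else env
        return cls(env.get("CLAUDE_PLAN_MODE", "").lower() in {"1", "true"})

    def set_plan_mode(self, enabled: bool) -> None:
        self._plan_mode = enabled

    def is_tool_allowed(self, tool_name: str) -> bool:
        if not self._plan_mode:
            return True
        return tool_name not in PLAN_MODE_BLOCKED

    def check(self, tool_name: str) -> dict | None:
        """Return None if allowed, else an error payload (hypothetical shape)."""
        if self.is_tool_allowed(tool_name):
            return None
        return {
            "error": f"Tool '{tool_name}' is blocked in Plan Mode",
            "blocked_tools": sorted(PLAN_MODE_BLOCKED),
            "hint": "Disable Plan Mode with set_plan_mode to run write operations",
        }
```

Note that this is a blocklist check: any tool not named in `PLAN_MODE_BLOCKED` passes, so a newly added write tool would need to be added to the list to be covered.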

Testing Requirements:

  • Test tool blocking in Plan Mode
  • Test allowed tools work correctly
  • Verify error messages are clear

Success Criteria:

  • Write tools blocked in Plan Mode
  • Clear error messages for blocked operations
  • No security bypass possible

Implementation Details (Milestone 8.4 Complete):

  • Created mcp-qdrant-memory/src/plan-mode-guard.ts with PlanModeGuard class
  • Environment variable detection via CLAUDE_PLAN_MODE (matches Python implementation)
  • Blocked tools: create_entities, create_relations, add_observations, delete_entities, delete_observations, delete_relations
  • Allowed tools: search_similar, read_graph, get_implementation, search_docs, get_doc, search_tickets, get_ticket, set_plan_mode
  • Added set_plan_mode MCP tool to enable/disable Plan Mode
  • Error responses include blocked tools list and hint for resolution
  • Version bumped to 0.6.4

Phase 9: Quality Guardrails Framework

Goal: Create the framework for validating implementation plans against quality rules.

Milestone 9.1: Core Guardrails Data Model

Objective: Define data structures for plan validation rules.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 9.1.1 | Create claude_indexer/ui/plan/guardrails/ package | HIGH | DONE |
| 9.1.2 | Define PlanValidationContext dataclass | HIGH | DONE |
| 9.1.3 | Define PlanValidationFinding dataclass | HIGH | DONE |
| 9.1.4 | Define PlanRevision and RevisionType | HIGH | DONE |
| 9.1.5 | Create abstract PlanValidationRule base class | HIGH | DONE |
| 9.1.6 | Create PlanGuardrailConfig for rule configuration | MEDIUM | DONE |

Core Data Structures:

# claude_indexer/ui/plan/guardrails/base.py

class RevisionType(Enum):
    ADD_TASK = "add_task"
    MODIFY_TASK = "modify_task"
    REMOVE_TASK = "remove_task"
    ADD_DEPENDENCY = "add_dependency"
    REORDER_TASKS = "reorder_tasks"

@dataclass
class PlanValidationContext:
    plan: ImplementationPlan
    memory_client: Any | None = None
    collection_name: str | None = None
    project_path: Path = field(default_factory=Path.cwd)
    config: PlanGuardrailConfig | None = None
    source_requirements: str = ""

    def search_memory(self, query: str, **kwargs) -> list[dict]:
        """Search semantic memory for similar code/patterns."""
        ...

@dataclass
class PlanValidationFinding:
    rule_id: str
    severity: Severity
    summary: str
    affected_tasks: list[str] = field(default_factory=list)
    suggestion: str | None = None
    can_auto_revise: bool = False
    confidence: float = 1.0
    evidence: list[Evidence] = field(default_factory=list)
    suggested_revision: PlanRevision | None = None

@dataclass
class PlanRevision:
    revision_type: RevisionType
    rationale: str
    target_task_id: str | None = None
    new_task: Task | None = None
    modifications: dict[str, Any] = field(default_factory=dict)
    dependency_additions: list[tuple[str, str]] = field(default_factory=list)

class PlanValidationRule(ABC):
    @property
    @abstractmethod
    def rule_id(self) -> str: ...

    @property
    @abstractmethod
    def category(self) -> str: ...  # coverage, consistency, architecture, performance

    @abstractmethod
    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]: ...

    @abstractmethod
    def suggest_revision(self, finding: PlanValidationFinding, context: PlanValidationContext) -> PlanRevision | None: ...

Testing Requirements:

  • Unit tests for all data classes
  • Test serialization/deserialization
  • Test validation context memory search

Success Criteria:

  • All data structures defined and tested
  • Follows existing pattern from claude_indexer/rules/base.py
  • Memory search integration works

Implementation Details (Milestone 9.1 Complete):

  • Created claude_indexer/ui/plan/guardrails/ package with base.py, config.py, __init__.py
  • RevisionType enum with ADD_TASK, MODIFY_TASK, REMOVE_TASK, ADD_DEPENDENCY, REORDER_TASKS
  • PlanRevision dataclass with serialization support
  • PlanValidationFinding dataclass with evidence and suggested revision
  • PlanValidationContext with plan, config, memory search integration
  • PlanValidationRule ABC following rules/base.py pattern
  • PlanGuardrailConfig Pydantic model with category toggles, auto-revise settings
  • Added plan_guardrails to UnifiedConfig
  • Tests: 52 unit tests covering all functionality

Milestone 9.2: Plan Guardrail Engine

Objective: Create the engine that orchestrates rule validation.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 9.2.1 | Create PlanGuardrailEngine coordinator class | HIGH | DONE |
| 9.2.2 | Implement rule discovery (auto-load from directory) | HIGH | DONE |
| 9.2.3 | Add parallel rule execution support | MEDIUM | DONE |
| 9.2.4 | Implement severity filtering | MEDIUM | DONE |
| 9.2.5 | Create PlanGuardrailResult aggregation | MEDIUM | DONE |
| 9.2.6 | Add performance timing for rules | LOW | DONE |

PlanGuardrailEngine:

class PlanGuardrailEngine:
    def __init__(self, config: PlanGuardrailConfig):
        self.config = config
        self.rules: dict[str, PlanValidationRule] = {}
        self._discover_rules()

    def _discover_rules(self) -> None:
        """Auto-discover rules from guardrails/rules/ directory."""
        ...

    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]:
        """Run all enabled rules against the plan."""
        findings = []
        for rule_id, rule in self.rules.items():
            if self.config.is_rule_enabled(rule_id):
                findings.extend(rule.validate(context))
        return findings

    def auto_revise(
        self,
        plan: ImplementationPlan,
        findings: list[PlanValidationFinding]
    ) -> RevisedPlan:
        """Apply auto-revisions based on findings."""
        ...

Testing Requirements:

  • Test rule discovery
  • Test parallel execution (Milestone 13.5)
  • Test severity filtering
  • Benchmark validation latency

Success Criteria:

  • Rules auto-discovered from directory
  • Parallel execution reduces latency (Milestone 13.5)
  • <500ms total validation time

Implementation Details (Milestone 9.2 Complete):

  • Created claude_indexer/ui/plan/guardrails/engine.py with:
    • PlanGuardrailEngineConfig dataclass (timeout, error handling, confidence threshold)
    • RuleExecutionResult dataclass for individual rule results
    • PlanGuardrailResult dataclass with findings, statistics, error tracking
    • PlanGuardrailEngine class with full rule lifecycle management
  • Key features:
    • register() / unregister() for manual rule registration
    • discover_rules() for auto-discovery from directory
    • validate() runs all enabled rules with configurable filtering
    • validate_fast() runs only fast rules (<100ms) for sync checks
    • validate_category() runs rules in specific category
    • Confidence threshold filtering
    • Max findings per rule limiting
    • Error handling with continue_on_error option
    • Performance timing recorded per rule and total
  • Tests: 42 unit tests covering all functionality

Parallel Execution (Milestone 13.5):

  • Added parallel_execution flag to PlanGuardrailEngineConfig (default: False)
  • Added max_parallel_workers config option (default: 4)
  • Implemented _validate_parallel() using ThreadPoolExecutor
  • Extracted _validate_sequential() for the original behavior
  • validate() method now conditionally uses parallel or sequential execution
  • Tests: 8 additional tests for parallel execution behavior
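
The switch between sequential and parallel execution described above could look like the following sketch. Rules are reduced to plain callables and findings to strings so the example stays self-contained; the function names and the two toy rules are hypothetical, and only the use of `ThreadPoolExecutor` and the two config option names come from the notes above.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Simplified stand-ins: a rule maps a plan to a list of finding IDs.
Rule = Callable[[dict], list[str]]


def validate_sequential(rules: dict[str, Rule], plan: dict) -> list[str]:
    findings: list[str] = []
    for rule in rules.values():
        findings.extend(rule(plan))
    return findings


def validate_parallel(
    rules: dict[str, Rule], plan: dict, max_workers: int = 4
) -> list[str]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so findings come
        # back in the same order as the sequential path.
        per_rule = list(pool.map(lambda rule: rule(plan), rules.values()))
    return [finding for findings in per_rule for finding in findings]


def validate(
    rules: dict[str, Rule],
    plan: dict,
    parallel_execution: bool = False,
    max_parallel_workers: int = 4,
) -> list[str]:
    if parallel_execution:
        return validate_parallel(rules, plan, max_parallel_workers)
    return validate_sequential(rules, plan)


# Two toy rules for demonstration only.
def needs_tests(plan: dict) -> list[str]:
    has_test = any("test" in task for task in plan["tasks"])
    return [] if has_test else ["PLAN.TEST_REQUIREMENT"]


def needs_docs(plan: dict) -> list[str]:
    has_doc = any("doc" in task for task in plan["tasks"])
    return [] if has_doc else ["PLAN.DOC_REQUIREMENT"]
```

Threads help most when rules wait on I/O such as memory searches; CPU-bound rules would see little gain under the GIL, which may be one reason the flag defaults to False.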

Milestone 9.3: Plan Validation Rules (5 Rules)

Objective: Implement the core quality validation rules.

Tasks

| ID | Task | Priority | Rule Name | Category | Status |
| --- | --- | --- | --- | --- | --- |
| 9.3.1 | Test Requirement Detection | HIGH | PLAN.TEST_REQUIREMENT | coverage | DONE |
| 9.3.2 | Documentation Requirement Detection | HIGH | PLAN.DOC_REQUIREMENT | coverage | DONE |
| 9.3.3 | Duplicate Code Detection | HIGH | PLAN.DUPLICATE_DETECTION | consistency | DONE |
| 9.3.4 | Architectural Consistency Check | MEDIUM | PLAN.ARCHITECTURAL_CONSISTENCY | architecture | DONE |
| 9.3.5 | Performance Pattern Detection | MEDIUM | PLAN.PERFORMANCE_PATTERN | performance | DONE |

Implementation Location: claude_indexer/ui/plan/guardrails/rules/

Rule 1: TestRequirementRule

class TestRequirementRule(PlanValidationRule):
    """Ensures new features have corresponding test tasks."""

    FEATURE_KEYWORDS = ["implement", "add", "create", "build", "develop", "new"]
    TEST_KEYWORDS = ["test", "spec", "unittest", "pytest", "jest"]
    TRIVIAL_PATTERNS = [r"^(fix|rename|move|delete)\s+(typo|comment|readme)"]

    @property
    def rule_id(self) -> str:
        return "PLAN.TEST_REQUIREMENT"

    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]:
        findings = []
        for task in context.plan.all_tasks:
            if self._is_feature_task(task) and not self._has_test_task(task, context.plan):
                if not self._is_trivial(task):
                    findings.append(self._create_finding(task))
        return findings

    def suggest_revision(self, finding, context) -> PlanRevision:
        """Auto-add test task."""
        feature_task = self._get_task(finding.affected_tasks[0], context.plan)
        test_task = Task(
            id=f"TASK-TST-{feature_task.id[-4:]}",
            title=f"Add tests for {feature_task.title}",
            description=f"Write tests for {feature_task.title}",
            scope=feature_task.scope,
            priority=feature_task.priority + 1,
            estimated_effort="low",
            impact=feature_task.impact * 0.8,
            acceptance_criteria=["Unit tests cover main functionality", "Tests pass in CI"],
            dependencies=[feature_task.id],
            tags=["testing", "quality"],
        )
        return PlanRevision(
            revision_type=RevisionType.ADD_TASK,
            rationale=f"Feature '{feature_task.title}' needs test coverage",
            new_task=test_task,
        )
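
The rule above relies on helpers (`_is_feature_task`, `_has_test_task`, `_is_trivial`) that are not shown. A minimal sketch of what they might check follows, written as free functions over a reduced `Task` type. The matching strategy (whole words with optional `s`/`ing`/`ed` suffixes) and the use of `dependencies` to link a test task to its feature are assumptions, not the shipped logic.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

FEATURE_KEYWORDS = ["implement", "add", "create", "build", "develop", "new"]
TEST_KEYWORDS = ["test", "spec", "unittest", "pytest", "jest"]
TRIVIAL_PATTERNS = [r"^(fix|rename|move|delete)\s+(typo|comment|readme)"]


@dataclass
class Task:  # reduced stand-in for the plan's Task type
    id: str
    title: str
    dependencies: list[str] = field(default_factory=list)


def _mentions(text: str, keywords: list[str]) -> bool:
    # Whole-word match so "new" does not fire on "renew".
    return any(
        re.search(rf"\b{re.escape(k)}(s|ing|ed)?\b", text, re.IGNORECASE)
        for k in keywords
    )


def is_test_task(task: Task) -> bool:
    return _mentions(task.title, TEST_KEYWORDS)


def is_feature_task(task: Task) -> bool:
    # "Add tests for X" contains "add" but is not itself a feature.
    return _mentions(task.title, FEATURE_KEYWORDS) and not is_test_task(task)


def is_trivial(task: Task) -> bool:
    return any(re.search(p, task.title, re.IGNORECASE) for p in TRIVIAL_PATTERNS)


def has_test_task(task: Task, tasks: list[Task]) -> bool:
    # Assumption: a test task covers a feature if it depends on it.
    return any(is_test_task(o) and task.id in o.dependencies for o in tasks)


def tasks_missing_tests(tasks: list[Task]) -> list[str]:
    return [
        t.id for t in tasks
        if is_feature_task(t) and not is_trivial(t) and not has_test_task(t, tasks)
    ]
```

Excluding test tasks from the feature check matters for the auto-revision loop: without it, each generated "Add tests for ..." task would itself be flagged as needing tests.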

Rule 2: DuplicateDetectionRule

class DuplicateDetectionRule(PlanValidationRule):
    """Detects tasks that might duplicate existing functionality."""

    SIMILARITY_THRESHOLD = 0.70

    @property
    def rule_id(self) -> str:
        return "PLAN.DUPLICATE_DETECTION"

    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]:
        findings = []
        if context.memory_client is None:
            return findings

        for task in context.plan.all_tasks:
            query = f"{task.title} {task.description[:200]}"
            results = context.search_memory(
                query=query,
                entity_types=["function", "class", "implementation_pattern"]
            )

            for result in results:
                if result.get("score", 0) >= self.SIMILARITY_THRESHOLD:
                    findings.append(self._create_finding(task, result))

        return findings

    def suggest_revision(self, finding, context) -> PlanRevision:
        """Modify task to reference existing code."""
        # Look up the flagged task so its current fields can be extended.
        task = self._get_task(finding.affected_tasks[0], context.plan)
        return PlanRevision(
            revision_type=RevisionType.MODIFY_TASK,
            target_task_id=finding.affected_tasks[0],
            rationale="Potential duplicate detected",
            modifications={
                "description": f"{task.description}\n\n**Note:** Review existing implementation before proceeding.",
                "acceptance_criteria": task.acceptance_criteria + [
                    "Verified no duplication with existing code"
                ],
            }
        )

Testing Requirements:

  • Unit tests for each rule with positive/negative cases
  • Test auto-revision generation
  • Test with real implementation plans
  • Measure false positive rate (<10% target)

Documentation:

  • Rule reference documentation
  • Configuration examples
  • Override mechanisms

Success Criteria:

  • All 5 rules implemented and tested
  • <10% false positive rate
  • Clear, actionable findings

Implementation Details (Milestone 9.3 Complete):

  • Created claude_indexer/ui/plan/guardrails/rules/ package with 5 rules
  • TestRequirementRule: Detects feature tasks without test coverage, auto-suggests test tasks
  • DocRequirementRule: Detects user-facing changes without documentation tasks
  • DuplicateDetectionRule: Uses semantic memory search to find potential duplicate code
  • ArchitecturalConsistencyRule: Validates file paths against project patterns
  • PerformancePatternRule: Detects N+1 queries, missing caching, blocking operations, etc.
  • All rules follow PlanValidationRule ABC pattern
  • Updated guardrails/__init__.py with rule exports
  • Tests: 159 unit tests covering all 5 rules with positive/negative/edge cases

Phase 10: Auto-Revision System

Goal: Automatically apply revisions to plans based on validation findings.

Milestone 10.1: Auto-Revision Engine

Objective: Create the engine that applies plan revisions automatically.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 10.1.1 | Create AutoRevisionEngine class | HIGH | DONE |
| 10.1.2 | Implement revision sorting by severity | HIGH | DONE |
| 10.1.3 | Implement conflict detection | HIGH | DONE |
| 10.1.4 | Add circular dependency prevention | HIGH | DONE |
| 10.1.5 | Implement revision application methods | HIGH | DONE |
| 10.1.6 | Create RevisedPlan result dataclass | MEDIUM | DONE |
| 10.1.7 | Add post-revision dependency resolution | MEDIUM | DONE |
| 10.1.8 | Add iteration limit (prevent infinite loops) | HIGH | DONE |

AutoRevisionEngine:

@dataclass
class AppliedRevision:
    revision: PlanRevision
    finding: PlanValidationFinding
    success: bool
    error: str | None = None

@dataclass
class RevisedPlan:
    original_plan: ImplementationPlan
    revised_plan: ImplementationPlan
    revisions_applied: list[AppliedRevision] = field(default_factory=list)
    revisions_skipped: list[tuple[PlanRevision, str]] = field(default_factory=list)
    qa_passed: bool = False

class AutoRevisionEngine:
    MAX_ITERATIONS = 3

    def revise_plan(
        self,
        plan: ImplementationPlan,
        findings: list[PlanValidationFinding],
        rules: dict[str, PlanValidationRule]
    ) -> RevisedPlan:
        """
        Apply revisions based on findings.

        Algorithm:
        1. Sort findings by severity (CRITICAL > HIGH > MEDIUM > LOW)
        2. For each finding that can be auto-revised:
           a. Get revision suggestion from rule
           b. Check for conflicts
           c. Apply revision if safe
        3. Re-resolve dependencies
        4. Update priorities
        """
        ...

    def _check_conflicts(self, plan: ImplementationPlan, revision: PlanRevision) -> str | None:
        """Check if revision would create conflicts."""
        if revision.revision_type == RevisionType.ADD_TASK:
            if any(t.id == revision.new_task.id for t in plan.all_tasks):
                return f"Task ID '{revision.new_task.id}' already exists"

        if revision.revision_type == RevisionType.ADD_DEPENDENCY:
            for from_id, to_id in revision.dependency_additions:
                if self._would_create_cycle(plan, from_id, to_id):
                    return f"Dependency {from_id} -> {to_id} would create a circular dependency"

        return None

    def _would_create_cycle(self, plan, from_id, to_id) -> bool:
        """DFS to detect circular dependencies."""
        ...
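
The elided `_would_create_cycle` check can be illustrated with an iterative depth-first search. This sketch assumes the plan's dependencies are available as a mapping from task ID to the IDs it depends on, and that a pair `(from_id, to_id)` means "`from_id` depends on `to_id`"; both are assumptions about the real data model.

```python
from __future__ import annotations


def would_create_cycle(
    dependencies: dict[str, list[str]], from_id: str, to_id: str
) -> bool:
    """Would adding 'from_id depends on to_id' close a loop?

    A cycle appears exactly when to_id can already reach from_id by
    following existing dependency edges.
    """
    if from_id == to_id:
        return True  # a task cannot depend on itself
    stack = [to_id]
    visited: set[str] = set()
    while stack:
        node = stack.pop()
        if node == from_id:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(dependencies.get(node, []))
    return False
```

For example, with `{"B": ["A"], "C": ["B"]}` (C depends on B, B depends on A), making A depend on C is rejected because C already reaches A, while making C depend on A directly is fine. An explicit stack avoids hitting Python's recursion limit on long dependency chains.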

Testing Requirements:

  • Test revision application for each type
  • Test conflict detection
  • Test circular dependency prevention
  • Test iteration limit enforcement

Success Criteria:

  • Revisions applied without conflicts
  • No circular dependencies introduced
  • Iteration limit prevents infinite loops

Implementation Details (Milestone 10.1 Complete):

  • Created claude_indexer/ui/plan/guardrails/auto_revision.py with:
    • AppliedRevision dataclass for tracking applied revisions
    • RevisedPlan dataclass with audit trail formatting
    • AutoRevisionEngine class with full revision lifecycle
  • Key features:
    • Revision sorting by severity (CRITICAL > HIGH > MEDIUM > LOW) and type order
    • Conflict detection for all RevisionTypes (ADD_TASK, MODIFY_TASK, REMOVE_TASK, ADD_DEPENDENCY, REORDER_TASKS)
    • Circular dependency detection using DFS algorithm
    • Post-revision dependency resolution (removes orphaned dependencies)
    • MAX_ITERATIONS = 3 to prevent infinite loops
    • Configurable via PlanGuardrailConfig (auto_revise, max_revisions_per_plan, revision_confidence_threshold)
    • Human-readable audit trail via format_audit_trail()
  • Tests: 55 unit tests covering all functionality
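
The selection and ordering step described above (filter to auto-revisable findings, apply a confidence threshold, sort by severity, cap the count) might look like this sketch. The `Severity` enum values, the reduced `Finding` type, and the default threshold and cap are assumptions; only the option names `revision_confidence_threshold` and `max_revisions_per_plan` come from the notes above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):  # lower value sorts first
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Finding:  # reduced stand-in for PlanValidationFinding
    rule_id: str
    severity: Severity
    can_auto_revise: bool = False
    confidence: float = 1.0


def select_revisable(
    findings: list[Finding],
    revision_confidence_threshold: float = 0.7,  # assumed default
    max_revisions_per_plan: int = 10,            # assumed default
) -> list[Finding]:
    eligible = [
        f for f in findings
        if f.can_auto_revise and f.confidence >= revision_confidence_threshold
    ]
    # list.sort is stable, so findings of equal severity keep the order
    # in which the rules reported them.
    eligible.sort(key=lambda f: f.severity)
    return eligible[:max_revisions_per_plan]
```

Applying the cap after sorting means that when there are too many candidates, the lowest-severity ones are the ones dropped.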

Milestone 10.2: Revision Audit Trail

Objective: Track all revisions for transparency and debugging.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 10.2.1 | Add revision history to ImplementationPlan | MEDIUM | DONE |
| 10.2.2 | Create human-readable revision summary | MEDIUM | DONE |
| 10.2.3 | Add revision rollback capability | LOW | DONE |
| 10.2.4 | Implement revision persistence | LOW | DONE |

Audit Trail Format:

## Plan Revisions Applied

### 1. Added Test Task (PLAN.TEST_REQUIREMENT)
- **Reason**: Feature 'Add user authentication' needs test coverage
- **Added**: TASK-TST-0001 "Add tests for user authentication"
- **Confidence**: 95%

### 2. Modified Task (PLAN.DUPLICATE_DETECTION)
- **Reason**: Potential duplicate of existing 'AuthService.login()'
- **Modified**: TASK-0002 description to reference existing code
- **Confidence**: 78%
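
A small renderer that produces the format above could be written as follows. `AuditEntry` and `render_audit_trail` are hypothetical names chosen for this sketch; the real output comes from the engine's own formatting method, whose inputs may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AuditEntry:  # hypothetical flattened view of one applied revision
    action: str        # e.g. "Added Test Task"
    rule_id: str       # e.g. "PLAN.TEST_REQUIREMENT"
    reason: str
    change_label: str  # "Added" or "Modified"
    change: str
    confidence: float  # between 0.0 and 1.0


def render_audit_trail(entries: list[AuditEntry]) -> str:
    lines = ["## Plan Revisions Applied", ""]
    for index, entry in enumerate(entries, start=1):
        lines += [
            f"### {index}. {entry.action} ({entry.rule_id})",
            f"- **Reason**: {entry.reason}",
            f"- **{entry.change_label}**: {entry.change}",
            # The % format spec multiplies by 100 and appends the sign.
            f"- **Confidence**: {entry.confidence:.0%}",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"
```

Numbering the entries in application order keeps the summary aligned with the order in which revisions were actually applied.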

Testing Requirements:

  • Test audit trail generation
  • Test revision summary formatting
  • Verify all revisions tracked

Success Criteria:

  • Complete audit trail for all revisions
  • Human-readable summaries
  • Transparency for user review

Implementation Details (Milestone 10.2 Complete):

  • Added revision_history: list[AppliedRevision] field to ImplementationPlan
  • Added format_revision_history() method for human-readable markdown output
  • Created claude_indexer/ui/plan/guardrails/revision_history.py with:
    • PlanSnapshot dataclass for versioned plan state snapshots
    • RevisionHistoryManager for snapshot creation, versioning, and rollback
    • PlanPersistence for JSON file save/load of plans and history
  • Full serialization support with backward compatibility for old plans
  • Exports added to guardrails/__init__.py and plan/__init__.py
  • Tests: 41 unit tests covering all functionality
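
The snapshot and rollback idea can be sketched in a few lines. `SnapshotStore` is a hypothetical, simplified stand-in for RevisionHistoryManager, and plans are represented as plain dicts purely for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:  # hypothetical, simplified stand-in
    _snapshots: list[dict] = field(default_factory=list)

    def snapshot(self, plan: dict) -> int:
        """Store a deep copy of the plan and return its 1-based version."""
        self._snapshots.append(copy.deepcopy(plan))
        return len(self._snapshots)

    def rollback(self, version: int) -> dict:
        """Return a copy of the plan as it was at `version`."""
        if not 1 <= version <= len(self._snapshots):
            raise ValueError(f"unknown version: {version}")
        return copy.deepcopy(self._snapshots[version - 1])
```

Deep copies keep later edits to the live plan from leaking into stored versions.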

Phase 11: Prompt Augmentation & Exploration Hints

Goal: Inject planning guidelines and exploration hints into Claude's context.

Milestone 11.1: Planning Guidelines Generator

Objective: Generate contextual planning guidelines for injection.

Tasks

ID Task Priority Status
11.1.1 Create hooks/planning_guidelines.py HIGH DONE
11.1.2 Define guidelines template with placeholders HIGH DONE
11.1.3 Load project patterns from CLAUDE.md MEDIUM DONE
11.1.4 Generate collection-specific MCP commands MEDIUM DONE
11.1.5 Add configuration for guideline customization LOW DONE

Planning Guidelines Template:

PLANNING_GUIDELINES_TEMPLATE = """
=== PLANNING QUALITY GUIDELINES ===

When formulating this implementation plan, follow these guidelines:

## 1. Code Reuse Check (CRITICAL)
Before proposing ANY new function, class, or component:
- Search the codebase: `{mcp_prefix}search_similar("functionality")`
- Check existing patterns: `{mcp_prefix}read_graph(entity="Component", mode="relationships")`
- If similar exists, plan to REUSE or EXTEND it
- State explicitly: "Verified no existing implementation" or "Will extend existing Y"

## 2. Testing Requirements
Every plan that modifies code MUST include:
- [ ] Unit tests for new/modified functions
- [ ] Integration tests for API changes
- Task format: "Add tests for [feature] in [test_file]"

## 3. Documentation Requirements
Include documentation tasks when:
- Adding public APIs -> Update API docs
- Changing user-facing behavior -> Update README
- Adding configuration -> Update config docs

## 4. Architecture Alignment
Your plan MUST align with project patterns:
{project_patterns}

## 5. Performance Considerations
Flag any step that may introduce:
- O(n^2) or worse complexity
- Unbounded memory usage
- Missing timeouts on network calls

=== END PLANNING GUIDELINES ===
"""

def generate_planning_guidelines(
    collection_name: str,
    project_patterns: str = "",
    exploration_hints: list[str] | None = None
) -> str:
    """Generate planning guidelines with project context."""
    mcp_prefix = f"mcp__{collection_name}-memory__"
    ...
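
The elided body presumably fills the template placeholders. A minimal sketch of that step, using a shortened excerpt of the template and a hypothetical `render_guidelines` helper (the fallback text for missing patterns is also an assumption):

```python
# Shortened, hypothetical excerpt of PLANNING_GUIDELINES_TEMPLATE.
TEMPLATE = (
    'Search the codebase: `{mcp_prefix}search_similar("functionality")`\n'
    "Your plan MUST align with project patterns:\n"
    "{project_patterns}\n"
)


def render_guidelines(collection_name: str, project_patterns: str = "") -> str:
    """Substitute the MCP prefix and project patterns into the template."""
    mcp_prefix = f"mcp__{collection_name}-memory__"
    return TEMPLATE.format(
        mcp_prefix=mcp_prefix,
        # Assumed fallback when CLAUDE.md provides no patterns.
        project_patterns=project_patterns or "(no project patterns found)",
    )
```

Because `str.format` treats braces as placeholders, any literal `{` or `}` added to the template would need to be doubled.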

Testing Requirements:

  • Test template rendering
  • Test project pattern loading
  • Test MCP prefix generation

Success Criteria:

  • Guidelines correctly formatted
  • Project-specific patterns included
  • MCP commands use correct collection

Implementation Details (Milestone 11.1 Complete):

  • Implemented as claude_indexer/hooks/planning/guidelines.py
  • PlanningGuidelinesConfig with all section toggles
  • PlanningGuidelines output with full_text, sections, mcp_commands
  • PlanningGuidelinesGenerator with 5 template sections
  • CLAUDE.md pattern loading from project root or .claude/
  • Tests: 18 unit tests in test_guidelines.py

Milestone 11.2: Exploration Hints Generator

Objective: Generate hints to guide Claude's sub-agent parallel exploration.

Tasks

ID Task Priority Status
11.2.1 Create hooks/exploration_hints.py HIGH DONE
11.2.2 Implement entity extraction from prompts HIGH DONE
11.2.3 Generate duplicate-check hints HIGH DONE
11.2.4 Generate test-discovery hints MEDIUM DONE
11.2.5 Generate documentation hints MEDIUM DONE
11.2.6 Generate architecture hints LOW DONE

Exploration Hints Generator:

def generate_exploration_hints(prompt: str, collection_name: str) -> list[str]:
    """Generate exploration hints for parallel sub-agents."""
    mcp_prefix = f"mcp__{collection_name}-memory__"
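    entities = extract_entities(prompt)
    # Guard against prompts with no detectable entity: entities[0] would raise
    # IndexError. Falling back to the raw prompt is an assumption of this sketch.
    primary = entities[0] if entities else prompt

    hints = [
        # Duplicate Check
        f"## Duplicate Check\n{mcp_prefix}search_similar('{primary}', entityTypes=['function', 'class'])",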

        # Test Discovery
        f"## Test Discovery\n{mcp_prefix}search_similar('test', entityTypes=['file'])",

        # Documentation
        f"## Documentation\n{mcp_prefix}search_similar('docs', entityTypes=['documentation'])",
    ]

    # Entity-specific hints
    for entity in entities[:3]:
        hints.append(
            f"## {entity} Analysis\n{mcp_prefix}read_graph(entity='{entity}', mode='smart')"
        )

    return hints

def extract_entities(prompt: str) -> list[str]:
    """Extract likely code entities from prompt."""
    patterns = [
        r'\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b',  # CamelCase
        r'\b[a-z]+(?:_[a-z]+)+\b',            # snake_case
        r'["\']([^"\']+)["\']',               # Quoted terms
    ]
    ...
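
One way the three patterns above could be applied is sketched below. The function name `extract_entities_sketch`, the ordering, and the de-duplication are assumptions; the shipped ExplorationHintsGenerator also handles technical terms, which this sketch omits.

```python
import re

# Patterns copied from the snippet above; only the quoted-term pattern
# defines a capture group.
_PATTERNS = [
    r'\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b',  # CamelCase
    r'\b[a-z]+(?:_[a-z]+)+\b',            # snake_case
    r'["\']([^"\']+)["\']',               # Quoted terms
]


def extract_entities_sketch(prompt: str) -> list[str]:
    """Return unique matches in pattern order (CamelCase, snake_case, quoted)."""
    found: list[str] = []
    for pattern in _PATTERNS:
        for match in re.finditer(pattern, prompt):
            # Use the capture group when the pattern has one, else the whole match.
            term = match.group(1) if match.re.groups else match.group(0)
            if term not in found:
                found.append(term)
    return found
```

For example, the prompt `Refactor AuthService to call validate_token and cache "session data"` yields `AuthService`, `validate_token`, and `session data`, in that order.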

Testing Requirements:

  • Test entity extraction accuracy
  • Test hint generation with various prompts
  • Verify MCP commands are valid

Success Criteria:

  • Entities extracted with >80% accuracy
  • Hints guide toward quality checks
  • MCP commands are executable

Implementation Details (Milestone 11.2 Complete):

  • Implemented as claude_indexer/hooks/planning/exploration.py
  • ExplorationHintsConfig with section toggles and max_entity_hints
  • ExplorationHints output with hints, extracted_entities, mcp_commands
  • ExplorationHintsGenerator with entity extraction patterns
  • Supports CamelCase, snake_case, quoted terms, technical terms
  • Tests: 21 unit tests in test_exploration.py

Milestone 11.3: Prompt Handler Integration

Objective: Integrate guidelines and hints into UserPromptSubmit hook.

Tasks

ID Task Priority Status
11.3.1 Modify hooks/prompt_handler.py for Plan Mode HIGH DONE
11.3.2 Implement guidelines injection HIGH DONE
11.3.3 Implement hints injection HIGH DONE
11.3.4 Add configuration toggle MEDIUM DONE
11.3.5 Measure injection latency (<50ms target) MEDIUM DONE

Testing Requirements:

  • Test full hook flow with Plan Mode
  • Test injection timing
  • Verify guidelines appear in context

Success Criteria:

  • Guidelines injected for Plan Mode
  • <50ms injection latency
  • Claude follows guidelines

Implementation Details (Milestone 11.3 Complete):

  • hooks/prompt_handler.py - Main hook with Plan Mode detection
  • claude_indexer/hooks/planning/injector.py - Coordinates injection
  • PlanContextInjectionConfig with all toggles and compact mode
  • PlanContextInjector assembles guidelines + hints
  • inject_plan_context() convenience function
  • Tests: 22 unit tests in test_injector.py
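
The assembly step and the latency measurement from task 11.3.5 can be sketched together. `inject_with_timing` is a hypothetical helper, not the signature of inject_plan_context(), and joining parts with blank lines is an assumption.

```python
import time


def inject_with_timing(
    prompt: str, guidelines: str, hints: list[str]
) -> tuple[str, float]:
    """Append guidelines and hints to the prompt; return (text, elapsed_ms)."""
    start = time.perf_counter()
    parts = [prompt, guidelines, *hints]
    # Skip empty parts so disabled sections leave no blank gaps.
    text = "\n\n".join(p for p in parts if p)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return text, elapsed_ms
```

The returned `elapsed_ms` can be compared against the 50ms target in a test or logged per prompt.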

Phase 12: Plan QA Verification

Goal: Verify generated plans meet quality standards before user approval.

Milestone 12.1: Plan QA Verifier

Objective: Post-generation verification of plan quality.

Tasks

ID Task Priority Status
12.1.1 Create claude_indexer/hooks/plan_qa.py HIGH DONE
12.1.2 Implement missing test detection HIGH DONE
12.1.3 Implement missing doc detection HIGH DONE
12.1.4 Implement duplicate check verification HIGH DONE
12.1.5 Create PlanQAResult dataclass MEDIUM DONE
12.1.6 Generate human-readable feedback MEDIUM DONE
12.1.7 Integrate with plan output MEDIUM DONE

PlanQAVerifier:

import re
from dataclasses import dataclass, field


@dataclass
class PlanQAResult:
    is_valid: bool = True
    missing_tests: list[str] = field(default_factory=list)
    missing_docs: list[str] = field(default_factory=list)
    potential_duplicates: list[str] = field(default_factory=list)
    architecture_warnings: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)

    def has_issues(self) -> bool:
        return bool(self.missing_tests or self.missing_docs or
                    self.potential_duplicates or self.architecture_warnings)

    def format_feedback(self) -> str:
        """Format feedback for plan output."""
        if not self.has_issues():
            return "\n[Plan QA: All quality checks passed]"

        lines = ["\n=== Plan QA Feedback ==="]
        if self.missing_tests:
            lines.append("\n[WARN] Missing Test Coverage:")
            for item in self.missing_tests:
                lines.append(f"  - {item}")
        ...
        return "\n".join(lines)

class PlanQAVerifier:
    CODE_CHANGE_PATTERNS = re.compile(
        r'(add|create|implement|modify)\s+(?:a\s+)?(function|class|component)',
        re.IGNORECASE
    )

    def verify_plan(self, plan_text: str) -> PlanQAResult:
        """Verify a plan meets quality standards."""
        result = PlanQAResult()

        if self._needs_tests(plan_text) and not self._has_test_tasks(plan_text):
            result.missing_tests.append("Plan modifies code but includes no test tasks")
            result.suggestions.append("Add unit/integration test task")

        if self._is_user_facing(plan_text) and not self._has_doc_tasks(plan_text):
            result.missing_docs.append("User-facing changes without doc update")

        if self._creates_new_code(plan_text) and not self._mentions_reuse_check(plan_text):
            result.potential_duplicates.append("New code without explicit duplicate check")

        return result
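
The helper predicates used by verify_plan() are not shown. A sketch of two of them as standalone functions follows; the `TEST_TASK` pattern is a hypothetical simplification and may differ from the shipped TEST_TASK_PATTERNS.

```python
import re

# Same pattern as CODE_CHANGE_PATTERNS in the snippet above.
CODE_CHANGE = re.compile(
    r'(add|create|implement|modify)\s+(?:a\s+)?(function|class|component)',
    re.IGNORECASE,
)
# Hypothetical: any standalone "test" or "tests" counts as a test task.
TEST_TASK = re.compile(r'\btests?\b', re.IGNORECASE)


def needs_tests(plan_text: str) -> bool:
    """True when the plan describes adding or changing code."""
    return CODE_CHANGE.search(plan_text) is not None


def has_test_tasks(plan_text: str) -> bool:
    """True when the plan mentions at least one test task."""
    return TEST_TASK.search(plan_text) is not None
```

A plan for which `needs_tests` is True and `has_test_tasks` is False would be flagged under missing_tests.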

Testing Requirements:

  • Test with plans missing tests
  • Test with plans missing docs
  • Test with complete plans
  • Verify feedback formatting

Success Criteria:

  • Detects missing test/doc tasks
  • Generates actionable feedback
  • <50ms verification latency

Implementation Details (Milestone 12.1 Complete):

  • Created claude_indexer/hooks/plan_qa.py with:
    • PlanQAConfig dataclass with check toggles and strict mode settings
    • PlanQAResult dataclass with has_issues(), format_feedback(), to_dict()
    • PlanQAVerifier class with pattern-based detection for:
      • CODE_CHANGE_PATTERNS (test coverage check)
      • TEST_TASK_PATTERNS (test task detection)
      • DOC_TASK_PATTERNS (doc task detection)
      • USER_FACING_PATTERNS (user-facing change detection)
      • REUSE_CHECK_PATTERNS (duplicate verification)
      • ARCHITECTURE_CONCERN_PATTERNS (performance anti-patterns)
    • verify_plan_qa() convenience function
  • Tests: 50+ unit tests in test_plan_qa.py covering all scenarios

Milestone 12.2: QA Integration Points

Objective: Integrate QA verification into the planning workflow.

Tasks

ID Task Priority Status
12.2.1 Add QA check after guardrail validation HIGH DONE
12.2.2 Append QA feedback to plan output HIGH DONE
12.2.3 Track QA pass/fail metrics LOW DONE
12.2.4 Add QA override configuration LOW DONE

Testing Requirements:

  • Test end-to-end QA flow
  • Test feedback appears in plan
  • Test override configuration

Success Criteria:

  • QA feedback visible to user
  • Metrics tracked for analysis (Milestone 13.5)
  • Override available for edge cases

Implementation Details (Milestone 12.2 Complete):

  • Updated claude_indexer/hooks/planning/injector.py with:
    • Added qa_enabled and qa_config to PlanContextInjectionConfig
    • Added verify_plan_output() method to PlanContextInjector
    • QA configuration supports JSON serialization
  • Updated claude_indexer/hooks/__init__.py to export Plan QA classes
  • PlanQAConfig provides override toggles:
    • enabled: Master toggle for QA
    • check_tests/check_docs/check_duplicates/check_architecture: Individual checks
    • fail_on_missing_tests/fail_on_missing_docs: Strict mode settings

QA Metrics Tracking (Milestone 13.5):

  • Added QA fields to MetricSnapshot in claude_indexer/ui/metrics/models.py:
    • qa_checks_passed, qa_issues_found, qa_missing_tests, qa_missing_docs
    • qa_potential_duplicates, qa_architecture_warnings, qa_verification_time_ms
  • Added record_qa_verification() method to MetricsCollector
  • Added get_qa_metrics_summary() for aggregated QA metrics
  • Updated PlanQAVerifier with optional metrics_collector parameter
  • Tests: 15 unit tests in test_qa_metrics.py

Phase 13: Polish, Testing & Documentation

Goal: Ensure production readiness with comprehensive testing and documentation.

Milestone 13.1: Integration Testing

Objective: End-to-end testing of Plan Mode integration.

Tasks

ID Task Priority Status
13.1.1 Create tests/integration/test_plan_mode.py HIGH DONE
13.1.2 Test full Plan Mode flow (detect -> augment -> validate -> revise) HIGH DONE
13.1.3 Test MCP tool integration HIGH DONE
13.1.4 Test issue tracker integration MEDIUM DONE
13.1.5 Test design doc indexing MEDIUM DONE
13.1.6 Create mock Claude Code Plan Mode responses MEDIUM DONE

Implementation Notes (Milestone 13.1 Complete):

  • Created comprehensive integration test suite with 47 tests
  • Test classes: TestPlanModeDetectionIntegration, TestContextInjectionIntegration, TestGuardrailValidationIntegration, TestAutoRevisionIntegration, TestFullPlanModeFlow, TestDesignDocIndexingIntegration, TestPlanQAVerification
  • Coverage: Plan Mode detection, context injection, guardrail validation, auto-revision, full pipeline flow, design doc parsing, QA verification
  • Performance: All tests complete in <0.2 seconds
  • All lint checks pass (black, isort, flake8, ruff)

Integration Test Scenarios:

class TestPlanModeIntegration:
    def test_full_plan_mode_flow(self):
        """Test complete Plan Mode integration."""
        # 1. Detect Plan Mode
        # 2. Inject guidelines
        # 3. Validate plan
        # 4. Apply auto-revisions
        # 5. Run QA check
        # 6. Verify output
        ...

    def test_plan_with_missing_tests(self):
        """Verify test tasks auto-added."""
        ...

    def test_duplicate_detection(self):
        """Verify duplicate warning generated."""
        ...

    def test_doc_requirement(self):
        """Verify doc tasks auto-added for user-facing changes."""
        ...

Testing Requirements:

  • >80% code coverage for new components
  • All scenarios covered
  • Performance benchmarks included

Success Criteria:

  • All integration tests pass (47 tests)
  • >80% coverage
  • Performance within targets (<0.2s for all tests)

Milestone 13.2: Performance Optimization ✅ DONE

Objective: Meet all performance targets.

Tasks

ID Task Priority Status
13.2.1 Profile Plan Mode detection latency HIGH DONE
13.2.2 Profile guidelines injection latency HIGH DONE
13.2.3 Profile validation latency HIGH DONE
13.2.4 Optimize hot paths MEDIUM DONE
13.2.5 Add caching where beneficial MEDIUM DONE
13.2.6 Create performance benchmarks MEDIUM DONE

Performance Targets:

| Operation | Target | Budget |
|---|---|---|
| Plan Mode Detection | <10ms | Pattern matching |
| Guidelines Generation | <20ms | Template substitution |
| Exploration Hints | <30ms | Entity extraction |
| Plan Validation | <500ms | 5 rules |
| Auto-Revision | <200ms | Conflict checking |
| Plan QA | <50ms | Pattern matching |
| Total Overhead | <100ms | (excluding validation) |

Testing Requirements:

  • Benchmark all operations
  • Test under load
  • Verify targets met

Implementation Details (Milestone 13.2 Complete):

  • Replaced time.time() with time.perf_counter() across all Plan Mode files for sub-millisecond precision
  • Added LRU caching for CLAUDE.md project patterns loading (mtime-based invalidation)
  • Added LRU caching for entity extraction in exploration hints (128-entry cache)
  • Created comprehensive benchmark suite: tests/benchmarks/test_plan_mode_performance.py

Files Modified:

  • claude_indexer/hooks/plan_mode_detector.py - Timing precision
  • claude_indexer/hooks/planning/guidelines.py - Timing + LRU cache
  • claude_indexer/hooks/planning/exploration.py - Timing + entity cache
  • claude_indexer/hooks/planning/injector.py - Timing precision
  • claude_indexer/hooks/plan_qa.py - Timing precision
  • claude_indexer/ui/plan/guardrails/engine.py - Timing precision
  • claude_indexer/ui/plan/guardrails/auto_revision.py - Timing precision

New Files:

  • tests/benchmarks/test_plan_mode_performance.py - 9 benchmark test classes

Benchmark Tests:

  • TestPlanModeDetectionPerformance - <10ms p95 target
  • TestGuidelinesGenerationPerformance - <20ms p95 target
  • TestExplorationHintsPerformance - <30ms p95 target
  • TestPlanQAPerformance - <50ms p95 target
  • TestGuardrailValidationPerformance - <500ms target
  • TestAutoRevisionPerformance - <200ms target
  • TestEndToEndPlanModePerformance - <100ms overhead target
  • TestMemoryUsage - <50MB peak usage
  • TestScalabilityMetrics - Linear scaling verification
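
A p95 target can be checked with a small timing helper. The sketch below uses the nearest-rank percentile definition; the helper names and the default run count are assumptions and may not match the benchmark suite.

```python
import math
import time


def nearest_rank(samples, q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q of the data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def p95_latency_ms(fn, runs: int = 200) -> float:
    """Call fn() `runs` times and return the 95th-percentile latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return nearest_rank(samples, 0.95)
```

A benchmark test could then assert, for example, that `p95_latency_ms(detect)` stays below 10 for a hypothetical `detect` callable.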

Success Criteria:

  • All performance targets met
  • No regression from baseline
  • Clear metrics dashboard

Milestone 13.3: Documentation

Objective: Comprehensive documentation for Plan Mode integration.

Tasks

ID Task Priority Status
13.3.1 Update README.md with Plan Mode features HIGH DONE
13.3.2 Create docs/PLAN_MODE.md comprehensive guide HIGH DONE
13.3.3 Update HOOKS.md with Plan hooks HIGH DONE
13.3.4 Document all configuration options MEDIUM DONE
13.3.5 Create plan guardrail rule reference MEDIUM DONE
13.3.6 Add troubleshooting section MEDIUM DONE
13.3.7 Update CLAUDE.md with Plan guidelines LOW DONE

Documentation Structure:

docs/
├── PLAN_MODE.md          # Comprehensive Plan Mode guide
│   ├── Overview
│   ├── Configuration
│   ├── Quality Guardrails
│   ├── Auto-Revision Behavior
│   ├── MCP Context Integration
│   └── Troubleshooting
├── HOOKS.md              # Updated with Plan hooks
└── memory-functions.md   # Updated with new MCP tools

Implementation Details (Milestone 13.3 Complete):

  • Created docs/PLAN_MODE.md with ~700 lines covering:
    • Quick start and activation methods
    • Detection methods (4 signals) with confidence scoring
    • Context injection (guidelines + exploration hints)
    • All 5 guardrail rules with examples and auto-fix behavior
    • Auto-revision system and audit trail
    • Complete configuration reference
    • Troubleshooting section
    • Performance metrics
  • Overhauled README.md (reduced from 781 to ~400 lines):
    • Problem/solution-led structure
    • Feature highlights for all 4 major capabilities
    • Simplified setup with one-command path
    • Clean documentation links
  • Updated CLAUDE.md with Plan Mode usage section
  • docs/HOOKS.md already contained Plan Mode sections (from Milestones 7.2, 8.4)

Testing Requirements:

  • All documentation accurate
  • Examples tested and working
  • No broken links

Success Criteria:

  • New user understands Plan Mode in <10 minutes
  • All features documented
  • Examples are copy-paste ready

Milestone 13.4: User Experience Validation ✅ DONE

Objective: Validate the "magical" UX described in the PRD.

Tasks

ID Task Priority Status
13.4.1 Conduct user testing sessions HIGH DEFERRED - Manual Process
13.4.2 Collect feedback on plan quality HIGH DONE
13.4.3 Measure plan approval rate MEDIUM DONE
13.4.4 Iterate on findings format MEDIUM DONE
13.4.5 Add configuration for thoroughness level LOW DONE

Implementation Details (Milestone 13.4 Complete):

  • Extended PlanAdoptionRecord with feedback fields: approved, approved_at, rejection_reason, accuracy_rating, user_notes, revision_count
  • Added approval metrics to MetricsReport: approval_rate, pending_approval_count, average_accuracy_rating, average_revision_count, rejection_reasons_summary()
  • Added feedback recording methods to MetricsCollector: record_plan_approval(), record_plan_revision(), get_approval_rate_history(), get_quality_metrics_summary()
  • Created claude_indexer/ui/plan/formatters.py with:
    • ThoroughnessLevel enum (minimal, standard, thorough, exhaustive)
    • format_plan_findings_for_display() with thoroughness-aware output
    • format_plan_findings_for_claude() for Claude consumption
    • SEVERITY_ICONS and CATEGORY_NAMES mappings
  • Added thoroughness_level and group_findings_by_severity to PlanGuardrailConfig
  • Added format_for_display() and format_for_claude() methods to PlanGuardrailResult
  • Tests: 54 unit tests covering all functionality

Success Metrics Validation:

  • >90% of plans include test/doc tasks when appropriate (Phase 12 Plan QA)
  • <10% of plans require user revision (pending user testing)
  • Existing code reuse suggested in >80% of applicable cases (Phase 12 duplicate check)
  • Users report plans "feel like senior engineer's work" (pending user testing)

Success Criteria:

  • All code implementation complete
  • User feedback collection enabled (infrastructure ready)
  • UX validated through testing (pending task 13.4.1)

Milestone 13.5: Deferred Items Completion ✅ DONE

Objective: Complete deferred items from Phases 9 and 12.

Tasks

ID Task Priority Status
13.5.1 Implement parallel rule execution (9.2.3) MEDIUM DONE
13.5.2 Implement QA metrics tracking (12.2.3) LOW DONE
13.5.3 Update Ticket Sync Service notes LOW DONE

Implementation Details (Milestone 13.5 Complete):

Parallel Rule Execution (9.2.3):

  • Updated PlanGuardrailEngineConfig with max_parallel_workers (default: 4)
  • Added _validate_parallel() method using concurrent.futures.ThreadPoolExecutor
  • Added _validate_sequential() method (refactored from original code)
  • Modified validate() to conditionally use parallel or sequential execution
  • Parallel mode executes rules concurrently with configurable worker count
  • Error handling preserved in parallel mode (continue_on_error behavior)
  • Tests: 8 tests for parallel execution behavior
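
The parallel path described above can be sketched with `concurrent.futures.ThreadPoolExecutor`. Rules are modelled here as plain callables that return a list of findings; that shape and the `validate_parallel` signature are assumptions, not the engine's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_parallel(rules, plan, max_workers: int = 4, continue_on_error: bool = True):
    """Run each rule(plan) concurrently; return (findings, errors) in rule order."""
    findings, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(rule, plan) for rule in rules]
        # Collect in submission order so output is deterministic.
        for rule, future in zip(rules, futures):
            try:
                findings.extend(future.result())
            except Exception as exc:  # a failing rule must not abort the others
                if not continue_on_error:
                    raise
                errors.append((getattr(rule, "__name__", repr(rule)), str(exc)))
    return findings, errors
```

Threads suit rules that wait on I/O such as MCP or index lookups; CPU-bound rules gain little because of the GIL.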

QA Metrics Tracking (12.2.3):

  • Extended MetricSnapshot with 7 QA-specific fields:
    • qa_checks_passed, qa_issues_found, qa_missing_tests
    • qa_missing_docs, qa_potential_duplicates, qa_architecture_warnings
    • qa_verification_time_ms
  • Added record_qa_verification() to MetricsCollector:
    • Creates MetricSnapshot from PlanQAResult
    • Calculates checks passed/failed counts
    • Records as tier 2 (design-time) snapshot
  • Added get_qa_metrics_summary() to MetricsCollector:
    • Returns pass rate, average issues, issue breakdown
    • Aggregates across all QA verification snapshots
  • Updated PlanQAVerifier with optional metrics_collector parameter:
    • Records metrics automatically after verification
    • Accepts optional plan_id for correlation
  • Backward compatible: old snapshots without QA fields load with defaults
  • Tests: 15 tests for QA metrics functionality

Ticket Sync Service (8.3.9) - Deferred to Phase 14:

  • Remains deferred due to complexity (new scheduler, Qdrant schema, sync state)
  • Recommended for dedicated Phase 14: Background Services
  • Current on-demand ticket fetching via MCP tools remains functional

Files Modified:

  • claude_indexer/ui/plan/guardrails/engine.py - Parallel execution
  • claude_indexer/ui/metrics/models.py - QA fields in MetricSnapshot
  • claude_indexer/ui/metrics/collector.py - QA metrics methods
  • claude_indexer/hooks/plan_qa.py - Metrics integration

Test Coverage:

  • tests/unit/ui/plan/guardrails/test_engine.py - 8 parallel execution tests
  • tests/unit/ui/metrics/test_qa_metrics.py - 15 QA metrics tests

Phase 14: MCP Server Enhancement - Testing Foundation

Goal: Establish comprehensive testing infrastructure for the mcp-qdrant-memory MCP server using Vitest.

PRD Reference: mcp-qdrant-memory/docs/PRD.md - Phase 14: MCP Server Enhancement

Milestone 14.1: Testing Foundation ✅ DONE

Objective: Set up Vitest testing framework with 60%+ baseline coverage, targeting 90-95% on core modules.

Tasks

ID Task Priority Status
14.1.1 Add Vitest devDependencies to package.json HIGH DONE
14.1.2 Add test scripts to package.json HIGH DONE
14.1.3 Create vitest.config.ts with coverage thresholds HIGH DONE
14.1.4 Create test fixtures (entities, relations) MEDIUM DONE
14.1.5 Create planModeGuard.test.ts HIGH DONE
14.1.6 Create tokenCounter.test.ts HIGH DONE
14.1.7 Create validation.test.ts HIGH DONE
14.1.8 Create bm25Service.test.ts HIGH DONE
14.1.9 Fix BM25 vitest import compatibility HIGH DONE

Implementation Details:

  • Configuration Files:

    • package.json - Added test scripts (test, test:watch, test:coverage, test:ui) and devDependencies (@vitest/coverage-v8, @vitest/ui)
    • vitest.config.ts - Vitest configuration with v8 coverage provider, HTML/LCOV reporters
  • Test Files Created (207 tests total):

    • src/__tests__/planModeGuard.test.ts - 38 tests, 100% coverage
    • src/__tests__/tokenCounter.test.ts - 34 tests, 96.57% coverage
    • src/__tests__/validation.test.ts - 74 tests, 83.28% coverage
    • src/__tests__/bm25Service.test.ts - 47 tests, 95.87% coverage
    • src/__tests__/fixtures/ - Test data for entities and relations
  • Bug Fix: Fixed a BM25 library import compatibility issue. Vitest's SSR transform handles imports differently from Node.js ESM, which caused a "BM25.default.default is not a function" error. The fix handles both import behaviors.

Coverage Results:

| Module | Statements | Branches | Functions |
|---|---|---|---|
| plan-mode-guard.ts | 100% | 100% | 100% |
| tokenCounter.ts | 96.57% | 90.69% | 100% |
| validation.ts | 83.28% | 78.14% | 100% |
| bm25Service.ts | 95.87% | 87.71% | 100% |

Success Criteria:

  • 207 tests passing
  • >90% coverage on core modules (plan-mode-guard, tokenCounter, bm25Service)
  • >80% coverage on validation module
  • Build passes with no TypeScript errors
  • Test execution <1 second

Milestone 14.2: Integration Testing ✅ DONE

Objective: Add integration tests for MCP server functionality.

Tasks

ID Task Priority Status
14.2.1 Create MCP tool integration tests HIGH DONE
14.2.2 Create Qdrant persistence tests HIGH DONE
14.2.3 Add mock Qdrant client for isolated testing MEDIUM DONE
14.2.4 Test hybrid search (semantic + BM25) MEDIUM DONE

Implementation Details

New Files Created:

  • src/__tests__/mocks/qdrantClient.mock.ts - Mock Qdrant client with in-memory storage
  • src/__tests__/mocks/openaiClient.mock.ts - Mock OpenAI embeddings with deterministic generation
  • src/__tests__/mocks/index.ts - Mock infrastructure exports
  • src/__tests__/integration/qdrant.integration.test.ts - 45 tests for QdrantPersistence
  • src/__tests__/integration/mcp-tools.integration.test.ts - 50 tests for MCP tool validation
  • src/__tests__/integration/hybrid-search.integration.test.ts - 30 tests for BM25/hybrid search

Test Coverage:

  • Total tests: 362 (207 unit + 155 integration)
  • QdrantPersistence: Connection, Entity CRUD, Relation CRUD, Search, Scroll, Cache, Error handling, Multi-collection
  • MCP Tools: Write tool validation, Read tool validation, Plan Mode access control, Collection parameter support
  • Hybrid Search: BM25 keyword search, RRF fusion algorithm, Result processing, Unicode/special characters

Acceptance Criteria ✅

  • Mock infrastructure supports isolated testing without external dependencies
  • All MCP tools have validation tests
  • QdrantPersistence CRUD operations tested
  • Hybrid search (semantic + BM25) fusion tested
  • TypeScript build passes with no errors
  • All 362 tests pass

Milestone 14.3: CI/CD Integration ✅ DONE

Objective: Integrate testing into CI/CD pipeline.

Tasks

ID Task Priority Status
14.3.1 Add GitHub Actions workflow for MCP tests HIGH DONE
14.3.2 Configure coverage thresholds in CI MEDIUM DONE
14.3.3 Add test status badge to README LOW DONE

Implementation Details

New Files Created:

  • mcp-qdrant-memory/.github/workflows/ci.yml - GitHub Actions CI workflow

Workflow Configuration:

  • Triggers: Push and PR to main/master branches
  • Concurrency: Cancel in-progress runs on same branch
  • Jobs:
    • build: TypeScript compilation with artifact upload
    • typecheck: Strict type validation (tsc --noEmit)
    • test: Tests with coverage (Node.js 18, 20, 22 matrix)
    • security: npm audit for vulnerability scanning
  • Caching: npm dependencies cached for faster runs
  • Artifacts: Coverage reports uploaded (Node 20)

README Updates:

  • Added CI badge linking to GitHub Actions workflow

Acceptance Criteria ✅

  • CI workflow runs on push/PR to main/master
  • Build job compiles TypeScript successfully
  • Type check job validates types
  • Test job runs on Node.js 18, 20, 22 matrix (362 tests)
  • Coverage reports uploaded as artifacts
  • Security audit job runs
  • CI badge visible in README

Phase 15: MCP Server Code Quality & Documentation

Goal: Add ESLint, Prettier, pre-commit hooks, and governance documentation to mcp-qdrant-memory.

PRD Reference: mcp-qdrant-memory/docs/PRD.md - Section 4.3: Code Quality Tooling, Section 4.4: Documentation


Milestone 15.1: Code Quality Tooling ✅ DONE

Objective: Establish linting and formatting infrastructure for the MCP server.

Tasks

ID Task Priority Status
15.1.1 Install ESLint, Prettier, Husky, lint-staged dependencies HIGH DONE
15.1.2 Create ESLint configuration (eslint.config.mjs) with TypeScript support HIGH DONE
15.1.3 Create Prettier configuration (.prettierrc, .prettierignore) HIGH DONE
15.1.4 Update package.json with lint/format scripts HIGH DONE
15.1.5 Initialize Husky and configure pre-commit hook MEDIUM DONE
15.1.6 Update CI workflow with lint job HIGH DONE

Implementation Details:

  • ESLint v9 flat config with typescript-eslint/recommendedTypeChecked
  • Relaxed rules for existing codebase (warnings for any-related rules)
  • Prettier with double quotes, semicolons, 100 char width
  • Pre-commit hook runs lint-staged (ESLint + Prettier on staged files)
  • CI lint job runs npm run lint and npm run format:check

New npm Scripts:

{
  "lint": "eslint src/",
  "lint:fix": "eslint src/ --fix",
  "format": "prettier --write .",
  "format:check": "prettier --check .",
  "typecheck": "tsc --noEmit"
}

Acceptance Criteria ✅

  • ESLint configured and passing (0 errors, warnings allowed)
  • Prettier configured and passing
  • Pre-commit hooks functional
  • CI lint job passing

Milestone 15.2: Governance Documentation ✅ DONE

Objective: Add contributor documentation and project governance files.

Tasks

ID Task Priority Status
15.2.1 Create CONTRIBUTING.md with development guidelines HIGH DONE
15.2.2 Create CHANGELOG.md with version history HIGH DONE
15.2.3 Create LICENSE file (MIT) HIGH DONE

Implementation Details:

  • CONTRIBUTING.md includes development setup, code style, testing, PR process
  • CHANGELOG.md follows Keep a Changelog format
  • LICENSE file matches package.json MIT declaration

Acceptance Criteria ✅

  • CONTRIBUTING.md present with development guidelines
  • CHANGELOG.md present with version history
  • LICENSE file present

Phase 15 Summary

Files Created:

  • mcp-qdrant-memory/eslint.config.mjs - ESLint configuration
  • mcp-qdrant-memory/.prettierrc - Prettier configuration
  • mcp-qdrant-memory/.prettierignore - Prettier ignore patterns
  • mcp-qdrant-memory/.husky/pre-commit - Pre-commit hook
  • mcp-qdrant-memory/CONTRIBUTING.md - Contributor guide
  • mcp-qdrant-memory/CHANGELOG.md - Version history
  • mcp-qdrant-memory/LICENSE - MIT license

Files Modified:

  • mcp-qdrant-memory/package.json - Added lint/format scripts, lint-staged config
  • mcp-qdrant-memory/.github/workflows/ci.yml - Added lint job

CI/CD:

  • Lint job added to GitHub Actions workflow
  • Runs ESLint and Prettier checks on all PRs
  • 0 errors required, warnings allowed for gradual cleanup

Phase 16: MCP Server CI/CD Completion

Goal: Complete CI/CD infrastructure for the mcp-qdrant-memory MCP server.

Status: ✅ Complete

Milestone 16.1: Release Workflow ✅ DONE

Objective: Automated npm publishing on version tags.

Files Created:

  • mcp-qdrant-memory/.github/workflows/release.yml - Release workflow

Features:

  • Triggers on version tags (v*.*.*)
  • Builds and tests before publishing
  • npm publish with provenance for supply chain security
  • Auto-generates GitHub releases with release notes

Milestone 16.2: Dependabot Configuration ✅ DONE

Objective: Automated dependency management.

Files Created:

  • mcp-qdrant-memory/.github/dependabot.yml - Dependabot configuration

Features:

  • Weekly npm dependency updates (Monday)
  • Dev dependencies grouped together
  • PR limit of 10 to avoid noise
  • Consistent commit message prefix (chore(deps))

Phase 16 Summary

MCP Server Progress:

  • Phase 1 (Testing): 100% complete - 362 tests
  • Phase 2 (CI/CD): 100% complete - CI, Release, Dependabot
  • Phase 3 (Code Quality): 100% complete - ESLint, Prettier, Husky, docs
  • Phase 4 (Error Handling): 0% - Not started
  • Phase 5 (Security): 0% - Not started
  • Phase 6 (Coverage): 100% complete - Integration tests

Overall MCP Server Progress: 65%


Phase 17: MCP Server Error Handling & Security

Goal: Complete Phase 4 (Error Handling) and Phase 5 (Security) for the mcp-qdrant-memory MCP server.

Status: ✅ Complete

PRD Reference: mcp-qdrant-memory/docs/PRD.md - Sections 4.5, 4.7, 4.8

Milestone 17.1: Security - Scoped Fetch Override ✅ DONE

Objective: Fix critical security issue where Qdrant API key was leaked to all fetch requests.

File Modified:

  • mcp-qdrant-memory/src/fetch-override.ts

Problem: The fetch override previously added the Qdrant API key to every fetch request, including requests to third-party APIs such as Voyage AI, OpenAI, Linear, and GitHub. This exposed the key to external services.

Fix: Scoped API key injection to only Qdrant URLs using url.startsWith(qdrantUrl) check.
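
The scoping rule can be illustrated with a minimal sketch. The helper name `buildHeaders` and the example URLs are hypothetical, and the actual `fetch-override.ts` may be structured differently; only the `url.startsWith(qdrantUrl)` check is taken from the description above.

```typescript
// Hypothetical helper: decides which headers an outgoing request should carry.
// The real fetch-override.ts may be organized differently.
function buildHeaders(
  url: string,
  qdrantUrl: string,
  apiKey: string,
  existing: Record<string, string> = {}
): Record<string, string> {
  // Only requests that target the configured Qdrant instance receive the key.
  if (url.startsWith(qdrantUrl)) {
    return { ...existing, "api-key": apiKey };
  }
  // Third-party requests (Voyage AI, OpenAI, Linear, GitHub) pass through unchanged.
  return { ...existing };
}

const exampleQdrantUrl = "https://qdrant.example.com:6333"; // assumed value
const exampleKey = "secret"; // assumed value

// Carries the api-key header:
console.log(buildHeaders(`${exampleQdrantUrl}/collections`, exampleQdrantUrl, exampleKey));
// Carries no api-key header:
console.log(buildHeaders("https://api.voyageai.com/v1/embeddings", exampleQdrantUrl, exampleKey));
```

A plain prefix check depends on the configured URL being specific enough (for example, including the port or a trailing slash). Comparing `new URL(url).origin` against the configured origin would be a stricter variant of the same idea.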

Tests Created:

  • mcp-qdrant-memory/src/__tests__/fetch-override.test.ts (18 tests)

Acceptance Criteria ✅

  • Qdrant API key only added to Qdrant URL requests
  • Voyage AI requests do NOT contain Qdrant API key
  • OpenAI/Linear/GitHub requests do NOT contain Qdrant API key
  • All 18 security tests passing

Milestone 17.2: Security - Input Size Validation ✅ DONE

Objective: Add input size limits to prevent DoS attacks via oversized payloads.

File Modified:

  • mcp-qdrant-memory/src/validation.ts

Constants Added:

export const INPUT_LIMITS = {
  QUERY_MAX_LENGTH: 10000,
  ENTITIES_MAX_COUNT: 1000,
  ENTITY_NAME_MAX_LENGTH: 500,
  OBSERVATIONS_MAX_COUNT: 100,
  OBSERVATION_MAX_LENGTH: 50000,
  RELATIONS_MAX_COUNT: 1000,
  ENTITY_NAMES_MAX_COUNT: 1000,
} as const;

Validators Updated:

  • validateSearchSimilarRequest - query length
  • validateSearchDocsRequest - query length
  • validateCreateEntitiesRequest - entity count, name length, observations count/length
  • validateAddObservationsRequest - entity name, contents count/length
  • validateCreateRelationsRequest - relations count
  • validateDeleteEntitiesRequest - entity names count
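
As a rough illustration of how such limits might be enforced, the sketch below repeats a subset of the constants and adds two hypothetical helpers (`assertMaxLength`, `assertMaxCount`). The real validators in `validation.ts` may use different names and error types.

```typescript
// Subset of the limits listed above, repeated so the sketch is self-contained.
const SKETCH_LIMITS = {
  QUERY_MAX_LENGTH: 10000,
  ENTITIES_MAX_COUNT: 1000,
  ENTITY_NAME_MAX_LENGTH: 500,
} as const;

// Hypothetical helper: rejects strings longer than the given limit.
function assertMaxLength(field: string, value: string, max: number): void {
  if (value.length > max) {
    throw new Error(`${field} exceeds maximum length of ${max} (got ${value.length})`);
  }
}

// Hypothetical helper: rejects arrays with more items than the given limit.
function assertMaxCount(field: string, items: readonly unknown[], max: number): void {
  if (items.length > max) {
    throw new Error(`${field} exceeds maximum count of ${max} (got ${items.length})`);
  }
}

function validateQuery(query: string): string {
  assertMaxLength("query", query, SKETCH_LIMITS.QUERY_MAX_LENGTH);
  return query;
}

function validateEntities(entities: readonly { name: string }[]): void {
  // Check the cheap count limit first, then each entity name.
  assertMaxCount("entities", entities, SKETCH_LIMITS.ENTITIES_MAX_COUNT);
  for (const entity of entities) {
    assertMaxLength("entity name", entity.name, SKETCH_LIMITS.ENTITY_NAME_MAX_LENGTH);
  }
}
```

Checking counts before iterating keeps the cost of rejecting an oversized payload low, which is the point of the limits.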

Tests Extended:

  • mcp-qdrant-memory/src/__tests__/validation.test.ts (+22 tests)

Acceptance Criteria ✅

  • All validators enforce size limits
  • Clear error messages indicate limit exceeded
  • All 22 new validation tests passing

Milestone 17.3: Edge Case - BM25 Unicode Support ✅ DONE

Objective: Fix Unicode stripping in BM25 text processing.

File Modified:

  • mcp-qdrant-memory/src/bm25/bm25Service.ts

Change: Replaced ASCII-only regex with Unicode-aware regex.

// Before (stripped Unicode)
.replace(/[^\w\s]/g, ' ')

// After (preserves Unicode letters and numbers)
.replace(/[^\p{L}\p{N}\s]/gu, ' ')
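
The difference between the two patterns can be seen with a small, hypothetical tokenizer. The `tokenize` function below is only an illustration and is not the tokenizer from `bm25Service.ts`.

```typescript
// Hypothetical tokenizer built around the two patterns shown above.
function tokenize(text: string, pattern: RegExp): string[] {
  return text
    .toLowerCase()
    .replace(pattern, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

const asciiOnly = /[^\w\s]/g;
const unicodeAware = /[^\p{L}\p{N}\s]/gu;

// The ASCII-only pattern replaces the Chinese term with spaces.
console.log(tokenize("user 用户认证 login", asciiOnly)); // ["user", "login"]

// The Unicode-aware pattern keeps it as a searchable token.
console.log(tokenize("user 用户认证 login", unicodeAware)); // ["user", "用户认证", "login"]
```

One side effect worth checking against the real tokenizer: `_` is matched by `\w` but not by `\p{L}` or `\p{N}`, so an identifier such as `user_id` splits into two tokens under the Unicode-aware pattern.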

Tests Extended:

  • mcp-qdrant-memory/src/__tests__/bm25Service.test.ts (+7 Unicode tests)

Languages Tested:

  • Chinese (用户认证)
  • Japanese (ユーザー認証)
  • Korean (사용자 인증)
  • Cyrillic (аутентификации)
  • Arabic (مصادقة)
  • Accented Latin (autenticación)

Acceptance Criteria ✅

  • Unicode characters preserved in BM25 indexing
  • All 7 Unicode tests passing
  • Mixed ASCII/Unicode content searchable

Milestone 17.4: Error Handling - Context Preservation ✅ DONE

Objective: Preserve error context when wrapping errors in McpError.

File Modified:

  • mcp-qdrant-memory/src/index.ts

Change: Added error cause to McpError for stack trace preservation.

} catch (error) {
  const mcpError = new McpError(
    ErrorCode.InternalError,
    error instanceof Error ? error.message : String(error)
  );
  // Preserve original error for debugging
  (mcpError as any).cause = error;
  throw mcpError;
}
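
To show what the attached cause buys during debugging, here is a self-contained sketch. `SketchMcpError` is a stand-in for the SDK's `McpError`, and `wrapError` and `describeChain` are hypothetical helpers, not functions from `index.ts`.

```typescript
// Stand-in for the SDK's McpError so the sketch runs on its own (hypothetical).
class SketchMcpError extends Error {
  constructor(public readonly code: number, message: string) {
    super(message);
    this.name = "SketchMcpError";
  }
}

const INTERNAL_ERROR = -32603; // JSON-RPC "Internal error" code

function wrapError(error: unknown): SketchMcpError {
  const wrapped = new SketchMcpError(
    INTERNAL_ERROR,
    error instanceof Error ? error.message : String(error)
  );
  // Keep the original error (and its stack) reachable for debugging.
  (wrapped as Error & { cause?: unknown }).cause = error;
  return wrapped;
}

// Walks the cause chain and collects messages, outermost first.
function describeChain(error: unknown): string[] {
  const messages: string[] = [];
  let current: unknown = error;
  while (current instanceof Error) {
    messages.push(current.message);
    current = (current as { cause?: unknown }).cause;
  }
  return messages;
}
```

The intersection cast is a narrower alternative to `as any`: it permits the `cause` assignment without disabling type checking for the rest of the object.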

Acceptance Criteria ✅

  • Original error attached as cause
  • Stack traces preserved for debugging
  • All existing tests still passing

Phase 17 Summary

Files Modified:

  • mcp-qdrant-memory/src/fetch-override.ts - Scoped API key injection
  • mcp-qdrant-memory/src/validation.ts - Added INPUT_LIMITS and size validation
  • mcp-qdrant-memory/src/bm25/bm25Service.ts - Unicode-aware regex
  • mcp-qdrant-memory/src/index.ts - Error context preservation

Files Created:

  • mcp-qdrant-memory/src/__tests__/fetch-override.test.ts - Security tests

Tests Added:

  • 18 fetch-override security tests
  • 22 input validation size limit tests
  • 7 BM25 Unicode support tests
  • Total: 47 new tests

MCP Server Progress Update:

  • Phase 1 (Testing): 100% complete - 362 tests → 407 tests
  • Phase 2 (CI/CD): 100% complete
  • Phase 3 (Code Quality): 100% complete
  • Phase 4 (Error Handling): 100% complete ✅ NEW
  • Phase 5 (Security): 100% complete ✅ NEW
  • Phase 6 (Coverage): 100% complete

Overall MCP Server Progress: 100% ✅


Phase 18: MCP Server Resilience & Resource Management

Objective: Complete remaining P1/P2 items from the MCP server PRD (sections 4.5 and 4.6).

Status: Complete ✅

Milestone 18.1: Graceful Shutdown

Objective: Handle process termination signals properly.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 18.1.1 | Create src/shutdown.ts with signal handler infrastructure | HIGH | DONE |
| 18.1.2 | Implement SIGTERM handler with 10-second grace period | HIGH | DONE |
| 18.1.3 | Implement SIGINT handler (same logic as SIGTERM) | HIGH | DONE |
| 18.1.4 | Add cleanup actions: cancel pending requests, flush logs | HIGH | DONE |
| 18.1.5 | Integrate shutdown manager into src/index.ts | HIGH | DONE |
| 18.1.6 | Create src/__tests__/shutdown.test.ts with tests | HIGH | DONE |

Implementation Details:

  • Created ShutdownManager class with:
    • Signal handler registration (SIGTERM/SIGINT)
    • Cleanup callback registration and execution
    • Pending request tracking via AbortController
    • Configurable grace period (default 10s)
    • Global singleton access via getShutdownManager()
  • 21 tests covering all shutdown functionality
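
The pieces listed above could fit together roughly as follows. This is a simplified sketch, not the actual `ShutdownManager`: the class name, method names, and the injected `SignalSource` interface are assumptions made so the example runs without Node-specific globals.

```typescript
type Cleanup = () => void | Promise<void>;

// Minimal view of a signal source; in Node.js, `process` has a compatible `on` method.
interface SignalSource {
  on(signal: string, handler: () => void): unknown;
}

// Hypothetical sketch; the real ShutdownManager in src/shutdown.ts may differ.
class SketchShutdownManager {
  private readonly cleanups: Cleanup[] = [];
  private readonly pending = new Set<AbortController>();
  private shuttingDown = false;

  constructor(private readonly gracePeriodMs = 10_000) {}

  register(cleanup: Cleanup): void {
    this.cleanups.push(cleanup);
  }

  // Tracks a request so it can be aborted when shutdown starts.
  track(): AbortController {
    const controller = new AbortController();
    this.pending.add(controller);
    return controller;
  }

  untrack(controller: AbortController): void {
    this.pending.delete(controller);
  }

  // SIGTERM and SIGINT share the same shutdown logic.
  listen(source: SignalSource, onDone: () => void): void {
    for (const signal of ["SIGTERM", "SIGINT"]) {
      source.on(signal, () => void this.shutdown().then(onDone));
    }
  }

  async shutdown(): Promise<void> {
    if (this.shuttingDown) return; // a repeated signal is ignored
    this.shuttingDown = true;

    for (const controller of this.pending) controller.abort();
    this.pending.clear();

    // Run cleanups in registration order; one failure does not stop the rest.
    const runCleanups = async (): Promise<void> => {
      for (const cleanup of this.cleanups) {
        try {
          await cleanup();
        } catch {
          // Ignored in this sketch; a real implementation would log it.
        }
      }
    };

    // Stop waiting once the grace period has elapsed.
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<void>((resolve) => {
      timer = setTimeout(resolve, this.gracePeriodMs);
    });
    try {
      await Promise.race([runCleanups(), deadline]);
    } finally {
      clearTimeout(timer);
    }
  }
}
```

In a Node.js entry point, usage might look like `manager.listen(process, () => process.exit(0))`, with cleanup callbacks registered during startup.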

Success Criteria:

  • Graceful shutdown handles SIGTERM/SIGINT properly
  • Cleanup callbacks executed in order
  • Pending requests cancelled on shutdown

Milestone 18.2: External API Timeouts

Objective: Add configurable timeouts for HTTP calls to external APIs.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 18.2.1 | Create src/http-client.ts with fetchWithTimeout() utility | HIGH | DONE |
| 18.2.2 | Update Voyage AI embedding calls (30s timeout) | HIGH | DONE |
| 18.2.3 | Update Linear API calls (10s timeout) | HIGH | DONE |
| 18.2.4 | Update GitHub API calls (10s timeout) | HIGH | DONE |
| 18.2.5 | Create src/__tests__/http-client.test.ts with tests | HIGH | DONE |

Implementation Details:

  • Created fetchWithTimeout() utility with:
    • AbortController-based timeout mechanism
    • TimeoutError and ShutdownAbortError classes
    • Integration with ShutdownManager for request tracking
    • DEFAULT_TIMEOUTS constants for each API
    • Environment variable support for custom timeouts
  • Updated all external API calls in qdrant.ts:
    • Voyage AI: 30s timeout
    • Linear API: 10s timeout (2 call sites)
    • GitHub API: 10s timeout (4 call sites)
  • 23 tests covering timeout behavior
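
The AbortController-based timeout mechanism can be sketched as below. The names `fetchWithTimeoutSketch`, `SketchTimeoutError`, and `SKETCH_TIMEOUTS` are hypothetical, and the fetch function is passed in as a parameter so the example can run without network access. The real `fetchWithTimeout()` may have a different signature and also integrates with the ShutdownManager, which is omitted here.

```typescript
// Hypothetical error type mirroring the TimeoutError named above.
class SketchTimeoutError extends Error {
  constructor(public readonly url: string, public readonly timeoutMs: number) {
    super(`Request to ${url} timed out after ${timeoutMs}ms`);
    this.name = "SketchTimeoutError";
  }
}

// Assumed defaults, taken from the timeouts listed above.
const SKETCH_TIMEOUTS = { voyage: 30_000, linear: 10_000, github: 10_000 } as const;

async function fetchWithTimeoutSketch<T>(
  doFetch: (url: string, init: { signal: AbortSignal }) => Promise<T>,
  url: string,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  // Abort the request if it has not settled within the timeout.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await doFetch(url, { signal: controller.signal });
  } catch (error) {
    // An abort triggered by our own timer is reported as a timeout.
    if (controller.signal.aborted) throw new SketchTimeoutError(url, timeoutMs);
    throw error;
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

With the global `fetch` available, a call might look like `fetchWithTimeoutSketch(fetch, url, SKETCH_TIMEOUTS.github)`. Clearing the timer in `finally` matters for the shutdown behavior described earlier, because a pending timer would otherwise delay process exit.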

Success Criteria:

  • External API calls have configurable timeouts
  • TimeoutError thrown on timeout
  • Requests properly cancelled and cleaned up

Milestone 18.3: BM25 Service LRU Cleanup

Objective: Implement LRU cleanup for per-collection BM25 services.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 18.3.1 | Add lastAccessed timestamp to BM25 service tracking | MEDIUM | DONE |
| 18.3.2 | Implement cleanupStaleServices() method | MEDIUM | DONE |
| 18.3.3 | Add periodic cleanup timer (every 5 minutes) | MEDIUM | DONE |
| 18.3.4 | Add max service count limit (10 collections) | MEDIUM | DONE |
| 18.3.5 | Add cleanup on graceful shutdown | MEDIUM | DONE |
| 18.3.6 | Add tests for LRU cleanup behavior | MEDIUM | DONE |

Implementation Details:

  • Added BM25ServiceEntry interface with lastAccessed timestamp
  • Configuration constants:
    • BM25_MAX_SERVICES = 10 (max cached services)
    • BM25_TTL_MS = 30 minutes (stale threshold)
    • BM25_CLEANUP_INTERVAL_MS = 5 minutes
  • LRU cleanup algorithm:
    1. Remove entries not accessed within TTL
    2. If count > max, evict oldest by lastAccessed
  • Public methods:
    • getBM25ServiceCount() for monitoring
    • stopBM25Cleanup() for graceful shutdown
  • 4 integration tests for BM25 service management
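
The two-step cleanup algorithm can be sketched as a pure function over a map of entries. The names `ServiceEntry` and `cleanupStale` are hypothetical, and the constants repeat the values listed above; the real `cleanupStaleServices()` method may differ in structure.

```typescript
interface ServiceEntry<T> {
  service: T;
  lastAccessed: number; // epoch milliseconds
}

const MAX_SERVICES = 10;
const TTL_MS = 30 * 60 * 1000; // 30 minutes

// Hypothetical sketch of the two-step cleanup; returns the evicted collection names.
function cleanupStale<T>(
  services: Map<string, ServiceEntry<T>>,
  now: number,
  maxServices: number = MAX_SERVICES,
  ttlMs: number = TTL_MS
): string[] {
  const evicted: string[] = [];

  // Step 1: drop entries that have not been accessed within the TTL.
  for (const [name, entry] of services) {
    if (now - entry.lastAccessed > ttlMs) {
      services.delete(name);
      evicted.push(name);
    }
  }

  // Step 2: if still over the limit, evict the least recently accessed first.
  if (services.size > maxServices) {
    const oldestFirst = [...services.entries()].sort(
      (a, b) => a[1].lastAccessed - b[1].lastAccessed
    );
    for (const [name] of oldestFirst.slice(0, services.size - maxServices)) {
      services.delete(name);
      evicted.push(name);
    }
  }

  return evicted;
}
```

Passing `now` as a parameter keeps the function deterministic, which makes the eviction order easy to test. A periodic timer would call it with `Date.now()`.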

Success Criteria:

  • BM25 services cleaned up based on LRU policy
  • Max service count enforced
  • Cleanup stops on shutdown

Phase 18 Summary

Files Created:

  • mcp-qdrant-memory/src/shutdown.ts - Graceful shutdown manager
  • mcp-qdrant-memory/src/http-client.ts - HTTP client with timeout support
  • mcp-qdrant-memory/src/__tests__/shutdown.test.ts - Shutdown tests
  • mcp-qdrant-memory/src/__tests__/http-client.test.ts - HTTP client tests

Files Modified:

  • mcp-qdrant-memory/src/index.ts - Shutdown integration, cleanup methods
  • mcp-qdrant-memory/src/persistence/qdrant.ts - BM25 LRU cleanup, fetchWithTimeout usage
  • mcp-qdrant-memory/src/__tests__/integration/qdrant.integration.test.ts - BM25 tests

Tests Added:

  • 21 shutdown tests
  • 23 http-client tests
  • 4 BM25 management tests
  • Total: 48 new tests

PRD Coverage:

  • 4.5.2 Timeout Configuration - Complete
  • 4.6.1 Graceful Shutdown - Complete
  • 4.6.2 BM25 Service Cleanup - Complete

Deferred to Future Phase:

  • 4.5.1 Result Type Pattern - Requires significant refactoring
  • 4.6.3 Connection Lifecycle Management - Health checks, reconnection logic

Phase 19: MCP Server Structured Logging

Objective: Implement structured logging throughout the MCP server codebase, replacing all console.error/console.log calls with a consistent, configurable logging system.

Status: Complete ✅

Milestone 19.1: Logger Implementation

Objective: Create a structured logging system with module-specific loggers.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 19.1.1 | Create src/logger.ts with structured logger implementation | HIGH | DONE |
| 19.1.2 | Implement 4 log levels (debug, info, warn, error) | HIGH | DONE |
| 19.1.3 | Add JSON output format (LOG_FORMAT=json) | HIGH | DONE |
| 19.1.4 | Add human-readable output format (default) | MEDIUM | DONE |
| 19.1.5 | Implement LOG_LEVEL environment variable filtering | MEDIUM | DONE |
| 19.1.6 | Create pre-configured module loggers | MEDIUM | DONE |
| 19.1.7 | Add child logger support with context inheritance | LOW | DONE |
| 19.1.8 | Create src/__tests__/logger.test.ts with tests | HIGH | DONE |

Implementation Details:

  • Created Logger class with:
    • 4 log levels: debug, info, warn, error
    • JSON structured output (LOG_FORMAT=json)
    • Human-readable output (default)
    • Log level filtering via LOG_LEVEL env var
    • Child logger support with context inheritance
    • Error serialization with stack traces
  • Pre-configured module loggers:
    • logger (general MCP server)
    • qdrantLogger (database operations)
    • bm25Logger (keyword search)
    • validationLogger (input validation)
    • planModeLogger (plan mode access control)
    • shutdownLogger (shutdown lifecycle)
    • configLogger (configuration loading)
    • ignoreFilterLogger (claudeignore filtering)
  • 29 tests covering all logger functionality
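
A reduced sketch of such a logger is shown below. It covers level filtering, the two output formats, and child loggers with inherited context; timestamps and error serialization are left out. The class name `SketchLogger` and the options object are assumptions: in the server, the level and format would come from `LOG_LEVEL` and `LOG_FORMAT`, and the actual `Logger` class may look different.

```typescript
type Level = "debug" | "info" | "warn" | "error";

const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

interface LoggerOptions {
  level: Level; // minimum level that is emitted
  format: "json" | "text";
  write: (line: string) => void; // output sink, e.g. (line) => console.error(line)
}

// Hypothetical sketch; the real Logger in src/logger.ts may differ.
class SketchLogger {
  constructor(
    private readonly module: string,
    private readonly options: LoggerOptions,
    private readonly context: Record<string, unknown> = {}
  ) {}

  // A child logger inherits the parent's context and adds its own fields.
  child(extra: Record<string, unknown>): SketchLogger {
    return new SketchLogger(this.module, this.options, { ...this.context, ...extra });
  }

  debug(message: string, data?: Record<string, unknown>): void {
    this.log("debug", message, data);
  }
  info(message: string, data?: Record<string, unknown>): void {
    this.log("info", message, data);
  }
  warn(message: string, data?: Record<string, unknown>): void {
    this.log("warn", message, data);
  }
  error(message: string, data?: Record<string, unknown>): void {
    this.log("error", message, data);
  }

  private log(level: Level, message: string, data: Record<string, unknown> = {}): void {
    // Drop messages below the configured minimum level.
    if (LEVEL_ORDER[level] < LEVEL_ORDER[this.options.level]) return;
    const fields = { ...this.context, ...data };
    if (this.options.format === "json") {
      this.options.write(JSON.stringify({ level, module: this.module, message, ...fields }));
    } else {
      const suffix = Object.keys(fields).length > 0 ? ` ${JSON.stringify(fields)}` : "";
      this.options.write(`[${level.toUpperCase()}] [${this.module}] ${message}${suffix}`);
    }
  }
}
```

For a server that speaks MCP over stdio, the sink would normally write to stderr so that log lines do not mix with protocol messages on stdout.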

Success Criteria:

  • Structured logging with consistent format
  • Configurable log level filtering
  • Module-specific loggers for better debugging

Milestone 19.2: Logger Integration

Objective: Replace all console.error/console.log calls with structured logging.

Tasks

| ID | Task | Priority | Status |
| --- | --- | --- | --- |
| 19.2.1 | Integrate logger into src/config.ts | HIGH | DONE |
| 19.2.2 | Integrate logger into src/validation.ts | HIGH | DONE |
| 19.2.3 | Integrate logger into src/bm25/bm25Service.ts | HIGH | DONE |
| 19.2.4 | Integrate logger into src/plan-mode-guard.ts | MEDIUM | DONE |
| 19.2.5 | Integrate logger into src/shutdown.ts | HIGH | DONE |
| 19.2.6 | Integrate logger into src/claudeignore/filter.ts | MEDIUM | DONE |
| 19.2.7 | Integrate logger into src/index.ts | HIGH | DONE |
| 19.2.8 | Integrate logger into src/persistence/qdrant.ts | HIGH | DONE |
| 19.2.9 | Update src/__tests__/shutdown.test.ts for new format | MEDIUM | DONE |

Implementation Details:

  • Replaced 60+ console.error/console.log calls across 8 production files
  • Each module uses its designated logger (e.g., qdrantLogger for database operations)
  • Debug-level logging for verbose debugging (e.g., relation processing, batch progress)
  • Info-level logging for normal operations (e.g., server startup, BM25 initialization)
  • Warn-level logging for warnings (e.g., ignore file not found, safety breaks)
  • Error-level logging for errors with stack traces

Success Criteria:

  • All console.error/console.log calls replaced
  • All 503 tests passing
  • TypeScript clean (no errors)
  • Prettier formatted

Phase 19 Summary

Files Created:

  • mcp-qdrant-memory/src/logger.ts - Structured logger implementation (198 lines)
  • mcp-qdrant-memory/src/__tests__/logger.test.ts - Logger test suite (453 lines)

Files Modified:

  • mcp-qdrant-memory/src/config.ts - Environment variable error logging
  • mcp-qdrant-memory/src/validation.ts - Debug-level validation logging
  • mcp-qdrant-memory/src/bm25/bm25Service.ts - BM25 operation logging
  • mcp-qdrant-memory/src/plan-mode-guard.ts - Plan mode state change logging
  • mcp-qdrant-memory/src/shutdown.ts - Shutdown lifecycle logging
  • mcp-qdrant-memory/src/claudeignore/filter.ts - Ignore pattern warning logging
  • mcp-qdrant-memory/src/index.ts - Server lifecycle and auto-reduce logging
  • mcp-qdrant-memory/src/persistence/qdrant.ts - Database operation logging
  • mcp-qdrant-memory/src/__tests__/shutdown.test.ts - Updated for new log format

Tests Added:

  • 29 logger tests
  • Total MCP Server Tests: 503

PRD Coverage:

  • 4.5.3 Structured Logging - Complete ✅

Appendix: Plan Guardrail Rule Specifications

A.1 Coverage Rules (2)

| # | Rule | Severity | Detection | Auto-Fix |
| --- | --- | --- | --- | --- |
| 1 | PLAN.TEST_REQUIREMENT | MEDIUM | Feature task without test dependency | Add test task |
| 2 | PLAN.DOC_REQUIREMENT | LOW | User-facing change without doc task | Add doc task |

A.2 Consistency Rules (1)

| # | Rule | Severity | Detection | Auto-Fix |
| --- | --- | --- | --- | --- |
| 1 | PLAN.DUPLICATE_DETECTION | HIGH | Semantic similarity >70% with existing code | Modify task to reference existing |

A.3 Architecture Rules (1)

| # | Rule | Severity | Detection | Auto-Fix |
| --- | --- | --- | --- | --- |
| 1 | PLAN.ARCHITECTURAL_CONSISTENCY | MEDIUM | File paths outside established patterns | Add warning to task |

A.4 Performance Rules (1)

| # | Rule | Severity | Detection | Auto-Fix |
| --- | --- | --- | --- | --- |
| 1 | PLAN.PERFORMANCE_PATTERN | LOW | Known anti-patterns (N+1, no caching) | Add performance note |

Implementation Order Summary

Critical Path

  1. Phase 7: Plan Mode Detection & Hook Infrastructure (foundation)
  2. Phase 9: Quality Guardrails Framework (core value)
  3. Phase 10: Auto-Revision System (key differentiator)
  4. Phase 11: Prompt Augmentation (guidance injection)

High Priority

  1. Phase 8: MCP Context Integration (rich context)
  2. Phase 12: Plan QA Verification (quality assurance)

Lower Priority

  1. Phase 13: Polish, Testing & Documentation (production readiness)

Dependencies Graph

Phase 7 (Detection/Hooks)
       |
       +----------------------+
       v                      v
Phase 9 (Guardrails)    Phase 8 (MCP Context)
       |                      |
       v                      |
Phase 10 (Auto-Revision) <----+
       |
       v
Phase 11 (Prompt Augmentation)
       |
       v
Phase 12 (Plan QA)
       |
       v
Phase 13 (Polish/Testing/Docs)

Risk Mitigation

| Risk | Impact | Mitigation |
| --- | --- | --- |
| False positives annoying | HIGH | Configurable thresholds, easy overrides |
| Performance too slow | MEDIUM | Parallel validation, caching |
| Auto-revision conflicts | HIGH | Conflict detection, iteration limits |
| Issue tracker API failures | MEDIUM | Graceful degradation, caching |
| Plan Mode detection fails | HIGH | Multiple detection methods, fallbacks |

Success Criteria Summary

  • Plan Mode detected with >95% accuracy
  • <100ms overhead for plan augmentation (Phase 11 complete: <50ms)
  • >90% of plans include test/doc tasks when appropriate (Phase 12 Plan QA)
  • Existing code reuse suggested in >80% of applicable cases (Phase 12 duplicate check)
  • <10% of plans require user revision before approval (pending user testing)
  • All 5 guardrail rules implemented and tested
  • MCP tools for docs and tickets functional
  • Documentation complete (Phase 13.3)
  • Plan QA verification implemented (Phase 12)
  • Feedback collection infrastructure (Phase 13.4)
  • Thoroughness level configuration (Phase 13.4)

Generated from PRD.md and TDD.md (Plan Mode Integration)