Version: 1.0
Based On: PRD.md and TDD.md (Plan Mode Integration)
Target: Deepen Plan Mode integration with quality guardrails and rich context
Prerequisites: Phases 0-6 complete (v2.9.20+)
- Executive Summary
- Phase 7: Foundation - Plan Mode Detection & Hook Infrastructure
- Phase 8: MCP Context Integration
- Phase 9: Quality Guardrails Framework
- Phase 10: Auto-Revision System
- Phase 11: Prompt Augmentation & Exploration Hints
- Phase 12: Plan QA Verification
- Phase 13: Polish, Testing & Documentation
- Phase 14: MCP Server Enhancement - Testing Foundation
- Phase 15: MCP Server Code Quality & Documentation
- Phase 16: MCP Server CI/CD Completion
- Phase 17: MCP Server Error Handling & Security
- Phase 18: MCP Server Resilience & Resource Management
- Phase 19: MCP Server Structured Logging
- Appendix: Rule Specifications
## Executive Summary

Enhance Claude Code's Plan Mode to produce implementation plans that are thorough, context-aware, and aligned with project best practices. The system proactively catches architectural/design issues during planning and ensures plans include necessary tasks (testing, documentation) while avoiding redundant or ill-advised changes.
- Hook into Claude Code's Plan Mode - Use hooks/MCP to enhance native Plan Mode behavior
- Auto-revise plans - Automatically add missing tasks before showing to user
- Full MCP context - Code index, design docs, and issue tracking (Linear, GitHub)
- Hybrid parallelism - Inject prompts that guide Claude Code's sub-agent exploration
| Component | Status | Notes |
|---|---|---|
| CLI Infrastructure | v2.9.20 | Full claude-indexer with hooks |
| MCP Server | Complete | 8+ tools, streaming responses |
| Hook Framework | Complete | SessionStart, UserPromptSubmit, PreToolUse, PostToolUse |
| Memory Guard v4.3 | Complete | 27 pattern checks |
| UI Consistency | Complete | 15+ rules, 3-tier architecture |
- Plan Completeness: >90% of plans include test/doc tasks when appropriate
- Duplicate Detection: Existing code reuse suggested in >80% of applicable cases
- Plan Approval Rate: <10% of plans require user revision before approval
- User Confidence: Reviewers have minimal additional pointers to add
- Performance: <100ms overhead for plan augmentation
## Phase 7: Foundation - Plan Mode Detection & Hook Infrastructure

Goal: Establish the infrastructure for detecting Plan Mode activation and intercepting plan generation.
Objective: Reliably detect when Claude Code enters Plan Mode.
| ID | Task | Priority | Status |
|---|---|---|---|
| 7.1.1 | Create hooks/plan_mode_detector.py with pattern detection | HIGH | DONE |
| 7.1.2 | Implement explicit marker detection (@agent-plan, @plan) | HIGH | DONE |
| 7.1.3 | Add planning keyword detection (confidence scoring) | MEDIUM | DONE |
| 7.1.4 | Implement environment variable detection (CLAUDE_PLAN_MODE) | MEDIUM | DONE |
| 7.1.5 | Create session state tracking for Plan Mode persistence | MEDIUM | DONE |
| 7.1.6 | Add unit tests for detection accuracy (>95% target) | HIGH | DONE |
Detection Patterns:

```python
# Explicit markers (1.0 confidence)
EXPLICIT_PATTERNS = r'@agent-plan|@plan\b|--plan\b|plan\s*mode'
# Planning keywords (0.7 confidence)
PLANNING_KEYWORDS = r'\b(create|make|write|design|implement)\s+(a\s+)?plan\b'
```

Testing Requirements:
- Unit tests for each detection method
- Test with real Claude Code Plan Mode sessions
- Verify <10ms detection latency
Success Criteria:
- >95% detection accuracy for Plan Mode (achieved: 100% on 30-case benchmark)
- <10ms detection latency (achieved: <1ms average)
- Zero false positives on non-plan prompts
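As a rough sketch, the two detection tiers above can be combined into a single confidence score. This is illustrative only (the real logic lives in `hooks/plan_mode_detector.py`); the case-insensitive flag is an assumption.

```python
import re

# Patterns copied from the spec above; re.IGNORECASE is an assumption.
EXPLICIT_PATTERNS = re.compile(r"@agent-plan|@plan\b|--plan\b|plan\s*mode", re.I)
PLANNING_KEYWORDS = re.compile(r"\b(create|make|write|design|implement)\s+(a\s+)?plan\b", re.I)

def detect_plan_mode(prompt: str) -> float:
    """Return a confidence score: 1.0 explicit marker, 0.7 planning keyword, 0.0 otherwise."""
    if EXPLICIT_PATTERNS.search(prompt):
        return 1.0
    if PLANNING_KEYWORDS.search(prompt):
        return 0.7
    return 0.0
```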
Objective: Extend existing hook system to support Plan Mode interception.
| ID | Task | Priority | Status |
|---|---|---|---|
| 7.2.1 | Extend UserPromptSubmit hook for Plan Mode detection | HIGH | DONE |
| 7.2.2 | Create plan context injection mechanism | HIGH | DONE |
| 7.2.3 | Implement hook chaining for plan augmentation | MEDIUM | DONE |
| 7.2.4 | Add Plan Mode state to SessionContext | MEDIUM | DONE |
| 7.2.5 | Create PlanModeContext dataclass for state tracking | MEDIUM | DONE |
| 7.2.6 | Update .claude/settings.json template for Plan hooks | LOW | DONE |
Implementation Details (Milestone 7.2 Complete):
- Created `claude_indexer/hooks/planning/` package with:
  - `guidelines.py` - PlanningGuidelinesGenerator (<20ms)
  - `exploration.py` - ExplorationHintsGenerator (<30ms)
  - `injector.py` - PlanContextInjector (<50ms total)
- Modified `hooks/prompt_handler.py` to inject guidelines and hints
- Configuration via `CLAUDE_PLAN_MODE_CONFIG` env var or `CLAUDE_PLAN_MODE_COMPACT`
Hook Flow:

```
UserPromptSubmit Hook
  |
  +---> Plan Mode Detection
  |       |
  |       +---> (Yes) -> Inject Planning Guidelines
  |       |       |
  |       |       +---> Generate Exploration Hints
  |       |       |
  |       |       +---> Enable Plan QA Verification
  |       |
  |       +---> (No) -> Pass through unchanged
  |
  +---> Continue to Claude
```
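The branch in the flow above reduces to a small dispatch. A minimal sketch, with the detector, injector, and hint generator passed in as callables since the real components live in `hooks/prompt_handler.py` and the planning package:

```python
def handle_user_prompt(prompt, detect, inject_guidelines, generate_hints):
    """Augment a prompt when Plan Mode is detected, else pass it through unchanged."""
    if not detect(prompt):
        return prompt  # (No) branch: pass through unchanged
    augmented = inject_guidelines(prompt)   # inject planning guidelines
    hints = generate_hints(prompt)          # exploration hints for sub-agents
    if hints:
        augmented += "\n\n" + "\n".join(hints)
    return augmented
```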
Testing Requirements:
- Test hook invocation order
- Test state persistence across turns
- Verify non-blocking behavior
Documentation:
- Update HOOKS.md with Plan Mode hooks
- Add configuration examples
Success Criteria:
- Hooks execute in correct order
- State persists correctly (via SessionContext.plan_mode)
- <20ms hook overhead (guidelines <20ms, hints <30ms, total <50ms)
## Phase 8: MCP Context Integration

Goal: Enable Plan Mode to query rich context via MCP - code index, design docs, and issue trackers.
Objective: Index and search design documents (PRD, TDD, ADR, specs).
| ID | Task | Priority | Status |
|---|---|---|---|
| 8.1.1 | Create new EntityTypes: SPEC, PRD, TDD, ADR, REQUIREMENT | HIGH | DONE |
| 8.1.2 | Implement DesignDocParser in claude_indexer/analysis/ | HIGH | DONE |
| 8.1.3 | Add document type detection (patterns in filename/content) | HIGH | DONE |
| 8.1.4 | Implement section extraction from markdown | MEDIUM | DONE |
| 8.1.5 | Extract individual requirements as separate entities | MEDIUM | DONE |
| 8.1.6 | Create relations between docs and code components | MEDIUM | DONE |
| 8.1.7 | Add design doc paths to configuration | LOW | DONE |
Entity Types:

```python
class EntityType(Enum):
    # Existing types...
    SPEC = "spec"                # Design specifications
    PRD = "prd"                  # Product requirements documents
    TDD = "tdd"                  # Technical design documents
    ADR = "adr"                  # Architecture decision records
    REQUIREMENT = "requirement"  # Individual requirements from specs
```

DesignDocParser:

```python
class DesignDocParser(CodeParser):
    """Parser for design documents (PRD, TDD, ADR, specs)."""

    DOC_TYPE_PATTERNS = {
        "prd": [r"product\s+requirements?\s+document", r"^prd"],
        "tdd": [r"technical\s+design\s+document", r"^tdd"],
        "adr": [r"architecture\s+decision\s+record", r"^adr"],
        "spec": [r"specification", r"^spec"],
    }

    def parse(self, file_path: Path) -> ParserResult:
        # Extract sections, requirements, decisions
        ...
```

Configuration (in unified_config.py):
```python
class DesignDocsConfig(BaseModel):
    enabled: bool = True
    paths: list[str] = ["docs/", "specs/", "design/", "*.md"]
    doc_patterns: dict[str, str] = {
        "prd": "**/PRD*.md",
        "tdd": "**/TDD*.md",
        "adr": "**/adr/*.md",
    }
```

Testing Requirements:
- Test document type detection with various formats
- Test section extraction from real PRD/TDD files
- Test requirement entity creation
- Verify relations to code components
Success Criteria:
- Auto-detect document type with >90% accuracy
- Extract sections and requirements correctly
- Create searchable entities for all design docs
Implementation Details (Milestone 8.1 Complete):
- Created `claude_indexer/analysis/design_doc_parser.py` with DesignDocParser
- Added 5 new EntityTypes: SPEC, PRD, TDD, ADR, REQUIREMENT
- Added DesignDocsConfig to `claude_indexer/config/unified_config.py`
- Document type detection via filename patterns and content patterns
- Section extraction respects configurable max_section_depth
- Requirement extraction supports RFC 2119 (MUST/SHALL/SHOULD/MAY), [REQ-XXX], and numbered lists
- Relations created between documents, sections, and requirements
- Tests: 30 unit tests covering all functionality
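To illustrate the requirement-extraction behavior, a simplified sketch covering the RFC 2119 and [REQ-XXX] cases (numbered-list handling omitted; the shipped DesignDocParser is more elaborate):

```python
import re

# RFC 2119 keywords and [REQ-XXX] tags, per the milestone notes above.
RFC2119 = re.compile(r"\b(MUST(?:\s+NOT)?|SHALL(?:\s+NOT)?|SHOULD(?:\s+NOT)?|MAY)\b")
REQ_TAG = re.compile(r"\[REQ-\d+\]")

def extract_requirements(markdown: str) -> list[str]:
    """Return lines that look like individual requirements."""
    return [
        line.strip()
        for line in markdown.splitlines()
        if RFC2119.search(line) or REQ_TAG.search(line)
    ]
```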
Objective: Add MCP tools for searching and retrieving design documents.
| ID | Task | Priority | Status |
|---|---|---|---|
| 8.2.1 | Add search_docs tool to MCP server | HIGH | DONE |
| 8.2.2 | Add get_doc tool for full document retrieval | HIGH | DONE |
| 8.2.3 | Implement docTypes filtering (prd, tdd, adr, spec) | MEDIUM | DONE |
| 8.2.4 | Add section-specific retrieval | LOW | DONE |
| 8.2.5 | Create TypeScript interfaces for doc tools | MEDIUM | DONE |
| 8.2.6 | Add validation for doc tool inputs | MEDIUM | DONE |
MCP Tool: search_docs:

```typescript
{
  name: "search_docs",
  description: "Search design documents, specifications, PRDs, and ADRs",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      docTypes: {
        type: "array",
        items: { type: "string" },
        description: "Filter: prd, tdd, spec, adr"
      },
      limit: { type: "number", default: 10 }
    },
    required: ["query"]
  }
}
```

MCP Tool: get_doc:

```typescript
{
  name: "get_doc",
  description: "Retrieve full content of a specific design document",
  inputSchema: {
    type: "object",
    properties: {
      docId: { type: "string", description: "Document ID or file path" },
      section: { type: "string", description: "Optional: specific section" }
    },
    required: ["docId"]
  }
}
```

Testing Requirements:
- Test search with various queries
- Test filtering by document type
- Test full document retrieval
- Benchmark search latency (<50ms)
Success Criteria:
- Both tools implemented and tested
- Filter by document type works correctly
- <50ms search latency
Implementation Details (Milestone 8.2 Complete):
- Created `DocSearchResult` and `DocContent` interfaces in `mcp-qdrant-memory/src/types.ts`
- Added `SearchDocsRequest` and `GetDocRequest` interfaces in `validation.ts`
- Implemented `validateSearchDocsRequest` and `validateGetDocRequest` validators
- Added `searchDocs()` and `getDoc()` methods to `QdrantPersistence` class
- Added `search_docs` and `get_doc` tool definitions to MCP server
- Supports docTypes filtering: prd, tdd, adr, spec
- Section-specific retrieval via `section` parameter
- Multi-project support via `collection` parameter
- Returns sections sorted by line number, requirements with type classification
Objective: Enable querying tickets from Linear and GitHub Issues.
| ID | Task | Priority | Status |
|---|---|---|---|
| 8.3.1 | Create claude_indexer/integrations/ package | HIGH | DONE |
| 8.3.2 | Implement LinearClient with GraphQL queries | HIGH | DONE |
| 8.3.3 | Implement GitHubIssuesClient with REST API | HIGH | DONE |
| 8.3.4 | Create TicketEntity data model | HIGH | DONE |
| 8.3.5 | Add search_tickets MCP tool | HIGH | DONE |
| 8.3.6 | Add get_ticket MCP tool with comments/PRs | HIGH | DONE |
| 8.3.7 | Implement authentication configuration | MEDIUM | DONE |
| 8.3.8 | Add rate limiting and caching | MEDIUM | DONE |
| 8.3.9 | Create ticket sync service (background) | LOW | DEFERRED |
TicketEntity Data Model:

```python
@dataclass
class TicketEntity:
    id: str                         # e.g., "AVO-123", "github#456"
    source: str                     # "linear", "github"
    title: str
    description: str
    status: str                     # "open", "in_progress", "done"
    assignee: str | None
    labels: list[str]
    priority: str | None
    acceptance_criteria: list[str]  # Extracted requirements
    linked_prs: list[str]           # PR references

    def to_entity(self) -> Entity:
        """Convert to standard Entity for indexing."""
        ...
```

LinearClient:

```python
class LinearClient:
    BASE_URL = "https://api.linear.app/graphql"

    async def search_issues(
        self,
        query: str,
        status: list[str] | None = None,
        labels: list[str] | None = None,
        limit: int = 20
    ) -> list[TicketEntity]:
        ...

    async def get_issue(self, identifier: str) -> TicketEntity:
        ...
```

MCP Tool: search_tickets:
```typescript
{
  name: "search_tickets",
  description: "Search issue tracker for relevant tickets (Linear, GitHub)",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string" },
      status: { type: "array", items: { type: "string" } },
      labels: { type: "array", items: { type: "string" } },
      source: { type: "string", enum: ["linear", "github", "all"] }
    },
    required: ["query"]
  }
}
```

Configuration:

```python
class LinearConfig(BaseModel):
    enabled: bool = False
    api_key: str = ""  # From LINEAR_API_KEY env var
    team_id: str = ""

class GitHubIssuesConfig(BaseModel):
    enabled: bool = False
    token: str = ""  # From GITHUB_TOKEN env var
    owner: str = ""
    repo: str = ""
```

Testing Requirements:
- Test Linear API integration (mock responses)
- Test GitHub API integration (mock responses)
- Test authentication handling
- Test rate limiting
Documentation:
- API key configuration guide
- Ticket search examples
Success Criteria:
- Both integrations work with API keys
- Search returns relevant tickets
- Acceptance criteria extracted correctly
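One way acceptance criteria could be pulled from a ticket body is to scan for Markdown checkboxes. This is a convention-based sketch, not something the Linear or GitHub APIs guarantee:

```python
import re

# Matches Markdown checkbox items like "- [ ] Login works" or "* [x] Done".
CHECKBOX = re.compile(r"^\s*[-*]\s*\[[ xX]\]\s*(.+)$", re.M)

def extract_acceptance_criteria(description: str) -> list[str]:
    """Return checkbox item texts found anywhere in the ticket description."""
    return [m.group(1).strip() for m in CHECKBOX.finditer(description)]
```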
Objective: Ensure Plan Mode only has read-only access to MCP tools.
| ID | Task | Priority | Status |
|---|---|---|---|
| 8.4.1 | Create PlanModeGuard class in MCP server | HIGH | DONE |
| 8.4.2 | Define allowed tools list for Plan Mode | HIGH | DONE |
| 8.4.3 | Define blocked tools list (write operations) | HIGH | DONE |
| 8.4.4 | Integrate guard into MCP request handler | HIGH | DONE |
| 8.4.5 | Add set_plan_mode internal tool | MEDIUM | DONE |
| 8.4.6 | Create clear error messages for blocked tools | MEDIUM | DONE |
PlanModeGuard:

```typescript
const PLAN_MODE_ALLOWED = [
  // Read-only code memory
  "search_similar", "read_graph", "get_implementation",
  // Read-only documents
  "search_docs", "get_doc",
  // Read-only tickets
  "search_tickets", "get_ticket",
];

const PLAN_MODE_BLOCKED = [
  // Write operations
  "create_entities", "create_relations", "add_observations",
  "delete_entities", "delete_observations", "delete_relations",
];

class PlanModeGuard {
  private isPlanMode: boolean = false;

  setPlanMode(enabled: boolean): void { ... }

  isToolAllowed(toolName: string): boolean {
    if (!this.isPlanMode) return true;
    return !PLAN_MODE_BLOCKED.includes(toolName);
  }
}
```

Testing Requirements:
- Test tool blocking in Plan Mode
- Test allowed tools work correctly
- Verify error messages are clear
Success Criteria:
- Write tools blocked in Plan Mode
- Clear error messages for blocked operations
- No security bypass possible
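For table-driven testing, the same allow/deny semantics can be mirrored in Python; the authoritative implementation remains the TypeScript class above.

```python
# Blocked tool names copied from the PLAN_MODE_BLOCKED list above.
PLAN_MODE_BLOCKED = {
    "create_entities", "create_relations", "add_observations",
    "delete_entities", "delete_observations", "delete_relations",
}

def is_tool_allowed(tool: str, plan_mode: bool) -> bool:
    """Outside Plan Mode everything is allowed; inside it, write tools are blocked."""
    if not plan_mode:
        return True
    return tool not in PLAN_MODE_BLOCKED
```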
Implementation Details (Milestone 8.4 Complete):
- Created `mcp-qdrant-memory/src/plan-mode-guard.ts` with PlanModeGuard class
- Environment variable detection via `CLAUDE_PLAN_MODE` (matches Python implementation)
- Blocked tools: create_entities, create_relations, add_observations, delete_entities, delete_observations, delete_relations
- Allowed tools: search_similar, read_graph, get_implementation, search_docs, get_doc, search_tickets, get_ticket, set_plan_mode
- Added `set_plan_mode` MCP tool to enable/disable Plan Mode
- Error responses include blocked tools list and hint for resolution
- Version bumped to 0.6.4
## Phase 9: Quality Guardrails Framework

Goal: Create the framework for validating implementation plans against quality rules.
Objective: Define data structures for plan validation rules.
| ID | Task | Priority | Status |
|---|---|---|---|
| 9.1.1 | Create claude_indexer/ui/plan/guardrails/ package | HIGH | DONE |
| 9.1.2 | Define PlanValidationContext dataclass | HIGH | DONE |
| 9.1.3 | Define PlanValidationFinding dataclass | HIGH | DONE |
| 9.1.4 | Define PlanRevision and RevisionType | HIGH | DONE |
| 9.1.5 | Create abstract PlanValidationRule base class | HIGH | DONE |
| 9.1.6 | Create PlanGuardrailConfig for rule configuration | MEDIUM | DONE |
Core Data Structures:

```python
# claude_indexer/ui/plan/guardrails/base.py
class RevisionType(Enum):
    ADD_TASK = "add_task"
    MODIFY_TASK = "modify_task"
    REMOVE_TASK = "remove_task"
    ADD_DEPENDENCY = "add_dependency"
    REORDER_TASKS = "reorder_tasks"

@dataclass
class PlanValidationContext:
    plan: ImplementationPlan
    memory_client: Any | None = None
    collection_name: str | None = None
    project_path: Path = field(default_factory=Path.cwd)
    config: PlanGuardrailConfig = None
    source_requirements: str = ""

    def search_memory(self, query: str, **kwargs) -> list[dict]:
        """Search semantic memory for similar code/patterns."""
        ...

@dataclass
class PlanValidationFinding:
    rule_id: str
    severity: Severity
    summary: str
    affected_tasks: list[str] = field(default_factory=list)
    suggestion: str | None = None
    can_auto_revise: bool = False
    confidence: float = 1.0
    evidence: list[Evidence] = field(default_factory=list)
    suggested_revision: PlanRevision | None = None

@dataclass
class PlanRevision:
    revision_type: RevisionType
    rationale: str
    target_task_id: str | None = None
    new_task: Task | None = None
    modifications: dict[str, Any] = field(default_factory=dict)
    dependency_additions: list[tuple[str, str]] = field(default_factory=list)

class PlanValidationRule(ABC):
    @property
    @abstractmethod
    def rule_id(self) -> str: ...

    @property
    @abstractmethod
    def category(self) -> str: ...  # coverage, consistency, architecture, performance

    @abstractmethod
    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]: ...

    @abstractmethod
    def suggest_revision(self, finding: PlanValidationFinding, context: PlanValidationContext) -> PlanRevision | None: ...
```

Testing Requirements:
- Unit tests for all data classes
- Test serialization/deserialization
- Test validation context memory search
Success Criteria:
- All data structures defined and tested
- Follows existing pattern from claude_indexer/rules/base.py
- Memory search integration works
Implementation Details (Milestone 9.1 Complete):
- Created `claude_indexer/ui/plan/guardrails/` package with base.py, config.py, `__init__.py`
- RevisionType enum with ADD_TASK, MODIFY_TASK, REMOVE_TASK, ADD_DEPENDENCY, REORDER_TASKS
- PlanRevision dataclass with serialization support
- PlanValidationFinding dataclass with evidence and suggested revision
- PlanValidationContext with plan, config, memory search integration
- PlanValidationRule ABC following rules/base.py pattern
- PlanGuardrailConfig Pydantic model with category toggles, auto-revise settings
- Added plan_guardrails to UnifiedConfig
- Tests: 52 unit tests covering all functionality
Objective: Create the engine that orchestrates rule validation.
| ID | Task | Priority | Status |
|---|---|---|---|
| 9.2.1 | Create PlanGuardrailEngine coordinator class | HIGH | DONE |
| 9.2.2 | Implement rule discovery (auto-load from directory) | HIGH | DONE |
| 9.2.3 | Add parallel rule execution support | MEDIUM | DONE |
| 9.2.4 | Implement severity filtering | MEDIUM | DONE |
| 9.2.5 | Create PlanGuardrailResult aggregation | MEDIUM | DONE |
| 9.2.6 | Add performance timing for rules | LOW | DONE |
PlanGuardrailEngine:

```python
class PlanGuardrailEngine:
    def __init__(self, config: PlanGuardrailConfig):
        self.config = config
        self.rules: dict[str, PlanValidationRule] = {}
        self._discover_rules()

    def _discover_rules(self) -> None:
        """Auto-discover rules from guardrails/rules/ directory."""
        ...

    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]:
        """Run all enabled rules against the plan."""
        findings = []
        for rule_id, rule in self.rules.items():
            if self.config.is_rule_enabled(rule_id):
                findings.extend(rule.validate(context))
        return findings

    def auto_revise(
        self,
        plan: ImplementationPlan,
        findings: list[PlanValidationFinding]
    ) -> RevisedPlan:
        """Apply auto-revisions based on findings."""
        ...
```

Testing Requirements:
- Test rule discovery
- Test parallel execution (Milestone 13.5)
- Test severity filtering
- Benchmark validation latency
Success Criteria:
- Rules auto-discovered from directory
- Parallel execution reduces latency (Milestone 13.5)
- <500ms total validation time
Implementation Details (Milestone 9.2 Complete):
- Created `claude_indexer/ui/plan/guardrails/engine.py` with:
  - `PlanGuardrailEngineConfig` dataclass (timeout, error handling, confidence threshold)
  - `RuleExecutionResult` dataclass for individual rule results
  - `PlanGuardrailResult` dataclass with findings, statistics, error tracking
  - `PlanGuardrailEngine` class with full rule lifecycle management
- Key features:
  - `register()` / `unregister()` for manual rule registration
  - `discover_rules()` for auto-discovery from directory
  - `validate()` runs all enabled rules with configurable filtering
  - `validate_fast()` runs only fast rules (<100ms) for sync checks
  - `validate_category()` runs rules in a specific category
  - Confidence threshold filtering
  - Max findings per rule limiting
  - Error handling with continue_on_error option
- Performance timing recorded per rule and total
- Tests: 42 unit tests covering all functionality

Parallel Execution (Milestone 13.5):
- Added `parallel_execution` flag to `PlanGuardrailEngineConfig` (default: False)
- Added `max_parallel_workers` config option (default: 4)
- Implemented `_validate_parallel()` using ThreadPoolExecutor
- Extracted `_validate_sequential()` for the original behavior
- `validate()` method now conditionally uses parallel or sequential execution
- Tests: 8 additional tests for parallel execution behavior
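The parallel path can be sketched as a thread-pool map over the enabled rules. This is illustrative; the shipped `_validate_parallel()` also handles timeouts and per-rule error tracking. Rules are modeled here as plain callables:

```python
from concurrent.futures import ThreadPoolExecutor

def validate_parallel(rules, context, max_workers: int = 4) -> list:
    """Run each rule against the context in a thread pool and merge findings."""
    findings: list = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so findings stay deterministic.
        for result in pool.map(lambda rule: rule(context), rules):
            findings.extend(result)
    return findings
```

Threads suit this workload because rules are dominated by I/O (memory-search calls) rather than CPU work.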
Objective: Implement the core quality validation rules.
| ID | Task | Priority | Rule Name | Category | Status |
|---|---|---|---|---|---|
| 9.3.1 | Test Requirement Detection | HIGH | PLAN.TEST_REQUIREMENT | coverage | DONE |
| 9.3.2 | Documentation Requirement Detection | HIGH | PLAN.DOC_REQUIREMENT | coverage | DONE |
| 9.3.3 | Duplicate Code Detection | HIGH | PLAN.DUPLICATE_DETECTION | consistency | DONE |
| 9.3.4 | Architectural Consistency Check | MEDIUM | PLAN.ARCHITECTURAL_CONSISTENCY | architecture | DONE |
| 9.3.5 | Performance Pattern Detection | MEDIUM | PLAN.PERFORMANCE_PATTERN | performance | DONE |
Implementation Location: claude_indexer/ui/plan/guardrails/rules/
```python
class TestRequirementRule(PlanValidationRule):
    """Ensures new features have corresponding test tasks."""

    FEATURE_KEYWORDS = ["implement", "add", "create", "build", "develop", "new"]
    TEST_KEYWORDS = ["test", "spec", "unittest", "pytest", "jest"]
    TRIVIAL_PATTERNS = [r"^(fix|rename|move|delete)\s+(typo|comment|readme)"]

    @property
    def rule_id(self) -> str:
        return "PLAN.TEST_REQUIREMENT"

    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]:
        findings = []
        for task in context.plan.all_tasks:
            if self._is_feature_task(task) and not self._has_test_task(task, context.plan):
                if not self._is_trivial(task):
                    findings.append(self._create_finding(task))
        return findings

    def suggest_revision(self, finding, context) -> PlanRevision:
        """Auto-add test task."""
        feature_task = self._get_task(finding.affected_tasks[0], context.plan)
        test_task = Task(
            id=f"TASK-TST-{feature_task.id[-4:]}",
            title=f"Add tests for {feature_task.title}",
            description=f"Write tests for {feature_task.title}",
            scope=feature_task.scope,
            priority=feature_task.priority + 1,
            estimated_effort="low",
            impact=feature_task.impact * 0.8,
            acceptance_criteria=["Unit tests cover main functionality", "Tests pass in CI"],
            dependencies=[feature_task.id],
            tags=["testing", "quality"],
        )
        return PlanRevision(
            revision_type=RevisionType.ADD_TASK,
            rationale=f"Feature '{feature_task.title}' needs test coverage",
            new_task=test_task,
        )

class DuplicateDetectionRule(PlanValidationRule):
    """Detects tasks that might duplicate existing functionality."""

    SIMILARITY_THRESHOLD = 0.70

    @property
    def rule_id(self) -> str:
        return "PLAN.DUPLICATE_DETECTION"

    def validate(self, context: PlanValidationContext) -> list[PlanValidationFinding]:
        findings = []
        if context.memory_client is None:
            return findings
        for task in context.plan.all_tasks:
            query = f"{task.title} {task.description[:200]}"
            results = context.search_memory(
                query=query,
                entity_types=["function", "class", "implementation_pattern"]
            )
            for result in results:
                if result.get("score", 0) >= self.SIMILARITY_THRESHOLD:
                    findings.append(self._create_finding(task, result))
        return findings

    def suggest_revision(self, finding, context) -> PlanRevision:
        """Modify task to reference existing code."""
        task = self._get_task(finding.affected_tasks[0], context.plan)
        return PlanRevision(
            revision_type=RevisionType.MODIFY_TASK,
            target_task_id=finding.affected_tasks[0],
            rationale="Potential duplicate detected",
            modifications={
                "description": f"{task.description}\n\n**Note:** Review existing implementation before proceeding.",
                "acceptance_criteria": task.acceptance_criteria + [
                    "Verified no duplication with existing code"
                ],
            }
        )
```

Testing Requirements:
- Unit tests for each rule with positive/negative cases
- Test auto-revision generation
- Test with real implementation plans
- Measure false positive rate (<10% target)
Documentation:
- Rule reference documentation
- Configuration examples
- Override mechanisms
Success Criteria:
- All 5 rules implemented and tested
- <10% false positive rate
- Clear, actionable findings
Implementation Details (Milestone 9.3 Complete):
- Created `claude_indexer/ui/plan/guardrails/rules/` package with 5 rules
- TestRequirementRule: Detects feature tasks without test coverage, auto-suggests test tasks
- DocRequirementRule: Detects user-facing changes without documentation tasks
- DuplicateDetectionRule: Uses semantic memory search to find potential duplicate code
- ArchitecturalConsistencyRule: Validates file paths against project patterns
- PerformancePatternRule: Detects N+1 queries, missing caching, blocking operations, etc.
- All rules follow PlanValidationRule ABC pattern
- Updated `guardrails/__init__.py` with rule exports
- Tests: 159 unit tests covering all 5 rules with positive/negative/edge cases
## Phase 10: Auto-Revision System

Goal: Automatically apply revisions to plans based on validation findings.
Objective: Create the engine that applies plan revisions automatically.
| ID | Task | Priority | Status |
|---|---|---|---|
| 10.1.1 | Create AutoRevisionEngine class | HIGH | DONE |
| 10.1.2 | Implement revision sorting by severity | HIGH | DONE |
| 10.1.3 | Implement conflict detection | HIGH | DONE |
| 10.1.4 | Add circular dependency prevention | HIGH | DONE |
| 10.1.5 | Implement revision application methods | HIGH | DONE |
| 10.1.6 | Create RevisedPlan result dataclass | MEDIUM | DONE |
| 10.1.7 | Add post-revision dependency resolution | MEDIUM | DONE |
| 10.1.8 | Add iteration limit (prevent infinite loops) | HIGH | DONE |
AutoRevisionEngine:

```python
@dataclass
class AppliedRevision:
    revision: PlanRevision
    finding: PlanValidationFinding
    success: bool
    error: str | None = None

@dataclass
class RevisedPlan:
    original_plan: ImplementationPlan
    revised_plan: ImplementationPlan
    revisions_applied: list[AppliedRevision] = field(default_factory=list)
    revisions_skipped: list[tuple[PlanRevision, str]] = field(default_factory=list)
    qa_passed: bool = False

class AutoRevisionEngine:
    MAX_ITERATIONS = 3

    def revise_plan(
        self,
        plan: ImplementationPlan,
        findings: list[PlanValidationFinding],
        rules: dict[str, PlanValidationRule]
    ) -> RevisedPlan:
        """
        Apply revisions based on findings.

        Algorithm:
        1. Sort findings by severity (CRITICAL > HIGH > MEDIUM > LOW)
        2. For each finding that can be auto-revised:
           a. Get revision suggestion from rule
           b. Check for conflicts
           c. Apply revision if safe
        3. Re-resolve dependencies
        4. Update priorities
        """
        ...

    def _check_conflicts(self, plan: ImplementationPlan, revision: PlanRevision) -> str | None:
        """Check if revision would create conflicts."""
        if revision.revision_type == RevisionType.ADD_TASK:
            if any(t.id == revision.new_task.id for t in plan.all_tasks):
                return f"Task ID '{revision.new_task.id}' already exists"
        if revision.revision_type == RevisionType.ADD_DEPENDENCY:
            for from_id, to_id in revision.dependency_additions:
                if self._would_create_cycle(plan, from_id, to_id):
                    return "Would create circular dependency"
        return None

    def _would_create_cycle(self, plan, from_id, to_id) -> bool:
        """DFS to detect circular dependencies."""
        ...
```

Testing Requirements:
- Test revision application for each type
- Test conflict detection
- Test circular dependency prevention
- Test iteration limit enforcement
Success Criteria:
- Revisions applied without conflicts
- No circular dependencies introduced
- Iteration limit prevents infinite loops
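The elided `_would_create_cycle` body could look like the following DFS sketch: adding an edge `from_id -> to_id` closes a cycle exactly when `from_id` is already reachable from `to_id`. The `deps` mapping (task id to its dependency ids) is an assumed stand-in for the plan's task graph:

```python
def would_create_cycle(deps: dict[str, list[str]], from_id: str, to_id: str) -> bool:
    """Return True if adding the dependency from_id -> to_id would create a cycle."""
    stack, seen = [to_id], set()
    while stack:
        node = stack.pop()
        if node == from_id:
            return True  # from_id reachable from to_id: new edge closes a loop
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, []))
    return False
```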
Implementation Details (Milestone 10.1 Complete):
- Created `claude_indexer/ui/plan/guardrails/auto_revision.py` with:
  - `AppliedRevision` dataclass for tracking applied revisions
  - `RevisedPlan` dataclass with audit trail formatting
  - `AutoRevisionEngine` class with full revision lifecycle
- Key features:
  - Revision sorting by severity (CRITICAL > HIGH > MEDIUM > LOW) and type order
  - Conflict detection for all RevisionTypes (ADD_TASK, MODIFY_TASK, REMOVE_TASK, ADD_DEPENDENCY, REORDER_TASKS)
  - Circular dependency detection using DFS algorithm
  - Post-revision dependency resolution (removes orphaned dependencies)
  - MAX_ITERATIONS = 3 to prevent infinite loops
  - Configurable via PlanGuardrailConfig (auto_revise, max_revisions_per_plan, revision_confidence_threshold)
  - Human-readable audit trail via `format_audit_trail()`
- Tests: 55 unit tests covering all functionality
Objective: Track all revisions for transparency and debugging.
| ID | Task | Priority | Status |
|---|---|---|---|
| 10.2.1 | Add revision history to ImplementationPlan | MEDIUM | DONE |
| 10.2.2 | Create human-readable revision summary | MEDIUM | DONE |
| 10.2.3 | Add revision rollback capability | LOW | DONE |
| 10.2.4 | Implement revision persistence | LOW | DONE |
Audit Trail Format:

```markdown
## Plan Revisions Applied

### 1. Added Test Task (PLAN.TEST_REQUIREMENT)
- **Reason**: Feature 'Add user authentication' needs test coverage
- **Added**: TASK-TST-0001 "Add tests for user authentication"
- **Confidence**: 95%

### 2. Modified Task (PLAN.DUPLICATE_DETECTION)
- **Reason**: Potential duplicate of existing 'AuthService.login()'
- **Modified**: TASK-0002 description to reference existing code
- **Confidence**: 78%
```

Testing Requirements:
- Test audit trail generation
- Test revision summary formatting
- Verify all revisions tracked
Success Criteria:
- Complete audit trail for all revisions
- Human-readable summaries
- Transparency for user review
Implementation Details (Milestone 10.2 Complete):
- Added `revision_history: list[AppliedRevision]` field to `ImplementationPlan`
- Added `format_revision_history()` method for human-readable markdown output
- Created `claude_indexer/ui/plan/guardrails/revision_history.py` with:
  - `PlanSnapshot` dataclass for versioned plan state snapshots
  - `RevisionHistoryManager` for snapshot creation, versioning, and rollback
  - `PlanPersistence` for JSON file save/load of plans and history
- Full serialization support with backward compatibility for old plans
- Exports added to `guardrails/__init__.py` and `plan/__init__.py`
- Tests: 41 unit tests covering all functionality
## Phase 11: Prompt Augmentation & Exploration Hints

Goal: Inject planning guidelines and exploration hints into Claude's context.
Objective: Generate contextual planning guidelines for injection.
| ID | Task | Priority | Status |
|---|---|---|---|
| 11.1.1 | Create hooks/planning_guidelines.py | HIGH | DONE |
| 11.1.2 | Define guidelines template with placeholders | HIGH | DONE |
| 11.1.3 | Load project patterns from CLAUDE.md | MEDIUM | DONE |
| 11.1.4 | Generate collection-specific MCP commands | MEDIUM | DONE |
| 11.1.5 | Add configuration for guideline customization | LOW | DONE |
Planning Guidelines Template:

```python
PLANNING_GUIDELINES_TEMPLATE = """
=== PLANNING QUALITY GUIDELINES ===
When formulating this implementation plan, follow these guidelines:

## 1. Code Reuse Check (CRITICAL)
Before proposing ANY new function, class, or component:
- Search the codebase: `{mcp_prefix}search_similar("functionality")`
- Check existing patterns: `{mcp_prefix}read_graph(entity="Component", mode="relationships")`
- If similar exists, plan to REUSE or EXTEND it
- State explicitly: "Verified no existing implementation" or "Will extend existing Y"

## 2. Testing Requirements
Every plan that modifies code MUST include:
- [ ] Unit tests for new/modified functions
- [ ] Integration tests for API changes
- Task format: "Add tests for [feature] in [test_file]"

## 3. Documentation Requirements
Include documentation tasks when:
- Adding public APIs -> Update API docs
- Changing user-facing behavior -> Update README
- Adding configuration -> Update config docs

## 4. Architecture Alignment
Your plan MUST align with project patterns:
{project_patterns}

## 5. Performance Considerations
Flag any step that may introduce:
- O(n^2) or worse complexity
- Unbounded memory usage
- Missing timeouts on network calls
=== END PLANNING GUIDELINES ===
"""

def generate_planning_guidelines(
    collection_name: str,
    project_patterns: str = "",
    exploration_hints: list[str] | None = None
) -> str:
    """Generate planning guidelines with project context."""
    mcp_prefix = f"mcp__{collection_name}-memory__"
    ...
```

Testing Requirements:
- Test template rendering
- Test project pattern loading
- Test MCP prefix generation
Success Criteria:
- Guidelines correctly formatted
- Project-specific patterns included
- MCP commands use correct collection
Implementation Details (Milestone 11.1 Complete):
- Implemented as `claude_indexer/hooks/planning/guidelines.py`
- PlanningGuidelinesConfig with all section toggles
- PlanningGuidelines output with full_text, sections, mcp_commands
- PlanningGuidelinesGenerator with 5 template sections
- CLAUDE.md pattern loading from project root or .claude/
- Tests: 18 unit tests in test_guidelines.py
Objective: Generate hints to guide Claude's sub-agent parallel exploration.
| ID | Task | Priority | Status |
|---|---|---|---|
| 11.2.1 | Create hooks/exploration_hints.py | HIGH | DONE |
| 11.2.2 | Implement entity extraction from prompts | HIGH | DONE |
| 11.2.3 | Generate duplicate-check hints | HIGH | DONE |
| 11.2.4 | Generate test-discovery hints | MEDIUM | DONE |
| 11.2.5 | Generate documentation hints | MEDIUM | DONE |
| 11.2.6 | Generate architecture hints | LOW | DONE |
Exploration Hints Generator:
```python
def generate_exploration_hints(prompt: str, collection_name: str) -> list[str]:
    """Generate exploration hints for parallel sub-agents."""
    mcp_prefix = f"mcp__{collection_name}-memory__"
    entities = extract_entities(prompt)
    hints = [
        # Duplicate Check
        f"## Duplicate Check\n{mcp_prefix}search_similar('{entities[0]}', entityTypes=['function', 'class'])",
        # Test Discovery
        f"## Test Discovery\n{mcp_prefix}search_similar('test', entityTypes=['file'])",
        # Documentation
        f"## Documentation\n{mcp_prefix}search_similar('docs', entityTypes=['documentation'])",
    ]
    # Entity-specific hints
    for entity in entities[:3]:
        hints.append(
            f"## {entity} Analysis\n{mcp_prefix}read_graph(entity='{entity}', mode='smart')"
        )
    return hints

def extract_entities(prompt: str) -> list[str]:
    """Extract likely code entities from prompt."""
    patterns = [
        r'\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b',  # CamelCase
        r'\b[a-z]+(?:_[a-z]+)+\b',           # snake_case
        r'["\']([^"\']+)["\']',              # Quoted terms
    ]
    ...
```

Testing Requirements:
- Test entity extraction accuracy
- Test hint generation with various prompts
- Verify MCP commands are valid
Success Criteria:
- Entities extracted with >80% accuracy
- Hints guide toward quality checks
- MCP commands are executable
Implementation Details (Milestone 11.2 Complete):
- Implemented as `claude_indexer/hooks/planning/exploration.py`
- ExplorationHintsConfig with section toggles and max_entity_hints
- ExplorationHints output with hints, extracted_entities, mcp_commands
- ExplorationHintsGenerator with entity extraction patterns
- Supports CamelCase, snake_case, quoted terms, technical terms
- Tests: 21 unit tests in test_exploration.py
Objective: Integrate guidelines and hints into UserPromptSubmit hook.
| ID | Task | Priority | Status |
|---|---|---|---|
| 11.3.1 | Modify hooks/prompt_handler.py for Plan Mode | HIGH | DONE |
| 11.3.2 | Implement guidelines injection | HIGH | DONE |
| 11.3.3 | Implement hints injection | HIGH | DONE |
| 11.3.4 | Add configuration toggle | MEDIUM | DONE |
| 11.3.5 | Measure injection latency (<50ms target) | MEDIUM | DONE |
Testing Requirements:
- Test full hook flow with Plan Mode
- Test injection timing
- Verify guidelines appear in context
Success Criteria:
- Guidelines injected for Plan Mode
- <50ms injection latency
- Claude follows guidelines
Implementation Details (Milestone 11.3 Complete):
hooks/prompt_handler.py- Main hook with Plan Mode detectionclaude_indexer/hooks/planning/injector.py- Coordinates injection- PlanContextInjectionConfig with all toggles and compact mode
- PlanContextInjector assembles guidelines + hints
- inject_plan_context() convenience function
- Tests: 22 unit tests in test_injector.py
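The assembly step the injector performs reduces to prepending context blocks to the user's prompt. A simplified sketch (the real inject_plan_context() in injector.py takes config objects; this signature is illustrative):

```python
def inject_plan_context(prompt: str, guidelines: str, hints: list[str]) -> str:
    """Prepend guidelines and exploration hints so Claude sees them
    before the user's planning request."""
    sections = [guidelines, *hints, prompt]
    # Drop empty sections so disabled features add no noise.
    return "\n\n".join(section for section in sections if section)
```

Because injection is pure string assembly, the <50ms latency target is dominated by guideline/hint generation, not by this step.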
Goal: Verify generated plans meet quality standards before user approval.
Objective: Post-generation verification of plan quality.
| ID | Task | Priority | Status |
|---|---|---|---|
| 12.1.1 | Create claude_indexer/hooks/plan_qa.py | HIGH | DONE |
| 12.1.2 | Implement missing test detection | HIGH | DONE |
| 12.1.3 | Implement missing doc detection | HIGH | DONE |
| 12.1.4 | Implement duplicate check verification | HIGH | DONE |
| 12.1.5 | Create PlanQAResult dataclass | MEDIUM | DONE |
| 12.1.6 | Generate human-readable feedback | MEDIUM | DONE |
| 12.1.7 | Integrate with plan output | MEDIUM | DONE |
PlanQAVerifier:
```python
import re
from dataclasses import dataclass, field

@dataclass
class PlanQAResult:
    is_valid: bool = True
    missing_tests: list[str] = field(default_factory=list)
    missing_docs: list[str] = field(default_factory=list)
    potential_duplicates: list[str] = field(default_factory=list)
    architecture_warnings: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)

    def has_issues(self) -> bool:
        return bool(self.missing_tests or self.missing_docs or
                    self.potential_duplicates or self.architecture_warnings)

    def format_feedback(self) -> str:
        """Format feedback for plan output."""
        if not self.has_issues():
            return "\n[Plan QA: All quality checks passed]"
        lines = ["\n=== Plan QA Feedback ==="]
        if self.missing_tests:
            lines.append("\n[WARN] Missing Test Coverage:")
            for item in self.missing_tests:
                lines.append(f"  - {item}")
        ...
        return "\n".join(lines)

class PlanQAVerifier:
    CODE_CHANGE_PATTERNS = re.compile(
        r'(add|create|implement|modify)\s+(?:a\s+)?(function|class|component)',
        re.IGNORECASE
    )

    def verify_plan(self, plan_text: str) -> PlanQAResult:
        """Verify a plan meets quality standards."""
        result = PlanQAResult()
        if self._needs_tests(plan_text) and not self._has_test_tasks(plan_text):
            result.missing_tests.append("Plan modifies code but includes no test tasks")
            result.suggestions.append("Add unit/integration test task")
        if self._is_user_facing(plan_text) and not self._has_doc_tasks(plan_text):
            result.missing_docs.append("User-facing changes without doc update")
        if self._creates_new_code(plan_text) and not self._mentions_reuse_check(plan_text):
            result.potential_duplicates.append("New code without explicit duplicate check")
        return result
```

Testing Requirements:
- Test with plans missing tests
- Test with plans missing docs
- Test with complete plans
- Verify feedback formatting
Success Criteria:
- Detects missing test/doc tasks
- Generates actionable feedback
- <50ms verification latency
Implementation Details (Milestone 12.1 Complete):
- Created `claude_indexer/hooks/plan_qa.py` with:
  - PlanQAConfig dataclass with check toggles and strict mode settings
  - PlanQAResult dataclass with has_issues(), format_feedback(), to_dict()
  - PlanQAVerifier class with pattern-based detection for:
    - CODE_CHANGE_PATTERNS (test coverage check)
    - TEST_TASK_PATTERNS (test task detection)
    - DOC_TASK_PATTERNS (doc task detection)
    - USER_FACING_PATTERNS (user-facing change detection)
    - REUSE_CHECK_PATTERNS (duplicate verification)
    - ARCHITECTURE_CONCERN_PATTERNS (performance anti-patterns)
  - verify_plan_qa() convenience function
- Tests: 50+ unit tests in test_plan_qa.py covering all scenarios
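The helper predicates on PlanQAVerifier are plain pattern checks. Only CODE_CHANGE_PATTERNS appears verbatim in the snippet above; the TEST_TASK regex below is an illustrative guess at the shape, not the shipped pattern:

```python
import re

CODE_CHANGE = re.compile(
    r'(add|create|implement|modify)\s+(?:a\s+)?(function|class|component)',
    re.IGNORECASE,
)
# Assumed shape for test-task detection; the real TEST_TASK_PATTERNS may differ.
TEST_TASK = re.compile(
    r'\b(?:add|write|update)\s+(?:unit\s+|integration\s+)?tests?\b',
    re.IGNORECASE,
)

def needs_tests(plan_text: str) -> bool:
    """True when the plan proposes code changes."""
    return bool(CODE_CHANGE.search(plan_text))

def has_test_tasks(plan_text: str) -> bool:
    """True when the plan already contains an explicit test task."""
    return bool(TEST_TASK.search(plan_text))
```

Pure regex checks keep the verifier well inside the <50ms latency budget.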
Objective: Integrate QA verification into the planning workflow.
| ID | Task | Priority | Status |
|---|---|---|---|
| 12.2.1 | Add QA check after guardrail validation | HIGH | DONE |
| 12.2.2 | Append QA feedback to plan output | HIGH | DONE |
| 12.2.3 | Track QA pass/fail metrics | LOW | DONE |
| 12.2.4 | Add QA override configuration | LOW | DONE |
Testing Requirements:
- Test end-to-end QA flow
- Test feedback appears in plan
- Test override configuration
Success Criteria:
- QA feedback visible to user
- Metrics tracked for analysis (Milestone 13.5)
- Override available for edge cases
Implementation Details (Milestone 12.2 Complete):
- Updated `claude_indexer/hooks/planning/injector.py` with:
  - Added qa_enabled and qa_config to PlanContextInjectionConfig
  - Added verify_plan_output() method to PlanContextInjector
  - QA configuration supports JSON serialization
- Updated `claude_indexer/hooks/__init__.py` to export Plan QA classes
- PlanQAConfig provides override toggles:
  - enabled: Master toggle for QA
  - check_tests/check_docs/check_duplicates/check_architecture: Individual checks
  - fail_on_missing_tests/fail_on_missing_docs: Strict mode settings

QA Metrics Tracking (Milestone 13.5):
- Added QA fields to `MetricSnapshot` in `claude_indexer/ui/metrics/models.py`:
  - qa_checks_passed, qa_issues_found, qa_missing_tests, qa_missing_docs
  - qa_potential_duplicates, qa_architecture_warnings, qa_verification_time_ms
- Added `record_qa_verification()` method to `MetricsCollector`
- Added `get_qa_metrics_summary()` for aggregated QA metrics
- Updated `PlanQAVerifier` with optional `metrics_collector` parameter
- Tests: 15 unit tests in test_qa_metrics.py
Goal: Ensure production readiness with comprehensive testing and documentation.
Objective: End-to-end testing of Plan Mode integration.
| ID | Task | Priority | Status |
|---|---|---|---|
| 13.1.1 | Create tests/integration/test_plan_mode.py | HIGH | DONE |
| 13.1.2 | Test full Plan Mode flow (detect -> augment -> validate -> revise) | HIGH | DONE |
| 13.1.3 | Test MCP tool integration | HIGH | DONE |
| 13.1.4 | Test issue tracker integration | MEDIUM | DONE |
| 13.1.5 | Test design doc indexing | MEDIUM | DONE |
| 13.1.6 | Create mock Claude Code Plan Mode responses | MEDIUM | DONE |
Implementation Notes (Milestone 13.1 Complete):
- Created comprehensive integration test suite with 47 tests
- Test classes: TestPlanModeDetectionIntegration, TestContextInjectionIntegration, TestGuardrailValidationIntegration, TestAutoRevisionIntegration, TestFullPlanModeFlow, TestDesignDocIndexingIntegration, TestPlanQAVerification
- Coverage: Plan Mode detection, context injection, guardrail validation, auto-revision, full pipeline flow, design doc parsing, QA verification
- Performance: All tests complete in <0.2 seconds
- All lint checks pass (black, isort, flake8, ruff)
Integration Test Scenarios:
```python
class TestPlanModeIntegration:
    def test_full_plan_mode_flow(self):
        """Test complete Plan Mode integration."""
        # 1. Detect Plan Mode
        # 2. Inject guidelines
        # 3. Validate plan
        # 4. Apply auto-revisions
        # 5. Run QA check
        # 6. Verify output
        ...

    def test_plan_with_missing_tests(self):
        """Verify test tasks auto-added."""
        ...

    def test_duplicate_detection(self):
        """Verify duplicate warning generated."""
        ...

    def test_doc_requirement(self):
        """Verify doc tasks auto-added for user-facing changes."""
        ...
```

Testing Requirements:
- >80% code coverage for new components
- All scenarios covered
- Performance benchmarks included
Success Criteria:
- All integration tests pass (47 tests)
- >80% coverage
- Performance within targets (<0.2s for all tests)
Objective: Meet all performance targets.
| ID | Task | Priority | Status |
|---|---|---|---|
| 13.2.1 | Profile Plan Mode detection latency | HIGH | DONE |
| 13.2.2 | Profile guidelines injection latency | HIGH | DONE |
| 13.2.3 | Profile validation latency | HIGH | DONE |
| 13.2.4 | Optimize hot paths | MEDIUM | DONE |
| 13.2.5 | Add caching where beneficial | MEDIUM | DONE |
| 13.2.6 | Create performance benchmarks | MEDIUM | DONE |
Performance Targets:
| Operation | Target | Budget |
|---|---|---|
| Plan Mode Detection | <10ms | Pattern matching |
| Guidelines Generation | <20ms | Template substitution |
| Exploration Hints | <30ms | Entity extraction |
| Plan Validation | <500ms | 5 rules |
| Auto-Revision | <200ms | Conflict checking |
| Plan QA | <50ms | Pattern matching |
| Total Overhead | <100ms | (excluding validation) |
Testing Requirements:
- Benchmark all operations
- Test under load
- Verify targets met
Implementation Details (Milestone 13.2 Complete):
- Replaced `time.time()` with `time.perf_counter()` across all Plan Mode files for sub-millisecond precision
- Added LRU caching for CLAUDE.md project patterns loading (mtime-based invalidation)
- Added LRU caching for entity extraction in exploration hints (128-entry cache)
- Created comprehensive benchmark suite: `tests/benchmarks/test_plan_mode_performance.py`
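The mtime-based invalidation works by making the file's modification time part of the cache key, so an edited CLAUDE.md misses the cache. A minimal sketch (function names here are illustrative, not the shipped API):

```python
import functools
import os

@functools.lru_cache(maxsize=8)
def _read_patterns(path: str, mtime: float) -> str:
    # mtime is unused in the body; it only widens the cache key so that
    # a modified file produces a cache miss and a fresh read.
    with open(path, encoding="utf-8") as f:
        return f.read()

def load_project_patterns(path: str) -> str:
    """Read CLAUDE.md patterns, cached until the file changes on disk."""
    return _read_patterns(path, os.path.getmtime(path))
```

This avoids explicit cache-invalidation bookkeeping: staleness is handled entirely by the key.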
Files Modified:
- `claude_indexer/hooks/plan_mode_detector.py` - Timing precision
- `claude_indexer/hooks/planning/guidelines.py` - Timing + LRU cache
- `claude_indexer/hooks/planning/exploration.py` - Timing + entity cache
- `claude_indexer/hooks/planning/injector.py` - Timing precision
- `claude_indexer/hooks/plan_qa.py` - Timing precision
- `claude_indexer/ui/plan/guardrails/engine.py` - Timing precision
- `claude_indexer/ui/plan/guardrails/auto_revision.py` - Timing precision
New Files:
- `tests/benchmarks/test_plan_mode_performance.py` - 7 benchmark test classes
Benchmark Tests:
- `TestPlanModeDetectionPerformance` - <10ms p95 target
- `TestGuidelinesGenerationPerformance` - <20ms p95 target
- `TestExplorationHintsPerformance` - <30ms p95 target
- `TestPlanQAPerformance` - <50ms p95 target
- `TestGuardrailValidationPerformance` - <500ms target
- `TestAutoRevisionPerformance` - <200ms target
- `TestEndToEndPlanModePerformance` - <100ms overhead target
- `TestMemoryUsage` - <50MB peak usage
- `TestScalabilityMetrics` - Linear scaling verification
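Measuring a p95 latency with perf_counter is straightforward; this is a simplified harness, not the actual benchmark suite:

```python
import time

def p95_latency_ms(fn, runs: int = 100) -> float:
    """Return the 95th-percentile wall-clock latency of fn in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[int(0.95 * (len(samples) - 1))]
```

A benchmark would then assert against a target, e.g. `assert p95_latency_ms(detect) < 10` for the detection budget (where `detect` stands in for whatever operation is under test).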
Success Criteria:
- All performance targets met
- No regression from baseline
- Clear metrics dashboard
Objective: Comprehensive documentation for Plan Mode integration.
| ID | Task | Priority | Status |
|---|---|---|---|
| 13.3.1 | Update README.md with Plan Mode features | HIGH | DONE |
| 13.3.2 | Create docs/PLAN_MODE.md comprehensive guide | HIGH | DONE |
| 13.3.3 | Update HOOKS.md with Plan hooks | HIGH | DONE |
| 13.3.4 | Document all configuration options | MEDIUM | DONE |
| 13.3.5 | Create plan guardrail rule reference | MEDIUM | DONE |
| 13.3.6 | Add troubleshooting section | MEDIUM | DONE |
| 13.3.7 | Update CLAUDE.md with Plan guidelines | LOW | DONE |
Documentation Structure:
```
docs/
├── PLAN_MODE.md          # Comprehensive Plan Mode guide
│   ├── Overview
│   ├── Configuration
│   ├── Quality Guardrails
│   ├── Auto-Revision Behavior
│   ├── MCP Context Integration
│   └── Troubleshooting
├── HOOKS.md              # Updated with Plan hooks
└── memory-functions.md   # Updated with new MCP tools
```
Implementation Details (Milestone 13.3 Complete):
- Created `docs/PLAN_MODE.md` with ~700 lines covering:
  - Quick start and activation methods
  - Detection methods (4 signals) with confidence scoring
  - Context injection (guidelines + exploration hints)
  - All 5 guardrail rules with examples and auto-fix behavior
  - Auto-revision system and audit trail
  - Complete configuration reference
  - Troubleshooting section
  - Performance metrics
- Overhauled `README.md` (reduced from 781 to ~400 lines):
  - Problem/Solution-led structure
  - Feature highlights for all 4 major capabilities
  - Simplified setup with one-command path
  - Clean documentation links
- Updated `CLAUDE.md` with Plan Mode usage section
- `docs/HOOKS.md` already had Plan Mode sections (Milestones 7.2, 8.4)
Testing Requirements:
- All documentation accurate
- Examples tested and working
- No broken links
Success Criteria:
- New user understands Plan Mode in <10 minutes
- All features documented
- Examples are copy-paste ready
Objective: Validate the "magical" UX described in PRD.
| ID | Task | Priority | Status |
|---|---|---|---|
| 13.4.1 | Conduct user testing sessions | HIGH | DEFERRED - Manual Process |
| 13.4.2 | Collect feedback on plan quality | HIGH | DONE |
| 13.4.3 | Measure plan approval rate | MEDIUM | DONE |
| 13.4.4 | Iterate on findings format | MEDIUM | DONE |
| 13.4.5 | Add configuration for thoroughness level | LOW | DONE |
Implementation Details (Milestone 13.4 Complete):
- Extended `PlanAdoptionRecord` with feedback fields: approved, approved_at, rejection_reason, accuracy_rating, user_notes, revision_count
- Added approval metrics to `MetricsReport`: approval_rate, pending_approval_count, average_accuracy_rating, average_revision_count, rejection_reasons_summary()
- Added feedback recording methods to `MetricsCollector`: record_plan_approval(), record_plan_revision(), get_approval_rate_history(), get_quality_metrics_summary()
- Created `claude_indexer/ui/plan/formatters.py` with:
  - `ThoroughnessLevel` enum (minimal, standard, thorough, exhaustive)
  - `format_plan_findings_for_display()` with thoroughness-aware output
  - `format_plan_findings_for_claude()` for Claude consumption
  - `SEVERITY_ICONS` and `CATEGORY_NAMES` mappings
- Added `thoroughness_level` and `group_findings_by_severity` to `PlanGuardrailConfig`
- Added `format_for_display()` and `format_for_claude()` methods to `PlanGuardrailResult`
- Tests: 54 unit tests covering all functionality
Success Metrics Validation:
- >90% of plans include test/doc tasks when appropriate (Phase 12 Plan QA)
- <10% of plans require user revision (pending user testing)
- Existing code reuse suggested in >80% of applicable cases (Phase 12 duplicate check)
- Users report plans "feel like senior engineer's work" (pending user testing)
Success Criteria:
- All code implementation complete
- User feedback collection enabled (infrastructure ready)
- UX validated through testing (pending task 13.4.1)
Objective: Complete deferred items from Phases 9 and 12.
| ID | Task | Priority | Status |
|---|---|---|---|
| 13.5.1 | Implement parallel rule execution (9.2.3) | MEDIUM | DONE |
| 13.5.2 | Implement QA metrics tracking (12.2.3) | LOW | DONE |
| 13.5.3 | Update Ticket Sync Service notes | LOW | DONE |
Implementation Details (Milestone 13.5 Complete):

Parallel Rule Execution (9.2.3):
- Updated `PlanGuardrailEngineConfig` with `max_parallel_workers` (default: 4)
- Added `_validate_parallel()` method using `concurrent.futures.ThreadPoolExecutor`
- Added `_validate_sequential()` method (refactored from original code)
- Modified `validate()` to conditionally use parallel or sequential execution
- Parallel mode executes rules concurrently with configurable worker count
- Error handling preserved in parallel mode (continue_on_error behavior)
- Tests: 8 tests for parallel execution behavior
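The parallel execution pattern described above can be sketched as follows; error handling and the rule/finding types are simplified relative to the real engine:

```python
from concurrent.futures import ThreadPoolExecutor

def validate_parallel(rules, plan_text, max_workers: int = 4):
    """Run each guardrail rule concurrently; collect results in rule order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(rule, plan_text) for rule in rules]
        # Gathering in submission order keeps findings deterministic
        # even though execution is concurrent.
        return [future.result() for future in futures]
```

Since guardrail rules are independent reads over the same plan text, thread-based parallelism is safe here without locking.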
QA Metrics Tracking (12.2.3):
- Extended `MetricSnapshot` with 7 QA-specific fields:
  - `qa_checks_passed`, `qa_issues_found`, `qa_missing_tests`
  - `qa_missing_docs`, `qa_potential_duplicates`, `qa_architecture_warnings`
  - `qa_verification_time_ms`
- Added `record_qa_verification()` to `MetricsCollector`:
  - Creates MetricSnapshot from PlanQAResult
  - Calculates checks passed/failed counts
  - Records as tier 2 (design-time) snapshot
- Added `get_qa_metrics_summary()` to `MetricsCollector`:
  - Returns pass rate, average issues, issue breakdown
  - Aggregates across all QA verification snapshots
- Updated `PlanQAVerifier` with optional `metrics_collector` parameter:
  - Records metrics automatically after verification
  - Accepts optional `plan_id` for correlation
- Backward compatible: old snapshots without QA fields load with defaults
- Tests: 15 tests for QA metrics functionality
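The aggregation behind get_qa_metrics_summary() reduces to simple arithmetic over snapshots. A dict-based sketch (real snapshots are MetricSnapshot dataclasses with more fields):

```python
def qa_metrics_summary(snapshots: list[dict]) -> dict:
    """Aggregate QA verification snapshots into pass rate and average issues."""
    total = len(snapshots)
    if total == 0:
        return {"pass_rate": 0.0, "avg_issues": 0.0}
    passed = sum(1 for s in snapshots if s["qa_issues_found"] == 0)
    issues = sum(s["qa_issues_found"] for s in snapshots)
    return {"pass_rate": passed / total, "avg_issues": issues / total}
```

Guarding the empty case keeps the summary safe to call before any verification has run.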
Ticket Sync Service (8.3.9) - Deferred to Phase 14:
- Remains deferred due to complexity (new scheduler, Qdrant schema, sync state)
- Recommended for dedicated Phase 14: Background Services
- Current on-demand ticket fetching via MCP tools remains functional
Files Modified:
- `claude_indexer/ui/plan/guardrails/engine.py` - Parallel execution
- `claude_indexer/ui/metrics/models.py` - QA fields in MetricSnapshot
- `claude_indexer/ui/metrics/collector.py` - QA metrics methods
- `claude_indexer/hooks/plan_qa.py` - Metrics integration
Test Coverage:
- `tests/unit/ui/plan/guardrails/test_engine.py` - 8 parallel execution tests
- `tests/unit/ui/metrics/test_qa_metrics.py` - 15 QA metrics tests
Goal: Establish comprehensive testing infrastructure for the mcp-qdrant-memory MCP server using Vitest.
PRD Reference: mcp-qdrant-memory/docs/PRD.md - Phase 14: MCP Server Enhancement
Objective: Set up Vitest testing framework with 60%+ baseline coverage, targeting 90-95% on core modules.
| ID | Task | Priority | Status |
|---|---|---|---|
| 14.1.1 | Add Vitest devDependencies to package.json | HIGH | DONE |
| 14.1.2 | Add test scripts to package.json | HIGH | DONE |
| 14.1.3 | Create vitest.config.ts with coverage thresholds | HIGH | DONE |
| 14.1.4 | Create test fixtures (entities, relations) | MEDIUM | DONE |
| 14.1.5 | Create planModeGuard.test.ts | HIGH | DONE |
| 14.1.6 | Create tokenCounter.test.ts | HIGH | DONE |
| 14.1.7 | Create validation.test.ts | HIGH | DONE |
| 14.1.8 | Create bm25Service.test.ts | HIGH | DONE |
| 14.1.9 | Fix BM25 vitest import compatibility | HIGH | DONE |
Implementation Details:
- Configuration Files:
  - `package.json` - Added test scripts (test, test:watch, test:coverage, test:ui) and devDependencies (@vitest/coverage-v8, @vitest/ui)
  - `vitest.config.ts` - Vitest configuration with v8 coverage provider, HTML/LCOV reporters
- Test Files Created (207 tests total):
  - `src/__tests__/planModeGuard.test.ts` - 38 tests, 100% coverage
  - `src/__tests__/tokenCounter.test.ts` - 34 tests, 96.57% coverage
  - `src/__tests__/validation.test.ts` - 74 tests, 83.28% coverage
  - `src/__tests__/bm25Service.test.ts` - 47 tests, 95.87% coverage
  - `src/__tests__/fixtures/` - Test data for entities and relations
- Bug Fix: Fixed BM25 library import compatibility issue where vitest SSR transforms imports differently than Node.js ESM, causing a `BM25.default.default is not a function` error. Solution handles both import behaviors.
Coverage Results:
| Module | Statements | Branches | Functions |
|---|---|---|---|
| plan-mode-guard.ts | 100% | 100% | 100% |
| tokenCounter.ts | 96.57% | 90.69% | 100% |
| validation.ts | 83.28% | 78.14% | 100% |
| bm25Service.ts | 95.87% | 87.71% | 100% |
Success Criteria:
- 207 tests passing
- >90% coverage on core modules (plan-mode-guard, tokenCounter, bm25Service)
- >80% coverage on validation module
- Build passes with no TypeScript errors
- Test execution <1 second
Objective: Add integration tests for MCP server functionality.
| ID | Task | Priority | Status |
|---|---|---|---|
| 14.2.1 | Create MCP tool integration tests | HIGH | DONE |
| 14.2.2 | Create Qdrant persistence tests | HIGH | DONE |
| 14.2.3 | Add mock Qdrant client for isolated testing | MEDIUM | DONE |
| 14.2.4 | Test hybrid search (semantic + BM25) | MEDIUM | DONE |
New Files Created:
- `src/__tests__/mocks/qdrantClient.mock.ts` - Mock Qdrant client with in-memory storage
- `src/__tests__/mocks/openaiClient.mock.ts` - Mock OpenAI embeddings with deterministic generation
- `src/__tests__/mocks/index.ts` - Mock infrastructure exports
- `src/__tests__/integration/qdrant.integration.test.ts` - 45 tests for QdrantPersistence
- `src/__tests__/integration/mcp-tools.integration.test.ts` - 50 tests for MCP tool validation
- `src/__tests__/integration/hybrid-search.integration.test.ts` - 30 tests for BM25/hybrid search
Test Coverage:
- Total tests: 362 (207 unit + 155 integration)
- QdrantPersistence: Connection, Entity CRUD, Relation CRUD, Search, Scroll, Cache, Error handling, Multi-collection
- MCP Tools: Write tool validation, Read tool validation, Plan Mode access control, Collection parameter support
- Hybrid Search: BM25 keyword search, RRF fusion algorithm, Result processing, Unicode/special characters
- Mock infrastructure supports isolated testing without external dependencies
Success Criteria:
- All MCP tools have validation tests
- QdrantPersistence CRUD operations tested
- Hybrid search (semantic + BM25) fusion tested
- TypeScript build passes with no errors
- All 362 tests pass
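The RRF fusion exercised by these tests follows the standard reciprocal-rank formula, score(d) = Σ 1/(k + rank_d), summed over each ranked list the document appears in. A minimal sketch (k=60 is the conventional constant, not necessarily the server's default; Python used for illustration):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not raw scores, which is why it fuses semantic similarity and BM25 results cleanly despite their incomparable score scales.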
Objective: Integrate testing into CI/CD pipeline.
| ID | Task | Priority | Status |
|---|---|---|---|
| 14.3.1 | Add GitHub Actions workflow for MCP tests | HIGH | DONE |
| 14.3.2 | Configure coverage thresholds in CI | MEDIUM | DONE |
| 14.3.3 | Add test status badge to README | LOW | DONE |
New Files Created:
- `mcp-qdrant-memory/.github/workflows/ci.yml` - GitHub Actions CI workflow
Workflow Configuration:
- Triggers: Push and PR to main/master branches
- Concurrency: Cancel in-progress runs on same branch
- Jobs:
  - `build`: TypeScript compilation with artifact upload
  - `typecheck`: Strict type validation (tsc --noEmit)
  - `test`: Tests with coverage (Node.js 18, 20, 22 matrix)
  - `security`: npm audit for vulnerability scanning
- Caching: npm dependencies cached for faster runs
- Artifacts: Coverage reports uploaded (Node 20)
README Updates:
- Added CI badge linking to GitHub Actions workflow
Success Criteria:
- CI workflow runs on push/PR to main/master
- Build job compiles TypeScript successfully
- Type check job validates types
- Test job runs on Node.js 18, 20, 22 matrix (362 tests)
- Coverage reports uploaded as artifacts
- Security audit job runs
- CI badge visible in README
Goal: Add ESLint, Prettier, pre-commit hooks, and governance documentation to mcp-qdrant-memory.
PRD Reference: mcp-qdrant-memory/docs/PRD.md - Section 4.3: Code Quality Tooling, Section 4.4: Documentation
Objective: Establish linting and formatting infrastructure for the MCP server.
| ID | Task | Priority | Status |
|---|---|---|---|
| 15.1.1 | Install ESLint, Prettier, Husky, lint-staged dependencies | HIGH | DONE |
| 15.1.2 | Create ESLint configuration (eslint.config.mjs) with TypeScript support | HIGH | DONE |
| 15.1.3 | Create Prettier configuration (.prettierrc, .prettierignore) | HIGH | DONE |
| 15.1.4 | Update package.json with lint/format scripts | HIGH | DONE |
| 15.1.5 | Initialize Husky and configure pre-commit hook | MEDIUM | DONE |
| 15.1.6 | Update CI workflow with lint job | HIGH | DONE |
Implementation Details:
- ESLint v9 flat config with `typescript-eslint/recommendedTypeChecked`
- Relaxed rules for existing codebase (warnings for `any`-related rules)
- Prettier with double quotes, semicolons, 100 char width
- Pre-commit hook runs lint-staged (ESLint + Prettier on staged files)
- CI lint job runs `npm run lint` and `npm run format:check`
New npm Scripts:
```json
{
  "lint": "eslint src/",
  "lint:fix": "eslint src/ --fix",
  "format": "prettier --write .",
  "format:check": "prettier --check .",
  "typecheck": "tsc --noEmit"
}
```

Success Criteria:
- ESLint configured and passing (0 errors, warnings allowed)
- Prettier configured and passing
- Pre-commit hooks functional
- CI lint job passing
Objective: Add contributor documentation and project governance files.
| ID | Task | Priority | Status |
|---|---|---|---|
| 15.2.1 | Create CONTRIBUTING.md with development guidelines | HIGH | DONE |
| 15.2.2 | Create CHANGELOG.md with version history | HIGH | DONE |
| 15.2.3 | Create LICENSE file (MIT) | HIGH | DONE |
Implementation Details:
- CONTRIBUTING.md includes development setup, code style, testing, PR process
- CHANGELOG.md follows Keep a Changelog format
- LICENSE file matches package.json MIT declaration
Success Criteria:
- CONTRIBUTING.md present with development guidelines
- CHANGELOG.md present with version history
- LICENSE file present
Files Created:
- `mcp-qdrant-memory/eslint.config.mjs` - ESLint configuration
- `mcp-qdrant-memory/.prettierrc` - Prettier configuration
- `mcp-qdrant-memory/.prettierignore` - Prettier ignore patterns
- `mcp-qdrant-memory/.husky/pre-commit` - Pre-commit hook
- `mcp-qdrant-memory/CONTRIBUTING.md` - Contributor guide
- `mcp-qdrant-memory/CHANGELOG.md` - Version history
- `mcp-qdrant-memory/LICENSE` - MIT license
Files Modified:
- `mcp-qdrant-memory/package.json` - Added lint/format scripts, lint-staged config
- `mcp-qdrant-memory/.github/workflows/ci.yml` - Added lint job
CI/CD:
- Lint job added to GitHub Actions workflow
- Runs ESLint and Prettier checks on all PRs
- 0 errors required, warnings allowed for gradual cleanup
Goal: Complete CI/CD infrastructure for the mcp-qdrant-memory MCP server.
Status: ✅ Complete
Objective: Automated npm publishing on version tags.
Files Created:
- `mcp-qdrant-memory/.github/workflows/release.yml` - Release workflow
Features:
- Triggers on version tags (`v*.*.*`)
- Builds and tests before publishing
- npm publish with provenance for supply chain security
- Auto-generates GitHub releases with release notes
Objective: Automated dependency management.
Files Created:
- `mcp-qdrant-memory/.github/dependabot.yml` - Dependabot configuration
Features:
- Weekly npm dependency updates (Monday)
- Dev dependencies grouped together
- PR limit of 10 to avoid noise
- Consistent commit message prefix (chore(deps))
MCP Server Progress:
- Phase 1 (Testing): 100% complete - 362 tests
- Phase 2 (CI/CD): 100% complete - CI, Release, Dependabot
- Phase 3 (Code Quality): 100% complete - ESLint, Prettier, Husky, docs
- Phase 4 (Error Handling): 0% - Not started
- Phase 5 (Security): 0% - Not started
- Phase 6 (Coverage): 100% complete - Integration tests
Overall MCP Server Progress: 65%
Goal: Complete Phase 4 (Error Handling) and Phase 5 (Security) for the mcp-qdrant-memory MCP server.
Status: ✅ Complete
PRD Reference: mcp-qdrant-memory/docs/PRD.md - Sections 4.5, 4.7, 4.8
Objective: Fix critical security issue where Qdrant API key was leaked to all fetch requests.
File Modified:
- `mcp-qdrant-memory/src/fetch-override.ts`
Change: Previously, the fetch override added the Qdrant API key to ALL fetch requests globally, including third-party APIs like Voyage AI, OpenAI, Linear, and GitHub. This leaked the API key to external services.
Fix: Scoped API key injection to only Qdrant URLs using url.startsWith(qdrantUrl) check.
Tests Created:
- `mcp-qdrant-memory/src/__tests__/fetch-override.test.ts` (18 tests)
- Qdrant API key only added to Qdrant URL requests
- Voyage AI requests do NOT contain Qdrant API key
- OpenAI/Linear/GitHub requests do NOT contain Qdrant API key
- All 18 security tests passing
Objective: Add input size limits to prevent DoS attacks via oversized payloads.
File Modified:
- `mcp-qdrant-memory/src/validation.ts`
Constants Added:

```typescript
export const INPUT_LIMITS = {
  QUERY_MAX_LENGTH: 10000,
  ENTITIES_MAX_COUNT: 1000,
  ENTITY_NAME_MAX_LENGTH: 500,
  OBSERVATIONS_MAX_COUNT: 100,
  OBSERVATION_MAX_LENGTH: 50000,
  RELATIONS_MAX_COUNT: 1000,
  ENTITY_NAMES_MAX_COUNT: 1000,
} as const;
```

Validators Updated:
- `validateSearchSimilarRequest` - query length
- `validateSearchDocsRequest` - query length
- `validateCreateEntitiesRequest` - entity count, name length, observations count/length
- `validateAddObservationsRequest` - entity name, contents count/length
- `validateCreateRelationsRequest` - relations count
- `validateDeleteEntitiesRequest` - entity names count
Tests Extended:
- `mcp-qdrant-memory/src/__tests__/validation.test.ts` (+22 tests)
- All validators enforce size limits
- Clear error messages indicate limit exceeded
- All 22 new validation tests passing
Objective: Fix Unicode stripping in BM25 text processing.
File Modified:
- `mcp-qdrant-memory/src/bm25/bm25Service.ts`
Change: Replaced ASCII-only regex with Unicode-aware regex.
```typescript
// Before (stripped Unicode)
.replace(/[^\w\s]/g, ' ')

// After (preserves Unicode letters and numbers)
.replace(/[^\p{L}\p{N}\s]/gu, ' ')
```

Tests Extended:
- `mcp-qdrant-memory/src/__tests__/bm25Service.test.ts` (+7 Unicode tests)
Languages Tested:
- Chinese (用户认证)
- Japanese (ユーザー認証)
- Korean (사용자 인증)
- Cyrillic (аутентификации)
- Arabic (مصادقة)
- Accented Latin (autenticación)
Success Criteria:
- Unicode characters preserved in BM25 indexing
- All 7 Unicode tests passing
- Mixed ASCII/Unicode content searchable
Objective: Preserve error context when wrapping errors in McpError.
File Modified:
- `mcp-qdrant-memory/src/index.ts`
Change: Added error cause to McpError for stack trace preservation.
```typescript
} catch (error) {
  const mcpError = new McpError(
    ErrorCode.InternalError,
    error instanceof Error ? error.message : String(error)
  );
  // Preserve original error for debugging
  (mcpError as any).cause = error;
  throw mcpError;
}
```

Success Criteria:
- Original error attached as cause
- Stack traces preserved for debugging
- All existing tests still passing
Files Modified:
- `mcp-qdrant-memory/src/fetch-override.ts` - Scoped API key injection
- `mcp-qdrant-memory/src/validation.ts` - Added INPUT_LIMITS and size validation
- `mcp-qdrant-memory/src/bm25/bm25Service.ts` - Unicode-aware regex
- `mcp-qdrant-memory/src/index.ts` - Error context preservation
Files Created:
- `mcp-qdrant-memory/src/__tests__/fetch-override.test.ts` - Security tests
Tests Added:
- 18 fetch-override security tests
- 22 input validation size limit tests
- 7 BM25 Unicode support tests
- Total: 47 new tests
MCP Server Progress Update:
- Phase 1 (Testing): 100% complete - 362 tests → 407 tests
- Phase 2 (CI/CD): 100% complete
- Phase 3 (Code Quality): 100% complete
- Phase 4 (Error Handling): 100% complete ✅ NEW
- Phase 5 (Security): 100% complete ✅ NEW
- Phase 6 (Coverage): 100% complete
Overall MCP Server Progress: 100% ✅
Objective: Complete remaining P1/P2 items from the MCP server PRD (sections 4.5 and 4.6).
Status: Complete ✅
Objective: Handle process termination signals properly.
| ID | Task | Priority | Status |
|---|---|---|---|
| 18.1.1 | Create src/shutdown.ts with signal handler infrastructure | HIGH | DONE |
| 18.1.2 | Implement SIGTERM handler with 10-second grace period | HIGH | DONE |
| 18.1.3 | Implement SIGINT handler (same logic as SIGTERM) | HIGH | DONE |
| 18.1.4 | Add cleanup actions: cancel pending requests, flush logs | HIGH | DONE |
| 18.1.5 | Integrate shutdown manager into src/index.ts |
HIGH | DONE |
| 18.1.6 | Create src/__tests__/shutdown.test.ts with tests |
HIGH | DONE |
Implementation Details:
- Created `ShutdownManager` class with:
  - Signal handler registration (SIGTERM/SIGINT)
  - Cleanup callback registration and execution
  - Pending request tracking via AbortController
  - Configurable grace period (default 10s)
  - Global singleton access via `getShutdownManager()`
- 21 tests covering all shutdown functionality
Success Criteria:
- Graceful shutdown handles SIGTERM/SIGINT properly
- Cleanup callbacks executed in order
- Pending requests cancelled on shutdown
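The behavior described above can be sketched as follows. This is a hypothetical outline, not the actual `src/shutdown.ts` API: the method names (`install`, `onShutdown`, `trackRequest`) and the injectable `exit` callback (added here so the sketch is testable without killing the process) are illustrative assumptions.

```typescript
// Illustrative sketch of a graceful-shutdown manager: signal handlers,
// ordered cleanup callbacks, pending-request cancellation, grace period.
type Cleanup = () => void | Promise<void>;

class ShutdownManager {
  private cleanups: Cleanup[] = [];
  private pending = new Set<AbortController>();
  private shuttingDown = false;

  constructor(
    private gracePeriodMs = 10_000,
    // Injectable exit for testability; defaults to process.exit.
    private exit: (code: number) => void = (code) => process.exit(code),
  ) {}

  // Install handlers for termination signals.
  install(): void {
    process.on("SIGTERM", () => void this.shutdown());
    process.on("SIGINT", () => void this.shutdown());
  }

  onShutdown(cb: Cleanup): void {
    this.cleanups.push(cb);
  }

  // Track a pending request; its signal aborts when shutdown starts.
  trackRequest(): AbortController {
    const controller = new AbortController();
    this.pending.add(controller);
    return controller;
  }

  async shutdown(): Promise<void> {
    if (this.shuttingDown) return;
    this.shuttingDown = true;
    // Force-exit if cleanup exceeds the grace period.
    const deadline = setTimeout(() => this.exit(1), this.gracePeriodMs);
    for (const controller of this.pending) controller.abort();
    for (const cb of this.cleanups) await cb(); // run in registration order
    clearTimeout(deadline);
    this.exit(0);
  }
}
```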
Objective: Add configurable timeouts for HTTP calls to external APIs.
| ID | Task | Priority | Status |
|---|---|---|---|
| 18.2.1 | Create `src/http-client.ts` with `fetchWithTimeout()` utility | HIGH | DONE |
| 18.2.2 | Update Voyage AI embedding calls (30s timeout) | HIGH | DONE |
| 18.2.3 | Update Linear API calls (10s timeout) | HIGH | DONE |
| 18.2.4 | Update GitHub API calls (10s timeout) | HIGH | DONE |
| 18.2.5 | Create `src/__tests__/http-client.test.ts` with tests | HIGH | DONE |
Implementation Details:
- Created `fetchWithTimeout()` utility with:
  - AbortController-based timeout mechanism
  - TimeoutError and ShutdownAbortError classes
  - Integration with ShutdownManager for request tracking
  - DEFAULT_TIMEOUTS constants for each API
  - Environment variable support for custom timeouts
- Updated all external API calls in `qdrant.ts`:
  - Voyage AI: 30s timeout
  - Linear API: 10s timeout (2 call sites)
  - GitHub API: 10s timeout (4 call sites)
- 23 tests covering timeout behavior
Success Criteria:
- External API calls have configurable timeouts
- TimeoutError thrown on timeout
- Requests properly cancelled and cleaned up
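The AbortController-based mechanism can be sketched like this. The `fetchImpl` parameter is an illustrative test seam, and the error shape is assumed; the shipped `fetchWithTimeout()` in `src/http-client.ts` may differ in detail.

```typescript
// Sketch of a timeout wrapper around fetch using AbortController.
class TimeoutError extends Error {
  constructor(public readonly timeoutMs: number) {
    super(`Request timed out after ${timeoutMs}ms`);
    this.name = "TimeoutError";
  }
}

async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 10_000,
  fetchImpl: typeof fetch = fetch, // injectable for testing (assumption)
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } catch (err) {
    // fetch rejects once the controller fires; surface it as a timeout.
    if (controller.signal.aborted) throw new TimeoutError(timeoutMs);
    throw err;
  } finally {
    clearTimeout(timer); // no dangling timer on success or failure
  }
}
```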
Objective: Implement LRU cleanup for per-collection BM25 services.
| ID | Task | Priority | Status |
|---|---|---|---|
| 18.3.1 | Add `lastAccessed` timestamp to BM25 service tracking | MEDIUM | DONE |
| 18.3.2 | Implement `cleanupStaleServices()` method | MEDIUM | DONE |
| 18.3.3 | Add periodic cleanup timer (every 5 minutes) | MEDIUM | DONE |
| 18.3.4 | Add max service count limit (10 collections) | MEDIUM | DONE |
| 18.3.5 | Add cleanup on graceful shutdown | MEDIUM | DONE |
| 18.3.6 | Add tests for LRU cleanup behavior | MEDIUM | DONE |
Implementation Details:
- Added `BM25ServiceEntry` interface with `lastAccessed` timestamp
- Configuration constants:
  - `BM25_MAX_SERVICES = 10` (max cached services)
  - `BM25_TTL_MS = 30 minutes` (stale threshold)
  - `BM25_CLEANUP_INTERVAL_MS = 5 minutes`
- LRU cleanup algorithm:
  - Remove entries not accessed within TTL
  - If count > max, evict oldest by lastAccessed
- Public methods:
  - `getBM25ServiceCount()` for monitoring
  - `stopBM25Cleanup()` for graceful shutdown
- 4 integration tests for BM25 service management
Success Criteria:
- BM25 services cleaned up based on LRU policy
- Max service count enforced
- Cleanup stops on shutdown
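The LRU policy above can be sketched as a standalone cache. `LruServiceCache` and its methods are hypothetical (the real logic lives inside `qdrant.ts`), though the constants mirror the ones listed.

```typescript
// Illustrative LRU cache for per-collection services: TTL-based staleness
// plus eviction of the least-recently-accessed entry when over the limit.
const BM25_MAX_SERVICES = 10;
const BM25_TTL_MS = 30 * 60 * 1000;

interface Entry<T> {
  service: T;
  lastAccessed: number;
}

class LruServiceCache<T> {
  private entries = new Map<string, Entry<T>>();

  // Fetch (or lazily create) the service for a collection, then clean up.
  get(key: string, create: () => T, now = Date.now()): T {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { service: create(), lastAccessed: now };
      this.entries.set(key, entry);
    }
    entry.lastAccessed = now;
    this.cleanupStaleServices(now);
    return entry.service;
  }

  cleanupStaleServices(now = Date.now()): void {
    // 1. Drop entries idle longer than the TTL.
    for (const [key, entry] of this.entries) {
      if (now - entry.lastAccessed > BM25_TTL_MS) this.entries.delete(key);
    }
    // 2. If still over the limit, evict oldest by lastAccessed.
    while (this.entries.size > BM25_MAX_SERVICES) {
      let oldestKey: string | undefined;
      let oldestTime = Infinity;
      for (const [key, entry] of this.entries) {
        if (entry.lastAccessed < oldestTime) {
          oldestTime = entry.lastAccessed;
          oldestKey = key;
        }
      }
      if (oldestKey === undefined) break;
      this.entries.delete(oldestKey);
    }
  }

  size(): number {
    return this.entries.size;
  }
}
```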
Files Created:
- `mcp-qdrant-memory/src/shutdown.ts` - Graceful shutdown manager
- `mcp-qdrant-memory/src/http-client.ts` - HTTP client with timeout support
- `mcp-qdrant-memory/src/__tests__/shutdown.test.ts` - Shutdown tests
- `mcp-qdrant-memory/src/__tests__/http-client.test.ts` - HTTP client tests
Files Modified:
- `mcp-qdrant-memory/src/index.ts` - Shutdown integration, cleanup methods
- `mcp-qdrant-memory/src/persistence/qdrant.ts` - BM25 LRU cleanup, fetchWithTimeout usage
- `mcp-qdrant-memory/src/__tests__/integration/qdrant.integration.test.ts` - BM25 tests
Tests Added:
- 21 shutdown tests
- 23 http-client tests
- 4 BM25 management tests
- Total: 48 new tests
PRD Coverage:
- 4.5.2 Timeout Configuration - Complete
- 4.6.1 Graceful Shutdown - Complete
- 4.6.2 BM25 Service Cleanup - Complete
Deferred to Future Phase:
- 4.5.1 Result Type Pattern - Requires significant refactoring
- 4.6.3 Connection Lifecycle Management - Health checks, reconnection logic
Objective: Implement structured logging throughout the MCP server codebase, replacing all console.error/console.log calls with a consistent, configurable logging system.
Status: Complete ✅
Objective: Create a structured logging system with module-specific loggers.
| ID | Task | Priority | Status |
|---|---|---|---|
| 19.1.1 | Create `src/logger.ts` with structured logger implementation | HIGH | DONE |
| 19.1.2 | Implement 4 log levels (debug, info, warn, error) | HIGH | DONE |
| 19.1.3 | Add JSON output format (LOG_FORMAT=json) | HIGH | DONE |
| 19.1.4 | Add human-readable output format (default) | MEDIUM | DONE |
| 19.1.5 | Implement LOG_LEVEL environment variable filtering | MEDIUM | DONE |
| 19.1.6 | Create pre-configured module loggers | MEDIUM | DONE |
| 19.1.7 | Add child logger support with context inheritance | LOW | DONE |
| 19.1.8 | Create `src/__tests__/logger.test.ts` with tests | HIGH | DONE |
Implementation Details:
- Created `Logger` class with:
  - 4 log levels: debug, info, warn, error
  - JSON structured output (LOG_FORMAT=json)
  - Human-readable output (default)
  - Log level filtering via LOG_LEVEL env var
  - Child logger support with context inheritance
  - Error serialization with stack traces
- Pre-configured module loggers:
  - `logger` (general MCP server)
  - `qdrantLogger` (database operations)
  - `bm25Logger` (keyword search)
  - `validationLogger` (input validation)
  - `planModeLogger` (plan mode access control)
  - `shutdownLogger` (shutdown lifecycle)
  - `configLogger` (configuration loading)
  - `ignoreFilterLogger` (claudeignore filtering)
- 29 tests covering all logger functionality
Success Criteria:
- Structured logging with consistent format
- Configurable log level filtering
- Module-specific loggers for better debugging
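A minimal sketch of the logger behavior described above: level filtering, JSON vs. human-readable output, and child loggers with context inheritance. The field names (`ts`, `module`) and return value are illustrative assumptions; the shipped `src/logger.ts` may structure entries differently.

```typescript
// Illustrative structured logger with level filtering and child contexts.
type Level = "debug" | "info" | "warn" | "error";
const LEVELS: Level[] = ["debug", "info", "warn", "error"];

class Logger {
  constructor(
    private module: string,
    private context: Record<string, unknown> = {},
    private minLevel: Level = (process.env.LOG_LEVEL as Level) || "info",
    private json: boolean = process.env.LOG_FORMAT === "json",
  ) {}

  // Child loggers inherit module, level, format, and merged context.
  child(context: Record<string, unknown>): Logger {
    return new Logger(
      this.module,
      { ...this.context, ...context },
      this.minLevel,
      this.json,
    );
  }

  // Returns the emitted line (or null if filtered) for illustration.
  log(level: Level, message: string, extra: Record<string, unknown> = {}): string | null {
    if (LEVELS.indexOf(level) < LEVELS.indexOf(this.minLevel)) return null;
    const entry = {
      ts: new Date().toISOString(),
      level,
      module: this.module,
      message,
      ...this.context,
      ...extra,
    };
    const line = this.json
      ? JSON.stringify(entry)
      : `${entry.ts} [${level.toUpperCase()}] ${this.module}: ${message}`;
    console.error(line); // stderr keeps stdout clean for MCP protocol traffic
    return line;
  }

  debug(msg: string, extra?: Record<string, unknown>) { return this.log("debug", msg, extra); }
  info(msg: string, extra?: Record<string, unknown>) { return this.log("info", msg, extra); }
}
```

Writing to stderr rather than stdout matters for MCP servers, since stdout carries the protocol stream.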
Objective: Replace all console.error/console.log calls with structured logging.
| ID | Task | Priority | Status |
|---|---|---|---|
| 19.2.1 | Integrate logger into `src/config.ts` | HIGH | DONE |
| 19.2.2 | Integrate logger into `src/validation.ts` | HIGH | DONE |
| 19.2.3 | Integrate logger into `src/bm25/bm25Service.ts` | HIGH | DONE |
| 19.2.4 | Integrate logger into `src/plan-mode-guard.ts` | MEDIUM | DONE |
| 19.2.5 | Integrate logger into `src/shutdown.ts` | HIGH | DONE |
| 19.2.6 | Integrate logger into `src/claudeignore/filter.ts` | MEDIUM | DONE |
| 19.2.7 | Integrate logger into `src/index.ts` | HIGH | DONE |
| 19.2.8 | Integrate logger into `src/persistence/qdrant.ts` | HIGH | DONE |
| 19.2.9 | Update `src/__tests__/shutdown.test.ts` for new format | MEDIUM | DONE |
Implementation Details:
- Replaced 60+ console.error/console.log calls across 8 production files
- Each module uses its designated logger (e.g., qdrantLogger for database operations)
- Debug-level logging for verbose debugging (e.g., relation processing, batch progress)
- Info-level logging for normal operations (e.g., server startup, BM25 initialization)
- Warn-level logging for warnings (e.g., ignore file not found, safety breaks)
- Error-level logging for errors with stack traces
Success Criteria:
- All console.error/console.log calls replaced
- All 503 tests passing
- TypeScript clean (no errors)
- Prettier formatted
Files Created:
- `mcp-qdrant-memory/src/logger.ts` - Structured logger implementation (198 lines)
- `mcp-qdrant-memory/src/__tests__/logger.test.ts` - Logger test suite (453 lines)
Files Modified:
- `mcp-qdrant-memory/src/config.ts` - Environment variable error logging
- `mcp-qdrant-memory/src/validation.ts` - Debug-level validation logging
- `mcp-qdrant-memory/src/bm25/bm25Service.ts` - BM25 operation logging
- `mcp-qdrant-memory/src/plan-mode-guard.ts` - Plan mode state change logging
- `mcp-qdrant-memory/src/shutdown.ts` - Shutdown lifecycle logging
- `mcp-qdrant-memory/src/claudeignore/filter.ts` - Ignore pattern warning logging
- `mcp-qdrant-memory/src/index.ts` - Server lifecycle and auto-reduce logging
- `mcp-qdrant-memory/src/persistence/qdrant.ts` - Database operation logging
- `mcp-qdrant-memory/src/__tests__/shutdown.test.ts` - Updated for new log format
Tests Added:
- 29 logger tests
- Total MCP Server Tests: 503
PRD Coverage:
- 4.5.3 Structured Logging - Complete ✅
| # | Rule | Severity | Detection | Auto-Fix |
|---|---|---|---|---|
| 1 | PLAN.TEST_REQUIREMENT | MEDIUM | Feature task without test dependency | Add test task |
| 2 | PLAN.DOC_REQUIREMENT | LOW | User-facing change without doc task | Add doc task |
| # | Rule | Severity | Detection | Auto-Fix |
|---|---|---|---|---|
| 1 | PLAN.DUPLICATE_DETECTION | HIGH | Semantic similarity >70% with existing code | Modify task to reference existing |
| # | Rule | Severity | Detection | Auto-Fix |
|---|---|---|---|---|
| 1 | PLAN.ARCHITECTURAL_CONSISTENCY | MEDIUM | File paths outside established patterns | Add warning to task |
| # | Rule | Severity | Detection | Auto-Fix |
|---|---|---|---|---|
| 1 | PLAN.PERFORMANCE_PATTERN | LOW | Known anti-patterns (N+1, no caching) | Add performance note |
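The rule tables above share a common shape (severity, detection, auto-fix), which suggests a uniform rule interface. Everything in this sketch is a hypothetical illustration: the interface names, the `PlanTask` fields, and the keyword-based detection heuristic are assumptions, not the actual implementation.

```typescript
// Hypothetical guardrail-rule shape mirroring the table columns above.
type Severity = "LOW" | "MEDIUM" | "HIGH";

interface PlanTask {
  id: string;
  description: string;
}

interface GuardrailRule {
  id: string; // e.g. "PLAN.TEST_REQUIREMENT"
  severity: Severity;
  detect(tasks: PlanTask[]): PlanTask[]; // offending tasks
  autoFix(tasks: PlanTask[]): PlanTask[]; // revised task list
}

// Example: flag tasks with no associated test task, then append one.
// The keyword heuristic stands in for whatever real detection is used.
const testRequirement: GuardrailRule = {
  id: "PLAN.TEST_REQUIREMENT",
  severity: "MEDIUM",
  detect: (tasks) =>
    tasks.filter(
      (t) =>
        !/test/i.test(t.description) &&
        !tasks.some(
          (o) => /test/i.test(o.description) && o.description.includes(t.id),
        ),
    ),
  autoFix: (tasks) => [
    ...tasks,
    ...testRequirement.detect(tasks).map((t) => ({
      id: `${t.id}-test`,
      description: `Add tests for ${t.id}`,
    })),
  ],
};
```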
- Phase 7: Plan Mode Detection & Hook Infrastructure (foundation)
- Phase 9: Quality Guardrails Framework (core value)
- Phase 10: Auto-Revision System (key differentiator)
- Phase 11: Prompt Augmentation (guidance injection)
- Phase 8: MCP Context Integration (rich context)
- Phase 12: Plan QA Verification (quality assurance)
- Phase 13: Polish, Testing & Documentation (production readiness)
```
Phase 7 (Detection/Hooks)
          |
          +----------------------+
          v                      v
Phase 9 (Guardrails)    Phase 8 (MCP Context)
          |                      |
          v                      |
Phase 10 (Auto-Revision) <------+
          |
          v
Phase 11 (Prompt Augmentation)
          |
          v
Phase 12 (Plan QA)
          |
          v
Phase 13 (Polish/Testing/Docs)
```
| Risk | Impact | Mitigation |
|---|---|---|
| False positives annoying | HIGH | Configurable thresholds, easy overrides |
| Performance too slow | MEDIUM | Parallel validation, caching |
| Auto-revision conflicts | HIGH | Conflict detection, iteration limits |
| Issue tracker API failures | MEDIUM | Graceful degradation, caching |
| Plan Mode detection fails | HIGH | Multiple detection methods, fallbacks |
- Plan Mode detected with >95% accuracy
- <100ms overhead for plan augmentation (Phase 11 complete: <50ms)
- >90% of plans include test/doc tasks when appropriate (Phase 12 Plan QA)
- Existing code reuse suggested in >80% of applicable cases (Phase 12 duplicate check)
- <10% of plans require user revision before approval (pending user testing)
- All 5 guardrail rules implemented and tested
- MCP tools for docs and tickets functional
- Documentation complete (Phase 13.3)
- Plan QA verification implemented (Phase 12)
- Feedback collection infrastructure (Phase 13.4)
- Thoroughness level configuration (Phase 13.4)
Generated from PRD.md and TDD.md (Plan Mode Integration)