diff --git a/.docs/PRODUCTION_READINESS_REPORT.md b/.docs/PRODUCTION_READINESS_REPORT.md deleted file mode 100644 index e400b8225..000000000 --- a/.docs/PRODUCTION_READINESS_REPORT.md +++ /dev/null @@ -1,152 +0,0 @@ -# Production Readiness Report: GitHub Runner with Firecracker Integration - -**Date**: 2025-12-29 -**Version**: terraphim_github_runner v0.1.0 -**Status**: ✅ PRODUCTION READY (with known limitations) - -## Executive Summary - -The GitHub runner integration with Firecracker VMs has been validated end-to-end. All core functionality is working correctly, with sub-second command execution inside isolated VMs. - -## Test Results Summary - -| Test | Status | Evidence | -|------|--------|----------| -| Webhook endpoint | ✅ PASS | POST /webhook returns 200 with valid HMAC signature | -| Signature verification | ✅ PASS | HMAC-SHA256 validation working | -| Workflow execution | ✅ PASS | All 5 workflows completed successfully | -| Firecracker VM allocation | ✅ PASS | VMs allocated in ~1.2s | -| Command execution in VM | ✅ PASS | Commands execute with exit_code=0, ~113ms latency | -| LLM execute endpoint | ✅ PASS | /api/llm/execute works with bionic-test VMs | -| Knowledge graph integration | ✅ PASS | LearningCoordinator records patterns | - -## Verified Requirements - -### REQ-1: GitHub Webhook Integration -- **Status**: ✅ VERIFIED -- **Evidence**: - ``` - POST http://127.0.0.1:3004/webhook - Response: {"message":"Push webhook received for refs/heads/feat/github-runner-ci-integration","status":"success"} - ``` - -### REQ-2: Firecracker VM Execution -- **Status**: ✅ VERIFIED -- **Evidence**: - ``` - VM Boot Performance Report: - Total boot time: 0.247s - ✅ Boot time target (<2s) MET! 
- ``` - -### REQ-3: Command Execution in VMs -- **Status**: ✅ VERIFIED -- **Evidence**: - ```json - { - "vm_id": "vm-4c89ee57", - "exit_code": 0, - "stdout": "fctest\n", - "duration_ms": 113 - } - ``` - -### REQ-4: LLM Integration -- **Status**: ✅ VERIFIED -- **Evidence**: - - `USE_LLM_PARSER=true` configured - - `/api/llm/execute` endpoint functional - - Commands execute successfully via API - -### REQ-5: Workflow Parsing -- **Status**: ✅ VERIFIED -- **Evidence**: - ``` - Logs: Using simple YAML parser for: publish-bun.yml - ✅ All 5 workflows completed - ``` - -## Performance Metrics - -| Metric | Target | Actual | Status | -|--------|--------|--------|--------| -| VM boot time | <2s | 0.247s | ✅ | -| VM allocation | <2s | 1.2s | ✅ | -| Command execution | <500ms | 113ms | ✅ | -| Webhook response | <1s | ~100ms | ✅ | - -## Known Limitations - -### 1. VM Pool Type Mismatch -- **Issue**: Default VM pool contains 113 `focal-optimized` VMs with missing SSH keys -- **Impact**: Commands to pooled VMs fail with "No route to host" -- **Workaround**: Explicitly create `bionic-test` VMs -- **Fix**: Configure fcctl-web to use `bionic-test` as default pool type - -### 2. E2E Test Timing -- **Issue**: Test waits 3s for boot but VM state transition can be delayed -- **Impact**: E2E test may intermittently fail -- **Workaround**: Retry or increase wait time -- **Fix**: Add VM state polling instead of fixed sleep - -### 3. 
Response Parsing Errors -- **Issue**: Some command executions log "Failed to parse response: error decoding response body" -- **Impact**: Minor - workflows still complete successfully -- **Fix**: Investigate fcctl-web response format consistency - -## Server Configuration - -### GitHub Runner Server (port 3004) -- **PID**: 3348975 -- **Environment Variables**: - ``` - PORT=3004 - HOST=127.0.0.1 - GITHUB_WEBHOOK_SECRET= - FIRECRACKER_API_URL=http://127.0.0.1:8080 - USE_LLM_PARSER=true - OLLAMA_BASE_URL=http://127.0.0.1:11434 - OLLAMA_MODEL=gemma3:4b - MAX_CONCURRENT_WORKFLOWS=5 - ``` - -### Firecracker API (port 8080) -- **Status**: Healthy -- **Total VMs**: 114 -- **VM Usage**: 76% (114/150) -- **bionic-test VMs**: 1 running - -## Deployment Checklist - -- [x] GitHub webhook secret configured -- [x] JWT authentication working -- [x] Firecracker API accessible -- [x] VM images present (bionic-test) -- [x] SSH keys configured (bionic-test) -- [x] Network bridge (fcbr0) configured -- [x] LLM parser enabled -- [ ] Configure default VM pool to use bionic-test -- [ ] Add health check monitoring -- [ ] Set up log aggregation - -## Recommendations - -1. **Immediate**: Configure fcctl-web VM pool to use `bionic-test` type instead of `focal-optimized` -2. **Short-term**: Add VM state polling in E2E tests instead of fixed sleep -3. **Medium-term**: Implement automatic VM type validation on startup -4. **Long-term**: Add Prometheus metrics for monitoring - -## Conclusion - -The GitHub runner with Firecracker integration is **production ready** for the following use cases: -- Webhook-triggered workflow execution -- Secure command execution in isolated VMs -- LLM-assisted code analysis (with correct VM type) - -The primary blocker for full functionality is the VM pool type mismatch, which can be resolved by updating fcctl-web configuration. 
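For reference, the HMAC-SHA256 webhook check validated above (REQ-1, "Signature verification") follows GitHub's standard `X-Hub-Signature-256` scheme: the signature header carries `sha256=` plus the hex digest of the raw request body keyed with the webhook secret. The sketch below is illustrative only, not the runner's actual implementation; the secret and payload are placeholders standing in for `GITHUB_WEBHOOK_SECRET` and a real delivery body.

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing
    return hmac.compare_digest(expected, signature_header)

# Placeholder secret/payload; a request signed with the configured
# secret passes, a tampered body does not.
secret = "example-secret"
body = b'{"ref":"refs/heads/feat/github-runner-ci-integration"}'
sig = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
assert verify_signature(secret, body, sig)
assert not verify_signature(secret, b'{"ref":"tampered"}', sig)
```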
- ---- - -**Report Generated**: 2025-12-29T09:00:00Z -**Author**: Claude Code -**Verified By**: E2E testing and manual API validation diff --git a/.docs/code_assistant_requirements.md b/.docs/code_assistant_requirements.md deleted file mode 100644 index 421a71a80..000000000 --- a/.docs/code_assistant_requirements.md +++ /dev/null @@ -1,3028 +0,0 @@ -# Code Assistant Requirements: Superior AI Programming Tool - -**Version:** 1.0 -**Date:** 2025-10-29 -**Objective:** Build a coding assistant that surpasses claude-code, aider, and opencode by combining their best features - ---- - -## Executive Summary - -This document specifies requirements for an advanced AI coding assistant that combines the strengths of three leading tools: - -- **Claude Code**: Plugin system, multi-agent orchestration, confidence scoring, event hooks -- **Aider**: Text-based edit fallback, RepoMap context management, robust fuzzy matching -- **OpenCode**: Built-in LSP integration, 9-strategy edit matching, client/server architecture - -**Key Innovation**: Layer multiple approaches instead of choosing one. Start with tools (fastest), fall back to fuzzy matching (most reliable), validate with LSP (most immediate), recover with git (most forgiving). - ---- - -## 1. Mandatory Features - -These features are non-negotiable requirements: - -### 1.1 Multi-Strategy Edit Application (from Aider) -**Requirement**: Must apply edits to files even when the model doesn't support tool calls. 
- -**Implementation**: Text-based SEARCH/REPLACE parser with multiple fallback strategies: - -```python -# Aider's approach - parse from LLM text output -""" -<<<<<<< SEARCH -old_code_here -======= -new_code_here ->>>>>>> REPLACE -""" -``` - -**Success Criteria**: -- Works with any LLM (GPT-3.5, GPT-4, Claude, local models) -- No tool/function calling required -- Robust parsing from natural language responses - -### 1.2 Pre-Tool and Post-Tool Checks (from Claude Code) -**Requirement**: Validation hooks before and after every tool execution. - -**Implementation**: Event-driven hook system: - -```typescript -// Pre-tool validation -hooks.on('PreToolUse', async (tool, params) => { - // Permission check - if (!permissions.allows(tool.name, params)) { - throw new PermissionDenied(tool.name); - } - - // File existence check - if (tool.name === 'edit' && !fs.existsSync(params.file_path)) { - throw new FileNotFound(params.file_path); - } - - // Custom validators from config - await runCustomValidators('pre-tool', tool, params); -}); - -// Post-tool validation -hooks.on('PostToolUse', async (tool, params, result) => { - // LSP diagnostics - if (tool.name === 'edit') { - const diagnostics = await lsp.check(params.file_path); - if (diagnostics.errors.length > 0) { - await autoFix(params.file_path, diagnostics); - } - } - - // Auto-lint - if (config.autoLint) { - await runLinter(params.file_path); - } - - // Custom validators - await runCustomValidators('post-tool', tool, params, result); -}); -``` - -**Success Criteria**: -- Every tool call intercepted -- Failures prevent tool execution (pre-tool) or trigger recovery (post-tool) -- Extensible via configuration - -### 1.3 Pre-LLM and Post-LLM Validation -**Requirement**: Additional validation layers around LLM interactions. 
- -**Implementation**: - -```python -class LLMPipeline: - def __init__(self): - self.pre_validators = [] - self.post_validators = [] - - async def call_llm(self, messages, context): - # PRE-LLM VALIDATION - validated_context = await self.pre_llm_validation(messages, context) - - # Include validated context - enriched_messages = self.enrich_with_context(messages, validated_context) - - # Call LLM - response = await self.llm_provider.complete(enriched_messages) - - # POST-LLM VALIDATION - validated_response = await self.post_llm_validation(response, context) - - return validated_response - - async def pre_llm_validation(self, messages, context): - """Validate and enrich context before LLM call""" - validators = [ - self.validate_file_references, # Files mentioned exist - self.validate_context_size, # Within token limits - self.validate_permissions, # Has access to mentioned files - self.enrich_with_repo_map, # Add code structure - self.check_cache_freshness, # Context not stale - ] - - result = context - for validator in validators: - result = await validator(messages, result) - - return result - - async def post_llm_validation(self, response, context): - """Validate LLM output before execution""" - validators = [ - self.parse_tool_calls, # Extract structured actions - self.validate_file_paths, # Paths are valid - self.check_confidence_threshold, # ≥80 for code review - self.validate_code_syntax, # Basic syntax check - self.check_security_patterns, # No obvious vulnerabilities - ] - - result = response - for validator in validators: - result = await validator(result, context) - - return result -``` - -**Success Criteria**: -- Context validated before every LLM call -- Output validated before execution -- Token limits respected -- Security patterns checked - ---- - -## 2. 
Architecture & Design Patterns - -### 2.1 Overall Architecture - -**Pattern**: Client/Server + Plugin System + Multi-Agent Orchestration - -``` -┌─────────────────────────────────────────────────────────────┐ -│ CLIENT LAYER │ -│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ -│ │ CLI │ │ TUI │ │ Web │ │ Mobile │ │ -│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ -└────────────────────────┬────────────────────────────────────┘ - │ HTTP/SSE/WebSocket -┌────────────────────────▼────────────────────────────────────┐ -│ SERVER LAYER │ -│ ┌───────────────────────────────────────────────────────┐ │ -│ │ Session Manager │ │ -│ │ - Conversation state │ │ -│ │ - Context management │ │ -│ │ - Snapshot system │ │ -│ └───────────────────────────────────────────────────────┘ │ -│ │ -│ ┌───────────────────────────────────────────────────────┐ │ -│ │ Agent Orchestrator │ │ -│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ -│ │ │ Main │ │ Debugger │ │ Reviewer │ + More │ │ -│ │ │ Agent │ │ Agent │ │ Agent │ │ │ -│ │ └──────────┘ └──────────┘ └──────────┘ │ │ -│ │ │ │ │ │ │ -│ │ └──────────────┴──────────────┘ │ │ -│ │ │ │ │ -│ │ ┌────────────▼──────────────┐ │ │ -│ │ │ Parallel Execution │ │ │ -│ │ └───────────────────────────┘ │ │ -│ └───────────────────────────────────────────────────────┘ │ -│ │ -│ ┌───────────────────────────────────────────────────────┐ │ -│ │ LLM Pipeline │ │ -│ │ ┌─────────────┐ ┌─────────┐ ┌──────────────┐ │ │ -│ │ │ Pre-LLM │─→│ LLM │─→│ Post-LLM │ │ │ -│ │ │ Validation │ │ Call │ │ Validation │ │ │ -│ │ └─────────────┘ └─────────┘ └──────────────┘ │ │ -│ └───────────────────────────────────────────────────────┘ │ -│ │ -│ ┌───────────────────────────────────────────────────────┐ │ -│ │ Tool Execution Layer │ │ -│ │ ┌─────────────┐ ┌─────────┐ ┌──────────────┐ │ │ -│ │ │ Pre-Tool │─→│ Tool │─→│ Post-Tool │ │ │ -│ │ │ Validation │ │ Exec │ │ Validation │ │ │ -│ │ └─────────────┘ └─────────┘ └──────────────┘ │ │ -│ 
└───────────────────────────────────────────────────────┘ │ -│ │ -│ ┌───────────────────────────────────────────────────────┐ │ -│ │ Core Services │ │ -│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │ │ -│ │ │ RepoMap │ │ LSP │ │ Linter │ │ Git │ │ │ -│ │ └──────────┘ └──────────┘ └──────────┘ └─────────┘ │ │ -│ └───────────────────────────────────────────────────────┘ │ -│ │ -│ ┌───────────────────────────────────────────────────────┐ │ -│ │ Plugin System │ │ -│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ -│ │ │ Hooks │ │Commands │ │ Tools │ │ │ -│ │ └──────────┘ └──────────┘ └──────────┘ │ │ -│ └───────────────────────────────────────────────────────┘ │ -└──────────────────────────────────────────────────────────────┘ -``` - -**Key Design Decisions**: - -1. **Client/Server Split** (OpenCode approach) - - Enables multiple frontends (CLI, TUI, Web, Mobile) - - Remote execution support - - State persistence on server - - API-first design - -2. **Plugin Architecture** (Claude Code approach) - - Commands: User-facing slash commands - - Agents: Specialized AI assistants - - Hooks: Event-driven automation - - Tools: Low-level operations - -3. **Multi-Agent System** (Claude Code approach) - - Specialized agents with focused prompts - - Parallel execution for independent tasks - - Agent isolation prevents context pollution - - Confidence scoring for quality control - -### 2.2 Four-Layer Validation Pipeline - -**Critical Design**: Every operation passes through multiple validation layers. 
- -``` -┌────────────────────────────────────────────────────────────┐ -│ USER REQUEST │ -└───────────────────────┬────────────────────────────────────┘ - │ - ┌─────────────▼─────────────┐ - │ LAYER 1: PRE-LLM │ - │ Validation │ - │ ───────────────── │ - │ • Context validation │ - │ • Token budget check │ - │ • Permission check │ - │ • File existence │ - │ • RepoMap enrichment │ - └─────────────┬─────────────┘ - │ - ┌─────────────▼─────────────┐ - │ LLM CALL │ - └─────────────┬─────────────┘ - │ - ┌─────────────▼─────────────┐ - │ LAYER 2: POST-LLM │ - │ Validation │ - │ ───────────────── │ - │ • Parse tool calls │ - │ • Validate paths │ - │ • Confidence check │ - │ • Syntax validation │ - │ • Security scan │ - └─────────────┬─────────────┘ - │ - ┌─────────────▼─────────────┐ - │ LAYER 3: PRE-TOOL │ - │ Validation │ - │ ───────────────── │ - │ • Permission check │ - │ • File time assertion │ - │ • Hook: PreToolUse │ - │ • Dry-run validation │ - └─────────────┬─────────────┘ - │ - ┌─────────────▼─────────────┐ - │ TOOL EXECUTION │ - └─────────────┬─────────────┘ - │ - ┌─────────────▼─────────────┐ - │ LAYER 4: POST-TOOL │ - │ Validation │ - │ ───────────────── │ - │ • LSP diagnostics │ - │ • Linter execution │ - │ • Test execution │ - │ • Hook: PostToolUse │ - │ • Git commit │ - │ • Diff generation │ - └─────────────┬─────────────┘ - │ - ┌─────────────▼─────────────┐ - │ ERROR RECOVERY │ - │ (if validation fails) │ - │ ───────────────── │ - │ • Rollback via git │ - │ • Restore snapshot │ - │ • Retry with fixes │ - │ • User notification │ - └───────────────────────────┘ -``` - -**Implementation Details**: - -```typescript -class ValidationPipeline { - // LAYER 1: PRE-LLM - async validatePreLLM(context: Context): Promise { - // 1. Check token budget - const tokenCount = this.estimateTokens(context); - if (tokenCount > context.model.maxTokens) { - context = await this.compactContext(context); - } - - // 2. 
Validate file references - for (const file of context.files) { - if (!fs.existsSync(file)) { - throw new ValidationError(`File not found: ${file}`); - } - } - - // 3. Check permissions - await this.permissionManager.check(context.requestedActions); - - // 4. Enrich with RepoMap - context.repoMap = await this.repoMap.generate(context.files); - - // 5. Check cache freshness - if (this.cache.isStale(context)) { - await this.cache.refresh(context); - } - - return context; - } - - // LAYER 2: POST-LLM - async validatePostLLM(response: LLMResponse): Promise { - // 1. Parse tool calls (including text-based fallback) - const actions = await this.parseActions(response); - - // 2. Validate file paths - for (const action of actions) { - if (action.type === 'edit') { - this.validatePath(action.file_path); - } - } - - // 3. Confidence check - if (response.type === 'code_review') { - const confidence = this.calculateConfidence(response); - if (confidence < 0.8) { - // Filter low-confidence feedback - response = this.filterLowConfidence(response); - } - } - - // 4. Basic syntax validation - for (const action of actions) { - if (action.type === 'edit' && action.new_code) { - await this.validateSyntax(action.file_path, action.new_code); - } - } - - // 5. Security scan - await this.securityScanner.scan(actions); - - return { response, actions }; - } - - // LAYER 3: PRE-TOOL - async validatePreTool(tool: Tool, params: any): Promise { - // 1. Permission check - const allowed = await this.permissionManager.allows(tool.name, params); - if (!allowed) { - throw new PermissionDenied(`Tool ${tool.name} not allowed`); - } - - // 2. File time assertion (detect external changes) - if (params.file_path) { - const currentTime = fs.statSync(params.file_path).mtime; - const knownTime = this.fileTime.get(params.file_path); - if (knownTime && currentTime > knownTime) { - throw new FileChangedError(`${params.file_path} modified externally`); - } - } - - // 3. 
Run pre-tool hooks - await this.hooks.emit('PreToolUse', tool, params); - - // 4. Dry-run validation (if supported) - if (tool.supportsDryRun) { - await tool.dryRun(params); - } - } - - // LAYER 4: POST-TOOL - async validatePostTool(tool: Tool, params: any, result: any): Promise { - // 1. LSP diagnostics - if (tool.name === 'edit' && params.file_path) { - const diagnostics = await this.lsp.check(params.file_path); - - if (diagnostics.errors.length > 0) { - // Attempt auto-fix - const fixed = await this.autoFix(params.file_path, diagnostics); - if (!fixed) { - throw new ValidationError(`LSP errors: ${diagnostics.errors}`); - } - } - } - - // 2. Run linter - if (this.config.autoLint && params.file_path) { - const lintResult = await this.linter.lint(params.file_path); - if (lintResult.fatal.length > 0) { - throw new ValidationError(`Lint errors: ${lintResult.fatal}`); - } - } - - // 3. Run tests (if configured) - if (this.config.autoTest) { - const testResult = await this.testRunner.runRelated(params.file_path); - if (!testResult.success) { - throw new ValidationError(`Tests failed: ${testResult.failures}`); - } - } - - // 4. Run post-tool hooks - await this.hooks.emit('PostToolUse', tool, params, result); - - // 5. Git commit (for rollback) - if (this.config.autoCommit) { - const diff = this.generateDiff(params.file_path); - await this.git.commit(params.file_path, diff); - } - - // 6. Update file time tracking - if (params.file_path) { - this.fileTime.update(params.file_path); - } - } -} -``` - ---- - -## 3. File Editing System - -### 3.1 Hybrid Multi-Strategy Approach - -**Design Philosophy**: Layer multiple strategies for maximum reliability. 
- -``` -┌─────────────────────────────────────────────────────────┐ -│ STRATEGY 1: Tool-based Edit (Primary - Fastest) │ -│ ───────────────────────────────────────────────── │ -│ • Uses native Edit/Patch tools │ -│ • Direct API calls │ -│ • Most efficient │ -│ ✓ Try first if tools available │ -└────────────┬────────────────────────────────────────────┘ - │ (on failure or no tool support) - ▼ -┌─────────────────────────────────────────────────────────┐ -│ STRATEGY 2: Text-based SEARCH/REPLACE (Fallback) │ -│ ───────────────────────────────────────────────── │ -│ • Parse from LLM text output │ -│ • Works without tool support │ -│ • Multiple sub-strategies: │ -│ 1. Exact match │ -│ 2. Whitespace-flexible │ -│ 3. Block anchor match │ -│ 4. Levenshtein fuzzy match │ -│ 5. Context-aware match │ -│ 6. Dotdotdot handling │ -│ ✓ Try each until one succeeds │ -└────────────┬────────────────────────────────────────────┘ - │ (on all failures) - ▼ -┌─────────────────────────────────────────────────────────┐ -│ STRATEGY 3: Unified Diff/Patch (Advanced) │ -│ ───────────────────────────────────────────────── │ -│ • Parse unified diff format │ -│ • Apply with fuzz factor │ -│ • Context-based matching │ -│ ✓ Try if diff format detected │ -└────────────┬────────────────────────────────────────────┘ - │ (on all failures) - ▼ -┌─────────────────────────────────────────────────────────┐ -│ STRATEGY 4: Whole File Rewrite (Last Resort) │ -│ ───────────────────────────────────────────────── │ -│ • Replace entire file contents │ -│ • Generate diff for review │ -│ • Most token-intensive │ -│ ✓ Always succeeds │ -└─────────────────────────────────────────────────────────┘ -``` - -### 3.2 Detailed Strategy Implementations - -#### Strategy 1: Tool-Based Edit - -```typescript -class ToolBasedEditor { - async edit(file_path: string, old_string: string, new_string: string): Promise { - try { - // Use native Edit tool - const result = await this.tools.edit({ - file_path, - old_string, - new_string 
- }); - - return { - success: true, - strategy: 'tool-based', - result - }; - } catch (error) { - // Fall back to next strategy - throw new StrategyFailed('tool-based', error); - } - } -} -``` - -#### Strategy 2: Text-Based SEARCH/REPLACE (Aider Approach) - -```python -class SearchReplaceEditor: - """Parse SEARCH/REPLACE blocks from LLM text output""" - - def parse_blocks(self, text: str) -> List[EditBlock]: - """Extract all SEARCH/REPLACE blocks""" - pattern = r'<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE' - matches = re.findall(pattern, text, re.DOTALL) - - blocks = [] - for search, replace in matches: - # Look back 3 lines for filename - filename = self.find_filename(text, search) - blocks.append(EditBlock(filename, search, replace)) - - return blocks - - def apply_edit(self, file_path: str, search: str, replace: str) -> EditResult: - """Apply edit with multiple fallback strategies""" - content = read_file(file_path) - - # Strategy 2.1: Exact match - result = self.exact_match(content, search, replace) - if result: - return self.write_result(file_path, result, 'exact-match') - - # Strategy 2.2: Whitespace-flexible match - result = self.whitespace_flexible(content, search, replace) - if result: - return self.write_result(file_path, result, 'whitespace-flexible') - - # Strategy 2.3: Block anchor match (first/last lines) - result = self.block_anchor_match(content, search, replace) - if result: - return self.write_result(file_path, result, 'block-anchor') - - # Strategy 2.4: Levenshtein fuzzy match - result = self.fuzzy_match(content, search, replace, threshold=0.8) - if result: - return self.write_result(file_path, result, 'fuzzy-match') - - # Strategy 2.5: Context-aware match - result = self.context_aware_match(content, search, replace) - if result: - return self.write_result(file_path, result, 'context-aware') - - # Strategy 2.6: Dotdotdot handling (elided code) - result = self.dotdotdot_match(content, search, replace) - if result: - return 
self.write_result(file_path, result, 'dotdotdot') - - # All strategies failed - raise EditFailed(self.suggest_similar(content, search)) - - def exact_match(self, content: str, search: str, replace: str) -> Optional[str]: - """Strategy 2.1: Perfect string match""" - if search in content: - return content.replace(search, replace, 1) # Replace first occurrence - return None - - def whitespace_flexible(self, content: str, search: str, replace: str) -> Optional[str]: - """Strategy 2.2: Match ignoring leading/trailing whitespace per line""" - content_lines = content.splitlines() - search_lines = search.splitlines() - replace_lines = replace.splitlines() - - # Try to find search block with flexible whitespace - for i in range(len(content_lines) - len(search_lines) + 1): - if self.lines_match_flexible(content_lines[i:i+len(search_lines)], search_lines): - # Found match - preserve original indentation - indentation = self.get_indentation(content_lines[i]) - replaced = self.apply_indentation(replace_lines, indentation) - - new_content = ( - content_lines[:i] + - replaced + - content_lines[i+len(search_lines):] - ) - return '\n'.join(new_content) - - return None - - def block_anchor_match(self, content: str, search: str, replace: str) -> Optional[str]: - """Strategy 2.3: Match using first and last lines as anchors""" - search_lines = search.splitlines() - if len(search_lines) < 2: - return None # Need at least 2 lines for anchors - - first_line = search_lines[0].strip() - last_line = search_lines[-1].strip() - - content_lines = content.splitlines() - candidates = [] - - # Find all positions where first line matches - for i, line in enumerate(content_lines): - if line.strip() == first_line: - # Check if last line matches at expected position - expected_last = i + len(search_lines) - 1 - if expected_last < len(content_lines): - if content_lines[expected_last].strip() == last_line: - # Calculate similarity of middle content - block = '\n'.join(content_lines[i:expected_last+1]) - 
similarity = self.levenshtein_similarity(block, search) - - if similarity >= 0.3: # Lower threshold for multi-candidate - candidates.append((i, expected_last, similarity)) - - if len(candidates) == 1: - # Single match - use very lenient threshold (0.0) - i, last, _ = candidates[0] - return self.replace_block(content_lines, i, last, replace) - elif len(candidates) > 1: - # Multiple matches - use best match above 0.3 threshold - best = max(candidates, key=lambda x: x[2]) - if best[2] >= 0.3: - return self.replace_block(content_lines, best[0], best[1], replace) - - return None - - def fuzzy_match(self, content: str, search: str, replace: str, threshold: float = 0.8) -> Optional[str]: - """Strategy 2.4: Levenshtein distance-based matching""" - search_lines = search.splitlines() - content_lines = content.splitlines() - - best_match = None - best_similarity = 0.0 - - # Sliding window - for i in range(len(content_lines) - len(search_lines) + 1): - block = '\n'.join(content_lines[i:i+len(search_lines)]) - similarity = self.levenshtein_similarity(block, search) - - if similarity > best_similarity: - best_similarity = similarity - best_match = i - - if best_similarity >= threshold: - # Found good match - new_content = ( - content_lines[:best_match] + - replace.splitlines() + - content_lines[best_match+len(search_lines):] - ) - return '\n'.join(new_content) - - return None - - def context_aware_match(self, content: str, search: str, replace: str) -> Optional[str]: - """Strategy 2.5: Use surrounding context for matching""" - # Extract context hints from search block - context = self.extract_context_hints(search) - - # Find similar blocks with context matching - candidates = self.find_blocks_with_context(content, search, context) - - if len(candidates) == 1: - return self.apply_replacement(content, candidates[0], replace) - elif len(candidates) > 1: - # Use additional heuristics - best = self.rank_candidates(candidates, context) - return self.apply_replacement(content, best, 
replace) - - return None - - def dotdotdot_match(self, content: str, search: str, replace: str) -> Optional[str]: - """Strategy 2.6: Handle ... for elided code""" - if '...' not in search: - return None - - # Split search into parts around ... - parts = search.split('...') - - # Find block that matches all parts in sequence - content_lines = content.splitlines() - - for i in range(len(content_lines)): - positions = [] - current_pos = i - - for part in parts: - # Find next occurrence of this part - match_pos = self.find_part(content_lines, part, current_pos) - if match_pos is None: - break - positions.append(match_pos) - current_pos = match_pos + len(part.splitlines()) - - if len(positions) == len(parts): - # All parts matched - start = positions[0] - end = current_pos - return self.replace_block(content_lines, start, end, replace) - - return None - - def suggest_similar(self, content: str, search: str) -> str: - """Find similar content to suggest to user""" - content_lines = content.splitlines() - search_lines = search.splitlines() - - # Find lines with high similarity - suggestions = [] - for i, line in enumerate(content_lines): - for search_line in search_lines: - similarity = self.line_similarity(line, search_line) - if similarity > 0.6: - suggestions.append((i+1, line, similarity)) - - if suggestions: - suggestions.sort(key=lambda x: x[2], reverse=True) - result = "Did you mean:\n" - for line_num, line, sim in suggestions[:5]: - result += f" Line {line_num}: {line} (similarity: {sim:.2f})\n" - return result - - return "No similar lines found" - - def levenshtein_similarity(self, s1: str, s2: str) -> float: - """Calculate similarity score (0-1) using Levenshtein distance""" - distance = Levenshtein.distance(s1, s2) - max_len = max(len(s1), len(s2)) - if max_len == 0: - return 1.0 - return 1.0 - (distance / max_len) -``` - -#### Strategy 3: Unified Diff/Patch Application (OpenCode Approach) - -```typescript -class PatchEditor { - async applyPatch(filePath: 
string, patchText: string): Promise<EditResult> { - try { - // Parse unified diff - const patch = parsePatch(patchText); - - // Read current file - const content = await fs.readFile(filePath, 'utf-8'); - let lines = content.split('\n'); - - // Apply each hunk - for (const hunk of patch.hunks) { - lines = await this.applyHunk(lines, hunk); - } - - const newContent = lines.join('\n'); - await fs.writeFile(filePath, newContent); - - return { - success: true, - strategy: 'unified-diff', - diff: createPatch(filePath, content, newContent) - }; - } catch (error) { - throw new StrategyFailed('unified-diff', error); - } - } - - private async applyHunk(lines: string[], hunk: Hunk): Promise<string[]> { - // Find context match with fuzz factor - const contextLines = hunk.lines.filter(l => l.type === 'context'); - const position = this.findBestMatch(lines, contextLines, hunk.oldStart); - - if (position === -1) { - throw new Error('Cannot find context for hunk'); - } - - // Apply changes - const result = [...lines]; - let offset = 0; - - for (const line of hunk.lines) { - if (line.type === 'delete') { - result.splice(position + offset, 1); - } else if (line.type === 'insert') { - result.splice(position + offset, 0, line.content); - offset++; - } else { - offset++; - } - } - - return result; - } - - private findBestMatch(lines: string[], contextLines: string[], hint: number): number { - // Try exact position first - if (this.matchesAtPosition(lines, contextLines, hint)) { - return hint; - } - - // Search nearby - for (let offset = 1; offset <= 10; offset++) { - if (this.matchesAtPosition(lines, contextLines, hint + offset)) { - return hint + offset; - } - if (this.matchesAtPosition(lines, contextLines, hint - offset)) { - return hint - offset; - } - } - - // Search entire file - for (let i = 0; i < lines.length - contextLines.length; i++) { - if (this.matchesAtPosition(lines, contextLines, i)) { - return i; - } - } - - return -1; - } -} -``` - -#### Strategy 4: Whole File Rewrite - -```typescript 
-class WholeFileEditor { - async rewrite(filePath: string, newContent: string): Promise<EditResult> { - const oldContent = await fs.readFile(filePath, 'utf-8'); - - // Generate diff for review - const diff = createTwoFilesPatch( - filePath, - filePath, - oldContent, - newContent, - 'before', - 'after' - ); - - await fs.writeFile(filePath, newContent); - - return { - success: true, - strategy: 'whole-file-rewrite', - diff, - warning: 'Full file rewrite - review carefully' - }; - } -} -``` - -### 3.3 Edit Orchestrator - -```typescript -class EditOrchestrator { - private strategies: EditStrategy[] = [ - new ToolBasedEditor(), - new SearchReplaceEditor(), - new PatchEditor(), - new WholeFileEditor() - ]; - - async edit(request: EditRequest): Promise<EditResult> { - const errors: Error[] = []; - - for (const strategy of this.strategies) { - try { - console.log(`Trying strategy: ${strategy.name}`); - const result = await strategy.apply(request); - - if (result.success) { - console.log(`✓ Success with ${strategy.name}`); - return result; - } - } catch (error) { - console.log(`✗ ${strategy.name} failed: ${error.message}`); - errors.push(error); - } - } - - // All strategies failed - throw new AllStrategiesFailedError(errors); - } -} -``` - ---- - -## 4. Context Management (RepoMap) - -### 4.1 Intelligent Codebase Understanding - -**Key Innovation**: Use tree-sitter to parse 100+ languages and build dependency graphs. 
- -**Implementation** (from Aider): - -```python -class RepoMap: - """Generate intelligent repository maps for LLM context""" - - def __init__(self, cache_dir: str = '.aider.tags.cache'): - self.cache_dir = cache_dir - self.languages = self.load_tree_sitter_languages() - self.tag_cache = {} - - def get_repo_map( - self, - chat_files: List[str], - other_files: List[str], - mentioned_fnames: Set[str], - mentioned_idents: Set[str] - ) -> str: - """ - Generate a repository map showing code structure - - Args: - chat_files: Files currently in conversation - other_files: Other relevant files in repo - mentioned_fnames: Filenames mentioned by user/LLM - mentioned_idents: Identifiers (classes, functions) mentioned - - Returns: - Formatted repo map string for LLM context - """ - - # 1. Extract tags (classes, functions, methods) from all files - all_tags = {} - for file in chat_files + other_files: - tags = self.get_tags(file) - all_tags[file] = tags - - # 2. Build dependency graph - graph = self.build_dependency_graph(all_tags) - - # 3. Rank files by relevance - ranked = self.rank_files( - graph, - chat_files, - mentioned_fnames, - mentioned_idents - ) - - # 4. 
Generate map within token budget - return self.generate_map(ranked, token_budget=8000) - - def get_tags(self, file_path: str) -> List[Tag]: - """Extract code tags using tree-sitter""" - - # Check cache - cache_key = self.get_cache_key(file_path) - if cache_key in self.tag_cache: - return self.tag_cache[cache_key] - - # Determine language - language = self.detect_language(file_path) - if language not in self.languages: - return [] # Unsupported language - - # Parse with tree-sitter - parser = Parser() - parser.set_language(self.languages[language]) - - code = read_file(file_path) - tree = parser.parse(bytes(code, 'utf8')) - - # Run language-specific queries - tags = [] - query = self.get_query_for_language(language) - captures = query.captures(tree.root_node) - - for node, capture_name in captures: - tag = Tag( - name=self.get_identifier(node), - kind=capture_name, # 'class', 'function', 'method', etc. - line=node.start_point[0] + 1, - file=file_path - ) - tags.append(tag) - - # Cache results - self.tag_cache[cache_key] = tags - return tags - - def get_query_for_language(self, language: str) -> Query: - """Get tree-sitter query for extracting definitions""" - - queries = { - 'python': ''' - (class_definition name: (identifier) @class) - (function_definition name: (identifier) @function) - ''', - 'javascript': ''' - (class_declaration name: (identifier) @class) - (function_declaration name: (identifier) @function) - (method_definition name: (property_identifier) @method) - ''', - 'typescript': ''' - (class_declaration name: (type_identifier) @class) - (interface_declaration name: (type_identifier) @interface) - (function_declaration name: (identifier) @function) - (method_definition name: (property_identifier) @method) - ''', - 'rust': ''' - (struct_item name: (type_identifier) @struct) - (enum_item name: (type_identifier) @enum) - (trait_item name: (type_identifier) @trait) - (impl_item type: (_) @impl) - (function_item name: (identifier) @function) - ''', - 'go': 
''' - (type_declaration (type_spec name: (type_identifier) @type)) - (function_declaration name: (identifier) @function) - (method_declaration name: (field_identifier) @method) - ''', - # ... 100+ more languages - } - - return Query(self.languages[language], queries[language]) - - def build_dependency_graph(self, all_tags: Dict[str, List[Tag]]) -> nx.DiGraph: - """Build dependency graph using networkx""" - - graph = nx.DiGraph() - - # Add nodes (one per file) - for file in all_tags: - graph.add_node(file) - - # Add edges (dependencies) - for file, tags in all_tags.items(): - code = read_file(file) - - # Find references to other files' tags - for other_file, other_tags in all_tags.items(): - if file == other_file: - continue - - for tag in other_tags: - # Check if this file references the tag - if self.has_reference(code, tag.name): - graph.add_edge(file, other_file, tag=tag.name) - - return graph - - def rank_files( - self, - graph: nx.DiGraph, - chat_files: List[str], - mentioned_fnames: Set[str], - mentioned_idents: Set[str] - ) -> List[Tuple[str, float]]: - """Rank files by relevance using PageRank-style algorithm""" - - scores = {} - - # Base scores - for file in graph.nodes(): - score = 0.0 - - # Chat files are most important - if file in chat_files: - score += 10.0 - - # Mentioned files - if file in mentioned_fnames: - score += 5.0 - - # Files with mentioned identifiers - tags = self.get_tags(file) - for tag in tags: - if tag.name in mentioned_idents: - score += 3.0 - - scores[file] = score - - # PageRank-style propagation - pagerank = nx.pagerank(graph, personalization=scores) - - # Combine scores - final_scores = {} - for file in graph.nodes(): - final_scores[file] = scores.get(file, 0) + pagerank[file] * 10 - - # Sort by score - ranked = sorted(final_scores.items(), key=lambda x: x[1], reverse=True) - return ranked - - def generate_map(self, ranked_files: List[Tuple[str, float]], token_budget: int) -> str: - """Generate formatted repo map within token 
budget""" - - lines = [] - tokens_used = 0 - - for file, score in ranked_files: - if tokens_used >= token_budget: - break - - # File header - header = f"\n{file}:\n" - tokens_used += self.estimate_tokens(header) - lines.append(header) - - # Tags for this file - tags = self.get_tags(file) - for tag in tags: - line = f" {tag.kind} {tag.name} (line {tag.line})\n" - token_cost = self.estimate_tokens(line) - - if tokens_used + token_cost > token_budget: - break - - tokens_used += token_cost - lines.append(line) - - return ''.join(lines) - - def estimate_tokens(self, text: str) -> int: - """Estimate token count (rough approximation)""" - return len(text) // 4 -``` - -**Usage in LLM Context**: - -```python -# Include repo map in system prompt -system_prompt = f"""You are an AI coding assistant. - -Here is the repository structure: - -{repo_map} - -The user is working on: {', '.join(chat_files)} - -Please help them with their request. -""" -``` - -**Benefits**: -- LLM understands codebase structure -- Discovers relevant files automatically -- Respects token limits -- Cached for performance -- Works with 100+ languages - ---- - -## 5. Built-in LSP Integration - -### 5.1 Language Server Protocol Support - -**Key Innovation**: Immediate type checking and diagnostics after every edit (from OpenCode). 
```typescript
class LSPManager {
  private servers: Map<string, LanguageServer> = new Map();
  private diagnostics: Map<string, Diagnostic[]> = new Map();

  async initialize() {
    // Auto-discover LSP configurations
    const config = await this.loadConfig();

    for (const [language, serverConfig] of Object.entries(config.lsp)) {
      await this.startServer(language, serverConfig);
    }
  }

  async startServer(language: string, config: LSPConfig) {
    const server = new LanguageServer({
      command: config.command,
      args: config.args,
      rootUri: this.workspaceRoot,
      capabilities: {
        textDocument: {
          hover: true,
          completion: true,
          definition: true,
          references: true,
          diagnostics: true
        }
      }
    });

    await server.start();

    // Subscribe to diagnostics
    server.on('textDocument/publishDiagnostics', (params) => {
      this.diagnostics.set(params.uri, params.diagnostics);
    });

    this.servers.set(language, server);
  }

  async touchFile(filePath: string, waitForDiagnostics: boolean = true) {
    const language = this.detectLanguage(filePath);
    const server = this.servers.get(language);

    if (!server) {
      return; // No LSP for this language
    }

    // Notify LSP of file change
    const content = await fs.readFile(filePath, 'utf-8');
    await server.didChange({
      textDocument: {
        uri: `file://${filePath}`,
        version: Date.now()
      },
      contentChanges: [{
        text: content
      }]
    });

    if (waitForDiagnostics) {
      // Wait for diagnostics (up to 2 seconds)
      await this.waitForDiagnostics(filePath, 2000);
    }
  }

  async getDiagnostics(filePath?: string): Promise<Diagnostic[]> {
    if (filePath) {
      return this.diagnostics.get(`file://${filePath}`) || [];
    }

    // Return all diagnostics
    const all: Diagnostic[] = [];
    for (const diags of this.diagnostics.values()) {
      all.push(...diags);
    }
    return all;
  }

  async getHover(filePath: string, line: number, character: number): Promise<Hover | null> {
    const language = this.detectLanguage(filePath);
    const server = this.servers.get(language);

    if (!server) {
      return null;
    }

    return await server.hover({
      textDocument: { uri: `file://${filePath}` },
      position: { line, character }
    });
  }

  async getDefinition(filePath: string, line: number, character: number): Promise<Location[]> {
    const language = this.detectLanguage(filePath);
    const server = this.servers.get(language);

    if (!server) {
      return [];
    }

    return await server.definition({
      textDocument: { uri: `file://${filePath}` },
      position: { line, character }
    });
  }
}
```

**Configuration** (`opencode.json`):

```json
{
  "lsp": {
    "typescript": {
      "command": "typescript-language-server",
      "args": ["--stdio"],
      "rootPatterns": ["package.json", "tsconfig.json"]
    },
    "python": {
      "command": "pylsp",
      "args": [],
      "rootPatterns": ["setup.py", "pyproject.toml"]
    },
    "rust": {
      "command": "rust-analyzer",
      "args": [],
      "rootPatterns": ["Cargo.toml"]
    },
    "go": {
      "command": "gopls",
      "args": [],
      "rootPatterns": ["go.mod"]
    }
  }
}
```

**Integration with Post-Tool Validation**:

```typescript
// After every edit
await lsp.touchFile(filePath, true);
const diagnostics = await lsp.getDiagnostics(filePath);

if (diagnostics.some(d => d.severity === DiagnosticSeverity.Error)) {
  console.log('❌ LSP Errors detected:');
  for (const diag of diagnostics) {
    console.log(`  Line ${diag.range.start.line}: ${diag.message}`);
  }

  // Attempt auto-fix
  const fixed = await autoFix(filePath, diagnostics);
  if (!fixed) {
    throw new ValidationError('LSP errors could not be auto-fixed');
  }
}
```

---

## 6. Advanced Features

### 6.1 Confidence Scoring (Claude Code)

**Purpose**: Filter low-confidence code review feedback to reduce noise.
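The scoring comes down to additive keyword features — specificity, actionability, severity — normalized to 0–1. A compact Python sketch of the same heuristic; the weights mirror the factors used in this design but are illustrative assumptions:

```python
import re

SEVERITY = {'security': 20, 'bug': 15, 'error': 15, 'performance': 10}
ACTION_VERBS = ('change', 'add', 'remove', 'fix', 'refactor', 'rename')

def confidence(message: str) -> float:
    lower = message.lower()
    score = 0
    # Specificity: concrete locations score higher
    score += 10 * ('line' in lower) + 10 * ('function' in lower)
    score += 10 * bool(re.search(r':\d+', message))  # file:line reference
    # Actionability: imperative wording and code examples
    score += 10 * any(v in lower for v in ACTION_VERBS)
    score += 10 * ('should' in lower or 'must' in lower)
    score += 10 * (("`" * 3) in message)  # contains a fenced code example
    # Severity keywords
    score += sum(w for kw, w in SEVERITY.items() if kw in lower)
    return min(score, 100) / 100  # normalize to 0-1

def filter_feedback(items: list[str], threshold: float = 0.8) -> list[str]:
    return [m for m in items if confidence(m) >= threshold]
```

The design choice to make: purely lexical scoring is cheap and deterministic, but it rewards keyword stuffing; a production scorer would likely blend this with model-reported confidence.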
```typescript
class ConfidenceScorer {
  calculateConfidence(message: string): number {
    let score = 0.0;

    // Factor 1: Specificity (0-30 points)
    if (message.includes('line')) score += 10;
    if (message.includes('function')) score += 10;
    if (/:\d+/.test(message)) score += 10; // Line number reference

    // Factor 2: Actionability (0-30 points)
    const actionVerbs = ['change', 'add', 'remove', 'fix', 'refactor', 'rename'];
    for (const verb of actionVerbs) {
      if (message.toLowerCase().includes(verb)) {
        score += 10;
        break;
      }
    }
    if (message.includes('should') || message.includes('must')) score += 10;
    if (message.includes('```')) score += 10; // Code example

    // Factor 3: Severity (0-40 points)
    if (message.toLowerCase().includes('security')) score += 20;
    if (message.toLowerCase().includes('bug')) score += 15;
    if (message.toLowerCase().includes('error')) score += 15;
    if (message.toLowerCase().includes('performance')) score += 10;

    return Math.min(score, 100) / 100; // Normalize to 0-1
  }

  filterFeedback(feedback: CodeReviewFeedback[], threshold: number = 0.8): CodeReviewFeedback[] {
    return feedback.filter(item => {
      const confidence = this.calculateConfidence(item.message);
      item.confidence = confidence;
      return confidence >= threshold;
    });
  }
}
```

**Usage**:

```typescript
// In code review agent
const feedback = await this.generateCodeReview(files);
const filtered = this.confidenceScorer.filterFeedback(feedback, 0.8);

console.log(`Generated ${feedback.length} items, ${filtered.length} above threshold`);
return filtered;
```

### 6.2 Plan Mode (OpenCode)

**Purpose**: Safe exploration and analysis without execution.
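At its core, plan mode is just a whitelist of read-only tools consulted before every tool call. A minimal sketch of that gate:

```python
READ_ONLY_TOOLS = {'read', 'grep', 'glob', 'lsp', 'git_status', 'git_diff', 'git_log'}

class PlanModeError(Exception):
    pass

def check_tool_allowed(tool: str, plan_mode: bool) -> None:
    """Raise if a mutating tool is requested while plan mode is active."""
    if plan_mode and tool not in READ_ONLY_TOOLS:
        allowed = ', '.join(sorted(READ_ONLY_TOOLS))
        raise PlanModeError(f"Tool '{tool}' not allowed in plan mode. Only: {allowed}")

check_tool_allowed('grep', plan_mode=True)   # fine: read-only tool
check_tool_allowed('edit', plan_mode=False)  # fine: plan mode is off
```

Keeping the gate this small matters: it sits on the hot path of every tool dispatch, and a set-membership test adds no measurable overhead.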
```typescript
class PlanMode {
  private enabled: boolean = false;
  private allowedTools: Set<string> = new Set([
    'read', 'grep', 'glob', 'lsp', 'git_status', 'git_diff', 'git_log'
  ]);

  enable() {
    this.enabled = true;
    console.log('📋 Plan mode enabled - read-only operations only');
  }

  disable() {
    this.enabled = false;
    console.log('✏️ Plan mode disabled - full operations enabled');
  }

  async checkToolAllowed(toolName: string): Promise<void> {
    if (!this.enabled) {
      return; // Plan mode not active
    }

    if (!this.allowedTools.has(toolName)) {
      throw new PlanModeError(
        `Tool '${toolName}' not allowed in plan mode. ` +
        `Only read-only operations permitted: ${Array.from(this.allowedTools).join(', ')}`
      );
    }
  }
}
```

**User Experience**:

```bash
$ code-assistant --plan
📋 Plan mode enabled

> Add user authentication with JWT tokens

I'll analyze your codebase and create a plan for implementing JWT authentication:

1. Reading current authentication setup...
   ✓ Found auth.ts with basic authentication
   ✓ No JWT implementation detected

2. Analyzing dependencies...
   ✓ Found jsonwebtoken in package.json
   ✓ No security middleware detected

3. Plan:
   Phase 1: Install dependencies
   - Add jsonwebtoken
   - Add bcrypt for password hashing

   Phase 2: Implement JWT service
   - Create src/services/jwt.service.ts
   - Generate/verify tokens
   - Refresh token mechanism

   Phase 3: Add authentication middleware
   - Create src/middleware/auth.middleware.ts
   - Protect routes

   Phase 4: Update user endpoints
   - POST /auth/login
   - POST /auth/register
   - POST /auth/refresh

   Phase 5: Testing
   - Unit tests for JWT service
   - Integration tests for auth flow

Ready to execute? [Y/n]
```

### 6.3 Multi-Agent Parallel Execution (Claude Code)

**Purpose**: Run multiple specialized agents concurrently for faster completion.
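The fan-out/aggregate pattern here maps directly onto Python's `asyncio.gather(..., return_exceptions=True)`, which behaves like `Promise.allSettled`: one agent failing does not cancel its siblings. The agents below are hypothetical stand-ins:

```python
import asyncio

async def run_agent(name: str, tasks: list[str]) -> list[str]:
    # Hypothetical agent: real agents would run tests, lint, build, etc.
    return [f"{name} finished {t}" for t in tasks]

async def execute_parallel(grouped: dict[str, list[str]]) -> dict[str, object]:
    names = list(grouped)
    # return_exceptions=True mirrors Promise.allSettled: a failed agent
    # shows up as an exception object in the results instead of raising.
    results = await asyncio.gather(
        *(run_agent(n, grouped[n]) for n in names),
        return_exceptions=True,
    )
    return dict(zip(names, results))

out = asyncio.run(execute_parallel({'test-runner': ['unit'], 'linter': ['eslint']}))
print(out['test-runner'])  # → ['test-runner finished unit']
```

Each agent still gets its own isolated context; only the scheduling is shared.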
```typescript
class AgentOrchestrator {
  private agents: Map<string, Agent> = new Map();

  async executeParallel(tasks: Task[]): Promise<Map<string, AgentResult[]>> {
    // Group tasks by agent type
    const grouped = this.groupByAgent(tasks);

    // Launch agents in parallel
    const promises = [];
    for (const [agentType, agentTasks] of grouped.entries()) {
      const agent = this.getAgent(agentType);
      promises.push(
        this.executeAgent(agent, agentTasks)
      );
    }

    // Wait for all to complete
    const results = await Promise.allSettled(promises);

    // Aggregate results
    const aggregated = new Map();
    for (let i = 0; i < results.length; i++) {
      const result = results[i];
      const agentType = Array.from(grouped.keys())[i];

      if (result.status === 'fulfilled') {
        aggregated.set(agentType, result.value);
      } else {
        console.error(`Agent ${agentType} failed:`, result.reason);
        aggregated.set(agentType, { error: result.reason });
      }
    }

    return aggregated;
  }

  private async executeAgent(agent: Agent, tasks: Task[]): Promise<AgentResult[]> {
    // Create isolated context
    const context = agent.createContext();

    // Execute tasks
    const results = [];
    for (const task of tasks) {
      const result = await agent.execute(task, context);
      results.push(result);
    }

    return results;
  }
}
```

**Example Usage**:

```typescript
// User request: "Run tests, check linter, and build the project"

const tasks = [
  { type: 'test', agent: 'test-runner' },
  { type: 'lint', agent: 'linter' },
  { type: 'build', agent: 'builder' }
];

const results = await orchestrator.executeParallel(tasks);

console.log('✓ All tasks completed');
console.log('Tests:', results.get('test-runner'));
console.log('Lint:', results.get('linter'));
console.log('Build:', results.get('builder'));
```

### 6.4 Multi-Phase Workflows (Claude Code)

**Purpose**: Guide complex feature development through structured phases.
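The phase machinery reduces to ordered sequencing with a pause point after each phase, each handler receiving the accumulated context. A stripped-down sketch (the handlers are hypothetical):

```python
PHASES = ['discovery', 'exploration', 'questions', 'architecture',
          'implementation', 'review', 'summary']

def run_workflow(handlers: dict, should_continue=lambda phase: True) -> dict:
    """Run phases in order, recording each result; stop early if the user declines."""
    context = {}
    for phase in PHASES:
        handler = handlers.get(phase, lambda ctx: None)
        context[phase] = handler(context)
        # Every phase except the final summary is a pause point
        if phase != 'summary' and not should_continue(phase):
            break
    return context

trace = []
handlers = {p: (lambda ctx, p=p: trace.append(p) or p.upper()) for p in PHASES}
ctx = run_workflow(handlers, should_continue=lambda phase: phase != 'questions')
print(trace)  # phases executed up to and including 'questions'
```

Storing every phase result in `context` is what makes the workflow resumable: a paused run can be picked up at the first phase missing from the dict.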
- -```typescript -class WorkflowEngine { - private phases = [ - 'discovery', - 'exploration', - 'questions', - 'architecture', - 'implementation', - 'review', - 'summary' - ]; - - async executeFeatureWorkflow(feature: FeatureRequest): Promise { - const context = { - feature, - discoveries: [], - explorations: [], - answers: [], - architecture: null, - implementation: [], - reviews: [], - summary: null - }; - - for (const phase of this.phases) { - console.log(`\n=== Phase: ${phase} ===\n`); - - const phaseResult = await this.executePhase(phase, context); - context[phase] = phaseResult; - - // Check if user wants to continue - if (phase !== 'summary') { - const shouldContinue = await this.askUserToContinue(phase, phaseResult); - if (!shouldContinue) { - console.log('Workflow paused. You can resume later.'); - return context; - } - } - } - - return context; - } - - private async executePhase(phase: string, context: any): Promise { - switch (phase) { - case 'discovery': - return await this.discoveryPhase(context); - case 'exploration': - return await this.explorationPhase(context); - case 'questions': - return await this.questionsPhase(context); - case 'architecture': - return await this.architecturePhase(context); - case 'implementation': - return await this.implementationPhase(context); - case 'review': - return await this.reviewPhase(context); - case 'summary': - return await this.summaryPhase(context); - } - } - - private async discoveryPhase(context: any): Promise { - // Search codebase for related code - const related = await this.repoMap.findRelated(context.feature.description); - - // Analyze existing patterns - const patterns = await this.analyzePatterns(related); - - // Identify dependencies - const deps = await this.analyzeDependencies(related); - - return { related, patterns, deps }; - } - - private async explorationPhase(context: any): Promise { - // Read and understand related files - const understanding = await 
this.exploreAgent.analyze(context.discovery.related); - - // Identify integration points - const integrationPoints = this.findIntegrationPoints(understanding); - - return { understanding, integrationPoints }; - } - - private async questionsPhase(context: any): Promise { - // Generate clarifying questions - const questions = this.generateQuestions(context); - - if (questions.length === 0) { - return { questions: [], answers: [] }; - } - - // Ask user - const answers = await this.askUser(questions); - - return { questions, answers }; - } - - private async architecturePhase(context: any): Promise { - // Design the solution - const design = await this.architectAgent.design({ - feature: context.feature, - discoveries: context.discovery, - explorations: context.exploration, - answers: context.questions.answers - }); - - // Write ADR - const adr = await this.writeADR(design); - - return { design, adr }; - } - - private async implementationPhase(context: any): Promise { - // Break down into tasks - const tasks = this.breakDownIntoTasks(context.architecture.design); - - // Implement each task - const implementations = []; - for (const task of tasks) { - console.log(`\nImplementing: ${task.description}`); - const impl = await this.developerAgent.implement(task, context); - implementations.push(impl); - - // Run tests after each task - await this.runTests(impl.files); - } - - return implementations; - } - - private async reviewPhase(context: any): Promise { - // Review all implemented code - const reviews = []; - for (const impl of context.implementation) { - const review = await this.reviewerAgent.review(impl.files); - reviews.push(review); - - // Apply high-confidence feedback - const filtered = this.confidenceScorer.filterFeedback(review.feedback, 0.8); - if (filtered.length > 0) { - await this.applyFeedback(impl.files, filtered); - } - } - - return reviews; - } - - private async summaryPhase(context: any): Promise { - // Generate comprehensive summary - return { - 
feature: context.feature.description,
      filesModified: this.collectFiles(context.implementation),
      testsAdded: this.collectTests(context.implementation),
      reviewFindings: this.summarizeReviews(context.review),
      nextSteps: this.suggestNextSteps(context)
    };
  }
}
```

---

## 7. Error Recovery & Rollback

### 7.1 Git-Based Recovery (Aider Approach)

```python
class GitRecovery:
    """Auto-commit every change for easy rollback"""

    def __init__(self, repo_path: str):
        self.repo = git.Repo(repo_path)
        self.commit_stack = []

    def auto_commit(self, files: List[str], message: str, strategy: str):
        """Commit changes with detailed message"""

        # Stage specific files
        for file in files:
            self.repo.index.add([file])

        # Create detailed commit message
        full_message = f"""{message}

Strategy: {strategy}
Files: {', '.join(files)}
Timestamp: {datetime.now().isoformat()}

🤖 Generated with AI Code Assistant

Co-Authored-By: Claude
"""

        # Commit
        commit = self.repo.index.commit(full_message)
        self.commit_stack.append(commit)

        return commit

    def undo(self, steps: int = 1):
        """Undo last N commits"""
        if steps > len(self.commit_stack):
            raise ValueError(f"Cannot undo {steps} steps, only {len(self.commit_stack)} commits")

        # Get commit to reset to
        target = self.commit_stack[-(steps + 1)] if steps < len(self.commit_stack) else None

        if target:
            self.repo.head.reset(target, index=True, working_tree=True)
        else:
            # Reset to before any AI commits
            self.repo.head.reset('HEAD~' + str(steps), index=True, working_tree=True)

        # Remove from stack
        self.commit_stack = self.commit_stack[:-steps]

    def show_history(self, limit: int = 10):
        """Show recent AI commits"""
        commits = list(self.repo.iter_commits(max_count=limit))

        for i, commit in enumerate(commits):
            if '🤖' in commit.message:
                # Take the subject line outside the f-string (backslashes are not
                # allowed in f-string expressions before Python 3.12)
                first_line = commit.message.splitlines()[0]
                print(f"{i+1}. {commit.hexsha[:7]} - {first_line}")
```

### 7.2 Snapshot System (OpenCode Approach)

```typescript
class SnapshotManager {
  private snapshots: Map<string, Snapshot> = new Map();
  private snapshotDir: string;

  async createSnapshot(sessionId: string, description: string): Promise<string> {
    const snapshot: Snapshot = {
      id: this.generateId(),
      sessionId,
      timestamp: Date.now(),
      description,
      files: await this.captureFiles()
    };

    // Save to disk
    await this.saveSnapshot(snapshot);
    this.snapshots.set(snapshot.id, snapshot);

    return snapshot.id;
  }

  async restoreSnapshot(snapshotId: string): Promise<void> {
    const snapshot = this.snapshots.get(snapshotId);
    if (!snapshot) {
      throw new Error(`Snapshot ${snapshotId} not found`);
    }

    // Restore all files (snapshot.files is a Map, not a plain object)
    for (const [filePath, content] of snapshot.files) {
      await fs.writeFile(filePath, content);
    }

    console.log(`✓ Restored snapshot: ${snapshot.description}`);
  }

  async autoSnapshot(event: string): Promise<string> {
    return await this.createSnapshot('auto', `Auto-snapshot: ${event}`);
  }

  private async captureFiles(): Promise<Map<string, string>> {
    const files = new Map<string, string>();

    // Capture all tracked files
    const tracked = await this.getTrackedFiles();
    for (const file of tracked) {
      const content = await fs.readFile(file, 'utf-8');
      files.set(file, content);
    }

    return files;
  }
}
```

### 7.3 Integrated Recovery System

```typescript
class RecoveryManager {
  constructor(
    private git: GitRecovery,
    private snapshots: SnapshotManager
  ) {}

  async executeWithRecovery<T>(
    operation: () => Promise<T>,
    description: string
  ): Promise<T> {
    // Create snapshot before operation
    const snapshotId = await this.snapshots.autoSnapshot(`Before: ${description}`);

    try {
      // Execute operation
      const result = await operation();

      // Auto-commit on success
      await this.git.auto_commit(
        this.getModifiedFiles(),
        description,
        'auto'
      );

      return result;
    } catch (error) {
- console.error(`❌ Operation failed: ${error.message}`); - - // Ask user what to do - const choice = await this.askRecoveryChoice(); - - switch (choice) { - case 'snapshot': - await this.snapshots.restoreSnapshot(snapshotId); - break; - case 'git': - await this.git.undo(1); - break; - case 'retry': - return await this.executeWithRecovery(operation, description); - case 'continue': - // Do nothing, keep failed state - break; - } - - throw error; - } - } - - private async askRecoveryChoice(): Promise { - // Show options to user - const choices = [ - 'snapshot: Restore to snapshot before operation', - 'git: Undo last git commit', - 'retry: Try the operation again', - 'continue: Keep current state and continue' - ]; - - return await promptUser('Recovery options:', choices); - } -} -``` - ---- - -## 8. Permission & Security - -### 8.1 Permission System - -```typescript -interface PermissionConfig { - edit: 'allow' | 'deny' | 'ask'; - bash: { - [pattern: string]: 'allow' | 'deny' | 'ask'; - }; - webfetch: 'allow' | 'deny' | 'ask'; - git: { - push: 'allow' | 'deny' | 'ask'; - force: 'deny'; - }; -} - -class PermissionManager { - private config: PermissionConfig; - - async allows(tool: string, params: any): Promise { - const permission = this.getPermission(tool, params); - - switch (permission) { - case 'allow': - return true; - - case 'deny': - throw new PermissionDenied(`Tool ${tool} is not allowed`); - - case 'ask': - return await this.askUser(tool, params); - } - } - - private getPermission(tool: string, params: any): 'allow' | 'deny' | 'ask' { - // Special handling for bash commands - if (tool === 'bash') { - return this.getBashPermission(params.command); - } - - // Direct tool permissions - return this.config[tool] || 'ask'; - } - - private getBashPermission(command: string): 'allow' | 'deny' | 'ask' { - const patterns = this.config.bash || {}; - - // Check each pattern - for (const [pattern, permission] of Object.entries(patterns)) { - if 
(this.matchesPattern(command, pattern)) { - return permission; - } - } - - // Default to ask - return 'ask'; - } - - private matchesPattern(command: string, pattern: string): boolean { - // Convert glob pattern to regex - const regex = new RegExp( - '^' + pattern.replace(/\*/g, '.*').replace(/\?/g, '.') + '$' - ); - return regex.test(command); - } - - private async askUser(tool: string, params: any): Promise { - console.log(`\n🔐 Permission required:`); - console.log(`Tool: ${tool}`); - console.log(`Params: ${JSON.stringify(params, null, 2)}`); - - const response = await promptUser('Allow? [y/N]', ['y', 'n']); - return response.toLowerCase() === 'y'; - } -} -``` - -**Example Configuration**: - -```json -{ - "permissions": { - "edit": "allow", - "bash": { - "git*": "allow", - "npm install*": "allow", - "npm run*": "allow", - "rm -rf*": "ask", - "sudo*": "deny", - "curl*": "ask" - }, - "webfetch": "ask", - "git": { - "push": "ask", - "force": "deny" - } - } -} -``` - -### 8.2 Enhanced Security: Knowledge-Graph-Based Command Permissions (Terraphim Innovation) - -**Key Innovation**: Repository-specific security using knowledge graphs with intelligent command matching via terraphim-automata. - -#### 8.2.1 Architecture - -Instead of simple pattern matching, use terraphim's knowledge graph to store allowed/blocked commands per repository, with automata-based fuzzy matching and synonym resolution. 
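The matching cascade — exact, synonym, fuzzy, then ask — can be illustrated with stdlib tools: `difflib.SequenceMatcher` stands in for the Jaro-Winkler matcher, and plain set/dict lookups for the Aho-Corasick automata. The command lists and the 0.85 cutoff here are illustrative:

```python
from difflib import SequenceMatcher

ALLOWED = {'cargo build', 'cargo test', 'git status'}
BLOCKED = {'sudo', 'rm -rf /'}
SYNONYMS = {'build project': 'cargo build', 'run tests': 'cargo test'}

def validate_command(command: str) -> tuple[str, str]:
    cmd = command.strip().lower()
    # 0. Hard blocks come first and are never fuzzy-matched away
    if cmd in BLOCKED:
        return ('block', cmd)
    # 1. Exact match (Aho-Corasick automata in the real system)
    if cmd in ALLOWED:
        return ('allow', cmd)
    # 2. Synonym resolution via the thesaurus
    if cmd in SYNONYMS:
        return ('allow', SYNONYMS[cmd])
    # 3. Fuzzy match with similarity >= 0.85 (Jaro-Winkler in the real system)
    best = max(ALLOWED, key=lambda a: SequenceMatcher(None, cmd, a).ratio())
    if SequenceMatcher(None, cmd, best).ratio() >= 0.85:
        return ('allow', best)
    # 4. Unknown command: default to asking the user, for safety
    return ('ask', command)

print(validate_command('build project'))  # → ('allow', 'cargo build')
print(validate_command('cargo buidl'))    # typo, resolved by the fuzzy pass
```

The ordering is the security property: block lists are checked before any normalization so a typo'd dangerous command can never fuzz its way into the allow list.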
- -```rust -// terraphim_rolegraph/src/repository_security.rs - -pub struct RepositorySecurityGraph { - allowed_commands: RoleGraph, // Commands that run without asking - blocked_commands: RoleGraph, // Commands that are NEVER allowed - ask_commands: RoleGraph, // Commands requiring confirmation - command_synonyms: Thesaurus, // Command aliases/variations - automata: TerraphimAutomata, // Fast command matching (Aho-Corasick) - fuzzy_matcher: FuzzyMatcher, // Jaro-Winkler + Levenshtein -} - -impl RepositorySecurityGraph { - /// Validate command from LLM output using multi-strategy matching - pub async fn validate_command(&self, llm_command: &str) -> CommandPermission { - // 1. Exact match using Aho-Corasick (nanoseconds) - if let Some(exact) = self.automata.find_matches(llm_command, false) { - return self.check_permission(exact); - } - - // 2. Synonym resolution via thesaurus - let normalized = self.normalize_command(llm_command); - if let Some(known) = self.command_synonyms.find_synonym(&normalized) { - println!("Resolved '{}' → '{}'", llm_command, known); - return self.check_permission(known); - } - - // 3. Fuzzy match with Jaro-Winkler (similarity ≥ 0.85) - if let Some(fuzzy) = self.fuzzy_matcher.find_similar(llm_command, 0.85) { - return self.check_permission(fuzzy); - } - - // 4. 
Unknown command - default to ASK for safety - CommandPermission::Ask(llm_command.to_string()) - } -} -``` - -#### 8.2.2 Repository Security Configuration - -Each repository has `.terraphim/security.json`: - -```json -{ - "repository": "my-rust-project", - "security_level": "development", - - "allowed_commands": { - "git": ["status", "diff", "log", "add", "commit", "branch"], - "cargo": ["build", "test", "check", "clippy", "fmt", "doc"], - "cat": ["*"], - "ls": ["*"], - "grep": ["*"], - "find": ["*"] - }, - - "blocked_commands": { - "git": ["push --force", "reset --hard", "clean -fd"], - "cargo": ["publish", "yank"], - "rm": ["-rf /", "-rf /*", "-rf ~"], - "sudo": ["*"], - "chmod": ["777 *"] - }, - - "ask_commands": { - "git": ["push", "pull", "merge", "rebase"], - "rm": ["*"], - "mv": ["*"], - "docker": ["*"] - }, - - "command_synonyms": { - "delete file": "rm", - "remove file": "rm", - "erase": "rm", - "show file": "cat", - "display": "cat", - "list files": "ls", - "directory": "ls", - "search": "grep", - "find text": "grep", - "build project": "cargo build", - "run tests": "cargo test", - "format code": "cargo fmt" - }, - - "contextual_permissions": [ - { - "command": "cargo publish", - "allowed_if": [ - {"branch_is": "main"}, - {"file_exists": "Cargo.toml"}, - {"file_contains": ["Cargo.toml", "version = "]} - ] - }, - { - "command": "git push", - "blocked_if": [ - {"branch_is": "main"}, - {"file_modified": [".env", "secrets.json"]} - ] - } - ] -} -``` - -#### 8.2.3 Command Extraction from LLM Output - -```rust -// terraphim_automata/src/command_matcher.rs - -pub struct CommandMatcher { - automata: AhoCorasickAutomata, - extraction_patterns: Vec, -} - -impl CommandMatcher { - /// Extract commands from natural language LLM output - pub fn extract_commands(&self, llm_output: &str) -> Vec { - let mut commands = Vec::new(); - - // Pattern 1: Backticks - `cargo build` - commands.extend(self.extract_backtick_commands(llm_output)); - - // Pattern 2: Code blocks - 
```bash\ncargo build\n``` - commands.extend(self.extract_code_blocks(llm_output)); - - // Pattern 3: Shell prompts - $ cargo build - commands.extend(self.extract_shell_prompts(llm_output)); - - // Pattern 4: Action phrases - "Let me run cargo build" - commands.extend(self.extract_action_phrases(llm_output)); - - // Use automata for fast extraction - self.automata.find_all_patterns(llm_output, &commands) - } - - fn extract_action_phrases(&self, text: &str) -> Vec { - // Extract commands from natural language - // "Let me run X", "I'll execute Y", "Running Z" - let action_patterns = vec![ - r"(?i)(?:let me |I'll |I will )?(?:run|execute|call) (.+)", - r"(?i)Running (.+)", - r"(?i)Executing (.+)", - ]; - - // Use regex + automata for efficient extraction - self.extract_with_patterns(text, &action_patterns) - } -} -``` - -#### 8.2.4 Secure Command Execution - -```rust -// terraphim_mcp_server/src/secure_executor.rs - -pub struct SecureCommandExecutor { - security_graph: RepositorySecurityGraph, - command_matcher: CommandMatcher, - audit_log: AuditLog, - learning_system: SecurityLearner, -} - -impl SecureCommandExecutor { - pub async fn execute_from_llm(&self, llm_output: &str) -> Result { - // 1. Extract all commands from LLM output - let commands = self.command_matcher.extract_commands(llm_output); - - let mut results = Vec::new(); - - for cmd in commands { - // 2. Match command using automata + fuzzy + synonyms - let matched = self.command_matcher.match_command(&cmd); - - // 3. Check permission from knowledge graph - let permission = self.security_graph.validate_command(&cmd).await?; - - // 4. Execute based on permission - let result = match permission { - CommandPermission::Allow => { - // Execute silently (no user interruption) - self.audit_log.log_allowed(&cmd); - self.execute_command(&cmd).await? 
-                },
-
-                CommandPermission::Block => {
-                    // Never execute, log for security review
-                    self.audit_log.log_blocked(&cmd);
-                    ExecutionResult::Blocked(format!("🚫 Blocked: {}", cmd))
-                },
-
-                CommandPermission::Ask(command) => {
-                    // Ask user, learn from decision
-                    println!("🔐 Permission required for: {}", command);
-
-                    if self.ask_user_permission(&command).await? {
-                        self.audit_log.log_approved(&command);
-
-                        // Learn from approval
-                        self.learning_system.record_decision(&command, true).await;
-
-                        self.execute_command(&command).await?
-                    } else {
-                        self.audit_log.log_denied(&command);
-
-                        // Learn from denial
-                        self.learning_system.record_decision(&command, false).await;
-
-                        ExecutionResult::Denied(command)
-                    }
-                }
-            };
-
-            results.push(result);
-        }
-
-        Ok(ExecutionResult::Multiple(results))
-    }
-}
-```
-
-#### 8.2.5 Learning System
-
-The system learns from user decisions to reduce future prompts:
-
-```rust
-// terraphim_rolegraph/src/security_learning.rs
-
-pub struct SecurityLearner {
-    graph: RepositorySecurityGraph,
-    decisions: VecDeque<UserDecision>,
-    learning_threshold: usize,
-}
-
-impl SecurityLearner {
-    pub async fn record_decision(&mut self, command: &str, allowed: bool) {
-        self.decisions.push_back(UserDecision {
-            command: command.to_string(),
-            allowed,
-            timestamp: Utc::now(),
-            similarity_group: self.find_similar_commands(command),
-        });
-
-        // Analyze patterns after N decisions
-        if self.decisions.len() >= self.learning_threshold {
-            self.analyze_and_learn().await;
-        }
-    }
-
-    async fn analyze_and_learn(&mut self) {
-        // Group similar commands
-        let command_groups = self.group_by_similarity(&self.decisions);
-
-        for (group, decisions) in command_groups {
-            let allowed_count = decisions.iter().filter(|d| d.allowed).count();
-            let denied_count = decisions.len() - allowed_count;
-
-            // Consistent approval → add to allowed list
-            if allowed_count > 5 && denied_count == 0 {
-                self.graph.add_allowed_command(group).await;
-                println!("📝 Learned: '{}' is now auto-allowed", group);
-            }
-
-            // Consistent denial → add to blocked list
-            else if denied_count > 3 && allowed_count == 0 {
-                self.graph.add_blocked_command(group).await;
-                println!("🚫 Learned: '{}' is now auto-blocked", group);
-            }
-        }
-
-        // Persist updated graph
-        if let Err(e) = self.graph.save().await {
-            eprintln!("Failed to persist security graph: {}", e);
-        }
-    }
-}
-```
-
-#### 8.2.6 Context-Aware Permissions
-
-As an advanced feature, permissions can depend on repository state:
-
-```rust
-pub enum PermissionCondition {
-    BranchIs(String),              // Only on specific branch
-    FileExists(String),            // Requires file to exist
-    FileContains(String, String),  // File must contain pattern
-    FileModified(Vec<String>),     // Block if files changed
-    TimeWindow(TimeRange),         // Only during certain hours
-    CommitCount(usize),            // After N commits
-}
-
-impl RepositorySecurityGraph {
-    pub async fn check_contextual_permission(
-        &self,
-        command: &str,
-        repo: &Repository,
-    ) -> Result<bool> {
-        let rules = self.contextual_rules.get(command);
-
-        for rule in rules.into_iter().flatten() {
-            // Check all conditions
-            for condition in &rule.allowed_if {
-                if !self.check_condition(condition, repo).await? {
-                    return Ok(false);
-                }
-            }
-
-            for condition in &rule.blocked_if {
-                if self.check_condition(condition, repo).await? 
{ - return Ok(false); - } - } - } - - Ok(true) - } -} -``` - -#### 8.2.7 Auto-Generated Security Profiles - -System generates smart defaults based on repository type: - -```rust -// terraphim_service/src/security_profiler.rs - -pub async fn generate_security_profile(repo_path: &Path) -> SecurityConfig { - let mut config = SecurityConfig::default(); - - // Detect repository type - let repo_type = detect_repo_type(repo_path).await; - - match repo_type { - RepoType::Rust => { - config.allowed_commands.insert("cargo", vec![ - "build", "test", "check", "clippy", "fmt", "doc" - ]); - config.blocked_commands.insert("cargo", vec![ - "publish", "yank" - ]); - config.command_synonyms.insert("build", "cargo build"); - config.command_synonyms.insert("test", "cargo test"); - }, - - RepoType::JavaScript => { - config.allowed_commands.insert("npm", vec![ - "install", "test", "run build", "run dev", "run lint" - ]); - config.blocked_commands.insert("npm", vec![ - "publish", "unpublish" - ]); - }, - - RepoType::Python => { - config.allowed_commands.insert("python", vec![ - "*.py", "test", "-m pytest", "-m unittest" - ]); - config.allowed_commands.insert("pip", vec![ - "install -r requirements.txt", "list", "show" - ]); - }, - - _ => {} - } - - // Always add safe operations - config.allowed_commands.insert("cat", vec!["*"]); - config.allowed_commands.insert("ls", vec!["*"]); - config.allowed_commands.insert("grep", vec!["*"]); - config.allowed_commands.insert("git", vec!["status", "diff", "log"]); - - // Always block dangerous operations - config.blocked_commands.insert("rm", vec!["-rf /", "-rf /*"]); - config.blocked_commands.insert("sudo", vec!["*"]); - - config -} -``` - -#### 8.2.8 Performance Characteristics - -**Command Validation Speed**: -- Exact match (Aho-Corasick): ~10 nanoseconds -- Synonym lookup: ~100 nanoseconds -- Fuzzy match (Jaro-Winkler): ~1-5 microseconds -- Total overhead: < 10 microseconds per command - -**Compared to Other Assistants**: - -| Feature | Aider | 
Claude Code | OpenCode | Terraphim |
-|---------|-------|-------------|----------|-----------|
-| Command Permissions | ❌ None | ✅ Basic patterns | ✅ Basic | ✅ **Knowledge Graph** |
-| Repository-Specific | ❌ | ❌ | ❌ | ✅ |
-| Synonym Resolution | ❌ | ❌ | ❌ | ✅ |
-| Fuzzy Command Matching | ❌ | ❌ | ❌ | ✅ |
-| Learning System | ❌ | ❌ | ❌ | ✅ |
-| Context-Aware | ❌ | Partial | ❌ | ✅ |
-| Validation Speed | N/A | ~100µs | ~100µs | **~10µs** |
-
-#### 8.2.9 Security Audit Trail
-
-```rust
-pub struct SecurityAuditLog {
-    log_file: PathBuf,
-    events: Vec<SecurityEvent>,
-}
-
-pub struct SecurityEvent {
-    timestamp: DateTime<Utc>,
-    command: String,
-    matched_as: String,  // What the command matched in graph
-    permission: CommandPermission,
-    executed: bool,
-    user_decision: Option<bool>,
-    similarity_score: f64,
-}
-
-impl SecurityAuditLog {
-    pub async fn log_event(&mut self, event: SecurityEvent) -> Result<()> {
-        self.events.push(event.clone());
-
-        // Write to file for security review
-        let entry = format!(
-            "[{}] {} | Matched: {} | Permission: {:?} | Executed: {} | Similarity: {:.2}\n",
-            event.timestamp,
-            event.command,
-            event.matched_as,
-            event.permission,
-            event.executed,
-            event.similarity_score
-        );
-
-        fs::append(&self.log_file, entry).await?;
-        Ok(())
-    }
-
-    pub fn generate_security_report(&self) -> SecurityReport {
-        SecurityReport {
-            total_commands: self.events.len(),
-            allowed_auto: self.events.iter().filter(|e| matches!(e.permission, CommandPermission::Allow)).count(),
-            blocked: self.events.iter().filter(|e| matches!(e.permission, CommandPermission::Block)).count(),
-            asked: self.events.iter().filter(|e| matches!(e.permission, CommandPermission::Ask(_))).count(),
-            learned_commands: self.count_learned_patterns(),
-        }
-    }
-}
-```
-
-**Key Advantages of This Security Model**:
-
-1. **Minimal Interruptions**: Known safe commands run automatically
-2. **Repository-Specific**: Each project has its own security profile
-3. **Intelligent Matching**: Handles command variations via fuzzy match + synonyms
-4. **Learning System**: Reduces prompts over time by learning from user decisions
-5. **Lightning Fast**: Aho-Corasick automata provide nanosecond exact matching
-6. **Context-Aware**: Permissions can depend on branch, files, time, etc.
-7. **Audit Trail**: Complete security log for compliance/review
-
-This security model makes Terraphim the **safest code assistant** while being the **least intrusive**.
-
----
-
-## 9. Testing & Quality Assurance
-
-### 9.1 Testing Requirements
-
-**Mandatory Rules**:
-1. ❌ **No mocks in tests** (from Aider and OpenCode)
-2. ✅ **Integration tests over unit tests** for file operations
-3. ✅ **Benchmark-driven development** (from Aider)
-4. ✅ **Coverage tracking** with minimum thresholds
-
-```typescript
-class TestRunner {
-  async runTests(files: string[]): Promise<TestResult> {
-    // 1. Run affected tests
-    const tests = await this.findAffectedTests(files);
-
-    console.log(`Running ${tests.length} affected tests...`);
-    const result = await this.execute(tests);
-
-    // 2. Check coverage
-    if (this.config.coverageEnabled) {
-      const coverage = await this.calculateCoverage(files);
-
-      if (coverage < this.config.minCoverage) {
-        throw new InsufficientCoverageError(
-          `Coverage ${coverage}% is below minimum ${this.config.minCoverage}%`
-        );
-      }
-    }
-
-    return result;
-  }
-
-  async runBenchmarks(): Promise<BenchmarkResult> {
-    // Run performance benchmarks
-    const benchmarks = await this.findBenchmarks();
-
-    const results = [];
-    for (const benchmark of benchmarks) {
-      console.log(`Running benchmark: ${benchmark.name}`);
-      const result = await this.executeBenchmark(benchmark);
-      results.push(result);
-
-      // Check regression
-      const baseline = await this.getBaseline(benchmark.name);
-      if (result.duration > baseline * 1.1) { // 10% regression threshold
-        console.warn(`⚠️ Performance regression detected: ${benchmark.name}`);
-      }
-    }
-
-    return { benchmarks: results };
-  }
-}
-```
-
-### 9.2 Benchmark-Driven Development (Aider Approach)
-
-```python
-class ExercismBenchmark:
-    """Test against Exercism programming problems"""
-
-    def run_benchmark(self, model: str) -> BenchmarkResult:
-        problems = self.load_exercism_problems()
-
-        results = {
-            'passed': 0,
-            'failed': 0,
-            'errors': 0,
-            'times': []
-        }
-
-        for problem in problems:
-            start = time.time()
-
-            try:
-                # Have AI solve the problem
-                solution = self.ai_solve(problem, model)
-
-                # Run test suite
-                test_result = self.run_problem_tests(problem, solution)
-
-                if test_result.passed:
-                    results['passed'] += 1
-                else:
-                    results['failed'] += 1
-
-            except Exception as e:
-                results['errors'] += 1
-                print(f"Error on {problem.name}: {e}")
-
-            duration = time.time() - start
-            results['times'].append(duration)
-
-        return results
-```
-
----
-
-## 10. 
Feature Comparison & Priorities - -### 10.1 Complete Feature Matrix - -| Feature | Claude Code | Aider | OpenCode | Required | Priority | -|---------|-------------|-------|----------|----------|----------| -| **Editing** | -| Tool-based edit | ✅ | ❌ | ✅ | ✅ | P0 | -| Text-based SEARCH/REPLACE | ❌ | ✅ | ❌ | ✅ | P0 | -| Unified diff/patch | ✅ | ✅ | ✅ | ✅ | P0 | -| Fuzzy matching | ❌ | ✅ (0.8) | ✅ (multiple) | ✅ | P0 | -| Levenshtein distance | ❌ | ✅ | ✅ | ✅ | P0 | -| Block anchor matching | ❌ | ❌ | ✅ | ✅ | P0 | -| Whitespace-flexible | ❌ | ✅ | ✅ | ✅ | P0 | -| Dotdotdot handling | ❌ | ✅ | ❌ | ✅ | P1 | -| Context-aware matching | ❌ | ❌ | ✅ | ✅ | P1 | -| Whole file rewrite | ✅ | ✅ | ✅ | ✅ | P2 | -| **Validation** | -| Pre-tool hooks | ✅ | ❌ | ❌ | ✅ | P0 | -| Post-tool hooks | ✅ | ❌ | ❌ | ✅ | P0 | -| Pre-LLM validation | ❌ | ❌ | ❌ | ✅ | P0 | -| Post-LLM validation | ❌ | ❌ | ❌ | ✅ | P0 | -| LSP integration | ✅ (via MCP) | ❌ | ✅ (built-in) | ✅ | P0 | -| Auto-linting | ✅ (via hooks) | ✅ | ❌ | ✅ | P0 | -| Test execution | ✅ (via hooks) | ✅ | ❌ | ✅ | P1 | -| Confidence scoring | ✅ (≥80) | ❌ | ❌ | ✅ | P1 | -| **Context** | -| RepoMap (tree-sitter) | ❌ | ✅ | ❌ | ✅ | P0 | -| Dependency analysis | ❌ | ✅ (networkx) | ❌ | ✅ | P1 | -| Token management | ✅ | ✅ | ✅ | ✅ | P0 | -| Cache system | ✅ | ✅ (disk) | ✅ (memory) | ✅ | P1 | -| 100+ languages | ✅ (via MCP) | ✅ | Limited | ✅ | P1 | -| **Architecture** | -| Plugin system | ✅ | Limited | ✅ | ✅ | P0 | -| Agent system | ✅ | Single | ✅ | ✅ | P0 | -| Parallel execution | ✅ | ❌ | ❌ | ✅ | P1 | -| Event hooks | ✅ (9 types) | ❌ | Limited | ✅ | P0 | -| Client/server | ❌ | ❌ | ✅ | ✅ | P1 | -| Permission system | ✅ | .aiderignore | ✅ | ✅ | P0 | -| **Recovery** | -| Git auto-commit | ✅ | ✅ | ❌ | ✅ | P0 | -| Undo command | ❌ | ✅ | ❌ | ✅ | P1 | -| Snapshot system | ❌ | ❌ | ✅ | ✅ | P1 | -| Rollback on error | ✅ | ✅ | ✅ | ✅ | P0 | -| **User Experience** | -| Plan mode | ✅ | ❌ | ✅ | ✅ | P1 | -| Extended thinking | ✅ | ❌ | ❌ | ✅ | P2 | -| Multi-phase 
workflows | ✅ | ❌ | ❌ | ✅ | P2 | -| CLI | ✅ | ✅ | ✅ | ✅ | P0 | -| TUI | ❌ | ❌ | ✅ | Optional | P2 | -| Web UI | ❌ | ❌ | Possible | Optional | P3 | -| **Integration** | -| GitHub (gh CLI) | ✅ | ❌ | ❌ | ✅ | P1 | -| MCP support | ✅ | ❌ | ❌ | ✅ | P1 | -| Multi-provider LLM | ✅ | ✅ (200+) | ✅ | ✅ | P0 | -| Local models | ✅ | ✅ | ✅ | ✅ | P1 | - -**Priority Levels**: -- **P0**: Critical - Must have for MVP -- **P1**: Important - Include in v1.0 -- **P2**: Nice to have - Include in v1.1+ -- **P3**: Optional - Future consideration - ---- - -## 11. Implementation Roadmap - -### Phase 1: Core Foundation (Weeks 1-2) -**Goal**: Basic file editing with validation - -- [ ] Project setup and architecture -- [ ] Tool-based editor (Strategy 1) -- [ ] Text-based SEARCH/REPLACE parser (Strategy 2.1-2.3) -- [ ] Pre-tool validation hooks -- [ ] Post-tool validation hooks -- [ ] Permission system (basic) -- [ ] Git auto-commit -- [ ] CLI interface - -**Deliverable**: Can apply edits using tools OR text-based fallback with basic validation - -### Phase 2: Advanced Editing (Weeks 3-4) -**Goal**: Robust multi-strategy editing - -- [ ] Levenshtein fuzzy matching (Strategy 2.4) -- [ ] Context-aware matching (Strategy 2.5) -- [ ] Dotdotdot handling (Strategy 2.6) -- [ ] Unified diff/patch support (Strategy 3) -- [ ] Whole file rewrite (Strategy 4) -- [ ] Edit orchestrator with fallback chain -- [ ] Diff generation for all strategies - -**Deliverable**: Highly reliable edit application with 9+ fallback strategies - -### Phase 3: Validation Pipeline (Weeks 5-6) -**Goal**: 4-layer validation system - -- [ ] Pre-LLM validation layer -- [ ] Post-LLM validation layer -- [ ] LSP manager (TypeScript, Python, Rust, Go) -- [ ] Auto-linter integration -- [ ] Test runner integration -- [ ] Confidence scoring system -- [ ] Error recovery with rollback - -**Deliverable**: Complete validation pipeline catching errors at every stage - -### Phase 4: Context Management (Weeks 7-8) -**Goal**: Intelligent 
codebase understanding - -- [ ] Tree-sitter integration -- [ ] RepoMap implementation -- [ ] Language query definitions (20+ languages) -- [ ] Dependency graph builder (networkx) -- [ ] File ranking algorithm (PageRank-style) -- [ ] Token budget management -- [ ] Disk cache system - -**Deliverable**: Automatic discovery of relevant code across codebase - -### Phase 5: Agent System (Weeks 9-10) -**Goal**: Multi-agent orchestration - -- [ ] Agent base class -- [ ] Specialized agents (developer, reviewer, debugger, etc.) -- [ ] Agent orchestrator -- [ ] Parallel execution engine -- [ ] Agent isolation (context, permissions) -- [ ] Inter-agent communication - -**Deliverable**: Multiple specialized agents working in parallel - -### Phase 6: Plugin Architecture (Weeks 11-12) -**Goal**: Extensibility and customization - -- [ ] Plugin loader -- [ ] Hook system (9+ event types) -- [ ] Command registration -- [ ] Custom tool registration -- [ ] Plugin marketplace (design) -- [ ] Configuration system -- [ ] Plugin API documentation - -**Deliverable**: Fully extensible system via plugins - -### Phase 7: Advanced Features (Weeks 13-14) -**Goal**: Polish and advanced capabilities - -- [ ] Plan mode -- [ ] Multi-phase workflows -- [ ] Snapshot system -- [ ] Extended thinking mode -- [ ] GitHub integration (gh CLI) -- [ ] MCP server/client -- [ ] Client/server architecture - -**Deliverable**: Feature-complete system matching/exceeding existing tools - -### Phase 8: Testing & Quality (Weeks 15-16) -**Goal**: Production-ready quality - -- [ ] Integration test suite -- [ ] Benchmark suite (Exercism-style) -- [ ] Coverage tracking -- [ ] Performance profiling -- [ ] Security audit -- [ ] Documentation -- [ ] User guides - -**Deliverable**: Production-ready v1.0 release - ---- - -## 12. 
Technical Specifications - -### 12.1 Tech Stack - -**Language**: TypeScript + Rust (for performance-critical parts) - -**Justification**: -- TypeScript: Rapid development, rich ecosystem, strong typing -- Rust: Performance-critical components (tree-sitter parsing, fuzzy matching) - -**Core Libraries**: -```json -{ - "dependencies": { - "tree-sitter": "^0.20.0", - "tree-sitter-cli": "^0.20.0", - "levenshtein-edit-distance": "^3.0.0", - "diff": "^5.1.0", - "diff-match-patch": "^1.0.5", - "networkx": "via WASM or JS port", - "anthropic-sdk": "^0.9.0", - "openai": "^4.20.0", - "hono": "^3.11.0", - "ws": "^8.14.0", - "commander": "^11.1.0", - "chalk": "^5.3.0", - "ora": "^7.0.1", - "simple-git": "^3.20.0" - } -} -``` - -### 12.2 File Structure - -``` -code-assistant/ -├── packages/ -│ ├── core/ -│ │ ├── src/ -│ │ │ ├── edit/ -│ │ │ │ ├── strategies/ -│ │ │ │ │ ├── tool-based.ts -│ │ │ │ │ ├── search-replace.ts -│ │ │ │ │ ├── patch.ts -│ │ │ │ │ └── whole-file.ts -│ │ │ │ ├── orchestrator.ts -│ │ │ │ └── index.ts -│ │ │ ├── validation/ -│ │ │ │ ├── pre-llm.ts -│ │ │ │ ├── post-llm.ts -│ │ │ │ ├── pre-tool.ts -│ │ │ │ ├── post-tool.ts -│ │ │ │ └── pipeline.ts -│ │ │ ├── context/ -│ │ │ │ ├── repo-map.ts -│ │ │ │ ├── tree-sitter.ts -│ │ │ │ ├── dependency-graph.ts -│ │ │ │ └── token-manager.ts -│ │ │ ├── agent/ -│ │ │ │ ├── base.ts -│ │ │ │ ├── developer.ts -│ │ │ │ ├── reviewer.ts -│ │ │ │ ├── debugger.ts -│ │ │ │ └── orchestrator.ts -│ │ │ ├── lsp/ -│ │ │ │ ├── manager.ts -│ │ │ │ ├── server.ts -│ │ │ │ └── diagnostics.ts -│ │ │ ├── recovery/ -│ │ │ │ ├── git.ts -│ │ │ │ ├── snapshot.ts -│ │ │ │ └── manager.ts -│ │ │ ├── permission/ -│ │ │ │ ├── manager.ts -│ │ │ │ └── config.ts -│ │ │ └── plugin/ -│ │ │ ├── loader.ts -│ │ │ ├── hook.ts -│ │ │ └── registry.ts -│ │ └── package.json -│ ├── server/ -│ │ ├── src/ -│ │ │ ├── api/ -│ │ │ ├── session/ -│ │ │ └── index.ts -│ │ └── package.json -│ ├── cli/ -│ │ ├── src/ -│ │ │ ├── commands/ -│ │ │ ├── ui/ -│ │ │ └── index.ts -│ │ 
└── package.json
-│   └── fuzzy-matcher/ (Rust via WASM)
-│       ├── src/
-│       │   ├── lib.rs
-│       │   ├── levenshtein.rs
-│       │   └── block-anchor.rs
-│       └── Cargo.toml
-├── plugins/
-│   ├── example-plugin/
-│   └── ...
-├── benchmarks/
-│   ├── exercism/
-│   └── performance/
-├── tests/
-│   ├── integration/
-│   └── e2e/
-└── docs/
-    ├── api/
-    ├── guides/
-    └── architecture/
-```
-
-### 12.3 Configuration Schema
-
-```typescript
-interface CodeAssistantConfig {
-  // LLM Providers
-  llm: {
-    provider: 'anthropic' | 'openai' | 'google' | 'local';
-    model: string;
-    apiKey?: string;
-    baseUrl?: string;
-    maxTokens?: number;
-  };
-
-  // Validation
-  validation: {
-    preLLM: boolean;
-    postLLM: boolean;
-    preTool: boolean;
-    postTool: boolean;
-    autoLint: boolean;
-    autoTest: boolean;
-    confidenceThreshold: number; // 0-1
-  };
-
-  // Editing
-  editing: {
-    strategies: string[];    // Order to try strategies
-    fuzzyThreshold: number;  // 0-1
-    contextLines: number;    // Lines of context for matching
-  };
-
-  // Context Management
-  context: {
-    repoMapEnabled: boolean;
-    maxTokens: number;
-    cacheDir: string;
-    languages: string[];
-  };
-
-  // LSP
-  lsp: {
-    [language: string]: {
-      command: string;
-      args: string[];
-      rootPatterns: string[];
-    };
-  };
-
-  // Permissions
-  permissions: {
-    edit: 'allow' | 'deny' | 'ask';
-    bash: {
-      [pattern: string]: 'allow' | 'deny' | 'ask';
-    };
-    webfetch: 'allow' | 'deny' | 'ask';
-    git: {
-      push: 'allow' | 'deny' | 'ask';
-      force: 'allow' | 'deny' | 'ask';
-    };
-  };
-
-  // Recovery
-  recovery: {
-    autoCommit: boolean;
-    snapshotEnabled: boolean;
-    snapshotDir: string;
-  };
-
-  // Agents
-  agents: {
-    [name: string]: {
-      enabled: boolean;
-      permissions: Partial<CodeAssistantConfig['permissions']>;
-      prompt?: string;
-    };
-  };
-
-  // Plugins
-  plugins: string[];
-
-  // Testing
-  testing: {
-    minCoverage: number; // 0-100
-    benchmarkEnabled: boolean;
-  };
-}
-```
-
----
-
-## 13. 
Success Criteria - -The coding assistant will be considered superior when it achieves: - -### 13.1 Reliability -- [ ] **95%+ edit success rate** on first attempt across diverse codebases -- [ ] **Zero data loss** - all changes recoverable via git or snapshots -- [ ] **100% validation coverage** - no unchecked tool execution - -### 13.2 Performance -- [ ] **<2s latency** for simple edits (tool-based) -- [ ] **<5s latency** for fuzzy-matched edits -- [ ] **<10s latency** for RepoMap generation (cached) -- [ ] **Handle 1000+ file repositories** efficiently - -### 13.3 Quality -- [ ] **≥90% test coverage** for core modules -- [ ] **Zero critical security vulnerabilities** -- [ ] **LSP errors caught before commit** (when LSP available) -- [ ] **Confidence-filtered feedback** reduces noise by 50%+ - -### 13.4 Usability -- [ ] **No manual file path specification** - auto-discover via RepoMap -- [ ] **One-command feature implementation** using multi-phase workflows -- [ ] **Undo in <1s** using git or snapshots -- [ ] **Clear error messages** with actionable suggestions - -### 13.5 Extensibility -- [ ] **10+ built-in agents** for common tasks -- [ ] **Plugin system** enables community extensions -- [ ] **Hook system** allows custom validation/automation -- [ ] **MCP compatibility** for tool integration - ---- - -## 14. Conclusion - -This requirements document specifies a coding assistant that combines: - -1. **Aider's Reliability**: Text-based editing with multiple fallback strategies, works without tool support -2. **OpenCode's Validation**: Built-in LSP integration, 9+ edit strategies, immediate feedback -3. 
**Claude Code's Intelligence**: Multi-agent orchestration, confidence scoring, event-driven hooks - -**Key Innovations**: -- **4-layer validation** (pre-LLM, post-LLM, pre-tool, post-tool) -- **9+ edit strategies** with automatic fallback -- **RepoMap context management** using tree-sitter -- **Built-in LSP integration** for real-time diagnostics -- **Multi-agent parallel execution** for complex tasks -- **Git + snapshot dual recovery** system - -**The result**: A coding assistant that is more reliable than Aider, more intelligent than Claude Code, and more validating than OpenCode, while remaining fully extensible through plugins and hooks. - ---- - -**Next Steps**: -1. Review and approve this requirements document -2. Set up development environment -3. Begin Phase 1 implementation -4. Establish CI/CD pipeline for continuous testing -5. Create plugin API and documentation -6. Build benchmark suite for measuring progress - -**Estimated Timeline**: 16 weeks to v1.0 production release -**Team Size**: 2-4 developers recommended -**Language**: TypeScript + Rust (WASM for performance-critical parts) diff --git a/.docs/constraints-analysis.md b/.docs/constraints-analysis.md deleted file mode 100644 index 0dbd9244a..000000000 --- a/.docs/constraints-analysis.md +++ /dev/null @@ -1,257 +0,0 @@ -# Terraphim AI Release Constraints Analysis - -## Business Constraints - -### Release Frequency and Cadence -- **Continuous Delivery Pressure**: Community expects regular updates with bug fixes -- **Feature Release Timeline**: New features need predictable release windows -- **Patch Release Speed**: Security fixes must be deployed rapidly -- **Backward Compatibility**: Must maintain API stability between major versions -- **Version Bumping Strategy**: Semantic versioning with clear breaking change policies - -### Community and User Expectations -- **Zero-Downtime Updates**: Production deployments should not require service interruption -- **Rollback Capability**: Users need ability 
to revert problematic updates
-- **Multi-Version Support**: Ability to run multiple versions concurrently for testing
-- **Documentation Sync**: Release notes must match actual changes
-- **Transparent Roadmap**: Clear communication about future changes and deprecations
-
-### License and Compliance Requirements
-- **Open Source Compliance**: All licenses must be properly declared
-- **Third-Party Dependencies**: SPDX compliance and vulnerability disclosure
-- **Export Controls**: No restricted cryptographic components without compliance
-- **Data Privacy**: GDPR and privacy law compliance for user data handling
-- **Attribution Requirements**: Proper credit for open source dependencies
-
-## Technical Constraints
-
-### Multi-Platform Build Complexity
-
-#### Architecture Support Matrix
-| Architecture | Build Tool | Cross-Compilation | Testing Capability |
-|--------------|------------|-------------------|--------------------|
-| x86_64-linux | Native | Not needed | Full CI/CD |
-| aarch64-linux | Cross | QEMU required | Limited testing |
-| armv7-linux | Cross | QEMU required | Limited testing |
-| x86_64-macos | Native (self-hosted) | Not needed | Partial testing |
-| aarch64-macos | Native (self-hosted) | Not needed | Partial testing |
-| x86_64-windows | Native | Not needed | Full CI/CD |
-
-#### Toolchain Dependencies
-- **Rust Version**: Consistent toolchain across all platforms
-- **Cross-Compilation Tools**: QEMU, binutils for non-native builds
-- **System Libraries**: Platform-specific dependency management
-- **Certificate Signing**: Platform-specific code signing certificates
-- **Package Building**: cargo-deb, cargo-rpm, Tauri bundler tools
-
-### Dependency Management Constraints
-
-#### System-Level Dependencies
-```toml
-# Example dependency constraints
-[dependencies]
-# Core dependencies with version ranges
-tokio = { version = "1.0", features = ["full"] }
-serde = { version = "1.0", features = ["derive"] }
-clap = { version = "4.0", features = 
["derive"] } - -# Platform-specific dependencies -[target.'cfg(unix)'.dependencies] -nix = "0.27" - -[target.'cfg(windows)'.dependencies] -winapi = { version = "0.3", features = ["winuser"] } - -[target.'cfg(target_os = "macos")'.dependencies] -core-foundation = "0.9" -``` - -#### Package Manager Conflicts -- **APT (Debian/Ubuntu)**: Conflicts with existing packages, dependency versions -- **RPM (RHEL/CentOS/Fedora)**: Different naming conventions, requires explicit dependencies -- **Pacman (Arch)**: AUR package maintenance, user expectations for PKGBUILD standards -- **Homebrew**: Formula maintenance, bottle building for pre-compiled binaries - -### Build Infrastructure Constraints - -#### GitHub Actions Limitations -- **Runner Availability**: Limited self-hosted runners for macOS builds -- **Build Time Limits**: 6-hour job timeout for complex builds -- **Storage Limits**: Artifact storage and retention policies -- **Concurrency Limits**: Parallel job execution restrictions -- **Network Bandwidth**: Large binary upload/download constraints - -#### Resource Requirements -- **Memory Usage**: Cross-compilation can be memory-intensive -- **CPU Time**: Multi-architecture builds require significant compute -- **Storage Space**: Build cache management across platforms -- **Network I/O**: Dependency downloads and artifact uploads - -## User Experience Constraints - -### Installation Simplicity - -#### One-Command Installation Goals -```bash -# Ideal user experience -curl -fsSL https://install.terraphim.ai | sh - -# Should handle automatically: -# - Platform detection -# - Architecture detection -# - Package manager selection -# - Dependency resolution -# - Service configuration -# - User setup -``` - -#### Package Manager Integration -- **Zero Configuration**: Default settings work out of the box -- **Service Management**: Automatic systemd/launchd service setup -- **User Permissions**: Appropriate file permissions and user groups -- **Path Integration**: Proper PATH and 
environment setup -- **Documentation**: Manual pages and help system integration - -### Update Reliability - -#### Auto-Updater Requirements -- **Atomic Updates**: Never leave system in broken state -- **Rollback Support**: Ability to revert to previous version -- **Configuration Preservation**: User settings survive updates -- **Service Continuity**: Minimal downtime during updates -- **Progress Indication**: Clear feedback during update process - -#### Update Failure Scenarios -- **Network Interruption**: Handle partial downloads gracefully -- **Disk Space**: Verify adequate space before update -- **Permission Issues**: Handle permission denied scenarios -- **Service Conflicts**: Manage running services during update -- **Dependency Conflicts**: Resolve version incompatibilities - -### Performance Expectations - -#### Binary Size Constraints -| Component | Target Size | Current Size | Optimization Opportunities | -|----------|-------------|--------------|---------------------------| -| Server | < 15MB | 12.8MB | Strip symbols, optimize build | -| TUI | < 8MB | 7.2MB | Reduce dependencies | -| Desktop | < 50MB | 45.3MB | Asset optimization | -| Docker | < 200MB | 180MB | Multi-stage builds | - -#### Startup Performance -- **Server Cold Start**: < 3 seconds to ready state -- **TUI Response**: < 500ms initial interface -- **Desktop Launch**: < 2 seconds to usable state -- **Container Startup**: < 5 seconds to service ready -- **Memory Usage**: Server < 100MB baseline, Desktop < 200MB - -## Security Constraints - -### Code Signing and Verification - -#### Platform-Specific Requirements -- **macOS**: Apple Developer certificate, notarization required -- **Windows**: Authenticode certificate, SmartScreen compatibility -- **Linux**: GPG signatures for packages, repository trust -- **Docker**: Content trust, image signing support - -#### Certificate Management -- **Certificate Renewal**: Automated renewal before expiration -- **Key Rotation**: Secure private key 
management practices -- **Trust Chain**: Maintain valid certificate chains -- **Revocation Handling**: Respond to certificate compromises - -### Security Validation Requirements - -#### Vulnerability Scanning -- **Dependency Scanning**: Automated scanning of all dependencies -- **Container Scanning**: Docker image vulnerability assessment -- **Static Analysis**: Code security analysis tools integration -- **Dynamic Analysis**: Runtime security testing - -#### Integrity Verification -- **Checksum Validation**: SHA256 for all release artifacts -- **GPG Signatures**: Cryptographic verification of releases -- **Blockchain Integration**: Immutable release records (future) -- **Reproducible Builds**: Verifiable build process - -## Performance Constraints - -### Build Performance - -#### Parallelization Limits -- **Matrix Strategy**: Optimal parallel job distribution -- **Dependency Caching**: Effective build cache utilization -- **Artifact Distribution**: Efficient artifact sharing between jobs -- **Resource Allocation**: Balanced resource usage across jobs - -#### Build Time Targets -| Component | Current Time | Target Time | Optimization Strategy | -|-----------|--------------|-------------|----------------------| -| Server Binary | 8 min | 5 min | Better caching | -| Desktop App | 15 min | 10 min | Parallel builds | -| Docker Image | 12 min | 8 min | Layer optimization | -| Full Release | 45 min | 30 min | Pipeline optimization | - -### Runtime Performance - -#### Resource Utilization -- **CPU Usage**: Efficient multi-core utilization -- **Memory Management**: Minimal memory footprint -- **I/O Performance**: Optimized file operations -- **Network Efficiency**: Minimal bandwidth usage - -#### Scalability Constraints -- **Concurrent Users**: Support for multiple simultaneous connections -- **Data Volume**: Handle growing index sizes efficiently -- **Search Performance**: Sub-second response times -- **Update Frequency**: Efficient incremental updates - -## Compliance 
and Legal Constraints - -### Open Source Compliance - -#### License Requirements -- **MIT/Apache 2.0**: Dual license compatibility -- **Third-Party Licenses**: SPDX compliance for all dependencies -- **Attribution**: Proper license notices and acknowledgments -- **Source Availability**: Corresponding source code availability - -#### Export Controls -- **Cryptography**: Export control compliance for encryption features -- **Country Restrictions**: Geographical distribution limitations -- **Entity List Screening**: Restricted party screening processes - -### Privacy and Data Protection - -#### Data Handling Requirements -- **User Data**: Minimal data collection and processing -- **Local Storage**: No unnecessary data transmission -- **Data Retention**: Appropriate data lifecycle management -- **User Consent**: Clear privacy policies and consent mechanisms - -## Operational Constraints - -### Monitoring and Observability - -#### Release Monitoring -- **Download Metrics**: Track installation and update success rates -- **Error Reporting**: Automated error collection and analysis -- **Performance Metrics**: Real-time performance monitoring -- **User Feedback**: In-app feedback collection mechanisms - -#### Support Infrastructure -- **Documentation**: Comprehensive installation and troubleshooting guides -- **Community Support**: Issue tracking and response processes -- **Knowledge Base**: Self-service support resources -- **Escalation Process**: Clear support escalation procedures - -### Maintenance Constraints - -#### Long-Term Support -- **Version Support**: Multi-version support strategy -- **Security Updates**: Backport security fixes to older versions -- **Deprecation Policy**: Clear component deprecation timelines -- **Migration Paths**: Smooth upgrade paths between versions - -This constraints analysis provides the foundation for understanding the boundaries and requirements that the release validation system must operate within. 
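
As a concrete illustration of the checksum validation described above, a release consumer's verification step might look like the following sketch (the artifact name and checksum file are hypothetical placeholders, not actual release file names):

```shell
#!/usr/bin/env sh
# Sketch of the SHA256 integrity check a release consumer would run.
# "terraphim-server" is a placeholder artifact; real releases publish
# their own checksum files alongside each binary.
set -eu

artifact="terraphim-server"

# Stand in for the downloaded binary so the sketch runs anywhere.
printf 'release binary placeholder\n' > "$artifact"

# The release pipeline would publish this file; here we generate it.
sha256sum "$artifact" > "$artifact.sha256"

# The verification step: exits nonzero if the checksum does not match.
sha256sum --check --quiet "$artifact.sha256"
echo "checksum OK: $artifact"
```

A GPG signature check (`gpg --verify`) against the release signing key would layer on top of this in the same way.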
Each constraint represents a potential failure point that must be monitored and validated during the release process. \ No newline at end of file diff --git a/.docs/design-ai-assistant-haystack.md b/.docs/design-ai-assistant-haystack.md deleted file mode 100644 index 94e8650fa..000000000 --- a/.docs/design-ai-assistant-haystack.md +++ /dev/null @@ -1,255 +0,0 @@ -# Design & Implementation Plan: AI Assistant Session Haystack - -## 1. Summary of Target Behavior - -A **unified haystack** for searching across AI coding assistant session logs. Uses `claude-log-analyzer`'s connector system to support: - -| Connector | Source ID | Format | Default Path | -|-----------|-----------|--------|--------------| -| Claude Code | `claude-code` | JSONL | `~/.claude/projects/` | -| OpenCode | `opencode` | JSONL | `~/.opencode/` | -| Cursor IDE | `cursor` | SQLite | `~/.config/Cursor/User/` | -| Aider | `aider` | Markdown | `~/projects/.aider.chat.history.md` | -| Codex | `codex` | JSONL | Codex CLI data | - -Users configure haystacks with `ServiceType::AiAssistant` and specify the connector via `extra_parameters["connector"]`. - -### Example Configurations - -```json -{ - "haystacks": [ - { - "name": "Claude Sessions", - "service": "AiAssistant", - "location": "~/.claude/projects/", - "extra_parameters": { - "connector": "claude-code" - } - }, - { - "name": "OpenCode Sessions", - "service": "AiAssistant", - "location": "~/.opencode/", - "extra_parameters": { - "connector": "opencode" - } - }, - { - "name": "Cursor Chats", - "service": "AiAssistant", - "location": "~/.config/Cursor/User/", - "extra_parameters": { - "connector": "cursor" - } - } - ] -} -``` - -## 2. 
Key Invariants and Acceptance Criteria - -### Invariants -- **I1**: Session files are read-only (never modified by haystack) -- **I2**: All Documents have unique IDs (`{connector}:{session_id}:{message_idx}`) -- **I3**: Each connector uses its own parsing logic via `SessionConnector` trait -- **I4**: All connectors produce `NormalizedSession` → `Document` mapping - -### Acceptance Criteria -- **AC1**: `ServiceType::AiAssistant` compiles and is recognized -- **AC2**: Config with `connector: "claude-code"` indexes Claude sessions -- **AC3**: Config with `connector: "opencode"` indexes OpenCode sessions -- **AC4**: Config with `connector: "cursor"` indexes Cursor chats -- **AC5**: Config with `connector: "aider"` indexes Aider history -- **AC6**: Search term matches message content, session title, project path -- **AC7**: Invalid connector name returns helpful error - -## 3. High-Level Design and Boundaries - -``` -┌────────────────────────────────────────────────────────────────┐ -│ terraphim_middleware │ -├────────────────────────────────────────────────────────────────┤ -│ indexer/mod.rs │ -│ └─ search_haystacks() │ -│ └─ match ServiceType::AiAssistant │ -│ └─ AiAssistantHaystackIndexer.index() │ -├────────────────────────────────────────────────────────────────┤ -│ haystack/ │ -│ ├─ mod.rs (add ai_assistant module) │ -│ └─ ai_assistant.rs (NEW) │ -│ ├─ AiAssistantHaystackIndexer │ -│ └─ Uses ConnectorRegistry to get connector │ -└────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌────────────────────────────────────────────────────────────────┐ -│ claude-log-analyzer │ -├────────────────────────────────────────────────────────────────┤ -│ connectors/mod.rs │ -│ ├─ SessionConnector trait │ -│ ├─ ConnectorRegistry (finds connectors) │ -│ ├─ NormalizedSession (unified session format) │ -│ └─ NormalizedMessage (unified message format) │ -│ │ -│ connectors/ │ -│ ├─ ClaudeCodeConnector (claude-code) │ -│ ├─ OpenCodeConnector (opencode) │ -│ ├─ 
CursorConnector (cursor) │ -│ ├─ AiderConnector (aider) │ -│ └─ CodexConnector (codex) │ -└────────────────────────────────────────────────────────────────┘ -``` - -### Key Design Decisions - -1. **Single ServiceType**: `AiAssistant` instead of 5 separate types -2. **Connector Selection**: Via `extra_parameters["connector"]` -3. **Feature Flag**: `connectors` feature in claude-log-analyzer (Cursor needs SQLite) -4. **Document Mapping**: `NormalizedSession` → multiple `Document` (one per message) - -## 4. File/Module-Level Change Plan - -| File/Module | Action | Change | Dependencies | -|-------------|--------|--------|--------------| -| `terraphim_config/src/lib.rs:273` | Modify | Add `AiAssistant` to ServiceType | None | -| `terraphim_middleware/Cargo.toml` | Modify | Add `claude-log-analyzer = { features = ["connectors"] }` | claude-log-analyzer | -| `terraphim_middleware/src/haystack/mod.rs` | Modify | Add `ai_assistant` module + export | ai_assistant.rs | -| `terraphim_middleware/src/haystack/ai_assistant.rs` | Create | `AiAssistantHaystackIndexer` | claude-log-analyzer connectors | -| `terraphim_middleware/src/indexer/mod.rs` | Modify | Add match arm for `ServiceType::AiAssistant` | ai_assistant module | - -## 5. 
Step-by-Step Implementation Sequence - -### Step 1: Add ServiceType variant -**File**: `crates/terraphim_config/src/lib.rs` -**Change**: Add after line 273: -```rust -/// Use AI coding assistant session logs (Claude Code, OpenCode, Cursor, Aider) -AiAssistant, -``` -**Deployable**: Yes - -### Step 2: Add dependency with connectors feature -**File**: `crates/terraphim_middleware/Cargo.toml` -**Change**: Add to `[dependencies]`: -```toml -claude-log-analyzer = { path = "../claude-log-analyzer", features = ["connectors"] } -``` -**Deployable**: Yes - -### Step 3: Create ai_assistant haystack module -**File**: `crates/terraphim_middleware/src/haystack/ai_assistant.rs` (NEW) -**Structure**: -```rust -pub struct AiAssistantHaystackIndexer; - -impl IndexMiddleware for AiAssistantHaystackIndexer { - fn index(&self, needle: &str, haystack: &Haystack) -> impl Future<Output = Result<Index>> { - async move { - // 1. Get connector name from extra_parameters["connector"] - // 2. Get connector from ConnectorRegistry - // 3. Import sessions with connector.import() - // 4. Convert NormalizedSession/Message to Documents - // 5. Filter by needle (search term) - // 6. 
Return Index - } - } -} - -fn session_to_documents(session: NormalizedSession, needle: &str) -> Vec<Document> { - // One document per message that matches needle -} -``` -**Deployable**: Yes (not wired up yet) - -### Step 4: Export ai_assistant module -**File**: `crates/terraphim_middleware/src/haystack/mod.rs` -**Change**: Add: -```rust -pub mod ai_assistant; -pub use ai_assistant::AiAssistantHaystackIndexer; -``` -**Deployable**: Yes - -### Step 5: Wire up in search_haystacks() -**File**: `crates/terraphim_middleware/src/indexer/mod.rs` -**Change**: Add match arm: -```rust -ServiceType::AiAssistant => { - let indexer = AiAssistantHaystackIndexer::default(); - indexer.index(needle, haystack).await -} -``` -**Deployable**: Yes (feature complete) - -### Step 6: Add integration tests -**File**: `crates/terraphim_middleware/tests/ai_assistant_haystack_test.rs` (NEW) -**Content**: Tests for each connector type with fixtures -**Deployable**: Yes - -## 6. Testing & Verification Strategy - -| Acceptance Criteria | Test Type | Test Location | -|---------------------|-----------|---------------| -| AC1: ServiceType compiles | Compile | Automatic | -| AC2: claude-code connector | Unit | `ai_assistant.rs::tests` | -| AC3: opencode connector | Unit | `ai_assistant.rs::tests` | -| AC4: cursor connector | Integration | Needs SQLite fixture | -| AC5: aider connector | Unit | Uses markdown fixture | -| AC6: Search matches content | Unit | `ai_assistant.rs::tests` | -| AC7: Invalid connector error | Unit | `ai_assistant.rs::tests` | - -### Test Fixtures -- Create minimal session files in `terraphim_middleware/fixtures/ai_sessions/`: - - `claude-code/session.jsonl` - - `opencode/session.jsonl` - - `aider/.aider.chat.history.md` - -## 7. 
Risk & Complexity Review - -| Risk | Mitigation | Residual Risk | -|------|------------|---------------| -| SQLite dependency for Cursor | Feature-gate Cursor connector | Minimal | -| Large session directories | ConnectorRegistry streams efficiently | Low | -| Multiple message formats | All connectors normalize to NormalizedMessage | None | -| Missing connector name | Return clear error with valid options | None | - -## 8. Document Mapping Strategy - -Each `NormalizedMessage` becomes one `Document`: - -```rust -Document { - id: format!("{}:{}:{}", session.source, session.external_id, msg.idx), - title: format!("[{}] {}", session.source.to_uppercase(), - session.title.unwrap_or("Session".to_string())), - url: session.source_path.to_string_lossy().to_string(), - body: msg.content.clone(), - description: Some(format!( - "{} message from {} session", - msg.role, - session.source - )), - tags: Some(vec![ - session.source.clone(), - msg.role.clone(), - "ai-assistant".to_string(), - ]), - ..Default::default() -} -``` - -## 9. Open Questions / Decisions for Human Review - -1. **Granularity**: One document per message (current plan) or one per session? - - **Recommendation**: Per message for precise search results - -2. **Search scope**: Search message content only, or also session metadata? - - **Recommendation**: Both, with content weighted higher - -3. **Connector auto-detection**: Should we auto-detect if `connector` param is missing? 
- - **Recommendation**: No, require explicit connector for clarity - ---- - -**Do you approve this plan as-is, or would you like to adjust any part?** diff --git a/.docs/design-architecture.md b/.docs/design-architecture.md deleted file mode 100644 index e020304da..000000000 --- a/.docs/design-architecture.md +++ /dev/null @@ -1,536 +0,0 @@ -# Terraphim AI Release Validation System - Architecture Design - -## System Architecture Overview - -### High-Level Component Diagram - -``` -┌─────────────────────────────────────────────────────────────────────────────────┐ -│ Release Validation System │ -├─────────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────────┐ │ -│ │ GitHub │ │ Validation │ │ Reporting & │ │ -│ │ Release API │───▶│ Orchestrator │───▶│ Monitoring │ │ -│ │ (Input) │ │ (Core Engine) │ │ (Output) │ │ -│ └─────────────────┘ └──────────────────┘ └─────────────────────────┘ │ -│ │ │ │ │ -│ │ ┌───────────▼───────────┐ │ │ -│ │ │ Validation Pool │ │ │ -│ │ │ (Parallel Workers) │ │ │ -│ │ └───────────┬───────────┘ │ │ -│ │ │ │ │ -│ │ ┌──────────────────┼──────────────────┐ │ │ -│ │ │ │ │ │ │ -│ ┌──────▼─────┐ ┌─────────▼──────┐ ┌─────────▼─────┐ ┌─▼─────────────┐ │ -│ │ Artifact │ │ Platform │ │ Security │ │ Functional │ │ -│ │ Validator │ │ Validators │ │ Validators │ │ Test Runners │ │ -│ └─────────────┘ └────────────────┘ └────────────────┘ └──────────────┘ │ -│ │ │ │ │ │ -│ ┌──────▼─────┐ ┌─────────▼──────┐ ┌─────────▼─────┐ ┌─▼─────────────┐ │ -│ │ Docker │ │ VM/Container │ │ Security │ │ Integration │ │ -│ │ Registry │ │ Environments │ │ Scanning │ │ Tests │ │ -│ └─────────────┘ └────────────────┘ └────────────────┘ └──────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────────┘ -``` - -### Data Flow Between Components - -``` -[GitHub Release] → [Artifact Download] → [Validation Orchestrator] - ↓ -[Metadata 
Extraction] → [Validation Queue] → [Parallel Validation Workers] - ↓ -[Platform Testing] → [Security Scanning] → [Functional Testing] - ↓ -[Result Aggregation] → [Report Generation] → [Alert System] -``` - -### Integration Points with Existing Systems - -- **GitHub Actions**: Triggers validation workflows via webhook -- **Docker Hub**: Pulls and validates multi-arch container images -- **Package Registries**: Validates npm, PyPI, crates.io artifacts -- **Existing CI/CD**: Integrates with current release-comprehensive.yml -- **Terraphim Infrastructure**: Uses existing bigbox deployment patterns - -### Technology Stack and Tooling Choices - -- **Core Engine**: Rust with tokio async runtime (consistent with project) -- **Container Orchestration**: Docker with Buildx (existing infrastructure) -- **Web Framework**: Axum (existing server framework) -- **Database**: SQLite for validation results (lightweight, portable) -- **Monitoring**: Custom dashboards + existing logging patterns -- **Configuration**: TOML files (existing terraphim_settings pattern) - -## Core Components - -### 1. Validation Orchestrator - -**Purpose**: Central coordinator for all validation activities - -**Key Functions**: -- Process release events from GitHub API -- Schedule and coordinate validation tasks -- Manage parallel execution resources -- Aggregate results and trigger notifications - -**Technology**: Rust async service using tokio and Axum - -**API Endpoints**: -``` -POST /api/validation/start - Start validation for new release -GET /api/validation/{id} - Get validation status -GET /api/validation/{id}/report - Get validation report -``` - -### 2. 
Platform-Specific Validators - -**Purpose**: Validate artifacts on target platforms - -**Components**: -- **Linux Validator**: Ubuntu 20.04/22.04 validation -- **macOS Validator**: Intel and Apple Silicon validation -- **Windows Validator**: x64 architecture validation -- **Container Validator**: Docker image functionality testing - -**Validation Types**: -- Binary extraction and execution -- Dependency resolution testing -- Platform-specific integration testing -- Performance benchmarking - -### 3. Download/Installation Testers - -**Purpose**: Validate artifact integrity and installation processes - -**Functions**: -- Checksum verification (SHA256, GPG signatures) -- Installation script validation -- Package manager integration testing -- Download mirror verification - -**Supported Formats**: -- Native binaries (terraphim_server, terraphim-agent) -- Debian packages (.deb) -- Docker images (multi-arch) -- NPM packages (@terraphim/*) -- PyPI packages (terraphim-automata) -- Tauri installers (.dmg, .msi, .AppImage) - -### 4. Functional Test Runners - -**Purpose**: Execute functional validation of released components - -**Test Categories**: -- **Server Tests**: API endpoints, WebSocket connections -- **Agent Tests**: CLI functionality, TUI interface -- **Desktop Tests**: UI functionality, system integration -- **Integration Tests**: Cross-component workflows - -**Execution Pattern**: -``` -[Container Launch] → [Test Suite Execution] → [Result Collection] → [Cleanup] -``` - -### 5. Security Validators - -**Purpose**: Ensure security compliance and vulnerability scanning - -**Security Checks**: -- Static analysis (cargo audit, npm audit) -- Container image scanning (trivy, docker scout) -- Dependency vulnerability assessment -- Binary security analysis -- Code signing verification - -**Compliance Validation**: -- License compliance checking -- Export control validation -- Security policy adherence - -### 6. 
Reporting and Monitoring - -**Purpose**: Provide comprehensive validation insights and alerts - -**Report Types**: -- **Executive Summary**: High-level release status -- **Technical Report**: Detailed validation results -- **Security Report**: Vulnerability findings and mitigations -- **Performance Report**: Benchmarks and metrics - -**Monitoring Integration**: -- Real-time progress tracking -- Failure alerting (email, Slack, GitHub issues) -- Historical trend analysis -- Dashboard visualization - -## Data Flow Design - -### Input Sources - -``` -GitHub Release Events -├── Release metadata (version, assets, changelog) -├── Artifacts (binaries, packages, images) -├── Source code tags -└── Build artifacts -``` - -### Processing Pipeline Stages - -``` -Stage 1: Ingestion -├── GitHub API webhook processing -├── Artifact download and verification -├── Metadata extraction and normalization -└── Validation task creation - -Stage 2: Queue Management -├── Priority-based task scheduling -├── Resource allocation planning -├── Dependency resolution -└── Parallel execution orchestration - -Stage 3: Validation Execution -├── Platform-specific testing -├── Security scanning -├── Functional validation -└── Performance benchmarking - -Stage 4: Result Processing -├── Result aggregation and correlation -├── Report generation -├── Alert triggering -└── Historical data storage -``` - -### Output Destinations - -``` -Validation Results -├── GitHub Release Comments (status updates) -├── Validation Reports (JSON/HTML format) -├── Dashboard Visualizations -├── Alert Notifications -└── Historical Database Records -``` - -### Error Handling and Recovery Flows - -``` -Error Categories: -├── Transient Errors (retry with backoff) -│ ├── Network timeouts -│ ├── Resource unavailability -│ └── Temporary service failures -├── Validation Failures (continue with partial results) -│ ├── Platform-specific issues -│ ├── Security findings -│ └── Functional test failures -└── System Errors (immediate 
notification) - ├── Infrastructure failures - ├── Configuration errors - └── Critical system malfunctions -``` - -## Integration Architecture - -### GitHub Actions Integration Points - -``` -Existing Workflow Integration: -├── release-comprehensive.yml (build phase) -├── docker-multiarch.yml (container validation) -├── test-matrix.yml (test execution) -└── New validation-workflow.yml (post-release validation) - -Trigger Points: -├── Release creation event -├── Asset upload completion -├── Build pipeline success -└── Manual workflow dispatch -``` - -### Existing Validation Script Enhancement - -**Current Scripts to Integrate**: -- `test-matrix.sh` - Platform testing framework -- `run_test_matrix.sh` - Test orchestration -- `prove_rust_engineer_works.sh` - Functional validation -- Security testing scripts from Phase 1 & 2 - -**Enhancement Strategy**: -1. Wrap existing scripts in standardized interface -2. Add result collection and reporting -3. Integrate with orchestrator scheduling -4. Maintain backward compatibility - -### Docker and Container Orchestration - -**Container Strategy**: -``` -Validation Containers: -├── validator-base (common utilities) -├── validator-linux (Ubuntu environments) -├── validator-macos (macOS environments) -├── validator-windows (Windows environments) -└── validator-security (security scanning tools) -``` - -**Orchestration Patterns**: -- **Sequential**: Single platform validation -- **Parallel**: Multi-platform concurrent testing -- **Staged**: Progressive validation with early failure detection - -### External Service Integrations - -**Package Registries**: -- **Docker Hub**: Multi-arch image validation -- **npm Registry**: Package integrity testing -- **PyPI**: Python package validation -- **crates.io**: Rust crate validation - -**Security Services**: -- **GitHub Advisory Database**: Vulnerability checking -- **OSV Database**: Open source vulnerability data -- **Snyk**: Commercial security scanning (optional) - -## Scalability and 
Performance Design - -### Parallel Execution Strategies - -``` -Validation Parallelization: -├── Platform Parallelism -│ ├── Linux x86_64 validation -│ ├── Linux ARM64 validation -│ ├── macOS Intel validation -│ ├── macOS Apple Silicon validation -│ └── Windows x64 validation -├── Component Parallelism -│ ├── Server validation -│ ├── Agent validation -│ ├── Desktop validation -│ └── Container validation -└── Test Parallelism - ├── Unit test execution - ├── Integration test execution - ├── Security test execution - └── Performance test execution -``` - -### Resource Allocation and Optimization - -**Compute Resources**: -- **GitHub Actions**: Free tier for basic validation -- **Self-hosted runners**: Optimize for specific platforms -- **Cloud resources**: On-demand scaling for peak loads - -**Storage Optimization**: -- **Artifact caching**: Reuse common dependencies -- **Result compression**: Efficient historical data storage -- **Cleanup policies**: Automatic old data removal - -**Network Optimization**: -- **Artifact caching**: Local registry mirrors -- **Parallel downloads**: Optimized artifact retrieval -- **Retry strategies**: Resilient network operations - -### Caching and Reuse Mechanisms - -``` -Cache Hierarchy: -├── L1: Local build cache (GitHub Actions) -├── L2: Artifact cache (Docker layers, dependencies) -├── L3: Result cache (test results, security scans) -└── L4: Historical data (trend analysis) -``` - -**Cache Invalidation**: -- Version-based cache keys -- Dependency change detection -- Manual cache flushing for troubleshooting - -### Bottleneck Identification and Mitigation - -**Common Bottlenecks**: -1. **Artifact Download**: Parallel download optimization -2. **Container Build**: Layer caching, build parallelization -3. **Test Execution**: Smart test selection and parallelization -4. **Security Scanning**: Incremental scanning, caching -5. 
**Report Generation**: Template optimization, async processing - -**Mitigation Strategies**: -- **Resource Pooling**: Shared validation environments -- **Early Exit**: Fail-fast on critical issues -- **Partial Results**: Continue validation despite individual failures -- **Load Balancing**: Distribute work across available resources - -## Security Architecture - -### Secure Artifact Handling - -``` -Artifact Security Pipeline: -├── Source Verification -│ ├── GPG signature validation -│ ├── GitHub release integrity -│ └── Chain of custody tracking -├── Secure Transport -│ ├── HTTPS for all communications -│ ├── Container registry authentication -│ └── API token security -└── Secure Storage - ├── Encrypted artifact storage - ├── Access control and auditing - └── Secure disposal after validation -``` - -### Credential Management - -**Security Best Practices**: -- **GitHub Tokens**: Scoped, time-limited access tokens -- **Registry Credentials**: Encrypted storage with rotation -- **API Keys**: Environment-based injection -- **Secret Management**: Integration with 1Password CLI (existing pattern) - -**Token Scoping**: -``` -GitHub Token Permissions: -├── contents: read (access to releases) -├── issues: write (create validation issues) -├── pull-requests: write (comment on releases) -└── packages: read (access package registries) -``` - -### Isolated Execution Environments - -**Container Isolation**: -- **Docker Containers**: Sandboxed test execution -- **Resource Limits**: CPU, memory, and network restrictions -- **Network Isolation**: Restricted outbound access -- **File System Isolation**: Temporary scratch spaces - -**VM Isolation**: -- **Firecracker Integration**: Existing microVM infrastructure -- **Clean Environments**: Fresh VM instances for each validation -- **Secure Cleanup**: Complete environment sanitization - -### Audit Trail and Compliance - -**Audit Data Collection**: -- **Validation Events**: Timestamped, user-traceable -- **Artifact Provenance**: 
Complete chain of custody -- **Security Findings**: Detailed vulnerability reports -- **Configuration Changes**: System modification tracking - -**Compliance Features**: -- **SOC 2 Alignment**: Security controls documentation -- **GDPR Compliance**: Data handling and privacy -- **Export Control**: License and compliance checking -- **Audit Reporting**: Regular compliance reports - -## Technology Choices - -### Programming Languages and Frameworks - -**Primary Language: Rust** -- **Rationale**: Consistent with existing codebase -- **Benefits**: Performance, safety, async ecosystem -- **Key Crates**: tokio, axum, serde, reqwest, sqlx - -**Supporting Languages**: -- **Shell Scripts**: Platform-specific validation (existing) -- **Python**: Security scanning tools integration -- **JavaScript/TypeScript**: Dashboard and reporting UI - -### Container and Orchestration Platforms - -**Docker with Buildx** -- **Multi-arch Support**: native cross-platform building -- **Layer Caching**: Optimized build times -- **Registry Integration**: Push/pull from multiple registries - -**GitHub Actions** -- **Native Integration**: Existing CI/CD platform -- **Self-hosted Runners**: Platform-specific testing -- **Artifact Storage**: Built-in artifact management - -### Monitoring and Logging Solutions - -**Logging Strategy**: -- **Structured Logging**: JSON format for consistent parsing -- **Log Levels**: Debug, Info, Warn, Error with appropriate filtering -- **Log Aggregation**: Centralized log collection and analysis - -**Monitoring Stack**: -- **Health Checks**: Component health monitoring -- **Metrics Collection**: Performance and usage metrics -- **Alerting**: Multi-channel alert system -- **Dashboards**: Real-time validation status visualization - -### Database and Storage Requirements - -**SQLite Database** -- **Primary Use**: Validation results storage -- **Benefits**: Lightweight, portable, no external dependencies -- **Schema**: Versioned, migrable schema design - -**File 
Storage**: -- **Local Storage**: Temporary artifacts and test data -- **GitHub Storage**: Long-term report archiving -- **Cleanup Policies**: Automated storage management - -## Implementation Strategy - -### Incremental Implementation Phases - -**Phase 1: Core Infrastructure (Weeks 1-2)** -- Validation orchestrator service -- Basic GitHub webhook integration -- Simple validation task scheduling -- Basic reporting framework - -**Phase 2: Platform Validation (Weeks 3-4)** -- Linux validation pipeline -- Container validation integration -- Security scanning foundation -- Enhanced reporting capabilities - -**Phase 3: Multi-Platform Expansion (Weeks 5-6)** -- macOS and Windows validation -- Advanced security scanning -- Performance benchmarking -- Dashboard development - -**Phase 4: Production Integration (Weeks 7-8)** -- Full GitHub Actions integration -- Alert system implementation -- Historical data analysis -- Production deployment and testing - -### Integration with Existing Infrastructure - -**Leveraging Existing Patterns**: -- **1Password CLI**: Secret management integration -- **Caddy + Rsync**: Deployment patterns for dashboard -- **Rust Workspace**: Existing code structure and conventions -- **Testing Framework**: Current test patterns and utilities - -**Minimal Disruption Approach**: -- Non-breaking additions to existing workflows -- Gradual migration of current validation processes -- Backward compatibility maintenance -- Feature flags for progressive rollout - ---- - -## Conclusion - -This architecture provides a comprehensive, scalable, and maintainable release validation system that integrates seamlessly with the existing Terraphim AI infrastructure. The design follows the SIMPLE over EASY principle with clear separation of concerns, leveraging proven technologies and patterns already established in the codebase. - -The system is designed for incremental implementation, allowing for gradual rollout and validation of each component. 
By building on existing infrastructure and patterns, the implementation risk is minimized while maximizing value to the release process. - -The architecture emphasizes security, performance, and maintainability while providing the comprehensive validation coverage needed for a production-grade multi-platform release system. \ No newline at end of file diff --git a/.docs/design-ci-workflow-fixes.md b/.docs/design-ci-workflow-fixes.md deleted file mode 100644 index a580a3e6c..000000000 --- a/.docs/design-ci-workflow-fixes.md +++ /dev/null @@ -1,117 +0,0 @@ -# Design & Implementation Plan: Fix All CI Workflow Failures - -## 1. Summary of Target Behavior - -After implementation: -1. **Query parser** correctly treats mixed-case keywords ("oR", "Or", "AND", etc.) as concepts, not boolean operators -2. **Earthly CI/CD** includes `terraphim_ai_nodejs` in the build and passes all checks -3. **CI Optimized** workflow runs successfully with all lint/format checks passing - -## 2. Key Invariants and Acceptance Criteria - -### Invariants -- Query parser MUST only recognize lowercase keywords: "and", "or", "not" -- All workspace members in Cargo.toml MUST be copied in Earthfile -- CI workflows MUST pass without manual intervention - -### Acceptance Criteria -| Criterion | Verification Method | -|-----------|-------------------| -| "oR" is parsed as concept, not OR keyword | Proptest passes consistently | -| "AND" is parsed as concept, not AND keyword | Unit test | -| Earthly `+lint-and-format` target passes | `earthly +lint-and-format` | -| CI PR Validation workflow passes | GitHub Actions check | -| CI Optimized Main workflow passes | GitHub Actions check | - -## 3. 
High-Level Design and Boundaries - -### Component Changes - -**Query Parser (crates/terraphim-session-analyzer/src/kg/query.rs)** -- Change from case-insensitive to case-sensitive keyword matching -- Only exact lowercase "and", "or", "not" are treated as operators -- All other variations ("AND", "Or", "NOT") become concepts - -**Earthfile** -- Add `terraphim_ai_nodejs` to COPY commands at lines 120 and 162 -- Ensure all workspace members are synchronized - -### No Changes Required -- CI Optimized workflow file itself (failure was downstream of Earthly) -- Rate limiting configuration (already fixed) - -## 4. File/Module-Level Change Plan - -| File/Module | Action | Before | After | Dependencies | -|-------------|--------|--------|-------|--------------| -| `crates/terraphim-session-analyzer/src/kg/query.rs:69-76` | Modify | Case-insensitive keyword matching via `to_lowercase()` | Case-sensitive exact match | None | -| `Earthfile:120` | Modify | Missing `terraphim_ai_nodejs` | Include `terraphim_ai_nodejs` in COPY | None | -| `Earthfile:162` | Modify | Missing `terraphim_ai_nodejs` | Include `terraphim_ai_nodejs` in COPY | None | - -## 5. Step-by-Step Implementation Sequence - -### Step 1: Fix Query Parser Keyword Matching -**Purpose**: Make keyword matching case-sensitive so only lowercase keywords are operators -**Deployable state**: Yes - backwards compatible change, stricter parsing - -Change `word_to_token()` function: -```rust -// Before (line 70): -match word.to_lowercase().as_str() { - -// After: -match word { -``` - -This ensures: -- "and" → Token::And (operator) -- "AND" → Token::Concept("AND") (not operator) -- "oR" → Token::Concept("oR") (not operator) - -### Step 2: Add Regression Test -**Purpose**: Prevent future regressions with explicit test cases -**Deployable state**: Yes - -Add test for mixed-case keywords being treated as concepts. 
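The Step 1 change and its Step 2 regression test can be sketched as a self-contained example. `Token` and `word_to_token` here are simplified stand-ins for the real definitions in `crates/terraphim-session-analyzer/src/kg/query.rs`; the actual enum and tokenizer may differ, but the case-sensitivity behavior is the one this plan specifies:

```rust
/// Simplified stand-in for the parser's token type.
#[derive(Debug, PartialEq)]
enum Token {
    And,
    Or,
    Not,
    Concept(String),
}

/// Exact, case-sensitive matching: only lowercase words are operators.
/// Any other casing ("AND", "oR", "Not") is treated as a concept.
fn word_to_token(word: &str) -> Token {
    match word {
        "and" => Token::And,
        "or" => Token::Or,
        "not" => Token::Not,
        other => Token::Concept(other.to_string()),
    }
}

fn main() {
    // Lowercase keywords remain operators.
    assert_eq!(word_to_token("and"), Token::And);
    assert_eq!(word_to_token("or"), Token::Or);
    // Mixed/upper case becomes a concept, covering the proptest shrink case "oR".
    assert_eq!(word_to_token("AND"), Token::Concept("AND".to_string()));
    assert_eq!(word_to_token("oR"), Token::Concept("oR".to_string()));
    assert_eq!(word_to_token("Not"), Token::Concept("Not".to_string()));
}
```

Encoding the proptest's shrunk counterexample ("oR") as an explicit unit assertion keeps the fix pinned even if the property test's random inputs change.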
- -### Step 3: Update Earthfile COPY Commands -**Purpose**: Include all workspace members in build -**Deployable state**: Yes - -Modify lines 120 and 162 to include `terraphim_ai_nodejs`: -``` -COPY --keep-ts --dir terraphim_server terraphim_firecracker terraphim_ai_nodejs desktop default crates ./ -``` - -### Step 4: Verify CI Passes -**Purpose**: Confirm all fixes work together -**Deployable state**: Yes - -Run local tests and push to trigger CI. - -## 6. Testing & Verification Strategy - -| Acceptance Criteria | Test Type | Test Location | -|---------------------|-----------|---------------| -| Mixed-case keywords are concepts | Unit | `query.rs::tests::test_mixed_case_keywords` | -| Proptest passes | Property | `query.rs::tests::test_boolean_expression_parsing` | -| Earthly build succeeds | Integration | `earthly +lint-and-format` | -| CI workflows pass | E2E | GitHub Actions | - -## 7. Risk & Complexity Review - -| Risk | Mitigation | Residual Risk | -|------|------------|---------------| -| Breaking existing queries using uppercase keywords | This is intentional - uppercase should be concepts | Low - existing queries were likely incorrect | -| Earthfile change breaks other targets | Only affects COPY, not build logic | Low | -| Proptest still fails with other shrunk cases | Case-sensitive matching addresses root cause | Low | - -## 8. Open Questions / Decisions for Human Review - -None - the fix is straightforward: -1. Case-sensitive keyword matching is the correct behavior -2. 
All workspace members should be in Earthfile - ---- - -**Do you approve this plan as-is, or would you like to adjust any part?** diff --git a/.docs/design-claude-analyzer-terraphim-integration.md b/.docs/design-claude-analyzer-terraphim-integration.md deleted file mode 100644 index d61ab4008..000000000 --- a/.docs/design-claude-analyzer-terraphim-integration.md +++ /dev/null @@ -1,81 +0,0 @@ -# Design: Enhance claude-log-analyzer with terraphim_automata - -## Problem Statement - -The claude-log-analyzer crate has terraphim_automata integration behind a feature flag, but underutilizes its capabilities: -- Only uses `find_matches()` for basic pattern matching -- Hardcodes concept definitions instead of dynamic extraction -- Clones thesaurus on every call (inefficient) -- Doesn't use fuzzy matching, autocomplete, or graph connectivity features - -## Goals - -1. Better text processing using terraphim_automata capabilities -2. Dynamic concept/pattern learning from observed tool usage -3. Efficient thesaurus management with caching -4. Leverage graph connectivity for relationship inference - -## Implementation Plan - -### Step 1: Improve TerraphimMatcher Pattern Matching - -**File**: `src/patterns/matcher.rs` - -Current: -```rust -terraphim_find_matches(text, thesaurus.clone(), true) -``` - -Changes: -- Cache compiled automata instead of rebuilding -- Use fuzzy matching for typo tolerance -- Add context extraction for better pattern understanding - -### Step 2: Dynamic Concept Building in KnowledgeGraphBuilder - -**File**: `src/kg/builder.rs` - -Current: Hardcoded concept lists (BUN, NPM, INSTALL, etc.) 
- -Changes: -- Learn concepts dynamically from observed patterns -- Use terraphim_automata's thesaurus capabilities -- Build hierarchical concept relationships - -### Step 3: Enhanced Search with Graph Connectivity - -**File**: `src/kg/search.rs` - -Current: Manual proximity-based result merging - -Changes: -- Use `is_all_terms_connected_by_path()` for semantic relationships -- Improve relevance scoring with concept graph distances -- Cache search results for repeated queries - -### Step 4: Integrate Learned Patterns with Terraphim - -**File**: `src/patterns/knowledge_graph.rs` - -Current: Separate learning system from terraphim - -Changes: -- Store learned patterns in terraphim thesaurus -- Use graph structure for relationship inference -- Export/import learned knowledge - -## File Changes - -| File | Action | Purpose | -|------|--------|---------| -| `src/patterns/matcher.rs:159-290` | Modify | Cache automata, add fuzzy matching | -| `src/kg/builder.rs` | Modify | Dynamic concept learning | -| `src/kg/search.rs` | Modify | Graph connectivity for search | -| `src/patterns/knowledge_graph.rs` | Modify | Integrate with terraphim learning | - -## Acceptance Criteria - -1. Pattern matching uses cached automata (no clone per call) -2. Concepts learned dynamically from tool observation -3. Fuzzy matching available for typo-tolerant search -4. 
Tests pass with `--features terraphim` diff --git a/.docs/design-file-changes.md b/.docs/design-file-changes.md deleted file mode 100644 index 92f1eea82..000000000 --- a/.docs/design-file-changes.md +++ /dev/null @@ -1,427 +0,0 @@ -# Terraphim AI Release Validation System - File/Module Change Plan - -## File Structure Overview - -### New Directories and Files to be Created - -``` -crates/terraphim_validation/ # Core validation system crate -├── src/ -│ ├── lib.rs # Main library entry point -│ ├── orchestrator/ # Validation orchestration -│ │ ├── mod.rs -│ │ ├── service.rs # Main orchestrator service -│ │ ├── scheduler.rs # Task scheduling logic -│ │ └── coordinator.rs # Multi-platform coordination -│ ├── validators/ # Platform-specific validators -│ │ ├── mod.rs -│ │ ├── base.rs # Base validator trait -│ │ ├── linux.rs # Linux platform validator -│ │ ├── macos.rs # macOS platform validator -│ │ ├── windows.rs # Windows platform validator -│ │ ├── container.rs # Docker/container validator -│ │ └── security.rs # Security validator -│ ├── artifacts/ # Artifact management -│ │ ├── mod.rs -│ │ ├── downloader.rs # Artifact download logic -│ │ ├── verifier.rs # Checksum/signature verification -│ │ └── registry.rs # Registry interface -│ ├── testing/ # Functional test runners -│ │ ├── mod.rs -│ │ ├── runner.rs # Test execution framework -│ │ ├── integration.rs # Integration test suite -│ │ └── performance.rs # Performance benchmarking -│ ├── reporting/ # Results and monitoring -│ │ ├── mod.rs -│ │ ├── generator.rs # Report generation -│ │ ├── dashboard.rs # Dashboard data API -│ │ └── alerts.rs # Alert system -│ ├── config/ # Configuration management -│ │ ├── mod.rs -│ │ ├── settings.rs # Configuration structures -│ │ └── environment.rs # Environment handling -│ └── types.rs # Shared type definitions -├── tests/ # Integration tests -│ ├── end_to_end.rs # Full workflow tests -│ ├── platform_validation.rs # Platform-specific tests -│ └── security_validation.rs # Security 
validation tests -├── fixtures/ # Test fixtures -│ ├── releases/ # Sample release data -│ └── artifacts/ # Test artifacts -├── Cargo.toml -└── README.md - -validation_scripts/ # Enhanced validation scripts -├── validation-orchestrator.sh # Main validation orchestrator -├── platform-validation.sh # Platform-specific validation -├── security-validation.sh # Security scanning scripts -├── functional-validation.sh # Functional test runner -├── artifact-validation.sh # Artifact integrity checks -└── report-generation.sh # Report generation scripts - -validation_config/ # Configuration files -├── validation.toml # Main validation configuration -├── platforms.toml # Platform-specific settings -├── security.toml # Security scanning config -└── alerts.toml # Alert configuration - -.github/workflows/validation/ # New validation workflows -├── release-validation.yml # Main release validation -├── platform-validation.yml # Platform-specific validation -├── security-validation.yml # Security scanning workflow -└── validation-reporting.yml # Report generation workflow - -docker/validation/ # Validation container images -├── base/ # Base validation image -│ └── Dockerfile -├── linux/ # Linux validation image -│ └── Dockerfile -├── macos/ # macOS validation image -│ └── Dockerfile -├── windows/ # Windows validation image -│ └── Dockerfile -└── security/ # Security scanning image - └── Dockerfile - -docs/validation/ # Documentation -├── README.md # Validation system overview -├── architecture.md # Architecture documentation -├── configuration.md # Configuration guide -├── troubleshooting.md # Troubleshooting guide -└── api-reference.md # API documentation - -tests/validation/ # Validation test suites -├── unit/ # Unit tests -├── integration/ # Integration tests -├── e2e/ # End-to-end tests -└── fixtures/ # Test data and fixtures -``` - -## Existing Files to Modify - -### Core Workspace Files -- **Cargo.toml** - Add terraphim_validation crate to workspace members -- 
**crates/terraphim_config/Cargo.toml** - Add validation configuration dependencies -- **crates/terraphim_settings/default/settings.toml** - Add validation settings - -### Script Enhancements -- **scripts/validate-release.sh** - Integrate with new validation system -- **scripts/test-matrix.sh** - Add validation test scenarios -- **scripts/run_test_matrix.sh** - Incorporate validation workflows -- **scripts/prove_rust_engineer_works.sh** - Enhance functional validation - -### GitHub Actions Workflows -- **.github/workflows/release-comprehensive.yml** - Add validation trigger points -- **.github/workflows/test-matrix.yml** - Include validation test matrix -- **.github/workflows/docker-multiarch.yml** - Add container validation steps - -### Documentation Updates -- **README.md** - Add validation system overview -- **CONTRIBUTING.md** - Include validation testing guidelines -- **AGENTS.md** - Update agent instructions for validation - -## File Change Tables - -### New Core Files - -| File Path | Purpose | Type | Key Functionality | Dependencies | Complexity | Risk | -|-----------|---------|------|-------------------|--------------|------------|------| -| `crates/terraphim_validation/Cargo.toml` | Crate configuration | New | Dependencies, features | Workspace config | Low | Low | -| `crates/terraphim_validation/src/lib.rs` | Main library | New | Public API, re-exports | Internal modules | Medium | Low | -| `crates/terraphim_validation/src/orchestrator/service.rs` | Core orchestrator | New | Validation coordination | GitHub API, async | High | Medium | -| `crates/terraphim_validation/src/validators/base.rs` | Base validator | New | Common validator traits | Async traits | Medium | Low | -| `crates/terraphim_validation/src/validators/linux.rs` | Linux validator | New | Linux-specific validation | Docker, containers | High | Medium | -| `crates/terraphim_validation/src/artifacts/downloader.rs` | Artifact download | New | GitHub release downloads | reqwest, async | Medium | 
Low | -| `crates/terraphim_validation/src/config/settings.rs` | Configuration | New | Settings management | serde, toml | Low | Low | -| `validation_scripts/validation-orchestrator.sh` | Main orchestrator script | New | End-to-end validation | Docker, gh CLI | Medium | Medium | - -### Modified Existing Files - -| File Path | Purpose | Type | Key Changes | Dependencies | Complexity | Risk | -|-----------|---------|------|-------------|--------------|------------|------| -| `Cargo.toml` | Workspace config | Modify | Add validation crate | N/A | Low | Low | -| `scripts/validate-release.sh` | Release validation | Modify | Integration with new system | Validation crate | Medium | Medium | -| `.github/workflows/release-comprehensive.yml` | Release workflow | Modify | Add validation trigger | Validation workflows | High | High | -| `crates/terraphim_settings/default/settings.toml` | Settings | Modify | Add validation config | Validation config | Low | Low | - -## Module Dependencies - -### Dependency Graph - -``` -terraphim_validation (Core Crate) -├── orchestrator -│ ├── service.rs (depends on: validators, artifacts, reporting) -│ ├── scheduler.rs (depends on: config, types) -│ └── coordinator.rs (depends on: all validators) -├── validators -│ ├── base.rs (trait definition) -│ ├── linux.rs (depends on: artifacts, config) -│ ├── macos.rs (depends on: artifacts, config) -│ ├── windows.rs (depends on: artifacts, config) -│ ├── container.rs (depends on: artifacts) -│ └── security.rs (depends on: artifacts, reporting) -├── artifacts -│ ├── downloader.rs (depends on: config, types) -│ ├── verifier.rs (depends on: config) -│ └── registry.rs (depends on: config) -├── testing -│ ├── runner.rs (depends on: validators, artifacts) -│ ├── integration.rs (depends on: all modules) -│ └── performance.rs (depends on: testing/runner) -├── reporting -│ ├── generator.rs (depends on: types, config) -│ ├── dashboard.rs (depends on: generator) -│ └── alerts.rs (depends on: generator) -└── 
config - ├── settings.rs (depends on: types) - └── environment.rs (depends on: settings) -``` - -### Interface Definitions and Contracts - -#### Core Validator Trait -```rust -#[async_trait] -pub trait Validator: Send + Sync { - type Result: ValidationResult; - type Config: ValidatorConfig; - - async fn validate(&self, artifact: &Artifact, config: &Self::Config) -> Result<Self::Result>; - fn name(&self) -> &'static str; - fn supported_platforms(&self) -> Vec<Platform>; -} -``` - -#### Orchestrator Service Interface -```rust -#[async_trait] -pub trait ValidationOrchestrator: Send + Sync { - async fn start_validation(&self, release: Release) -> Result<ValidationId>; - async fn get_status(&self, id: ValidationId) -> Result<ValidationStatus>; - async fn get_report(&self, id: ValidationId) -> Result<ValidationReport>; -} -``` - -### Data Structures and Shared Types - -```rust -// Core types -pub struct ValidationId(pub Uuid); -pub struct Release { - pub version: String, - pub tag: String, - pub artifacts: Vec<Artifact>, - pub metadata: ReleaseMetadata, -} -pub struct Artifact { - pub name: String, - pub url: String, - pub checksum: Option<String>, - pub platform: Platform, - pub artifact_type: ArtifactType, -} - -// Validation results -pub struct ValidationResult { - pub validator_name: String, - pub status: ValidationStatus, - pub details: ValidationDetails, - pub duration: Duration, - pub issues: Vec<ValidationIssue>, -} -``` - -## Implementation Order - -### Phase 1: Core Infrastructure (Weeks 1-2) - -1. **Create Base Crate Structure** - - `crates/terraphim_validation/Cargo.toml` - - `crates/terraphim_validation/src/lib.rs` - - `crates/terraphim_validation/src/types.rs` - -2. **Configuration System** - - `crates/terraphim_validation/src/config/mod.rs` - - `crates/terraphim_validation/src/config/settings.rs` - - `validation_config/validation.toml` - -3. **Base Validator Framework** - - `crates/terraphim_validation/src/validators/base.rs` - - `crates/terraphim_validation/src/artifacts/downloader.rs` - -4. 
**Basic Orchestrator** - - `crates/terraphim_validation/src/orchestrator/scheduler.rs` - - `crates/terraphim_validation/src/orchestrator/service.rs` - -**Prerequisites**: Rust workspace setup, basic dependencies -**Rollback**: Remove crate from workspace, revert workspace Cargo.toml - -### Phase 2: Platform Validation (Weeks 3-4) - -1. **Linux Validator** - - `crates/terraphim_validation/src/validators/linux.rs` - - `docker/validation/linux/Dockerfile` - -2. **Container Validator** - - `crates/terraphim_validation/src/validators/container.rs` - - Integration with existing `docker-multiarch.yml` - -3. **Security Validator** - - `crates/terraphim_validation/src/validators/security.rs` - - Security scanning scripts - -4. **Basic Reporting** - - `crates/terraphim_validation/src/reporting/generator.rs` - - `validation_scripts/report-generation.sh` - -**Prerequisites**: Phase 1 completion, container infrastructure -**Rollback**: Disable validators in config, remove specific validators - -### Phase 3: Multi-Platform Expansion (Weeks 5-6) - -1. **macOS and Windows Validators** - - `crates/terraphim_validation/src/validators/macos.rs` - - `crates/terraphim_validation/src/validators/windows.rs` - -2. **Functional Test Runners** - - `crates/terraphim_validation/src/testing/runner.rs` - - `crates/terraphim_validation/src/testing/integration.rs` - -3. **Advanced Reporting** - - `crates/terraphim_validation/src/reporting/dashboard.rs` - - `crates/terraphim_validation/src/reporting/alerts.rs` - -4. **Enhanced Workflows** - - `.github/workflows/validation/release-validation.yml` - - `.github/workflows/validation/platform-validation.yml` - -**Prerequisites**: Phase 2 completion, multi-platform CI access -**Rollback**: Platform-specific feature flags - -### Phase 4: Production Integration (Weeks 7-8) - -1. **Workflow Integration** - - Modify `scripts/validate-release.sh` - - Update `.github/workflows/release-comprehensive.yml` - -2. 
**Performance Optimization** - - `crates/terraphim_validation/src/testing/performance.rs` - - Caching and optimization improvements - -3. **Documentation and Training** - - `docs/validation/` documentation files - - Agent instruction updates - -4. **Production Deployment** - - Final testing and validation - - Production configuration deployment - -**Prerequisites**: All previous phases, production approval -**Rollback**: Feature flags, workflow reversion - -## Risk Assessment - -### High-Risk Changes and Mitigation Strategies - -| Risk | Impact | Mitigation Strategy | -|------|---------|---------------------| -| **GitHub Actions Workflow Integration** | High - Could break releases | Feature flags, gradual rollout, extensive testing | -| **Multi-platform Container Validation** | High - Resource intensive | Resource limits, parallel execution control | -| **Security Scanning Integration** | High - False positives/negatives | Tuning, baseline establishment, manual review | -| **Database Schema Changes** | Medium - Data migration | Versioned schemas, migration scripts, backward compatibility | - -### Breaking Changes and Compatibility Considerations - -| Change | Breaking? 
| Compatibility Strategy | -|--------|-----------|------------------------| -| **New Validation Crate** | No | Pure addition, no breaking changes | -| **Enhanced validate-release.sh** | Minimal | Maintain backward compatibility flags | -| **GitHub Actions Changes** | Yes | Use feature flags, parallel workflows | -| **Configuration Structure** | Minimal | Migration scripts, backward-compatible defaults | - -### Rollback Plans for Each Significant Change - -#### Core Crate Implementation -- **Rollback**: Remove from workspace Cargo.toml, delete crate directory -- **Time**: 5 minutes -- **Impact**: Low (no production usage yet) - -#### GitHub Actions Integration -- **Rollback**: Revert workflow files, disable validation triggers -- **Time**: 10 minutes -- **Impact**: Medium (release process continues without validation) - -#### Container Validation System -- **Rollback**: Disable in configuration, stop containers -- **Time**: 15 minutes -- **Impact**: Medium (reverts to script-based validation) - -#### Security Scanning Integration -- **Rollback**: Disable security validators, remove from pipeline -- **Time**: 5 minutes -- **Impact**: Low (security checks become manual) - -## Testing Requirements Per File - -### Core Crate Files -- **Unit tests**: All modules require >90% coverage -- **Integration tests**: Cross-module interactions -- **Mock services**: GitHub API, container orchestration - -### Script Files -- **Syntax validation**: Shellcheck compliance -- **Integration tests**: End-to-end execution -- **Error handling**: Failure scenario testing - -### Configuration Files -- **Schema validation**: TOML structure verification -- **Default values**: Configuration loading tests -- **Environment handling**: Variable substitution tests - -### Workflow Files -- **Syntax validation**: YAML structure verification -- **Integration tests**: Actual workflow execution -- **Security tests**: Permission and secret handling - -## Context Integration - -### Existing Project 
Structure Integration - -The validation system leverages existing Terraphim AI patterns: - -- **Rust Workspace Structure**: Follows established crate organization -- **Configuration Management**: Integrates with terraphim_settings -- **Container Infrastructure**: Builds on existing Docker patterns -- **GitHub Actions**: Extends current CI/CD workflows -- **Security Practices**: Aligns with 1Password integration patterns - -### Non-Breaking Integration with Current Workflows - -- **Gradual Feature Rollout**: Use feature flags for progressive deployment -- **Backward Compatibility**: Maintain existing script interfaces -- **Parallel Validation**: Run alongside current validation during transition -- **Fallback Mechanisms**: Graceful degradation when validation fails - -### Multi-Platform Validation Requirements - -- **Cross-Platform Support**: Linux, macOS, Windows, and containers -- **Architecture Coverage**: x86_64, ARM64, and other target architectures -- **Package Formats**: Native binaries, DEB/RPM, Docker images, npm packages -- **Registry Integration**: Docker Hub, npm registry, PyPI, crates.io - -### Performance and Scalability Considerations - -- **Parallel Execution**: Concurrent platform validation -- **Resource Management**: Efficient container and VM usage -- **Caching Strategies**: Artifact and result caching -- **Scalable Architecture**: Horizontal scaling for large releases - ---- - -## Conclusion - -This file/module change plan provides a comprehensive, incremental approach to implementing the Terraphim AI release validation system. The plan is designed to minimize risk while maximizing value through careful staging, rollback capabilities, and extensive testing at each phase. - -The implementation follows established Terraphim AI patterns and conventions, ensuring seamless integration with the existing codebase and infrastructure. 
The modular design allows for progressive enhancement and adaptation to changing requirements while maintaining system stability and reliability. - -By following this structured approach, the validation system will provide comprehensive release coverage, improve release quality, and enable confident multi-platform deployments of Terraphim AI components. \ No newline at end of file diff --git a/.docs/design-firecracker-e2e-test-fixes.md b/.docs/design-firecracker-e2e-test-fixes.md deleted file mode 100644 index 0027a1cbc..000000000 --- a/.docs/design-firecracker-e2e-test-fixes.md +++ /dev/null @@ -1,165 +0,0 @@ -# Design & Implementation Plan: Firecracker E2E Test Fixes - -## 1. Summary of Target Behavior - -After implementation: -- E2E tests execute successfully using `bionic-test` VM type (verified working) -- Tests create VMs, execute commands, and verify results -- Commands execute in <200ms inside VMs -- VMs are cleaned up after test execution to prevent stale VM accumulation -- Test failures provide clear error messages indicating root cause - -## 2. Key Invariants and Acceptance Criteria - -### Invariants -| ID | Invariant | Verification | -|----|-----------|--------------| -| INV-1 | Default VM type must have valid images | Test startup validates VM type | -| INV-2 | VM commands execute within timeout | 5-second timeout per command | -| INV-3 | Test cleanup prevents VM accumulation | Cleanup runs in teardown | - -### Acceptance Criteria -| ID | Criterion | Testable | -|----|-----------|----------| -| AC-1 | E2E test passes with bionic-test VM type | Run test with `--ignored` flag | -| AC-2 | All 3 test commands execute with exit_code=0 | Assert exit codes in test | -| AC-3 | LearningCoordinator records >= 3 successes | Assert stats after execution | -| AC-4 | Test VM is deleted after test completion | Verify VM count after test | -| AC-5 | Boot wait reduced from 10s to 3s (VM boots in 0.2s) | Test timing assertion | - -## 3. 
High-Level Design and Boundaries - -### Components Affected - -``` -┌─────────────────────────────────────────────────────────────┐ -│ E2E Test Flow │ -├─────────────────────────────────────────────────────────────┤ -│ 1. Test Setup │ -│ └─> Validate fcctl-web health │ -│ └─> Create VM with bionic-test type ← CHANGE │ -│ └─> Wait 3s for boot ← CHANGE (was 10s) │ -│ │ -│ 2. Test Execution │ -│ └─> Execute commands via VmCommandExecutor │ -│ └─> Record results in LearningCoordinator │ -│ │ -│ 3. Test Teardown ← NEW │ -│ └─> Delete test VM │ -│ └─> Verify cleanup │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Boundaries -- **Changes inside** `terraphim_github_runner` crate only -- **No changes** to fcctl-web (external) -- **No changes** to VmCommandExecutor (working correctly) -- **Minimal changes** to SessionManagerConfig default - -## 4. File/Module-Level Change Plan - -| File | Action | Before | After | Dependencies | -|------|--------|--------|-------|--------------| -| `src/session/manager.rs:98` | Modify | `default_vm_type: "focal-optimized"` | `default_vm_type: "bionic-test"` | None | -| `tests/end_to_end_test.rs:137,162` | Modify | `sleep(10)` wait | `sleep(3)` wait | None | -| `tests/end_to_end_test.rs:~365` | Add | No cleanup | Add VM deletion in teardown | reqwest client | - -### Detailed Changes - -**File 1: `src/session/manager.rs`** -- Line 98: Change default VM type string -- Responsibility: Provide working default for all session consumers -- Side-effects: Any code using `SessionManagerConfig::default()` gets correct VM type - -**File 2: `tests/end_to_end_test.rs`** -- Lines 137, 162: Reduce boot wait from 10s to 3s -- After line 362: Add cleanup section to delete test VM -- Responsibility: Test now self-cleans after execution - -## 5. 
Step-by-Step Implementation Sequence - -### Step 1: Change Default VM Type -**Purpose**: Fix root cause - incorrect default VM type -**File**: `src/session/manager.rs` -**Change**: Line 98: `"focal-optimized"` → `"bionic-test"` -**Deployable**: Yes (backwards compatible - just changes default) -**Feature flag**: No - -### Step 2: Reduce Boot Wait Time -**Purpose**: Optimize test speed (VMs boot in 0.2s, not 10s) -**File**: `tests/end_to_end_test.rs` -**Change**: Lines 137, 162: `Duration::from_secs(10)` → `Duration::from_secs(3)` -**Deployable**: Yes (test-only change) -**Feature flag**: No - -### Step 3: Add Test Cleanup -**Purpose**: Prevent stale VM accumulation (150 VM limit) -**File**: `tests/end_to_end_test.rs` -**Change**: Add cleanup block after assertions to delete test VM -**Deployable**: Yes (test-only change) -**Feature flag**: No - -### Step 4: Run and Verify E2E Test -**Purpose**: Validate all changes work together -**Command**: `cargo test -p terraphim_github_runner end_to_end_real_firecracker_vm -- --ignored --nocapture` -**Expected**: All 3 commands execute successfully, cleanup completes - -## 6. Testing & Verification Strategy - -| Acceptance Criteria | Test Type | Verification Method | -|---------------------|-----------|---------------------| -| AC-1: E2E passes | E2E | Run `end_to_end_real_firecracker_vm` test | -| AC-2: Commands succeed | E2E | Assert `all_success == true`, `executed_count == 3` | -| AC-3: Learning records | E2E | Assert `learning_stats.total_successes >= 3` | -| AC-4: VM cleanup | E2E | Query `/api/vms` after test, verify test VM deleted | -| AC-5: Fast boot wait | E2E | Test completes in <30s total (was ~60s) | - -### Test Execution Plan -```bash -# 1. Ensure fcctl-web is running -curl http://127.0.0.1:8080/health - -# 2. Set auth token -export FIRECRACKER_AUTH_TOKEN="" - -# 3. Run E2E test -cargo test -p terraphim_github_runner end_to_end_real_firecracker_vm -- --ignored --nocapture - -# 4. 
Verify no leaked VMs (optional manual check) -curl -H "Authorization: Bearer $JWT" http://127.0.0.1:8080/api/vms | jq '.vms | length' -``` - -## 7. Risk & Complexity Review - -| Risk | Mitigation | Residual Risk | -|------|------------|---------------| -| focal-optimized needed later | Document in CLAUDE.md that bionic-test is preferred | Low - can add focal images if needed | -| fcctl-web unavailable | Test already checks health, fails fast | Low - expected for ignored test | -| JWT expiration | Test uses env var, user controls token | Low - standard practice | -| VM cleanup fails | Add error handling, log warning but don't fail test | Low - minor resource leak | -| 3s boot wait insufficient | bionic-test boots in 0.2s, 3s is 15x margin | Very Low | - -## 8. Open Questions / Decisions for Human Review - -1. **Cleanup on failure**: Should we clean up VM even if test assertions fail? - - **Recommendation**: Yes, use `defer`-style cleanup pattern - -2. **Stale VM batch cleanup**: Should we add a cleanup of ALL user VMs at test start? - - **Recommendation**: No, could interfere with other running tests - -3. **Documentation update**: Should we update `END_TO_END_PROOF.md` with new test instructions? 
- - **Recommendation**: Yes, after implementation verified - ---- - -## Implementation Checklist - -- [ ] Step 1: Change `SessionManagerConfig::default()` VM type to `bionic-test` -- [ ] Step 2: Reduce boot wait from 10s to 3s in test -- [ ] Step 3: Add VM cleanup in test teardown -- [ ] Step 4: Run E2E test and verify all criteria pass -- [ ] Step 5: Commit changes with clear message - ---- - -**Do you approve this plan as-is, or would you like to adjust any part?** diff --git a/.docs/design-llmrouter-integration.md b/.docs/design-llmrouter-integration.md deleted file mode 100644 index 43e2e77b3..000000000 --- a/.docs/design-llmrouter-integration.md +++ /dev/null @@ -1,107 +0,0 @@ -### Step 3: Adapter Layer - Library Mode ✅ COMPLETE - -**Files Created:** -- `crates/terraphim_service/src/llm/routed_adapter.rs` - Library mode adapter -- `crates/terraphim_service/src/llm/proxy_client.rs` - External service mode (stub for now) - -**Key Features:** -- `RoutedLlmClient` wraps `GenAiLlmClient` with intelligent routing -- Graceful degradation: routing failure → static client fallback -- Debug logging for routing decisions and fallbacks -- Feature flag: `llm_router_enabled` controls routing behavior -- Name: "routed_llm" (distinguishes from underlying client) - -**Files Modified:** -- `crates/terraphim_config/src/llm_router.rs` - Configuration types -- `crates/terraphim_config/src/lib.rs` - Added router module import and fields to `Role` struct - -**Current Status:** -- ✅ Workspace integration complete (Step 1) -- ✅ Configuration types complete (Step 2) -- ✅ Adapter layer implementation complete (Step 3 - library mode) -- 🔄 Service mode adapter: Stub created (not full implementation) -- ✅ Compilation successful: \`cargo test -p terraphim_service llm_router --lib\` - -**Next Step:** Step 4 - Integration Point (modify \`build_llm_from_role\` to use \`RoutedLlmClient\`) - -**Note:** Service mode proxy client is stubbed - full external service mode implementation deferred to 
future phases based on complexity and requirements. - -### Step 3B: Service Mode Adapter ✅ COMPLETE - -**Status:** **COMPLETE** ✅ - -**Implementation Summary:** -- ✅ **External Proxy Client Created:** `crates/terraphim_service/src/llm/proxy_client.rs` implements HTTP client for service mode - - ProxyClientConfig with configurable base URL and timeout - - Routes all requests through external terraphim-llm-proxy on port 3456 - - Request/Response transformation for compatibility - - Streaming support (stub for now, enhanced in later steps) - -- ✅ **Proxy Types Re-exported:** `crates/terraphim_service/src/llm/proxy_types.rs` provides clean interface - - Re-exports: RouterConfig, RouterMode, RouterStrategy, Priority from proxy - - Avoids workspace member path resolution issues - - Unit tests verify HTTP client behavior and JSON parsing - -- ✅ **Dual-Mode Support:** Both Library (in-process) and Service (HTTP proxy) modes fully functional - - Library mode: Direct use of GenAiLlmClient via RoutedLlmClient adapter - - Service mode: External HTTP proxy client with request/response transformation - -- ✅ **Workspace Configuration:** - - Added `terraphim_llm-proxy` as workspace member - - Terraphim Service and Server crates can reference proxy as dependency - - Path resolution: `../terraphim-llm-proxy` works correctly - -- ✅ **Graceful Degradation Implemented:** - - Service mode (external proxy) fails gracefully - - Library mode (in-process router) fails gracefully - - Both modes support fallback to static LLM clients - - Matches specification interview decisions (Option A, B, B, etc.) 
- -- ✅ **Build Verification:** - - `cargo test -p terraphim_service llm_router --lib` passes all tests - - Feature flag `llm_router` functional - - Compiles successfully with workspace member - -**Files Modified:** -- `Cargo.toml` - Added `terraphim_llm-proxy` to workspace members -- `terraphim_server/Cargo.toml` - Added `llm_router` feature flag -- `terraphim_service/Cargo.toml` - Added `terraphim_llm_proxy` dependency and feature - -**Files Created:** -- `crates/terraphim_service/src/llm/proxy_types.rs` - Clean type re-exports -- `crates/terraphim_service/src/llm/proxy_client.rs` - HTTP proxy client implementation -- `crates/terraphim_service/src/llm/routed_adapter.rs` - Modified to use ProxyLlmClient - -**Current Status:** -- ✅ Workspace integration: Complete (Step 1) -- ✅ Configuration types: Complete (Step 2) -- ✅ Adapter layer: Complete (Step 3A - library mode) -- ✅ Adapter layer: Complete (Step 3B - service mode) - -**Architecture Achieved:** -``` -Terraphim AI Main Application - ├─ LlmRouterConfig (Role-based) - ├─ RoutedLlmClient (library mode) - │ └─ GenAiLlmClient - └─ ProxyLlmClient (service mode) - └─ HTTP Client - └─ External terraphim-llm-proxy (port 3456) -``` - -**Next Steps:** -- Step 4: Integration Point - Modify `build_llm_from_role()` in llm.rs to create RoutedLlmClient when `llm_router_enabled` -- Step 5: Service Mode Integration - Add HTTP proxy mode to server if needed -- Step 6: Testing - Integration tests and end-to-end tests -- Step 7: Advanced Features - Cost optimization, performance metrics -- Step 8-10: Production readiness - Documentation, monitoring, deployment - -**Estimated Effort:** -- Step 1 (Research): 1 day ✅ -- Step 2 (Design): 1 day ✅ -- Step 3A (Library Adapter): 1 day ✅ -- Step 3B (Service Adapter): 1 day ✅ -- Remaining steps 4-10: 5-7 days estimated -- **Total: 8-9 days** - -**Ready to proceed with Step 4 (Integration Point modification)? 
diff --git a/.docs/design-macos-homebrew-publication.md b/.docs/design-macos-homebrew-publication.md deleted file mode 100644 index 44fb7795a..000000000 --- a/.docs/design-macos-homebrew-publication.md +++ /dev/null @@ -1,322 +0,0 @@ -# Design & Implementation Plan: macOS Release Artifacts and Homebrew Publication - -## 1. Summary of Target Behavior - -After implementation, the system will: - -1. **Build universal macOS binaries** combining arm64 and x86_64 architectures using `lipo` -2. **Sign binaries** with Apple Developer ID certificate for Gatekeeper approval -3. **Notarize binaries** with Apple for malware scanning verification -4. **Publish to Homebrew tap** at `terraphim/homebrew-terraphim` -5. **Auto-update formulas** with correct SHA256 checksums on each release - -**User experience after implementation:** -```bash -# One-time setup -brew tap terraphim/terraphim - -# Install any tool -brew install terraphim/terraphim/terraphim-server -brew install terraphim/terraphim/terraphim-agent - -# No Gatekeeper warnings - binaries are signed and notarized -terraphim_server --version -``` - -## 2. 
Key Invariants and Acceptance Criteria - -### Invariants - -| Invariant | Guarantee | -|-----------|-----------| -| Binary universality | Every macOS binary contains both arm64 and x86_64 slices | -| Signature validity | All binaries pass `codesign --verify --deep --strict` | -| Notarization status | All binaries pass `spctl --assess --type execute` | -| Formula correctness | SHA256 checksums match downloaded artifacts exactly | -| Version consistency | Formula version matches GitHub release tag | - -### Acceptance Criteria - -| ID | Criterion | Verification Method | -|----|-----------|---------------------| -| AC1 | `brew install terraphim/terraphim/terraphim-server` succeeds on Intel Mac | Manual test on Intel Mac | -| AC2 | `brew install terraphim/terraphim/terraphim-server` succeeds on Apple Silicon Mac | Manual test on M1/M2/M3 Mac | -| AC3 | Installed binary runs without Gatekeeper warning | Launch binary, no security dialog | -| AC4 | `file $(which terraphim_server)` shows "universal binary" | Command output verification | -| AC5 | Release workflow completes without manual intervention | GitHub Actions log review | -| AC6 | Formula SHA256 matches release artifact | `shasum -a 256` comparison | -| AC7 | `brew upgrade terraphim-server` pulls new version after release | Version comparison after upgrade | - -## 3. 
High-Level Design and Boundaries - -### Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ release-comprehensive.yml │ -├─────────────────────────────────────────────────────────────────────┤ -│ ┌────────────────────────┐ ┌────────────────────────┐ │ -│ │ build-binaries │ │ build-binaries │ │ -│ │ x86_64-apple-darwin │ │ aarch64-apple-darwin │ │ -│ │ [self-hosted,macOS,X64]│ │ [self-hosted,macOS,ARM]│ │ -│ └──────────┬─────────────┘ └──────────┬─────────────┘ │ -│ │ │ │ -│ └─────────┬─────────────────┘ │ -│ ▼ │ -│ ┌───────────────────────────────────────┐ │ -│ │ create-universal-macos │ NEW JOB │ -│ │ runs-on: [self-hosted, macOS, ARM64]│ (M3 Pro) │ -│ │ - Download both artifacts │ │ -│ │ - lipo -create universal │ │ -│ │ - Upload universal artifact │ │ -│ └──────────────────┬────────────────────┘ │ -│ ▼ │ -│ ┌───────────────────────────────────────┐ │ -│ │ sign-and-notarize-macos │ NEW JOB │ -│ │ runs-on: [self-hosted, macOS, ARM64]│ (M3 Pro) │ -│ │ - Import certificate from 1Password │ │ -│ │ - codesign --sign "Developer ID" │ │ -│ │ - xcrun notarytool submit │ │ -│ │ - Upload signed artifacts │ │ -│ └──────────────────┬────────────────────┘ │ -│ ▼ │ -│ ┌───────────────────────────────────────┐ │ -│ │ create-release (existing) │ MODIFIED │ -│ │ - Include signed macOS binaries │ │ -│ │ - All platforms in one release │ │ -│ └──────────────────┬────────────────────┘ │ -│ ▼ │ -│ ┌───────────────────────────────────────┐ │ -│ │ update-homebrew-tap │ NEW JOB │ -│ │ runs-on: ubuntu-latest │ │ -│ │ - Clone homebrew-terraphim │ │ -│ │ - Update formula versions │ │ -│ │ - Update SHA256 checksums │ │ -│ │ - Commit and push │ │ -│ └───────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────┘ - -┌─────────────────────────────────────────────────────────────────────┐ -│ terraphim/homebrew-terraphim (NEW REPO) │ 
-├─────────────────────────────────────────────────────────────────────┤ -│ Formula/ │ -│ ├── terraphim-server.rb # Server formula with universal binary │ -│ ├── terraphim-agent.rb # TUI formula with universal binary │ -│ └── terraphim.rb # Meta-formula (optional, installs all) │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -### Component Responsibilities - -| Component | Responsibility | Changes | -|-----------|---------------|---------| -| `release-comprehensive.yml` | Orchestrates full release pipeline | Add 3 new jobs | -| `create-universal-macos` job | Combines arch-specific binaries | New | -| `sign-and-notarize-macos` job | Apple code signing and notarization | New | -| `update-homebrew-tap` job | Updates formulas in tap repository | New | -| `homebrew-terraphim` repo | Hosts Homebrew formulas | New repository | -| `scripts/sign-macos-binary.sh` | Reusable signing script | New | -| `scripts/update-homebrew-formula.sh` | Formula update script | Modify existing | - -### Boundaries - -**Inside this change:** -- `release-comprehensive.yml` workflow modifications -- New shell scripts for signing -- New Homebrew tap repository -- New formula files - -**Outside this change (no modifications):** -- `publish-tauri.yml` - Desktop app has separate signing -- `package-release.yml` - Linux/Arch packages unchanged -- Existing Linux Homebrew formulas in `homebrew-formulas/` -- Rust source code - -## 4. 
File/Module-Level Change Plan - -| File/Module | Action | Before | After | Dependencies | -|-------------|--------|--------|-------|--------------| -| `.github/workflows/release-comprehensive.yml` | Modify | Builds separate arch binaries, placeholder Homebrew step | Adds universal binary, signing, notarization, and Homebrew update jobs | Self-hosted macOS runner, 1Password | -| `scripts/sign-macos-binary.sh` | Create | N/A | Signs and notarizes a macOS binary | Xcode CLI tools, Apple credentials | -| `scripts/update-homebrew-formula.sh` | Modify | Updates Linux checksums only | Updates macOS universal binary URL and checksum | GitHub CLI | -| `terraphim/homebrew-terraphim` (repo) | Create | N/A | Homebrew tap repository with formulas | GitHub organization access | -| `homebrew-terraphim/Formula/terraphim-server.rb` | Create | N/A | Formula for server binary | Release artifacts | -| `homebrew-terraphim/Formula/terraphim-agent.rb` | Create | N/A | Formula for TUI binary | Release artifacts | -| `1Password vault` | Modify | Tauri signing keys only | Add Apple Developer ID cert + credentials | Apple Developer account | - -### New 1Password Items Required - -| Item | Type | Contents | -|------|------|----------| -| `apple.developer.certificate` | Document | Developer ID Application certificate (.p12) | -| `apple.developer.certificate.password` | Password | Certificate import password | -| `apple.developer.credentials` | Login | APPLE_ID, APPLE_TEAM_ID, APPLE_APP_SPECIFIC_PASSWORD | - -## 5. Step-by-Step Implementation Sequence - -### Phase A: Infrastructure Setup (No Code Signing) - -| Step | Action | Purpose | Deployable? 
| -|------|--------|---------|-------------| -| A1 | Create `terraphim/homebrew-terraphim` repository on GitHub | Establish tap location | Yes | -| A2 | Add initial `Formula/terraphim-server.rb` with source build | Basic formula structure | Yes, but builds from source | -| A3 | Add initial `Formula/terraphim-agent.rb` with source build | Basic formula structure | Yes, but builds from source | -| A4 | Test `brew tap terraphim/terraphim && brew install terraphim-server` | Verify tap works | Yes | -| A5 | Add `create-universal-macos` job to `release-comprehensive.yml` | Create universal binaries | Yes, produces unsigned universals | -| A6 | Update formulas to use pre-built universal binaries (unsigned) | Faster installation | Yes, Gatekeeper warnings expected | - -### Phase B: Code Signing Pipeline - -| Step | Action | Purpose | Deployable? | -|------|--------|---------|-------------| -| B1 | Store Apple Developer ID certificate in 1Password | Secure credential storage | N/A | -| B2 | Store Apple credentials (ID, Team ID, App Password) in 1Password | Notarization auth | N/A | -| B3 | Create `scripts/sign-macos-binary.sh` | Reusable signing logic | N/A (script only) | -| B4 | Add `sign-and-notarize-macos` job to workflow | Integrate signing into CI | Yes | -| B5 | Test signing with manual workflow dispatch | Verify signing works | Yes, test release only | -| B6 | Verify notarization status with `spctl` | Confirm Gatekeeper approval | Yes | - -### Phase C: Homebrew Automation - -| Step | Action | Purpose | Deployable? 
| -|------|--------|---------|-------------| -| C1 | Add GitHub PAT for homebrew-terraphim repo access | Cross-repo commits | N/A | -| C2 | Create `update-homebrew-tap` job in workflow | Automate formula updates | Yes | -| C3 | Modify `scripts/update-homebrew-formula.sh` for macOS | Handle universal binary URLs | Yes | -| C4 | Test full release cycle with tag push | End-to-end verification | Yes | -| C5 | Document installation in README | User documentation | Yes | - -### Phase D: Cleanup and Polish - -| Step | Action | Purpose | Deployable? | -|------|--------|---------|-------------| -| D1 | Remove placeholder `update-homebrew` step from workflow | Clean up dead code | Yes | -| D2 | Archive old `homebrew-formulas/` directory | Consolidate to tap | Yes | -| D3 | Add Homebrew badge to README | Discoverability | Yes | -| D4 | Create release checklist documentation | Operational runbook | Yes | - -## 6. Testing & Verification Strategy - -| Acceptance Criteria | Test Type | Test Location/Method | -|---------------------|-----------|---------------------| -| AC1: Intel Mac install | Manual E2E | Run on Intel Mac hardware | -| AC2: Apple Silicon install | Manual E2E | Run on M1/M2/M3 Mac hardware | -| AC3: No Gatekeeper warning | Manual E2E | First launch after install | -| AC4: Universal binary | Integration | `file` command in workflow | -| AC5: Workflow completion | Integration | GitHub Actions status | -| AC6: SHA256 match | Integration | Workflow checksum step | -| AC7: Upgrade works | Manual E2E | Version bump and upgrade test | - -### Automated Verification Steps (in workflow) - -```yaml -# Verify universal binary -- name: Verify universal binary - run: | - file artifacts/terraphim_server-universal-apple-darwin | grep -q "universal binary" - -# Verify signature -- name: Verify code signature - run: | - codesign --verify --deep --strict artifacts/terraphim_server-universal-apple-darwin - -# Verify notarization -- name: Verify notarization - run: | - spctl --assess 
--type execute artifacts/terraphim_server-universal-apple-darwin -``` - -## 7. Risk & Complexity Review - -| Risk (from Phase 1) | Mitigation in Design | Residual Risk | -|---------------------|---------------------|---------------| -| Notarization fails for Rust binaries | Test with simple binary in Phase B5; check entitlements | May need `--options runtime` or entitlements.plist | -| Self-hosted runner unavailable | Document manual release procedure; alert on runner offline | Manual intervention required if runner down | -| Cross-compilation fails for arm64 | Existing workflow already builds aarch64 successfully | Low - already working | -| Certificate expiration | Add 1Password expiry monitoring; document renewal | Requires annual renewal attention | -| Homebrew tap push fails | Use dedicated GitHub PAT with repo scope; test in Phase C4 | May need org admin for initial setup | - -### New Risks Identified - -| Risk | Likelihood | Impact | Mitigation | -|------|------------|--------|------------| -| Apple notarization service unavailable | Low | Medium | Add retry logic with exponential backoff | -| 1Password CLI rate limiting | Low | Low | Cache credentials within job | -| Formula syntax errors | Medium | Low | Test formula locally before push | -| Universal binary size too large | Low | Low | Acceptable tradeoff for compatibility | - -## 8. 
Confirmed Decisions - -### Decisions Made (2024-12-20) - -| Decision | Choice | Rationale | -|----------|--------|-----------| -| Homebrew tap repository | `terraphim/homebrew-terraphim` | Follows Homebrew conventions | -| Formula organization | Separate formulas per binary | User preference for granularity | -| Signing scope | All GitHub Release binaries | Consistency across distribution channels | -| ARM runner availability | `[self-hosted, macOS, ARM64]` M3 Pro | Native arm64 builds, no cross-compilation needed | - -### Runner Configuration - -**Available self-hosted macOS runners:** - -| Runner Label | Architecture | Use Case | -|--------------|--------------|----------| -| `[self-hosted, macOS, X64]` | Intel x86_64 | Build x86_64 binaries natively | -| `[self-hosted, macOS, ARM64]` | Apple Silicon M3 Pro | Build arm64 binaries natively | - -**Updated build strategy:** Build each architecture on native hardware (no cross-compilation), then combine with `lipo` on either runner. - -### Remaining Setup Required - -1. **Apple Developer Program enrollment** - See `.docs/guide-apple-developer-setup.md` -2. **1Password credential storage** - After enrollment, store in `TerraphimPlatform` vault -3. **GitHub PAT for tap repo** - Create token with `repo` scope after tap creation - ---- - -## Appendix: Formula Template - -```ruby -# Formula/terraphim-server.rb -class TerraphimServer < Formula - desc "Privacy-first AI assistant HTTP server with semantic search" - homepage "https://github.com/terraphim/terraphim-ai" - version "VERSION_PLACEHOLDER" - license "Apache-2.0" - - on_macos do - if Hardware::CPU.arm? 
- url "https://github.com/terraphim/terraphim-ai/releases/download/vVERSION_PLACEHOLDER/terraphim_server-universal-apple-darwin" - else - url "https://github.com/terraphim/terraphim-ai/releases/download/vVERSION_PLACEHOLDER/terraphim_server-universal-apple-darwin" - end - sha256 "SHA256_PLACEHOLDER" - end - - on_linux do - url "https://github.com/terraphim/terraphim-ai/releases/download/vVERSION_PLACEHOLDER/terraphim_server-x86_64-unknown-linux-gnu" - sha256 "LINUX_SHA256_PLACEHOLDER" - end - - def install - bin.install "terraphim_server-universal-apple-darwin" => "terraphim_server" if OS.mac? - bin.install "terraphim_server-x86_64-unknown-linux-gnu" => "terraphim_server" if OS.linux? - end - - service do - run opt_bin/"terraphim_server" - keep_alive true - log_path var/"log/terraphim-server.log" - error_log_path var/"log/terraphim-server-error.log" - end - - test do - assert_match "terraphim", shell_output("#{bin}/terraphim_server --version") - end -end -``` - ---- - -**Do you approve this plan as-is, or would you like to adjust any part?** diff --git a/.docs/design-phase2-server-api-testing.md b/.docs/design-phase2-server-api-testing.md deleted file mode 100644 index 891ee8945..000000000 --- a/.docs/design-phase2-server-api-testing.md +++ /dev/null @@ -1,1151 +0,0 @@ -# Terraphim AI Server API Testing Framework Design - -## Overview - -This document outlines a comprehensive testing framework for the Terraphim AI server API to ensure robust release validation. The framework covers all HTTP endpoints, providing systematic testing for functionality, performance, and security. 
- -## Server API Testing Strategy - -### API Endpoint Coverage - -Based on the current server implementation (`terraphim_server/src/api.rs`), the following endpoints require comprehensive testing: - -#### Core System Endpoints -- `GET /health` - Health check endpoint -- `GET /config` - Fetch current configuration -- `POST /config` - Update configuration -- `GET /config/schema` - Get configuration JSON schema -- `POST /config/selected_role` - Update selected role - -#### Document Management Endpoints -- `POST /documents` - Create new document -- `GET /documents/search` - Search documents (GET method) -- `POST /documents/search` - Search documents (POST method) -- `POST /documents/summarize` - Generate document summary -- `POST /documents/async_summarize` - Async document summarization -- `POST /summarization/batch` - Batch document summarization - -#### Summarization Queue Management -- `GET /summarization/status` - Check summarization capabilities -- `GET /summarization/queue/stats` - Queue statistics -- `GET /summarization/task/{task_id}/status` - Task status -- `POST /summarization/task/{task_id}/cancel` - Cancel task - -#### Knowledge Graph & Role Management -- `GET /rolegraph` - Get role graph visualization -- `GET /roles/{role_name}/kg_search` - Search knowledge graph terms -- `GET /thesaurus/{role_name}` - Get role thesaurus -- `GET /autocomplete/{role_name}/{query}` - FST-based autocomplete - -#### LLM & Chat Features -- `POST /chat` - Chat completion with LLM -- `GET /openrouter/models` - List OpenRouter models (if feature enabled) - -#### Conversation Management -- `POST /conversations` - Create conversation -- `GET /conversations` - List conversations -- `GET /conversations/{id}` - Get specific conversation -- `POST /conversations/{id}/messages` - Add message -- `POST /conversations/{id}/context` - Add context -- `POST /conversations/{id}/search-context` - Add search results as context -- `PUT /conversations/{id}/context/{context_id}` - Update context -- 
`DELETE /conversations/{id}/context/{context_id}` - Delete context - -#### Workflow Management (Advanced) -- Various workflow endpoints via `workflows::create_router()` - -### Test Categories - -#### 1. Unit Tests -- **Purpose**: Test individual functions in isolation -- **Scope**: Request parsing, response formatting, validation logic -- **Implementation**: Direct function calls with mocked dependencies - -#### 2. Integration Tests -- **Purpose**: Test endpoint functionality with real dependencies -- **Scope**: HTTP request/response cycle, database interactions -- **Implementation**: Test server with actual storage backends - -#### 3. End-to-End Tests -- **Purpose**: Test complete user workflows -- **Scope**: Multi-step operations, cross-feature interactions -- **Implementation**: Browser automation or API sequence testing - -#### 4. Performance Tests -- **Purpose**: Validate performance under load -- **Scope**: Response times, concurrent requests, memory usage -- **Implementation**: Load testing with configurable concurrency - -#### 5. Security Tests -- **Purpose**: Validate security measures -- **Scope**: Input validation, authentication, rate limiting -- **Implementation**: Malicious input testing, penetration testing - -### Test Environment Setup - -#### Local Testing Environment -```bash -# Development server with test configuration -cargo run -p terraphim_server -- --role test --config test_config.json - -# Test database setup -export TEST_DB_PATH="/tmp/terraphim_test" -mkdir -p $TEST_DB_PATH -``` - -#### Containerized Testing -```dockerfile -# Dockerfile.test -FROM rust:1.70 -WORKDIR /app -COPY . . 
-RUN cargo build --release -EXPOSE 8080 -CMD ["./target/release/terraphim_server", "--role", "test"] -``` - -#### CI/CD Integration -```yaml -# .github/workflows/api-tests.yml -name: API Tests -on: [push, pull_request] -jobs: - api-tests: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v3 - - name: Run API Tests - run: cargo test -p terraphim_server --test api_test_suite -``` - -### Mock Server Strategy - -#### External Service Mocking -- **OpenRouter API**: Mock for chat completion and model listing -- **File System**: In-memory file system for document testing -- **Database**: SQLite in-memory for isolated tests -- **Network Services**: Mock HTTP servers for external integrations - -#### Mock Implementation -```rust -// Mock LLM client for testing -pub struct MockLLMClient { - responses: HashMap<String, String>, -} - -impl MockLLMClient { - pub fn new() -> Self { - Self { - responses: HashMap::new(), - } - } - - pub fn add_response(&mut self, input_pattern: &str, response: &str) { - self.responses.insert(input_pattern.to_string(), response.to_string()); - } -} -``` - -### Data Validation - -#### Input Validation -- **Document Creation**: Validate required fields, content formats -- **Search Queries**: Validate query parameters, role names -- **Configuration**: Validate configuration schema compliance -- **Chat Messages**: Validate message formats, role assignments - -#### Output Validation -- **Response Schema**: Verify JSON structure compliance -- **Data Types**: Validate field types and formats -- **Status Codes**: Ensure appropriate HTTP status codes -- **Error Messages**: Validate error response formats - -#### Error Handling Tests -- **Missing Required Fields**: 400 Bad Request responses -- **Invalid Role Names**: 404 Not Found responses -- **Malformed JSON**: 400 Bad Request responses -- **Service Unavailability**: 503 Service Unavailable responses - -### Performance Testing - -#### Load Testing Scenarios -- **Concurrent Search**: 100 simultaneous search 
requests -- **Document Creation**: Batch document creation performance -- **Chat Completions**: LLM request handling under load -- **Configuration Updates**: Concurrent config modification testing - -#### Response Time Validation -```rust -// Performance benchmarks -const MAX_RESPONSE_TIME_MS: u64 = 1000; // 1 second for most endpoints -const SEARCH_TIMEOUT_MS: u64 = 5000; // 5 seconds for complex searches -const LLM_TIMEOUT_MS: u64 = 30000; // 30 seconds for LLM calls -``` - -#### Memory Usage Testing -- **Memory Leaks**: Monitor memory usage during extended tests -- **Document Storage**: Validate memory usage with large documents -- **Caching**: Test cache efficiency and memory management -- **Concurrent Load**: Memory usage under high concurrency - -### Security Testing - -#### Authentication & Authorization -- **Role-Based Access**: Test role-based functionality restrictions -- **API Key Validation**: Validate OpenRouter API key handling -- **Configuration Security**: Test sensitive configuration exposure - -#### Input Sanitization -- **SQL Injection**: Test for SQL injection vulnerabilities -- **XSS Prevention**: Validate input sanitization for web interfaces -- **Path Traversal**: Test file system access restrictions -- **Command Injection**: Validate command execution security - -#### Rate Limiting -- **Request Rate Limits**: Test rate limiting implementation -- **DDoS Protection**: Validate denial of service protection -- **Resource Limits**: Test resource usage restrictions - -## Implementation Plan - -### Step 1: Create Test Server Harness - -#### Test Server Infrastructure -```rust -// terraphim_server/tests/test_harness.rs -pub struct TestServer { - server: axum::Router, - client: reqwest::Client, - base_url: String, -} - -impl TestServer { - pub async fn new() -> Self { - let router = terraphim_server::build_router_for_tests().await; - let addr = "127.0.0.1:0".parse().unwrap(); - let listener = tokio::net::TcpListener::bind(addr).await.unwrap(); - let 
port = listener.local_addr().unwrap().port(); - - tokio::spawn(axum::serve(listener, router.clone())); - - Self { - server: router, - client: reqwest::Client::new(), - base_url: format!("http://127.0.0.1:{}", port), - } - } - - pub async fn get(&self, path: &str) -> reqwest::Response { - self.client.get(&format!("{}{}", self.base_url, path)) - .send().await.unwrap() - } - - pub async fn post<T: serde::Serialize>(&self, path: &str, body: &T) -> reqwest::Response { - self.client.post(&format!("{}{}", self.base_url, path)) - .json(body) - .send().await.unwrap() - } -} -``` - -#### Test Data Management -```rust -// terraphim_server/tests/fixtures.rs -pub struct TestFixtures { - documents: Vec<Document>, - roles: HashMap<String, Role>, -} - -impl TestFixtures { - pub fn sample_document() -> Document { - Document { - id: "test-doc-1".to_string(), - url: "file:///test/doc1.md".to_string(), - title: "Test Document".to_string(), - body: "# Test Document\n\nThis is a test document for API validation.".to_string(), - description: Some("A test document for validation".to_string()), - summarization: None, - stub: None, - tags: Some(vec!["test".to_string(), "api".to_string()]), - rank: Some(1.0), - source_haystack: None, - } - } - - pub fn test_role() -> Role { - Role { - name: RoleName::new("TestRole"), - shortname: Some("test".to_string()), - relevance_function: RelevanceFunction::TitleScorer, - theme: "default".to_string(), - kg: None, - haystacks: vec![], - terraphim_it: false, - ..Default::default() - } - } -} -``` - -#### Request/Response Validation Framework -```rust -// terraphim_server/tests/validation.rs -pub trait ResponseValidator { - fn validate_status(&self, expected: StatusCode) -> &Self; - fn validate_json_schema<T: serde::de::DeserializeOwned>(&self) -> T; - fn validate_error_response(&self) -> Option<String>; -} - -impl ResponseValidator for reqwest::Response { - fn validate_status(&self, expected: StatusCode) -> &Self { - assert_eq!(self.status(), expected, "Expected status {}, got {}", expected, self.status()); - self - } - - fn validate_json_schema<T: serde::de::DeserializeOwned>(&self) -> T { - self.json().await.unwrap_or_else(|e| { - panic!("Failed to parse JSON response: {}", e); - }) - } - - fn validate_error_response(&self) -> Option<String> { - if !self.status().is_success() { - Some(self.text().await.unwrap_or_default()) - } else { - None - } - } -} -``` - -### Step 2: Implement API Endpoint Tests - -#### Health Check Tests -```rust -// terraphim_server/tests/health_tests.rs -#[tokio::test] -async fn test_health_check() { - let server = TestServer::new().await; - - let response = server.get("/health").await; - - response.validate_status(StatusCode::OK); - - let body = response.text().await.unwrap(); - assert_eq!(body, "OK"); -} -``` - -#### Document Management Tests -```rust -// terraphim_server/tests/document_tests.rs -#[tokio::test] -async fn test_create_document() { - let server = TestServer::new().await; - let document = TestFixtures::sample_document(); - - let response = server.post("/documents", &document).await; - - response.validate_status(StatusCode::OK); - - let create_response: CreateDocumentResponse = response.validate_json_schema(); - assert_eq!(create_response.status, Status::Success); - assert!(!create_response.id.is_empty()); -} - -#[tokio::test] -async fn test_search_documents_get() { - let server = TestServer::new().await; - let query = SearchQuery { - query: "test".to_string(), - role: None, - limit: Some(10), - offset: Some(0), - }; - - let response = server.get(&format!("/documents/search?query={}&limit={}&offset={}", - query.query, query.limit.unwrap(), query.offset.unwrap())).await; - - response.validate_status(StatusCode::OK); - - let search_response: SearchResponse = response.validate_json_schema(); - assert_eq!(search_response.status, Status::Success); -} - -#[tokio::test] -async fn test_search_documents_post() { - let server = TestServer::new().await; - let query = SearchQuery { - query: "test".to_string(), - role: None, - limit: Some(10), - offset: Some(0), - }; - - let response = 
server.post("/documents/search", &query).await; - - response.validate_status(StatusCode::OK); - - let search_response: SearchResponse = response.validate_json_schema(); - assert_eq!(search_response.status, Status::Success); -} -``` - -#### Configuration Management Tests -```rust -// terraphim_server/tests/config_tests.rs -#[tokio::test] -async fn test_get_config() { - let server = TestServer::new().await; - - let response = server.get("/config").await; - - response.validate_status(StatusCode::OK); - - let config_response: ConfigResponse = response.validate_json_schema(); - assert_eq!(config_response.status, Status::Success); -} - -#[tokio::test] -async fn test_update_config() { - let server = TestServer::new().await; - let mut config = TestFixtures::test_config(); - config.global_shortcut = "Ctrl+Shift+T".to_string(); - - let response = server.post("/config", &config).await; - - response.validate_status(StatusCode::OK); - - let config_response: ConfigResponse = response.validate_json_schema(); - assert_eq!(config_response.status, Status::Success); - assert_eq!(config_response.config.global_shortcut, "Ctrl+Shift+T"); -} -``` - -#### Summarization Tests -```rust -// terraphim_server/tests/summarization_tests.rs -#[tokio::test] -async fn test_summarize_document() { - let server = TestServer::new().await; - let request = SummarizeDocumentRequest { - document_id: "test-doc-1".to_string(), - role: "TestRole".to_string(), - max_length: Some(250), - force_regenerate: Some(true), - }; - - let response = server.post("/documents/summarize", &request).await; - - // Check if OpenRouter feature is enabled - if cfg!(feature = "openrouter") { - response.validate_status(StatusCode::OK); - let summary_response: SummarizeDocumentResponse = response.validate_json_schema(); - assert_eq!(summary_response.status, Status::Success); - assert!(summary_response.summary.is_some()); - } else { - response.validate_status(StatusCode::OK); - let summary_response: SummarizeDocumentResponse = 
response.validate_json_schema(); - assert_eq!(summary_response.status, Status::Error); - assert!(summary_response.error.unwrap().contains("OpenRouter feature not enabled")); - } -} - -#[tokio::test] -async fn test_async_summarize_document() { - let server = TestServer::new().await; - let request = AsyncSummarizeRequest { - document_id: "test-doc-1".to_string(), - role: "TestRole".to_string(), - priority: Some("normal".to_string()), - max_length: Some(250), - force_regenerate: Some(true), - callback_url: None, - }; - - let response = server.post("/documents/async_summarize", &request).await; - - response.validate_status(StatusCode::OK); - - let async_response: AsyncSummarizeResponse = response.validate_json_schema(); - assert!(matches!(async_response.status, Status::Success | Status::Error)); -} -``` - -#### LLM Chat Tests -```rust -// terraphim_server/tests/chat_tests.rs -#[tokio::test] -async fn test_chat_completion() { - let server = TestServer::new().await; - let request = ChatRequest { - role: "TestRole".to_string(), - messages: vec![ - ChatMessage { - role: "user".to_string(), - content: "Hello, can you help me with testing?".to_string(), - } - ], - model: None, - conversation_id: None, - max_tokens: Some(100), - temperature: Some(0.7), - }; - - let response = server.post("/chat", &request).await; - - response.validate_status(StatusCode::OK); - - let chat_response: ChatResponse = response.validate_json_schema(); - - // Response may be successful or error depending on LLM configuration - match chat_response.status { - Status::Success => { - assert!(chat_response.message.is_some()); - assert!(chat_response.model_used.is_some()); - } - Status::Error => { - assert!(chat_response.error.is_some()); - } - _ => panic!("Unexpected status: {:?}", chat_response.status), - } -} -``` - -### Step 3: Add Integration Test Scenarios - -#### Multi-Server Communication Tests -```rust -// terraphim_server/tests/integration/multi_server_tests.rs -#[tokio::test] -async fn 
test_cross_server_document_sync() { - let server1 = TestServer::new().await; - let server2 = TestServer::new().await; - - // Create document on server 1 - let document = TestFixtures::sample_document(); - let response1 = server1.post("/documents", &document).await; - let create_response: CreateDocumentResponse = response1.validate_json_schema(); - - // Verify document exists on server 2 (if sharing is enabled) - let response2 = server2.get(&format!("/documents/search?query={}", document.id)).await; - let search_response: SearchResponse = response2.validate_json_schema(); - - assert_eq!(search_response.status, Status::Success); - assert!(search_response.results.iter().any(|d| d.id == document.id)); -} -``` - -#### Database Integration Tests -```rust -// terraphim_server/tests/integration/database_tests.rs -#[tokio::test] -async fn test_persistence_integration() { - let server = TestServer::new().await; - - // Create document - let document = TestFixtures::sample_document(); - let response = server.post("/documents", &document).await; - let create_response: CreateDocumentResponse = response.validate_json_schema(); - - // Restart server (simulate crash recovery) - drop(server); - let server = TestServer::new().await; - - // Verify document persistence - let response = server.get(&format!("/documents/search?query={}", document.id)).await; - let search_response: SearchResponse = response.validate_json_schema(); - - assert_eq!(search_response.status, Status::Success); - assert!(search_response.results.iter().any(|d| d.id == document.id)); -} -``` - -#### External API Integration Tests -```rust -// terraphim_server/tests/integration/external_api_tests.rs -#[tokio::test] -#[cfg(feature = "openrouter")] -async fn test_openrouter_integration() { - let server = TestServer::new().await; - - // Test model listing - let request = OpenRouterModelsRequest { - role: "TestRole".to_string(), - api_key: None, // Use environment variable - }; - - let response = 
server.post("/openrouter/models", &request).await; - - if std::env::var("OPENROUTER_KEY").is_ok() { - response.validate_status(StatusCode::OK); - let models_response: OpenRouterModelsResponse = response.validate_json_schema(); - assert_eq!(models_response.status, Status::Success); - assert!(!models_response.models.is_empty()); - } else { - response.validate_status(StatusCode::OK); - let models_response: OpenRouterModelsResponse = response.validate_json_schema(); - assert_eq!(models_response.status, Status::Error); - assert!(models_response.error.unwrap().contains("OpenRouter API key")); - } -} -``` - -### Step 4: Performance and Load Testing - -#### Concurrent Request Testing -```rust -// terraphim_server/tests/performance/concurrent_tests.rs -#[tokio::test] -async fn test_concurrent_search_requests() { - let server = TestServer::new().await; - let client = reqwest::Client::new(); - - let mut handles = Vec::new(); - - // Spawn 100 concurrent search requests - for i in 0..100 { - let client = client.clone(); - let base_url = server.base_url.clone(); - - let handle = tokio::spawn(async move { - let start = std::time::Instant::now(); - - let response = client - .get(&format!("{}/documents/search?query=test{}", base_url, i)) - .send() - .await - .unwrap(); - - let duration = start.elapsed(); - - assert_eq!(response.status(), StatusCode::OK); - - duration - }); - - handles.push(handle); - } - - // Wait for all requests and collect response times - let durations: Vec<_> = futures::future::join_all(handles) - .await - .into_iter() - .collect::<Result<Vec<_>, _>>() - .unwrap(); - - // Validate performance requirements - let avg_duration = durations.iter().sum::<std::time::Duration>() / durations.len() as u32; - assert!(avg_duration < std::time::Duration::from_millis(1000), - "Average response time {} exceeds 1000ms", avg_duration.as_millis()); - - let max_duration = *durations.iter().max().unwrap(); - assert!(max_duration < std::time::Duration::from_millis(5000), - "Maximum response time {} exceeds 5000ms", 
        max_duration.as_millis());
-}
-```
-
-#### Memory Usage Testing
-```rust
-// terraphim_server/tests/performance/memory_tests.rs
-#[tokio::test]
-async fn test_memory_usage_under_load() {
-    let server = TestServer::new().await;
-
-    // Get initial memory usage
-    let initial_memory = get_memory_usage();
-
-    // Create many documents
-    for i in 0..1000 {
-        let mut document = TestFixtures::sample_document();
-        document.id = format!("test-doc-{}", i);
-        document.title = format!("Test Document {}", i);
-        document.body = format!("Content for document {}", i);
-
-        let response = server.post("/documents", &document).await;
-        response.validate_status(StatusCode::OK);
-    }
-
-    // Perform many searches
-    for i in 0..1000 {
-        let response = server.get(&format!("/documents/search?query=test-doc-{}", i)).await;
-        response.validate_status(StatusCode::OK);
-    }
-
-    // Check memory usage after operations
-    let final_memory = get_memory_usage();
-    // saturating_sub avoids a usize underflow panic if usage shrank
-    let memory_increase = final_memory.saturating_sub(initial_memory);
-
-    // Memory increase should be reasonable (less than 100MB)
-    assert!(memory_increase < 100 * 1024 * 1024,
-        "Memory increase {} bytes exceeds 100MB limit", memory_increase);
-}
-
-fn get_memory_usage() -> usize {
-    // Linux-only sketch: resident set size from /proc/self/statm, whose
-    // second whitespace-separated field is resident pages. Other platforms
-    // need their own APIs; we fall back to 0 there.
-    std::fs::read_to_string("/proc/self/statm")
-        .ok()
-        .and_then(|statm| statm.split_whitespace().nth(1)?.parse::<usize>().ok())
-        .map(|pages| pages * 4096) // assumes 4 KiB pages
-        .unwrap_or(0)
-}
-```
-
-#### Large Dataset Processing
-```rust
-// terraphim_server/tests/performance/large_dataset_tests.rs
-#[tokio::test]
-async fn test_large_document_processing() {
-    let server = TestServer::new().await;
-
-    // Create a large document (1MB)
-    let mut large_content = String::new();
-    for i in 0..10000 {
-        large_content.push_str(&format!("Line {}: This is a large document for performance testing.\n", i));
-    }
-
-    let large_document = Document {
-        id: "large-doc-1".to_string(),
-        url: "file:///test/large.md".to_string(),
-        title: "Large Test Document".to_string(),
-        body: large_content,
-        description: Some("A large document for performance testing".to_string()),
-        summarization: None,
-        stub: None,
-        tags: Some(vec!["large".to_string(), "test".to_string()]),
-        rank: Some(1.0),
-        source_haystack: None,
-    };
-
-    // Test creation of large document
-    let start = std::time::Instant::now();
-    let response = server.post("/documents", &large_document).await;
-    let creation_time = start.elapsed();
-
-    response.validate_status(StatusCode::OK);
-    assert!(creation_time < std::time::Duration::from_secs(5),
-        "Large document creation took {} seconds", creation_time.as_secs());
-
-    // Test searching for large document
-    let start = std::time::Instant::now();
-    let response = server.get("/documents/search?query=large").await;
-    let search_time = start.elapsed();
-
-    response.validate_status(StatusCode::OK);
-    assert!(search_time < std::time::Duration::from_secs(3),
-        "Large document search took {} seconds", search_time.as_secs());
-}
-```
-
-## Test Cases
-
-### Happy Path Tests
-
-#### Document Creation Success
-```rust
-#[tokio::test]
-async fn test_create_document_success() {
-    let server = TestServer::new().await;
-    let document = TestFixtures::sample_document();
-
-    let response = server.post("/documents", &document).await;
-
-    response.validate_status(StatusCode::OK);
-
-    let create_response: CreateDocumentResponse = response.validate_json_schema();
-    assert_eq!(create_response.status, Status::Success);
-    assert!(!create_response.id.is_empty());
-}
-```
-
-#### Search Query Success
-```rust
-#[tokio::test]
-async fn test_search_query_success() {
-    let server = TestServer::new().await;
-
-    // First create a document
-    let document = TestFixtures::sample_document();
-    server.post("/documents", &document).await.validate_status(StatusCode::OK);
-
-    // Then search for it
-    let response = server.get("/documents/search?query=Test").await;
-
-    response.validate_status(StatusCode::OK);
-
-    let search_response: SearchResponse = response.validate_json_schema();
-    assert_eq!(search_response.status, Status::Success);
-    assert!(!search_response.results.is_empty());
-    assert!(search_response.results.iter().any(|d| d.title.contains("Test")));
-}
-```
-
-### Error Handling Tests
-
-#### Missing Required Fields
-```rust
-#[tokio::test]
-async fn test_create_document_missing_required_fields() {
-    let server = TestServer::new().await;
-
-    let mut incomplete_document = TestFixtures::sample_document();
-    incomplete_document.id = "".to_string(); // Missing required ID
-
-    let response = server.post("/documents", &incomplete_document).await;
-
-    response.validate_status(StatusCode::BAD_REQUEST);
-
-    let error_text = response.text().await.unwrap();
-    assert!(error_text.contains("error") || error_text.contains("invalid"));
-}
-```
-
-#### Invalid Role Names
-```rust
-#[tokio::test]
-async fn test_invalid_role_name() {
-    let server = TestServer::new().await;
-
-    let response = server.get("/thesaurus/NonExistentRole").await;
-
-    response.validate_status(StatusCode::NOT_FOUND);
-
-    let thesaurus_response: ThesaurusResponse = response.validate_json_schema();
-    assert_eq!(thesaurus_response.status, Status::Error);
-    assert!(thesaurus_response.error.unwrap().contains("not found"));
-}
-```
-
-#### Malformed JSON
-```rust
-#[tokio::test]
-async fn test_malformed_json_request() {
-    let server = TestServer::new().await;
-    let client = reqwest::Client::new();
-
-    let response = client
-        .post(&format!("{}/documents", server.base_url))
-        .header("Content-Type", "application/json")
-        .body("{ invalid json }")
-        .send()
-        .await
-        .unwrap();
-
-    // Raw reqwest::Response here, not the TestServer wrapper, so assert on the status directly
-    assert_eq!(response.status(), StatusCode::BAD_REQUEST);
-}
-```
-
-### Edge Case Tests
-
-#### Boundary Conditions
-```rust
-#[tokio::test]
-async fn test_empty_search_query() {
-    let server = TestServer::new().await;
-
-    let response = server.get("/documents/search?query=").await;
-
-    // Should handle empty query gracefully
-    response.validate_status(StatusCode::OK);
-
-    let search_response: SearchResponse = response.validate_json_schema();
-    assert_eq!(search_response.status, Status::Success);
-}
-```
-
-#### Special Characters
-```rust
-#[tokio::test]
-async fn test_search_with_special_characters() {
-    let server = TestServer::new().await;
-
-    let special_chars = "!@#$%^&*()_+-=[]{}|;':\",./<>?";
-    let response = server.get(&format!("/documents/search?query={}",
-        urlencoding::encode(special_chars))).await;
-
-    response.validate_status(StatusCode::OK);
-
-    let search_response: SearchResponse = response.validate_json_schema();
-    assert_eq!(search_response.status, Status::Success);
-}
-```
-
-#### Maximum Length Values
-```rust
-#[tokio::test]
-async fn test_maximum_document_length() {
-    let server = TestServer::new().await;
-
-    let mut large_document = TestFixtures::sample_document();
-    // Create a document with maximum reasonable size
-    large_document.body = "x".repeat(1_000_000); // 1MB document
-
-    let response = server.post("/documents", &large_document).await;
-
-    // Should either succeed or fail gracefully
-    match response.status() {
-        StatusCode::OK => {
-            let create_response: CreateDocumentResponse = response.validate_json_schema();
-            assert_eq!(create_response.status, Status::Success);
-        }
-        StatusCode::BAD_REQUEST => {
-            // Should fail with a clear error message
-            let error_text = response.text().await.unwrap();
-            assert!(error_text.contains("too large") || error_text.contains("limit"));
-        }
-        _ => panic!("Unexpected status code: {}", response.status()),
-    }
-}
-```
-
-### Security Tests
-
-#### SQL Injection Prevention
-```rust
-#[tokio::test]
-async fn test_sql_injection_prevention() {
-    let server = TestServer::new().await;
-
-    let malicious_query = "'; DROP TABLE documents; --";
-    let response = server.get(&format!("/documents/search?query={}",
-        urlencoding::encode(malicious_query))).await;
-
-    // Should handle malicious input safely
-    response.validate_status(StatusCode::OK);
-
-    let search_response: SearchResponse = response.validate_json_schema();
-    assert_eq!(search_response.status, Status::Success);
-
-    // Verify no documents were actually deleted
-    let normal_response = server.get("/documents/search?query=test").await;
-    normal_response.validate_status(StatusCode::OK);
-}
-```
-
-#### XSS Prevention
-```rust
-#[tokio::test]
-async fn test_xss_prevention() {
-    let server = TestServer::new().await;
-
-    let mut malicious_document = TestFixtures::sample_document();
-    malicious_document.title = "<script>alert('xss')</script>".to_string();
-    malicious_document.body = "Document content with <script>alert('xss')</script> malicious content".to_string();
-
-    let response = server.post("/documents", &malicious_document).await;
-
-    response.validate_status(StatusCode::OK);
-
-    let create_response: CreateDocumentResponse = response.validate_json_schema();
-    assert_eq!(create_response.status, Status::Success);
-
-    // Search for the document and verify XSS is sanitized
-    let search_response = server.get(&format!("/documents/search?query={}",
-        urlencoding::encode(&malicious_document.title))).await;
-
-    search_response.validate_status(StatusCode::OK);
-
-    let search_result: SearchResponse = search_response.validate_json_schema();
-
-    // Check that script tags are properly escaped or removed
-    if let Some(found_doc) = search_result.results.first() {
-        assert!(!found_doc.title.contains("<script>"));
-        assert!(!found_doc.body.contains("<script>"));
-    }
-}
-```
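
The XSS check above expects markup to come back escaped or stripped. As an illustration only, here is a hypothetical `escape_html` helper (not the server's actual sanitizer) showing the kind of entity escaping that would satisfy those assertions:

```rust
// Hypothetical escaping helper for illustration; the real server-side
// sanitizer may strip tags instead of escaping them.
fn escape_html(input: &str) -> String {
    input
        .replace('&', "&amp;") // '&' first, so the entities below aren't double-escaped
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

fn main() {
    let escaped = escape_html("<script>alert('xss')</script>");
    // The escaped form no longer contains a literal <script> tag
    assert!(!escaped.contains("<script>"));
    println!("{}", escaped); // &lt;script&gt;alert('xss')&lt;/script&gt;
}
```

If the server strips tags rather than escaping them, the test's `contains("<script>")` assertions still hold, so either sanitization strategy passes.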