KITTY Project Status Report

Document Type: Comprehensive Project Status Report
Generated: January 2026
Project: KITTY (Knowledgeable Intelligent Tool-using Tabletop Yoda)


Executive Summary

KITTY is an offline-first, voice-enabled fabrication lab orchestrator running on a Mac Studio M3 Ultra. The system integrates local AI inference with 3D printing, CNC control, and smart home automation.

Project Health Overview

| Dimension | Status | Confidence |
|---|---|---|
| Core Query/Chat Pipeline | ✅ Production-Ready | High |
| Fabrication Workflow | ⚠️ Partially Complete | Medium |
| Multi-Model Collective | 🔄 Experimental | Low-Medium |
| Voice Interface | ⚠️ Functional but Undocumented | Medium |
| Research Graph | 🔄 In Development | Low |
| Infrastructure | ✅ Solid Foundation | High |
| UI/UX | ✅ Well-Architected | High |
| Testing | ⚠️ Gaps in E2E/UI | Medium |

1. Architecture Overview

1.1 Service Topology

KITTY operates as a microservices architecture with 15+ independent services:

┌──────────────────────────────────────────────────────────────────────┐
│                         CLIENT LAYER                                  │
├──────────────┬──────────────┬─────────────────┬─────────────────────┤
│  React UI    │  CLI/TUI     │  Voice Service  │   kitty-code TUI    │
│  (4173)      │  (kitty-cli) │  (8400/8550)    │   (Textual)         │
└──────────────┴──────────────┴─────────────────┴─────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────────┐
│                      GATEWAY LAYER (HAProxy :8080)                    │
│  • 3 gateway replicas, round-robin                                    │
│  • WebSocket sticky sessions, SSE streaming                           │
│  • Stats dashboard on :8404                                           │
└──────────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────────┐
│                      ORCHESTRATION LAYER                              │
├─────────────────────────────┬────────────────────────────────────────┤
│    Brain Service (:8000)    │  • LLM routing (confidence-based)      │
│                             │  • ReAct agent (max 10 iterations)     │
│                             │  • Tool execution via MCP              │
│                             │  • Research graph (LangGraph)          │
│                             │  • Conversation management             │
└─────────────────────────────┴────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────────┐
│                      DOMAIN SERVICES                                  │
├────────────────┬───────────────┬───────────────┬─────────────────────┤
│ Fabrication    │ CAD           │ Discovery     │ Other Services      │
│ (:8300)        │ (:8200)       │ (:8500)       │                     │
│ • Printers     │ • Zoo/Tripo   │ • mDNS/SSDP   │ • broker (:8777)    │
│ • Slicing      │ • CadQuery    │ • Bambu UDP   │ • images (:8600)    │
│ • Outcomes     │ • Artifacts   │ • ARP scan    │ • mem0-mcp (:8765)  │
└────────────────┴───────────────┴───────────────┴─────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────────┐
│                      INFRASTRUCTURE LAYER                             │
├────────────┬──────────┬─────────┬─────────┬────────────┬─────────────┤
│ PostgreSQL │ Redis    │ Qdrant  │ MinIO   │ RabbitMQ   │ Mosquitto   │
│ (:5432)    │ (:6379)  │ (:6333) │ (:9000) │ (AMQP)     │ (:1883)     │
│ State      │ Cache    │ Vectors │ S3      │ Messaging  │ MQTT        │
└────────────┴──────────┴─────────┴─────────┴────────────┴─────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────────┐
│                      LLM INFERENCE LAYER                              │
├────────────────────────────────┬─────────────────────────────────────┤
│    Ollama (:11434)             │    llama.cpp Servers                │
│    • GPT-OSS 120B (reasoner)   │    • :8083 Athene V2 Q4 (tools)     │
│    • 128K context              │    • :8084 Hermes 3 8B (summary)    │
│                                │    • :8086 Gemma 3 27B (vision)     │
│                                │    • :8087 Devstral 2 123B (coder)  │
└────────────────────────────────┴─────────────────────────────────────┘

1.2 Key Technology Stack

| Layer | Technology | Version |
|---|---|---|
| Frontend | React + Vite + TypeScript | 18.3.1 |
| UI Components | Radix UI (shadcn/ui) | Latest |
| Styling | Tailwind CSS v4 | Latest |
| 3D Rendering | Three.js | 0.172.0 |
| Backend | FastAPI + Pydantic | Latest |
| LLM (Primary) | Ollama (GPT-OSS 120B) | Latest |
| LLM (Specialized) | llama.cpp (GGUF sharded) | Latest |
| Database | PostgreSQL 16 | 16.x |
| Vector DB | Qdrant | 1.11.0 |
| Message Queue | RabbitMQ | Latest |
| MQTT Broker | Eclipse Mosquitto | 2.0 |
| Load Balancer | HAProxy | Latest |
| Observability | Prometheus + Grafana + Loki | 2.53 / 10.4 / 3.0 |

2. Feature Status by Domain

2.1 Core Query Pipeline ✅

Status: Production-Ready

The central chat/query flow is fully operational:

User Query → Gateway → Brain Router → LLM Selection → Response
                              ↓
              • Local (free): llama.cpp multi-server
              • MCP (cheap): Perplexity search-augmented
              • Frontier (expensive): OpenAI/Anthropic

Routing Logic:

  • Confidence-based model selection with 3 tiers
  • Semantic/keyword tool selection
  • Vision pipeline for image understanding
  • ReAct agent with max 10 iterations
  • Fallback cascade: local → MCP → frontier
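
The fallback cascade can be sketched as follows; `route_query`, the tier names, and the handler shapes are illustrative assumptions, not the actual `router.py` implementation:

```python
# Hypothetical sketch of a confidence-based fallback cascade.
# Tier names and handler signatures are assumptions, not from router.py.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tier:
    name: str
    cost: str
    # Returns an answer, or None when confidence is too low / the call fails.
    handler: Callable[[str], Optional[str]]

def route_query(query: str, tiers: list[Tier]) -> tuple[str, str]:
    """Try each tier in order, falling through on failure (local → MCP → frontier)."""
    for tier in tiers:
        answer = tier.handler(query)
        if answer is not None:
            return tier.name, answer
    raise RuntimeError("all tiers failed")
```

A caller would register the local llama.cpp tier first, the Perplexity-backed MCP tier second, and the frontier providers last, so cost only rises when a cheaper tier declines to answer.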

Files:

  • services/brain/routing/router.py - Main routing engine
  • services/brain/llm_client.py - Provider registry
  • services/brain/tools/mcp_client.py - MCP tool execution

2.2 Fabrication Workflow ⚠️

Status: Partially Complete (Core works, integrations pending)

5-Step Progressive Workflow:

| Step | Name | Status | Notes |
|---|---|---|---|
| 1 | Generate | ✅ | Zoo/Tripo/CadQuery providers working |
| 2 | Orient | ✅ | Rotation matrix optimization |
| 3 | Segment | ✅ | Dimension checking, joint types |
| 4 | Slice | ⚠️ | CuraEngine integration needs profiles |
| 5 | Print | ⚠️ | Bambu Cloud works, local queue partial |

Known Incomplete Integrations (from TODOs in code):

  • MinIO client not wired for snapshot uploads
  • MQTT client not fully integrated for Bambu telemetry
  • Material spool auto-selection not implemented
  • Camera capture MQTT pattern incomplete
  • Outcome tracking missing print success metrics
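
Wiring the snapshot upload would look roughly like the sketch below; the bucket name, key scheme, and client are stand-ins that mimic the MinIO SDK's `put_object` shape, not the service's actual code:

```python
# Hedged sketch of wiring printer snapshot uploads to object storage.
# Bucket/key names are assumptions; the store mimics minio's put_object.
import io

class InMemoryObjectStore:
    """Stand-in for a MinIO client, for illustration only."""
    def __init__(self):
        self.objects = {}
    def put_object(self, bucket, name, data, length):
        self.objects[(bucket, name)] = data.read(length)

def upload_snapshot(store, printer_id: str, jpeg_bytes: bytes) -> str:
    key = f"snapshots/{printer_id}.jpg"
    store.put_object("fabrication", key, io.BytesIO(jpeg_bytes), len(jpeg_bytes))
    return key
```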

Files:

  • services/fabrication/app.py - Service entrypoint
  • services/ui/src/pages/FabricationConsole/ - React workflow
  • config/slicer_profiles/ - CuraEngine configurations

2.3 Tiered Collective Architecture 🔄

Status: Experimental (Recently Developed, 6 commits in active branch)

This is a multi-model orchestration system implementing Senior/Junior delegation:

User Request
    ↓
ComplexityRouter (pattern matching + confidence)
    ↓
    ├─→ Trivial patterns? → Direct execution (skip collective)
    └─→ Complex task → Collective Orchestration
          ↓
          Planner (Devstral 2 123B) - Strategic decomposition
          ↓
          For each step:
              ├─→ Executor (Devstral Small 2 24B) - Fast implementation
              ├─→ Judge (Devstral 2 123B) - Validation & mentorship
              └─→ Loop: APPROVE → next step | REVISE → retry
          ↓
          Complete
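
The delegation loop above can be sketched like this; `plan`, `execute`, and `judge` are stand-in callables, not the real collective module:

```python
# Illustrative sketch of the Senior/Junior delegation loop.
# The three callables are assumptions standing in for the Planner,
# Executor, and Judge models.
from typing import Callable

def run_collective(task: str,
                   plan: Callable[[str], list[str]],
                   execute: Callable[[str], str],
                   judge: Callable[[str, str], bool],
                   max_revisions: int = 20) -> list[str]:
    """Planner decomposes the task; for each step the Executor drafts and
    the Judge either approves (move on) or rejects (retry)."""
    results = []
    for step in plan(task):
        for _attempt in range(max_revisions):
            draft = execute(step)
            if judge(step, draft):   # APPROVE → next step
                results.append(draft)
                break
        else:                        # revision budget exhausted
            raise RuntimeError(f"step never approved: {step}")
    return results
```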

Model Allocation:

| Role | Model | Speed | Purpose |
|---|---|---|---|
| Planner | Devstral 2 123B | ~5 tok/s | Task decomposition |
| Executor | Devstral Small 2 24B | ~20-25 tok/s | Code generation |
| Judge | Devstral 2 123B | ~5 tok/s | Quality assurance |

Trade-off: the 24B Executor provides a ~4-5x speedup at the cost of a 4-point accuracy drop (68% vs 72% on SWE-bench)

State Machine:

IDLE → ROUTING → PLANNING → EXECUTING → JUDGING → COMPLETE
                                            ↓
                              ├─→ REVISING (loop back)
                              └─→ FALLBACK (graceful degradation)
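
A minimal encoding of that state machine is sketched below; the transition table is read off the diagram above and is an assumption, not the actual implementation:

```python
# Minimal sketch of the collective state machine; transitions are assumed
# from the diagram, not taken from the real code.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    ROUTING = auto()
    PLANNING = auto()
    EXECUTING = auto()
    JUDGING = auto()
    REVISING = auto()
    COMPLETE = auto()
    FALLBACK = auto()

TRANSITIONS = {
    State.IDLE: {State.ROUTING},
    State.ROUTING: {State.PLANNING},
    State.PLANNING: {State.EXECUTING},
    State.EXECUTING: {State.JUDGING},
    State.JUDGING: {State.COMPLETE, State.REVISING, State.FALLBACK},
    State.REVISING: {State.EXECUTING},   # loop back for another attempt
    State.COMPLETE: set(),               # terminal
    State.FALLBACK: set(),               # terminal: graceful degradation
}

def step(current: State, nxt: State) -> State:
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```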

Configuration (~/.kitty-code/config.toml):

[collective]
enabled = false  # Disabled by default
planner_model = "local"
executor_model = "executor"
max_revision_cycles = 20
teaching_mode = true

Known Issues:

  1. JSON parsing fragility - _repair_truncated_json() workaround
  2. Tool execution in collective doesn't write files
  3. Plan caching not implemented despite config flag
  4. Context blinding filter may be too aggressive
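
One common approach to the truncated-JSON problem (issue 1) is to close any open strings and brackets before parsing; the sketch below shows the general idea and is not the actual `_repair_truncated_json()` implementation:

```python
# Hedged sketch of repairing truncated JSON by closing open strings,
# brackets, and braces. Not the actual _repair_truncated_json().
def repair_truncated_json(text: str) -> str:
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:          # close a dangling string literal
        text += '"'
    return text + "".join(reversed(stack))
```

This only handles clean mid-value truncation; output cut mid-escape or mid-key still needs a retry, which is why structured output mode is the sturdier fix.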

Files:

  • services/kitty-code/src/kitty_code/core/collective/ - Module
  • services/kitty-code/docs/COLLECTIVE_ARCHITECTURE.md - Documentation

2.4 Research Graph 🔄

Status: In Development (Debug logging still present)

LangGraph-based research automation:

  • Multi-tool researcher agent
  • Claim extraction and verification
  • Source retrieval and summarization
  • PostgreSQL checkpointing

Known Issues:

  • 15+ DEBUG statements scattered throughout
  • Session model synthesis not fully wired
  • Topic-based research not implemented
  • Budget manager token tracking manual

Files:

  • services/brain/research/graph/ - LangGraph implementation
  • services/brain/research/scheduler.py - Job scheduling

2.5 kitty-code TUI ✅

Status: Functional with Active Development

Textual-based coding assistant with:

  • 5 agent modes (DEFAULT, PLAN, ACCEPT_EDITS, AUTO_APPROVE, AUTO_ITERATE)
  • MCP tool discovery and integration
  • Middleware pipeline for focus preservation
  • Session persistence and resumption

Agent Modes:

| Mode | Safety | Behavior |
|---|---|---|
| DEFAULT | Neutral | Approval required |
| PLAN | Safe | Read-only exploration |
| ACCEPT_EDITS | Destructive | Auto-approve file edits |
| AUTO_APPROVE | YOLO | Auto-approve all |
| AUTO_ITERATE | YOLO | Loop until complete |

Key Middleware:

  • TaskInjectionMiddleware - Focus preservation (injects plan/todos)
  • CompletionCheckMiddleware - Ralph-Wiggum pattern (prevents premature stop)
  • ModelRoutingMiddleware - Routes to collective or direct execution
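
Conceptually, the pipeline threads the message list through each middleware in order; the sketch below uses assumed names and stand-in logic, not the real kitty-code classes:

```python
# Illustrative middleware pipeline: each middleware may rewrite the message
# list before it reaches the model. Names and behaviors are assumptions.
from typing import Callable

Middleware = Callable[[list[str]], list[str]]

def apply_pipeline(messages: list[str], pipeline: list[Middleware]) -> list[str]:
    for mw in pipeline:
        messages = mw(messages)
    return messages

def task_injection(messages: list[str]) -> list[str]:
    # Re-inject the current plan so long sessions keep focus.
    return ["[plan] finish step 3"] + messages

def completion_check(messages: list[str]) -> list[str]:
    # Ralph-Wiggum-style nudge: append a continue prompt if the last
    # message looks like a premature stop.
    if messages and messages[-1].endswith("done?"):
        messages = messages + ["[system] verify todos before stopping"]
    return messages
```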

Files:

  • services/kitty-code/src/kitty_code/ - Main package
  • ~/.kitty-code/config.toml - Configuration

2.6 React UI ✅

Status: Well-Architected (2.2MB codebase, 32 test files)

Key Pages:

  • FabricationConsole/ - 5-step printing workflow
  • Dashboard/ - Device management hub
  • ResearchHub/ - Research interface
  • MediaHub/ - Stable Diffusion UI
  • Settings/ - System configuration

Component Library:

  • shadcn/ui primitives (Radix UI)
  • VoiceAssistant components with Zustand state
  • Three.js-based 3D viewers
  • Glassmorphism design system

Test Coverage: 32 Vitest test files (but FabricationConsole untested)


3. Infrastructure Assessment

3.1 Docker Compose

15+ services on bridged kitty network:

  • 9 named volumes for persistence
  • Health checks on key services (30s interval)
  • Environment template via YAML anchors
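
The environment template via YAML anchors typically looks like the fragment below; the service and variable names here are illustrative, not KITTY's actual compose file:

```yaml
# Illustrative compose fragment; service and variable names are assumptions.
x-kitty-env: &kitty-env
  TZ: UTC
  LOG_LEVEL: info

services:
  brain:
    environment:
      <<: *kitty-env
      BRAIN_PORT: "8000"
```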

Issues:

  • RabbitMQ lives in a separate compose file (confusing setup)
  • No database clustering in the main config
  • No automated backup strategy

3.2 Startup Sequence

8-phase orchestration in ops/scripts/start-all.sh:

  1. LLM Servers (Ollama + llama.cpp)
  2. LLM Health Checks (10-min timeout)
  3. Docker Services
  4. Service Validation
  5. API Health Checks
  6. Images Service (non-critical)
  7. Voice Service (non-critical)
  8. HexStrike Security Tools (optional)

Startup time: ~5-10 minutes (Ollama model loading dominates)

3.3 Load Balancing

HAProxy configuration:

  • 3 gateway replicas, round-robin
  • WebSocket sticky sessions (source hash)
  • SSE streaming support
  • Stats dashboard on :8404
  • 30-minute timeouts, 1-hour tunnel
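
A corresponding haproxy.cfg fragment might look like this; backend ports and server names are assumptions, not the deployed config:

```
# Illustrative haproxy.cfg fragment; server names and ports are assumptions.
defaults
  timeout client  30m
  timeout server  30m
  timeout tunnel  1h        # long-lived WebSocket tunnels

backend gateway
  balance source            # source-hash stickiness for WebSocket sessions
  server gw1 gateway-1:9000 check
  server gw2 gateway-2:9000 check
  server gw3 gateway-3:9000 check

listen stats
  bind :8404
  stats enable
  stats uri /
```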

Missing:

  • Circuit breaker pattern
  • Rate limiting
  • Backend degradation handling

4. Improvement Opportunities

4.1 Critical Priority

| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Fabrication integrations incomplete | services/fabrication/ | Blocks 3D print outcomes | Wire MinIO + MQTT clients |
| No database backup automation | infra/compose/ | Data loss risk | Implement pg_dump cron job |
| FabricationConsole untested | services/ui/ | Regression risk | Add Vitest coverage |
| JSON parsing fragility | collective/orchestrator.py | Collective failures | Use structured output mode |

4.2 High Priority

| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Research graph debug logging | brain/research/ | Performance impact | Clean up DEBUG statements |
| Tool execution duplicated | Router + Agent | Code maintenance | Consolidate tool paths |
| Async/sync mismatch | brain/llm_client.py | Performance hit | Remove deprecated sync wrapper |
| Service discovery hardcoded | common/service_manager/ | Scaling limitation | Implement service registry |
| No E2E test suite | tests/ | Integration bugs | Add Playwright tests |

4.3 Medium Priority

| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| RabbitMQ separate compose | infra/compose/ | Setup confusion | Merge into main compose |
| Stats password in env | HAProxy config | Security exposure | RESOLVED: now uses envsubst templating via docker-entrypoint.sh |
| Slicer profiles hardcoded | config/slicer_profiles/ | User friction | Add dynamic loading from UI |
| Tool execution in collective | collective/ | Plans don't modify files | Wire tool_executor callback |
| Plan caching not implemented | collective/config.py | Redundant computation | Cache by request hash |
| Vision pipeline cleanup | brain/routing/ | Memory leaks | Add download cleanup |
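
Caching plans by request hash could be sketched as below; the hash inputs and cache shape are assumptions, not the actual collective/config.py design:

```python
# Hedged sketch of plan caching keyed by a request hash; not the actual
# collective design. Hash inputs are illustrative assumptions.
import hashlib
import json

_plan_cache: dict[str, list[str]] = {}

def request_hash(task: str, model: str) -> str:
    payload = json.dumps({"task": task, "model": model}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_plan(task: str, model: str, planner) -> list[str]:
    key = request_hash(task, model)
    if key not in _plan_cache:
        _plan_cache[key] = planner(task)   # expensive 123B call only on a miss
    return _plan_cache[key]
```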

4.4 Low Priority

Issue Location Impact Recommendation
Vendor lookup incomplete discovery/ Minor discovery gaps Expand OUI database
No OpenAPI schema Gateway Client integration Auto-generate from FastAPI
Voice service undocumented services/voice/ Operator confusion Add setup guide
Context blinding aggressive collective/prompts.py May lose context Review filter patterns

5. Security Considerations

5.1 Current Measures

  • Broker Service: Command allow-list, privilege dropping
  • SafetyChecker: Confirmation phrases for hazard tools
  • Permission Manager: User approval flow for tools
  • HexStrike: 151 security assessment tools (optional)

5.2 Gaps

| Gap | Severity | Mitigation |
|---|---|---|
| HAProxy stats password in plaintext | Medium | RESOLVED: uses envsubst templating in docker-entrypoint.sh |
| No rate limiting on gateway | Medium | Add HAProxy rate limits |
| Broker has no audit trail encryption | Low | Encrypt audit logs |
| Tool confirmation not enforced everywhere | Medium | Audit all code paths |

6. Performance Characteristics

6.1 LLM Inference

| Model | Port | Tokens/sec | Context |
|---|---|---|---|
| GPT-OSS 120B | 11434 | ~3-5 | 128K |
| Athene V2 Q4 | 8083 | ~15-20 | 128K |
| Hermes 3 8B | 8084 | ~30-40 | 8K |
| Gemma 3 27B | 8086 | ~10-15 | 8K |
| Devstral 2 123B | 8087 | ~5-8 | 128K |

6.2 Bottlenecks

  1. Startup: 5-10 minutes (Ollama model loading)
  2. Semantic caching: Disabled by default (SentenceTransformer slow)
  3. Tool selection: Loads ALL tools if semantic mode enabled
  4. Research checkpointing: No batch writes, database contention

7. Testing Coverage

7.1 Current State

| Category | Files | Coverage |
|---|---|---|
| Unit (Python) | 20+ | Core logic covered |
| Integration (Python) | 15+ | Major workflows |
| Unit (React) | 32 | Components + hooks |
| E2E | 0 | Gap |
| Visual Regression | 0 | Gap |
| Load Testing | 0 | Gap |

7.2 Missing Coverage

  • FabricationConsole page (zero tests)
  • WebSocket/MQTT real-time features
  • HAProxy failover scenarios
  • Voice service integration
  • Collective architecture edge cases

8. Documentation Status

8.1 Existing

| Document | Size | Coverage |
|---|---|---|
| README.md | 46.6KB | Project overview |
| KITTY_OperationsManual.md | 42.1KB | Operations guide |
| CLAUDE.md | 14.8KB | AI assistant instructions |
| COLLECTIVE_ARCHITECTURE.md | ~8KB | Multi-model orchestration |

8.2 Missing

  • OpenAPI/AsyncAPI schema
  • Database migration guide
  • Troubleshooting playbook
  • Voice service setup
  • RabbitMQ message schema

9. Active Development Areas

9.1 Recent Commits (Collective Architecture)

  1. 21e1310 - Preserve middleware metadata in pipeline aggregation
  2. 792e3ad - Add Senior/Junior delegation pattern
  3. 0edf44f - Implement event streaming and tool execution
  4. 580d529 - Implement collective middleware
  5. d68804a - Backfill tool_calls, optimize Ralph-Wiggum
  6. 98d8dec - Add JSON embedded tool call parsing

9.2 In-Progress Features

  • Tiered Collective Architecture (experimental)
  • Research graph automation
  • Tool execution integration in collective
  • Plan caching implementation

10. Recommendations Summary

Immediate Actions (This Sprint)

  1. Wire MinIO client in fabrication service
  2. Add FabricationConsole unit tests
  3. Clean up research graph debug logging
  4. Implement database backup automation
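
For item 4, a minimal backup job could be sketched as below; the host, database name, and output path are placeholder assumptions, not KITTY's actual settings:

```python
# Hedged sketch of a pg_dump backup job; host, database, and paths are
# placeholders, not KITTY's actual configuration.
import datetime
import subprocess

def build_pg_dump_cmd(host: str, db: str, out_dir: str) -> list[str]:
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return [
        "pg_dump",
        "--host", host,
        "--format", "custom",   # compressed archive, restorable via pg_restore
        "--file", f"{out_dir}/{db}-{stamp}.dump",
        db,
    ]

def run_backup(host: str = "localhost", db: str = "kitty",
               out_dir: str = "/backups") -> None:
    subprocess.run(build_pg_dump_cmd(host, db, out_dir), check=True)
```

Scheduling `run_backup` from cron or a systemd timer, plus a retention sweep of old dumps, would close the data-loss gap called out in section 4.1.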

Near-Term (Next 2 Sprints)

  1. Complete tool execution in collective workflow
  2. Add E2E test suite with Playwright
  3. Merge RabbitMQ into main compose
  4. Generate OpenAPI schema from FastAPI

Medium-Term (Quarter)

  1. Implement service discovery pattern
  2. Add circuit breaker to HAProxy
  3. Complete research graph automation
  4. Document voice service integration

Appendix A: File Reference

Core Services

  • services/brain/ - Orchestrator (8000)
  • services/gateway/ - API gateway
  • services/fabrication/ - Printer control (8300)
  • services/cad/ - 3D model generation (8200)
  • services/voice/ - STT/TTS (8400)

kitty-code TUI

  • services/kitty-code/src/kitty_code/core/ - Core agent
  • services/kitty-code/src/kitty_code/core/collective/ - Multi-model
  • ~/.kitty-code/config.toml - Configuration

Infrastructure

  • infra/compose/docker-compose.yml - Docker services
  • infra/haproxy/haproxy.cfg - Load balancer
  • ops/scripts/start-all.sh - Startup orchestration

Configuration

  • .env.example - Environment template
  • config/tool_registry.yaml - Tool definitions
  • config/slicer_profiles/ - CuraEngine configs

End of Project Status Report