Document Type: Comprehensive Project Status Report Generated: January 2026 Project: KITTY (Knowledgeable Intelligent Tool-using Tabletop Yoda)
KITTY is a sophisticated offline-first, voice-enabled fabrication lab orchestrator running on a Mac Studio M3 Ultra. The system integrates local AI inference with 3D printing, CNC control, and smart home automation.
| Dimension | Status | Confidence |
|---|---|---|
| Core Query/Chat Pipeline | ✅ Production-Ready | High |
| Fabrication Workflow | ⚠️ Partially Complete | Medium |
| Multi-Model Collective | 🔄 Experimental | Low-Medium |
| Voice Interface | | Medium |
| Research Graph | 🔄 In Development | Low |
| Infrastructure | ✅ Solid Foundation | High |
| UI/UX | ✅ Well-Architected | High |
| Testing | ⚠️ Gaps Remain | Medium |
KITTY operates as a microservices architecture with 15+ independent services:
┌──────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
├──────────────┬──────────────┬─────────────────┬─────────────────────┤
│ React UI │ CLI/TUI │ Voice Service │ kitty-code TUI │
│ (4173) │ (kitty-cli) │ (8400/8550) │ (Textual) │
└──────────────┴──────────────┴─────────────────┴─────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ GATEWAY LAYER (HAProxy :8080) │
│ • 3 gateway replicas, round-robin │
│ • WebSocket sticky sessions, SSE streaming │
│ • Stats dashboard on :8404 │
└──────────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ ORCHESTRATION LAYER │
├─────────────────────────────┬────────────────────────────────────────┤
│ Brain Service (:8000) │ • LLM routing (confidence-based) │
│ │ • ReAct agent (max 10 iterations) │
│ │ • Tool execution via MCP │
│ │ • Research graph (LangGraph) │
│ │ • Conversation management │
└─────────────────────────────┴────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ DOMAIN SERVICES │
├────────────────┬───────────────┬───────────────┬─────────────────────┤
│ Fabrication │ CAD │ Discovery │ Other Services │
│ (:8300) │ (:8200) │ (:8500) │ │
│ • Printers │ • Zoo/Tripo │ • mDNS/SSDP │ • broker (:8777) │
│ • Slicing │ • CadQuery │ • Bambu UDP │ • images (:8600) │
│ • Outcomes │ • Artifacts │ • ARP scan │ • mem0-mcp (:8765) │
└────────────────┴───────────────┴───────────────┴─────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE LAYER │
├────────────┬──────────┬─────────┬─────────┬────────────┬─────────────┤
│ PostgreSQL │ Redis │ Qdrant │ MinIO │ RabbitMQ │ Mosquitto │
│ (:5432) │ (:6379) │ (:6333) │ (:9000) │ (AMQP) │ (:1883) │
│ State │ Cache │ Vectors │ S3 │ Messaging │ MQTT │
└────────────┴──────────┴─────────┴─────────┴────────────┴─────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ LLM INFERENCE LAYER │
├────────────────────────────────┬─────────────────────────────────────┤
│ Ollama (:11434) │ llama.cpp Servers │
│ • GPT-OSS 120B (reasoner) │ • :8083 Athene V2 Q4 (tools) │
│ • 128K context │ • :8084 Hermes 3 8B (summary) │
│ │ • :8086 Gemma 3 27B (vision) │
│ │ • :8087 Devstral 2 123B (coder) │
└────────────────────────────────┴─────────────────────────────────────┘
| Layer | Technology | Version |
|---|---|---|
| Frontend | React + Vite + TypeScript | 18.3.1 |
| UI Components | Radix UI (shadcn/ui) | Latest |
| Styling | Tailwind CSS v4 | Latest |
| 3D Rendering | Three.js | 0.172.0 |
| Backend | FastAPI + Pydantic | Latest |
| LLM (Primary) | Ollama (GPT-OSS 120B) | Latest |
| LLM (Specialized) | llama.cpp (GGUF sharded) | Latest |
| Database | PostgreSQL 16 | 16.x |
| Vector DB | Qdrant | 1.11.0 |
| Message Queue | RabbitMQ | Latest |
| MQTT Broker | Eclipse Mosquitto | 2.0 |
| Load Balancer | HAProxy | Latest |
| Observability | Prometheus + Grafana + Loki | 2.53/10.4/3.0 |
Core Query/Chat Pipeline - Status: Production-Ready
The central chat/query flow is fully operational:
User Query → Gateway → Brain Router → LLM Selection → Response
↓
• Local (free): llama.cpp multi-server
• MCP (cheap): Perplexity search-augmented
• Frontier (expensive): OpenAI/Anthropic
Routing Logic:
- Confidence-based model selection with 3 tiers
- Semantic/keyword tool selection
- Vision pipeline for image understanding
- ReAct agent with max 10 iterations
- Fallback cascade: local → MCP → frontier
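The fallback cascade above can be sketched as a confidence-gated loop over tiers. This is a minimal illustration, not the actual `services/brain/routing/router.py` logic; the tier callables and the 0.7 threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str
    # Each tier returns (answer, confidence in [0, 1]); stubbed here.
    ask: Callable[[str], tuple[str, float]]

def route(query: str, tiers: list[Tier], threshold: float = 0.7) -> str:
    """Try cheap tiers first; escalate only when confidence is too low."""
    last_answer = ""
    for tier in tiers:
        answer, confidence = tier.ask(query)
        last_answer = answer
        if confidence >= threshold:
            return answer  # good enough, stop escalating
    return last_answer     # final tier's answer, even if low confidence

# Usage with stub tiers: local (free) -> MCP (cheap) -> frontier (expensive)
tiers = [
    Tier("local", lambda q: ("local answer", 0.5)),
    Tier("mcp", lambda q: ("mcp answer", 0.9)),
    Tier("frontier", lambda q: ("frontier answer", 0.99)),
]
print(route("What is KITTY?", tiers))  # escalates past local, stops at mcp
```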
Files:
- `services/brain/routing/router.py` - Main routing engine
- `services/brain/llm_client.py` - Provider registry
- `services/brain/tools/mcp_client.py` - MCP tool execution
Fabrication Workflow - Status: Partially Complete (core works, integrations pending)
5-Step Progressive Workflow:
| Step | Name | Status | Notes |
|---|---|---|---|
| 1 | Generate | ✅ | Zoo/Tripo/CadQuery providers working |
| 2 | Orient | ✅ | Rotation matrix optimization |
| 3 | Segment | ✅ | Dimension checking, joint types |
| 4 | Slice | ⚠️ | CuraEngine integration needs profiles |
| 5 | Print | ⚠️ | Bambu Cloud works, local queue partial |
Known Incomplete Integrations (from TODOs in code):
- MinIO client not wired for snapshot uploads
- MQTT client not fully integrated for Bambu telemetry
- Material spool auto-selection not implemented
- Camera capture MQTT pattern incomplete
- Outcome tracking missing print success metrics
Files:
- `services/fabrication/app.py` - Service entrypoint
- `services/ui/src/pages/FabricationConsole/` - React workflow
- `config/slicer_profiles/` - CuraEngine configurations
Multi-Model Collective - Status: Experimental (recently developed; 6 commits in active branch)
This is a multi-model orchestration system implementing Senior/Junior delegation:
User Request
↓
ComplexityRouter (pattern matching + confidence)
↓
├─→ Trivial patterns? → Direct execution (skip collective)
└─→ Complex task → Collective Orchestration
↓
Planner (Devstral 2 123B) - Strategic decomposition
↓
For each step:
├─→ Executor (Devstral Small 2 24B) - Fast implementation
├─→ Judge (Devstral 2 123B) - Validation & mentorship
└─→ Loop: APPROVE → next step | REVISE → retry
↓
Complete
Model Allocation:
| Role | Model | Speed | Purpose |
|---|---|---|---|
| Planner | Devstral 2 123B | ~5 tok/s | Task decomposition |
| Executor | Devstral Small 2 24B | ~20-25 tok/s | Code generation |
| Judge | Devstral 2 123B | ~5 tok/s | Quality assurance |
Trade-off: 24B Executor provides ~4-5x speedup with only 4% accuracy loss (68% vs 72% SWE-bench)
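The Senior/Junior delegation loop can be sketched as follows. Model calls are stubbed as plain callables; the real orchestrator adds state tracking, fallback, and tool execution, so treat this as a shape, not the implementation.

```python
def run_collective(task, planner, executor, judge, max_revisions=20):
    """Plan -> execute -> judge -> revise loop (Senior/Junior pattern)."""
    results = []
    for step in planner(task):          # Senior (123B): strategic decomposition
        attempt = executor(step)        # Junior (24B): fast first draft
        for _ in range(max_revisions):
            verdict, feedback = judge(step, attempt)  # Senior: validate/mentor
            if verdict == "APPROVE":
                break                    # accepted; move to the next step
            attempt = executor(f"{step}\nRevise per feedback: {feedback}")
        results.append(attempt)
    return results
```

With stub callables this runs end-to-end: a judge that rejects the first draft forces exactly one revision cycle before approval.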
State Machine:
IDLE → ROUTING → PLANNING → EXECUTING → JUDGING → COMPLETE
↓
├─→ REVISING (loop back)
└─→ FALLBACK (graceful degradation)
Configuration (`~/.kitty-code/config.toml`):
```toml
[collective]
enabled = false  # Disabled by default
planner_model = "local"
executor_model = "executor"
max_revision_cycles = 20
teaching_mode = true
```
Known Issues:
- JSON parsing fragility - `_repair_truncated_json()` workaround in place
- Tool execution in collective doesn't write files
- Plan caching not implemented despite config flag
- Context blinding filter may be too aggressive
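The parsing fragility is the classic truncated-stream problem: a model's JSON output gets cut off mid-structure. A repair helper along those lines might look like this; it is a best-effort sketch, not the project's actual `_repair_truncated_json()`.

```python
import json

def repair_truncated_json(text: str):
    """Best-effort repair of JSON cut off mid-stream; None if unrepairable."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Track unclosed braces/brackets, ignoring ones inside string literals.
    stack = []
    in_string = False
    escape = False
    for ch in text:
        if escape:
            escape = False
            continue
        if ch == "\\" and in_string:
            escape = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    candidate = text
    if in_string:
        candidate += '"'                    # close a dangling string
    candidate += "".join(reversed(stack))   # close containers inside-out
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None
```

The structured-output recommendation in the issues table is the sturdier fix; this kind of repair is only a stopgap.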
Files:
- `services/kitty-code/src/kitty_code/core/collective/` - Module
- `services/kitty-code/docs/COLLECTIVE_ARCHITECTURE.md` - Documentation
Research Graph - Status: In Development (debug logging still present)
LangGraph-based research automation:
- Multi-tool researcher agent
- Claim extraction and verification
- Source retrieval and summarization
- PostgreSQL checkpointing
Known Issues:
- 15+ DEBUG statements scattered throughout
- Session model synthesis not fully wired
- Topic-based research not implemented
- Budget manager token tracking manual
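Since token tracking is currently manual, a minimal automatic budget guard could look like the sketch below. The class and method names are illustrative, not the project's actual budget manager.

```python
class BudgetManager:
    """Tracks token spend for a research run and refuses to overspend."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Reject the call *before* spending past the cap.
        if self.used + tokens > self.max_tokens:
            raise RuntimeError("research token budget exhausted")
        self.used += tokens

    @property
    def remaining(self) -> int:
        return self.max_tokens - self.used
```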
Files:
- `services/brain/research/graph/` - LangGraph implementation
- `services/brain/research/scheduler.py` - Job scheduling
kitty-code TUI - Status: Functional with Active Development
Textual-based coding assistant with:
- 5 agent modes (DEFAULT, PLAN, ACCEPT_EDITS, AUTO_APPROVE, AUTO_ITERATE)
- MCP tool discovery and integration
- Middleware pipeline for focus preservation
- Session persistence and resumption
Agent Modes:
| Mode | Safety | Behavior |
|---|---|---|
| DEFAULT | Neutral | Approval required |
| PLAN | Safe | Read-only exploration |
| ACCEPT_EDITS | Destructive | Auto-approve file edits |
| AUTO_APPROVE | YOLO | Auto-approve all |
| AUTO_ITERATE | YOLO | Loop until complete |
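The approval behavior in the table can be captured in a single gating function. This is a sketch: the tool name `edit_file` and the exact gating rules are assumptions, not kitty-code's real policy code.

```python
from enum import Enum, auto

class AgentMode(Enum):
    DEFAULT = auto()
    PLAN = auto()
    ACCEPT_EDITS = auto()
    AUTO_APPROVE = auto()
    AUTO_ITERATE = auto()

def needs_approval(mode: AgentMode, tool: str) -> bool:
    """Return True when the user must approve this tool call."""
    if mode in (AgentMode.AUTO_APPROVE, AgentMode.AUTO_ITERATE):
        return False                  # YOLO modes: everything auto-approved
    if mode is AgentMode.ACCEPT_EDITS:
        return tool != "edit_file"    # only file edits are auto-approved
    return True                       # DEFAULT and PLAN: approval required
```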
Key Middleware:
- `TaskInjectionMiddleware` - Focus preservation (injects plan/todos)
- `CompletionCheckMiddleware` - Ralph-Wiggum pattern (prevents premature stop)
- `ModelRoutingMiddleware` - Routes to collective or direct execution
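The pipeline pattern these middlewares plug into can be sketched as nested wrappers around a handler. The handler and middleware bodies below are toy stand-ins for the real classes, kept only to show the wrapping order.

```python
from typing import Callable

Handler = Callable[[dict], dict]

def task_injection(next_handler: Handler) -> Handler:
    def wrapped(request: dict) -> dict:
        # Inject the current plan/todos so the agent keeps focus.
        request.setdefault("context", []).append("plan: current todos")
        return next_handler(request)
    return wrapped

def completion_check(next_handler: Handler) -> Handler:
    def wrapped(request: dict) -> dict:
        response = next_handler(request)
        # "Ralph-Wiggum" pattern: flag premature completion for another pass.
        if "DONE" not in response.get("text", ""):
            response["needs_another_pass"] = True
        return response
    return wrapped

def build_pipeline(handler: Handler, middlewares) -> Handler:
    for mw in reversed(middlewares):  # first listed runs outermost
        handler = mw(handler)
    return handler

agent = build_pipeline(lambda req: {"text": "partial answer"},
                       [task_injection, completion_check])
print(agent({}))
```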
Files:
- `services/kitty-code/src/kitty_code/` - Main package
- `~/.kitty-code/config.toml` - Configuration
UI/UX - Status: Well-Architected (2.2MB codebase, 32 test files)
Key Pages:
- `FabricationConsole/` - 5-step printing workflow
- `Dashboard/` - Device management hub
- `ResearchHub/` - Research interface
- `MediaHub/` - Stable Diffusion UI
- `Settings/` - System configuration
Component Library:
- shadcn/ui primitives (Radix UI)
- VoiceAssistant components with Zustand state
- Three.js-based 3D viewers
- Glassmorphism design system
Test Coverage: 32 Vitest test files (but FabricationConsole untested)
15+ services on the bridged `kitty` network:
- 9 named volumes for persistence
- Health checks on key services (30s interval)
- Environment template via YAML anchors
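A compose health check of the kind described (30-second interval) typically looks like the fragment below; the service name, image, and endpoint are illustrative, not taken from the project's compose file.

```yaml
services:
  brain:
    image: kitty/brain          # illustrative image name
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
```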
Gaps:
- RabbitMQ lives in a separate compose file (confusing setup)
- No database clustering in main config
- No automated backup strategy
8-phase orchestration in `ops/scripts/start-all.sh`:
1. LLM Servers (Ollama + llama.cpp)
2. LLM Health Checks (10-min timeout)
3. Docker Services
4. Service Validation
5. API Health Checks
6. Images Service (non-critical)
7. Voice Service (non-critical)
8. HexStrike Security Tools (optional)
Startup time: ~5-10 minutes (Ollama model loading dominates)
HAProxy configuration:
- 3 gateway replicas, round-robin
- WebSocket sticky sessions (source hash)
- SSE streaming support
- Stats dashboard on :8404
- 30-minute timeouts, 1-hour tunnel
Missing:
- Circuit breaker pattern
- Rate limiting
- Backend degradation handling
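For the missing rate limiting, HAProxy's stick-table mechanism is the usual approach. The fragment below is a sketch with illustrative frontend/backend names and example thresholds, not the project's `haproxy.cfg`.

```
frontend gateway_front
    bind *:8080
    # Track per-source-IP request rate over a 10s window
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # Reject clients exceeding ~100 requests per 10s window
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
    default_backend gateway_back
```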
High-priority issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Fabrication integrations incomplete | services/fabrication/ | Blocks 3D print outcomes | Wire MinIO + MQTT clients |
| No database backup automation | infra/compose/ | Data loss risk | Implement pg_dump cron job |
| FabricationConsole untested | services/ui/ | Regression risk | Add Vitest coverage |
| JSON parsing fragility | collective/orchestrator.py | Collective failures | Use structured output mode |
Medium-priority issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Research graph debug logging | brain/research/ | Performance impact | Clean up DEBUG statements |
| Tool execution duplicated | Router + Agent | Code maintenance | Consolidate tool paths |
| Async/sync mismatch | brain/llm_client.py | Performance hit | Remove deprecated sync wrapper |
| Service discovery hardcoded | common/service_manager/ | Scaling limitation | Implement service registry |
| No E2E test suite | tests/ | Integration bugs | Add Playwright tests |
Lower-priority issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| RabbitMQ separate compose | infra/compose/ | Setup confusion | Merge into main compose |
| Slicer profiles hardcoded | config/slicer_profiles/ | User friction | Add dynamic loading from UI |
| Tool execution in collective | collective/ | Plans don't modify files | Wire tool_executor callback |
| Plan caching not implemented | collective/config.py | Redundant computation | Cache by request hash |
| Vision pipeline cleanup | brain/routing/ | Memory leaks | Add download cleanup |

(One item formerly in this list is ✅ RESOLVED: it now uses envsubst templating via docker-entrypoint.sh.)
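The "cache by request hash" recommendation amounts to keying planner output on a stable hash of the request. A minimal in-memory sketch follows; the real store could be Redis, and the names here are illustrative.

```python
import hashlib
import json

_plan_cache: dict[str, list] = {}

def plan_with_cache(request: dict, plan_fn):
    """Return a cached plan when an identical request was seen before."""
    # sort_keys makes the hash stable across dict ordering.
    key = hashlib.sha256(
        json.dumps(request, sort_keys=True).encode()
    ).hexdigest()
    if key not in _plan_cache:
        _plan_cache[key] = plan_fn(request)  # expensive 123B planner call
    return _plan_cache[key]
```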
Minor issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Vendor lookup incomplete | discovery/ | Minor discovery gaps | Expand OUI database |
| No OpenAPI schema | Gateway | Client integration | Auto-generate from FastAPI |
| Voice service undocumented | services/voice/ | Operator confusion | Add setup guide |
| Context blinding aggressive | collective/prompts.py | May lose context | Review filter patterns |
- Broker Service: Command allow-list, privilege dropping
- SafetyChecker: Confirmation phrases for hazard tools
- Permission Manager: User approval flow for tools
- HexStrike: 151 security assessment tools (optional)
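The SafetyChecker's confirmation-phrase idea can be illustrated with a small gate. The tool names and phrases below are hypothetical stand-ins, not the project's actual hazard list.

```python
# Hypothetical hazard map: tool name -> phrase the user must type back.
HAZARD_PHRASES = {
    "start_print": "yes, start the print",
    "run_shell": "yes, run this command",
}

def confirm_hazard(tool: str, user_reply: str) -> bool:
    """Allow non-hazardous tools; require the exact phrase for hazardous ones."""
    phrase = HAZARD_PHRASES.get(tool)
    return phrase is None or user_reply.strip().lower() == phrase
```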
| Gap | Severity | Mitigation |
|---|---|---|
| No rate limiting on gateway | Medium | Add HAProxy rate limits |
| Broker has no audit trail encryption | Low | Encrypt audit logs |
| Tool confirmation not enforced everywhere | Medium | Audit all code paths |

(One previously listed gap is ✅ RESOLVED: it now uses envsubst templating in docker-entrypoint.sh.)
| Model | Port | Tokens/sec | Context |
|---|---|---|---|
| GPT-OSS 120B | 11434 | ~3-5 | 128K |
| Athene V2 Q4 | 8083 | ~15-20 | 128K |
| Hermes 3 8B | 8084 | ~30-40 | 8K |
| Gemma 3 27B | 8086 | ~10-15 | 8K |
| Devstral 2 123B | 8087 | ~5-8 | 128K |
- Startup: 5-10 minutes (Ollama model loading)
- Semantic caching: Disabled by default (SentenceTransformer slow)
- Tool selection: Loads ALL tools if semantic mode enabled
- Research checkpointing: No batch writes, database contention
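The disabled-by-default semantic cache boils down to reusing an answer when a new query embeds close to a cached one. A stdlib-only sketch follows; the embedding function is a stub standing in for the SentenceTransformer, and the 0.95 threshold is an assumption.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # e.g. a SentenceTransformer, stubbed here
        self.threshold = threshold
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, query: str):
        """Return a cached answer whose query embeds close enough, else None."""
        q = self.embed(query)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))
```

A linear scan like this is fine for small caches; at scale the lookup would go through Qdrant instead.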
| Category | Files | Coverage |
|---|---|---|
| Unit (Python) | 20+ | Core logic covered |
| Integration (Python) | 15+ | Major workflows |
| Unit (React) | 32 | Components + hooks |
| E2E | 0 | Gap |
| Visual Regression | 0 | Gap |
| Load Testing | 0 | Gap |
- FabricationConsole page (zero tests)
- WebSocket/MQTT real-time features
- HAProxy failover scenarios
- Voice service integration
- Collective architecture edge cases
| Document | Size | Coverage |
|---|---|---|
| README.md | 46.6KB | Project overview |
| KITTY_OperationsManual.md | 42.1KB | Operations guide |
| CLAUDE.md | 14.8KB | AI assistant instructions |
| COLLECTIVE_ARCHITECTURE.md | ~8KB | Multi-model orchestration |
- OpenAPI/AsyncAPI schema
- Database migration guide
- Troubleshooting playbook
- Voice service setup
- RabbitMQ message schema
- 21e1310 - Preserve middleware metadata in pipeline aggregation
- 792e3ad - Add Senior/Junior delegation pattern
- 0edf44f - Implement event streaming and tool execution
- 580d529 - Implement collective middleware
- d68804a - Backfill tool_calls, optimize Ralph-Wiggum
- 98d8dec - Add JSON embedded tool call parsing
- Tiered Collective Architecture (experimental)
- Research graph automation
- Tool execution integration in collective
- Plan caching implementation
- Wire MinIO client in fabrication service
- Add FabricationConsole unit tests
- Clean up research graph debug logging
- Implement database backup automation
- Complete tool execution in collective workflow
- Add E2E test suite with Playwright
- Merge RabbitMQ into main compose
- Generate OpenAPI schema from FastAPI
- Implement service discovery pattern
- Add circuit breaker to HAProxy
- Complete research graph automation
- Document voice service integration
- `services/brain/` - Orchestrator (8000)
- `services/gateway/` - API gateway
- `services/fabrication/` - Printer control (8300)
- `services/cad/` - 3D model generation (8200)
- `services/voice/` - STT/TTS (8400)
- `services/kitty-code/src/kitty_code/core/` - Core agent
- `services/kitty-code/src/kitty_code/core/collective/` - Multi-model
- `~/.kitty-code/config.toml` - Configuration
- `infra/compose/docker-compose.yml` - Docker services
- `infra/haproxy/haproxy.cfg` - Load balancer
- `ops/scripts/start-all.sh` - Startup orchestration
- `.env.example` - Environment template
- `config/tool_registry.yaml` - Tool definitions
- `config/slicer_profiles/` - CuraEngine configs
End of Project Status Report