Document Type: Comprehensive Project Status Report Generated: January 2026 Project: KITTY (Knowledgeable Intelligent Tool-using Tabletop Yoda)
KITTY is a sophisticated offline-first, voice-enabled fabrication lab orchestrator running on a Mac Studio M3 Ultra. The system integrates local AI inference with 3D printing, CNC control, and smart home automation.
| Dimension | Status | Confidence |
|---|---|---|
| Core Query/Chat Pipeline | ✅ Production-Ready | High |
| Fabrication Workflow | ⚠️ Partially Complete | Medium |
| Multi-Model Collective | 🔄 Experimental | Low-Medium |
| Voice Interface | | Medium |
| Research Graph | 🔄 In Development | Low |
| Infrastructure | ✅ Solid Foundation | High |
| UI/UX | ✅ Well-Architected | High |
| Testing | ⚠️ Gaps Remain | Medium |
KITTY operates as a microservices architecture with 15+ independent services:
┌──────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
├──────────────┬──────────────┬─────────────────┬─────────────────────┤
│ React UI │ CLI/TUI │ Voice Service │ kitty-code TUI │
│ (4173) │ (kitty-cli) │ (8400/8550) │ (Textual) │
└──────────────┴──────────────┴─────────────────┴─────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ GATEWAY LAYER (HAProxy :8080) │
│ • 3 gateway replicas, round-robin │
│ • WebSocket sticky sessions, SSE streaming │
│ • Stats dashboard on :8404 │
└──────────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ ORCHESTRATION LAYER │
├─────────────────────────────┬────────────────────────────────────────┤
│ Brain Service (:8000) │ • LLM routing (confidence-based) │
│ │ • ReAct agent (max 10 iterations) │
│ │ • Tool execution via MCP │
│ │ • Research graph (LangGraph) │
│ │ • Conversation management │
└─────────────────────────────┴────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ DOMAIN SERVICES │
├────────────────┬───────────────┬───────────────┬─────────────────────┤
│ Fabrication │ CAD │ Discovery │ Other Services │
│ (:8300) │ (:8200) │ (:8500) │ │
│ • Printers │ • Zoo/Tripo │ • mDNS/SSDP │ • broker (:8777) │
│ • Slicing │ • CadQuery │ • Bambu UDP │ • images (:8600) │
│ • Outcomes │ • Artifacts │ • ARP scan │ • mem0-mcp (:8765) │
└────────────────┴───────────────┴───────────────┴─────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE LAYER │
├────────────┬──────────┬─────────┬─────────┬────────────┬─────────────┤
│ PostgreSQL │ Redis │ Qdrant │ MinIO │ RabbitMQ │ Mosquitto │
│ (:5432) │ (:6379) │ (:6333) │ (:9000) │ (AMQP) │ (:1883) │
│ State │ Cache │ Vectors │ S3 │ Messaging │ MQTT │
└────────────┴──────────┴─────────┴─────────┴────────────┴─────────────┘
↓
┌──────────────────────────────────────────────────────────────────────┐
│ LLM INFERENCE LAYER │
├────────────────────────────────┬─────────────────────────────────────┤
│ Ollama (:11434) │ llama.cpp Servers │
│ • GPT-OSS 120B (reasoner) │ • :8083 Athene V2 Q4 (tools) │
│ • 128K context │ • :8084 Hermes 3 8B (summary) │
│ │ • :8086 Gemma 3 27B (vision) │
│ │ • :8087 Devstral 2 123B (coder) │
└────────────────────────────────┴─────────────────────────────────────┘
| Layer | Technology | Version |
|---|---|---|
| Frontend | React + Vite + TypeScript | 18.3.1 |
| UI Components | Radix UI (shadcn/ui) | Latest |
| Styling | Tailwind CSS v4 | Latest |
| 3D Rendering | Three.js | 0.172.0 |
| Backend | FastAPI + Pydantic | Latest |
| LLM (Primary) | Ollama (GPT-OSS 120B) | Latest |
| LLM (Specialized) | llama.cpp (GGUF sharded) | Latest |
| Database | PostgreSQL 16 | 16.x |
| Vector DB | Qdrant | 1.11.0 |
| Message Queue | RabbitMQ | Latest |
| MQTT Broker | Eclipse Mosquitto | 2.0 |
| Load Balancer | HAProxy | Latest |
| Observability | Prometheus + Grafana + Loki | 2.53/10.4/3.0 |
Core Query/Chat Pipeline - Status: Production-Ready
The central chat/query flow is fully operational:
User Query → Gateway → Brain Router → LLM Selection → Response
↓
• Local (free): llama.cpp multi-server
• MCP (cheap): Perplexity search-augmented
• Frontier (expensive): OpenAI/Anthropic
Routing Logic:
- Confidence-based model selection with 3 tiers
- Semantic/keyword tool selection
- Vision pipeline for image understanding
- ReAct agent with max 10 iterations
- Fallback cascade: local → MCP → frontier
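The fallback cascade above can be sketched as a confidence-gated loop over tiers. This is a minimal illustration, not the actual `services/brain/routing/router.py` logic; the tier callables and the 0.7 threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str
    # Each tier returns (answer, confidence in [0, 1]); stubbed here.
    ask: Callable[[str], tuple[str, float]]

def route(query: str, tiers: list[Tier], threshold: float = 0.7) -> str:
    """Try cheap tiers first; escalate only when confidence is too low."""
    last_answer = ""
    for tier in tiers:
        answer, confidence = tier.ask(query)
        last_answer = answer
        if confidence >= threshold:
            return answer  # good enough, stop escalating
    return last_answer     # final tier's answer, even if low confidence

# Usage with stub tiers: local (free) -> MCP (cheap) -> frontier (expensive)
tiers = [
    Tier("local", lambda q: ("local answer", 0.5)),
    Tier("mcp", lambda q: ("mcp answer", 0.9)),
    Tier("frontier", lambda q: ("frontier answer", 0.99)),
]
print(route("What is KITTY?", tiers))  # escalates past local, stops at mcp
```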
Files:
- `services/brain/routing/router.py` - Main routing engine
- `services/brain/llm_client.py` - Provider registry
- `services/brain/tools/mcp_client.py` - MCP tool execution
Fabrication Workflow - Status: Partially Complete (core works, integrations pending)
5-Step Progressive Workflow:
| Step | Name | Status | Notes |
|---|---|---|---|
| 1 | Generate | ✅ | Zoo/Tripo/CadQuery providers working |
| 2 | Orient | ✅ | Rotation matrix optimization |
| 3 | Segment | ✅ | Dimension checking, joint types |
| 4 | Slice | ⚠️ | CuraEngine integration needs profiles |
| 5 | Print | ⚠️ | Bambu Cloud works, local queue partial |
Known Incomplete Integrations (from TODOs in code):
- MinIO client not wired for snapshot uploads
- MQTT client not fully integrated for Bambu telemetry
- Material spool auto-selection not implemented
- Camera capture MQTT pattern incomplete
- Outcome tracking missing print success metrics
Files:
- `services/fabrication/app.py` - Service entrypoint
- `services/ui/src/pages/FabricationConsole/` - React workflow
- `config/slicer_profiles/` - CuraEngine configurations
Multi-Model Collective - Status: Experimental (recently developed; 6 commits in active branch)
This is a multi-model orchestration system implementing Senior/Junior delegation:
User Request
↓
ComplexityRouter (pattern matching + confidence)
↓
├─→ Trivial patterns? → Direct execution (skip collective)
└─→ Complex task → Collective Orchestration
↓
Planner (Devstral 2 123B) - Strategic decomposition
↓
For each step:
├─→ Executor (Devstral Small 2 24B) - Fast implementation
├─→ Judge (Devstral 2 123B) - Validation & mentorship
└─→ Loop: APPROVE → next step | REVISE → retry
↓
Complete
Model Allocation:
| Role | Model | Speed | Purpose |
|---|---|---|---|
| Planner | Devstral 2 123B | ~5 tok/s | Task decomposition |
| Executor | Devstral Small 2 24B | ~20-25 tok/s | Code generation |
| Judge | Devstral 2 123B | ~5 tok/s | Quality assurance |
Trade-off: 24B Executor provides ~4-5x speedup with only 4% accuracy loss (68% vs 72% SWE-bench)
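The Senior/Junior delegation loop can be sketched as follows. Model calls are stubbed as plain callables; the real orchestrator adds state tracking, fallback, and tool execution, so treat this as a shape, not the implementation.

```python
def run_collective(task, planner, executor, judge, max_revisions=20):
    """Plan -> execute -> judge -> revise loop (Senior/Junior pattern)."""
    results = []
    for step in planner(task):          # Senior (123B): strategic decomposition
        attempt = executor(step)        # Junior (24B): fast first draft
        for _ in range(max_revisions):
            verdict, feedback = judge(step, attempt)  # Senior: validate/mentor
            if verdict == "APPROVE":
                break                    # accepted; move to the next step
            attempt = executor(f"{step}\nRevise per feedback: {feedback}")
        results.append(attempt)
    return results
```

With stub callables this runs end-to-end: a judge that rejects the first draft forces exactly one revision cycle before approval.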
State Machine:
IDLE → ROUTING → PLANNING → EXECUTING → JUDGING → COMPLETE
↓
├─→ REVISING (loop back)
└─→ FALLBACK (graceful degradation)
Configuration (`~/.kitty-code/config.toml`):
```toml
[collective]
enabled = false  # Disabled by default
planner_model = "local"
executor_model = "executor"
max_revision_cycles = 20
teaching_mode = true
```
Known Issues:
- JSON parsing fragility - `_repair_truncated_json()` workaround in place
- Tool execution in collective doesn't write files
- Plan caching not implemented despite config flag
- Context blinding filter may be too aggressive
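The parsing fragility is the classic truncated-stream problem: a model's JSON output gets cut off mid-structure. A repair helper along those lines might look like this; it is a best-effort sketch, not the project's actual `_repair_truncated_json()`.

```python
import json

def repair_truncated_json(text: str):
    """Best-effort repair of JSON cut off mid-stream; None if unrepairable."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Track unclosed braces/brackets, ignoring ones inside string literals.
    stack = []
    in_string = False
    escape = False
    for ch in text:
        if escape:
            escape = False
            continue
        if ch == "\\" and in_string:
            escape = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    candidate = text
    if in_string:
        candidate += '"'                    # close a dangling string
    candidate += "".join(reversed(stack))   # close containers inside-out
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None
```

The structured-output recommendation in the issues table is the sturdier fix; this kind of repair is only a stopgap.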
Files:
- `services/kitty-code/src/kitty_code/core/collective/` - Module
- `services/kitty-code/docs/COLLECTIVE_ARCHITECTURE.md` - Documentation
Research Graph - Status: In Development (debug logging still present)
LangGraph-based research automation:
- Multi-tool researcher agent
- Claim extraction and verification
- Source retrieval and summarization
- PostgreSQL checkpointing
Known Issues:
- 15+ DEBUG statements scattered throughout
- Session model synthesis not fully wired
- Topic-based research not implemented
- Budget manager token tracking manual
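Since token tracking is currently manual, a minimal automatic budget guard could look like the sketch below. The class and method names are illustrative, not the project's actual budget manager.

```python
class BudgetManager:
    """Tracks token spend for a research run and refuses to overspend."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Reject the call *before* spending past the cap.
        if self.used + tokens > self.max_tokens:
            raise RuntimeError("research token budget exhausted")
        self.used += tokens

    @property
    def remaining(self) -> int:
        return self.max_tokens - self.used
```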
Files:
- `services/brain/research/graph/` - LangGraph implementation
- `services/brain/research/scheduler.py` - Job scheduling
kitty-code TUI - Status: Functional with Active Development
Textual-based coding assistant with:
- 5 agent modes (DEFAULT, PLAN, ACCEPT_EDITS, AUTO_APPROVE, AUTO_ITERATE)
- MCP tool discovery and integration
- Middleware pipeline for focus preservation
- Session persistence and resumption
Agent Modes:
| Mode | Safety | Behavior |
|---|---|---|
| DEFAULT | Neutral | Approval required |
| PLAN | Safe | Read-only exploration |
| ACCEPT_EDITS | Destructive | Auto-approve file edits |
| AUTO_APPROVE | YOLO | Auto-approve all |
| AUTO_ITERATE | YOLO | Loop until complete |
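The approval behavior in the table can be captured in a single gating function. This is a sketch: the tool name `edit_file` and the exact gating rules are assumptions, not kitty-code's real policy code.

```python
from enum import Enum, auto

class AgentMode(Enum):
    DEFAULT = auto()
    PLAN = auto()
    ACCEPT_EDITS = auto()
    AUTO_APPROVE = auto()
    AUTO_ITERATE = auto()

def needs_approval(mode: AgentMode, tool: str) -> bool:
    """Return True when the user must approve this tool call."""
    if mode in (AgentMode.AUTO_APPROVE, AgentMode.AUTO_ITERATE):
        return False                  # YOLO modes: everything auto-approved
    if mode is AgentMode.ACCEPT_EDITS:
        return tool != "edit_file"    # only file edits are auto-approved
    return True                       # DEFAULT and PLAN: approval required
```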
Key Middleware:
- `TaskInjectionMiddleware` - Focus preservation (injects plan/todos)
- `CompletionCheckMiddleware` - Ralph-Wiggum pattern (prevents premature stop)
- `ModelRoutingMiddleware` - Routes to collective or direct execution
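The pipeline pattern these middlewares plug into can be sketched as nested wrappers around a handler. The handler and middleware bodies below are toy stand-ins for the real classes, kept only to show the wrapping order.

```python
from typing import Callable

Handler = Callable[[dict], dict]

def task_injection(next_handler: Handler) -> Handler:
    def wrapped(request: dict) -> dict:
        # Inject the current plan/todos so the agent keeps focus.
        request.setdefault("context", []).append("plan: current todos")
        return next_handler(request)
    return wrapped

def completion_check(next_handler: Handler) -> Handler:
    def wrapped(request: dict) -> dict:
        response = next_handler(request)
        # "Ralph-Wiggum" pattern: flag premature completion for another pass.
        if "DONE" not in response.get("text", ""):
            response["needs_another_pass"] = True
        return response
    return wrapped

def build_pipeline(handler: Handler, middlewares) -> Handler:
    for mw in reversed(middlewares):  # first listed runs outermost
        handler = mw(handler)
    return handler

agent = build_pipeline(lambda req: {"text": "partial answer"},
                       [task_injection, completion_check])
print(agent({}))
```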
Files:
- `services/kitty-code/src/kitty_code/` - Main package
- `~/.kitty-code/config.toml` - Configuration
UI/UX - Status: Well-Architected (2.2MB codebase, 32 test files)
Key Pages:
- `FabricationConsole/` - 5-step printing workflow
- `Dashboard/` - Device management hub
- `ResearchHub/` - Research interface
- `MediaHub/` - Stable Diffusion UI
- `Settings/` - System configuration
Component Library:
- shadcn/ui primitives (Radix UI)
- VoiceAssistant components with Zustand state
- Three.js-based 3D viewers
- Glassmorphism design system
Test Coverage: 32 Vitest test files (but FabricationConsole untested)
15+ services on the bridged `kitty` network:
- 9 named volumes for persistence
- Health checks on key services (30s interval)
- Environment template via YAML anchors
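A compose health check of the kind described (30-second interval) typically looks like the fragment below; the service name, image, and endpoint are illustrative, not taken from the project's compose file.

```yaml
services:
  brain:
    image: kitty/brain          # illustrative image name
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
```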
Gaps:
- RabbitMQ lives in a separate compose file (confusing setup)
- No database clustering in main config
- No automated backup strategy
8-phase orchestration in `ops/scripts/start-all.sh`:
1. LLM Servers (Ollama + llama.cpp)
2. LLM Health Checks (10-min timeout)
3. Docker Services
4. Service Validation
5. API Health Checks
6. Images Service (non-critical)
7. Voice Service (non-critical)
8. HexStrike Security Tools (optional)
Startup time: ~5-10 minutes (Ollama model loading dominates)
HAProxy configuration:
- 3 gateway replicas, round-robin
- WebSocket sticky sessions (source hash)
- SSE streaming support
- Stats dashboard on :8404
- 30-minute timeouts, 1-hour tunnel
Missing:
- Circuit breaker pattern
- Rate limiting
- Backend degradation handling
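For the missing rate limiting, HAProxy's stick-table mechanism is the usual approach. The fragment below is a sketch with illustrative frontend/backend names and example thresholds, not the project's `haproxy.cfg`.

```
frontend gateway_front
    bind *:8080
    # Track per-source-IP request rate over a 10s window
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # Reject clients exceeding ~100 requests per 10s window
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
    default_backend gateway_back
```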
High-priority issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Fabrication integrations incomplete | services/fabrication/ | Blocks 3D print outcomes | Wire MinIO + MQTT clients |
| No database backup automation | infra/compose/ | Data loss risk | Implement pg_dump cron job |
| FabricationConsole untested | services/ui/ | Regression risk | Add Vitest coverage |
| JSON parsing fragility | collective/orchestrator.py | Collective failures | Use structured output mode |
Medium-priority issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Research graph debug logging | brain/research/ | Performance impact | Clean up DEBUG statements |
| Tool execution duplicated | Router + Agent | Code maintenance | Consolidate tool paths |
| Async/sync mismatch | brain/llm_client.py | Performance hit | Remove deprecated sync wrapper |
| Service discovery hardcoded | common/service_manager/ | Scaling limitation | Implement service registry |
| No E2E test suite | tests/ | Integration bugs | Add Playwright tests |
Lower-priority issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| RabbitMQ separate compose | infra/compose/ | Setup confusion | Merge into main compose |
| Slicer profiles hardcoded | config/slicer_profiles/ | User friction | Add dynamic loading from UI |
| Tool execution in collective | collective/ | Plans don't modify files | Wire tool_executor callback |
| Plan caching not implemented | collective/config.py | Redundant computation | Cache by request hash |
| Vision pipeline cleanup | brain/routing/ | Memory leaks | Add download cleanup |

(One item formerly in this list is ✅ RESOLVED: it now uses envsubst templating via docker-entrypoint.sh.)
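The "cache by request hash" recommendation amounts to keying planner output on a stable hash of the request. A minimal in-memory sketch follows; the real store could be Redis, and the names here are illustrative.

```python
import hashlib
import json

_plan_cache: dict[str, list] = {}

def plan_with_cache(request: dict, plan_fn):
    """Return a cached plan when an identical request was seen before."""
    # sort_keys makes the hash stable across dict ordering.
    key = hashlib.sha256(
        json.dumps(request, sort_keys=True).encode()
    ).hexdigest()
    if key not in _plan_cache:
        _plan_cache[key] = plan_fn(request)  # expensive 123B planner call
    return _plan_cache[key]
```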
Minor issues:
| Issue | Location | Impact | Recommendation |
|---|---|---|---|
| Vendor lookup incomplete | discovery/ | Minor discovery gaps | Expand OUI database |
| No OpenAPI schema | Gateway | Client integration | Auto-generate from FastAPI |
| Voice service undocumented | services/voice/ | Operator confusion | Add setup guide |
| Context blinding aggressive | collective/prompts.py | May lose context | Review filter patterns |
- Broker Service: Command allow-list, privilege dropping
- SafetyChecker: Confirmation phrases for hazard tools
- Permission Manager: User approval flow for tools
- HexStrike: 151 security assessment tools (optional)
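The SafetyChecker's confirmation-phrase idea can be illustrated with a small gate. The tool names and phrases below are hypothetical stand-ins, not the project's actual hazard list.

```python
# Hypothetical hazard map: tool name -> phrase the user must type back.
HAZARD_PHRASES = {
    "start_print": "yes, start the print",
    "run_shell": "yes, run this command",
}

def confirm_hazard(tool: str, user_reply: str) -> bool:
    """Allow non-hazardous tools; require the exact phrase for hazardous ones."""
    phrase = HAZARD_PHRASES.get(tool)
    return phrase is None or user_reply.strip().lower() == phrase
```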
| Gap | Severity | Mitigation |
|---|---|---|
| No rate limiting on gateway | Medium | Add HAProxy rate limits |
| Broker has no audit trail encryption | Low | Encrypt audit logs |
| Tool confirmation not enforced everywhere | Medium | Audit all code paths |

(One previously listed gap is ✅ RESOLVED: it now uses envsubst templating in docker-entrypoint.sh.)
| Model | Port | Tokens/sec | Context |
|---|---|---|---|
| GPT-OSS 120B | 11434 | ~3-5 | 128K |
| Athene V2 Q4 | 8083 | ~15-20 | 128K |
| Hermes 3 8B | 8084 | ~30-40 | 8K |
| Gemma 3 27B | 8086 | ~10-15 | 8K |
| Devstral 2 123B | 8087 | ~5-8 | 128K |
- Startup: 5-10 minutes (Ollama model loading)
- Semantic caching: Disabled by default (SentenceTransformer slow)
- Tool selection: Loads ALL tools if semantic mode enabled
- Research checkpointing: No batch writes, database contention
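The disabled-by-default semantic cache boils down to reusing an answer when a new query embeds close to a cached one. A stdlib-only sketch follows; the embedding function is a stub standing in for the SentenceTransformer, and the 0.95 threshold is an assumption.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # e.g. a SentenceTransformer, stubbed here
        self.threshold = threshold
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, query: str):
        """Return a cached answer whose query embeds close enough, else None."""
        q = self.embed(query)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))
```

A linear scan like this is fine for small caches; at scale the lookup would go through Qdrant instead.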
| Category | Files | Coverage |
|---|---|---|
| Unit (Python) | 20+ | Core logic covered |
| Integration (Python) | 15+ | Major workflows |
| Unit (React) | 32 | Components + hooks |
| E2E | 0 | Gap |
| Visual Regression | 0 | Gap |
| Load Testing | 0 | Gap |
- FabricationConsole page (zero tests)
- WebSocket/MQTT real-time features
- HAProxy failover scenarios
- Voice service integration
- Collective architecture edge cases
| Document | Size | Coverage |
|---|---|---|
| README.md | 46.6KB | Project overview |
| KITTY_OperationsManual.md | 42.1KB | Operations guide |
| CLAUDE.md | 14.8KB | AI assistant instructions |
| COLLECTIVE_ARCHITECTURE.md | ~8KB | Multi-model orchestration |
- OpenAPI/AsyncAPI schema
- Database migration guide
- Troubleshooting playbook
- Voice service setup
- RabbitMQ message schema
- 21e1310 - Preserve middleware metadata in pipeline aggregation
- 792e3ad - Add Senior/Junior delegation pattern
- 0edf44f - Implement event streaming and tool execution
- 580d529 - Implement collective middleware
- d68804a - Backfill tool_calls, optimize Ralph-Wiggum
- 98d8dec - Add JSON embedded tool call parsing
- Tiered Collective Architecture (experimental)
- Research graph automation
- Tool execution integration in collective
- Plan caching implementation
- Wire MinIO client in fabrication service
- Add FabricationConsole unit tests
- Clean up research graph debug logging
- Implement database backup automation
- Complete tool execution in collective workflow
- Add E2E test suite with Playwright
- Merge RabbitMQ into main compose
- Generate OpenAPI schema from FastAPI
- Implement service discovery pattern
- Add circuit breaker to HAProxy
- Complete research graph automation
- Document voice service integration
- `services/brain/` - Orchestrator (8000)
- `services/gateway/` - API gateway
- `services/fabrication/` - Printer control (8300)
- `services/cad/` - 3D model generation (8200)
- `services/voice/` - STT/TTS (8400)
- `services/kitty-code/src/kitty_code/core/` - Core agent
- `services/kitty-code/src/kitty_code/core/collective/` - Multi-model
- `~/.kitty-code/config.toml` - Configuration
- `infra/compose/docker-compose.yml` - Docker services
- `infra/haproxy/haproxy.cfg` - Load balancer
- `ops/scripts/start-all.sh` - Startup orchestration
- `.env.example` - Environment template
- `config/tool_registry.yaml` - Tool definitions
- `config/slicer_profiles/` - CuraEngine configs
End of Project Status Report