Jarvis is a low-latency, integration-based Speech-to-Speech (STS) and Vision-Language (VLM) assistant framework optimized for the NVIDIA RTX 5090 (Blackwell). It orchestrates cutting-edge AI models into a unified pipeline capable of real-time voice interaction and visual analysis.
- Multi-Modal Brain: Support for Large Language Models (LLM) and Vision-Language Models (VLM) via Ollama and vLLM.
- Fast Transcription: Optimized Speech-to-Text (STT) powered by faster-whisper.
- Natural Voice: High-quality Text-to-Speech (TTS) using the Chatterbox engine.
- Hierarchical Dashboard: Real-time TUI dashboard for monitoring benchmarks, logs, and VRAM usage.
- Benchmarking Suite: Comprehensive test runner with automated Google Drive reporting and session-based artifact persistence.
- Refactor Guard: A high-fidelity "Plumbing Mode" to verify code integrity without requiring GPU resources.
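The "Plumbing Mode" idea above can be sketched in a few lines: swap the GPU-backed engine for a stub so the request/response wiring can be exercised on any machine. All names here (`transcribe`, `StubBackend`) are illustrative, not the actual Jarvis API.

```python
def transcribe(audio_bytes: bytes, backend) -> str:
    """Pass audio to an STT backend and normalize the result.

    In production, `backend` would wrap a GPU-loaded model; in a
    plumbing check it is a stub, so only the wiring is verified.
    """
    text = backend.run(audio_bytes)
    return text.strip().lower()

class StubBackend:
    """Stands in for the GPU-backed STT engine during plumbing checks."""
    def run(self, audio_bytes: bytes) -> str:
        # Return a fixed string; no model, no VRAM, no CUDA required.
        return "  Plumbing OK  "

# The pipeline shape is verified end-to-end without touching the GPU.
assert transcribe(b"\x00\x01", StubBackend()) == "plumbing ok"
```

The same pattern generalizes: any node in the pipeline (STT, LLM, TTS) can be replaced by a stub that honors the real interface, letting refactors be validated before committing GPU resources.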
Jarvis interacts with your environment through prioritized data channels, focusing on tactile productivity first.
| Channel | Type | Priority | Usage Example |
|---|---|---|---|
| Microphone | aud | P0 | Capture "Summarize this" voice command. |
| Selection | txt | P0 | Read highlighted code for refactoring. |
| Clipboard | txt | P0 | Paste result directly back into a draft. |
| Chat UI | txt | P1 | Monitor real-time logs and history. |
| Speaker | aud | P1 | Verbal responses for hands-free mode. |
| Screenshot | img | P2 | Analyze a static error or UI element. |
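The priority column above implies an ordering on incoming events: P0 channels are serviced before P1, and P1 before P2. A minimal sketch of such prioritized dispatch, using a heap (the class and function names are hypothetical, not part of Jarvis):

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class ChannelEvent:
    """An event from one data channel; lower priority number wins."""
    priority: int
    channel: str = field(compare=False)
    payload: str = field(compare=False)

def drain(events: list[ChannelEvent]) -> list[ChannelEvent]:
    """Return events in priority order (P0 before P1 before P2)."""
    heapq.heapify(events)
    return [heapq.heappop(events) for _ in range(len(events))]

queue = [
    ChannelEvent(2, "screenshot", "error dialog capture"),
    ChannelEvent(0, "microphone", "Summarize this"),
    ChannelEvent(1, "chat_ui", "log line"),
]
for ev in drain(queue):
    print(ev.priority, ev.channel)
# → 0 microphone, then 1 chat_ui, then 2 screenshot
```

Within a priority tier, ordering is a design choice; a real dispatcher would likely also break ties by arrival time.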
- /servers: Individual component servers (STT, TTS, STS).
- /utils: Core system utilities (Config, Infra, VRAM, Hardware).
- /tests: Benchmarking logic, test plans, and domain suites.
- /loadouts: Production-ready model configurations.
- /docs: Detailed architectural and procedural documentation.
Jarvis requires a specialized environment for NVIDIA Blackwell hardware.
```bash
# Run the automated bootstrap script
python setup/setup_env.py
```

For manual installation or troubleshooting, see docs/TUTORIAL_QUICKSTART.md.
Use the loadout manager to start or stop the Jarvis cluster.
```bash
# Apply a specific model preset
python manage_loadout.py --apply base-qwen30-multi

# Check cluster health
python manage_loadout.py --status
```

- Quickstart: Installation, dependencies, and Hello World.
- Model Onboarding: Adding new models to the physics database.
- Contributing: Developer setup, commit hygiene, and refactor guards.
- Benchmarking: Running component tests and performance reports.
- Hardware Testing: Verifying Mic, Screen, and Camera with virtual drivers.
- Reporting: Regenerating and synchronizing benchmark data.
- Engine Management: Configuring Ollama and vLLM (Docker) lifecycles.
- Troubleshooting: Common errors, CUDA issues, and log analysis.
- Using the GUI: Interacting with the Speech-to-Speech assistant client.
- System Architecture: High-level component breakdown and data flow.
- The Testing Pyramid: Multi-tiered validation strategy (Mocking vs. Virtualizing).
- Unified Node Abstraction: Functional execution model for models and hardware.
- Reactive Flow Engine: Declarative graph orchestration and transport.
- Operational Concepts: Behavioral templates, triggers, and stateless turn logic.
- Model Physics: VRAM management, KV cache scaling, and calibration theory.
- Persona & Tone: Philosophical stance on assistant behavior and honesty.
- API Reference: HTTP endpoints and JSON schemas.
- Reporting & Data: CLI flags, JSON schemas, and GDrive structure.
- Configuration: config.yaml dictionary and environment variables.
- Hardware Matrix: GPU and library compatibility guide (RTX 5090).
- Calibration Database: Physics YAML schemas and evidence store.
For AI-assisted development instructions, see GEMINI.MD.