zarvent · google-labs-jules · Jan 19, 2026
diff --git a/apps/backend/AGENTS.md b/apps/backend/AGENTS.md
@@ -1,204 +1,113 @@
 # Voice2Machine Backend
 
-AI agent instructions for the core daemon and backend services.
+**Mission**: Provide a low-latency, privacy-first AI backend that orchestrates audio processing and intelligence locally using State of the Art 2026 standards.
 
-**Architecture**: Hexagonal (Ports & Adapters) + 4-Phase Performance Pipeline
-**Language**: Python 3.12+ (Asyncio-native, uvloop)
-**Privacy**: Local-first, no telemetry
+---
+
+## Tech Stack
+
+| Component | Version/Tool |
+|-----------|--------------|
+| Language | Python 3.12+ (Asyncio-native) |
+| Runtime | `uv` (Package Manager), `uvloop` (Event Loop) |
+| Validation | Pydantic V2 (Strict Schema) |
+| Linting | Ruff (SOTA 2026) |
+| Testing | Pytest + `pytest-asyncio` |
+| Audio/VAD | Rust `v2m_engine` (Primary) |
+| ML/AI | `faster-whisper`, Google GenAI, Ollama |
 
 ---
 
 ## Commands (File-Scoped)
 
-Prioritize these over full-project runs.
+**Performance Rule**: Always prefer file-scoped commands over full-project builds to reduce feedback latency.
 
 ```bash
-# Lint single file
+# Lint & Fix single file
 ruff check src/v2m/path/to/file.py --fix
 
 # Format single file
 ruff format src/v2m/path/to/file.py
 
-# Type check (via LSP or)
-# ruff integrates type checking
-
 # Test single file
-venv/bin/pytest tests/unit/path/to/test_file.py -v
+pytest tests/unit/path/to/test_file.py -v
 
-# Run daemon
+# Run Daemon (Dev Mode)
 python -m v2m.main --daemon
 ```
 
-> **Full builds only on explicit request.**
-
----
-
-## Tech Stack
-
-| Component | Version/Tool |
-|-----------|--------------|
-| Language | Python 3.12+ with `asyncio` |
-| Event Loop | `uvloop` (installed on daemon startup) |
-| Validation | Pydantic V2 |
-| Linting | Ruff (SOTA 2026) |
-| Testing | Pytest + `pytest-asyncio` |
-| Serialization | `orjson` (3-10x faster than stdlib) |
-| Audio | Rust `v2m_engine` (primary), `sounddevice` (fallback) |
-| ML | `faster-whisper`, Google GenAI (Gemini) |
-
 ---
 
 ## Project Structure
 
 ```
 src/v2m/
-├── domain/          # Entities & Protocols. ZERO external deps (except Pydantic)
-├── application/     # Handlers, use cases. Orchestrates domain logic
-├── infrastructure/  # Adapters: Whisper, Audio, LLM, FileSystem
-│   ├── audio/       # AudioRecorder (Rust/Python hybrid)
-│   ├── persistent_model.py      # Whisper "always warm" worker
-│   └── streaming_transcriber.py # Real-time inference loop
-├── core/            # DI container, IPC protocol, logging
-│   ├── di/container.py
-│   ├── ipc_protocol.py
-│   └── client_session.py
+├── domain/          # Pure Entities & Protocols. ZERO external deps.
+├── application/     # Use Cases, Command Handlers. Orchestration.
+├── infrastructure/  # Adapters: Audio, LLM, Filesystem, Notifications.
+│   ├── audio/       # AudioRecorder
+│   └── system_monitor.py # Rust-accelerated monitoring
+├── core/            # Framework services (DI, Logging, IPC).
+│   ├── cqrs/        # Command/Query Buses
+│   ├── providers/   # Dependency Injection Providers
+│   ├── logging.py   # Structured JSON Logging
+│   └── ipc_protocol.py
 └── main.py          # Entry point
 ```
 
 ---
 
-## Performance Architecture (4 Phases)
-
-### Phase 1: Rust-Python Bridge
-- Audio capture via `v2m_engine` (lock-free ring buffer, GIL-free)
-- `RustAudioStream` implements `AsyncIterator`
-- `wait_for_data()` is awaitable—no polling
+## Observability (SOTA 2026)
 
-### Phase 2: Persistent Model Worker
-- `PersistentWhisperWorker` keeps model in VRAM ("always warm")
-- GPU ops isolated in dedicated `ThreadPoolExecutor`
-- Memory pressure detection via `psutil` (>90% triggers unload)
+### 1. Structured Logging
+All logs are emitted as JSON via `v2m.core.logging`.
+- **Format**: `{"asctime": "...", "name": "v2m", "levelname": "INFO", "message": "..."}`
+- **Usage**:
+  ```python
+  from v2m.core.logging import logger
+  logger.info("process_started", extra={"job_id": 123})
+  ```
 
-### Phase 3: Streaming Inference
-- `StreamingTranscriber` emits provisional text every 500ms
-- `ClientSessionManager` handles event push to clients
-- Protocol: `status="event"` (provisional) → `status="success"` (final)
-
-### Phase 4: Async Hygiene
-- `uvloop.install()` on daemon startup
-- `orjson` for fast IPC serialization
-- No sync I/O in hot paths
+### 2. System Monitor
+Real-time resource tracking via `v2m.infrastructure.system_monitor`.
+- **Layer 1**: Rust `v2m_engine` (No GIL, instant RAM/CPU/Temp metrics).
+- **Layer 2**: `psutil`/`torch` fallback.
+- **Optimization**: Static info is cached; GPU check uses memoized `torch` reference.
 
 ---
 
 ## Code Standards
 
-### Hexagonal Boundaries
-- **Inward pointing**: Domain knows nothing about Infrastructure
-- **Protocols over Classes**: Use `typing.Protocol` in `domain/`
-
-### Async Non-Blocking
-```python
-# ❌ NEVER
-time.sleep(1)
-open("file.txt").read()
+### Hexagonal Architecture
+- **Dependency Rule**: Imports point inward: `infrastructure` -> `application` -> `domain`. `domain` never imports from the outer layers.
+- **Protocols**: Define interfaces in `domain` or `core/interfaces.py`.
 
-# ✅ ALWAYS
-await asyncio.sleep(1)
-await aiofiles.open("file.txt")
-
-# GPU/CPU intensive → offload to executor
-await asyncio.to_thread(heavy_computation)
-await loop.run_in_executor(self._executor, func)
-```
-
-### Concrete Example: Domain Entity
-```python
-# src/v2m/domain/entities.py
-from pydantic import BaseModel, ConfigDict
-
-class Transcription(BaseModel):
-    model_config = ConfigDict(frozen=True)  # Immutable
-    text: str
-    confidence: float
-    language: str
-```
-
-### Concrete Example: Async Handler
-```python
-# src/v2m/application/command_handlers.py
-async def handle(self, command: StopRecordingCommand) -> str | None:
-    # Async service call—no blocking
-    transcription = await self.transcription_service.stop_and_transcribe()
-    self.clipboard_service.copy(transcription)
-    return transcription
-```
-
----
-
-## Testing Guidelines
-
-- **Unit Tests**: Mock ALL infrastructure adapters
-- **Behavioral**: Verify "what the system does", not implementation details
-- **Coverage**: Target >80% for domain/application logic
-- **Async Tests**: Use `@pytest.mark.asyncio` decorator
-
-```bash
-# Run all unit tests
-venv/bin/pytest tests/unit/ -v
-
-# Run with coverage
-venv/bin/pytest tests/unit/ --cov=src/v2m --cov-report=term-missing
-```
-
----
-
-## Git & PR Standards
-
-- **Commit**: `[scope]: behavior` (e.g., `infra/whisper: fix VAD sensitivity`)
-- **PR Check**: `ruff check` + `ruff format` must pass
-- **Diff**: Small, focused changes with brief summaries
+### Async Hygiene
+- **Blocking I/O**: 🚫 Forbidden in `async def`. Use `asyncio.to_thread`.
+- **Files**: Use `aiofiles`.
+- **Sleep**: `await asyncio.sleep()`.
 
 ---
 
 ## Boundaries
 
-### ✅ Always do
-- Read `domain/` protocols before implementing adapters
-- Verify `ruff` passes on every modified file
-- Use `logger.info/debug` for trace-level info
-- Run single-file tests before committing
-
-### ⚠️ Ask first
-- Adding dependencies to `pyproject.toml`
-- Modifying DI container or Event Bus
-- Changing `config.toml` schema
-- Full project builds
-
-### 🚫 Never do
-- **Commit secrets**: No API keys, tokens, or credentials in code
-- **Hardcode paths**: Use `v2m.utils.paths` or `get_secure_runtime_dir()`
-- **Block the loop**: No sync I/O in async handlers
-- **Delete node_modules/venv**: Ask first
-- **Push to main**: Always use PRs
+### ✅ Always
+- Run `ruff check` on modified files.
+- Use `logger.info` with structured `extra={}` data.
+- Verify `v2m_engine` integration when touching audio logic.
 
----
+### ⚠️ Ask First
+- Adding new dependencies to `pyproject.toml`.
+- Modifying `config.toml` structure.
+- Changing IPC protocol headers.
 
-## Security Considerations
-
-- **No telemetry**: All processing is local
-- **Secrets**: Use environment variables (`GEMINI_API_KEY`)
-- **IPC**: Unix socket with 1MB payload limit (DoS protection)
-- **Config**: Validate with Pydantic before use
+### 🚫 Never
+- Commit secrets (API keys).
+- Use `print()` (Use `logger`).
+- Hardcode absolute paths (Use `v2m.utils.paths`).
 
 ---
 
-## Common Pitfalls
-
-| Pitfall | Fix |
-|---------|-----|
-| Pydantic V1 syntax | Use V2 exclusively (`model_config`, `ConfigDict`) |
-| Circular imports | Import from `domain/` into `application/`, never vice-versa |
-| CUDA context | Prefer `faster-whisper` abstractions over raw PyTorch |
-| Sync in async | Offload blocking calls to `asyncio.to_thread` |
-| MagicMock for async | Use `AsyncMock` for async methods |
+## Git Workflow
+- **Commit**: `scope: description` (e.g., `infra/monitor: fix gpu cache`).
+- **PRs**: Atomic changes. Verify tests pass.
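
This PR drops the code example for the Async Hygiene rules in the hunk above ("Blocking I/O: Forbidden in `async def`. Use `asyncio.to_thread`."). A minimal sketch of what the rule asks for could look like the following. `blocking_read_config` and `heartbeat` are hypothetical stand-ins and not part of the `v2m` codebase.

```python
import asyncio
import time


def blocking_read_config() -> str:
    """Hypothetical synchronous call (stand-in for disk or model I/O)."""
    time.sleep(0.05)  # Blocks the calling thread, so it must not run on the loop.
    return "loaded"


async def heartbeat(ticks: list[int]) -> None:
    """Hypothetical coroutine that only progresses if the loop stays free."""
    for i in range(3):
        await asyncio.sleep(0.01)  # Non-blocking sleep, as the rule requires.
        ticks.append(i)


async def main() -> tuple[str, list[int]]:
    ticks: list[int] = []
    # The blocking call runs in a worker thread; the heartbeat keeps ticking.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_read_config),
        heartbeat(ticks),
    )
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_read_config()` directly inside `main` would stall the event loop for the whole duration of the call, which is the situation the rule forbids.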
diff --git a/apps/backend/README.md b/apps/backend/README.md
@@ -1,6 +1,8 @@
-# Backend Voice2Machine (Python Core)
+# Voice2Machine Backend (Python Core)
 
-The "brain" of the system. Handles business logic, audio processing, and AI inference.
+The "brain" of the system. Handles business logic, audio processing, and AI inference using State of the Art 2026 standards.
+
+**⚠️ AI Agents & Developers**: Please refer to `AGENTS.md` for strict coding standards, mission, and boundaries.
 
 ## 🚀 Quick Start (Dev Mode)
 
@@ -11,80 +13,56 @@ Run the installer from **anywhere** in the project:
 ```bash
 # From project root OR from scripts/
 ./scripts/install.sh
-
-# The installer will:
-# 1. Detect Python 3.12+ automatically
-# 2. Install uv (10-100x faster than pip)
-# 3. Create venv and install dependencies
-# 4. Verify GPU/CUDA availability
 ```
 
 ### Manual Development Setup
 
+We use `uv` for blazing fast package management.
+
 ```bash
 # 1. Navigate to backend
 cd apps/backend
 
-# 2. Activate virtual environment
-source venv/bin/activate
+# 2. Create virtualenv
+uv venv
+
+# 3. Activate virtual environment
+source .venv/bin/activate
 
-# 3. Install in editable mode (useful for dev)
-uv pip install -e .  # or: pip install -e .
+# 4. Install dependencies
+uv pip install -e .
 
-# 4. Launch the Daemon (Server)
-# This will keep the process alive listening on /tmp/v2m.sock
+# 5. Launch the Daemon (Server)
 python -m v2m.main --daemon
 ```
 
 ## 🏗️ Development Commands
 
-We use modern tools to ensure code quality.
-
-### Testing (Pytest)
-
-```bash
-# Fast unit tests
-pytest tests/unit/
-
-# Integration tests (requires GPU/Audio)
-pytest tests/integration/
-```
-
-### Linting & Formatting (Ruff)
-
-We use `ruff` (the fastest linter in the West) to replace flake8, isort, and black.
+See `AGENTS.md` for the preferred file-scoped commands.
 
 ```bash
-# Check and autofix
-ruff check src/ --fix
+# Check all
+ruff check src/
 
-# Format
-ruff format src/
+# Test all
+pytest tests/unit/
 ```
 
 ## 📦 Project Structure
 
 ```
-apps/backend/
-├── src/v2m/
-│   ├── application/    # Use cases (Commands/Handlers)
-│   ├── core/           # Command bus and global configuration
-│   ├── domain/         # Pure entities and exceptions
-│   ├── infrastructure/ # Real implementations (Whisper, Gemini, Audio)
-│   └── main.py         # Entrypoint
-├── config.toml         # Default configuration
-└── pyproject.toml      # Build and tooling configuration
+apps/backend/src/v2m/
+├── domain/          # Entities & Protocols
+├── application/     # Use Cases & Handlers
+├── infrastructure/  # Adapters (Audio, AI, OS)
+├── core/            # Framework (DI, CQRS, Logging, IPC)
+└── main.py          # Entrypoint
 ```
 
 ## 🔌 Socket API
 
 The backend exposes a Unix Socket at `$XDG_RUNTIME_DIR/v2m/v2m.sock` (typically `/run/user/<uid>/v2m/v2m.sock`).
 
-> **Note**: The socket location follows the XDG Base Directory Specification for secure, user-isolated runtime files.
-
 **Protocol:**
-
-1.  **Header**: 4 bytes (Big Endian) indicating message length.
-2.  **Body**: JSON string encoded in UTF-8.
-
-_Message example:_ `{"type": "toggle_recording"}`
+1.  **Header**: 4 bytes (Big Endian) indicating payload size.
+2.  **Body**: JSON string (UTF-8).