219 changes: 64 additions & 155 deletions apps/backend/AGENTS.md
@@ -1,204 +1,113 @@
# Voice2Machine Backend

AI agent instructions for the core daemon and backend services.
**Mission**: Provide a low-latency, privacy-first AI backend that orchestrates audio processing and intelligence locally using State of the Art 2026 standards.

**Architecture**: Hexagonal (Ports & Adapters) + 4-Phase Performance Pipeline
**Language**: Python 3.12+ (Asyncio-native, uvloop)
**Privacy**: Local-first, no telemetry
---

## Tech Stack

| Component | Version/Tool |
|-----------|--------------|
| Language | Python 3.12+ (Asyncio-native) |
| Runtime | `uv` (Package Manager), `uvloop` (Event Loop) |
| Validation | Pydantic V2 (Strict Schema) |
| Linting | Ruff (SOTA 2026) |
| Testing | Pytest + `pytest-asyncio` |
| Audio/VAD | Rust `v2m_engine` (Primary) |
| ML/AI | `faster-whisper`, Google GenAI, Ollama |

---

## Commands (File-Scoped)

Prioritize these over full-project runs.
**Performance Rule**: Always prefer file-scoped commands over full-project builds to reduce feedback latency.

```bash
# Lint single file
# Lint & Fix single file
ruff check src/v2m/path/to/file.py --fix

# Format single file
ruff format src/v2m/path/to/file.py

# Type check: rely on your editor's LSP
# (Ruff lints and formats; it does not type-check)

# Test single file
venv/bin/pytest tests/unit/path/to/test_file.py -v
pytest tests/unit/path/to/test_file.py -v

# Run daemon
# Run Daemon (Dev Mode)
python -m v2m.main --daemon
```

> **Full builds only on explicit request.**

---

## Tech Stack

| Component | Version/Tool |
|-----------|--------------|
| Language | Python 3.12+ with `asyncio` |
| Event Loop | `uvloop` (installed on daemon startup) |
| Validation | Pydantic V2 |
| Linting | Ruff (SOTA 2026) |
| Testing | Pytest + `pytest-asyncio` |
| Serialization | `orjson` (3-10x faster than stdlib) |
| Audio | Rust `v2m_engine` (primary), `sounddevice` (fallback) |
| ML | `faster-whisper`, Google GenAI (Gemini) |

---

## Project Structure

```
src/v2m/
├── domain/ # Entities & Protocols. ZERO external deps (except Pydantic)
├── application/ # Handlers, use cases. Orchestrates domain logic
├── infrastructure/ # Adapters: Whisper, Audio, LLM, FileSystem
│ ├── audio/ # AudioRecorder (Rust/Python hybrid)
│ ├── persistent_model.py # Whisper "always warm" worker
│ └── streaming_transcriber.py # Real-time inference loop
├── core/ # DI container, IPC protocol, logging
│ ├── di/container.py
│ ├── ipc_protocol.py
│ └── client_session.py
├── domain/ # Pure Entities & Protocols. ZERO external deps.
├── application/ # Use Cases, Command Handlers. Orchestration.
├── infrastructure/ # Adapters: Audio, LLM, Filesystem, Notifications.
│ ├── audio/ # AudioRecorder
│ └── system_monitor.py # Rust-accelerated monitoring
├── core/ # Framework services (DI, Logging, IPC).
│ ├── cqrs/ # Command/Query Buses
│ ├── providers/ # Dependency Injection Providers
│ ├── logging.py # Structured JSON Logging
│ └── ipc_protocol.py
└── main.py # Entry point
```

---

## Performance Architecture (4 Phases)

### Phase 1: Rust-Python Bridge
- Audio capture via `v2m_engine` (lock-free ring buffer, GIL-free)
- `RustAudioStream` implements `AsyncIterator`
- `wait_for_data()` is awaitable—no polling
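
The Phase 1 bullets describe a stream that is both an `AsyncIterator` and has an awaitable `wait_for_data()`. A minimal pure-Python sketch of that shape follows. The class `FakeAudioStream` and its `push`/`close` methods are hypothetical stand-ins for illustration; the real `RustAudioStream` is backed by `v2m_engine` and its internals are not shown in this diff.

```python
import asyncio
from collections import deque


class FakeAudioStream:
    """Hypothetical stand-in for RustAudioStream: async iteration without polling."""

    def __init__(self) -> None:
        self._chunks: deque[bytes] = deque()
        self._ready = asyncio.Event()
        self._closed = False

    def push(self, chunk: bytes) -> None:
        # Producer side: store the chunk and wake any waiting consumer.
        self._chunks.append(chunk)
        self._ready.set()

    def close(self) -> None:
        # Signal end of stream and wake the consumer so it can stop.
        self._closed = True
        self._ready.set()

    async def wait_for_data(self) -> None:
        # Suspends the caller until push() or close() sets the event.
        await self._ready.wait()

    def __aiter__(self) -> "FakeAudioStream":
        return self

    async def __anext__(self) -> bytes:
        while not self._chunks:
            if self._closed:
                raise StopAsyncIteration
            self._ready.clear()
            await self.wait_for_data()
        return self._chunks.popleft()


async def collect_demo() -> list[bytes]:
    stream = FakeAudioStream()

    async def producer() -> None:
        for chunk in (b"a", b"b"):
            await asyncio.sleep(0)  # yield to the consumer between chunks
            stream.push(chunk)
        stream.close()

    task = asyncio.create_task(producer())
    received = [chunk async for chunk in stream]
    await task
    return received
```

The consumer never sleeps on a timer: it is resumed only when the producer sets the event, which is the "no polling" property the bullet refers to.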
## Observability (SOTA 2026)

### Phase 2: Persistent Model Worker
- `PersistentWhisperWorker` keeps model in VRAM ("always warm")
- GPU ops isolated in dedicated `ThreadPoolExecutor`
- Memory pressure detection via `psutil` (>90% triggers unload)
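
The "always warm" worker with an isolated executor can be sketched as below. This is an illustration only: `PersistentWorker`, `load_model`, and the callable model are assumed names, not the actual `PersistentWhisperWorker` API, and the `psutil` memory-pressure check is omitted.

```python
import asyncio
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

Model = Callable[[bytes], str]


class PersistentWorker:
    """Hypothetical sketch: load a model once and reuse it on one dedicated thread."""

    def __init__(self, load_model: Callable[[], Model]) -> None:
        self._load_model = load_model
        self._model: Model | None = None
        # A single worker thread serializes all model access.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu")

    def _infer(self, audio: bytes) -> str:
        # Runs on the dedicated thread; loads lazily, then stays warm.
        if self._model is None:
            self._model = self._load_model()
        return self._model(audio)

    def _drop_model(self) -> None:
        self._model = None

    async def transcribe(self, audio: bytes) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, self._infer, audio)

    async def unload(self) -> None:
        # Unload on the same thread so it cannot race with inference.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(self._executor, self._drop_model)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

Routing both inference and unload through the same single-thread executor keeps the event loop free while avoiding locks around the model reference.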
### 1. Structured Logging
All logs are emitted as JSON via `v2m.core.logging`.
- **Format**: `{"asctime": "...", "name": "v2m", "levelname": "INFO", "message": "..."}`
- **Usage**:
```python
from v2m.core.logging import logger
logger.info("process_started", extra={"job_id": 123})
```
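
For readers unfamiliar with how `extra={}` fields can end up in a JSON log line, here is a standard-library sketch. It is not the implementation of `v2m.core.logging` (which this diff does not show); `JsonFormatter` is a hypothetical name used only to illustrate the format listed above.

```python
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra`.
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: emits one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "asctime": self.formatTime(record),
            "name": record.name,
            "levelname": record.levelname,
            "message": record.getMessage(),
        }
        # Merge fields passed via `extra={...}` into the JSON object.
        payload.update(
            {k: v for k, v in record.__dict__.items() if k not in _STANDARD_ATTRS}
        )
        return json.dumps(payload, default=str)
```

Attaching this formatter to a handler makes `logger.info("process_started", extra={"job_id": 123})` produce a line containing both the message and `"job_id": 123`.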

### Phase 3: Streaming Inference
- `StreamingTranscriber` emits provisional text every 500ms
- `ClientSessionManager` handles event push to clients
- Protocol: `status="event"` (provisional) → `status="success"` (final)

### Phase 4: Async Hygiene
- `uvloop.install()` on daemon startup
- `orjson` for fast IPC serialization
- No sync I/O in hot paths
### 2. System Monitor
Real-time resource tracking via `v2m.infrastructure.system_monitor`.
- **Layer 1**: Rust `v2m_engine` (No GIL, instant RAM/CPU/Temp metrics).
- **Layer 2**: `psutil`/`torch` fallback.
- **Optimization**: Static info is cached; GPU check uses memoized `torch` reference.
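
The layered fallback and static-info caching described above might look roughly like this. All names here (`collect_metrics`, `rust_backend`, `fallback_backend`) are hypothetical; the real `system_monitor` module and the `v2m_engine` API are not shown in this diff.

```python
import functools
import os
import platform
from collections.abc import Callable

Metrics = dict[str, object]


@functools.lru_cache(maxsize=1)
def static_info() -> Metrics:
    # Values that never change during the process lifetime are computed once.
    return {"platform": platform.system(), "cpu_count": os.cpu_count()}


def collect_metrics(backends: list[Callable[[], Metrics]]) -> Metrics:
    # Try each layer in order; the first backend that succeeds wins.
    for backend in backends:
        try:
            return backend()
        except Exception:
            continue
    return {}


def rust_backend() -> Metrics:
    # Placeholder for Layer 1: simulates the Rust extension being unavailable.
    raise ImportError("v2m_engine is not available in this sketch")


def fallback_backend() -> Metrics:
    # Placeholder for Layer 2: returns only cached static data.
    return {"source": "fallback", **static_info()}
```

Calling `collect_metrics([rust_backend, fallback_backend])` returns the fallback result whenever the first layer raises.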

---

## Code Standards

### Hexagonal Boundaries
- **Inward pointing**: Domain knows nothing about Infrastructure
- **Protocols over Classes**: Use `typing.Protocol` in `domain/`
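
As an illustration of "Protocols over Classes", the sketch below defines a port with `typing.Protocol` and an adapter that satisfies it structurally, without inheriting from it. `TranscriptionPort`, `FakeWhisperAdapter`, and `transcribe_audio` are hypothetical names, not identifiers from the codebase.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionPort(Protocol):
    """Hypothetical domain-side port: no infrastructure imports needed."""

    async def transcribe(self, audio: bytes) -> str: ...


class FakeWhisperAdapter:
    """Hypothetical infrastructure adapter; matches the port by shape alone."""

    async def transcribe(self, audio: bytes) -> str:
        return f"{len(audio)} bytes transcribed"


async def transcribe_audio(port: TranscriptionPort, audio: bytes) -> str:
    # Application code depends only on the protocol, never on the adapter class.
    return await port.transcribe(audio)
```

Because the adapter never imports the protocol, `domain/` stays free of infrastructure dependencies while type checkers still verify the contract.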

### Async Non-Blocking
```python
# ❌ NEVER
time.sleep(1)
open("file.txt").read()
### Hexagonal Architecture
- **Dependency Rule**: Dependencies point inward: `infrastructure` -> `application` -> `domain`. Never the reverse.
- **Protocols**: Define interfaces in `domain` or `core/interfaces.py`.

# ✅ ALWAYS
await asyncio.sleep(1)
async with aiofiles.open("file.txt") as f:
    data = await f.read()

# GPU/CPU intensive → offload to executor
await asyncio.to_thread(heavy_computation)
await loop.run_in_executor(self._executor, func)
```

### Concrete Example: Domain Entity
```python
# src/v2m/domain/entities.py
from pydantic import BaseModel, ConfigDict

class Transcription(BaseModel):
    model_config = ConfigDict(frozen=True)  # Immutable
    text: str
    confidence: float
    language: str
```

### Concrete Example: Async Handler
```python
# src/v2m/application/command_handlers.py
async def handle(self, command: StopRecordingCommand) -> str | None:
    # Async service call; no blocking
    transcription = await self.transcription_service.stop_and_transcribe()
    self.clipboard_service.copy(transcription)
    return transcription
```

---

## Testing Guidelines

- **Unit Tests**: Mock ALL infrastructure adapters
- **Behavioral**: Verify "what the system does", not implementation details
- **Coverage**: Target >80% for domain/application logic
- **Async Tests**: Use `@pytest.mark.asyncio` decorator
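
A sketch of a unit test that mocks infrastructure with `AsyncMock` for awaited calls and `MagicMock` for synchronous ones. `StopRecordingHandler` here is a simplified, hypothetical version of the handler, written only so the example is self-contained; under `pytest-asyncio` the test function would carry the `@pytest.mark.asyncio` decorator.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class StopRecordingHandler:
    """Simplified, hypothetical handler used only for this example."""

    def __init__(self, transcription_service, clipboard_service) -> None:
        self.transcription_service = transcription_service
        self.clipboard_service = clipboard_service

    async def handle(self) -> str:
        text = await self.transcription_service.stop_and_transcribe()
        self.clipboard_service.copy(text)
        return text


async def test_handler_copies_transcription() -> None:
    # AsyncMock makes `stop_and_transcribe()` awaitable; MagicMock would not.
    transcription = AsyncMock()
    transcription.stop_and_transcribe.return_value = "hello"
    clipboard = MagicMock()

    handler = StopRecordingHandler(transcription, clipboard)

    assert await handler.handle() == "hello"
    transcription.stop_and_transcribe.assert_awaited_once()
    clipboard.copy.assert_called_once_with("hello")
```

The assertions check observable behavior (the returned text and the clipboard call) rather than how the handler is implemented.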

```bash
# Run all unit tests
venv/bin/pytest tests/unit/ -v

# Run with coverage
venv/bin/pytest tests/unit/ --cov=src/v2m --cov-report=term-missing
```

---

## Git & PR Standards

- **Commit**: `[scope]: behavior` (e.g., `infra/whisper: fix VAD sensitivity`)
- **PR Check**: `ruff check` + `ruff format` must pass
- **Diff**: Small, focused changes with brief summaries
### Async Hygiene
- **Blocking I/O**: 🚫 Forbidden in `async def`. Use `asyncio.to_thread`.
- **Files**: Use `aiofiles`.
- **Sleep**: `await asyncio.sleep()`.
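
A small sketch of the blocking-I/O rule, using only the standard library: the synchronous read runs in a worker thread via `asyncio.to_thread`, so the event loop stays responsive. The helper names are hypothetical.

```python
import asyncio
from pathlib import Path


def _read_text_blocking(path: str) -> str:
    # Ordinary synchronous file I/O; must not be called directly in `async def`.
    return Path(path).read_text(encoding="utf-8")


async def read_text(path: str) -> str:
    # Offload the blocking call to a thread and await its result.
    return await asyncio.to_thread(_read_text_blocking, path)


async def read_with_heartbeat(path: str) -> str:
    # The sleep runs concurrently with the read, showing the loop is not blocked.
    text, _ = await asyncio.gather(read_text(path), asyncio.sleep(0.01))
    return text
```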

---

## Boundaries

### ✅ Always do
- Read `domain/` protocols before implementing adapters
- Verify `ruff` passes on every modified file
- Use `logger.info/debug` for trace-level info
- Run single-file tests before committing

### ⚠️ Ask first
- Adding dependencies to `pyproject.toml`
- Modifying DI container or Event Bus
- Changing `config.toml` schema
- Full project builds

### 🚫 Never do
- **Commit secrets**: No API keys, tokens, or credentials in code
- **Hardcode paths**: Use `v2m.utils.paths` or `get_secure_runtime_dir()`
- **Block the loop**: No sync I/O in async handlers
- **Delete node_modules/venv**: Ask first
- **Push to main**: Always use PRs
### ✅ Always
- Run `ruff check` on modified files.
- Use `logger.info` with structured `extra={}` data.
- Verify `v2m_engine` integration when touching audio logic.

---
### ⚠️ Ask First
- Adding new `pip` dependencies.
- Modifying `config.toml` structure.
- Changing IPC protocol headers.

## Security Considerations

- **No telemetry**: All processing is local
- **Secrets**: Use environment variables (`GEMINI_API_KEY`)
- **IPC**: Unix socket with 1MB payload limit (DoS protection)
- **Config**: Validate with Pydantic before use
### 🚫 Never
- Commit secrets (API keys).
- Use `print()` (Use `logger`).
- Hardcode absolute paths (Use `v2m.utils.paths`).

---

## Common Pitfalls

| Pitfall | Fix |
|---------|-----|
| Pydantic V1 syntax | Use V2 exclusively (`model_config`, `ConfigDict`) |
| Circular imports | Import from `domain/` into `application/`, never vice-versa |
| CUDA context | Prefer `faster-whisper` abstractions over raw PyTorch |
| Sync in async | Offload blocking calls to `asyncio.to_thread` |
| MagicMock for async | Use `AsyncMock` for async methods |
## Git Workflow
- **Commit**: `scope: description` (e.g., `infra/monitor: fix gpu cache`).
- **PRs**: Atomic changes. Verify tests pass.
76 changes: 27 additions & 49 deletions apps/backend/README.md
@@ -1,6 +1,8 @@
# Backend Voice2Machine (Python Core)
# Voice2Machine Backend (Python Core)

The "brain" of the system. Handles business logic, audio processing, and AI inference.
The "brain" of the system. Handles business logic, audio processing, and AI inference using State of the Art 2026 standards.

**⚠️ AI Agents & Developers**: Please refer to `AGENTS.md` for strict coding standards, mission, and boundaries.

## 🚀 Quick Start (Dev Mode)

@@ -11,80 +13,56 @@ Run the installer from **anywhere** in the project:
```bash
# From project root OR from scripts/
./scripts/install.sh

# The installer will:
# 1. Detect Python 3.12+ automatically
# 2. Install uv (10-100x faster than pip)
# 3. Create venv and install dependencies
# 4. Verify GPU/CUDA availability
```

### Manual Development Setup

We use `uv` for blazing fast package management.

```bash
# 1. Navigate to backend
cd apps/backend

# 2. Activate virtual environment
source venv/bin/activate
# 2. Create virtualenv
uv venv

# 3. Activate virtual environment
source .venv/bin/activate

# 3. Install in editable mode (useful for dev)
uv pip install -e . # or: pip install -e .
# 4. Install dependencies
uv pip install -e .

# 4. Launch the Daemon (Server)
# This will keep the process alive listening on /tmp/v2m.sock
# 5. Launch the Daemon (Server)
python -m v2m.main --daemon
```

## 🏗️ Development Commands

We use modern tools to ensure code quality.

### Testing (Pytest)

```bash
# Fast unit tests
pytest tests/unit/

# Integration tests (requires GPU/Audio)
pytest tests/integration/
```

### Linting & Formatting (Ruff)

We use `ruff` (the fastest linter in the West) to replace flake8, isort, and black.
See `AGENTS.md` for the preferred file-scoped commands.

```bash
# Check and autofix
ruff check src/ --fix
# Check all
ruff check src/

# Format
ruff format src/
# Test all
pytest tests/unit/
```

## 📦 Project Structure

```
apps/backend/
├── src/v2m/
│ ├── application/ # Use cases (Commands/Handlers)
│ ├── core/ # Command bus and global configuration
│ ├── domain/ # Pure entities and exceptions
│ ├── infrastructure/ # Real implementations (Whisper, Gemini, Audio)
│ └── main.py # Entrypoint
├── config.toml # Default configuration
└── pyproject.toml # Build and tooling configuration
apps/backend/src/v2m/
├── domain/ # Entities & Protocols
├── application/ # Use Cases & Handlers
├── infrastructure/ # Adapters (Audio, AI, OS)
├── core/ # Framework (DI, CQRS, Logging, IPC)
└── main.py # Entrypoint
```

## 🔌 Socket API

The backend exposes a Unix Socket at `$XDG_RUNTIME_DIR/v2m/v2m.sock` (typically `/run/user/<uid>/v2m/v2m.sock`).

> **Note**: The socket location follows the XDG Base Directory Specification for secure, user-isolated runtime files.

**Protocol:**

1. **Header**: 4 bytes (Big Endian) indicating message length.
2. **Body**: JSON string encoded in UTF-8.

_Message example:_ `{"type": "toggle_recording"}`
1. **Header**: 4 bytes (Big Endian) indicating payload size.
2. **Body**: JSON string (UTF-8).
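
The framing described above (4-byte big-endian length, then a UTF-8 JSON body) can be sketched with `struct` and `json`. The function names are hypothetical, and the 1 MB cap is an assumption taken from the security notes in `AGENTS.md` rather than from this README.

```python
import json
import struct

MAX_PAYLOAD = 1024 * 1024  # assumed 1 MB limit, per the AGENTS.md security notes


def encode_message(message: dict) -> bytes:
    # Serialize to UTF-8 JSON and prefix with the body length (big-endian uint32).
    body = json.dumps(message).encode("utf-8")
    if len(body) > MAX_PAYLOAD:
        raise ValueError("payload exceeds limit")
    return struct.pack(">I", len(body)) + body


def decode_message(frame: bytes) -> dict:
    # Read the 4-byte header, validate the declared size, then parse the body.
    if len(frame) < 4:
        raise ValueError("frame shorter than header")
    (size,) = struct.unpack(">I", frame[:4])
    if size > MAX_PAYLOAD:
        raise ValueError("declared size exceeds limit")
    body = frame[4 : 4 + size]
    if len(body) != size:
        raise ValueError("truncated body")
    return json.loads(body.decode("utf-8"))
```

Checking the declared size before reading the body lets a server reject oversized requests without allocating a buffer for them.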