nethesis · Amygos · Feb 13, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,63 @@
+# Copilot Instructions — Satellite
+
+## Build & Test
+
+```bash
+# Install
+pip install -r requirements.txt
+pip install -r requirements-dev.txt   # test deps (pytest, httpx, etc.)
+
+# Run all tests (with coverage)
+pytest
+
+# Run a single test file / single test
+pytest tests/test_api.py
+pytest tests/test_api.py::test_get_transcription_success
+
+# Run the app
+python main.py
+```
+
+Python 3.12+. No linter configured in CI — only `pytest` runs in the build workflow.
+The container image is built from `Containerfile` (multi-stage, `python:slim` base).
+
+## Architecture
+
+Satellite bridges Asterisk PBX ↔ transcription providers (Deepgram or VoxTral), publishing results over MQTT.
+
+### Runtime components (all in one process)
+
+| Module | Role |
+|---|---|
+| `main.py` | Entrypoint — starts the asyncio event loop for the real-time pipeline and a background thread running the FastAPI/Uvicorn HTTP server |
+| `asterisk_bridge.py` | ARI WebSocket client — listens for Stasis events, creates snoop channels + external media, manages per-call lifecycle |
+| `rtp_server.py` | UDP server — receives RTP audio, strips headers, routes packets to per-channel async queues by source port |
+| `deepgram_connector.py` | Streams audio to Deepgram via WebSocket — interleaves two RTP channels into stereo for multichannel transcription; aggregates final transcript on hangup (real-time path only, Deepgram-only for now) |
+| `mqtt_client.py` | Publishes interim/final transcription JSON to MQTT topics (`{prefix}/transcription`, `{prefix}/final`) |
+| `transcription/` | **Provider abstraction** — `base.py` defines interface; `deepgram.py` and `voxtral.py` implement REST API clients; `__init__.py` factory selects provider via env var or per-request override |
+| `api.py` | FastAPI app — `POST /api/get_transcription` accepts WAV uploads, calls transcription provider REST API, optionally persists to Postgres |
+| `call_processor.py` | **Runs as a subprocess** (invoked from api.py via `subprocess.run`) — reads JSON from stdin, calls AI enrichment, writes results to DB |
+| `ai.py` | LangChain + OpenAI — cleans transcript, generates summary + sentiment score (0-10) |
+| `db.py` | PostgreSQL + pgvector — schema auto-init with threading lock; stores transcripts, state machine (`progress` → `summarizing` → `done` / `failed`), and text-embedding-3-small chunks |
+
+### Key data flows
+
+1. **Real-time path:** Asterisk → ARI WebSocket → snoop channel → RTP → `rtp_server` → `deepgram_connector` (stereo WebSocket stream) → Deepgram → `mqtt_client` (Deepgram-only for now)
+2. **REST/batch path:** WAV upload → `api.py` → `transcription/<provider>` REST API (Deepgram or VoxTral) → (optionally) `db.py` persist → (optionally) `call_processor.py` subprocess → `ai.py` → `db.py` update
+
+### Non-obvious details
+
+- Two RTP streams per call (one per direction) are interleaved into a single stereo buffer for Deepgram's multichannel mode (real-time path only).
+- `asterisk_bridge` detects if Asterisk swapped the RTP source ports and adjusts speaker labels accordingly.
+- `call_processor` is deliberately a **subprocess** (not async task) — isolates OpenAI calls with independent timeout/logging, avoids blocking the event loop.
+- DB schema initialization is guarded by a **threading lock** (not asyncio lock) because `psycopg` sync connections are used alongside the async FastAPI server.
+- **Multi-provider support:** REST/batch path supports Deepgram and VoxTral. Select provider via `TRANSCRIPTION_PROVIDER` env var (default: `deepgram`) or per-request `provider=` parameter. Real-time path remains Deepgram-only.
+
+## Conventions
+
+- **Config:** Exclusively via environment variables (loaded from `.env` by `python-dotenv`). No config files or CLI args.
+- **Logging:** One logger per module (`logging.getLogger(__name__)`), level controlled by `LOG_LEVEL` env var.
+- **Async:** `asyncio` throughout the real-time pipeline; `asyncio.Lock` for connector close logic, `asyncio.Queue` for RTP buffer routing. Reconnection uses exponential backoff.
+- **Testing:** `pytest-asyncio` with `asyncio_mode = auto`. Tests monkeypatch env vars and mock external services (Deepgram, MQTT, psycopg). An autouse fixture in `conftest.py` resets `db._schema_initialized` between tests.
+- **Auth:** Optional static bearer token (`API_TOKEN` env var) for `/api/*` endpoints. Accepts `Authorization: Bearer <token>` or `X-API-Token: <token>`.
+- **Validation:** `uniqueid` must match `\d+\.\d+` (Asterisk format).
diff --git a/Containerfile b/Containerfile
@@ -15,6 +15,7 @@ COPY requirements.txt /tmp/requirements.txt
 # Copy application files
 COPY *.py /tmp/
 COPY README.md /tmp/
+COPY transcription /tmp/transcription
 
 # Install dependencies
 RUN pip install --no-cache-dir --no-warn-script-location --user -r /tmp/requirements.txt
@@ -36,6 +37,7 @@ COPY --from=builder /root/.local /root/.local
 # Copy application files
 COPY --from=builder /tmp/*.py /app/
 COPY --from=builder /tmp/README.md /app/
+COPY --from=builder /tmp/transcription /app/transcription
 
 # Make sure scripts in .local are usable
 ENV PATH=/root/.local/bin:$PATH
@@ -55,7 +57,9 @@ ENV ASTERISK_URL="http://127.0.0.1:8088" \
     MQTT_USERNAME="satellite" \
     SATELLITE_MQTT_PASSWORD="dummypassword" \
     HTTP_PORT="8000" \
+    TRANSCRIPTION_PROVIDER="deepgram" \
     DEEPGRAM_API_KEY="" \
+    MISTRAL_API_KEY="" \
     LOG_LEVEL="INFO" \
     PYTHONUNBUFFERED="1"
 

diff --git a/README.md b/README.md
@@ -49,9 +49,16 @@ RTP_HEADER_SIZE=12
 MQTT_URL=mqtt://127.0.0.1:1883
 MQTT_TOPIC_PREFIX=satellite
 
-# Deepgram API Key
+# Transcription Provider (optional, default: deepgram)
+# Options: deepgram, voxtral
+TRANSCRIPTION_PROVIDER=deepgram
+
+# Deepgram API Key (required for Deepgram provider)
 DEEPGRAM_API_KEY=your_deepgram_api_key
 
+# Mistral API Key (required for VoxTral provider)
+MISTRAL_API_KEY=your_mistral_api_key
+
 # REST API (optional)
 HTTP_PORT=8000
 
@@ -92,8 +99,10 @@ PGVECTOR_DATABASE=satellite
 - `MQTT_URL`: URL of the MQTT broker
 - `MQTT_TOPIC_PREFIX`: Prefix for MQTT topics
 
-#### Deepgram Configuration
-- `DEEPGRAM_API_KEY`: Your Deepgram API key
+#### Transcription Configuration
+- `TRANSCRIPTION_PROVIDER`: Choose the transcription provider (`deepgram` or `voxtral`, default: `deepgram`)
+- `DEEPGRAM_API_KEY`: Your Deepgram API key (required for Deepgram provider)
+- `MISTRAL_API_KEY`: Your Mistral API key (required for VoxTral provider)
 
 #### Rest API Configuration
 - `HTTP_PORT`: Port for the HTTP server (default: 8000)
@@ -125,28 +134,38 @@ This requires the `vector` extension (pgvector) in your Postgres instance.
 
 #### `POST /api/get_transcription`
 
-Accepts a WAV upload and returns a Deepgram transcription.
+Accepts a WAV upload and returns a transcription from the configured provider (Deepgram or VoxTral).
 
 Request requirements:
 - Content type: multipart form upload with a `file` field (`audio/wav` or `audio/x-wav`)
 
 Optional fields (query string or multipart form fields):
+- `provider`: Override the transcription provider (`deepgram` or `voxtral`). If not set, uses `TRANSCRIPTION_PROVIDER` env var (default: `deepgram`)
 - `uniqueid`: Asterisk-style uniqueid like `1234567890.1234` (required only when `persist=true`)
 - `persist`: `true|false` (default `false`) — persist raw transcript to Postgres (requires `PGVECTOR_*` env vars)
 - `summary`: `true|false` (default `false`) — run AI enrichment (requires `OPENAI_API_KEY` and also `persist=true` so there is a DB record to update)
-- `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:`)
+- `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:` or `Speaker 0:` / `Speaker 1:`)
 
-Deepgram parameters:
-- Most Deepgram `/v1/listen` parameters may be provided as query/form fields and are passed through to Deepgram.
+Provider-specific parameters:
+- **Deepgram**: Most Deepgram `/v1/listen` parameters may be provided as query/form fields (e.g., `model`, `language`, `diarize`, `punctuate`)
+- **VoxTral**: Supports `model` (default: `voxtral-mini-latest`), `language`, `diarize`, `temperature`, `context_bias`, `timestamp_granularities`
 
 Example:
 ```
+# Using default provider (from TRANSCRIPTION_PROVIDER env var)
 curl -X POST http://127.0.0.1:8000/api/get_transcription \
     -H 'Authorization: Bearer YOUR_TOKEN' \
     -F uniqueid=1234567890.1234 \
     -F persist=true \
     -F summary=true \
     -F file=@call.wav;type=audio/wav
+
+# Override provider to use VoxTral
+curl -X POST http://127.0.0.1:8000/api/get_transcription \
+    -H 'Authorization: Bearer YOUR_TOKEN' \
+    -F provider=voxtral \
+    -F diarize=true \
+    -F 'file=@call.wav;type=audio/wav'
 ```
 
 Authentication: