63 changes: 63 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,63 @@
# Copilot Instructions — Satellite

## Build & Test

```bash
# Install
pip install -r requirements.txt
pip install -r requirements-dev.txt # test deps (pytest, httpx, etc.)

# Run all tests (with coverage)
pytest

# Run a single test file / single test
pytest tests/test_api.py
pytest tests/test_api.py::test_get_transcription_success

# Run the app
python main.py
```

Python 3.12+. No linter configured in CI — only `pytest` runs in the build workflow.
Container image uses `Containerfile` (multi-stage, python:slim base).

## Architecture

Satellite bridges Asterisk PBX ↔ transcription providers (Deepgram or VoxTral), publishing results over MQTT.

### Runtime components (all in one process)

| Module | Role |
|---|---|
| `main.py` | Entrypoint — starts the asyncio event loop for the real-time pipeline and a background thread running the FastAPI/Uvicorn HTTP server |
| `asterisk_bridge.py` | ARI WebSocket client — listens for Stasis events, creates snoop channels + external media, manages per-call lifecycle |
| `rtp_server.py` | UDP server — receives RTP audio, strips headers, routes packets to per-channel async queues by source port |
| `deepgram_connector.py` | Streams audio to Deepgram via WebSocket — interleaves two RTP channels into stereo for multichannel transcription; aggregates final transcript on hangup (real-time path only, Deepgram-only for now) |
| `mqtt_client.py` | Publishes interim/final transcription JSON to MQTT topics (`{prefix}/transcription`, `{prefix}/final`) |
| `transcription/` | **Provider abstraction** — `base.py` defines interface; `deepgram.py` and `voxtral.py` implement REST API clients; `__init__.py` factory selects provider via env var or per-request override |
| `api.py` | FastAPI app — `POST /api/get_transcription` accepts WAV uploads, calls transcription provider REST API, optionally persists to Postgres |
| `call_processor.py` | **Runs as a subprocess** (invoked from api.py via `subprocess.run`) — reads JSON from stdin, calls AI enrichment, writes results to DB |
| `ai.py` | LangChain + OpenAI — cleans transcript, generates summary + sentiment score (0-10) |
| `db.py` | PostgreSQL + pgvector — schema auto-init with threading lock; stores transcripts, state machine (`progress` → `summarizing` → `done` / `failed`), and text-embedding-3-small chunks |
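
As a rough illustration of the `rtp_server.py` row above, the sketch below strips a fixed-size RTP header and routes the payload to a per-port `asyncio.Queue`. The 12-byte size matches the `RTP_HEADER_SIZE=12` value in the README's example configuration. `PortRouter`, `strip_rtp_header`, and `route` are hypothetical names chosen for this example, not the module's actual API.

```python
import asyncio

RTP_HEADER_SIZE = 12  # fixed RTP header, assuming no CSRC entries or extensions


def strip_rtp_header(packet: bytes, header_size: int = RTP_HEADER_SIZE) -> bytes:
    """Return the audio payload, or b"" when the packet carries no payload."""
    if len(packet) <= header_size:
        return b""
    return packet[header_size:]


class PortRouter:
    """Hypothetical router: one asyncio.Queue per RTP source port."""

    def __init__(self) -> None:
        self._queues: dict[int, asyncio.Queue] = {}

    def register(self, source_port: int) -> asyncio.Queue:
        # Create the queue on first use and reuse it afterwards.
        return self._queues.setdefault(source_port, asyncio.Queue())

    def route(self, packet: bytes, source_port: int) -> bool:
        """Enqueue the payload for a known port; report whether it was routed."""
        queue = self._queues.get(source_port)
        payload = strip_rtp_header(packet)
        if queue is None or not payload:
            return False
        queue.put_nowait(payload)
        return True
```

Packets from unregistered ports are dropped here rather than creating a queue implicitly; that is a design choice of the sketch, not a statement about how Satellite behaves.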

### Key data flows

1. **Real-time path:** Asterisk → ARI WebSocket → snoop channel → RTP → `rtp_server` → `deepgram_connector` (stereo WebSocket stream) → Deepgram → `mqtt_client` (Deepgram-only for now)
2. **REST/batch path:** WAV upload → `api.py` → `transcription/<provider>` REST API (Deepgram or VoxTral) → (optionally) `db.py` persist → (optionally) `call_processor.py` subprocess → `ai.py` → `db.py` update
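
The subprocess hand-off at the end of the REST/batch path can be sketched as follows: a JSON payload is written to the child's stdin via `subprocess.run`, with a timeout. This is a minimal sketch under assumptions. `CHILD_SOURCE` is a hypothetical stand-in for `call_processor.py`, and the JSON reply on stdout exists only to make the example observable (the real processor writes its results to the DB).

```python
import json
import subprocess
import sys

# Hypothetical stand-in for call_processor.py: reads JSON from stdin
# and echoes a small JSON reply on stdout.
CHILD_SOURCE = (
    "import json, sys\n"
    "payload = json.load(sys.stdin)\n"
    "print(json.dumps({'uniqueid': payload['uniqueid'], 'state': 'done'}))\n"
)


def run_processor(payload: dict, timeout_s: float = 30.0) -> dict:
    """Send payload as JSON on stdin and parse the child's JSON reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        check=True,         # raises CalledProcessError on non-zero exit
    )
    return json.loads(result.stdout)
```

Because the child is a separate process, a hang or crash surfaces in the caller as `TimeoutExpired` or `CalledProcessError` and can be handled there.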

### Non-obvious details

- Two RTP streams per call (one per direction) are interleaved into a single stereo buffer for Deepgram's multichannel mode (real-time path only).
- `asterisk_bridge` detects if Asterisk swapped the RTP source ports and adjusts speaker labels accordingly.
- `call_processor` is deliberately a **subprocess** (not async task) — isolates OpenAI calls with independent timeout/logging, avoids blocking the event loop.
- DB schema initialization is guarded by a **threading lock** (not asyncio lock) because `psycopg` sync connections are used alongside the async FastAPI server.
- **Multi-provider support:** REST/batch path supports Deepgram and VoxTral. Select provider via `TRANSCRIPTION_PROVIDER` env var (default: `deepgram`) or per-request `provider=` parameter. Real-time path remains Deepgram-only.
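
The stereo interleaving described in the first bullet above can be sketched as below. The sketch assumes 16-bit samples (`sample_width=2`); the actual sample format used by `deepgram_connector.py` is not stated here, and `interleave_stereo` is a hypothetical name.

```python
def interleave_stereo(left: bytes, right: bytes, sample_width: int = 2) -> bytes:
    """Interleave two mono buffers sample by sample: L0 R0 L1 R1 ...

    Only whole samples present in both buffers are used; any trailing
    bytes in the longer buffer are ignored.
    """
    frames = min(len(left), len(right)) // sample_width
    out = bytearray(frames * sample_width * 2)
    for i in range(frames):
        start = i * sample_width
        # Left sample goes first in each stereo frame, right sample second.
        out[2 * start : 2 * start + sample_width] = left[start : start + sample_width]
        out[2 * start + sample_width : 2 * start + 2 * sample_width] = right[
            start : start + sample_width
        ]
    return bytes(out)
```

For example, `interleave_stereo(b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd")` returns `b"\x01\x02\xaa\xbb\x03\x04\xcc\xdd"`.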

## Conventions

- **Config:** Exclusively via environment variables (loaded from `.env` by `python-dotenv`). No config files or CLI args.
- **Logging:** One logger per module (`logging.getLogger(__name__)`), level controlled by `LOG_LEVEL` env var.
- **Async:** `asyncio` throughout the real-time pipeline; `asyncio.Lock` for connector close logic, `asyncio.Queue` for RTP buffer routing. Reconnection uses exponential backoff.
- **Testing:** `pytest-asyncio` with `asyncio_mode = auto`. Tests monkeypatch env vars and mock external services (Deepgram, MQTT, psycopg). A conftest auto-fixture resets `db._schema_initialized` between tests.
- **Auth:** Optional static bearer token (`API_TOKEN` env var) for `/api/*` endpoints. Accepts `Authorization: Bearer <token>` or `X-API-Token: <token>`.
- **Validation:** `uniqueid` must match `\d+\.\d+` (Asterisk format).
4 changes: 4 additions & 0 deletions Containerfile
@@ -15,6 +15,7 @@ COPY requirements.txt /tmp/requirements.txt
# Copy application files
COPY *.py /tmp/
COPY README.md /tmp/
COPY transcription /tmp/transcription

# Install dependencies
RUN pip install --no-cache-dir --no-warn-script-location --user -r /tmp/requirements.txt
Comment on lines +18 to 21
Copilot AI Feb 17, 2026

Container builds install only requirements.txt, but the new transcription providers (and api.py) import/use httpx directly. httpx is currently only listed in requirements-dev.txt, so the runtime image may rely on a transitive dependency (or fail if that transitive dep changes). Add httpx to requirements.txt to make the container/runtime dependency explicit.

@@ -36,6 +37,7 @@ COPY --from=builder /root/.local /root/.local
# Copy application files
COPY --from=builder /tmp/*.py /app/
COPY --from=builder /tmp/README.md /app/
COPY --from=builder /tmp/transcription /app/transcription

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
@@ -55,7 +57,9 @@ ENV ASTERISK_URL="http://127.0.0.1:8088" \
MQTT_USERNAME="satellite" \
SATELLITE_MQTT_PASSWORD="dummypassword" \
HTTP_PORT="8000" \
TRANSCRIPTION_PROVIDER="deepgram" \
DEEPGRAM_API_KEY="" \
MISTRAL_API_KEY="" \
LOG_LEVEL="INFO" \
PYTHONUNBUFFERED="1"

33 changes: 26 additions & 7 deletions README.md
@@ -49,9 +49,16 @@ RTP_HEADER_SIZE=12
MQTT_URL=mqtt://127.0.0.1:1883
MQTT_TOPIC_PREFIX=satellite

# Deepgram API Key
# Transcription Provider (optional, default: deepgram)
# Options: deepgram, voxtral
TRANSCRIPTION_PROVIDER=deepgram

# Deepgram API Key (required for Deepgram provider)
DEEPGRAM_API_KEY=your_deepgram_api_key

# Mistral API Key (required for VoxTral provider)
MISTRAL_API_KEY=your_mistral_api_key

# REST API (optional)
HTTP_PORT=8000

@@ -92,8 +99,10 @@ PGVECTOR_DATABASE=satellite
- `MQTT_URL`: URL of the MQTT broker
- `MQTT_TOPIC_PREFIX`: Prefix for MQTT topics

#### Deepgram Configuration
- `DEEPGRAM_API_KEY`: Your Deepgram API key
#### Transcription Configuration
- `TRANSCRIPTION_PROVIDER`: Choose the transcription provider (`deepgram` or `voxtral`, default: `deepgram`)
- `DEEPGRAM_API_KEY`: Your Deepgram API key (required for Deepgram provider)
- `MISTRAL_API_KEY`: Your Mistral API key (required for VoxTral provider)

#### REST API Configuration
- `HTTP_PORT`: Port for the HTTP server (default: 8000)
@@ -125,28 +134,38 @@ This requires the `vector` extension (pgvector) in your Postgres instance.

#### `POST /api/get_transcription`

Accepts a WAV upload and returns a Deepgram transcription.
Accepts a WAV upload and returns a transcription from the configured provider (Deepgram or VoxTral).

Request requirements:
- Content type: multipart form upload with a `file` field (`audio/wav` or `audio/x-wav`)

Optional fields (query string or multipart form fields):
- `provider`: Override the transcription provider (`deepgram` or `voxtral`). If not set, uses `TRANSCRIPTION_PROVIDER` env var (default: `deepgram`)
- `uniqueid`: Asterisk-style uniqueid like `1234567890.1234` (required only when `persist=true`)
- `persist`: `true|false` (default `false`) — persist raw transcript to Postgres (requires `PGVECTOR_*` env vars)
- `summary`: `true|false` (default `false`) — run AI enrichment (requires `OPENAI_API_KEY` and also `persist=true` so there is a DB record to update)
- `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:`)
- `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:` or `Speaker 0:` / `Speaker 1:`)

Deepgram parameters:
- Most Deepgram `/v1/listen` parameters may be provided as query/form fields and are passed through to Deepgram.
Provider-specific parameters:
- **Deepgram**: Most Deepgram `/v1/listen` parameters may be provided as query/form fields (e.g., `model`, `language`, `diarize`, `punctuate`)
- **VoxTral**: Supports `model` (default: `voxtral-mini-latest`), `language`, `diarize`, `temperature`, `context_bias`, `timestamp_granularities`
Comment on lines +147 to +151
Copilot AI Feb 17, 2026

Docs say channel0_name/channel1_name renames diarization labels by replacing Channel 0/1: or Speaker 0/1:. VoxTral currently formats diarization labels as speaker_1:/speaker_2: (lowercase + underscore), so these overrides won’t apply for VoxTral unless labels are normalized or the replacement logic is expanded. Please clarify provider-specific behavior here (and/or adjust the code) so users aren’t misled.
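
One way to expand the replacement logic, sketched under assumptions: a single case-insensitive pattern covers `Channel N:`, `Speaker N:`, and `speaker_N:`, and a `first_index` argument accounts for VoxTral's 1-based labels. `rename_labels` is a hypothetical helper, labels are assumed to sit at the start of a line, and mapping `speaker_1` to `channel0_name` is an assumption rather than confirmed behavior.

```python
import re

# Matches "Channel 0:", "Speaker 1:", "speaker_2:" at the start of a line.
LABEL_RE = re.compile(r"^(?:channel|speaker)[ _](\d+):", re.IGNORECASE | re.MULTILINE)


def rename_labels(transcript: str, names: dict[int, str], first_index: int = 0) -> str:
    """Replace diarization labels with caller-supplied names.

    `names` maps a zero-based position (0 -> channel0_name, 1 -> channel1_name)
    to a display name; `first_index` is the number the provider starts from.
    """

    def substitute(match: re.Match) -> str:
        position = int(match.group(1)) - first_index
        name = names.get(position)
        # Leave the label untouched when no replacement name was given.
        return f"{name}:" if name else match.group(0)

    return LABEL_RE.sub(substitute, transcript)
```

With `names = {0: "Agent", 1: "Customer"}`, both `"Channel 0: hi"` (default `first_index`) and `"speaker_1: hi"` (with `first_index=1`) become `"Agent: hi"`.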

Example:
```
# Using default provider (from TRANSCRIPTION_PROVIDER env var)
curl -X POST http://127.0.0.1:8000/api/get_transcription \
-H 'Authorization: Bearer YOUR_TOKEN' \
-F uniqueid=1234567890.1234 \
-F persist=true \
-F summary=true \
-F 'file=@call.wav;type=audio/wav'

# Override provider to use VoxTral
curl -X POST http://127.0.0.1:8000/api/get_transcription \
-H 'Authorization: Bearer YOUR_TOKEN' \
-F provider=voxtral \
-F diarize=true \
-F 'file=@call.wav;type=audio/wav'
```

Authentication: