Multi-Provider Transcription Support (REST/Batch Path) #31
base: main

@@ -0,0 +1,63 @@

# Copilot Instructions: Satellite

## Build & Test

```bash
# Install runtime and test dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt  # test deps (pytest, httpx, etc.)

# Run all tests (with coverage)
pytest

# Run a single test file / a single test
pytest tests/test_api.py
pytest tests/test_api.py::test_get_transcription_success

# Run the app
python main.py
```

Python 3.12+. No linter is configured in CI; only `pytest` runs in the build workflow.
The container image is built from `Containerfile` (multi-stage, `python:slim` base).

## Architecture

Satellite bridges an Asterisk PBX and transcription providers (Deepgram or VoxTral). Real-time transcripts are published over MQTT; batch transcripts are returned by the REST API and can optionally be persisted to Postgres.

### Runtime components (single process, except the `call_processor` subprocess)

| Module | Role |
|---|---|
| `main.py` | Entrypoint: starts the asyncio event loop for the real-time pipeline and a background thread running the FastAPI/Uvicorn HTTP server |
| `asterisk_bridge.py` | ARI WebSocket client: listens for Stasis events, creates snoop channels and external media, manages the per-call lifecycle |
| `rtp_server.py` | UDP server: receives RTP audio, strips headers, routes packets to per-channel async queues by source port |
| `deepgram_connector.py` | Streams audio to Deepgram over WebSocket: interleaves two RTP channels into stereo for multichannel transcription and aggregates the final transcript on hangup (real-time path only; Deepgram-only for now) |
| `mqtt_client.py` | Publishes interim/final transcription JSON to MQTT topics (`{prefix}/transcription`, `{prefix}/final`) |
| `transcription/` | **Provider abstraction**: `base.py` defines the interface; `deepgram.py` and `voxtral.py` implement REST API clients; the `__init__.py` factory selects the provider via env var or per-request override |
| `api.py` | FastAPI app: `POST /api/get_transcription` accepts WAV uploads, calls the transcription provider's REST API, and optionally persists to Postgres |
| `call_processor.py` | **Runs as a subprocess** (invoked from `api.py` via `subprocess.run`): reads JSON from stdin, calls AI enrichment, writes results to the DB |
| `ai.py` | LangChain + OpenAI: cleans the transcript, generates a summary and a sentiment score (0-10) |
| `db.py` | PostgreSQL + pgvector: schema auto-init guarded by a threading lock; stores transcripts, the state machine (`progress` → `summarizing` → `done` / `failed`), and text-embedding-3-small chunks |

### Key data flows

1. **Real-time path:** Asterisk → ARI WebSocket → snoop channel → RTP → `rtp_server` → `deepgram_connector` (stereo WebSocket stream) → Deepgram → `mqtt_client` (Deepgram-only for now)
2. **REST/batch path:** WAV upload → `api.py` → `transcription/<provider>` REST API (Deepgram or VoxTral) → (optionally) `db.py` persist → (optionally) `call_processor.py` subprocess → `ai.py` → `db.py` update

### Non-obvious details

- Two RTP streams per call (one per direction) are interleaved into a single stereo buffer for Deepgram's multichannel mode (real-time path only).
- `asterisk_bridge` detects whether Asterisk swapped the RTP source ports and adjusts speaker labels accordingly.
- `call_processor` is deliberately a **subprocess** (not an async task): this isolates OpenAI calls with an independent timeout and logging, and avoids blocking the event loop.
- DB schema initialization is guarded by a **threading lock** (not an asyncio lock) because `psycopg` sync connections are used alongside the async FastAPI server.
- **Multi-provider support:** the REST/batch path supports Deepgram and VoxTral. Select the provider via the `TRANSCRIPTION_PROVIDER` env var (default: `deepgram`) or the per-request `provider=` parameter. The real-time path remains Deepgram-only.

## Conventions

- **Config:** Exclusively via environment variables (loaded from `.env` by `python-dotenv`). No config files or CLI args.
- **Logging:** One logger per module (`logging.getLogger(__name__)`), level controlled by the `LOG_LEVEL` env var.
- **Async:** `asyncio` throughout the real-time pipeline; `asyncio.Lock` for connector close logic, `asyncio.Queue` for RTP buffer routing. Reconnection uses exponential backoff.
- **Testing:** `pytest-asyncio` with `asyncio_mode = auto`. Tests monkeypatch env vars and mock external services (Deepgram, MQTT, psycopg). An autouse fixture in `conftest.py` resets `db._schema_initialized` between tests.
- **Auth:** Optional static bearer token (`API_TOKEN` env var) for `/api/*` endpoints. Accepts `Authorization: Bearer <token>` or `X-API-Token: <token>`.
- **Validation:** `uniqueid` must match `\d+\.\d+` (Asterisk format).
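
The real-time path strips RTP headers in `rtp_server` and interleaves the two per-direction streams into one stereo buffer for Deepgram's multichannel mode. The helpers below sketch both steps under stated assumptions: a fixed 12-byte header with no CSRC list or header extension (matching the README's `RTP_HEADER_SIZE=12`), both streams carrying 16-bit signed linear PCM in the same byte order, and the shorter stream padded with silence. `rtp_payload` and `interleave_stereo` are hypothetical names; the actual connector may buffer and align frames differently.

```python
RTP_HEADER_SIZE = 12  # fixed header only: no CSRC list, no header extension


def rtp_payload(packet: bytes, header_size: int = RTP_HEADER_SIZE) -> bytes:
    """Drop the fixed-size RTP header and return the audio payload."""
    if len(packet) < header_size:
        raise ValueError("packet is shorter than the RTP header")
    return packet[header_size:]


def interleave_stereo(left: bytes, right: bytes, sample_width: int = 2) -> bytes:
    """Interleave two mono PCM buffers into one stereo buffer: L0 R0 L1 R1 ...

    The shorter buffer is padded with zero bytes (silence for signed linear
    PCM) so both channels stay frame-aligned.
    """
    if len(left) % sample_width or len(right) % sample_width:
        raise ValueError("buffer length must be a multiple of sample_width")
    n = max(len(left), len(right))
    left = left.ljust(n, b"\x00")
    right = right.ljust(n, b"\x00")
    out = bytearray(2 * n)
    for i in range(0, n, sample_width):
        # Stereo frame k (i = k * sample_width) occupies bytes [2*i, 2*i + 2*sample_width).
        out[2 * i : 2 * i + sample_width] = left[i : i + sample_width]
        out[2 * i + sample_width : 2 * i + 2 * sample_width] = right[i : i + sample_width]
    return bytes(out)
```

Plain byte slicing keeps the sketch free of `audioop`, which was deprecated in Python 3.11 and removed in 3.13.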

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -49,9 +49,16 @@ RTP_HEADER_SIZE=12 | | |
| MQTT_URL=mqtt://127.0.0.1:1883 | | |
| MQTT_TOPIC_PREFIX=satellite | | |
| # Deepgram API Key | | |
| # Transcription Provider (optional, default: deepgram) | | |
| # Options: deepgram, voxtral | | |
| TRANSCRIPTION_PROVIDER=deepgram | | |
| # Deepgram API Key (required for Deepgram provider) | | |
| DEEPGRAM_API_KEY=your_deepgram_api_key | | |
| # Mistral API Key (required for VoxTral provider) | | |
| MISTRAL_API_KEY=your_mistral_api_key | | |
| # REST API (optional) | | |
| HTTP_PORT=8000 | | |
| @@ -92,8 +99,10 @@ PGVECTOR_DATABASE=satellite | | |
| - `MQTT_URL`: URL of the MQTT broker | | |
| - `MQTT_TOPIC_PREFIX`: Prefix for MQTT topics | | |
| #### Deepgram Configuration | | |
| - `DEEPGRAM_API_KEY`: Your Deepgram API key | | |
| #### Transcription Configuration | | |
| - `TRANSCRIPTION_PROVIDER`: Choose the transcription provider (`deepgram` or `voxtral`, default: `deepgram`) | | |
| - `DEEPGRAM_API_KEY`: Your Deepgram API key (required for Deepgram provider) | | |
| - `MISTRAL_API_KEY`: Your Mistral API key (required for VoxTral provider) | | |
| #### Rest API Configuration | | |
| - `HTTP_PORT`: Port for the HTTP server (default: 8000) | | |
| @@ -125,28 +134,38 @@ This requires the `vector` extension (pgvector) in your Postgres instance. | | |
| #### `POST /api/get_transcription` | | |
| Accepts a WAV upload and returns a Deepgram transcription. | | |
| Accepts a WAV upload and returns a transcription from the configured provider (Deepgram or VoxTral). | | |
| Request requirements: | | |
| - Content type: multipart form upload with a `file` field (`audio/wav` or `audio/x-wav`) | | |
| Optional fields (query string or multipart form fields): | | |
| - `provider`: Override the transcription provider (`deepgram` or `voxtral`). If not set, uses the `TRANSCRIPTION_PROVIDER` env var (default: `deepgram`) | | |
| - `uniqueid`: Asterisk-style uniqueid like `1234567890.1234` (required only when `persist=true`) | | |
| - `persist`: `true\|false` (default `false`); persists the raw transcript to Postgres (requires `PGVECTOR_*` env vars) | | |
| - `summary`: `true\|false` (default `false`); runs AI enrichment (requires `OPENAI_API_KEY` and also `persist=true` so there is a DB record to update) | | |
| - `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:`) | | |
| - `channel0_name`, `channel1_name`: rename diarization labels in the returned transcript (replaces `Channel 0:` / `Channel 1:` or `Speaker 0:` / `Speaker 1:`) | | |
| Deepgram parameters: | | |
| - Most Deepgram `/v1/listen` parameters may be provided as query/form fields and are passed through to Deepgram. | | |
| Provider-specific parameters: | | |
| - **Deepgram**: Most Deepgram `/v1/listen` parameters may be provided as query/form fields (e.g., `model`, `language`, `diarize`, `punctuate`) | | |
| - **VoxTral**: Supports `model` (default: `voxtral-mini-latest`), `language`, `diarize`, `temperature`, `context_bias`, `timestamp_granularities` | | |
| Comment on lines +147 to +151 | | |
| Example: | | |
| ``` | | |
| # Using default provider (from TRANSCRIPTION_PROVIDER env var) | | |
| curl -X POST http://127.0.0.1:8000/api/get_transcription \ | | |
| -H 'Authorization: Bearer YOUR_TOKEN' \ | | |
| -F uniqueid=1234567890.1234 \ | | |
| -F persist=true \ | | |
| -F summary=true \ | | |
| -F 'file=@call.wav;type=audio/wav' | | |
| # Override provider to use VoxTral | | |
| curl -X POST http://127.0.0.1:8000/api/get_transcription \ | | |
| -H 'Authorization: Bearer YOUR_TOKEN' \ | | |
| -F provider=voxtral \ | | |
| -F diarize=true \ | | |
| -F 'file=@call.wav;type=audio/wav' | | |
| ``` | | |
| Authentication: | | |

Container builds install only `requirements.txt`, but the new transcription providers (and `api.py`) import and use `httpx` directly. `httpx` is currently listed only in `requirements-dev.txt`, so the runtime image may depend on it arriving as a transitive dependency (or break if that transitive dependency changes). Add `httpx` to `requirements.txt` to make the container/runtime dependency explicit.