Real-Time AI Debate Coach with Predictive Turn-Taking
A full-stack web application that listens to a user debating in real time, transcribes their speech, predicts natural turn boundaries, and responds with AI-generated coaching feedback – all over a single WebSocket connection.
Built as a final-year project exploring low-latency humanβAI spoken interaction.
| Feature | Detail |
|---|---|
| Streaming STT | Deepgram Nova-3 via WebSocket – real-time interim + final transcripts |
| Predictive turn-taking | Hybrid VAD (RMS energy + optional Silero) with adaptive silence thresholds |
| LLM coaching | Groq-hosted models respond to user arguments with debate feedback |
| TTS playback | AI responses synthesised to audio and streamed back to the browser |
| Google OAuth | One-tap sign-in, JWT session tokens |
| Session history | Past debates stored in PostgreSQL, browseable from the dashboard |
| Latency metrics | End-to-end pipeline timing (STT → LLM → TTS) tracked per session |
| Single-container prod | Multi-stage Docker build bundles the Vite SPA into the FastAPI server |
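The hybrid VAD idea behind the predictive turn-taking feature can be sketched as a minimal RMS-energy detector with an adaptive silence threshold. This is an illustrative toy, not the project's actual `turn_taking_service.py` API – class and parameter names are invented for the example:

```python
import math

class EnergyVAD:
    """Toy RMS-energy voice activity detector with an adaptive
    silence threshold (illustrative; not the project's real API)."""

    def __init__(self, base_threshold=0.01, silence_frames=25):
        self.base_threshold = base_threshold  # floor for "speech" energy
        self.silence_frames = silence_frames  # silent frames before turn end
        self.noise_floor = base_threshold     # adapts to background noise
        self.silent_run = 0

    def process(self, frame):
        """frame: list of float samples in [-1, 1]. Returns True when a
        turn boundary (enough trailing silence) is detected."""
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Slowly track the noise floor during quiet frames
        if rms < self.noise_floor:
            self.noise_floor = 0.95 * self.noise_floor + 0.05 * rms
        threshold = max(self.base_threshold, 2.0 * self.noise_floor)
        if rms < threshold:
            self.silent_run += 1
        else:
            self.silent_run = 0
        return self.silent_run >= self.silence_frames
```

In the real service this energy signal would be fused with Silero's per-frame speech probability when the optional model is enabled, which is what makes the threshold "hybrid".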
- Python 3.12 / FastAPI / Uvicorn
- WebSockets – persistent bidirectional audio + control channel
- Deepgram SDK – streaming speech-to-text (Nova-3)
- OpenAI Python SDK → Groq – LLM inference
- SQLAlchemy 2 + asyncpg – async PostgreSQL ORM
- Alembic – database migrations
- PyJWT + bcrypt – authentication
- React 18 / TypeScript / Vite
- Zustand – state management
- React Router 7 – client-side routing
- Web Audio API – microphone capture, 48 kHz → 16 kHz resampling, PCM16 encoding
- Google OAuth (`@react-oauth/google`)
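The browser-side resample-and-encode step can be illustrated in Python – a naive decimate-by-3 downsampler and little-endian PCM16 packing. The real code runs in the Web Audio API on the client, and a production resampler would low-pass filter before decimating; this is only a sketch of the arithmetic:

```python
import struct

def resample_48k_to_16k(samples):
    """Naive 3:1 decimation by averaging (48 kHz -> 16 kHz).
    `samples` is a list of floats in [-1, 1]."""
    out = []
    for i in range(0, len(samples) - 2, 3):
        out.append((samples[i] + samples[i + 1] + samples[i + 2]) / 3.0)
    return out

def encode_pcm16le(samples):
    """Clamp floats to [-1, 1] and pack as little-endian signed 16-bit."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)
```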
- Docker Compose – dev (3 containers) and prod (2 containers) configurations
- PostgreSQL 16 (Alpine)
- GitHub Actions – CI/CD with test gate → SSH deploy
- Azure VM – production host
```
finalyear/
├── backend/
│   ├── Dockerfile                      # Multi-stage: Node build → Python runtime
│   ├── requirements.txt
│   ├── alembic/                        # Database migrations
│   ├── app/
│   │   ├── main.py                     # FastAPI app, CORS, SPA static mount
│   │   ├── config.py                   # Pydantic settings (env vars)
│   │   ├── db/                         # SQLAlchemy models, session, init
│   │   ├── routers/
│   │   │   ├── api.py                  # REST endpoints (sessions, health)
│   │   │   ├── auth.py                 # Google OAuth + JWT auth routes
│   │   │   └── ws_handler.py           # WebSocket: audio → STT → turn-taking → LLM → TTS
│   │   ├── schemas/                    # Pydantic message schemas
│   │   └── services/
│   │       ├── stt_service.py          # Deepgram / Groq / local STT
│   │       ├── llm_service.py          # LLM coaching responses
│   │       ├── tts_service.py          # Text-to-speech synthesis
│   │       ├── turn_taking_service.py  # Hybrid VAD + silence detection
│   │       ├── session_service.py      # DB session CRUD
│   │       ├── auth_service.py         # JWT + Google token verification
│   │       ├── metrics_service.py      # Pipeline latency tracking
│   │       └── latency_tracker.py      # Per-stage timing
│   └── tests/                          # Unit + integration tests (pytest)
├── web/
│   ├── src/
│   │   ├── components/                 # Layout, Transcript, TurnIndicator, etc.
│   │   ├── hooks/                      # useAudioCapture, useWebSocket
│   │   ├── pages/                      # Auth, Debate, Dashboard, History, etc.
│   │   └── stores/                     # Zustand stores (auth, app, debate)
│   └── public/
├── shared/types/                       # TypeScript message type definitions
├── docker-compose.yml                  # Development stack
├── docker-compose.prod.yml             # Production stack
├── Makefile                            # Convenience commands
└── .github/workflows/
    └── deploy-azure.yml                # CI/CD: test → deploy over SSH
```
- Docker and Docker Compose v2+
- A Deepgram API key (free $200 credit)
- A Groq API key (free tier)
- A Google OAuth Client ID (from Google Cloud Console)
```bash
git clone https://github.com/nekumartins/finalyear.git
cd finalyear
cp .env.example .env   # or create manually
```

Required variables:

```bash
# API Keys
DEEPGRAM_API_KEY=your_deepgram_key
GROQ_API_KEY=your_groq_key

# Auth
SECRET_KEY=some-random-secret-string
VITE_GOOGLE_CLIENT_ID=your_google_oauth_client_id.apps.googleusercontent.com

# STT provider: deepgram (default), groq, or faster-whisper
STT_PROVIDER=deepgram

# CORS (production domain, or * for dev)
ALLOWED_ORIGINS=*
```

```bash
make dev
# or: docker compose up --build
```

This starts:
| Service | URL |
|---|---|
| Frontend (Vite HMR) | http://localhost:3000 |
| Backend (FastAPI) | http://localhost:8000 |
| PostgreSQL | localhost:5432 |
```bash
make prod
# or: docker compose -f docker-compose.prod.yml up --build -d
```

In production, the Vite SPA is compiled into the backend container and served by FastAPI at http://localhost:8000.
| Command | Description |
|---|---|
| `make dev` | Start dev stack (hot-reload, foreground) |
| `make dev-d` | Start dev stack (detached) |
| `make prod` | Build + start production stack |
| `make down` | Stop dev stack |
| `make prod-down` | Stop production stack |
| `make logs` | Tail all logs (`make logs s=backend` for one service) |
| `make ps` | Show running containers |
| `make shell` | Open bash in backend container |
| `make migrate` | Run Alembic migrations |
```bash
# With conda/venv (from repo root):
pip install -r backend/requirements.txt
python -m pytest backend/tests/ -x --tb=short

# Inside the backend container:
make shell
pytest backend/tests/ -x --tb=short
```

The GitHub Actions workflow (`.github/workflows/deploy-azure.yml`) runs on every push to `master`:
- **Test** – installs dependencies and runs `pytest` (unit tests only; no integration/edge tests)
- **Deploy** – SSHes into the production VM, pulls the latest code, and rebuilds the Docker containers
| Secret | Value |
|---|---|
| `AZURE_SSH_PRIVATE_KEY` | SSH private key for the production server |
| `AZURE_SSH_HOST` | Server hostname / IP |
| `AZURE_SSH_USER` | SSH username |
Optional: `AZURE_SSH_PORT` (default 22), `AZURE_SSH_KNOWN_HOSTS`, `AZURE_APP_DIR`.
```
 Browser                           Server (Azure VM)
┌───────────┐      WebSocket     ┌───────────────────────────────────┐
│  React    │◄──────────────────►│  FastAPI ws_handler.py            │
│  App      │  audio_chunk (b64) │                                   │
│           │  transcripts       │  ┌──────────┐    ┌───────────┐    │
│  Audio    │  ai_response       │  │ Deepgram │    │ Groq LLM  │    │
│  Capture  │  tts_audio         │  │ STT (WS) │    │ (REST)    │    │
│  16 kHz   │  state changes     │  └──────────┘    └───────────┘    │
└───────────┘                    │                                   │
                                 │  ┌──────────┐    ┌───────────┐    │
                                 │  │ Turn-    │    │ TTS       │    │
                                 │  │ Taking   │    │ Service   │    │
                                 │  │ (VAD)    │    │           │    │
                                 │  └──────────┘    └───────────┘    │
                                 │                                   │
                                 │  ┌───────────────────────────┐    │
                                 │  │ PostgreSQL (sessions,     │    │
                                 │  │ users, metrics)           │    │
                                 │  └───────────────────────────┘    │
                                 └───────────────────────────────────┘
```
Audio pipeline: Mic → ScriptProcessorNode (48 kHz) → resample to 16 kHz → PCM16 LE → base64 → WebSocket → Deepgram streaming → interim/final transcripts → hybrid turn-taking → Groq LLM → TTS → audio back to browser.
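One plausible shape for the `audio_chunk` message at the start of that pipeline, sketched in Python. The actual schemas live in `shared/types/` and `app/schemas/`; the field names and the helper below are illustrative assumptions, not the project's real protocol:

```python
import base64
import json
import struct

def make_audio_chunk_message(samples, session_id="demo"):
    """Wrap 16 kHz float samples as a base64 PCM16 WebSocket message.
    Field names are illustrative, not the project's actual schema."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    pcm = struct.pack("<%dh" % len(ints), *ints)  # little-endian PCM16
    return json.dumps({
        "type": "audio_chunk",
        "session_id": session_id,
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```

The server side would base64-decode the `audio` field and forward the raw PCM16 bytes straight into the Deepgram streaming socket.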
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://...` | Async Postgres connection string |
| `SECRET_KEY` | Random per-process | JWT signing key (set explicitly in prod) |
| `DEEPGRAM_API_KEY` | – | Deepgram API key for streaming STT |
| `GROQ_API_KEY` | – | Groq API key for LLM inference |
| `STT_PROVIDER` | `deepgram` | STT engine: `deepgram`, `groq`, or `faster-whisper` |
| `VITE_GOOGLE_CLIENT_ID` | – | Google OAuth client ID (build-time) |
| `ALLOWED_ORIGINS` | – | Comma-separated CORS origins, or `*` |
| `DEBUG` | `false` | Enable debug logging |
| `VAD_THRESHOLD` | `0.5` | Silero VAD confidence threshold |
This project is part of a final-year academic submission. All rights reserved.