🎙️ AI Debate Coach

Real-Time AI Debate Coach with Predictive Turn-Taking

A full-stack web application that listens to a user debating in real time, transcribes their speech, predicts natural turn boundaries, and responds with AI-generated coaching feedback, all over a single WebSocket connection.

Built as a final-year project exploring low-latency human–AI spoken interaction.


Features

| Feature | Detail |
| --- | --- |
| Streaming STT | Deepgram Nova-3 via WebSocket; real-time interim and final transcripts |
| Predictive turn-taking | Hybrid VAD (RMS energy + optional Silero) with adaptive silence thresholds |
| LLM coaching | Groq-hosted models respond to user arguments with debate feedback |
| TTS playback | AI responses synthesised to audio and streamed back to the browser |
| Google OAuth | One-tap sign-in, JWT session tokens |
| Session history | Past debates stored in PostgreSQL, browseable from the dashboard |
| Latency metrics | End-to-end pipeline timing (STT → LLM → TTS) tracked per session |
| Single-container prod | Multi-stage Docker build bundles the Vite SPA into the FastAPI server |
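
To make the "RMS energy with adaptive silence thresholds" idea concrete, here is a minimal sketch of an energy-only end-of-turn detector. It is not the project's `turn_taking_service.py`: the `TurnDetector` class, every threshold value, and the adaptation rule (longer utterances need less trailing silence) are assumptions chosen for illustration, and the optional Silero path is left out entirely.

```python
import math
from dataclasses import dataclass


def rms(samples: list[int]) -> float:
    """Root-mean-square energy of 16-bit PCM samples, normalised to 0..1."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


@dataclass
class TurnDetector:
    # Hypothetical values for illustration, not the project's settings.
    energy_threshold: float = 0.02  # RMS at or above this counts as speech
    base_silence_ms: int = 700      # silence required after a very short utterance
    min_silence_ms: int = 400       # floor applied to long utterances
    speech_ms: int = 0
    silence_ms: int = 0

    def required_silence_ms(self) -> int:
        # Assumed adaptive rule: the longer the user has spoken, the less
        # trailing silence is needed before the turn is considered finished.
        reduction = self.speech_ms // 10
        return max(self.min_silence_ms, self.base_silence_ms - reduction)

    def feed(self, samples: list[int], frame_ms: int) -> bool:
        """Consume one audio frame; return True when the turn looks complete."""
        if rms(samples) >= self.energy_threshold:
            self.speech_ms += frame_ms
            self.silence_ms = 0
            return False
        if self.speech_ms == 0:
            return False  # the user has not started speaking yet
        self.silence_ms += frame_ms
        if self.silence_ms >= self.required_silence_ms():
            self.speech_ms = 0
            self.silence_ms = 0
            return True
        return False
```

Feeding 20 ms frames, one second of speech lowers the required silence from 700 ms to 600 ms under this rule, so the detector fires on the thirtieth silent frame.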

Tech Stack

Backend

  • Python 3.12 / FastAPI / Uvicorn
  • WebSockets: persistent bidirectional audio + control channel
  • Deepgram SDK: streaming speech-to-text (Nova-3)
  • OpenAI Python SDK → Groq: LLM inference
  • SQLAlchemy 2 + asyncpg: async PostgreSQL ORM
  • Alembic: database migrations
  • PyJWT + bcrypt: authentication

Frontend

  • React 18 / TypeScript / Vite
  • Zustand: state management
  • React Router 7: client-side routing
  • Web Audio API: microphone capture, 48 kHz → 16 kHz resampling, PCM16 encoding
  • Google OAuth (@react-oauth/google)

Infrastructure

  • Docker Compose: dev (3 containers) and prod (2 containers) configurations
  • PostgreSQL 16 (Alpine)
  • GitHub Actions: CI/CD with test gate → SSH deploy
  • Azure VM: production host

Project Structure

finalyear/
├── backend/
│   ├── Dockerfile              # Multi-stage: Node build → Python runtime
│   ├── requirements.txt
│   ├── alembic/                # Database migrations
│   ├── app/
│   │   ├── main.py             # FastAPI app, CORS, SPA static mount
│   │   ├── config.py           # Pydantic settings (env vars)
│   │   ├── db/                 # SQLAlchemy models, session, init
│   │   ├── routers/
│   │   │   ├── api.py          # REST endpoints (sessions, health)
│   │   │   ├── auth.py         # Google OAuth + JWT auth routes
│   │   │   └── ws_handler.py   # WebSocket: audio → STT → turn-taking → LLM → TTS
│   │   ├── schemas/            # Pydantic message schemas
│   │   └── services/
│   │       ├── stt_service.py          # Deepgram / Groq / local STT
│   │       ├── llm_service.py          # LLM coaching responses
│   │       ├── tts_service.py          # Text-to-speech synthesis
│   │       ├── turn_taking_service.py  # Hybrid VAD + silence detection
│   │       ├── session_service.py      # DB session CRUD
│   │       ├── auth_service.py         # JWT + Google token verification
│   │       ├── metrics_service.py      # Pipeline latency tracking
│   │       └── latency_tracker.py      # Per-stage timing
│   └── tests/                  # Unit + integration tests (pytest)
├── web/
│   ├── src/
│   │   ├── components/         # Layout, Transcript, TurnIndicator, etc.
│   │   ├── hooks/              # useAudioCapture, useWebSocket
│   │   ├── pages/              # Auth, Debate, Dashboard, History, etc.
│   │   └── stores/             # Zustand stores (auth, app, debate)
│   └── public/
├── shared/types/               # TypeScript message type definitions
├── docker-compose.yml          # Development stack
├── docker-compose.prod.yml     # Production stack
├── Makefile                    # Convenience commands
└── .github/workflows/
    └── deploy-azure.yml        # CI/CD: test → deploy over SSH
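
The structure lists a per-stage timing module (`latency_tracker.py`). One common way to collect such timings is a context manager around each pipeline stage, sketched below. The `LatencyTracker` class and its method names are hypothetical and may not match the real module; the `time.sleep` calls merely stand in for the STT, LLM, and TTS work.

```python
import time
from contextlib import contextmanager


class LatencyTracker:
    """Collects wall-clock duration per pipeline stage for a single turn."""

    def __init__(self) -> None:
        self.stages_ms: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised, and accumulate
            # repeated entries for the same stage name.
            elapsed = (time.perf_counter() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed

    def total_ms(self) -> float:
        return sum(self.stages_ms.values())


tracker = LatencyTracker()
for stage_name in ("stt", "llm", "tts"):
    with tracker.stage(stage_name):
        time.sleep(0.01)  # placeholder for the real work

print({name: round(ms, 1) for name, ms in tracker.stages_ms.items()})
print(f"end-to-end: {tracker.total_ms():.1f} ms")
```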

Getting Started

Prerequisites

1. Clone the repo

git clone https://github.com/nekumartins/finalyear.git
cd finalyear

2. Create your .env file

cp .env.example .env   # or create manually

Required variables:

# API Keys
DEEPGRAM_API_KEY=your_deepgram_key
GROQ_API_KEY=your_groq_key

# Auth
SECRET_KEY=some-random-secret-string
VITE_GOOGLE_CLIENT_ID=your_google_oauth_client_id.apps.googleusercontent.com

# STT provider: deepgram (default), groq, or faster-whisper
STT_PROVIDER=deepgram

# CORS (production domain, or * for dev)
ALLOWED_ORIGINS=*
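
If you need a value for SECRET_KEY, Python's standard `secrets` module can generate one. This is only a suggestion; any sufficiently long random string should work, and the 48-byte length is an arbitrary choice.

```python
import secrets

# 48 random bytes, URL-safe base64 encoded (64 characters, no padding)
print(f"SECRET_KEY={secrets.token_urlsafe(48)}")
```

Paste the printed line into your .env file.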

3. Run the dev stack

make dev
# or: docker compose up --build

This starts:

| Service | URL |
| --- | --- |
| Frontend (Vite HMR) | http://localhost:3000 |
| Backend (FastAPI) | http://localhost:8000 |
| PostgreSQL | localhost:5432 |

4. Run the production stack

make prod
# or: docker compose -f docker-compose.prod.yml up --build -d

In production, the Vite SPA is compiled into the backend container and served from FastAPI at http://localhost:8000.


Makefile Commands

| Command | Description |
| --- | --- |
| make dev | Start dev stack (hot-reload, foreground) |
| make dev-d | Start dev stack (detached) |
| make prod | Build + start production stack |
| make down | Stop dev stack |
| make prod-down | Stop production stack |
| make logs | Tail all logs (make logs s=backend for one service) |
| make ps | Show running containers |
| make shell | Open bash in backend container |
| make migrate | Run Alembic migrations |

Running Tests

# With conda/venv (from repo root):
pip install -r backend/requirements.txt
python -m pytest backend/tests/ -x --tb=short

# Inside the backend container:
make shell
pytest backend/tests/ -x --tb=short

CI/CD

The GitHub Actions workflow (.github/workflows/deploy-azure.yml) runs on every push to master:

  1. Test: installs dependencies and runs pytest (unit tests only; integration and edge tests are excluded)
  2. Deploy: connects to the production VM over SSH, pulls the latest code, and rebuilds the Docker containers

Required GitHub Secrets

| Secret | Value |
| --- | --- |
| AZURE_SSH_PRIVATE_KEY | SSH private key for the production server |
| AZURE_SSH_HOST | Server hostname / IP |
| AZURE_SSH_USER | SSH username |

Optional: AZURE_SSH_PORT (default 22), AZURE_SSH_KNOWN_HOSTS, AZURE_APP_DIR.


Architecture

Browser                          Server (Azure VM)
┌──────────┐    WebSocket       ┌────────────────────────────────┐
│  React   │◄──────────────────►│  FastAPI  ws_handler.py        │
│  App     │  audio_chunk (b64) │                                │
│          │  transcripts       │  ┌──────────┐  ┌────────────┐  │
│  Audio   │  ai_response       │  │ Deepgram │  │ Groq LLM   │  │
│  Capture │  tts_audio         │  │ STT (WS) │  │ (REST)     │  │
│  16 kHz  │  state changes     │  └──────────┘  └────────────┘  │
└──────────┘                    │                                │
                                │  ┌──────────┐  ┌────────────┐  │
                                │  │ Turn-    │  │ TTS        │  │
                                │  │ Taking   │  │ Service    │  │
                                │  │ (VAD)    │  │            │  │
                                │  └──────────┘  └────────────┘  │
                                │                                │
                                │  ┌──────────────────────────┐  │
                                │  │ PostgreSQL (sessions,    │  │
                                │  │ users, metrics)          │  │
                                │  └──────────────────────────┘  │
                                └────────────────────────────────┘

Audio pipeline: Mic → ScriptProcessorNode (48 kHz) → resample to 16 kHz → PCM16 LE → base64 → WebSocket → Deepgram streaming → interim/final transcripts → hybrid turn-taking → Groq LLM → TTS → audio back to browser.
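
The first half of that pipeline (resample, PCM16 LE, base64, then decode on the server) can be sketched in Python. In the real app the capture side runs in the browser in TypeScript, so this is an illustration of the data format rather than the project's code. All function names are hypothetical, and the three-sample averaging is a crude stand-in for whatever resampler the frontend actually uses.

```python
import base64
import math
import struct


def downsample_48k_to_16k(samples: list[float]) -> list[float]:
    """Average each group of 3 samples (a crude box filter plus 3:1 decimation)."""
    usable = len(samples) - len(samples) % 3
    return [sum(samples[i:i + 3]) / 3.0 for i in range(0, usable, 3)]


def encode_pcm16_b64(samples: list[float]) -> str:
    """Clamp floats to [-1, 1], pack as little-endian int16, then base64 encode."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return base64.b64encode(struct.pack(f"<{len(ints)}h", *ints)).decode("ascii")


def decode_pcm16_b64(payload: str) -> list[int]:
    """Server side: base64 text back to a list of int16 samples."""
    raw = base64.b64decode(payload)
    raw = raw[: len(raw) - len(raw) % 2]  # drop a trailing odd byte, if any
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def rms(samples: list[int]) -> float:
    """RMS energy normalised to 0..1, usable as input to an energy-based VAD."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


# 20 ms of a 440 Hz tone captured at 48 kHz is 960 samples.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(960)]
payload = encode_pcm16_b64(downsample_48k_to_16k(tone))
decoded = decode_pcm16_b64(payload)
print(len(decoded), "samples at 16 kHz, RMS", round(rms(decoded), 3))
```

A 20 ms frame therefore shrinks from 960 float samples to 320 int16 samples (640 bytes) before base64 adds roughly a third in size.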


Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | postgresql+asyncpg://... | Async Postgres connection string |
| SECRET_KEY | Random per-process | JWT signing key (set explicitly in prod) |
| DEEPGRAM_API_KEY | (none) | Deepgram API key for streaming STT |
| GROQ_API_KEY | (none) | Groq API key for LLM inference |
| STT_PROVIDER | deepgram | STT engine: deepgram, groq, or faster-whisper |
| VITE_GOOGLE_CLIENT_ID | (none) | Google OAuth client ID (build-time) |
| ALLOWED_ORIGINS | (none) | Comma-separated CORS origins, or * |
| DEBUG | false | Enable debug logging |
| VAD_THRESHOLD | 0.5 | Silero VAD confidence threshold |

License

This project is part of a final-year academic submission. All rights reserved.
