AI-powered real-time interview assistant with dual-agent voice recognition and smart Q&A generation.
What it does: Sits as a transparent overlay on top of your video call (Zoom, Teams, Google Meet). It listens to the interviewer's questions in real time, searches your resume and prep materials, and suggests smart answers, all within seconds.
- Real-time voice capture from system audio + microphone
- Speaker diarization: automatically separates interviewer from candidate
- Live transcription powered by Whisper
- RAG-based answer generation using your resume, job description, and notes
- Multi-LLM support (OpenAI, Anthropic, Google, DeepSeek)
- Electron overlay: transparent, always-on-top, floats over any video call
- WebSocket-based: low-latency real-time streaming
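
To make the RAG bullet above concrete, here is a minimal retrieve-then-prompt sketch. It is not the project's implementation: it swaps embedding-based retrieval for a plain bag-of-words cosine score so that it runs with the standard library only. The function names `retrieve` and `build_prompt`, the sample chunks, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (hypothetical helper)."""
    q = _tokens(question)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _tokens(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM (wording is made up)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        f"Context from the candidate's documents:\n{joined}\n\n"
        f"Interview question: {question}\nSuggested answer:"
    )


# Made-up resume chunks, for illustration only.
chunks = [
    "Led migration of a payments service from a monolith to microservices.",
    "Volunteer coach for a youth football team.",
    "Built a FastAPI service handling real-time WebSocket streaming.",
]
question = "Tell me about your experience with microservices."
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In this toy example only the first chunk shares a word with the question, so it is the one placed in the prompt. A real setup would rank by embedding similarity rather than word overlap.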
- Python 3.11+
- Node.js 18+
- An LLM API key (OpenAI, Anthropic, etc.)
git clone https://github.com/SangJieGe/Interview-Ace.git
cd Interview-Ace
cp .env.example .env
# Edit .env with your API keys

cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

cd frontend
npm install
npm run dev

cd frontend
npm run electron:dev

bash scripts/download_models.sh

docker-compose up --build

This starts backend, frontend, and ChromaDB vector store.
┌──────────────┐     ┌──────────────────────────────────────┐
│  Video Call  │     │            Interview Ace             │
│ (Zoom/Teams) │────▶│                                      │
│              │     │  Voice Agent ──▶ Knowledge Agent     │
│ Interviewer  │     │  (listen +       (search +           │
│ asks question│     │   transcribe)     generate answer)   │
│              │     │       │                 │            │
└──────────────┘     │       ▼                 ▼            │
                     │  Transcript          Answer          │
                     │  shown live        shown live        │
                     └──────────────────────────────────────┘
- Voice Agent (Agent 2) captures audio, detects speech, identifies who is speaking, and transcribes the speech with Whisper.
- Knowledge Agent (Agent 1) takes the transcribed question, searches your documents via RAG, and generates a contextual answer using an LLM.
- Both the transcript and the answer appear on the overlay in real time.
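
The hand-off between the two agents can be sketched with an `asyncio.Queue`. This is an illustration under assumptions, not the project's code: the function names, the `(speaker, text)` tuple shape, and the `overlay` list are hypothetical, and the audio capture, Whisper, RAG, and LLM steps are replaced by stand-ins.

```python
import asyncio


async def voice_agent(audio_segments, transcripts: asyncio.Queue) -> None:
    """Hypothetical Agent 2: emit (speaker, text) items for each segment."""
    for speaker, text in audio_segments:  # stand-in for capture + VAD + Whisper
        await transcripts.put((speaker, text))
    await transcripts.put(None)  # sentinel: no more audio


async def knowledge_agent(transcripts: asyncio.Queue, overlay: list) -> None:
    """Hypothetical Agent 1: show every line, answer only the interviewer."""
    while (item := await transcripts.get()) is not None:
        speaker, text = item
        overlay.append(("transcript", speaker, text))
        if speaker == "interviewer":
            # stand-in for the RAG search + LLM call
            overlay.append(("answer", f"Suggested answer for: {text}"))


async def run_session(audio_segments) -> list:
    """Run both agents concurrently and return what the overlay would show."""
    transcripts: asyncio.Queue = asyncio.Queue()
    overlay: list = []
    await asyncio.gather(
        voice_agent(audio_segments, transcripts),
        knowledge_agent(transcripts, overlay),
    )
    return overlay


events = asyncio.run(
    run_session(
        [
            ("interviewer", "Why do you want this role?"),
            ("candidate", "Thanks for asking."),
        ]
    )
)
```

Decoupling the agents with a queue lets transcription continue while an answer is being generated, which is one plausible way to keep the overlay responsive.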
All configuration is via environment variables. See .env.example for the full list.
| Variable | Description | Default |
|---|---|---|
| `LLM_PROVIDER` | LLM provider (`openai`, `anthropic`, `google`, `deepseek`) | `openai` |
| `LLM_API_KEY` | API key for the LLM provider | (none) |
| `LLM_MODEL` | Model to use | `gpt-4o` |
| `WHISPER_MODEL` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v3`) | `base` |
| `AUDIO_DEVICE_INDEX` | Audio input device index | `0` |
| `VECTOR_DB` | Vector database (`chromadb`, `pinecone`) | `chromadb` |
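
As an illustration of how these variables and defaults could be read on the Python side, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical names, and the provider validation is an assumption; the project's actual config module may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Providers listed in the table above.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "google", "deepseek"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""

    llm_provider: str
    llm_api_key: Optional[str]
    llm_model: str
    whisper_model: str
    audio_device_index: int
    vector_db: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, falling back to the documented defaults."""
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    return Settings(
        llm_provider=provider,
        llm_api_key=env.get("LLM_API_KEY"),  # no default: must be supplied
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
        audio_device_index=int(env.get("AUDIO_DEVICE_INDEX", "0")),
        vector_db=env.get("VECTOR_DB", "chromadb"),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"LLM_PROVIDER": "anthropic", "AUDIO_DEVICE_INDEX": "2"})
```

Calling `load_settings()` with no argument reads from the real process environment.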
| Layer | Technology |
|---|---|
| Backend | Python, FastAPI, WebSocket |
| Speech-to-Text | OpenAI Whisper |
| Voice Activity Detection | Silero VAD |
| Speaker Diarization | Embedding-based comparison |
| RAG | ChromaDB + sentence-transformers |
| LLM | OpenAI / Anthropic / Google / DeepSeek |
| Frontend | React, TypeScript, Tailwind CSS |
| Desktop | Electron (transparent overlay) |
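
The "embedding-based comparison" row can be illustrated with a small cosine-similarity check against an enrolled voice profile. This is a sketch under assumptions: the three-dimensional vectors are made up (real speaker embeddings have far more dimensions), the `0.75` threshold is arbitrary, and `label_speaker` is a hypothetical name rather than the project's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_speaker(
    segment: Sequence[float],
    candidate_profile: Sequence[float],
    threshold: float = 0.75,  # arbitrary value chosen for this example
) -> str:
    """Label a segment 'candidate' if it is close to the enrolled profile."""
    similarity = cosine_similarity(segment, candidate_profile)
    return "candidate" if similarity >= threshold else "interviewer"


# Made-up embeddings for illustration only.
candidate_profile = [0.9, 0.1, 0.3]
close_segment = label_speaker([0.8, 0.2, 0.25], candidate_profile)
far_segment = label_speaker([0.1, 0.9, -0.2], candidate_profile)
```

Here the first segment points in nearly the same direction as the profile and is labeled `candidate`, while the second is labeled `interviewer`.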
Interview-Ace/
├── backend/              # Python FastAPI server
│   ├── api/              # REST + WebSocket routes
│   ├── agents/           # Agent 1 (Knowledge) + Agent 2 (Voice)
│   ├── core/             # Config, audio utilities
│   ├── models/           # Pydantic schemas
│   └── rag/              # RAG retrieval engine
├── frontend/             # React + Electron app
│   ├── electron/         # Electron main process
│   └── src/              # React components + hooks
├── scripts/              # Setup & utility scripts
├── docs/                 # Architecture & design docs
└── docker-compose.yml    # Docker deployment
See docs/architecture.md for the detailed system design.
- Project scaffolding & architecture
- Audio capture implementation (system + mic)
- Voice Activity Detection (Silero VAD integration)
- Whisper transcription pipeline
- Speaker diarization (voice profile matching)
- RAG knowledge base (document upload + retrieval)
- Knowledge Agent (multi-LLM answer generation)
- WebSocket real-time streaming
- Electron overlay UI
- Voice profile creation wizard
- Session recording & review
- Multi-language support
See CONTRIBUTING.md for development setup and guidelines.
MIT License. See LICENSE for details.