
SangJieGe/Interview-Ace

Interview Ace 🎙️

AI-powered real-time interview assistant with dual-agent voice recognition and smart Q&A generation.

What it does: Interview Ace sits as a transparent overlay on top of your video call (Zoom, Teams, Google Meet). It listens to the interviewer's questions in real time, searches your resume and prep materials, and suggests answers within seconds.

Features

  • 🎀 Real-time voice capture from system audio + microphone
  • πŸ—£οΈ Speaker diarization β€” automatically separates interviewer from candidate
  • πŸ“ Live transcription powered by Whisper
  • 🧠 RAG-based answer generation using your resume, job description, and notes
  • πŸ’‘ Multi-LLM support (OpenAI, Anthropic, Google, DeepSeek)
  • πŸ–₯️ Electron overlay β€” transparent, always-on-top, floats over any video call
  • ⚑ WebSocket-based β€” low-latency real-time streaming

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • An LLM API key (OpenAI, Anthropic, etc.)

1. Clone & Configure

git clone https://github.com/SangJieGe/Interview-Ace.git
cd Interview-Ace
cp .env.example .env
# Edit .env with your API keys

2. Backend

cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# The app is addressed as backend.main, so start the server from the repository root
cd ..
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

3. Frontend

cd frontend
npm install
npm run dev

4. Electron Overlay (optional)

cd frontend
npm run electron:dev

5. Download ML Models

bash scripts/download_models.sh

Docker (Alternative)

docker-compose up --build

This starts the backend, the frontend, and the ChromaDB vector store.

How It Works

┌───────────────┐     ┌───────────────────────────────────┐
│  Video Call   │     │         Interview Ace             │
│  (Zoom/Teams) │────▶│                                   │
│               │     │  Voice Agent ──→ Knowledge Agent  │
│  Interviewer  │     │  (listen +      (search +         │
│  asks question│     │   transcribe)    generate answer) │
│               │     │       │                │          │
└───────────────┘     │       ▼                ▼          │
                      │  📝 Transcript   💡 Answer        │
                      │  shown live      shown live       │
                      └───────────────────────────────────┘
  1. Voice Agent (Agent 2) captures audio, detects speech, identifies who is speaking, and transcribes the speech using Whisper.
  2. Knowledge Agent (Agent 1) takes the transcribed question, searches your documents via RAG, and generates a contextual answer using an LLM.
  3. Both the transcript and the answer appear in real time on the overlay.
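
The hand-off between the two agents can be pictured as a producer/consumer queue. The following is only a minimal sketch, not the project's actual code: `Utterance`, `voice_agent`, `knowledge_agent`, `run_pipeline`, and the speaker labels are hypothetical names, audio capture and Whisper are replaced by pre-transcribed text, and the RAG + LLM step is replaced by a plain callable.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # hypothetical labels: "interviewer" or "candidate"
    text: str


async def voice_agent(segments, out_queue):
    """Stand-in for the Voice Agent: forwards already-transcribed segments."""
    for speaker, text in segments:
        await out_queue.put(Utterance(speaker, text))
    await out_queue.put(None)  # sentinel: no more audio


async def knowledge_agent(in_queue, answer_fn):
    """Stand-in for the Knowledge Agent: answers interviewer utterances only."""
    answers = []
    while (utterance := await in_queue.get()) is not None:
        if utterance.speaker == "interviewer":
            answers.append(answer_fn(utterance.text))
    return answers


async def run_pipeline(segments, answer_fn):
    """Run both agents concurrently, connected by a queue."""
    queue = asyncio.Queue()
    producer = asyncio.create_task(voice_agent(segments, queue))
    answers = await knowledge_agent(queue, answer_fn)
    await producer
    return answers


def demo():
    segments = [
        ("interviewer", "Tell me about a project you led."),
        ("candidate", "Sure, let me think."),
    ]
    return asyncio.run(
        run_pipeline(segments, lambda q: f"Suggested answer for: {q}")
    )
```

Calling `demo()` returns a single suggested answer, because the candidate's own utterance is skipped. In a real deployment the queue would presumably be fed by the audio pipeline and the answers pushed to the overlay over WebSocket.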

Configuration

All configuration is via environment variables. See .env.example for the full list.

| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | LLM provider (openai, anthropic, google, deepseek) | openai |
| LLM_API_KEY | API key for the LLM provider | (none) |
| LLM_MODEL | Model to use | gpt-4o |
| WHISPER_MODEL | Whisper model size (tiny, base, small, medium, large-v3) | base |
| AUDIO_DEVICE_INDEX | Audio input device index | 0 |
| VECTOR_DB | Vector database (chromadb, pinecone) | chromadb |
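
As an illustration of how these variables and defaults could be read on the backend, here is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names, not taken from the repository, and the validation rules are assumptions derived from the allowed values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed values as listed in the configuration table.
LLM_PROVIDERS = {"openai", "anthropic", "google", "deepseek"}
VECTOR_DBS = {"chromadb", "pinecone"}


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    llm_api_key: Optional[str]
    llm_model: str
    whisper_model: str
    audio_device_index: int
    vector_db: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in LLM_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    vector_db = env.get("VECTOR_DB", "chromadb")
    if vector_db not in VECTOR_DBS:
        raise ValueError(f"Unsupported VECTOR_DB: {vector_db!r}")
    return Settings(
        llm_provider=provider,
        llm_api_key=env.get("LLM_API_KEY"),  # no default; must be supplied
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
        audio_device_index=int(env.get("AUDIO_DEVICE_INDEX", "0")),
        vector_db=vector_db,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test, for example `load_settings({"WHISPER_MODEL": "small"})`.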

Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python, FastAPI, WebSocket |
| Speech-to-Text | OpenAI Whisper |
| Voice Activity Detection | Silero VAD |
| Speaker Diarization | Embedding-based comparison |
| RAG | ChromaDB + sentence-transformers |
| LLM | OpenAI / Anthropic / Google / DeepSeek |
| Frontend | React, TypeScript, Tailwind CSS |
| Desktop | Electron (transparent overlay) |
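
To make "embedding-based comparison" concrete, the sketch below labels a speech segment by comparing its speaker embedding with an enrolled candidate voice profile using cosine similarity. This is an assumption about how such a comparison could work, not the project's implementation: `label_speaker`, the 0.75 threshold, and the two-dimensional toy vectors are all hypothetical, and a real speaker embedding would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_speaker(segment_embedding, candidate_profile, threshold=0.75):
    """Return "candidate" if the segment is close to the enrolled profile,
    otherwise "interviewer". The threshold is an illustrative value only."""
    score = cosine_similarity(segment_embedding, candidate_profile)
    return "candidate" if score >= threshold else "interviewer"


# Toy example: the enrolled profile points mostly along the first axis.
profile = [0.9, 0.1]
print(label_speaker([1.0, 0.0], profile))
print(label_speaker([0.0, 1.0], profile))
```

A single fixed threshold is the simplest design. It assumes exactly two speakers and an enrolled profile for the candidate, which matches the voice profile matching mentioned in the roadmap.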

Project Structure

Interview-Ace/
├── backend/              # Python FastAPI server
│   ├── api/              # REST + WebSocket routes
│   ├── agents/           # Agent 1 (Knowledge) + Agent 2 (Voice)
│   ├── core/             # Config, audio utilities
│   ├── models/           # Pydantic schemas
│   └── rag/              # RAG retrieval engine
├── frontend/             # React + Electron app
│   ├── electron/         # Electron main process
│   └── src/              # React components + hooks
├── scripts/              # Setup & utility scripts
├── docs/                 # Architecture & design docs
└── docker-compose.yml    # Docker deployment

See docs/architecture.md for the detailed system design.

Roadmap

  • Project scaffolding & architecture
  • Audio capture implementation (system + mic)
  • Voice Activity Detection (Silero VAD integration)
  • Whisper transcription pipeline
  • Speaker diarization (voice profile matching)
  • RAG knowledge base (document upload + retrieval)
  • Knowledge Agent (multi-LLM answer generation)
  • WebSocket real-time streaming
  • Electron overlay UI
  • Voice profile creation wizard
  • Session recording & review
  • Multi-language support

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License. See LICENSE for details.
