
# CoffeeChat AI

A live multimodal networking practice app powered by the Gemini Live API. Pick a persona, have a real spoken conversation with an AI that can see and hear you in real time, then get professional feedback on your networking skills when you're done.



## What it does

You choose who you want to practice with:

| Persona | Who they are |
| --- | --- |
| Alex — Recruiter | Senior recruiter at a top tech firm doing a vibe-check coffee chat |
| Sarah — Mentor | Veteran software architect giving career advice to a junior dev |
| Jamie — Peer | Product designer networking casually, like you're in a real café |

The AI sees your webcam feed and hears your mic in real time via the Gemini Live API. It responds with voice, comments on your body language and setting, and keeps the conversation natural with human-like filler ("um", "ah"). When you end the chat, a second Gemini call analyses the full transcript and scores your performance.


## Features

- **Live audio + video** — real-time bidirectional voice conversation with webcam streaming to the model
- **Three distinct personas** — each with a different tone, role, and observation style
- **Live transcription panel** — see both sides of the conversation transcribed as you speak
- **Post-chat feedback** — overall score (0–100), strengths, areas for improvement, and actionable tips generated from the full transcript
- **Multimodal context** — the AI can see you and react to your environment, attire, and engagement
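The live transcription panel amounts to folding streamed text fragments into display lines. A minimal sketch of that folding logic follows; the event shape (`speaker` plus a text fragment) is an assumption for illustration, not the actual Live API server event format:

```typescript
// Hypothetical shape of an incoming transcription fragment.
type TranscriptEvent = { speaker: "user" | "ai"; text: string };
// One rendered line in the transcription panel.
type TranscriptLine = { speaker: "user" | "ai"; text: string };

// Fold a stream of fragments into lines: append to the open line while
// the speaker stays the same, start a new line when the speaker changes.
function foldTranscript(lines: TranscriptLine[], ev: TranscriptEvent): TranscriptLine[] {
  const last = lines[lines.length - 1];
  if (last && last.speaker === ev.speaker) {
    return [...lines.slice(0, -1), { speaker: ev.speaker, text: last.text + ev.text }];
  }
  return [...lines, { speaker: ev.speaker, text: ev.text }];
}
```

In a React component this would typically live inside a `setState` updater so each server event appends without re-sorting the whole transcript.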

## Stack

| Layer | Technology |
| --- | --- |
| AI (live conversation) | Gemini 2.5 Flash Native Audio (`gemini-2.5-flash-native-audio-preview`) |
| AI (feedback analysis) | Gemini Flash (`gemini-3-flash-preview`) |
| SDK | `@google/genai` Live API |
| Audio pipeline | Web Audio API — PCM capture at 16kHz, playback at 24kHz |
| Video pipeline | getUserMedia → canvas frame capture → base64 JPEG → Live API |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS |
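The audio-pipeline row implies a conversion step: Web Audio delivers Float32 samples in [-1, 1], while the Live API consumes 16-bit little-endian PCM. A sketch of that conversion is below; the function names are illustrative, not the repo's actual `audioUtils` exports:

```typescript
// Convert Web Audio Float32 samples ([-1, 1]) to 16-bit signed PCM.
function float32ToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before scaling
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // asymmetric int16 range
  }
  return out;
}

// Inverse: 16-bit PCM back to Float32 for Web Audio playback buffers.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / (pcm[i] < 0 ? 0x8000 : 0x7fff);
  }
  return out;
}
```

In the real pipeline the `Int16Array` bytes would be base64-encoded before being sent over the Live API session.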

## How it works

  1. Connect — opens a persistent WebSocket session to the Gemini Live API with audio + video modalities
  2. Stream — mic audio is captured as raw PCM at 16kHz and sent continuously; webcam frames are captured every second as JPEG and sent alongside
  3. Respond — model replies in audio (24kHz PCM), played back via Web Audio API with precise scheduling to avoid gaps
  4. Transcribe — both input and output transcriptions arrive as server events and are shown live
  5. Feedback — on session end, the full transcript is sent to a second Gemini call which returns a structured JSON analysis
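The "precise scheduling to avoid gaps" in step 3 comes down to starting each decoded chunk at `max(now, end of previous chunk)`, so bursts of arriving audio queue back-to-back instead of overlapping or leaving silence. Here is a minimal sketch with the clock injected so the logic runs outside the browser; in the app the clock would be `AudioContext.currentTime`, and the class name is illustrative:

```typescript
// Schedules audio chunks gap-free: each chunk starts either now or
// exactly when the previous chunk ends, whichever is later.
class PlaybackScheduler {
  private nextStart = 0;

  // `now` returns the current time in seconds (AudioContext.currentTime
  // in the browser; a fake clock in tests).
  constructor(private now: () => number) {}

  // Returns the absolute start time for a chunk of the given duration.
  schedule(durationSec: number): number {
    const start = Math.max(this.now(), this.nextStart);
    this.nextStart = start + durationSec;
    return start;
  }
}
```

In the browser, the returned time would be passed to `AudioBufferSourceNode.start(startTime)` so the audio hardware handles the precise timing rather than JavaScript timers.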

## Getting Started

### Prerequisites

- Node.js and npm
- A Gemini API key

### Setup

```bash
# 1. Clone
git clone https://github.com/AadiXD200/liveapigemeni.git
cd liveapigemeni

# 2. Install dependencies
npm install

# 3. Set your API key
echo "GEMINI_API_KEY=your_key_here" > .env.local

# 4. Run
npm run dev
```

Open http://localhost:5173, allow camera and microphone access, pick a persona, and start talking.


## Project Structure

```
App.tsx               — layout and header
LiveStreamDemo.tsx    — raw live API demo (audio + video streaming)
components/
  CoffeeChat.tsx      — persona chat session with live audio/video
  FeedbackSummary.tsx — post-chat score and analysis UI
services/
  audioUtils.ts       — PCM encode/decode, Web Audio helpers
types.ts              — AppState, PersonaType, FeedbackData interfaces
```
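The `FeedbackData` shape in `types.ts` likely mirrors the feedback features (score 0–100, strengths, areas for improvement, tips). A hedged sketch of that interface and a defensive parse of the model's JSON reply is below; the field names are assumptions, not the repo's real definitions:

```typescript
// Assumed shape of the structured feedback returned by the analysis call.
interface FeedbackData {
  score: number;          // overall networking score, 0-100
  strengths: string[];
  improvements: string[];
  tips: string[];
}

// Parse the model's JSON reply defensively: clamp the score into range
// and default missing arrays so a malformed reply can't crash the UI.
function parseFeedback(json: string): FeedbackData {
  const raw = JSON.parse(json);
  return {
    score: Math.min(100, Math.max(0, Number(raw.score) || 0)),
    strengths: Array.isArray(raw.strengths) ? raw.strengths : [],
    improvements: Array.isArray(raw.improvements) ? raw.improvements : [],
    tips: Array.isArray(raw.tips) ? raw.tips : [],
  };
}
```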
