A live multimodal networking practice app powered by the Gemini Live API. Pick a persona, have a real spoken conversation with an AI that can see and hear you in real time, then get professional feedback on your networking skills when you're done.
You choose who you want to practice with:
| Persona | Who they are |
|---|---|
| Alex — Recruiter | Senior recruiter at a top tech firm doing a vibe-check coffee chat |
| Sarah — Mentor | Veteran software architect giving career advice to a junior dev |
| Jamie — Peer | Product designer networking casually, like you're in a real café |
The AI sees your webcam feed and hears your mic in real time via the Gemini Live API. It responds with voice, comments on your body language and setting, and keeps the conversation natural with human-like filler words ("um", "ah"). When you end the chat, a second Gemini call analyzes the full transcript and scores your performance.
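Each persona presumably boils down to a different system instruction handed to the Live session at connect time. A minimal sketch of that shape — the interface and field names below are illustrative, not the repo's actual code:

```ts
// Hypothetical persona shape — names and fields are illustrative.
interface Persona {
  id: "alex" | "sarah" | "jamie";
  label: string;
  systemInstruction: string; // passed to the Live session at connect time
}

const PERSONAS: Persona[] = [
  {
    id: "alex",
    label: "Alex — Recruiter",
    systemInstruction:
      "You are Alex, a senior recruiter at a top tech firm having a casual " +
      "coffee chat. You can see the user's webcam: comment naturally on body " +
      "language and setting, and use human-like fillers ('um', 'ah').",
  },
  // ...Sarah (mentor) and Jamie (peer) follow the same pattern
];
```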
- Live audio + video — real-time bidirectional voice conversation with webcam streaming to the model
- Three distinct personas — each with a different tone, role, and observation style
- Live transcription panel — see both sides of the conversation transcribed as you speak
- Post-chat feedback — overall score (0–100), strengths, areas for improvement, and actionable tips generated from the full transcript (see the sketch after this list)
- Multimodal context — the AI can see you and react to your environment, attire, and engagement
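The feedback feature maps naturally onto Gemini's structured JSON output. A minimal sketch of what that second call could look like with `@google/genai` — `getFeedback` is a hypothetical helper, and the schema below is shaped after the feature list above, not copied from the repo:

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function getFeedback(transcript: string) {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash-preview", // feedback model from the tech stack below
    contents: `Score this networking conversation and give coaching feedback:\n\n${transcript}`,
    config: {
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          overallScore: { type: Type.NUMBER }, // 0–100
          strengths: { type: Type.ARRAY, items: { type: Type.STRING } },
          areasForImprovement: { type: Type.ARRAY, items: { type: Type.STRING } },
          actionableTips: { type: Type.ARRAY, items: { type: Type.STRING } },
        },
      },
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```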
| Layer | Technology |
|---|---|
| AI (live conversation) | Gemini 2.5 Flash Native Audio (`gemini-2.5-flash-native-audio-preview`) |
| AI (feedback analysis) | Gemini Flash (`gemini-3-flash-preview`) |
| SDK | `@google/genai` Live API |
| Audio pipeline | Web Audio API — PCM capture at 16 kHz, playback at 24 kHz |
| Video pipeline | getUserMedia → canvas frame capture → base64 JPEG → Live API |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS |
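The two pipeline rows above come down to a pair of small conversions, which `audioUtils.ts` presumably provides in some form. A sketch under that assumption (function names are illustrative): outgoing mic samples are clamped, quantized to 16-bit PCM, and base64-encoded; incoming 24 kHz PCM is decoded back into an `AudioBuffer`.

```ts
// Float32 samples from the Web Audio graph → 16-bit PCM → base64,
// ready to send as "audio/pcm;rate=16000".
function encodePcm16(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  let binary = "";
  const bytes = new Uint8Array(pcm.buffer);
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Base64 24 kHz PCM from the model → AudioBuffer for playback.
function decodePcm16(base64: string, ctx: AudioContext, rate = 24000): AudioBuffer {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const pcm = new Int16Array(bytes.buffer);
  const buffer = ctx.createBuffer(1, pcm.length, rate); // mono
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < pcm.length; i++) channel[i] = pcm[i] / 0x8000;
  return buffer;
}
```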
- Connect — opens a persistent WebSocket session to the Gemini Live API with audio + video modalities (see the sketch after this list)
- Stream — mic audio is captured as raw 16 kHz PCM and sent continuously; webcam frames are captured once per second as JPEGs and sent alongside
- Respond — the model replies in audio (24 kHz PCM), played back via the Web Audio API with precise scheduling to avoid gaps (sketched under the project structure below)
- Transcribe — both input and output transcriptions arrive as server events and are shown live
- Feedback — on session end, the full transcript is sent to a second Gemini call, which returns a structured JSON analysis
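Wiring the Connect and Stream steps together with the `@google/genai` SDK looks roughly like this. It is a sketch, not the repo's code: `startSession` is a hypothetical helper, `encodePcm16` is the encoder sketched earlier, and message handling is elided.

```ts
import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function startSession(systemInstruction: string, video: HTMLVideoElement) {
  const session = await ai.live.connect({
    model: "gemini-2.5-flash-native-audio-preview",
    config: {
      responseModalities: [Modality.AUDIO],
      systemInstruction,
      inputAudioTranscription: {},  // user-side transcript events
      outputAudioTranscription: {}, // model-side transcript events
    },
    callbacks: {
      onmessage: (msg) => { /* audio chunks + transcriptions arrive here */ },
      onerror: (e) => console.error(e),
      onclose: () => {},
    },
  });

  // Mic: push base64 16 kHz PCM chunks continuously from the capture graph.
  const sendAudio = (samples: Float32Array) =>
    session.sendRealtimeInput({
      media: { data: encodePcm16(samples), mimeType: "audio/pcm;rate=16000" },
    });

  // Webcam: one JPEG frame per second via a scratch canvas.
  const canvas = document.createElement("canvas");
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext("2d")!.drawImage(video, 0, 0);
    const data = canvas.toDataURL("image/jpeg", 0.7).split(",")[1];
    session.sendRealtimeInput({ media: { data, mimeType: "image/jpeg" } });
  }, 1000);

  return { session, sendAudio };
}
```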
- Node.js 18+
- A Gemini API key — get one at aistudio.google.com
```bash
# 1. Clone
git clone https://github.com/AadiXD200/liveapigemeni.git
cd liveapigemeni

# 2. Install dependencies
npm install

# 3. Set your API key
echo "GEMINI_API_KEY=your_key_here" > .env.local

# 4. Run
npm run dev
```

Open http://localhost:5173, allow camera and microphone access, pick a persona, and start talking.
```
App.tsx               — layout and header
LiveStreamDemo.tsx    — raw Live API demo (audio + video streaming)
components/
  CoffeeChat.tsx      — persona chat session with live audio/video
  FeedbackSummary.tsx — post-chat score and analysis UI
services/
  audioUtils.ts       — PCM encode/decode, Web Audio helpers
types.ts              — AppState, PersonaType, FeedbackData interfaces
```
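For the Respond step, gap-free playback is the classic Web Audio scheduling trick (the kind of helper `audioUtils.ts` suggests): keep track of where the last chunk ends and start the next `AudioBufferSourceNode` exactly there. A minimal sketch, using the `decodePcm16` helper from earlier:

```ts
const ctx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

// Schedule each decoded 24 kHz chunk flush against the previous one.
function playChunk(buffer: AudioBuffer) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  // Never schedule in the past; otherwise butt chunks end-to-end.
  nextStartTime = Math.max(nextStartTime, ctx.currentTime);
  source.start(nextStartTime);
  nextStartTime += buffer.duration;
}
```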