tr00x/Kiosk-AI-Agent-Avatar

Kiosk AI Agent Avatar

Real-time AI receptionist with lip-synced avatar for patient self-service


Patients walk up to a kiosk, tap "Start", and talk to Emma — an AI receptionist who sees them, hears them, and handles everything from identity verification to appointment booking. No waiting in line. No front desk bottleneck.





What it does

Voice Conversation

Continuous, natural speech interaction powered by OpenAI Realtime STT and GPT-4o. Fish Speech TTS runs locally for minimal latency. Silero VAD also runs locally, so voice activity detection happens on the device.

Lip-Synced Avatar

Photorealistic avatar by Simli AI — lip-syncs in real-time to speech output. 512x512 @ 30 FPS streamed via WebRTC. Zero plugins, runs in any browser.

Patient Verification

When a patient says "Hi, my name is John Smith, born March 15, 1985", Emma looks up the patient in the Open Dental database, verifies their identity, and unlocks their records. Supports natural date parsing, up to 3 retry attempts, and HIPAA audit logging.

Account & Appointments

Check balance (net of insurance estimates), view upcoming visits with provider/procedure/room details, book new appointments (auto-generated confirmation numbers), send SMS reminders.

Manual Check-In

Staff-accessible sidebar for non-voice check-in. Search by last name + DOB, view patient card, 30-second auto-reset for kiosk security.
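
Both the voice flow and the manual search take a date of birth, so free-form input has to be normalized before it can be compared with the database. The following is a minimal sketch of that step, not the repository's parser: `parse_dob` and the list of accepted layouts are assumptions made for illustration.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical set of accepted layouts; the real parser may accept more or fewer.
_DOB_FORMATS = (
    "%B %d, %Y",  # March 15, 1985
    "%B %d %Y",   # March 15 1985
    "%b %d, %Y",  # Mar 15, 1985
    "%b %d %Y",   # Mar 15 1985
    "%m/%d/%Y",   # 03/15/1985
    "%Y-%m-%d",   # 1985-03-15
)


def parse_dob(text: str) -> Optional[date]:
    """Return the date for the first layout that matches, or None."""
    cleaned = " ".join(text.strip().split())  # collapse stray whitespace
    for fmt in _DOB_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
        # A birth date in the future cannot be valid.
        if parsed <= date.today():
            return parsed
    return None
```

Returning `None` instead of raising lets the caller count the failure toward the retry limit.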

Session Control

Single-session enforcement, 60-second silence timeout, graceful WebRTC teardown. Multilingual UI — English, Spanish, Russian.
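
Single-session enforcement and the silence timeout can be sketched with plain `asyncio`. This is an illustration under assumptions: `SessionGate`, `run_until_silent`, and the string events are hypothetical names, and only the 60-second value comes from this README.

```python
import asyncio

SILENCE_TIMEOUT_S = 60  # value stated in this README


class SessionGate:
    """Allow one active session at a time (hypothetical helper)."""

    def __init__(self) -> None:
        self._active = False

    def try_start(self) -> bool:
        # Refuse a second session while one is already running.
        if self._active:
            return False
        self._active = True
        return True

    def end(self) -> None:
        self._active = False


async def run_until_silent(
    speech_events: asyncio.Queue, timeout_s: float = SILENCE_TIMEOUT_S
) -> str:
    """Consume events until none arrives within timeout_s; return the stop reason."""
    while True:
        try:
            event = await asyncio.wait_for(speech_events.get(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return "silence_timeout"
        if event == "stop":
            return "user_stop"
        # Any other event counts as speech, so the timer restarts on the next loop.
```

The returned reason could be forwarded as the `reason` field of a `call_ended` event before the WebRTC connection is torn down.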




Architecture

  Browser (Kiosk)                          Backend (Python / aiohttp)
 ┌────────────────┐                       ┌──────────────────────────────────┐
 │                │   WebRTC audio/video  │                                  │
 │  Avatar Video  │◄─────────────────────►│  Pipecat Pipeline                │
 │  Transcript    │                       │                                  │
 │  Info Panels   │   WebSocket events    │  Mic ► Silero VAD ► OpenAI STT   │
 │  Manual Sidebar│◄─────────────────────►│       ► GPT-4o (+ functions)     │
 │                │                       │       ► Fish Speech TTS          │
 └────────────────┘                       │       ► Simli AI Avatar          │
                                          │       ► WebRTC Out               │
                                          │              │                   │
                                          │        ┌─────▼──────┐            │
                                          │        │Open Dental │            │
                                          │        │  MySQL DB  │            │
                                          │        └────────────┘            │
                                          └──────────────────────────────────┘

Conversation Flow

greeting ► verify_dob ► main_menu ──► check_balance ──► main_menu / goodbye
                │              ├────► view_appointments ► main_menu / goodbye
                │              ├────► start_booking ────► main_menu / goodbye
                │              └────► send_reminder ────► main_menu / goodbye
                │
                └──► not_found (retry up to 3x) ► see_receptionist

Each node carries its own system prompt (Emma's persona), task instructions, LLM function schemas, and pre/post actions.
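
The flow diagram can be read as a small state machine. The sketch below transcribes the transitions into a dictionary and enforces the retry limit. It is plain Python rather than the Pipecat node definitions in `flow.py`, and the `Conversation` class is hypothetical. It also assumes that a retry returns to `verify_dob`, which the diagram does not show.

```python
from dataclasses import dataclass

MAX_VERIFY_ATTEMPTS = 3  # "retry up to 3x" in the diagram

# Allowed transitions, transcribed from the diagram. The not_found -> verify_dob
# edge is an assumption about how a retry re-enters the flow.
TRANSITIONS = {
    "greeting": {"verify_dob"},
    "verify_dob": {"main_menu", "not_found"},
    "not_found": {"verify_dob", "see_receptionist"},
    "main_menu": {"check_balance", "view_appointments", "start_booking", "send_reminder"},
    "check_balance": {"main_menu", "goodbye"},
    "view_appointments": {"main_menu", "goodbye"},
    "start_booking": {"main_menu", "goodbye"},
    "send_reminder": {"main_menu", "goodbye"},
}


@dataclass
class Conversation:
    node: str = "greeting"
    failed_lookups: int = 0

    def go(self, target: str) -> None:
        """Move to target, rejecting edges the diagram does not contain."""
        if target not in TRANSITIONS.get(self.node, set()):
            raise ValueError(f"illegal transition {self.node} -> {target}")
        self.node = target

    def lookup_failed(self) -> None:
        """Record a failed patient lookup, then retry or hand off to staff."""
        self.failed_lookups += 1
        self.go("not_found")
        if self.failed_lookups >= MAX_VERIFY_ATTEMPTS:
            self.go("see_receptionist")
        else:
            self.go("verify_dob")
```

Keeping the edges in one table makes it easy to check that no node can reach patient data without passing through `verify_dob`.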




Tech Stack

| Layer | Technology | Why |
|---|---|---|
| AI Framework | Pipecat | Modular real-time pipeline for voice AI agents |
| LLM | OpenAI GPT-4o | Function calling for structured DB queries |
| Speech-to-Text | OpenAI Realtime STT | gpt-4o-transcribe: fast, multilingual |
| Text-to-Speech | Fish Speech v1.4 | Runs locally: zero network latency, voice cloning support |
| Avatar | Simli AI | Real-time lip-sync from audio stream |
| VAD | Silero VAD | On-device voice detection, no cloud dependency |
| Transport | WebRTC (aiortc) | Sub-second peer-to-peer audio/video |
| Server | aiohttp | Async Python, handles WebRTC signaling + WebSocket events |
| Database | MySQL (Open Dental) | Direct queries against dental practice management system |
| Frontend | Vanilla HTML/CSS/JS | Single-page kiosk app, dark glassmorphism theme |



UI & Design

| Element | Details |
|---|---|
| Layout | Full-screen kiosk, designed for touch displays |
| Theme | Dark (#0a0a0a) with glassmorphism: backdrop blur, semi-transparent panels |
| Accent | Teal #288d89 with green/red status indicators |
| Transcript | User speech in green italic, bot in white; auto-fades after 5s |
| Info panels | Slide in from top-right (balance, appointments, confirmations); auto-hide after 8s |
| Status dot | Green = listening, Blue = processing, Amber = error |
| Controls | "Tap to Start" button, red stop circle, language toggle (EN/ES/RU) |



WebSocket Events

Real-time UI updates via /events:

| Event | Payload | Description |
|---|---|---|
| call_started | | Session began |
| call_ended | { reason } | Session terminated |
| user_transcript | { text } | Live user speech |
| bot_transcript | { text } | Bot response text |
| patient_verified | { name, id } | Identity confirmed |
| balance | { amount, insurance } | Account balance |
| appointments | [{ date, time, provider, procedure }] | Upcoming visits |
| booking_confirmed | { confirmation_number } | New booking created |
| error | { message } | Error notification |
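
To make the payload shapes concrete, here is a hedged sketch of a server-side encoder that checks the required fields before serializing. The field names come from the list of events above. The `{"type", "data"}` envelope and the `encode_event` helper are assumptions, since this README lists payloads only and not the wire format.

```python
import json
from typing import Any

# Required payload keys per event, copied from the events listed above.
EVENT_FIELDS = {
    "call_started": set(),
    "call_ended": {"reason"},
    "user_transcript": {"text"},
    "bot_transcript": {"text"},
    "patient_verified": {"name", "id"},
    "balance": {"amount", "insurance"},
    "booking_confirmed": {"confirmation_number"},
    "error": {"message"},
}
# "appointments" carries a list of objects, so it is validated per item.
APPOINTMENT_FIELDS = {"date", "time", "provider", "procedure"}


def encode_event(event: str, payload: Any = None) -> str:
    """Validate the payload for an event and return it as a JSON string."""
    if event == "appointments":
        data = list(payload or [])
        for item in data:
            missing = APPOINTMENT_FIELDS - set(item)
            if missing:
                raise ValueError(f"appointment missing {sorted(missing)}")
    elif event in EVENT_FIELDS:
        data = dict(payload or {})
        missing = EVENT_FIELDS[event] - set(data)
        if missing:
            raise ValueError(f"{event} missing {sorted(missing)}")
    else:
        raise ValueError(f"unknown event {event!r}")
    # Assumed envelope: the actual message layout may differ.
    return json.dumps({"type": event, "data": data})
```

Validating on the way out keeps a malformed tool result from reaching the kiosk UI as a half-rendered panel.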



API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | / | Kiosk UI |
| POST | /api/offer | WebRTC SDP exchange |
| POST | /api/ice-candidate | ICE candidate handling |
| GET | /events | WebSocket for real-time events |
| GET | /health | Health check |



Setup

Prerequisites

  • Python 3.12+
  • MySQL with Open Dental database
  • Fish Speech v1.4 server
  • API keys: OpenAI, Simli AI

Environment

OPENAI_API_KEY=sk-...
SIMLI_API_KEY=...
SIMLI_FACE_ID=...

DB_HOST=...
DB_PORT=3306
DB_USER=...
DB_PASSWORD=...
DB_NAME=opendental

# Optional
FISH_SPEECH_URL=http://localhost:8090
FISH_SPEECH_REF=<speaker-reference-id>
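
A startup check that fails fast on missing settings can save a confusing mid-call error. The sketch below is one way to do it, not the project's loader: `load_config` is a hypothetical helper, and treating `DB_PORT`, `DB_NAME`, and `FISH_SPEECH_URL` as defaults is an assumption based on the sample values above.

```python
import os
from typing import Mapping, Optional

REQUIRED = (
    "OPENAI_API_KEY", "SIMLI_API_KEY", "SIMLI_FACE_ID",
    "DB_HOST", "DB_USER", "DB_PASSWORD",
)
# Assumed defaults, taken from the sample values in this README.
DEFAULTS = {
    "DB_PORT": "3306",
    "DB_NAME": "opendental",
    "FISH_SPEECH_URL": "http://localhost:8090",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read settings from env (os.environ by default) and validate them."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    config: dict = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name) or default
    config["DB_PORT"] = int(config["DB_PORT"])
    config["FISH_SPEECH_REF"] = env.get("FISH_SPEECH_REF")  # optional, no default
    return config
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test.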

Install & Run

# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

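# Terminal 1: TTS server (tools/api.py is not part of this repo, so this presumably runs from a Fish Speech v1.4 checkout; --device mps selects Apple's Metal backend, use cuda or cpu on other hardware)
python tools/api.py --listen 127.0.0.1:8090 --device mps --mode tts

# Terminal 2: Kiosk
python main.py
# → http://localhost:8080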



Project Structure

├── main.py            Web server, sessions, WebRTC signaling
├── agent.py           Pipecat pipeline (STT → LLM → TTS → Avatar)
├── flow.py            Conversation nodes (greeting → verify → menu)
├── tools.py           DB tools (patient lookup, balance, appointments, booking)
├── fish_tts.py        Custom TTS service for Fish Speech v1.4
├── db.py              MySQL connection pool with retry
├── index.html         Kiosk frontend (single-page dark theme app)
├── requirements.txt   Python dependencies
└── .env               API keys & config (not committed)
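
The "connection pool with retry" in `db.py` is not shown here. As a rough illustration of the retry half, the sketch below wraps any callable in exponential backoff. `with_retry` and its parameters are hypothetical, and `ConnectionError` stands in for whatever exception the MySQL driver actually raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the given exceptions with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

In an async server a blocking `time.sleep` would stall the event loop, so a real implementation would more likely await `asyncio.sleep` or run queries in a worker thread.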



Security & Compliance

| Measure | Implementation |
|---|---|
| HIPAA audit trail | Every data access logged to kiosk_audit_log with timestamp, action, patient ID |
| Identity verification | Name + DOB required before any data is shown |
| Phone masking | Displayed as +1***XXXX in UI and logs |
| Privacy-safe search | Returns error if multiple patients match, which prevents data leakage |
| Session isolation | One active conversation at a time |
| Auto-timeout | 60 seconds of silence = session ends |
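
Two of these measures are simple enough to sketch. The code below is illustrative only: `mask_phone`, `pick_unique`, and `AmbiguousMatch` are hypothetical names, and it assumes that `XXXX` in the mask stands for the last four digits of the number.

```python
import re
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def mask_phone(phone: str) -> str:
    """Mask a phone number as +1***XXXX, keeping only the last four digits."""
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, parentheses
    if len(digits) < 4:
        return "+1***"  # too short to show any digits safely
    return f"+1***{digits[-4:]}"


class AmbiguousMatch(LookupError):
    """Raised when a search matches more than one patient."""


def pick_unique(matches: Sequence[T]) -> Optional[T]:
    """Return the single match, None for no match, or raise if ambiguous."""
    if len(matches) > 1:
        # Never list candidates: doing so would reveal other patients' data.
        raise AmbiguousMatch("multiple patients match; refer to the front desk")
    return matches[0] if matches else None
```

Raising on ambiguity rather than returning the first row means a common name can never unlock the wrong record.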



Avatar Idle Video

The idle loop video (15 MB) is not included in this repo. Download and place in the project root:

Download idle video (MP4)

Simli AI provides 50 free minutes/month — more than enough for testing.




Hire Me

I built this entire system solo — real-time voice pipeline, avatar integration, database layer, HIPAA compliance, and kiosk UI.

If your company needs production-grade conversational AI but lacks the engineering team to build it — I'm your guy.

Contract work · Consulting · Full builds


Email   Telegram



All rights reserved.
