Kiosk AI Agent Avatar

Real-time AI receptionist with lip-synced avatar for patient self-service

Patients walk up to a kiosk, tap "Start", and talk to Emma — an AI receptionist who sees them, hears them, and handles everything from identity verification to appointment booking. No waiting in line. No front desk bottleneck.

Features · Architecture · Tech Stack · Setup · Contact

What it does

Voice Conversation

Continuous, natural speech interaction powered by OpenAI Realtime STT and GPT-4o. Fish Speech TTS runs locally to keep latency low, and Silero VAD runs on the device, so voice activity detection needs no cloud call.

Lip-Synced Avatar

Photorealistic avatar by Simli AI — lip-syncs in real-time to speech output. 512x512 @ 30 FPS streamed via WebRTC. Zero plugins, runs in any browser.

Patient Verification

A patient says "Hi, my name is John Smith, born March 15, 1985", and Emma looks up the patient in the Open Dental database, verifies identity, and unlocks their records. Natural date parsing, 3 retry attempts, HIPAA audit logging.

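The date parsing and retry limit described under Patient Verification could look roughly like the sketch below. It is an illustration only: the accepted formats, the function names `parse_dob` and `verify_with_retries`, and the idea of parsing in plain Python (rather than letting the LLM normalize the date) are assumptions, not the project's actual code.

```python
from datetime import date, datetime
from typing import Optional

# Assumed formats for illustration; the real kiosk may accept others.
_DOB_FORMATS = ("%B %d, %Y", "%B %d %Y", "%b %d, %Y", "%m/%d/%Y", "%Y-%m-%d")
MAX_ATTEMPTS = 3  # mirrors the "3 retry attempts" in the feature description


def parse_dob(spoken: str) -> Optional[date]:
    """Return a date if the text matches a known format, else None."""
    text = spoken.strip().rstrip(".")
    for fmt in _DOB_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def verify_with_retries(utterances: list[str]) -> Optional[date]:
    """Consider at most MAX_ATTEMPTS utterances; None means hand off to staff."""
    for spoken in utterances[:MAX_ATTEMPTS]:
        dob = parse_dob(spoken)
        if dob is not None:
            return dob
    return None
```

Returning `None` instead of raising keeps the caller free to decide when to route the patient to the front desk.
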
Account & Appointments

Check balance (net of insurance estimates), view upcoming visits with provider/procedure/room details, book new appointments (auto-generated confirmation numbers), send SMS reminders.

Manual Check-In

Staff-accessible sidebar for non-voice check-in. Search by last name + DOB, view patient card, 30-second auto-reset for kiosk security.

Session Control

Single-session enforcement, 60-second silence timeout, graceful WebRTC teardown. Multilingual UI — English, Spanish, Russian.
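
Single-session enforcement and the silence timeout can be sketched as a small gate object. This is a minimal sketch under assumptions: the class name `SessionGate`, its methods, and the polling approach are hypothetical and not taken from `main.py`.

```python
import asyncio
import time


class SessionGate:
    """Allows one active session and ends it once silence exceeds a limit."""

    def __init__(self, silence_timeout: float = 60.0) -> None:
        self.silence_timeout = silence_timeout
        self._active = False
        self._last_speech = 0.0

    def try_start(self) -> bool:
        """Return False if a session is already running."""
        if self._active:
            return False
        self._active = True
        self._last_speech = time.monotonic()
        return True

    def on_speech(self) -> None:
        """Call whenever the VAD reports speech to reset the silence clock."""
        self._last_speech = time.monotonic()

    def end(self) -> None:
        self._active = False

    async def watch_silence(self, poll: float = 0.5) -> str:
        """Poll until the session ends; return the reason it stopped."""
        while self._active:
            if time.monotonic() - self._last_speech >= self.silence_timeout:
                self.end()
                return "silence_timeout"
            await asyncio.sleep(poll)
        return "ended"
```

The returned reason string could feed the `reason` field of a `call_ended` event, though that wiring is also an assumption.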

Architecture

  Browser (Kiosk)                          Backend (Python / aiohttp)
 ┌────────────────┐                       ┌──────────────────────────────────┐
 │                │   WebRTC audio/video  │                                  │
 │  Avatar Video  │◄─────────────────────►│  Pipecat Pipeline                │
 │  Transcript    │                       │                                  │
 │  Info Panels   │   WebSocket events    │  Mic ► Silero VAD ► OpenAI STT   │
 │  Manual Sidebar│◄─────────────────────►│       ► GPT-4o (+ functions)     │
 │                │                       │       ► Fish Speech TTS          │
 └────────────────┘                       │       ► Simli AI Avatar          │
                                          │       ► WebRTC Out               │
                                          │              │                   │
                                          │        ┌─────▼──────┐            │
                                          │        │Open Dental │            │
                                          │        │  MySQL DB  │            │
                                          │        └────────────┘            │
                                          └──────────────────────────────────┘

Conversation Flow

greeting ► verify_dob ► main_menu ──► check_balance ──► main_menu / goodbye
                │              ├────► view_appointments ► main_menu / goodbye
                │              ├────► start_booking ────► main_menu / goodbye
                │              └────► send_reminder ────► main_menu / goodbye
                │
                └──► not_found (retry up to 3x) ► see_receptionist

Each node carries its own system prompt (Emma's persona), task instructions, LLM function schemas, and pre/post actions.
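
As a rough illustration of the diagram, the node graph can be modeled as a transition table plus a retry counter. Node names come from the diagram; the event names (`name_given`, `verified`, `no_match`, and so on) and the `Flow` class are hypothetical, and the real `flow.py` attaches prompts and function schemas to each node, which this sketch omits.

```python
from dataclasses import dataclass

MAX_LOOKUP_ATTEMPTS = 3  # "retry up to 3x" in the diagram

_MENU_ACTIONS = {
    "balance": "check_balance",
    "appointments": "view_appointments",
    "book": "start_booking",
    "reminder": "send_reminder",
}

# (current node, event) -> next node; event names are assumptions.
TRANSITIONS = {
    ("greeting", "name_given"): "verify_dob",
    ("verify_dob", "verified"): "main_menu",
    ("not_found", "retry"): "verify_dob",
}
for event, node in _MENU_ACTIONS.items():
    TRANSITIONS[("main_menu", event)] = node
    TRANSITIONS[(node, "more")] = "main_menu"
    TRANSITIONS[(node, "done")] = "goodbye"


@dataclass
class Flow:
    node: str = "greeting"
    failed_lookups: int = 0

    def advance(self, event: str) -> str:
        """Move to the next node; unknown events leave the node unchanged."""
        if self.node == "verify_dob" and event == "no_match":
            self.failed_lookups += 1
            exhausted = self.failed_lookups >= MAX_LOOKUP_ATTEMPTS
            self.node = "see_receptionist" if exhausted else "not_found"
        else:
            self.node = TRANSITIONS.get((self.node, event), self.node)
        return self.node
```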

Tech Stack

Layer	Technology	Why
AI Framework	Pipecat	Modular real-time pipeline for voice AI agents
LLM	OpenAI GPT-4o	Function calling for structured DB queries
Speech-to-Text	OpenAI Realtime STT	`gpt-4o-transcribe` — fast, multilingual
Text-to-Speech	Fish Speech v1.4	Runs locally with no network round trip; voice cloning support
Avatar	Simli AI	Real-time lip-sync from audio stream
VAD	Silero VAD	On-device voice detection, no cloud dependency
Transport	WebRTC (aiortc)	Sub-second peer-to-peer audio/video
Server	aiohttp	Async Python, handles WebRTC signaling + WebSocket events
Database	MySQL (Open Dental)	Direct queries against dental practice management system
Frontend	Vanilla HTML/CSS/JS	Single-page kiosk app, dark glassmorphism theme

UI & Design

Element	Details
Layout	Full-screen kiosk, designed for touch displays
Theme	Dark (`#0a0a0a`) with glassmorphism — backdrop blur, semi-transparent panels
Accent	Teal `#288d89` with green/red status indicators
Transcript	User speech in green italic, bot in white — auto-fades after 5s
Info panels	Slide-in from top-right (balance, appointments, confirmations) — auto-hide after 8s
Status dot	Green = listening, Blue = processing, Amber = error
Controls	"Tap to Start" button, red stop circle, language toggle (EN/ES/RU)

WebSocket Events

Real-time UI updates via /events:

Event	Payload	Description
`call_started`	—	Session began
`call_ended`	`{ reason }`	Session terminated
`user_transcript`	`{ text }`	Live user speech
`bot_transcript`	`{ text }`	Bot response text
`patient_verified`	`{ name, id }`	Identity confirmed
`balance`	`{ amount, insurance }`	Account balance
`appointments`	`[{ date, time, provider, procedure }]`	Upcoming visits
`booking_confirmed`	`{ confirmation_number }`	New booking created
`error`	`{ message }`	Error notification
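
A sketch of how the server side might serialize these events before sending them over the socket. The event names come from the table; the envelope shape (`type` and `data` keys) and the `encode_event` helper are assumptions, since the wire format is not documented here, and the amounts in the usage example are made up.

```python
import json
from typing import Any, Optional

KNOWN_EVENTS = {
    "call_started", "call_ended", "user_transcript", "bot_transcript",
    "patient_verified", "balance", "appointments", "booking_confirmed", "error",
}


def encode_event(name: str, payload: Optional[Any] = None) -> str:
    """Serialize one event as JSON; the envelope keys are hypothetical."""
    if name not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {name}")
    return json.dumps({"type": name, "data": payload})


# Illustrative values only.
message = encode_event("balance", {"amount": 120.5, "insurance": 80.0})
```

Rejecting unknown names early keeps a typo on the server from silently producing an event the frontend never handles.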

API Endpoints

Method	Path	Description
`GET`	`/`	Kiosk UI
`POST`	`/api/offer`	WebRTC SDP exchange
`POST`	`/api/ice-candidate`	ICE candidate handling
`GET`	`/events`	WebSocket for real-time events
`GET`	`/health`	Health check

Setup

Prerequisites

Python 3.12+
MySQL with Open Dental database
Fish Speech v1.4 server
API keys: OpenAI, Simli AI

Environment

OPENAI_API_KEY=sk-...
SIMLI_API_KEY=...
SIMLI_FACE_ID=...

DB_HOST=...
DB_PORT=3306
DB_USER=...
DB_PASSWORD=...
DB_NAME=opendental

# Optional
FISH_SPEECH_URL=http://localhost:8090
FISH_SPEECH_REF=<speaker-reference-id>
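
One way to read these variables at startup and fail fast on a missing key is sketched below. The `load_settings` helper is hypothetical, and treating `3306`, `opendental`, and `http://localhost:8090` as fallbacks is an assumption based on the values shown above.

```python
import os
from typing import Mapping

REQUIRED = (
    "OPENAI_API_KEY", "SIMLI_API_KEY", "SIMLI_FACE_ID",
    "DB_HOST", "DB_USER", "DB_PASSWORD",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect configuration, raising if a required variable is unset."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED}
    # Assumed defaults, taken from the example values in this section.
    settings["DB_PORT"] = int(env.get("DB_PORT", "3306"))
    settings["DB_NAME"] = env.get("DB_NAME", "opendental")
    settings["FISH_SPEECH_URL"] = env.get("FISH_SPEECH_URL", "http://localhost:8090")
    settings["FISH_SPEECH_REF"] = env.get("FISH_SPEECH_REF")  # optional
    return settings
```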

Install & Run

# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Terminal 1: TTS server (run from your Fish Speech checkout; --device mps targets Apple Silicon)
python tools/api.py --listen 127.0.0.1:8090 --device mps --mode tts

# Terminal 2 — Kiosk
python main.py
# → http://localhost:8080

Project Structure

├── main.py            Web server, sessions, WebRTC signaling
├── agent.py           Pipecat pipeline (STT → LLM → TTS → Avatar)
├── flow.py            Conversation nodes (greeting → verify → menu)
├── tools.py           DB tools (patient lookup, balance, appointments, booking)
├── fish_tts.py        Custom TTS service for Fish Speech v1.4
├── db.py              MySQL connection pool with retry
├── index.html         Kiosk frontend (single-page dark theme app)
├── requirements.txt   Python dependencies
└── .env               API keys & config (not committed)

Security & Compliance

Measure	Implementation
HIPAA audit trail	Every data access logged to `kiosk_audit_log` with timestamp, action, patient ID
Identity verification	Name + DOB required before any data is shown
Phone masking	Displayed as `+1***XXXX` in UI and logs
Privacy-safe search	Returns error if multiple patients match — prevents data leakage
Session isolation	One active conversation at a time
Auto-timeout	60 seconds of silence = session ends
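
Two of the measures above, phone masking and the privacy-safe search, are simple enough to sketch. The helper names are hypothetical, and reading `XXXX` as the last four digits of the number is an assumption.

```python
import re
from typing import Optional


class AmbiguousMatch(Exception):
    """Raised when a lookup matches more than one patient."""


def mask_phone(phone: str) -> str:
    """Render a number as +1***XXXX, keeping only the last four digits."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) < 4:
        return "+1***"  # too short to show a meaningful suffix
    return "+1***" + digits[-4:]


def pick_unique(matches: list[dict]) -> Optional[dict]:
    """Return the single match, None if there is none, and refuse to guess otherwise."""
    if len(matches) > 1:
        raise AmbiguousMatch("multiple patients match; refer to the front desk")
    return matches[0] if matches else None
```

Raising on multiple matches, rather than returning the first row, is what keeps one patient's record from being shown to another person with the same name and birth date.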

Avatar Idle Video

The idle loop video (15 MB) is not included in this repo. Download it and place it in the project root:

Download idle video (MP4)

Simli AI provides 50 free minutes/month — more than enough for testing.

Hire Me

I built this entire system solo — real-time voice pipeline, avatar integration, database layer, HIPAA compliance, and kiosk UI.

If your company needs production-grade conversational AI but lacks the engineering team to build it — I'm your guy.

Contract work · Consulting · Full builds
