An AI-powered multimodal assistant that understands natural language, generates intelligent responses, produces realistic 3D human motion, and speaks, all in one unified pipeline.
Virtual Verbal Assistant is a multi-service research platform that combines:
- AgenticRAG: A double-RAG conversational AI powered by Google Gemini with persistent memory, document understanding, and clinical knowledge retrieval.
- Text-to-Motion (DART): A diffusion-based motion synthesis engine that generates realistic 3D human animations from natural language descriptions. (ICLR 2025 Spotlight)
- SpeechLLm: Voice I/O with emotion-aware dialogue using local Small Language Models (Whisper + Phi-3/Mistral via Ollama).
- ECA UI 2.0: A modern chat interface with real-time motion visualization, TTS audio playback, and exercise cards.
Example: Ask "My neck is stiff from coding all day" and get clinically safe exercise advice, a corresponding 3D motion animation, and a spoken audio response.

                             ┌──────────────────────┐
                             │   ECA Official UI    │
                             │     (Port 3000)      │
                             └──────────┬───────────┘
                                        │
                             ┌──────────▼───────────┐
                             │   Unified Gateway    │
                             │     (Port 8000)      │
                             │   /process_query     │
                             │   /tasks/{id}        │
                             │   /download/{file}   │
                             └──────────┬───────────┘
                                        │
                   ┌────────────────────┼───────────────────┐
                   │                    │                   │
        ┌──────────▼──────────┐  ┌──────▼──────┐  ┌─────────▼─────────┐
        │ AgenticRAG          │  │ DART        │  │ SpeechLLm         │
        │ Orchestrator        │  │ (WSL)       │  │ (Optional)        │
        │ (Port 8080)         │  │ (Port 5001) │  │ (Port 5000)       │
        │                     │  │             │  │                   │
        │ • Gemini LLM        │  │ • Diffusion │  │ • Whisper STT     │
        │ • ChromaDB Memory   │  │ • CLIP      │  │ • Emotion Det.    │
        │ • Document RAG      │  │ • SMPL-X    │  │ • TTS Output      │
        │ • Clinical Safety   │  │ • 30fps     │  │ • Ollama SLM      │
        └─────────────────────┘  └─────────────┘  └───────────────────┘

| Requirement | Notes |
|---|---|
| Windows + WSL2 | DART runs inside WSL with CUDA |
| Conda | Environments: `firstconda` (Windows), `DART` (WSL), `tts` (SpeechLLm) |
| Redis | Via Docker or system install |
| ffmpeg | In PATH for motion video rendering |
| GEMINI_API_KEY | Set in agenticRAG/agentic_rag_gemini/.env |

    conda activate firstconda
    python run_stack.py

This starts all services automatically: Redis, Celery, AgenticRAG API, Orchestrator, DART (WSL), SpeechLLm, ECA UI, and Streamlit UI.

| Interface | URL | Description |
|---|---|---|
| ECA Official UI | localhost:3000 | Default frontend: chat + motion viewer |
| Streamlit UI | localhost:8501 | Alternate chat interface |
| API Gateway | localhost:8000 | Unified REST API for all clients |

    curl http://localhost:8000/health

All browser and client traffic routes through the unified gateway on port 8000.

| Endpoint | Method | Purpose |
|---|---|---|
| `/process_query` | POST | Submit an async query task |
| `/tasks/{task_id}` | GET | Poll task progress and results |
| `/download/{file}` | GET | Proxy DART motion artifacts |
| `/history/{user_id}` | GET | Retrieve chat history |
| `/health` | GET | Per-service health status |
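
The submit-then-poll flow implied by `/process_query` and `/tasks/{task_id}` could look roughly like the sketch below. It is not the project's client: the request body fields (`query`, `user_id`) and the presence of `task_id` in the submit response are assumptions, since this README does not document the request payload. The `status`, `result`, and `error` fields follow the task document shown in this README.

```python
import json
import time
import urllib.request

GATEWAY = "http://localhost:8000"  # unified gateway address from this README


def http_json(method, url, payload=None, timeout=10.0):
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_query(text, user_id, send=http_json):
    """Submit an async query and return its task id."""
    # ASSUMPTION: "query" and "user_id" are illustrative field names;
    # the real /process_query payload is not documented here.
    reply = send("POST", f"{GATEWAY}/process_query",
                 {"query": text, "user_id": user_id})
    return reply["task_id"]  # ASSUMPTION: the submit response carries task_id


def wait_for_task(task_id, send=http_json, interval=1.0, max_polls=120):
    """Poll /tasks/{task_id} until the task completes or fails."""
    for _ in range(max_polls):
        task = send("GET", f"{GATEWAY}/tasks/{task_id}")
        if task["status"] == "completed":
            return task["result"]
        if task["status"] == "failed":
            raise RuntimeError(task.get("error") or "task failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

With the stack running, `wait_for_task(submit_query("My neck is stiff from coding all day", "demo-user"))` would return the `result` object once the task reaches `completed`. The `send` parameter exists only so the transport can be swapped out, for example in tests.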
The task document returned when polling has the following shape:

    {
      "task_id": "abc123",
      "status": "processing | completed | failed",
      "progress_stage": "queued → rag_query → motion_generation → tts → completed",
      "result": {
        "text_answer": "...",
        "exercises": [{"name": "Chin tuck"}, {"name": "Shoulder roll"}],
        "motion": {"motion_file_url": "/download/motion_abc123.npz", "fps": 30},
        "audio_url": "/static/tts_abc123.wav"
      },
      "error": null
    }

Design principle: Internal service ports (5001, 8080) are never exposed to clients. Motion URLs from DART are rewritten to gateway-safe paths through port 8000.

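
One way to picture the URL rewriting is the sketch below. It is illustrative only and not the gateway's actual code: the helper name `to_gateway_path` and the example internal URL are hypothetical. It keeps only the file name of the internal URL and re-roots it under `/download/`, so neither the internal host nor the port reaches the client.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def to_gateway_path(internal_url):
    """Map an internal artifact URL to a gateway-relative /download/ path.

    Hypothetical helper: the real gateway may rewrite URLs differently.
    """
    # Keep only the final path component; host, port, and directories are dropped.
    name = PurePosixPath(urlsplit(internal_url).path).name
    if not name:
        raise ValueError(f"no file name in {internal_url!r}")
    return f"/download/{name}"


# ASSUMPTION: the internal URL layout below is invented for illustration.
print(to_gateway_path("http://localhost:5001/outputs/motion_abc123.npz"))
```

Dropping everything except the file name also means a client cannot use the rewritten path to reach arbitrary locations on the internal service.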
For remote demonstrations, tunnel the UI and API:

    # Terminal 1: UI tunnel
    ngrok http 3000

    # Terminal 2: API tunnel
    ngrok http 8000

Then open:

    https://<ui-tunnel>.ngrok-free.app/?api_base=https://<api-tunnel>.ngrok-free.app

Do not expose internal ports 5001 or 8080 directly.

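
To avoid hand-editing the demo link, a small helper can assemble it from the two tunnel URLs. This is a convenience sketch, not part of the project: the function name `build_demo_url` is hypothetical, and the only detail taken from this README is the `api_base` query parameter. It also refuses URLs that point at the internal ports.

```python
from urllib.parse import urlencode, urlsplit

INTERNAL_PORTS = {5001, 8080}  # ports this README says must stay private


def build_demo_url(ui_tunnel, api_tunnel):
    """Return the UI URL with the API tunnel passed as ?api_base=..."""
    for label, url in (("UI", ui_tunnel), ("API", api_tunnel)):
        parts = urlsplit(url)
        if parts.scheme != "https" or not parts.hostname:
            raise ValueError(f"{label} tunnel must be an https URL: {url!r}")
        if parts.port in INTERNAL_PORTS:
            raise ValueError(f"{label} tunnel exposes an internal port: {url!r}")
    # safe=":/" keeps the nested URL readable, matching the form shown above.
    query = urlencode({"api_base": api_tunnel.rstrip("/")}, safe=":/")
    return f"{ui_tunnel.rstrip('/')}/?{query}"
```

For example, `build_demo_url("https://ui-demo.ngrok-free.app", "https://api-demo.ngrok-free.app")` yields a link in the same form as the one above; the tunnel host names here are placeholders.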

    Virtual-Verbal-Assistant/
    ├── run_stack.py                      # One-command stack launcher
    ├── README.md                         # ← You are here
    ├── README_DEV.md                     # Full developer & architecture docs
    ├── QUICKSTART.md                     # Quick reference startup guide
    │
    ├── agenticRAG/agentic_rag_gemini/    # AgenticRAG + Orchestrator
    │   ├── api_server.py                 # REST API (Port 8000)
    │   ├── main_api.py                   # Orchestrator (Port 8080)
    │   ├── agents/                       # Query routing (Gemini)
    │   ├── retrieval/                    # RAG pipeline
    │   ├── memory/                       # ChromaDB vector store
    │   └── config/config.yaml            # All tuneable settings
    │
    ├── text-to-motion/DART/              # DART motion synthesis (WSL)
    │   ├── api_server.py                 # REST API (Port 5001)
    │   ├── mld/                          # Motion Latent Diffusion
    │   ├── diffusion/                    # Gaussian diffusion
    │   ├── model/                        # Denoiser + VAE
    │   └── data/outputs/                 # Generated .npz files
    │
    ├── SpeechLLm/                        # Voice I/O + Emotion (Port 5000)
    │   ├── api_server.py
    │   └── src/                          # STT, LLM, TTS, emotion stages
    │
    ├── ECA_UI/                           # Official Web Interface (Port 3000)
    │   ├── index.html                    # Main dashboard
    │   └── api.js                        # API client
    │
    └── test-ui/                          # Developer test interface
        └── index.html

| Document | Description |
|---|---|
| QUICKSTART.md | Prerequisites, ports, one-command startup |
| README_DEV.md | Full architecture, subsystem deep-dives, troubleshooting |
| AgenticRAG Developers | Internal AgenticRAG architecture |
| DART Architecture | Motion synthesis internals & integration |
| SpeechLLm | Voice pipeline documentation |

    # Check if all ports are free
    python check_ports.py --ports 3000 5001 6379 8000 8080

    # Verify ffmpeg is available
    Get-Command ffmpeg

    # Check ChromaDB container
    docker compose ps

    # Kill a stuck port
    Get-NetTCPConnection -LocalPort 8000 | Select-Object OwningProcess
    taskkill /PID <PID> /F

For detailed troubleshooting, see README_DEV.md § Troubleshooting.

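
If `check_ports.py` is not at hand, a port probe can be approximated with the standard library. The sketch below is not the project's script and makes no claim about how it works; it simply tries a TCP connection to each port and treats a refused connection as "free".

```python
import socket

STACK_PORTS = (3000, 5001, 6379, 8000, 8080)  # ports listed in the command above


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success, an error code otherwise.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=STACK_PORTS):
    """List the ports that already have a listener."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    taken = busy_ports()
    print("busy ports:", taken if taken else "none")
```

A connection probe only detects listeners reachable on `127.0.0.1`; whether a service running inside WSL is visible that way depends on the local WSL networking setup.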
Built with Gemini · DART · ChromaDB · SMPL-X · Whisper