Beginner-Mon/Virtual-Verbal-Assistant

🧠 Virtual Verbal Assistant

An AI-powered multimodal assistant that understands natural language, generates intelligent responses, produces realistic 3D human motion, and speaks, all in one unified pipeline.

Quickstart · Developer Docs · AgenticRAG


✨ What is This?

Virtual Verbal Assistant is a multi-service research platform that combines:

  • 🤖 AgenticRAG: A double-RAG conversational AI powered by Google Gemini with persistent memory, document understanding, and clinical knowledge retrieval.
  • 🏃 Text-to-Motion (DART): A diffusion-based motion synthesis engine that generates realistic 3D human animations from natural language descriptions. (ICLR 2025 Spotlight)
  • 🗣️ SpeechLLm: Voice I/O with emotion-aware dialogue using local Small Language Models (Whisper + Phi-3/Mistral via Ollama).
  • 💬 ECA UI 2.0: A modern chat interface with real-time motion visualization, TTS audio playback, and exercise cards.

Example: Ask "My neck is stiff from coding all day" → get clinically-safe exercise advice + corresponding 3D motion animation + spoken audio response.


🏗️ Architecture

                         ┌───────────────────────┐
                         │   ECA Official UI     │
                         │     (Port 3000)       │
                         └──────────┬────────────┘
                                    │
                         ┌──────────▼────────────┐
                         │   Unified Gateway     │
                         │     (Port 8000)       │
                         │  /process_query       │
                         │  /tasks/{id}          │
                         │  /download/{file}     │
                         └──────────┬────────────┘
                                    │
               ┌────────────────────┼───────────────────┐
               │                    │                   │
    ┌──────────▼──────────┐  ┌──────▼──────┐  ┌─────────▼─────────┐
    │   AgenticRAG        │  │    DART     │  │    SpeechLLm      │
    │   Orchestrator      │  │    (WSL)    │  │    (Optional)     │
    │   (Port 8080)       │  │ (Port 5001) │  │    (Port 5000)    │
    │                     │  │             │  │                   │
    │  • Gemini LLM       │  │ • Diffusion │  │  • Whisper STT    │
    │  • ChromaDB Memory  │  │ • CLIP      │  │  • Emotion Det.   │
    │  • Document RAG     │  │ • SMPL-X    │  │  • TTS Output     │
    │  • Clinical Safety  │  │ • 30fps     │  │  • Ollama SLM     │
    └─────────────────────┘  └─────────────┘  └───────────────────┘

🚀 Quick Start

Prerequisites

| Requirement | Notes |
|---|---|
| Windows + WSL2 | DART runs inside WSL with CUDA |
| Conda | firstconda (Windows), DART (WSL), tts (SpeechLLm) |
| Redis | Via Docker or system install |
| ffmpeg | In PATH for motion video rendering |
| GEMINI_API_KEY | Set in agenticRAG/agentic_rag_gemini/.env |

One-Command Launch

conda activate firstconda
python run_stack.py

This starts all services automatically: Redis, Celery, AgenticRAG API, Orchestrator, DART (WSL), SpeechLLm, ECA UI, and Streamlit UI.

Access Points

| Interface | URL | Description |
|---|---|---|
| ECA Official UI | localhost:3000 | Default frontend: chat + motion viewer |
| Streamlit UI | localhost:8501 | Alternate chat interface |
| API Gateway | localhost:8000 | Unified REST API for all clients |

Verify

curl http://localhost:8000/health

🔌 API Reference

All browser and client traffic routes through the unified gateway on Port 8000.

Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /process_query | POST | Submit an async query task |
| /tasks/{task_id} | GET | Poll task progress and results |
| /download/{file} | GET | Proxy DART motion artifacts |
| /history/{user_id} | GET | Retrieve chat history |
| /health | GET | Per-service health status |

Task Lifecycle

{
  "task_id": "abc123",
  "status": "processing | completed | failed",
  "progress_stage": "queued → rag_query → motion_generation → tts → completed",
  "result": {
    "text_answer": "...",
    "exercises": [{"name": "Chin tuck"}, {"name": "Shoulder roll"}],
    "motion": {"motion_file_url": "/download/motion_abc123.npz", "fps": 30},
    "audio_url": "/static/tts_abc123.wav"
  },
  "error": null
}
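
A client can consume this lifecycle with a simple polling loop. The following is a minimal Python sketch, not the project's own client: it relies only on the `status`, `result`, and `error` fields shown in the payload, and the helper names `poll_task`, `result_or_raise`, and `http_get_json` are hypothetical. The transport is passed in as a callable so the loop can be exercised without a running gateway.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Statuses after which polling stops (taken from the payload above).
TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(
    task_id: str,
    get_json: Callable[[str], Dict[str, Any]],
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll /tasks/{task_id} until the task is completed or failed."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_json(f"/tasks/{task_id}")
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r}")
        sleep(interval_s)


def result_or_raise(task: Dict[str, Any]) -> Dict[str, Any]:
    """Return the result of a finished task, or raise if it failed."""
    if task["status"] == "failed":
        raise RuntimeError(task.get("error") or "task failed")
    return task["result"]


def http_get_json(base_url: str) -> Callable[[str], Dict[str, Any]]:
    """Hypothetical transport: GET base_url + path and decode the JSON body."""
    def _get(path: str) -> Dict[str, Any]:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            return json.load(resp)
    return _get
```

Against a local stack, this could be called as `poll_task("abc123", http_get_json("http://localhost:8000"))`, where `abc123` stands in for whatever `task_id` the gateway returned.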

Design principle: Internal service ports (5001, 8080) are never exposed to clients. Motion URLs from DART are rewritten to gateway-safe paths through Port 8000.
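
One way to picture the URL rewriting is the sketch below. It is an illustration under assumptions, not the gateway's actual code: the shape of the internal DART URL is a guess, and `to_gateway_path` is a hypothetical name. Only the `/download/{file}` target comes from the endpoint list.

```python
import posixpath
from urllib.parse import urlsplit


def to_gateway_path(internal_url: str) -> str:
    """Map an internal artifact URL to a gateway-relative /download/ path."""
    # Keep only the final path segment, so host, port, and directories
    # of the internal service never reach the client.
    filename = posixpath.basename(urlsplit(internal_url).path)
    if not filename:
        raise ValueError(f"no file name in {internal_url!r}")
    return f"/download/{filename}"


# Assumed internal URL shape, for illustration only.
print(to_gateway_path("http://localhost:5001/outputs/motion_abc123.npz"))
```

Because the client only ever sees a relative path, the same response works whether the gateway is reached on localhost or through a tunnel.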


🌐 Remote Access (Ngrok)

For remote demonstrations, tunnel the UI and API:

# Terminal 1: UI tunnel
ngrok http 3000

# Terminal 2: API tunnel
ngrok http 8000

Then open:

https://<ui-tunnel>.ngrok-free.app/?api_base=https://<api-tunnel>.ngrok-free.app

Do not expose internal ports 5001 or 8080 directly.
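
The demo link can also be assembled programmatically once both tunnels are up. This is a small sketch under assumptions: `demo_url` is a hypothetical helper, and the tunnel hostnames in the example are placeholders for whatever ngrok assigns. Only the `api_base` query parameter comes from the URL pattern above.

```python
from urllib.parse import urlencode


def demo_url(ui_base: str, api_base: str) -> str:
    """Build the UI link that points the frontend at the tunneled API."""
    # safe=":/" leaves the nested URL readable, matching the pattern above.
    query = urlencode({"api_base": api_base.rstrip("/")}, safe=":/")
    return f"{ui_base.rstrip('/')}/?{query}"


# Placeholder hostnames; substitute the ones printed by ngrok.
print(demo_url("https://ui-example.ngrok-free.app",
               "https://api-example.ngrok-free.app"))
```

Stripping trailing slashes avoids links such as `...app//?api_base=...` when a base URL is pasted with a slash at the end.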


📁 Project Structure

Virtual-Verbal-Assistant/
├── run_stack.py                        # One-command stack launcher
├── README.md                           # ← You are here
├── README_DEV.md                       # Full developer & architecture docs
├── QUICKSTART.md                       # Quick reference startup guide
│
├── agenticRAG/agentic_rag_gemini/      # 🤖 AgenticRAG + Orchestrator
│   ├── api_server.py                   #    REST API (Port 8000)
│   ├── main_api.py                     #    Orchestrator (Port 8080)
│   ├── agents/                         #    Query routing (Gemini)
│   ├── retrieval/                      #    RAG pipeline
│   ├── memory/                         #    ChromaDB vector store
│   └── config/config.yaml              #    All tuneable settings
│
├── text-to-motion/DART/                # 🏃 DART motion synthesis (WSL)
│   ├── api_server.py                   #    REST API (Port 5001)
│   ├── mld/                            #    Motion Latent Diffusion
│   ├── diffusion/                      #    Gaussian diffusion
│   ├── model/                          #    Denoiser + VAE
│   └── data/outputs/                   #    Generated .npz files
│
├── SpeechLLm/                          # 🗣️ Voice I/O + Emotion (Port 5000)
│   ├── api_server.py
│   └── src/                            #    STT, LLM, TTS, emotion stages
│
├── ECA_UI/                             # 💬 Official Web Interface (Port 3000)
│   ├── index.html                      #    Main dashboard
│   └── api.js                          #    API client
│
└── test-ui/                            # 🧪 Developer test interface
    └── index.html

📚 Documentation

| Document | Description |
|---|---|
| QUICKSTART.md | Prerequisites, ports, one-command startup |
| README_DEV.md | Full architecture, subsystem deep-dives, troubleshooting |
| AgenticRAG Developers | Internal AgenticRAG architecture |
| DART Architecture | Motion synthesis internals & integration |
| SpeechLLm | Voice pipeline documentation |

🛠️ Troubleshooting

# Check if all ports are free
python check_ports.py --ports 3000 5001 6379 8000 8080

# Verify ffmpeg is available
Get-Command ffmpeg

# Check ChromaDB container
docker compose ps

# Kill a stuck port
Get-NetTCPConnection -LocalPort 8000 | Select-Object OwningProcess
taskkill /PID <PID> /F
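
For a platform-independent check, the port probe can be approximated in a few lines of Python. This is a sketch of the general technique (attempting a TCP connection), not the contents of `check_ports.py`; `port_in_use` and `scan_ports` are hypothetical names, and the port list simply mirrors the command above.

```python
import socket
from typing import Dict, Iterable


def port_in_use(port: int, host: str = "127.0.0.1", timeout_s: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def scan_ports(ports: Iterable[int]) -> Dict[int, bool]:
    """Map each port to whether a listener was found on it."""
    return {port: port_in_use(port) for port in ports}


if __name__ == "__main__":
    # Same ports as the check_ports.py invocation above.
    for port, busy in scan_ports([3000, 5001, 6379, 8000, 8080]).items():
        print(f"{port}: {'in use' if busy else 'free'}")
```

Before launch, every port should report free; once the stack is running, the same scan shows which services actually came up.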

For detailed troubleshooting, see README_DEV.md § Troubleshooting.


Built with Gemini · DART · ChromaDB · SMPL-X · Whisper
