George Avatar Chatbot (Dynavap RAG)
Overview
- Web chatbot that feels like chatting with “George,” the Dynavap expert from YouTube.
- Uses RAG over transcripts from George’s YouTube videos. Transcripts are generated using OpenAI speech-to-text (Whisper via
gpt-4o-mini-transcribe). - Starts with animated avatar + text responses; optional in-browser TTS.
Project Structure
server/— FastAPI backend with RAG endpoint and transcript ingestion.web/— Minimal static chat UI with animated avatar, optional TTS, and source links under responses.data/— Transcripts and vector index (created after ingest).
Quick Start
- Prereqs
- Python 3.10+
- An OpenAI API key in
OPENAI_API_KEY - Optional: YouTube Data API key in
YOUTUBE_API_KEY(only for listing videos by channel; transcription is handled by OpenAI, not the YouTube API)
- Install deps
python -m venv .venv
source .venv/bin/activate
pip install -r server/requirements.txt- Ingest transcripts
Option A — Provide a list of video URLs/IDs in data/videos.txt (one per line):
python server/ingest.py --videos-file data/videos.txt --max-videos 50Option B — Use a YouTube channel ID (needs YOUTUBE_API_KEY):
export YOUTUBE_API_KEY=YOUR_KEY
python server/ingest.py --channel-id UCxxxxxxxxxxxx --max-videos 50This downloads audio via yt-dlp, transcribes it with OpenAI, then creates data/transcripts.jsonl and a FAISS index in data/index/.
- Run the server
export OPENAI_API_KEY=YOUR_OPENAI_KEY
uvicorn server.app:app --reload --port 8000Open http://localhost:8000 to use the chatbot.
Notes
- Persona: The assistant responds in George’s friendly, knowledgeable style focused on Dynavap. You can customize the prompt in
server/app.py. - TTS: The web client uses the browser’s
speechSynthesisas a fallback. A server TTS provider can be added later. - Assets: The avatar is a lightweight CSS/SVG animation; you can replace it with a Lottie or video later.
- Avatar image: Place your George picture as
goerge.png(orgeorge.png) in the project root. The web UI loads it from/avatarand displays it as-is. Replace anytime. - Transcription model: Set
OPENAI_TRANSCRIBE_MODELto change the model (defaultgpt-4o-mini-transcribe).
Troubleshooting downloads/transcription
- Some videos require cookies or get rate-limited. Provide a cookies file (Netscape format, exported from your browser) and/or a proxy. These are passed to
yt-dlpfor the audio download:
python server/ingest.py --videos-file data/videos.txt \
--cookies /path/to/youtube_cookies.txt \
--proxy http://127.0.0.1:8080- You can also set env vars:
YT_COOKIESandYT_PROXY. - If transcription returns empty, the script will error for visibility. Share the console output if you need help.