MrMrsGaoRAG

A project that collects the content of the YouTube channel "Mr & Mrs Gao" and builds a RAG (Retrieval-Augmented Generation) application on top of it.

🌟 Features

Audio Transcription - Convert YouTube videos to text using Whisper ASR
Vector Search - FAISS index with M3E embeddings for semantic search
Multi-LLM Support - Choose between:
- Local models via Ollama
- Cloud APIs (GPT-4/DeepSeek)
Interactive UI - Gradio-based interface with RAG capabilities

🛠️ Installation

# Clone repository
git clone https://github.com/yourusername/MrMrsGaoRAG.git
cd MrMrsGaoRAG

# Install dependencies
pip install -r requirements.txt

# Install FFmpeg (required by Whisper)
brew install ffmpeg  # macOS
# or
sudo apt update && sudo apt install ffmpeg  # Linux

📁 Project Structure

.
├── audios/          # Raw audio files
├── transcribe/      # Whisper JSON transcripts
├── faiss/           # Vector index & metadata
├── query.py         # Main RAG interface
├── transcribe.py    # Audio processing
└── insert_faiss.py  # Index builder
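
The transcripts in transcribe/ are the input to the index builder. As a minimal sketch, assuming those files follow the JSON layout written by the openai-whisper package (a top-level "segments" list whose entries carry "start", "end" and "text"), they could be read like this. The function name load_segments is hypothetical and is not taken from this repository.

```python
import json
from pathlib import Path


def load_segments(transcript_path):
    """Return (start, end, text) tuples from one Whisper JSON transcript.

    Assumes the standard Whisper JSON layout; this helper is illustrative
    and is not part of the project's code.
    """
    data = json.loads(Path(transcript_path).read_text(encoding="utf-8"))
    segments = []
    for seg in data.get("segments", []):
        text = seg.get("text", "").strip()
        if text:  # skip segments that contain only whitespace
            segments.append((float(seg["start"]), float(seg["end"]), text))
    return segments
```

Keeping the start and end times next to each text segment is what would later let an answer point back to a position in the source video.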

🧠 How It Works

RAG Pipeline

1. Query Processing - User input → M3E embedding → FAISS similarity search
2. Context Augmentation - Top-K relevant segments + original query → LLM prompt
3. Response Generation - Local or cloud LLM generates the final answer with video sources
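
The retrieval and augmentation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: hand-written toy vectors stand in for M3E embeddings, a brute-force cosine ranking stands in for the FAISS search, and the names cosine, top_k and build_prompt, the prompt wording, and the "video_1"/"video_2" labels are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Rank (vector, segment_text, video_source) entries by similarity.

    Brute-force stand-in for the FAISS similarity search.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Combine the retrieved segments and the original query into one prompt."""
    context = "\n".join(
        f"[{i}] {text} (source: {source})"
        for i, (_, text, source) in enumerate(hits, 1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy index: in the real pipeline the vectors would come from the embedding model.
index = [
    ([1.0, 0.0], "segment about topic A", "video_1"),
    ([0.0, 1.0], "segment about topic B", "video_2"),
]
hits = top_k([0.9, 0.1], index, k=1)
prompt = build_prompt("What is topic A?", hits)
```

The resulting prompt string is what would be passed to the local or cloud LLM in the response generation step; carrying the source label through each hit is what allows the answer to cite its videos.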

Technical Stack

Component	Technology
Transcription	Whisper Medium (Chinese)
Embeddings	M3E-base
Vector Store	FAISS
LLM Interface	LangChain + Ollama
UI Framework	Gradio

⚠️ Notes

First run will download ~3GB Whisper model
GPU recommended for faster transcription
Edit split_text() in insert_faiss.py to adjust chunk sizes
Add API credentials in Gradio UI when using cloud models
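
For the chunk-size note above, the following is a rough sketch of what a character-based chunker with overlap might look like. The actual split_text() in insert_faiss.py is not shown here, so the signature, the default values, and the overlap behaviour below are assumptions for illustration only.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap.

    Hypothetical signature and defaults; the project's own split_text()
    may work differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Counting characters rather than whitespace-separated words is a reasonable fit for Chinese transcripts, which have no spaces between words. Larger chunks give the LLM more context per hit, while smaller chunks make retrieval more precise; changing the chunking requires rebuilding the index.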

