🧠 Local Brain RAG

Privacy-First Code & Knowledge Retrieval — Powered by Your Own GPU

100% Local Inference · Zero Data Leakage · Built for Developers
An industrial-grade Retrieval-Augmented Generation (RAG) system that turns your
entire codebase, technical docs, and architecture diagrams into a conversational second brain.


✨ Highlights

🔒 Privacy Isolation: All embeddings and reranking run on your local GPU. Code snippets, documents, and conversations never leave your machine. ChromaDB multi-collection architecture provides physical isolation per project space.
⚡ Hybrid Retrieval: BM25 (code-aware tokenizer) + Vector (BGE) + RRF fusion. The custom `code_aware_tokenize` combines jieba Chinese segmentation with regex code identifier extraction, hitting class names, method signatures, and stack traces with surgical precision.
🚀 Hardware-Optimized: Tuned for NVIDIA RTX 5060 · PyTorch 2.6 · CUDA 12.8. Concurrent batch ingestion with thread-pool workers, incremental mtime-based scanning, and BM25 instance caching that avoids O(N) vocabulary rebuilds on every chat turn.
📄 Multi-Format Parsing: PyMuPDF for PDF, python-docx for Word, Gemini Vision for architecture diagrams/screenshots, and tree-sitter `CodeSplitter` for structural Java/Vue chunking that preserves class and method boundaries.
🛡️ Graceful Degradation: LLM quota exhausted? The system auto-fails over from Gemini to Qwen (and back) mid-conversation without losing context. BM25 init failure? Falls back to pure vector retrieval. Resilient by design.
🪝 Live File Watching: Watchdog monitors workspace directories with configurable debounce. Files are incrementally re-indexed on change; deleted files are pruned from the vector store. Ghost node cleanup runs on every startup.
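
The RRF fusion step named under Hybrid Retrieval can be illustrated with a small standalone sketch. This is not the project's code (the README says fusion is handled through LlamaIndex); the function name, the `k = 60` constant, and the sample file names are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrieval paths.
bm25_hits = ["UserService.java", "UserController.java", "README.md"]
vector_hits = ["UserRepository.java", "UserService.java", "README.md"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that both retrievers return rise to the top even when neither ranks them first, which is why rank fusion suits a keyword-plus-semantic setup.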

🛠️ Tech Stack

| Layer | Technology | Role |
|---|---|---|
| Orchestration | LlamaIndex 0.12+ | Index pipeline, Condense+Context chat engine, RRF fusion |
| LLM | Gemini 2.5 Flash · Qwen-Plus/Max | Conversational generation with seamless failover |
| Embedding | `BAAI/bge-large-zh-v1.5` (HuggingFace) | Local GPU vector embeddings |
| Reranker | `BAAI/bge-reranker-v2-m3` (SentenceTransformer) | Local GPU cross-encoder reranking |
| Vector Store | ChromaDB (Persistent Client) | On-disk persistence, multi-collection isolation |
| Hybrid Search | BM25 + Vector + Reciprocal Rank Fusion | Code-aware tokenizer + semantic dual-pathway |
| UI | Streamlit 1.40+ | Streaming chat, source trace panel, file upload, ZIP extraction |
| File Watching | Watchdog 6.0+ | Debounced incremental indexing on filesystem events |
| Doc Parsing | PyMuPDF · python-docx · tree-sitter | PDF, Word, structural code splitting |
| Tokenization | Jieba + custom regex | Chinese semantic + code identifier extraction |

📂 Project Structure

bendiRAG/
├── main.py                 # Entry point — loads .env, sets cache paths, launches Streamlit
├── app.py                  # Streamlit UI — streaming chat, space management, file upload
├── config.py               # Configuration — .env loading, workspace persistence
├── rag_engine.py           # RAG core — indexing, BM25+Vector retrieval, chat engine
├── watcher.py              # Watchdog daemon — debounced incremental file indexing
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variable template (safe to commit)
├── .env                    # Local secrets (excluded from Git)
├── .gitignore
├── .chroma_second_brain/   # ChromaDB persistent storage
├── dynamic_workspace/      # Uploaded files & ZIP extraction scratch space
├── workspaces.json         # Per-space workspace path registry
└── .index_state_*.json     # Per-space mtime index state (auto-generated)

| File | Responsibility |
|---|---|
| `main.py` | Boot: loads `.env`, syncs `HF_HOME` / cache env vars, delegates to `run_app()` |
| `app.py` | Full UI lifecycle: space switching, streaming output, source trace expander, chat persistence, file upload/ZIP extraction, workspace CRUD |
| `config.py` | `AppConfig` dataclass, `dotenv` loading, `workspaces.json` persistence, HTTP proxy injection |
| `rag_engine.py` | RAG infrastructure: GPU embedding/reranker init, BM25+Vector hybrid retrieval, ChromaDB collection management, ghost node cleanup, incremental scan, code-aware chunking, image captioning |
| `watcher.py` | Watchdog event handler: debounced batch upsert, file deletion sync to vector store |
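
The debounced batch upsert attributed to `watcher.py` can be pictured with the sketch below. It shows only the debounce bookkeeping and is not the project's implementation; the `Debouncer` class, its method names, and the 2-second delay are hypothetical. Timestamps are passed in explicitly so the logic is easy to follow and to test.

```python
class Debouncer:
    """Collects file events and releases a path only after it has been quiet."""

    def __init__(self, delay: float = 2.0) -> None:
        self.delay = delay
        self._last_seen: dict[str, float] = {}

    def record(self, path: str, now: float) -> None:
        # A repeated event for the same path restarts its quiet period.
        self._last_seen[path] = now

    def pop_ready(self, now: float) -> list[str]:
        # Return (and forget) every path with no events for `delay` seconds.
        ready = sorted(p for p, t in self._last_seen.items() if now - t >= self.delay)
        for p in ready:
            del self._last_seen[p]
        return ready


debouncer = Debouncer(delay=2.0)
debouncer.record("src/App.vue", now=0.0)
debouncer.record("src/App.vue", now=1.5)     # editor saved the file again
debouncer.record("docs/api.md", now=0.2)
first_batch = debouncer.pop_ready(now=2.5)   # only docs/api.md has been quiet long enough
second_batch = debouncer.pop_ready(now=3.5)  # src/App.vue is ready now
```

In a real handler, a watchdog `FileSystemEventHandler` callback such as `on_modified` would call `record` with `time.monotonic()`, and a background loop would pass each batch from `pop_ready` to the indexer.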

🚀 Quick Start

Prerequisites

Python 3.10+ · CUDA 12.8 · NVIDIA GPU (RTX 3060 or above recommended)
Google AI Studio API Key (free tier available)
(Optional) Aliyun DashScope API Key for Qwen failover

1. Environment Setup

# Create and activate conda environment
conda create -n bendirag python=3.11 -y
conda activate bendirag

# Install a PyTorch nightly build with CUDA 12.8 (cu128) wheels
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

# Install project dependencies
pip install -r requirements.txt

2. Configuration

# Copy the template
cp .env.example .env

# Edit .env with your API keys
# GEMINI_API_KEY=...          (required)
# DASHSCOPE_API_KEY=...       (optional — Qwen failover)
# WORKSPACE_PATHS=...         (directories to index)

3. Launch

streamlit run main.py
# or: python main.py

Open http://localhost:8501 and start conversing with your codebase.

💡 Usage

First Run

On startup, the app scans all WORKSPACE_PATHS directories, chunks every indexable file (code, docs, images), and ingests them into the vector store. A progress bar shows batch insertion status in real time.

Project Spaces

| Action | How | Effect |
|---|---|---|
| Switch space | Sidebar dropdown | Switches to an isolated ChromaDB collection + chat history |
| Create space | `＋` button | Spawns a blank collection |
| Destroy space | `🗑️` button | Physically deletes collection + index state + chat history |

Knowledge Base

File upload: Drag & drop files or ZIP archives. ZIPs auto-extract (with Chinese filename encoding fix).
Watch directory: Enter a local folder path — one-click full scan + continuous watchdog monitoring.
Remove workspace: Click ✕ to purge all vector nodes under a directory.
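
The Chinese filename encoding fix mentioned for ZIP uploads usually comes down to one re-decoding step. The sketch below shows the common technique, assuming the archive was created with GBK names. The README does not say which encoding the project handles, and `fix_zip_name` is a hypothetical helper.

```python
def fix_zip_name(name: str, flag_bits: int) -> str:
    """Recover a GBK filename that Python's zipfile decoded as CP437."""
    if flag_bits & 0x800:           # bit 11 set: the name is already UTF-8
        return name
    try:
        return name.encode("cp437").decode("gbk")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name                 # not GBK after all; keep the name as read


# Simulate what zipfile reports for a legacy archive made on Chinese Windows.
garbled = "设计文档.md".encode("gbk").decode("cp437")
restored = fix_zip_name(garbled, flag_bits=0)
```

With a real archive, the inputs would come from each entry in `zipfile.ZipFile(...).infolist()`, using its `filename` and `flag_bits` attributes.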

Chat & Source Tracing

Ask natural-language questions like:

"Explain the overall architecture and tech stack of this project."
"What does the recent change to UserService do?"
"Map out the database table dependencies."

Each response includes a source trace panel showing the originating file path and a 300-character code snippet preview.

🔧 Model Switching & Failover

| Model | Provider | Required Key |
|---|---|---|
| `gemini-2.5-flash` | Google | `GEMINI_API_KEY` |
| `qwen-plus` | Aliyun DashScope | `DASHSCOPE_API_KEY` |
| `qwen-max` | Aliyun DashScope | `DASHSCOPE_API_KEY` |

Auto-failover: If the active model returns a quota/resource error (HTTP 429), the system automatically retries with the alternate provider — no manual intervention needed.
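
The failover behavior can be sketched as a loop over providers. This is a simplified illustration, not the project's code: `QuotaExceeded`, `chat_with_failover`, and the two stub functions are hypothetical stand-ins for real SDK calls and their HTTP 429 errors.

```python
from typing import Callable


class QuotaExceeded(Exception):
    """Hypothetical error standing in for a provider's HTTP 429 response."""


def chat_with_failover(
    prompt: str,
    providers: dict[str, Callable[[str], str]],
    active: str,
) -> tuple[str, str]:
    """Try the active provider first, then the others; return (provider, reply)."""
    order = [active] + [name for name in providers if name != active]
    last_error = None
    for name in order:
        try:
            return name, providers[name](prompt)
        except QuotaExceeded as exc:
            last_error = exc        # quota hit: fall through to the next provider
    raise RuntimeError("all providers are out of quota") from last_error


def gemini_stub(prompt: str) -> str:
    raise QuotaExceeded("429: resource exhausted")


def qwen_stub(prompt: str) -> str:
    return f"qwen answer to: {prompt}"


used, reply = chat_with_failover(
    "Explain UserService",
    {"gemini-2.5-flash": gemini_stub, "qwen-plus": qwen_stub},
    active="gemini-2.5-flash",
)
```

Returning the provider name with the reply lets the caller record which model answered and make it the active one for the next turn.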

🧪 Under the Hood

Code-Aware Tokenizer

import re
import jieba

def code_aware_tokenize(text: str) -> list[str]:
    code_tokens = re.findall(r"[a-zA-Z0-9_]+", text)        # whole identifiers: camelCase / snake_case
    chinese_tokens = [w for w in jieba.lcut(text) if ...]   # Chinese word segmentation (filter elided)
    return code_tokens + chinese_tokens

Because the regex matches each identifier as a single run, BM25 indexes getUserById as one token instead of splitting it into ["get", "User", "By", "Id"], so a query for the exact method name matches the chunks that contain it.
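
To see the effect on a concrete input, the sketch below runs the identifier pattern on a stack-trace-like line and contrasts it with a camelCase sub-word splitter. It uses only the standard library and leaves out the jieba half of the tokenizer; both helper names and the sample line are made up for illustration.

```python
import re


def extract_identifiers(text: str) -> list[str]:
    # Same pattern as code_aware_tokenize: each run of letters, digits, and
    # underscores becomes one token, so identifiers stay whole.
    return re.findall(r"[a-zA-Z0-9_]+", text)


def split_camel_case(identifier: str) -> list[str]:
    # What a sub-word splitter would produce instead (shown for contrast only).
    return re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", identifier)


line = "NullPointerException at UserService.getUserById(user_id)"
whole = extract_identifiers(line)
parts = split_camel_case("getUserById")
```

Here `whole` keeps `getUserById` and `user_id` as single tokens, while `parts` shows the fragments a sub-word splitter would index instead.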

Data Lifecycle

File created   → incremental_upsert_file → update index_state
File modified  → delete_ref_doc → re-insert → update index_state mtime
File deleted   → watchdog on_deleted → delete_ref_doc
Workspace rm   → delete_workspace_nodes → batch purge via index_state
Space destroy  → delete_collection → wipe Chroma + state + chat history
Ghost cleanup  → cleanup_ghost_nodes → scan index_state for vanished files
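
The mtime comparison behind the incremental scan and ghost cleanup can be sketched as follows. This is an assumption-laden illustration: `scan_changes` and the in-memory `state` dict are hypothetical, whereas the project keeps its state in the `.index_state_*.json` files and calls its own functions listed above.

```python
import os
import tempfile


def scan_changes(root: str, index_state: dict[str, float]) -> tuple[list[str], list[str]]:
    """Compare files under `root` with the recorded mtimes.

    Returns (changed, ghosts): paths that are new or modified since the last
    scan, and paths recorded in the state that no longer exist on disk.
    """
    current: dict[str, float] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            current[path] = os.path.getmtime(path)
    changed = sorted(p for p, m in current.items() if index_state.get(p) != m)
    ghosts = sorted(p for p in index_state if p not in current)
    # Bring the state in line with the disk so the next scan starts clean.
    index_state.clear()
    index_state.update(current)
    return changed, ghosts


with tempfile.TemporaryDirectory() as root:
    kept = os.path.join(root, "Kept.java")
    gone = os.path.join(root, "Gone.java")
    for path in (kept, gone):
        with open(path, "w") as fh:
            fh.write("class X {}")
    state: dict[str, float] = {}
    first_changed, first_ghosts = scan_changes(root, state)    # both files are new
    os.remove(gone)
    second_changed, second_ghosts = scan_changes(root, state)  # Gone.java has vanished
    ghost_names = [os.path.basename(p) for p in second_ghosts]
```

In terms of the lifecycle above, paths in `changed` would go to `incremental_upsert_file`, and paths in `ghosts` would be removed with `delete_ref_doc`.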

BM25 Cache Strategy

BM25 retriever construction requires an O(N) scan over the full document vocabulary. The system caches an (index_state_mtime, BM25Retriever) pair globally and rebuilds it only when the space's index state file changes, so the retriever is reused across chat turns with no per-message rebuild.
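
A minimal sketch of this caching idea, assuming the cache is keyed by the state file path and invalidated by its mtime. `get_cached_retriever` and `build_stub` are hypothetical names, and a plain `object()` stands in for the real BM25 retriever.

```python
import os
import tempfile
from typing import Any, Callable

_cache: dict[str, tuple[float, Any]] = {}


def get_cached_retriever(state_path: str, build: Callable[[], Any]) -> Any:
    """Return the cached retriever unless the index state file has changed."""
    mtime = os.path.getmtime(state_path)
    cached = _cache.get(state_path)
    if cached is not None and cached[0] == mtime:
        return cached[1]              # state unchanged: reuse, no rebuild
    retriever = build()               # the expensive O(N) construction
    _cache[state_path] = (mtime, retriever)
    return retriever


build_calls = 0


def build_stub() -> object:
    global build_calls
    build_calls += 1
    return object()                   # stands in for a BM25 retriever


with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as fh:
    state_file = fh.name
first = get_cached_retriever(state_file, build_stub)
second = get_cached_retriever(state_file, build_stub)   # same mtime: cache hit
os.utime(state_file, (0, 12345))                         # simulate an index update
third = get_cached_retriever(state_file, build_stub)    # mtime changed: rebuild
os.remove(state_file)
```

Using the mtime as the invalidation signal means any write to the index state, whether from a scan or from the watcher, triggers exactly one rebuild on the next chat turn.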

📜 License

MIT License — use freely. Your data never leaves your machine.

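🇨🇳 Chinese Summary

Local Brain RAG (bendiRAG)

A privacy-first local assistant for code and knowledge-base retrieval

Positioning: an industrial-grade RAG system for software engineers. All embedding and rerank inference runs on the local GPU, and code snippets never leave your machine.

Five highlights:

Privacy isolation: ChromaDB multi-collection physical isolation; destroying a space erases it completely
Hybrid retrieval: BM25 (code-aware tokenization) + Vector (BGE semantics) + RRF (Reciprocal Rank Fusion)
Hardware optimization: tuned for RTX 5060 / PyTorch 2.6 / CUDA 12.8, with concurrent batch writes and BM25 cache reuse
Multi-format parsing: PyMuPDF for PDF, python-docx for Word, the Gemini vision model for describing screenshots, tree-sitter for structural splitting of Java/Vue
Automatic failover: when the Gemini quota is exhausted, the system switches to Qwen automatically and the conversation continues uninterrupted

Quick install:

conda create -n bendirag python=3.11 -y && conda activate bendirag
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -r requirements.txt
cp .env.example .env   # edit and fill in GEMINI_API_KEY
streamlit run main.py

Tech stack: LlamaIndex orchestration · ChromaDB vector store · Streamlit UI · HuggingFace BGE local embedding/rerank · Gemini 2.5 Flash + Qwen Plus/Max dual engines · Watchdog file watching · Jieba Chinese word segmentation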

Built with ❤️ for developers who care about privacy and performance.
