rrg1225/Local-Brain-Retrieval-Augmented-Generation

🧠 Local Brain RAG

Privacy-First Code & Knowledge Retrieval — Powered by Your Own GPU

100% Local Inference · Zero Data Leakage · Built for Developers
An industrial-grade Retrieval-Augmented Generation (RAG) system that turns your
entire codebase, technical docs, and architecture diagrams into a conversational second brain.


✨ Highlights

🔒 Privacy Isolation

All embedding and reranking run on your local GPU, and the vector index stays on local disk. Only the text sent to the configured LLM provider (Gemini or Qwen) for answer generation or image captioning leaves your machine. ChromaDB's multi-collection architecture provides physical isolation per project space.

⚡ Hybrid Retrieval

BM25 (code-aware tokenizer) + Vector (BGE) + RRF fusion. The custom code_aware_tokenize combines jieba Chinese segmentation with regex code identifier extraction — hitting class names, method signatures, and stack traces with surgical precision.
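
As a rough illustration of the fusion step, the sketch below applies Reciprocal Rank Fusion to two ranked lists. The function name, the sample file names, and the constant k=60 are assumptions chosen for the example; the project itself relies on LlamaIndex for fusion, and its exact settings are not shown in this README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank is 1-based).
    k=60 is a commonly used default and an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrieval pathways.
bm25_hits = ["UserService.java", "OrderDao.java", "README.md"]
vector_hits = ["README.md", "UserService.java", "Config.vue"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that both pathways rank highly rise to the top, while a document found by only one pathway still survives further down the list.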

🚀 Hardware-Optimized

Tuned for NVIDIA RTX 5060 · PyTorch 2.6 · CUDA 12.8. Concurrent batch ingestion with thread-pool workers, incremental mtime-based scanning, and BM25 instance caching that avoids O(N) vocabulary rebuilds on every chat turn.

📄 Multi-Format Parsing

PyMuPDF for PDF, python-docx for Word, Gemini Vision for architecture diagrams/screenshots, and tree-sitter CodeSplitter for structural Java/Vue chunking that preserves class and method boundaries.

🛡️ Graceful Degradation

LLM quota exhausted? The system auto-fails over from Gemini to Qwen (and back) mid-conversation without losing context. BM25 init failure? Falls back to pure vector retrieval. Resilient by design.

🪝 Live File Watching

Watchdog monitors workspace directories with configurable debounce. Files are incrementally re-indexed on change; deleted files are pruned from the vector store. Ghost node cleanup runs on every startup.
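
The debounce idea can be sketched without any watcher library: record the time of the latest event per path, and only hand a path to the indexer once it has been quiet for the debounce window. The class name, method names, and the injectable clock below are assumptions made for illustration; the actual logic in watcher.py is not shown in this README. In a real setup, a watchdog event handler's on_modified and on_created callbacks would call touch.

```python
import time


class DebouncedBatch:
    """Collect file events and release a path only after it has been quiet."""

    def __init__(self, delay_s=1.0, clock=time.monotonic):
        self.delay_s = delay_s
        self.clock = clock      # injectable so the behaviour can be tested
        self._pending = {}      # path -> time of the most recent event

    def touch(self, path):
        """Record an event; repeated events for one path reset its timer."""
        self._pending[path] = self.clock()

    def drain_ready(self):
        """Return (and forget) paths with no events for at least delay_s."""
        now = self.clock()
        ready = sorted(
            path for path, seen in self._pending.items()
            if now - seen >= self.delay_s
        )
        for path in ready:
            del self._pending[path]
        return ready
```

An editor that saves the same file several times in quick succession then triggers one re-index instead of several.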


🛠️ Tech Stack

| Layer | Technology | Role |
| --- | --- | --- |
| Orchestration | LlamaIndex 0.12+ | Index pipeline, Condense+Context chat engine, RRF fusion |
| LLM | Gemini 2.5 Flash · Qwen-Plus/Max | Conversational generation with seamless failover |
| Embedding | BAAI/bge-large-zh-v1.5 (HuggingFace) | Local GPU vector embeddings |
| Reranker | BAAI/bge-reranker-v2-m3 (SentenceTransformer) | Local GPU cross-encoder reranking |
| Vector Store | ChromaDB (Persistent Client) | On-disk persistence, multi-collection isolation |
| Hybrid Search | BM25 + Vector + Reciprocal Rank Fusion | Code-aware tokenizer + semantic dual-pathway |
| UI | Streamlit 1.40+ | Streaming chat, source trace panel, file upload, ZIP extraction |
| File Watching | Watchdog 6.0+ | Debounced incremental indexing on filesystem events |
| Doc Parsing | PyMuPDF · python-docx · tree-sitter | PDF, Word, structural code splitting |
| Tokenization | Jieba + custom regex | Chinese semantic + code identifier extraction |

📂 Project Structure

bendiRAG/
├── main.py                 # Entry point — loads .env, sets cache paths, launches Streamlit
├── app.py                  # Streamlit UI — streaming chat, space management, file upload
├── config.py               # Configuration — .env loading, workspace persistence
├── rag_engine.py           # RAG core — indexing, BM25+Vector retrieval, chat engine
├── watcher.py              # Watchdog daemon — debounced incremental file indexing
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variable template (safe to commit)
├── .env                    # Local secrets (excluded from Git)
├── .gitignore
├── .chroma_second_brain/   # ChromaDB persistent storage
├── dynamic_workspace/      # Uploaded files & ZIP extraction scratch space
├── workspaces.json         # Per-space workspace path registry
└── .index_state_*.json     # Per-space mtime index state (auto-generated)

| File | Responsibility |
| --- | --- |
| main.py | Boot: loads .env, syncs HF_HOME / cache env vars, delegates to run_app() |
| app.py | Full UI lifecycle: space switching, streaming output, source trace expander, chat persistence, file upload/ZIP extraction, workspace CRUD |
| config.py | AppConfig dataclass, dotenv loading, workspaces.json persistence, HTTP proxy injection |
| rag_engine.py | RAG infrastructure: GPU embedding/reranker init, BM25+Vector hybrid retrieval, ChromaDB collection management, ghost node cleanup, incremental scan, code-aware chunking, image captioning |
| watcher.py | Watchdog event handler: debounced batch upsert, file deletion sync to vector store |

🚀 Quick Start

Prerequisites

1. Environment Setup

# Create and activate conda environment
conda create -n bendirag python=3.11 -y
conda activate bendirag

# Install a PyTorch build with CUDA 12.8 support (nightly cu128 wheels; the installed version may be newer than 2.6)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

# Install project dependencies
pip install -r requirements.txt

2. Configuration

# Copy the template
cp .env.example .env

# Edit .env with your API keys
# GEMINI_API_KEY=...          (required)
# DASHSCOPE_API_KEY=...       (optional — Qwen failover)
# WORKSPACE_PATHS=...         (directories to index)

3. Launch

streamlit run main.py
# or: python main.py

Open http://localhost:8501 and start conversing with your codebase.


💡 Usage

First Run

On startup, the app scans all WORKSPACE_PATHS directories, chunks every indexable file (code, docs, images), and ingests them into the vector store. A progress bar shows batch insertion status in real time.
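
On later runs, an mtime comparison lets the scan skip files that have not changed. The sketch below shows one way this could work, assuming the index state is a plain mapping from file path to last-seen modification time; the function name and the state layout are illustrative assumptions, not the contents of the project's .index_state_*.json files.

```python
import os


def scan_changed(root, index_state):
    """Walk root and report files whose mtime differs from the stored state.

    Returns (changed_paths, new_state). index_state maps path -> mtime
    (an assumed layout for this sketch).
    """
    changed = []
    new_state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            new_state[path] = mtime
            # New files are missing from the old state; edited files differ.
            if index_state.get(path) != mtime:
                changed.append(path)
    return sorted(changed), new_state
```

Only the paths in changed_paths need to be re-chunked and re-embedded; everything else is already in the vector store.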

Project Spaces

| Action | How | Effect |
| --- | --- | --- |
| Switch space | Sidebar dropdown | Switches to an isolated ChromaDB collection + chat history |
| Create space | button | Spawns a blank collection |
| Destroy space | 🗑️ button | Physically deletes collection + index state + chat history |

Knowledge Base

  • File upload: Drag & drop files or ZIP archives. ZIPs auto-extract (with Chinese filename encoding fix).
  • Watch directory: Enter a local folder path — one-click full scan + continuous watchdog monitoring.
  • Remove workspace: Click to purge all vector nodes under a directory.
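
The Chinese filename fix for ZIP archives usually comes down to one detail of the format: when an entry lacks the UTF-8 flag (bit 11 of the general-purpose flags), Python's zipfile module decodes its name as cp437, which garbles names that were written in GBK. A minimal sketch of the repair follows, assuming GBK as the fallback encoding; the helper name is hypothetical and the project's own implementation is not shown here.

```python
def fix_zip_name(raw_name: str, flag_bits: int, fallback: str = "gbk") -> str:
    """Undo zipfile's cp437 decoding for entries that were really GBK-encoded.

    raw_name and flag_bits correspond to ZipInfo.filename and ZipInfo.flag_bits.
    """
    if flag_bits & 0x800:
        # Bit 11 set: the archive declared UTF-8, so the name is already right.
        return raw_name
    try:
        # Recover the original bytes, then decode them with the assumed encoding.
        return raw_name.encode("cp437").decode(fallback)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a cp437 round-trip, or not valid GBK: leave the name untouched.
        return raw_name
```

When extracting, each entry would be written under fix_zip_name(info.filename, info.flag_bits) rather than under the raw name.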

Chat & Source Tracing

Ask natural-language questions like:

  • "Explain the overall architecture and tech stack of this project."
  • "What does the recent change to UserService do?"
  • "Map out the database table dependencies."

Each response includes a source trace panel showing the originating file path and a 300-character code snippet preview.


🔧 Model Switching & Failover

| Model | Provider | Required Key |
| --- | --- | --- |
| gemini-2.5-flash | Google | GEMINI_API_KEY |
| qwen-plus | Aliyun DashScope | DASHSCOPE_API_KEY |
| qwen-max | Aliyun DashScope | DASHSCOPE_API_KEY |

Auto-failover: If the active model returns a quota/resource error (HTTP 429), the system automatically retries with the alternate provider — no manual intervention needed.
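
The failover behaviour can be pictured as a loop over providers that moves on when one reports a quota error. In the sketch below, QuotaError and chat_with_failover are hypothetical names, and each provider is modelled as a plain callable; the real engine's error detection and how it carries conversation context across the switch are not described in this README.

```python
class QuotaError(Exception):
    """Stand-in for a provider's quota/resource error (for example HTTP 429)."""


def chat_with_failover(prompt, providers, start=0):
    """Try each provider once, beginning with the active one.

    providers is a list of (name, callable) pairs. Returns (name, reply)
    from the first provider that answers.
    """
    last_error = None
    for offset in range(len(providers)):
        name, call = providers[(start + offset) % len(providers)]
        try:
            return name, call(prompt)
        except QuotaError as err:
            last_error = err  # remember why this provider failed, try the next
    raise RuntimeError("all providers exhausted") from last_error
```

Because the loop wraps around, the same function covers both directions: Gemini to Qwen when start points at Gemini, and Qwen to Gemini when start points at Qwen.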


🧪 Under the Hood

Code-Aware Tokenizer

import re
import jieba

def code_aware_tokenize(text: str) -> list[str]:
    code_tokens = re.findall(r"[a-zA-Z0-9_]+", text)        # whole identifiers: camelCase / snake_case
    chinese_tokens = [w for w in jieba.lcut(text) if ...]   # Chinese segmentation; filter elided in the source
    return code_tokens + chinese_tokens

BM25 keeps identifiers like getUserById intact instead of splitting them into ["get", "User", "By", "Id"], so a query for an exact method or class name matches the indexed token directly.
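
To make the difference concrete, the snippet below runs the same identifier regex on a sample stack-trace line and contrasts it with a camelCase-splitting pattern. The CAMEL pattern and the sample text are assumptions added for comparison only; they are not part of the project.

```python
import re

# Same character class as the tokenizer above: one token per whole identifier.
IDENT = re.compile(r"[a-zA-Z0-9_]+")
# Hypothetical camelCase splitter, shown only to illustrate what is avoided.
CAMEL = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")

text = "NullPointerException in getUserById(user_id)"

whole = IDENT.findall(text)
split = [part for token in whole for part in CAMEL.findall(token)]

print(whole)  # identifiers kept intact
print(split)  # identifiers broken into word fragments
```

With whole identifiers as tokens, a query for getUserById only matches documents that contain that exact name, rather than every document mentioning "get", "User", or "Id".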

Data Lifecycle

File created   → incremental_upsert_file → update index_state
File modified  → delete_ref_doc → re-insert → update index_state mtime
File deleted   → watchdog on_deleted → delete_ref_doc
Workspace rm   → delete_workspace_nodes → batch purge via index_state
Space destroy  → delete_collection → wipe Chroma + state + chat history
Ghost cleanup  → cleanup_ghost_nodes → scan index_state for vanished files
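
The ghost cleanup step can be sketched as a pass over the index state that drops every entry whose file no longer exists on disk. The function below borrows the delete_ref_doc name from the lifecycle above but passes it in as a callback; that signature, the _sketch suffix, and the path-keyed state layout are assumptions for illustration.

```python
import os


def cleanup_ghost_nodes_sketch(index_state, delete_ref_doc):
    """Remove entries for files that vanished while the app was not running.

    index_state: dict keyed by file path (assumed layout).
    delete_ref_doc: callback that purges one file's nodes from the vector store.
    Returns the list of paths that were cleaned up.
    """
    ghosts = sorted(path for path in index_state if not os.path.exists(path))
    for path in ghosts:
        delete_ref_doc(path)      # drop the stale vectors first
        del index_state[path]     # then forget the file in the state
    return ghosts
```

Running this once at startup covers deletions that the file watcher could not observe because the process was not running at the time.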

BM25 Cache Strategy

Constructing a BM25 retriever requires an O(N) pass over the full document corpus to build its vocabulary. The system caches an (index_state_mtime, BM25Retriever) pair globally and rebuilds it only when the space's index state file changes, so the retriever is reused across chat turns with no per-message rebuild cost.
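
A minimal sketch of such a cache, assuming it is keyed by space name and invalidated by the state file's modification time. The function name, the module-level dictionary, and the build callback are illustrative assumptions; the real BM25Retriever construction is not shown.

```python
import os

# space name -> (state file mtime at build time, retriever object)
_retriever_cache = {}


def get_cached_retriever(space, state_path, build):
    """Return the cached retriever unless the index state file has changed.

    build is a zero-argument callable that performs the expensive O(N)
    construction; it runs only on a cache miss.
    """
    mtime = os.path.getmtime(state_path)
    cached = _retriever_cache.get(space)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    retriever = build()
    _retriever_cache[space] = (mtime, retriever)
    return retriever
```

Because every ingestion or deletion rewrites the state file, its mtime serves as a cheap version stamp for the whole corpus.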


📜 License

MIT License — use freely. Your data never leaves your machine.


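🇨🇳 Chinese Summary

Local Brain RAG (bendiRAG)

A privacy-first local retrieval assistant for code and knowledge bases

Core positioning: an industrial-grade RAG system for software engineers. All embedding and rerank inference runs on the local GPU, and code snippets never leave your machine.

Five highlights

  • Privacy isolation: ChromaDB multi-collection physical isolation; destroying a space erases it completely
  • Hybrid retrieval: BM25 (code-aware tokenization) + Vector (BGE semantics) + RRF (Reciprocal Rank Fusion)
  • Hardware optimization: tuned for RTX 5060 / PyTorch 2.6 / CUDA 12.8, with concurrent batch writes and BM25 cache reuse
  • Multi-format parsing: PyMuPDF for PDF, python-docx for Word, a Gemini vision model to describe screenshots, and tree-sitter for structural splitting of Java/Vue
  • Automatic failover: when the Gemini quota is exhausted, the system switches to Qwen automatically and the conversation continues uninterrupted

Quick install

conda create -n bendirag python=3.11 -y && conda activate bendirag
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -r requirements.txt
cp .env.example .env   # edit and fill in GEMINI_API_KEY
streamlit run main.py

Tech stack: LlamaIndex orchestration · ChromaDB vector store · Streamlit UI · HuggingFace BGE local embedding/rerank · Gemini 2.5 Flash + Qwen Plus/Max dual engines · Watchdog file watching · Jieba Chinese word segmentation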


Built with ❤️ for developers who care about privacy and performance.

About

Local Brain RAG is a privacy-first, ultra-optimized offline knowledge base. It enables secure querying of complex code and docs. Powered by a BM25+BGE-Large dual-engine, SBERT reranking, and PyTorch 2.6 GPU acceleration, it ensures precise, hallucination-free AI responses entirely locally.
