A fully containerized, on-premise solution to perform Hybrid Search (Semantic + Keyword) on your local documents. This project allows you to ingest files and compare search relevance and query latency between Qdrant (using SPLADE) and Elasticsearch (using BM25).
- 100% Local & On-Premise: No Cloud APIs or API keys required. Models run locally on your CPU.
- Dual Vector Stores:
  - Qdrant: Uses Dense Vectors (`bge-small`) + Sparse Vectors (SPLADE) with Reciprocal Rank Fusion (RRF).
  - Elasticsearch: Uses Dense Vectors (`bge-small`) + Keyword Search (BM25) with Linear Combination.
- File Ingestion: Supports PDF, TXT, and CSV via Unstructured and LangChain.
- Performance Benchmarking: Real-time measurement of ingestion speed and search latency in milliseconds.
- Modern UI: React-based split-screen comparison to visually validate search results.
The solution uses a Microservices architecture orchestrated via Docker Compose.
| Service | Technology | Internal Port | External Port | Description |
|---|---|---|---|---|
| Frontend | React + Vite (Node 22) | 3000 | 4300 | The Web User Interface |
| Backend | FastAPI (Python 3.12) | 8000 | 4800 | Logic, Embeddings, OCR |
| Qdrant | Qdrant Vector DB | 6333 | 4633 | Vector Store 1 |
| Elastic | Elasticsearch 9.2 | 9200 | 4920 | Vector Store 2 |
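Once the stack is running, a quick way to confirm the external port mappings above is a small request loop like the sketch below. This is not part of the project: the `/healthz` path is Qdrant's default health endpoint, `/docs` is FastAPI's default docs route, and Elasticsearch is assumed to run without authentication for local use.

```python
# Hedged smoke test for the externally mapped ports (assumes the stack is up).
import requests

SERVICES = {
    "Frontend (React)": "http://localhost:4300",
    "Backend (FastAPI docs)": "http://localhost:4800/docs",
    "Qdrant": "http://localhost:4633/healthz",
    "Elasticsearch": "http://localhost:4920",
}

for name, url in SERVICES.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name:<25} {url:<40} HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name:<25} {url:<40} unreachable ({type(exc).__name__})")
```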
- Docker Desktop (or Docker Engine + Compose plugin)
- 4GB+ RAM available (Elasticsearch and Embedding models require memory).
- Clone the Repository

  ```bash
  git clone <repository-url>
  cd Hybrid-Search
  ```

- Build and Run

  ```bash
  docker-compose up --build
  ```

  First run will take a few minutes to download the base images and the Embedding models.

- Access the Application

  Open your browser to: http://localhost:4300
- Click "Data Ingestion".
- Drag and drop or select PDFs or Text files.
- Click "Start Ingestion Benchmark".
- The system will (see the pipeline sketch after this list):
- OCR/Parse the files.
- Split text into chunks.
- Generate Dense Vectors (Semantics).
- Generate Sparse Vectors (Keywords).
- Write to both Qdrant and Elasticsearch.
- View the write latency comparison.
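For orientation, here is a minimal sketch of that ingestion pipeline. It assumes `fastembed` for both embedding models and the official `qdrant-client` and `elasticsearch` Python clients; the collection/index name `documents`, the vector field names, and the chunking parameters are illustrative rather than the project's actual configuration (the real logic lives in `backend/app/services.py`).

```python
# Illustrative ingestion sketch, not the project's actual code.
from fastembed import TextEmbedding, SparseTextEmbedding
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models
from elasticsearch import Elasticsearch

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
dense_model = TextEmbedding("BAAI/bge-small-en-v1.5", cache_dir="./models_cache")
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1", cache_dir="./models_cache")

qdrant = QdrantClient(url="http://localhost:4633")
es = Elasticsearch("http://localhost:4920")

# One-time setup: a Qdrant collection with named dense + sparse vectors, and an
# Elasticsearch index with a dense_vector field for kNN plus a text field for BM25.
if not qdrant.collection_exists("documents"):
    qdrant.create_collection(
        collection_name="documents",
        vectors_config={"dense": models.VectorParams(size=384, distance=models.Distance.COSINE)},
        sparse_vectors_config={"sparse": models.SparseVectorParams()},
    )
if not es.indices.exists(index="documents"):
    es.indices.create(index="documents", mappings={"properties": {
        "text": {"type": "text"},
        "vector": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
    }})

def ingest(raw_text: str) -> None:
    chunks = splitter.split_text(raw_text)
    dense_vecs = list(dense_model.embed(chunks))    # 384-dim semantic vectors
    sparse_vecs = list(sparse_model.embed(chunks))  # SPLADE term-weight vectors

    # Qdrant: one point per chunk carrying both named vectors.
    qdrant.upsert(
        collection_name="documents",
        points=[
            models.PointStruct(
                id=i,
                vector={
                    "dense": dense_vecs[i].tolist(),
                    "sparse": models.SparseVector(
                        indices=sparse_vecs[i].indices.tolist(),
                        values=sparse_vecs[i].values.tolist(),
                    ),
                },
                payload={"text": chunks[i]},
            )
            for i in range(len(chunks))
        ],
    )

    # Elasticsearch: dense vector for kNN plus the raw text for BM25.
    for i, chunk in enumerate(chunks):
        es.index(index="documents", id=i, document={"text": chunk, "vector": dense_vecs[i].tolist()})
```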
- Click "Hybrid Search".
- Type a query (e.g., "What are the termination conditions?").
- Click "Compare Performance".
- View side-by-side results (a scoring sketch follows this list):
- Qdrant (Pink): Results sorted by RRF fusion of Semantic + SPLADE scores.
- Elasticsearch (Teal): Results sorted by Linear Combination of Semantic + BM25 scores.
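Conceptually, the two ranking strategies differ as in the sketch below. This is not the project's code: the RRF constant `k = 60` and the `alpha` weight are common defaults used here purely for illustration.

```python
# Conceptual comparison of the two fusion strategies.

def rrf_fuse(dense_ranking: list[str], sparse_ranking: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def linear_combine(semantic: dict[str, float], bm25: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Linear combination: blend (normalized) semantic and BM25 scores with weight alpha."""
    doc_ids = set(semantic) | set(bm25)
    blended = {d: alpha * semantic.get(d, 0.0) + (1 - alpha) * bm25.get(d, 0.0) for d in doc_ids}
    return sorted(blended, key=blended.get, reverse=True)

print(rrf_fuse(["a", "b", "c"], ["b", "c", "a"]))                  # ['b', 'a', 'c']
print(linear_combine({"a": 0.9, "b": 0.4}, {"b": 0.8, "c": 0.7}))  # ['b', 'a', 'c']
```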
- Dense: `BAAI/bge-small-en-v1.5` (384 dimensions).
- Sparse: `prithivida/Splade_PP_en_v1` (Learned Sparse Representations).
- Models are cached locally in the `./models_cache` folder to prevent re-downloading.
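Assuming the backend loads these models through `fastembed` (an assumption; see `backend/app/ml_engine.py` for the actual loader), the cache can be pre-warmed outside Docker like this:

```python
# Pre-download both models into ./models_cache so the containers can reuse them.
# Assumes the fastembed library; constructing the models fetches the weights if missing.
from fastembed import TextEmbedding, SparseTextEmbedding

dense = TextEmbedding("BAAI/bge-small-en-v1.5", cache_dir="./models_cache")
sparse = SparseTextEmbedding("prithivida/Splade_PP_en_v1", cache_dir="./models_cache")

# Quick sanity check that both models produce output.
print(len(next(iter(dense.embed(["warm-up"])))))           # -> 384 (embedding dimensions)
print(next(iter(sparse.embed(["warm-up"]))).indices[:5])   # -> first few SPLADE term indices
```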
- Qdrant: `models.FusionQuery(fusion=models.Fusion.RRF)` combines dense and sparse vector results using Reciprocal Rank Fusion.
- Elasticsearch: Uses a standard Hybrid approach compatible with the Basic License:

  ```json
  {
    "knn": { ...vector_search... },
    "query": { "match": { "text": ...keyword_search... } }
  }
  ```
1. libGL.so.1: cannot open shared object file
- This indicates the Backend Docker container wasn't built correctly.
- Fix: Run `docker-compose up --build` to ensure `libgl1` is installed via `apt-get`.
2. Frontend connection error / Network Warning
- If you see "Network request failed", ensure you are accessing `http://localhost:4300`.
- Chrome may warn about "Private Network Access". This is normal for local development tools accessing local APIs.
3. Elasticsearch exits with code 137
- This means OOM (Out of Memory).
- Fix: Increase Docker Desktop memory limit to at least 4GB.
hybrid_search/
├── docker-compose.yml # Orchestration & Port Config
├── models_cache/ # Local storage for AI models
├── backend/
│ ├── Dockerfile # Python env with OCR libs
│ ├── requirements.txt
│ └── app/
│ ├── main.py # API Routes
│ ├── services.py # Business Logic (Ingest/Search)
│ ├── ml_engine.py # Model Loader
│ ├── database.py # Qdrant Setup
│ ├── elastic_db.py # Elastic Setup
│ └── config.py # Settings
└── frontend/
├── Dockerfile # Node Alpine with Cert fix
└── src/
└── App.jsx # React UI Components