ChiragArora31/VecSearch


VectorDB

A vector database built from scratch with Python and NumPy. No FAISS, no sklearn, no external ML libraries.

Quick Start

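from vectordb import VectorDB

# Build an HNSW index that ranks by cosine similarity.
# M and ef_search are HNSW tuning parameters (graph connectivity and query-time search width).
db = VectorDB(index_type="hnsw", metric="cosine", M=16, ef_search=50)

# Insert vectors, each with optional metadata.
db.add([0.1, 0.2, 0.3, 0.4], metadata={"label": "cat"})
db.add([0.4, 0.3, 0.2, 0.1], metadata={"label": "dog"})

# Query for the nearest neighbours; each result exposes id, score, and metadata.
results = db.search([0.1, 0.2, 0.3, 0.4], top_k=5)
for r in results:
    print(f"ID={r.id}, score={r.score:.4f}, meta={r.metadata}")

# Persist the database to disk and load it back.
db.save("my_database")
loaded = VectorDB.load("my_database")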

Index Types

| Index | Type | Recall | Query Cost | Best For |
|-------|------|--------|------------|----------|
| brute_force | Exact | 100% | O(N×D) | Small datasets, ground truth |
| kdtree | Exact | 100% | O(log N) avg | Low dimensions (≤20D) |
| ivf | Approximate | Tunable | O(n_probe × cluster_size) | Large static datasets |
| hnsw | Approximate | > 90% | O(log N × ef × M) | Production workloads |
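
To make the O(N×D) cost of the exact baseline concrete, here is a minimal NumPy sketch of a linear scan by cosine similarity. It is an illustration only, not the code in `indexes/brute_force.py`; the function name `brute_force_search` and its return shape are assumptions made for this example.

```python
import numpy as np

def brute_force_search(vectors, query, top_k=5):
    """Exact nearest neighbours by cosine similarity via a full linear scan.

    Hypothetical helper for illustration; not part of the vectordb package.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Normalise rows and query so a dot product equals cosine similarity.
    v_norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    # N dot products of length D: this is the O(N×D) step.
    scores = v_norm @ q_norm
    k = min(top_k, len(scores))
    # Sort by descending similarity and keep the best k.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

data = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
hits = brute_force_search(data, [0.1, 0.2, 0.3, 0.4], top_k=5)
for idx, score in hits:
    print(f"ID={idx}, score={score:.4f}")
```

Because every stored vector is scored, the result is exact, which is why a scan like this is useful as ground truth when measuring the recall of the approximate indexes.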

Distance Metrics

  • cosine: cosine similarity (1 = same direction, 0 = orthogonal; higher = closer)
  • dot: raw dot product (higher = closer; sensitive to vector magnitude)
  • euclidean: L2 distance (lower = closer)
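
The three metrics can be written in a few lines of NumPy. This is a sketch under the standard definitions; the function names are chosen for this example and are not necessarily those used in `distance.py`.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot(a, b):
    """Raw dot product; grows with vector magnitude as well as alignment."""
    return float(np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))

def euclidean(a, b):
    """L2 distance: norm of the difference vector."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

u, v = [1.0, 0.0], [0.0, 1.0]   # orthogonal unit vectors
w = [3.0, 4.0]

print(cosine(u, v), dot(u, v), euclidean(u, v))
print(cosine(w, w), dot(w, w), euclidean(w, w))
```

Note that cosine and dot are similarities (higher means closer) while euclidean is a distance (lower means closer), so any code that ranks results has to sort in the direction that matches the metric.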

REST API

python server/app.py

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check |
| GET | `/api/v1/libraries` | List libraries |
| POST | `/api/v1/libraries` | Create library |
| DELETE | `/api/v1/libraries/<name>` | Delete library |
| POST | `/api/v1/libraries/<name>/documents` | Add document |
| POST | `/api/v1/libraries/<name>/search` | Semantic search |
| GET | `/api/v1/libraries/<name>/stats` | Library stats |
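
As a rough sketch of calling the search endpoint from Python, the snippet below builds a POST request using only the standard library. Several details are assumptions, since this README does not document them: the base URL (Flask's default port 5000), the JSON field names `query` and `top_k`, and the library name `demo`. Check `server/app.py` for the actual request schema.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: Flask's default host and port

def build_search_request(library, query, top_k=5):
    """Build (but do not send) a POST to /api/v1/libraries/<name>/search."""
    # Assumed, hypothetical body fields; the real schema may differ.
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    name = urllib.parse.quote(library, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/libraries/{name}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("demo", "small furry animals", top_k=3)
print(req.get_method(), req.full_url)

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```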

Project Structure

vectordb/
├── db.py              # Main facade
├── store.py           # Vector storage
├── distance.py        # Distance metrics
├── persistence.py     # Save/load
├── explain.py         # Result explainability
├── text.py            # TF-IDF vectorizer
├── chunker.py         # Document chunking
└── indexes/
    ├── base.py        # Index interface
    ├── brute_force.py # Linear scan
    ├── kdtree.py      # KD-Tree
    ├── ivf.py         # IVF + k-means
    └── hnsw.py        # HNSW graph
server/
├── app.py             # Flask API
└── seed_data.py       # Demo dataset

Tests

pip install numpy pytest flask
pytest tests/ -v

Benchmarks

python benchmarks/benchmark.py --n 10000 --dim 64
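
For readers interpreting the recall figures in the index table, here is a minimal sketch of how recall@k is commonly computed against a brute-force ground truth. It is not taken from `benchmarks/benchmark.py`; the helper `recall_at_k` and the simulated approximate result are assumptions for illustration.

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours also present in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 64))   # 1000 random 64-dimensional vectors
query = rng.normal(size=64)

# Ground truth: exact top-10 by Euclidean distance via a full scan.
dists = np.linalg.norm(data - query, axis=1)
exact_top10 = np.argsort(dists)[:10].tolist()

# Simulated approximate result that misses two of the true neighbours
# (-1 and -2 are placeholder IDs that cannot match any real index).
approx_top10 = exact_top10[:8] + [-1, -2]

print(recall_at_k(approx_top10, exact_top10))  # 0.8
```

In a real benchmark the approximate IDs would come from an `ivf` or `hnsw` query, and recall would be averaged over many queries.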
