A vector database built from scratch with Python and NumPy. No FAISS, no sklearn, no external ML libraries.
```python
from vectordb import VectorDB

db = VectorDB(index_type="hnsw", metric="cosine", M=16, ef_search=50)

db.add([0.1, 0.2, 0.3, 0.4], metadata={"label": "cat"})
db.add([0.4, 0.3, 0.2, 0.1], metadata={"label": "dog"})

results = db.search([0.1, 0.2, 0.3, 0.4], top_k=5)
for r in results:
    print(f"ID={r.id}, score={r.score:.4f}, meta={r.metadata}")

db.save("my_database")
loaded = VectorDB.load("my_database")
```
| Index | Type | Recall | Query Cost | Best For |
|-------|------|--------|------------|----------|
| `brute_force` | Exact | 100% | O(N×D) | Small datasets, ground truth |
| `kdtree` | Exact | 100% | O(log N) avg | Low dimensions (≤20D) |
| `ivf` | Approximate | Tunable | O(n_probe × cluster_size) | Large static datasets |
| `hnsw` | Approximate | >90% | O(log N × ef × M) | Production workloads |
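To make the `ivf` row concrete, here is a minimal NumPy-only sketch of the inverted-file idea — not the library's actual implementation: vectors are clustered with a few rounds of k-means, and a query scans only the `n_probe` clusters whose centroids are nearest, which is where the O(n_probe × cluster_size) cost comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=10):
    """Cluster rows of X into k groups with plain Lloyd's iterations."""
    centroids = X[rng.choice(len(X), k, replace=False)]  # init from data points
    for _ in range(iters):
        # assign each vector to its nearest centroid (squared L2)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids, assign

def ivf_search(q, X, centroids, assign, n_probe=2, top_k=3):
    """Scan only the n_probe clusters closest to the query."""
    cd = ((centroids - q) ** 2).sum(-1)
    probe = cd.argsort()[:n_probe]                  # clusters to visit
    cand = np.flatnonzero(np.isin(assign, probe))   # their member ids
    dist = ((X[cand] - q) ** 2).sum(-1)
    order = dist.argsort()[:top_k]
    return cand[order], dist[order]

X = rng.normal(size=(200, 16)).astype(np.float32)
centroids, assign = kmeans(X, k=8)
ids, dists = ivf_search(X[0], X, centroids, assign)
```

Because a query's own cluster is always probed first, searching for a stored vector returns that vector at distance zero; recall for other queries depends on `n_probe`, which is the "tunable" knob in the table.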
- `cosine` — Similarity (1 = identical, 0 = orthogonal)
- `dot` — Raw dot product (higher = closer)
- `euclidean` — L2 distance (lower = closer)
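The three metrics reduce to one-liners in NumPy. This is a sketch of the math behind them, not the library's `distance.py`:

```python
import numpy as np

def cosine(a, b):
    # 1.0 for identical direction, 0.0 for orthogonal vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot(a, b):
    # raw inner product; unnormalized, so magnitude matters
    return float(np.dot(a, b))

def euclidean(a, b):
    # L2 distance: lower means closer
    return float(np.linalg.norm(a - b))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
cos_ab = cosine(a, b)  # orthogonal vectors
cos_aa = cosine(a, a)  # identical vectors
```

Note that cosine ignores vector length while `dot` does not, which is why the two can rank results differently on unnormalized data.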
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Health check |
| GET | `/api/v1/libraries` | List libraries |
| POST | `/api/v1/libraries` | Create library |
| DELETE | `/api/v1/libraries/<name>` | Delete library |
| POST | `/api/v1/libraries/<name>/documents` | Add document |
| POST | `/api/v1/libraries/<name>/search` | Semantic search |
| GET | `/api/v1/libraries/<name>/stats` | Library stats |
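A search request might be issued from Python like this. Note the assumptions: the base URL uses Flask's default port 5000, and the body fields (`query`, `top_k`) are illustrative — check `server/app.py` for the actual schema:

```python
import json
import urllib.request

BASE = "http://localhost:5000"  # assumed default Flask port

def search(library, query, top_k=5):
    """POST to the semantic-search endpoint and return the parsed JSON reply."""
    body = json.dumps({"query": query, "top_k": top_k}).encode()
    req = urllib.request.Request(
        f"{BASE}/api/v1/libraries/{library}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but don't send) a request to show the wire format
req = urllib.request.Request(
    f"{BASE}/api/v1/libraries/demo/search",
    data=json.dumps({"query": "cats", "top_k": 3}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```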
```
vectordb/
├── db.py              # Main facade
├── store.py           # Vector storage
├── distance.py        # Distance metrics
├── persistence.py     # Save/load
├── explain.py         # Result explainability
├── text.py            # TF-IDF vectorizer
├── chunker.py         # Document chunking
└── indexes/
    ├── base.py        # Index interface
    ├── brute_force.py # Linear scan
    ├── kdtree.py      # KD-Tree
    ├── ivf.py         # IVF + k-means
    └── hnsw.py        # HNSW graph
server/
├── app.py             # Flask API
└── seed_data.py       # Demo dataset
```
```shell
pip install numpy pytest flask
pytest tests/ -v
python benchmarks/benchmark.py --n 10000 --dim 64
```