NodeDB's vector engine is built for production semantic search — not vectors shoehorned through a SQL planner. It uses a custom HNSW index with multiple quantization levels and hardware-accelerated distance math.
- Semantic search over embeddings (text, images, audio)
- RAG pipelines for AI agents
- Recommendation systems
- Similarity matching at scale
- Indexes: HNSW (in-memory hierarchical) + Vamana / DiskANN (SSD-resident flat-beam, billion-scale on one node). The cost-model planner picks based on collection size + workload signals.
- Quantization frontier:
- SQ8 — Scalar quantization to 8-bit. ~4× memory reduction, minimal recall loss.
- PQ / IVF-PQ — Classic product quantization. ~4-8× / ~16 bytes/vector for very large indexes.
- OPQ — PQ + learned random rotation; minor accuracy bump over
pq. - Binary — Sign-bit Hamming for ultra-cold tiers (no rerank).
- Ternary — BitNet 1.58 trit-pack
{-1, 0, +1}with cold/hot pack and AVX-512 popcnt. - RaBitQ — 1-bit quantization with
O(1/√D)error bound (SIGMOD 2024). Frontier 1-bit primary path. - BBQ — Centroid-asymmetric 1-bit + 14-byte corrective factors + oversample rerank.
- Filtered traversal: NaviX adaptive-local (VLDB 2025) — per-hop switch between standard / directed / blind heuristics by local selectivity. Replaces classic ACORN-1.
- Workload routing: SIEVE — pre-built specialized HNSW subindices for stable predicates (e.g.
tenant_id); planner-routed filtered queries. - Multi-vector: MetaEmbed Matryoshka multivec + ColBERT MaxSim + budgeted PLAID (ICLR 2026);
meta_token_budgetquery option drives test-time scaling. - Adaptive-dim: Matryoshka coarse-to-fine on first-N dims of MRL embeddings via
query_dim. - Streaming updates: SPFresh + LIRE topology-aware local rebalancing (SOSP 2023) — no full-rebuild stalls.
- Adaptive pre-filtering: Roaring Bitmap with automatic pre / post / brute-force strategy selection by selectivity.
- Distance metrics: L2, cosine, negative inner product, Hamming/Jaccard for binary.
- Cross-engine fusion: Combine with graph (GraphRAG), BM25 (hybrid), spatial (geo-aware), array slices, or document predicates — all via roaring-bitmap intersection of cross-engine surrogate IDs.
-- Create a collection with a vector index (vector as side-index on a document collection)
CREATE COLLECTION articles;
CREATE VECTOR INDEX idx_articles_embedding ON articles METRIC cosine DIM 384;
-- Insert documents with embeddings
INSERT INTO articles {
title: 'Understanding Transformers',
embedding: [0.12, -0.34, 0.56, ...]
};
-- Nearest neighbor search
SEARCH articles USING VECTOR(embedding, ARRAY[0.1, 0.3, -0.2, ...], 10);
-- Filtered vector search (adaptive pre-filtering kicks in)
SELECT title, vector_distance(embedding, ARRAY[0.1, 0.3, -0.2, ...]) AS score
FROM articles
WHERE category = 'machine-learning'
AND id IN (
SEARCH articles USING VECTOR(embedding, ARRAY[0.1, 0.3, -0.2, ...], 10)
);
-- Hybrid vector + full-text search (RRF fusion)
SELECT title, rrf_score() AS score
FROM articles
WHERE id IN (
SEARCH articles USING VECTOR(embedding, ARRAY[0.1, 0.3, ...], 10)
)
AND text_match(body, 'transformer attention mechanism')
LIMIT 10;vector_distance(field, query, name => value, ...) accepts a closed set of typed named arguments. Validation is strict — unknown names, duplicate keys, positional 3rd args, or = instead of => all return typed errors.
| Argument | Type | Notes |
|---|---|---|
quantization |
string | none, sq8, pq, binary, ternary, rabitq, bbq, opq |
oversample |
u8 | Candidates fetched before rerank. Default 3. Final rerank set = oversample × ef_search. |
query_dim |
u32 | Coarse-to-fine on first-N dims of Matryoshka embeddings. |
meta_token_budget |
u8 | MetaEmbed multivec MaxSim / PLAID budget. |
ef_search |
u32 | HNSW / Vamana beam width. Default 64. |
target_recall |
f32 | Adaptive recall target — cost-model planner picks oversample and ef_search to hit it. |
Set target_recall and let the planner do the rest unless you need a hard latency ceiling.
By default, vectors are an index attached to a column on a normal collection — the document store is the source of truth. For pure-vector workloads (RAG corpora, recommendation memory, embedding stores) flip a collection into vector-primary mode where the vector index is the primary access path and the document store is a metadata sidecar:
CREATE COLLECTION corpus (
id UUID DEFAULT gen_uuid_v7(),
embedding FLOAT[384],
title TEXT,
tenant_id UUID,
created_at TIMESTAMP DEFAULT now()
) WITH (
primary='vector',
vector_field='embedding',
dim=384,
metric='cosine',
quantization='rabitq',
m=32,
ef_construction=200,
payload_indexes=['tenant_id', 'created_at']
);primary='vector' is purely an access-path hint — not a different engine. Cross-engine queries, CRDT sync, SQL semantics all keep working. payload_indexes are per-field equality / range / boolean indexes over the metadata sidecar for filtered ANN (replaces Pinecone metadata filters).
Default primary='document' is unchanged: classic CREATE VECTOR INDEX ON ... syntax continues to work.
| Codec | Bits/dim | Recall (typ.) | Best For |
|---|---|---|---|
none |
32 | 100% | Small index (< 1M vectors), latency not critical |
sq8 |
8 | ~99% | Balanced default for medium index sizes |
pq |
~2 | ~95% | Large memory-bound indexes; classic Product Quantization |
opq |
~2 | ~96% | PQ + learned random rotation |
rabitq |
1 | ~97% | Frontier 1-bit with O(1/√D) error bound |
bbq |
1 | ~98% | Centroid-asymmetric 1-bit + 14-byte corrective |
binary |
1 | ~85% | Hamming-only, ultra-cold tiers |
ternary |
1.58 | ~96% | BitNet {-1, 0, +1} cold/hot pack |
Vectors are indexed in the HNSW graph at full precision to maintain structural integrity. During search, the engine traverses quantized copies of the graph for speed, then re-ranks the top candidates against full-precision vectors for accuracy.
When a query includes metadata filters (e.g., WHERE category = 'ml'), the engine builds a Roaring Bitmap of matching document IDs and selects the optimal filtering strategy automatically — if the filter is highly selective, it pre-filters before HNSW traversal; if broad, it post-filters after.
The Vector engine is an index. It points at records; it does not carry temporal columns itself. To query embeddings at a point in time, attach a Vector index to a Document collection that has bitemporal = true. The Document collection holds the embedding payload and temporal columns; the Vector index returns candidate IDs; AS OF filtering happens at the Document layer.
See Bitemporal Queries — Bitemporal Vector Search for a worked example.
- Graph — GraphRAG combines vector similarity with graph traversal
- Full-Text Search — Hybrid vector + BM25 fusion
- Spatial — Hybrid spatial-vector search
- Bitemporal Queries — Temporal composition pattern