VectorLite


A tiny, in-process Rust vector store with built-in embeddings for sub-millisecond semantic search.

VectorLite is a high-performance, in-memory vector database optimized for AI agent and edge workloads.
It co-locates model inference (via Candle) with a low-latency vector index, making it ideal for session-scoped, single-instance, or privacy-sensitive environments.

Why VectorLite?

| Feature | Description |
| --- | --- |
| Sub-millisecond search | In-memory HNSW or flat search tuned for real-time agent loops. |
| Built-in embeddings | Runs all-MiniLM-L6-v2 locally using Candle, or any other model of your choice. No external API calls. |
| Single-binary simplicity | No dependencies, no servers to orchestrate. Start instantly via CLI or Docker. |
| Session-scoped collections | Perfect for ephemeral agent sessions or sidecars. |
| Thread-safe concurrency | RwLock-based access and atomic ID generation for multi-threaded workloads. |
| Instant persistence | Save or restore collection snapshots in one call. |

VectorLite trades distributed scalability for deterministic performance, making it perfect for use cases where low latency matters more than scaling to millions of vectors.

When to Use It

| Scenario | Why VectorLite fits |
| --- | --- |
| AI agent sessions | Keep short-lived embeddings per conversation. No network latency. |
| Edge or embedded AI | Run fully offline with model + index in one binary. |
| Realtime search / personalization | Sub-ms search for pre-computed embeddings. |
| Local prototyping & CI | Rust-native, no external services. |
| Single-tenant microservices | Lightweight sidecar for semantic capabilities. |

Quick Start

Run from Source

cargo run --bin vectorlite -- --port 3001

# Start with a preloaded collection
cargo run --bin vectorlite -- --filepath ./my_collection.vlc --port 3001

Run with Docker

With default settings:

docker build -t vectorlite .
docker run -p 3001:3001 vectorlite

With a different embedding model and memory-optimized HNSW:

docker build \
  --build-arg MODEL_NAME="sentence-transformers/paraphrase-MiniLM-L3-v2" \
  --build-arg FEATURES="memory-optimized" \
  -t vectorlite-small .

HTTP API Overview

Operation Method & Endpoint Body
Health GET /health
List collections GET /collections
Create collection POST /collections {"name": "docs", "index_type": "hnsw", "metric": "cosine"}
Delete collection DELETE /collections/{name}
Add text POST /collections/{name}/text {"text": "Hello world", "metadata": {...}}
Search (text) POST /collections/{name}/search/text {"query": "hello", "k": 5}
Get vector GET /collections/{name}/vectors/{id}
Delete vector DELETE /collections/{name}/vectors/{id}
Save collection POST /collections/{name}/save {"file_path": "./collection.vlc"}
Load collection POST /collections/load {"file_path": "./collection.vlc", "collection_name": "restored"}
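
A minimal end-to-end session with curl, assuming the server from the Quick Start is listening on port 3001. The request bodies mirror the table above; the metadata value is illustrative:

# Create an HNSW-backed collection using cosine similarity
curl -X POST http://localhost:3001/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "docs", "index_type": "hnsw", "metric": "cosine"}'

# Embed and store a text entry with free-form metadata
curl -X POST http://localhost:3001/collections/docs/text \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "metadata": {"source": "example"}}'

# Semantic search, returning the top 5 matches
curl -X POST http://localhost:3001/collections/docs/search/text \
  -H "Content-Type: application/json" \
  -d '{"query": "hello", "k": 5}'

# Snapshot the collection to disk, then restore it under a new name
curl -X POST http://localhost:3001/collections/docs/save \
  -H "Content-Type: application/json" \
  -d '{"file_path": "./collection.vlc"}'

curl -X POST http://localhost:3001/collections/load \
  -H "Content-Type: application/json" \
  -d '{"file_path": "./collection.vlc", "collection_name": "restored"}'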

HTTP Clients

You can integrate with VectorLite using the available HTTP client libraries, or simply generate your own from the OpenAPI spec.

Index Types

VectorLite supports two index types: Flat and HNSW.

| Index | Search Complexity | Insert | Use Case |
| --- | --- | --- | --- |
| Flat | O(n) | O(1) | Small datasets (<10K) or exact search |
| HNSW | O(log n) | O(log n) | Larger datasets or approximate search |

See Hierarchical Navigable Small World.
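
As a sketch of how this choice surfaces in the Rust SDK: the HNSW variant appears in the SDK example later in this README, while the IndexType::Flat variant name is assumed here.

use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));

    // Flat: exact brute-force scan, O(1) inserts, fine below ~10K vectors
    client.create_collection("scratch", IndexType::Flat, Some(SimilarityMetric::Cosine))?;

    // HNSW: approximate O(log n) search for larger collections
    client.create_collection("docs", IndexType::HNSW, Some(SimilarityMetric::Cosine))?;

    Ok(())
}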

Configuration Profiles for HNSW

| Profile | Features | Use Case |
| --- | --- | --- |
| default | balanced | general workloads |
| memory-optimized | reduced precision, smaller graph | constrained devices |
| high-accuracy | higher recall, more memory | offline re-ranking or research |

Select a profile at build time:

cargo build --features memory-optimized

Similarity Metrics

A flat index is the most flexible: it supports every similarity metric at search time. An HNSW index, by contrast, is optimised for a single distance metric, which is then used for all search operations on that collection. When creating an HNSW index, provide a metric value of cosine, euclidean, manhattan, or dotproduct.

  • Cosine: Default for normalized embeddings, scale-invariant
  • Euclidean: Geometric distance, sensitive to vector magnitude
  • Manhattan: L1 norm, robust to outliers
  • Dot Product: Raw similarity, requires consistent vector scaling
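
To make the differences concrete, here is a small, self-contained sketch of the four scores over plain f32 slices. It is illustrative only, not VectorLite's internal implementation:

// Illustrative implementations of the four similarity/distance scores
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot(a, b) / (norm(a) * norm(b)) // scale-invariant: only the angle matters
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt() // L2 norm
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum() // L1 norm, robust to outliers
}

fn main() {
    let a: [f32; 3] = [1.0, 2.0, 3.0];
    let b: [f32; 3] = [2.0, 4.0, 6.0];
    // b = 2a, so cosine is exactly 1.0 while the distance metrics are non-zero
    println!(
        "cosine={} euclidean={} manhattan={} dot={}",
        cosine(&a, &b), euclidean(&a, &b), manhattan(&a, &b), dot(&a, &b)
    );
}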

Rust SDK Example

use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));

    client.create_collection("quotes", IndexType::HNSW, Some(SimilarityMetric::Cosine))?;
    
    let id = client.add_text_to_collection(
        "quotes", 
        "I just want to lie on the beach and eat hot dogs",
        Some(json!({
            "author": "Kevin Malone",
            "tags": ["the-office", "s3:e23"],
            "year": 2005,
        }))
    )?;

    // Metric optional - auto-detected from HNSW index
    let results = client.search_text_in_collection(
        "quotes",
        "beach games",
        3,
        None,
    )?;

    for result in &results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }

    Ok(())
}

Testing

Run tests with mock embeddings (CI-friendly, no model files required):

cargo test --features mock-embeddings

Run tests with local models:

cargo test

Download ML Model

This downloads the BERT-based embedding model files needed for real embedding generation:

huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir models/all-MiniLM-L6-v2

The model must be present in the ./models/{model-name}/ directory with the following files:

  • config.json
  • pytorch_model.bin
  • tokenizer.json

Using a Different Model

You can override the default embedding model at compile time using the custom-model feature:

DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo build --features custom-model

DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo run --features custom-model

License

Apache 2.0 License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
