Compass

Embedded vector + full-text search engine. Single binary, zero external dependencies.

Built by Captain for high-throughput retrieval in on-prem enterprise deployments where customer data cannot leave their VPC.

What it does

Full-text search via Tantivy (BM25) with precomputed bitset faceting (microsecond facets)
Vector search via USearch HNSW (mmap-backed, disk-persistent)
Hybrid search via Reciprocal Rank Fusion (RRF, k=60)
Memory-mapped vector storage. Raw vectors live on disk, not in RAM. Zero-copy reads via mmap.
Disk-backed chunk metadata via redb (pure Rust embedded DB). Handles millions of documents without loading them all into memory.
Incremental HNSW indexing. Adding vectors appends to the index — no full rebuild required.
Named vector spaces: run multiple embedding models on the same collection
One-click model upgrades with background re-embedding and atomic swap
Parent-child documents with relationship-aware scoring (TAMS video search compatible)
Query-time scoring: recency decay, metadata boosting, relationship boosting
Metadata filtering: exact match, numeric range (gte/lte), array contains, set membership. Typed values (string, int, float, bool, timestamp, string list)
Native query embedding via Candle BGE-small (Rust, no Python). GPU endpoint support for larger models.
Fully offline. No API calls. Model weights on disk. Data never leaves the machine.

Quick start

cargo build --release
./target/release/compass
# Listening on http://localhost:4001

Environment variables: PORT (default 4001), DATA_DIR (default ./data).

Examples

Basic: create, ingest, search

# Create a collection
curl -X POST localhost:4001/collections \
  -H 'Content-Type: application/json' \
  -d '{"name": "docs"}'

# Ingest chunks
curl -X POST localhost:4001/collections/docs/ingest \
  -H 'Content-Type: application/json' \
  -d '{"chunks": [{"file_id": "readme", "chunk_index": 0, "text": "Compass is a search engine"}]}'

# Search
curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "search engine", "mode": "hybrid"}'
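The same three calls can be issued from Python with only the standard library. This is a minimal sketch, assuming the server listens on the default port from the quick start; `build_request` and `call` are hypothetical helper names, and the response schema is not assumed beyond being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4001"  # assumption: default PORT from the quick start


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a Compass endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req: urllib.request.Request):
    """Send the request and decode the JSON body, if there is one."""
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


# The same payloads as the curl commands above.
create = build_request("/collections", {"name": "docs"})
ingest = build_request(
    "/collections/docs/ingest",
    {"chunks": [{"file_id": "readme", "chunk_index": 0, "text": "Compass is a search engine"}]},
)
search = build_request("/collections/docs/search", {"query": "search engine", "mode": "hybrid"})
```

With a Compass instance running, `call(create)`, `call(ingest)`, and `call(search)` send the requests in order.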

Named vector spaces

Create a collection with two embedding models (text + multimodal):

curl -X POST localhost:4001/collections \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "media",
    "vector_spaces": {
      "qwen3": {"dims": 1024, "model": "Qwen/Qwen3-Embedding-8B", "status": "active"},
      "qwen3-vl": {"dims": 896, "model": "Qwen/Qwen3-VL-Embedding-2B", "status": "active"}
    }
  }'

# Search a specific vector space
curl -X POST localhost:4001/collections/media/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "sunset over ocean", "mode": "semantic", "vector_space": "qwen3-vl"}'

Multi-space retrieval + reranking

Embed every document into multiple vector spaces at ingest time. At query time, search one space, multiple spaces, or all of them. A cross-encoder reranker re-scores the merged candidates for maximum accuracy.

Query
  |
  +-- Tantivy BM25 -----------> FTS candidates
  +-- Harrier HNSW -----------> text semantic candidates
  +-- Qwen3-VL HNSW ----------> multimodal candidates
  |
  v
  RRF merge (all three)
  |
  v
  Reranker (cross-encoder re-scores top candidates)
  |
  v
  Filter -> Score (recency, boost, relationships)
  |
  v
  Return top_k

Pick the right retrieval path for the query:

Text query, text docs: search harrier space only
Text query, find images/video: search qwen3-vl space (cross-modal)
Mixed collection, best accuracy: search both spaces, RRF merge, rerank

# Search multiple vector spaces at once (merged via RRF, then reranked).
# Assumes the collection defines both a "harrier" and a "qwen3-vl" space.
curl -X POST localhost:4001/collections/media/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "goal celebration slow motion",
    "mode": "hybrid",
    "vector_space": ["harrier", "qwen3-vl"],
    "top_k": 10
  }'

Three retrievers, one reranker, one scoring pipeline. The reranker doesn't care which retriever found the candidate. It just scores (query, text) relevance from scratch.

Recency bias + metadata boosting

Pick a preset to favor newer results. Older docs score lower but never disappear:

curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "quarterly report",
    "recency_preset": "mild",
    "recency_field": "created_at",
    "boosts": [
      {"field": "department", "value": "Legal", "weight": 2.0},
      {"field": "priority", "gte": 3, "weight": 1.5}
    ]
  }'

Four presets. How quickly old docs lose ranking:

  strong bias ◄───────────────► weak bias

  aggressive    recent     mild       archive
  ├── 3d ──┤  ├── 7d ──┤  ├── 30d ──┤  ├── 90d ──┤

Use case	Preset	Docs lose half their recency score after...	Old docs bottom out at...
Real-time alerts, live events, TAMS segments	`aggressive`	3 days	5%
News, feeds, support tickets	`recent`	7 days	20%
Docs, reports, meeting notes	`mild`	30 days	30%
Long-lived content, legal docs, compliance	`archive`	90 days	50%

For full control, use recency instead (overrides any preset):

curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "quarterly report",
    "recency": {"field": "created_at", "half_life_days": 30, "min_score": 0.1}
  }'

Recency decay formula: score *= max(min_score, 2^(-age_days / half_life_days)). With half_life_days: 30, a 30-day-old doc scores 0.5x. The recency field is always user-specified. Compass never assumes which metadata field represents "time".
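The decay formula can be checked numerically. The sketch below implements the multiplier exactly as written above; the `PRESETS` mapping is an assumption that reads the table's "half after" and "bottom out at" columns as half_life_days and min_score.

```python
def recency_multiplier(age_days: float, half_life_days: float, min_score: float) -> float:
    """max(min_score, 2^(-age_days / half_life_days)), as in the formula above."""
    return max(min_score, 2.0 ** (-age_days / half_life_days))


# Assumption: presets expressed as (half_life_days, min_score), taken from the table.
PRESETS = {
    "aggressive": (3, 0.05),
    "recent": (7, 0.20),
    "mild": (30, 0.30),
    "archive": (90, 0.50),
}

# Custom recency from the example: half_life_days=30, min_score=0.1.
for age in (0, 30, 60, 365):
    print(age, recency_multiplier(age, 30, 0.1))
# 0 1.0
# 30 0.5
# 60 0.25
# 365 0.1
```

Under this reading, a 60-day-old doc with the `mild` preset would sit at the 0.30 floor rather than 0.25, since the floor takes over once the decay drops below it.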

Metadata filtering

Filters are hard constraints applied before scoring. Only matching documents are scored and returned.

# Exact match (string, bool, number)
curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "compliance",
    "filters": {"department": "Legal", "active": true}
  }'

# Numeric range (gte/lte)
curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "quarterly report",
    "filters": {"priority": {"gte": 3, "lte": 10}}
  }'

# Array contains
curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "highlights",
    "filters": {"tags": {"contains": "sports"}}
  }'

# Set membership (doc_type, category, etc.)
curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "meeting notes",
    "filters": {"doc_type": {"in": ["segment", "flow"]}}
  }'

All operators: exact match (backward compatible), gte/lte (numeric range), contains (array membership), in (set membership). Operators combine as AND across fields.
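To make the operator semantics concrete, here is a small Python model of how such a filter could be evaluated against one document. It is an illustration only, not Compass's implementation: `matches` is a hypothetical name, all fields are treated as one flat dictionary, and a missing field is assumed never to match.

```python
from typing import Any, Mapping


def matches(doc: Mapping[str, Any], filters: Mapping[str, Any]) -> bool:
    """Return True only if every filtered field matches (AND across fields)."""
    for field, cond in filters.items():
        if field not in doc:
            return False  # assumption: a missing field never matches
        value = doc[field]
        if isinstance(cond, dict):
            # Operator form: every operator present must hold.
            if "gte" in cond and not value >= cond["gte"]:
                return False
            if "lte" in cond and not value <= cond["lte"]:
                return False
            if "contains" in cond and cond["contains"] not in value:
                return False
            if "in" in cond and value not in cond["in"]:
                return False
        elif value != cond:
            # Bare value: exact match.
            return False
    return True


doc = {"department": "Legal", "active": True, "priority": 5,
       "tags": ["sports", "news"], "doc_type": "segment"}

print(matches(doc, {"department": "Legal", "priority": {"gte": 3, "lte": 10}}))  # True
print(matches(doc, {"department": "Legal", "priority": {"lte": 4}}))             # False
```

The second call fails even though `department` matches, because conditions on different fields combine as AND.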

Parent-child documents + relationship boost

Ingest a document hierarchy using client_id and parent_ref to link chunks within a single batch:

curl -X POST localhost:4001/collections/media/ingest \
  -H 'Content-Type: application/json' \
  -d '{
    "chunks": [
      {
        "client_id": "src-001",
        "file_id": "video-001",
        "chunk_index": 0,
        "doc_type": "source",
        "text": "Premier League: Arsenal vs Chelsea",
        "metadata": {"asset_type": "video", "created_at": "2026-03-15T15:00:00Z"}
      },
      {
        "client_id": "seg-001",
        "file_id": "segment-001",
        "chunk_index": 0,
        "doc_type": "segment",
        "parent_ref": "src-001",
        "group_id": "src-001",
        "text": "Goal celebration, minute 34",
        "metadata": {"timerange_start": 2040.0, "timerange_end": 2055.0, "scene_type": "goal"}
      }
    ]
  }'

Then search with relationship boosting. Segments whose parent also matches the query get a score boost:

curl -X POST localhost:4001/collections/media/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "goal celebration",
    "relationship_boost": {"parent_weight": 0.3, "sibling_weight": 0.1, "mode": "max"}
  }'

TAMS time-range search

TAMS (Time-Addressable Media Store) is BBC R&D's open spec for media archives. Media is addressed by time, not by file. The data model: Source (logical content) → Flow (specific rendition) → Segment (time-bounded chunk with timerange_start/timerange_end).

Compass models this hierarchy via doc_type + parent_id + group_id. Ingest segments with time range metadata, then query by content and time:

curl -X POST localhost:4001/collections/media/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "goal celebration",
    "filters": {
      "doc_type": {"in": ["segment"]},
      "timerange_start": {"gte": 2040.0},
      "timerange_end": {"lte": 2100.0}
    },
    "relationship_boost": {"parent_weight": 0.3, "sibling_weight": 0.1}
  }'

This finds segments matching "goal celebration" that fall entirely within the 2040-2100 second window. Because the doc_type filter is a hard constraint, only segments are returned; relationship boosting raises segments whose parent or siblings also match the query.
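Note that a gte on timerange_start combined with an lte on timerange_end selects segments fully contained in the window. The sketch below contrasts that with an overlap test; the segment values and function names are made up for illustration.

```python
def contained(seg: dict, start: float, end: float) -> bool:
    """Segment lies entirely inside [start, end] (the filter shown above)."""
    return seg["timerange_start"] >= start and seg["timerange_end"] <= end


def overlaps(seg: dict, start: float, end: float) -> bool:
    """Segment shares any time with [start, end]."""
    return seg["timerange_start"] <= end and seg["timerange_end"] >= start


# Hypothetical segments for illustration.
goal = {"timerange_start": 2040.0, "timerange_end": 2055.0}
replay = {"timerange_start": 2095.0, "timerange_end": 2110.0}

print(contained(goal, 2040.0, 2100.0), overlaps(goal, 2040.0, 2100.0))      # True True
print(contained(replay, 2040.0, 2100.0), overlaps(replay, 2040.0, 2100.0))  # False True
```

If segments that merely overlap the window should match, the same documented operators can express it by swapping the bounds: `{"timerange_start": {"lte": 2100.0}, "timerange_end": {"gte": 2040.0}}`.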

Hybrid score weights

Control how FTS and semantic scores blend in hybrid mode. Useful when one signal matters more for your use case:

curl -X POST localhost:4001/collections/docs/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "quarterly earnings",
    "mode": "hybrid",
    "score_weights": {"rrf_k": 60.0, "fts_weight": 2.0, "semantic_weight": 0.5}
  }'

rrf_k is the RRF constant (default 60). Lower values amplify top-rank differences. fts_weight and semantic_weight control relative contribution (default 1.0 each). Set fts_weight: 2.0 to favor keyword matches, or semantic_weight: 2.0 when meaning matters more than exact terms.
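For intuition, the sketch below applies a common weighted form of Reciprocal Rank Fusion, where each retriever contributes weight / (rrf_k + rank) with 1-based ranks. This is an assumption about how the weights enter the sum, not Compass's confirmed formula; `weighted_rrf` and the document IDs are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(rankings: dict, weights: dict, rrf_k: float = 60.0) -> list:
    """Fuse ranked ID lists: score(d) = sum over retrievers of w / (rrf_k + rank)."""
    scores = defaultdict(float)
    for retriever, ranked_ids in rankings.items():
        w = weights.get(retriever, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):  # ranks start at 1
            scores[doc_id] += w / (rrf_k + rank)
    # Highest score first; ties broken by ID for a deterministic order.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered]


rankings = {"fts": ["a", "b", "c"], "semantic": ["d", "c", "a"]}

print(weighted_rrf(rankings, {"fts": 1.0, "semantic": 1.0}))  # ['a', 'c', 'd', 'b']
print(weighted_rrf(rankings, {"fts": 2.0, "semantic": 0.5}))  # ['a', 'c', 'b', 'd']
```

With equal weights, "d" (top semantic hit) outranks "b" (second FTS hit). Favoring FTS flips the two.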

Facets

# Get facet counts for all metadata fields
curl 'localhost:4001/collections/docs/facets'

# Facet counts scoped to a text query
curl 'localhost:4001/collections/docs/facets?query=compliance'

One-click model upgrade

# 1. Add a new vector space
curl -X POST localhost:4001/collections/docs/vector-spaces \
  -H 'Content-Type: application/json' \
  -d '{"name": "qwen3", "dims": 1024, "model": "Qwen/Qwen3-Embedding-8B"}'

# 2. Trigger re-embedding (uses external GPU endpoint for speed)
curl -X POST localhost:4001/collections/docs/vector-spaces/qwen3/rebuild \
  -H 'Content-Type: application/json' \
  -d '{"embed_endpoint": "http://gpu-server:8080/embed"}'

# 3. Check progress
curl localhost:4001/collections/docs/vector-spaces/qwen3/status

# 4. Swap the default (zero downtime, old space stays for rollback)
curl -X PUT localhost:4001/collections/docs/default-vector-space \
  -H 'Content-Type: application/json' \
  -d '{"name": "qwen3"}'

# 5. (Optional) Delete old space when you're confident
curl -X DELETE localhost:4001/collections/docs/vector-spaces/default

Docker

docker build -t compass .
docker run -p 4001:4001 -v "$(pwd)/data:/app/data" compass

Embedding models

Compass supports any embedding model via named vector spaces. Pick the right model for your use case:

Recommended open-weight models (May 2026)

For text search (documents, code, multilingual), use Harrier or Qwen3:

Model	Score	Benchmark	Dims	License	GPU	When to use
Harrier-OSS-v1-0.6B	~68	MTEB v2	768	MIT	8GB	Default for most deployments. Best quality-per-VRAM.
Qwen3-Embedding-8B	70.58	MTEB v2	32-4096	Apache 2.0	16GB+	When you need top-2 accuracy and have an A100/H100.
Harrier-OSS-v1-27B	74.3	MTEB v2	1024	MIT	48GB+	Maximum accuracy. Requires H100.

For multimodal (text queries finding images, video frames, PDFs):

Model	Score	Benchmark	Dims	License	GPU	When to use
Qwen3-VL-Embedding-2B	0.945	MMEB (cross-modal)	896+	Apache 2.0	8GB+	Best cross-modal accuracy. Handles text + image + video in one space.

For reranking (re-scoring top results after retrieval):

Model	Score	Benchmark	License	GPU	When to use
Qwen3-Reranker-8B	69.76	MTEB-R	Apache 2.0	16GB+	Best open-source reranker for multilingual + code.
Contextual AI Reranker v2	SOTA on QA	Various	Open source	8GB+	Best for Q&A-style retrieval.

CPU-only fallback (no GPU available, degraded mode):

Model	Score	Benchmark	Dims	License	When to use
BGE-small-en-v1.5	~63	MTEB	384	MIT	Local dev, CI/CD tests, or hardware with no GPU. Not recommended for production.

Typical setup

Most deployments need two vector spaces: one for text, one for multimodal (if applicable).

# Run HuggingFace TEI with the recommended text model
docker run -p 8080:80 --gpus all ghcr.io/huggingface/text-embeddings-inference \
  --model-id microsoft/harrier-oss-v1-0.6b

MTEB and MMEB are different benchmarks on different scales. MTEB scores are 0-100 (text tasks). MMEB scores are 0-1 (cross-modal retrieval). They cannot be compared directly.

Storage architecture

Compass keeps vector data and chunk metadata on disk, not in RAM.

data/{collection}/
├── collection.json                  # Collection metadata (name, dims, spaces)
├── relationships.bin                # Parent-child + sibling graph
├── tantivy/                         # BM25 inverted index (disk-backed)
└── vectors/
    ├── {space}.index                # USearch HNSW graph (mmap on read)
    ├── {space}.bin                  # Raw f32 vectors (mmap via MmapVectors)
    └── {space}.keymap               # HNSW key → chunk ID mapping

Vectors: Stored in a flat [u32 dims][u32 count][f32...] file, memory-mapped at query time. Adding vectors appends to the file and remaps — no full rewrite. At 1M vectors × 768 dims this is ~3GB on disk, near-zero RSS.
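The flat layout is simple enough to model in a few lines. The following sketch writes and reads a file in the [u32 dims][u32 count][f32...] shape using Python's standard library. Little-endian byte order and updating the count in place on append are assumptions; the function names are hypothetical and this is not Compass's MmapVectors code.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<II")  # assumption: little-endian u32 dims, u32 count


def file_size_bytes(dims: int, count: int) -> int:
    """Header plus count * dims f32 values (4 bytes each)."""
    return HEADER.size + count * dims * 4


def append_vectors(path: str, dims: int, vectors) -> None:
    """Append vectors at the end of the file, then bump the count in the header."""
    vectors = list(vectors)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(HEADER.pack(dims, 0))
    with open(path, "r+b") as f:
        file_dims, count = HEADER.unpack(f.read(HEADER.size))
        if file_dims != dims or any(len(v) != dims for v in vectors):
            raise ValueError("dimension mismatch")
        f.seek(0, os.SEEK_END)
        for vec in vectors:
            f.write(struct.pack(f"<{dims}f", *vec))
        f.seek(0)
        f.write(HEADER.pack(dims, count + len(vectors)))


def read_vector(path: str, index: int) -> tuple:
    """Read one vector through a read-only memory map."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        dims, count = HEADER.unpack_from(mm, 0)
        if not 0 <= index < count:
            raise IndexError(index)
        return struct.unpack_from(f"<{dims}f", mm, HEADER.size + index * dims * 4)


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
append_vectors(path, 3, [(1.0, 2.0, 3.0)])
append_vectors(path, 3, [(0.5, 0.25, -1.0)])  # second call only appends
print(read_vector(path, 1))              # (0.5, 0.25, -1.0)
print(file_size_bytes(768, 1_000_000))   # 3072000008
```

The last line reproduces the ~3GB figure: 1M vectors × 768 dims × 4 bytes, plus an 8-byte header.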

HNSW index: Built incrementally via USearch .add() + .save(). Loaded via .load() for mutation or .view() for read-only mmap. The graph structure is separate from the raw vectors.

Chunk metadata: Persisted via redb (pure Rust, ACID, MVCC). Point lookups by chunk ID during search result assembly. Batch inserts during ingestion.

Ingestion path: New vectors are appended to the mmap file, inserted into the HNSW graph incrementally, and chunk metadata is written to redb — all without cloning existing data.

Throughput and scaling

Query throughput. USearch HNSW serves around 15k QPS per instance on a 16-core box at p99 < 50ms for top-10 retrieval. For very high QPS workloads, shard collections across multiple Compass instances behind a load balancer.

Indexing throughput. Point embed_endpoint at a GPU-backed HuggingFace TEI or vLLM cluster. A single A10G handles around 1,500 docs/sec on Qwen3-Embedding-8B. Scale linearly by adding GPU replicas.

# Spin up TEI with Qwen3-Embedding-8B
docker run -p 8080:80 --gpus all ghcr.io/huggingface/text-embeddings-inference \
  --model-id Qwen/Qwen3-Embedding-8B

# Point Compass at it during rebuild or ingestion
curl -X POST localhost:4001/collections/docs/vector-spaces/qwen3/rebuild \
  -H 'Content-Type: application/json' \
  -d '{"embed_endpoint": "http://localhost:8080/embed"}'
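For rough capacity planning, the figures quoted above can be turned into a back-of-the-envelope estimate. The sketch takes the stated 15k QPS per instance and 1,500 docs/sec per A10G at face value and assumes perfectly linear scaling; the function names and the example workloads are hypothetical.

```python
import math


def instances_for_qps(target_qps: float, qps_per_instance: float = 15_000) -> int:
    """Number of Compass instances needed, assuming queries shard evenly."""
    return math.ceil(target_qps / qps_per_instance)


def reembed_hours(num_docs: int, gpu_replicas: int, docs_per_sec_per_gpu: float = 1_500) -> float:
    """Hours to re-embed num_docs, assuming throughput scales linearly with replicas."""
    return num_docs / (docs_per_sec_per_gpu * gpu_replicas) / 3600


print(instances_for_qps(40_000))                 # 3
print(round(reembed_hours(10_000_000, 1), 2))    # 1.85
print(round(reembed_hours(10_000_000, 4), 2))    # 0.46
```

Real throughput depends on document length, batch size, and network overhead, so treat these numbers as a starting point for sizing rather than a guarantee.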

API

POST   /collections                                    Create collection
GET    /collections                                    List collections
GET    /collections/:name                              Get collection info
DELETE /collections/:name                              Delete collection + data

POST   /collections/:name/ingest                       Bulk ingest chunks
POST   /collections/:name/search                       Search (fts|semantic|hybrid)
GET    /collections/:name/facets                       Facet counts

POST   /collections/:name/vector-spaces                Add a vector space
GET    /collections/:name/vector-spaces                List vector spaces
DELETE /collections/:name/vector-spaces/:space         Remove a vector space
POST   /collections/:name/vector-spaces/:space/rebuild Trigger re-embedding
GET    /collections/:name/vector-spaces/:space/status  Rebuild progress
PUT    /collections/:name/default-vector-space         Switch default space

GET    /health                                         Health check

Contributing

See CONTRIBUTING.md for development setup, PR guidelines, and commit conventions.

Security

To report a vulnerability, email security@runcaptain.com. See SECURITY.md for details.

License

Apache 2.0. See LICENSE.
