Near-optimal online vector quantization for OpenClaw context compression, based on the TurboQuant algorithm from Google Research (arXiv:2504.19874).
| Component | Status | Description |
|---|---|---|
| Library API | ✅ Ready | Core quantization algorithms fully implemented |
| CLI | ✅ Ready | `benchmark`, `compress`, and `retrieve` commands available |
| Agent Skill | ✅ Ready | CLI commands can be used independently by agents |
| Context Engine Plugin | 🚧 WIP | Interface defined, core integration logic not yet implemented |
TurboQuant achieves near-optimal distortion (within ~2.7× of the information-theoretic lower bound) using a simple two-stage pipeline (sketched in code below):
- Random Rotation — Apply a random orthogonal matrix (Haar measure via QR decomposition) to spread information uniformly across coordinates.
- Scalar Quantization — Quantize each rotated coordinate independently using a Lloyd-Max codebook optimized for the Beta distribution of coordinates on the unit hypersphere.
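A minimal NumPy sketch of the two stages, assuming a unit-norm input. The uniform `centroids` grid is a stand-in for the Lloyd-Max codebook the library actually computes, and the function names are illustrative rather than the library API:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_rotation(dim: int) -> np.ndarray:
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # sign correction keeps the distribution Haar

def scalar_quantize(v: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each coordinate to the index of its nearest centroid."""
    return np.abs(v[:, None] - centroids[None, :]).argmin(axis=1)

d = 128
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                     # unit-norm input vector

R = random_rotation(d)                     # stage 1: spread information evenly
centroids = np.linspace(-0.3, 0.3, 16)     # placeholder for a 4-bit Lloyd-Max codebook
codes = scalar_quantize(R @ x, centroids)  # stage 2: per-coordinate quantization
x_hat = R.T @ centroids[codes]             # dequantize: look up centroids, undo rotation
```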
Two quantization modes are provided:
| Mode | Use Case | Description |
|---|---|---|
| MSE | Reconstruction | Minimizes mean squared error via Lloyd-Max scalar quantization at b bits per coordinate |
| Product | Inner-product estimation | Uses MSE at (b−1) bits + 1-bit QJL (Quantized Johnson-Lindenstrauss) on the residual for unbiased inner-product estimation (see the sketch below) |
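To make the Product row concrete, here is a hedged sketch of that decomposition: encode with an MSE codebook at b−1 bits, then keep one sign bit per residual coordinate. Storing the residual norm is an assumption here (QJL-style estimators need a norm to be unbiased), and the names are illustrative, not the library API:

```python
import numpy as np

def product_mode_encode(v: np.ndarray, centroids: np.ndarray):
    """Sketch of Product mode: (b-1)-bit MSE codes, plus 1-bit signs
    and the residual norm for the QJL correction term."""
    codes = np.abs(v[:, None] - centroids[None, :]).argmin(axis=1)  # MSE stage
    residual = v - centroids[codes]                                 # what MSE missed
    return codes, np.sign(residual), np.linalg.norm(residual)       # 1-bit QJL stage
```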
Requires Python ≥ 3.13 and uv.
```bash
# Clone the repository
git clone https://github.com/openclaw/openclaw-turboquant.git
cd openclaw-turboquant

# Install with uv
uv sync
```

```python
import numpy as np
from openclaw_turboquant import TurboQuantMSE, TurboQuantProd
# MSE quantization (for reconstruction)
mse_q = TurboQuantMSE(dim=128, bit_width=4, seed=42)
x = np.random.randn(128)
compressed = mse_q.quantize(x)
reconstructed = mse_q.dequantize(compressed)
# Inner-product quantization
prod_q = TurboQuantProd(dim=128, bit_width=4, seed=42)
x, y = np.random.randn(128), np.random.randn(128)
cx, cy = prod_q.quantize(x), prod_q.quantize(y)
ip_estimate = prod_q.estimate_inner_product(cx, cy)
```

```python
import numpy as np
from openclaw_turboquant.context_engine import ContextStore

store = ContextStore(dim=128, bit_width=4, seed=42)

# Any 128-dimensional embeddings work here
embedding = np.random.randn(128)
query_embedding = np.random.randn(128)

store.ingest("key1", embedding, "Some text content", metadata={"source": "doc.md"})

# Retrieve top-k similar entries
results = store.retrieve_top_k(query_embedding, k=5)

# Assemble context within token budget
context = store.assemble_context(query_embedding, max_tokens=4096)

# Compact the store (keep 50% most relevant entries)
store.compact(keep_ratio=0.5, query_embedding=query_embedding)
```

```bash
# Run benchmarks
openclaw-turboquant benchmark --dim 128 --bits 4
# Compress vectors from a .npy file
openclaw-turboquant compress --input vectors.npy --output compressed.npz --bits 4
# Retrieve similar vectors
openclaw-turboquant retrieve --store compressed.npz --query query.npy --top-k 5
```

The `plugin/` directory contains a Context Engine plugin that compresses embeddings during the `ingest` → `assemble` → `compact` → `afterTurn` lifecycle:

- `plugin/openclaw.plugin.json` — Plugin manifest (`kind: context-engine`)
- `plugin/index.ts` — TypeScript entry point registering the `turboquant-engine`

**Note:** The plugin interface is defined, but the core integration logic (embedding API calls, Python CLI bridge) is not yet implemented. Contributions welcome!
Configuration options (via plugin settings):
| Parameter | Default | Description |
|---|---|---|
| `bitWidth` | 4 | Bits per coordinate (1–8) |
| `embeddingDim` | 128 | Vector dimension |
| `topK` | 10 | Number of results for retrieval |
| `compactKeepRatio` | 0.5 | Fraction of entries kept during compaction |
The `skills/turboquant/SKILL.md` skill provides AI agents with instructions for using the TurboQuant CLI and library API.
After random rotation, each coordinate of a unit-norm vector follows a (shifted and scaled) Beta distribution:

$$\frac{1 + x_i}{2} \sim \mathrm{Beta}\!\left(\frac{d-1}{2},\ \frac{d-1}{2}\right),$$

where $d$ is the vector dimension.
The Lloyd-Max algorithm iteratively optimizes codebook centroids and decision boundaries to minimize expected distortion under this distribution.
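A sample-based sketch of that iteration; the library presumably optimizes against the analytic Beta density, while this version uses Monte Carlo samples of unit-vector coordinates:

```python
import numpy as np

def lloyd_max(samples: np.ndarray, bits: int, iters: int = 50) -> np.ndarray:
    """Alternate nearest-centroid assignment (boundaries at centroid
    midpoints) with centroid re-estimation (conditional means)."""
    k = 2 ** bits
    centroids = np.quantile(samples, (np.arange(k) + 0.5) / k)  # spread-out init
    for _ in range(iters):
        assign = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = samples[assign == j].mean()
    return centroids

# Coordinates of random unit vectors in dimension d follow the Beta law above
rng = np.random.default_rng(0)
d = 128
g = rng.standard_normal((50_000, d))
coords = (g / np.linalg.norm(g, axis=1, keepdims=True))[:, 0]
codebook = lloyd_max(coords, bits=4)
```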
For inner-product estimation, TurboQuant uses a 1-bit Quantized Johnson-Lindenstrauss (QJL) projection:

$$\widehat{\langle x, y \rangle} = \sqrt{\frac{\pi}{2}} \cdot \frac{\lVert x \rVert}{m} \sum_{i=1}^{m} \operatorname{sign}(\langle s_i, x \rangle)\,\langle s_i, y \rangle,$$

where $s_1, \dots, s_m$ are independent standard Gaussian vectors and only the sign bits and $\lVert x \rVert$ are stored for $x$. The estimator is unbiased because $\mathbb{E}\big[\operatorname{sign}(\langle s, x \rangle)\,\langle s, y \rangle\big] = \sqrt{2/\pi}\,\langle x, y \rangle / \lVert x \rVert$ for a standard Gaussian $s$.
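A self-contained sketch of this estimator, using illustrative names; with enough projections the estimate concentrates around the true inner product:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 64, 8192
x, y = rng.standard_normal(d), rng.standard_normal(d)

S = rng.standard_normal((m, d))    # Gaussian projection matrix
sign_bits = np.sign(S @ x)         # 1 bit per projection stored for x, plus ||x||
estimate = np.sqrt(np.pi / 2) * np.linalg.norm(x) / m * (sign_bits @ (S @ y))

print(f"estimate={estimate:.2f}  true={x @ y:.2f}")
```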
Run with `uv run pytest benchmarks/ --benchmark-only`:
| Operation | Dimension | Mean time |
|---|---|---|
| MSE quantize | 64 | ~4.6 µs |
| MSE dequantize | 64 | ~1.2 µs |
| MSE batch (100 vectors) | 64 | ~473 µs |
| MSE quantize | 256 | ~9.4 µs |
| Product quantize | 64 | ~11 µs |
| Product dequantize | 64 | ~3.6 µs |
| Product inner product | 64 | ~4.1 µs |
| QJL quantize | 64 | ~2.4 µs |
| QJL dequantize | 64 | ~1.2 µs |
| Context Store ingest | 64 | ~12 µs |
| Context Store retrieve (100 entries) | 64 | ~406 µs |
```bash
# Run tests
uv run pytest

# Run benchmarks
uv run pytest benchmarks/ --benchmark-only -v

# Lint & format
uv run ruff check src/ tests/ benchmarks/
uv run ruff format src/ tests/ benchmarks/

# Type check
uv run mypy src/
```

License: MIT