bitloops-embeddings is a managed local embeddings runtime for Bitloops. It provides:
- a one-shot CLI for simple embedding requests
- a long-lived local HTTP server for repeated requests
- a long-lived stdio daemon for process-managed IPC
- release packaging for major desktop and server operating systems
The first release is intentionally operational rather than retrieval-quality-complete. It focuses on a stable interface, model bootstrapping, hello-world inference, and releasable artefacts.
The initial public model identifier is `bge-m3`.

- Public model id: `bge-m3`
- Upstream model id: `BAAI/bge-m3`
- Backend: `sentence-transformers`
- Device:
  - Apple Silicon macOS: `mps` when available, otherwise CPU
  - All other current targets: CPU
- Provisioning: first-run download into a local cache directory
The command and HTTP layers are written against an internal backend registry so additional models or inference backends can be added later without changing the user-facing contracts.
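As an illustration of that registry idea, here is a minimal sketch. The names (`ModelSpec`, `register_model`, `resolve_model`) are hypothetical and not taken from the real codebase; the field values mirror the model details documented above.

```python
# Hypothetical backend-registry sketch: user-facing code resolves a public
# model id to a spec, so new models/backends only require a new registration.
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class ModelSpec:
    public_id: str     # e.g. "bge-m3"
    upstream_id: str   # e.g. "BAAI/bge-m3"
    backend: str       # e.g. "sentence-transformers"
    dimensions: int

_REGISTRY: Dict[str, ModelSpec] = {}

def register_model(spec: ModelSpec) -> None:
    _REGISTRY[spec.public_id] = spec

def resolve_model(public_id: str) -> ModelSpec:
    if public_id not in _REGISTRY:
        raise KeyError(f"unknown model id: {public_id}")
    return _REGISTRY[public_id]

register_model(ModelSpec("bge-m3", "BAAI/bge-m3", "sentence-transformers", 1024))
```

Because callers only ever see the public id, the CLI and HTTP contracts stay stable when a new spec is registered.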
Current hardware acceleration support is intentionally limited in v0.1.0:
- `aarch64-apple-darwin`: uses Apple Metal Performance Shaders (`mps`) automatically when available; falls back to CPU if MPS is unavailable
- `x86_64-apple-darwin`: CPU only
- `x86_64-unknown-linux-gnu`: CPU only
- `aarch64-unknown-linux-gnu`: CPU only
- `x86_64-pc-windows-msvc`: CPU only
The current release does not expose CUDA, ROCm, DirectML, or Intel GPU acceleration paths yet.
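The per-target behaviour above reduces to a small decision: a sketch of that logic, with `select_device` and its probe parameter as my own names (on a real install the probe would be something like `torch.backends.mps.is_available()`):

```python
# Illustrative device selection matching the documented v0.1.0 behaviour:
# "mps" only on Apple Silicon macOS when MPS is actually available.
import platform
from typing import Callable

def select_device(machine: str, system: str, mps_probe: Callable[[], bool]) -> str:
    """Return "mps" only on Apple Silicon macOS when the probe reports MPS."""
    if system == "Darwin" and machine == "arm64" and mps_probe():
        return "mps"
    return "cpu"  # every other current target runs on CPU

# Example: probe hard-wired to False always yields the CPU fallback.
device = select_device(platform.machine(), platform.system(), lambda: False)
```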
- Python 3.11 or 3.12
- `pip`
Create an environment and install the project with development dependencies:
```shell
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
```

Run the test suite:

```shell
pytest
```

Show the available commands:

```shell
bitloops-embeddings --help
```

Generate a single embedding:

```shell
bitloops-embeddings embed --model bge-m3 --input "Hello World"
```

Example response:

```json
{
  "model_id": "bge-m3",
  "dimensions": 1024,
  "embeddings": [[0.123, -0.456, 0.789]],
  "runtime": {
    "name": "bitloops-embeddings",
    "version": "0.1.0"
  }
}
```

Write the same JSON response to a file as well:

```shell
bitloops-embeddings embed \
  --model bge-m3 \
  --input "Hello World" \
  --output ./embedding.json
```

Inspect model metadata without loading the model:
```shell
bitloops-embeddings describe --model bge-m3
```

Start the local server:
```shell
bitloops-embeddings serve --model bge-m3
```

Defaults:

- host: `127.0.0.1`
- port: `7719`
- max batch size: `32`
Override the bind target:
```shell
bitloops-embeddings serve --model bge-m3 --host 127.0.0.1 --port 7719
```

Configure logging for long-lived modes:
```shell
bitloops-embeddings serve \
  --model bge-m3 \
  --log-level debug \
  --log-file ./bitloops-embeddings.log
```

Health:

```shell
curl http://127.0.0.1:7719/health
```

Embed:

```shell
curl -X POST http://127.0.0.1:7719/embed \
  -H "content-type: application/json" \
  -d '{"texts":["Hello World"]}'
```

Response shape:
```json
{
  "model_id": "bge-m3",
  "dimensions": 1024,
  "embeddings": [[0.123, -0.456, 0.789]],
  "runtime": {
    "name": "bitloops-embeddings",
    "version": "0.1.0"
  }
}
```

Error shape:

```json
{
  "error": {
    "code": "runtime_error",
    "message": "..."
  }
}
```
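The same endpoint can be driven from Python with only the standard library. This is a sketch under the response and error shapes documented above; the helper names (`parse_embed_response`, `embed`) are mine, and `parse_embed_response` can be exercised without a running server.

```python
# Minimal client sketch for the local /embed endpoint.
import json
import urllib.request

def parse_embed_response(body: dict) -> list:
    """Return the vectors, or raise on the documented error shape."""
    if "error" in body:
        err = body["error"]
        raise RuntimeError(f"{err['code']}: {err['message']}")
    return body["embeddings"]

def embed(texts: list, host: str = "127.0.0.1", port: int = 7719) -> list:
    req = urllib.request.Request(
        f"http://{host}:{port}/embed",
        data=json.dumps({"texts": texts}).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return parse_embed_response(json.load(resp))
```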
Start the stdio daemon:

```shell
bitloops-embeddings daemon --model bge-m3
```

The daemon:

- loads the model once and keeps it warm
- reads newline-delimited JSON requests from `stdin`
- writes newline-delimited JSON protocol responses only to `stdout`
- writes logs and diagnostics to the configured log sink or, if needed, to `stderr`
Use a custom log file:
```shell
bitloops-embeddings daemon \
  --model bge-m3 \
  --log-level info \
  --log-file ./bitloops-embeddings-daemon.log
```

Ready event:

```json
{"event":"ready","protocol":1,"capabilities":["embed","ping","health","shutdown"]}
```

Example request:

```json
{"id":"1","cmd":"embed","texts":["hello","world"],"model":"bge-m3"}
```

Example response:

```json
{"id":"1","ok":true,"vectors":[[0.12,0.98],[-0.44,0.07]],"model":"bge-m3"}
```

Example error:

```json
{"id":"7","ok":false,"error":{"code":"UNKNOWN_COMMAND","message":"unsupported cmd: frobnicate"}}
```

The daemon exits cleanly on shutdown or when stdin reaches EOF.
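The wire format above can be framed and parsed with a few lines of Python. This sketch covers only the newline-delimited JSON framing shown in the examples; the helper names are mine, and wiring them to the daemon's actual stdin/stdout (e.g. via `subprocess.Popen`) is left out.

```python
# Helpers for the daemon's newline-delimited JSON protocol.
import json

def encode_request(req_id: str, texts: list, model: str = "bge-m3") -> bytes:
    """One JSON object per line, suitable for writing to the daemon's stdin."""
    msg = {"id": req_id, "cmd": "embed", "texts": texts, "model": model}
    return (json.dumps(msg) + "\n").encode("utf-8")

def decode_response(line: bytes) -> dict:
    """Parse one stdout line; raise if the daemon reported an error."""
    msg = json.loads(line.decode("utf-8"))
    if msg.get("ok") is False:
        err = msg.get("error", {})
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    return msg
```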
Model cache resolution order:
1. `--cache-dir`
2. `BITLOOPS_EMBEDDINGS_CACHE_DIR`
3. platform default cache directory via `platformdirs`

Examples:

- macOS: `~/Library/Caches/bitloops-embeddings`
- Linux: `~/.cache/bitloops-embeddings`
- Windows: `%LOCALAPPDATA%/bitloops-embeddings/Cache`
Release packaging uses PyInstaller `--onedir` bundles. Each archive contains:

- the launchable runtime bundle
- `README.md`
- `LICENSE`
Create a local packaged artefact:
```shell
python scripts/package_release.py --target x86_64-apple-darwin
```

Run the real-model smoke test against an installed console script or packaged executable:

```shell
python scripts/real_backend_smoke.py --binary bitloops-embeddings
```

The repository includes two workflows:
- `ci.yml`
  - installs dependencies
  - runs unit and integration tests
  - runs compile checks
  - validates the CLI help output
- `release.yml`
  - builds native bundles for the target matrix
  - packages archives
  - uploads artefacts
  - creates a GitHub Release for `v*.*.*` tags
- The first `embed` or `serve` invocation downloads model files into the local cache. This can take a while on a cold machine.
- The first `daemon` invocation also downloads model files into the local cache if they are not already present.
- If model loading fails, check network access to Hugging Face and confirm the cache directory is writable.
- Long-lived modes support `--log-level` and `--log-file`. Without `--log-file`, `serve` and `daemon` use a best-effort OS log sink and fall back to `stderr` if the native sink is unavailable.
- The runtime does not log input texts by default.