fix(ruvector): CLI works on fresh DBs via meta sidecar (#417) #435
Open
Six CLI commands crashed on every fresh database produced by
`ruvector create`:

    $ ruvector create /tmp/x.db -d 384
    $ ruvector insert /tmp/x.db /tmp/v.json
    SyntaxError: Unexpected token 'r', "redb…" is not valid JSON

Root cause: `bin/cli.js` `insert`, `search`, `stats`, `export`, and
`import` all did `JSON.parse(fs.readFileSync(dbPath, 'utf8'))` to
recover the dimension. But `<dbPath>` is a redb (Rust binary) file
managed by `@ruvector/core` — not a JSON document. The first byte
("r") tripped the parser before any other code ran.
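
The failure mode can be reproduced without ruvector at all. The sketch below is only an illustration: it writes a few arbitrary bytes that begin with the ASCII text "redb" (a stand-in, not a real redb file) and feeds them to the same `JSON.parse(fs.readFileSync(..., 'utf8'))` pattern. The helper name `tryParseAsJson` and the temp-file layout are assumptions made for this example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Stand-in for a redb file: arbitrary binary content whose first bytes
// spell "redb". A real redb file has far more structure than this.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'redb-parse-'));
const fakeDbPath = path.join(dir, 'x.db');
fs.writeFileSync(
  fakeDbPath,
  Buffer.from([0x72, 0x65, 0x64, 0x62, 0x1a, 0x0a, 0x00, 0x01])
);

// Hypothetical helper mirroring what the old handlers did with dbPath.
function tryParseAsJson(filePath) {
  try {
    JSON.parse(fs.readFileSync(filePath, 'utf8'));
    return { ok: true };
  } catch (err) {
    return { ok: false, name: err.name, message: err.message };
  }
}

const result = tryParseAsJson(fakeDbPath);
console.log(result.ok, result.name);

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

The exact wording of the error message depends on the Node version, so the sketch only inspects the error name.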
Compounding: the same handlers called methods that don't exist on
`VectorDBWrapper` (`db.load`, `db.save`, `db.stats`) and didn't
`await` the async wrapper methods that do exist (`insert`,
`insertBatch`, `search`, `len`).
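
Why the missing `await` matters can be shown with a small stand-alone sketch. `fakeSearch` below is a hypothetical stand-in for an async wrapper method (it is not the real `VectorDBWrapper` API); it resolves after a fixed delay so the two timings can be compared.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical stand-in for an async wrapper method such as `search`.
function fakeSearch(delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(['hit']), delayMs));
}

async function measure(delayMs) {
  // Without `await`: the call hands back a pending promise immediately,
  // so the elapsed time says nothing about the actual work.
  const t0 = performance.now();
  const pending = fakeSearch(delayMs);
  const droppedMs = performance.now() - t0;

  // With `await`: the timer stops only once the work has completed.
  const t1 = performance.now();
  const hits = await fakeSearch(delayMs);
  const awaitedMs = performance.now() - t1;

  await pending; // settle the first call so nothing is left dangling
  return {
    droppedMs,
    awaitedMs,
    pendingWasPromise: pending instanceof Promise,
    hits,
  };
}

const measurement = measure(50);
measurement.then((m) => console.log(m));
```

In the un-awaited case the caller also holds a promise rather than the result, so any code that reads hits or counts from it sees nothing useful.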
Fix:
- Persist construction args (dimensions, metric, schema version)
  in `<dbPath>.meta.json` from `create`. `insert`/`search`/`stats`
  read the sidecar and pass them straight to the wrapper
  constructor; no more JSON-parsing of redb bytes.
- Drop calls to the phantom `db.load`/`db.save`/`db.stats` API.
  Persistence is automatic via `storagePath`; counting goes through
  `await db.len()`.
- Make every CLI handler `async` and `await` the wrapper calls.
  Includes `benchmark`, whose previously-dropped promises meant the
  reported insert/search rates were just spinner timing.
- Coerce numeric ids to strings inside `insert` (the native binding
  rejects integer ids).
- Surface a clear, actionable error when a DB exists without a
  sidecar (e.g. created by an older CLI), instead of an opaque
  parse failure.
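
The sidecar, orphan-DB, and id-coercion points above can be sketched roughly as follows. This is a minimal illustration, not the code in `bin/cli.js`: the helper names (`writeSidecar`, `readSidecar`, `coerceIds`), the `schemaVersion` field name and value, and the error wording are all assumptions made for the example. Only the `<dbPath>.meta.json` location and the stored values (dimensions, metric, schema version) come from the description.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Assumed field name and value; the PR only says a schema version is stored.
const SCHEMA_VERSION = 1;

const sidecarPath = (dbPath) => `${dbPath}.meta.json`;

// Called from `create`: persist the construction args next to the DB file.
function writeSidecar(dbPath, { dimensions, metric }) {
  const meta = { schemaVersion: SCHEMA_VERSION, dimensions, metric };
  fs.writeFileSync(sidecarPath(dbPath), JSON.stringify(meta, null, 2));
  return meta;
}

// Called from `insert`/`search`/`stats`: never parse the DB file itself.
function readSidecar(dbPath) {
  const metaPath = sidecarPath(dbPath);
  if (!fs.existsSync(metaPath)) {
    // Illustrative wording only; the real message may differ.
    throw new Error(
      `No sidecar metadata found at ${metaPath}. ` +
        'The database may have been created by an older CLI.'
    );
  }
  return JSON.parse(fs.readFileSync(metaPath, 'utf8'));
}

// The native binding is described as rejecting integer ids, so normalize.
function coerceIds(vectors) {
  return vectors.map((v) => ({ ...v, id: String(v.id) }));
}

// Demo against placeholder files in a temp directory (not real redb files).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidecar-demo-'));
const dbPath = path.join(dir, 'x.db');
fs.writeFileSync(dbPath, 'redb');
writeSidecar(dbPath, { dimensions: 8, metric: 'cosine' });
const loaded = readSidecar(dbPath);

const orphanPath = path.join(dir, 'orphan.db');
fs.writeFileSync(orphanPath, 'redb');
let orphanError = null;
try {
  readSidecar(orphanPath);
} catch (err) {
  orphanError = err;
}

const ids = coerceIds([
  { id: 1, vector: [0.1, 0.2] },
  { id: 'b', vector: [0.3, 0.4] },
]).map((v) => v.id);

console.log(loaded, orphanError && orphanError.message, ids);
fs.rmSync(dir, { recursive: true, force: true });
```

Keeping the metadata in a separate JSON file means the CLI can learn the dimension and metric without ever opening the binary file as text.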
Verified end-to-end with a new test on Node 22.22.2:

    $ node test/cli-fresh-db.test.mjs
    ok: `ruvector create` exits 0
    ok: redb file exists at dbPath
    ok: sidecar metadata file exists
    ok: sidecar.dimensions = 8
    ok: sidecar.metric = cosine
    ok: `ruvector insert` exits 0
    ok: insert does not crash JSON.parsing the redb binary
    ok: `ruvector search` exits 0
    ok: search prints `Found N results`
    ok: search renders at least one hit row
    ok: `ruvector stats` exits 0
    ok: stats prints Vector Count
    ok: stats fails fast on orphan DB without sidecar
    ok: orphan-DB error message mentions sidecar
    ruvector fresh-DB CLI smoke OK (issue #417)

Out of scope (deliberately): the `export`/`import` handlers also
called the same phantom API. Those need the wrapper to grow an
enumeration method (`db.entries()` or similar) before they can do
honest work — file-only metadata-export is misleading. Tracked in a
follow-up; the existing handlers are left untouched here.
The ONNX-bundle half of #417 ships in a separate PR (#434, "fix(ruvector): bundle ONNX runtime into dist/ on build (#354)").
Closes #417
Co-Authored-By: claude-flow <ruv@ruv.net>
Summary
- Issue #417 ("… `<database>` path are broken on freshly-created DBs (and `embed text` fails on missing ONNX bundle)"): six CLI commands (`insert`, `search`, `stats`, `export`, `import`, plus the `benchmark` rate numbers) crashed on every fresh database produced by `ruvector create`. They tried to `JSON.parse(fs.readFileSync(dbPath))` to recover the dimension, but `<dbPath>` is a redb (Rust binary) file managed by `@ruvector/core`, not JSON. The first byte ("r") tripped the parser. They also called methods that don't exist on `VectorDBWrapper` (`db.load`, `db.save`, `db.stats`) and never awaited the async ones that do.
- The ONNX-bundle half of #417 (build script not copying WASM payload) ships separately in #434, "fix(ruvector): bundle ONNX runtime into dist/ on build (#354)".

Fix
- Persist construction args in `<dbPath>.meta.json` from `create`; `insert`/`search`/`stats` read the sidecar instead of parsing redb bytes.
- Drop calls to the phantom `db.load`/`db.save`/`db.stats`. Persistence is automatic via `storagePath`; counting goes through `await db.len()`.
- Every CLI handler is now `async` and awaits the wrapper. `benchmark` numbers are now real (previously the dropped promises meant the rate was just spinner timing).
- Coerce numeric ids to strings inside `insert`; the native binding rejects integer ids.
- New test `test/cli-fresh-db.test.mjs` exercises the full create → insert → search → stats path against a real redb file in `os.tmpdir()`.

Proof
Reproduction from the issue (v0.2.25, Node 22.22.2) before this PR: see the `create` / `insert` transcript ending in the `SyntaxError` in the commit message above.
After this PR (`node test/cli-fresh-db.test.mjs`): see the passing test output in the commit message above.

Test plan
- `create` writes `<dbPath>.meta.json` with `dimensions` + `metric` + schema version.
- `insert` reopens the redb via `storagePath` (no `JSON.parse`), reads dimensions from the sidecar, coerces numeric ids, and reports `Total vectors: N` from `await db.len()`.
- `search` returns real hits (not undefined), respects `-k`, and applies the `-t` threshold post-hoc.
- `stats` prints the actual count from `await db.len()`.
- `benchmark` waits on every `await db.search()`, so the reported QPS reflects native completion (not spinner timing).
- `npx ruvector create/insert/search` against a 384-dim DB on a clean install (requires the ONNX bundling fix in #434 to land for `npx ruvector embed text` to also work end-to-end).

Out of scope (deliberately)
`export`/`import` also called the phantom `db.save`/`db.load` API. Honest export needs the wrapper to grow an enumeration method (`db.entries()` or similar) before the handler can do real work; file-only metadata export would mislead users. Those handlers are left untouched here and tracked separately.

🤖 Generated with claude-flow