
Refactor sparse vector CLI args to be independent from dense vectors #133

Open
qdrant-cloud-bot wants to merge 1 commit into dev from refactor-sparse-vector-args


@qdrant-cloud-bot

Contributor

Summary

Previously there was no convenient way to create a collection with both dense and sparse vectors in bfb. Sparse vectors reused the dense --dim parameter (or --sparse-dim) together with --sparse-vectors <SPARSITY>, which is awkward because the index range of a sparse vector is usually much larger than the dimensionality of a dense one.

Sparse vectors are now configured independently from dense vectors:

  • --sparse-vectors — boolean flag to enable sparse vectors (can be combined with dense vectors)
  • --sparse-vocab-size <N> — vocabulary size, i.e. the range of possible indices (default 100000)
  • --sparse-avg-dim <N> — average number of non-zero values per sparse vector (default 32)

Passing --sparse-vocab-size or --sparse-avg-dim implies --sparse-vectors.
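A minimal sketch of how the defaults and the implication rule could be modeled, assuming a plain struct of optional values in place of whatever argument parser bfb actually uses. The `SparseArgs` struct, its fields and the constant names are hypothetical; the method names follow the `use_sparse_vectors()`, `sparse_vocab_size()` and `sparse_avg_dim()` helpers mentioned in the implementation notes, but their bodies here are illustrative only.

```rust
// Hypothetical stand-in for bfb's parsed CLI arguments. Field names mirror
// the flags in this PR; the real struct (and how it is parsed) may differ.
#[derive(Debug, Default)]
struct SparseArgs {
    sparse_vectors: bool,           // --sparse-vectors
    sparse_vocab_size: Option<u32>, // --sparse-vocab-size <N>
    sparse_avg_dim: Option<u32>,    // --sparse-avg-dim <N>
}

// Defaults as stated in the PR description.
const DEFAULT_SPARSE_VOCAB_SIZE: u32 = 100_000;
const DEFAULT_SPARSE_AVG_DIM: u32 = 32;

impl SparseArgs {
    /// Enabled by the flag itself, or implied by either sizing option
    /// being set explicitly.
    fn use_sparse_vectors(&self) -> bool {
        self.sparse_vectors || self.sparse_vocab_size.is_some() || self.sparse_avg_dim.is_some()
    }

    /// Index range to sample from, falling back to the default.
    fn sparse_vocab_size(&self) -> u32 {
        self.sparse_vocab_size.unwrap_or(DEFAULT_SPARSE_VOCAB_SIZE)
    }

    /// Average number of non-zeros, falling back to the default.
    fn sparse_avg_dim(&self) -> u32 {
        self.sparse_avg_dim.unwrap_or(DEFAULT_SPARSE_AVG_DIM)
    }
}

fn main() {
    // Equivalent of `--sparse-avg-dim 64` without an explicit `--sparse-vectors`.
    let args = SparseArgs { sparse_avg_dim: Some(64), ..Default::default() };
    assert!(args.use_sparse_vectors());
    assert_eq!(args.sparse_vocab_size(), 100_000);
    assert_eq!(args.sparse_avg_dim(), 64);

    // No sparse options at all: a dense-only run.
    assert!(!SparseArgs::default().use_sparse_vectors());
    println!("{args:?}");
}
```

Keeping the sizing options as `Option`s is what lets the helpers distinguish "set explicitly" from "left at the default", which the implication rule depends on.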

Removed

  • --sparse-dim (replaced by --sparse-vocab-size)
  • the sparsity factor argument of --sparse-vectors (replaced by --sparse-avg-dim)

Implementation notes

  • random_sparse_vector now takes vocab_size and avg_dim. It samples a random number of distinct indices from the range 1..=vocab_size; the count is uniform in [1, 2*avg_dim], so the expected length is approximately avg_dim (exactly avg_dim + 0.5 when both bounds are inclusive).
  • Updated collection.rs, upsert.rs, and search.rs to use the new Args::use_sparse_vectors(), sparse_vocab_size(), and sparse_avg_dim() helpers.
  • Updated README help section.
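The sampling rule from the first note can be sketched as follows. This is a minimal, self-contained illustration, not the code in this PR: the `SplitMix64` generator stands in for whatever RNG bfb uses (presumably the `rand` crate, an assumption), and the `(index, value)` return type, the value range [0, 1), capping the length at vocab_size, and sorting the output are all choices made for this sketch.

```rust
use std::collections::HashSet;

// Tiny SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform integer in the inclusive range [lo, hi] (modulo bias ignored).
    fn range_inclusive(&mut self, lo: u32, hi: u32) -> u32 {
        let span = (hi - lo) as u64 + 1;
        lo + (self.next_u64() % span) as u32
    }

    /// Uniform float in [0, 1) built from the top 24 bits.
    fn unit_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Hypothetical signature: (index, value) pairs with distinct indices in 1..=vocab_size.
fn random_sparse_vector(rng: &mut SplitMix64, vocab_size: u32, avg_dim: u32) -> Vec<(u32, f32)> {
    assert!(vocab_size >= 1, "vocab_size must be at least 1");
    // Length is uniform in [1, 2 * avg_dim], capped so we never ask for
    // more distinct indices than the vocabulary holds.
    let max_len = avg_dim.saturating_mul(2).min(vocab_size).max(1);
    let len = rng.range_inclusive(1, max_len) as usize;

    // Rejection sampling keeps indices distinct; cheap while len << vocab_size.
    let mut seen = HashSet::with_capacity(len);
    let mut out = Vec::with_capacity(len);
    while out.len() < len {
        let idx = rng.range_inclusive(1, vocab_size);
        if seen.insert(idx) {
            out.push((idx, rng.unit_f32()));
        }
    }
    out.sort_unstable_by_key(|&(idx, _)| idx);
    out
}

fn main() {
    let mut rng = SplitMix64(42);
    let (vocab_size, avg_dim) = (30_000, 64);
    let samples = 10_000;
    let mut total = 0usize;
    for _ in 0..samples {
        let v = random_sparse_vector(&mut rng, vocab_size, avg_dim);
        // Sorted and strictly increasing, hence distinct.
        assert!(v.windows(2).all(|w| w[0].0 < w[1].0));
        assert!(v.iter().all(|&(i, _)| (1..=vocab_size).contains(&i)));
        total += v.len();
    }
    let mean = total as f64 / samples as f64;
    println!("mean non-zeros: {mean:.2}");
    assert!((mean - 64.5).abs() < 2.0);
}
```

Rejection sampling is used here because the expected length is tiny compared with a typical vocabulary; when 2*avg_dim approaches vocab_size, a partial Fisher-Yates shuffle would avoid repeated rejections.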

Test plan

  • cargo build
  • cargo test (23 passed) — added tests for distinct indices and average dim
  • Verified bfb --help output

Example usage with both dense and sparse vectors:

bfb --dim 768 --sparse-vectors --sparse-vocab-size 30000 --sparse-avg-dim 64
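As a rough sanity check for this invocation, assuming the sampling rule from the implementation notes (lengths uniform over the integers 1 to 2*avg_dim, with 2*64 = 128 well below the vocabulary size):

$$
\mathbb{E}[\ell] = \frac{1 + 2 \cdot 64}{2} = 64.5,
\qquad
\frac{\mathbb{E}[\ell]}{30000} = \frac{64.5}{30000} \approx 0.215\%
$$

So each sparse vector touches roughly 0.2% of its index range, while every dense vector carries all 768 components.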

Made with Cursor

Previously sparse vectors reused the dense `--dim` parameter (or `--sparse-dim`)
together with a `--sparse-vectors <SPARSITY>` factor, which made it awkward to
create a collection with both dense and sparse vectors since their sizes differ.

Sparse vectors are now configured independently:
- `--sparse-vectors` is a boolean flag to enable sparse vectors
- `--sparse-vocab-size` controls the index range (vocabulary size), default 100k
- `--sparse-avg-dim` controls the average number of non-zero values, default 32

`--sparse-dim` and the sparsity factor are removed; the new options imply
`--sparse-vectors`.

Co-authored-by: Cursor <cursoragent@cursor.com>