Refactor sparse vector CLI args to be independent from dense vectors #133
Open
qdrant-cloud-bot wants to merge 1 commit into
Previously sparse vectors reused the dense `--dim` parameter (or `--sparse-dim`) together with a `--sparse-vectors <SPARSITY>` factor, which made it awkward to create a collection with both dense and sparse vectors since their sizes differ.

Sparse vectors are now configured independently:

- `--sparse-vectors` is a boolean flag to enable sparse vectors
- `--sparse-vocab-size` controls the index range (vocabulary size), default 100k
- `--sparse-avg-dim` controls the average number of non-zero values, default 32

`--sparse-dim` and the sparsity factor are removed; the new options imply `--sparse-vectors`.

Co-authored-by: Cursor <cursoragent@cursor.com>
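To make the "imply" rule and the defaults concrete, here is a minimal sketch of how the helper methods could behave. Only the names `use_sparse_vectors()`, `sparse_vocab_size()`, `sparse_avg_dim()` and the default values come from this PR; the struct layout, the `Option` fields, and the constant names are assumptions for illustration, and the real `Args` presumably derives its parsing from a CLI library rather than being built by hand.

```rust
// Hypothetical stand-in for the CLI arguments; field layout is an assumption.
#[derive(Default)]
struct Args {
    sparse_vectors: bool,             // --sparse-vectors
    sparse_vocab_size: Option<usize>, // --sparse-vocab-size <N>
    sparse_avg_dim: Option<usize>,    // --sparse-avg-dim <N>
}

// Defaults stated in the PR description; the constant names are made up here.
const DEFAULT_SPARSE_VOCAB_SIZE: usize = 100_000;
const DEFAULT_SPARSE_AVG_DIM: usize = 32;

impl Args {
    /// Sparse vectors are enabled by the flag itself or by either sizing option.
    fn use_sparse_vectors(&self) -> bool {
        self.sparse_vectors || self.sparse_vocab_size.is_some() || self.sparse_avg_dim.is_some()
    }

    /// Range of possible indices, falling back to the default.
    fn sparse_vocab_size(&self) -> usize {
        self.sparse_vocab_size.unwrap_or(DEFAULT_SPARSE_VOCAB_SIZE)
    }

    /// Average number of non-zero values, falling back to the default.
    fn sparse_avg_dim(&self) -> usize {
        self.sparse_avg_dim.unwrap_or(DEFAULT_SPARSE_AVG_DIM)
    }
}

fn main() {
    // Only the vocabulary size is given; sparse vectors are implied.
    let args = Args { sparse_vocab_size: Some(30_000), ..Default::default() };
    println!(
        "sparse={} vocab={} avg_dim={}",
        args.use_sparse_vectors(),
        args.sparse_vocab_size(),
        args.sparse_avg_dim()
    );
}
```

Keeping the sizing options as `Option` values is one way to distinguish "not passed" from "passed with the default value", which is what the implication rule needs.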
Summary
Previously it was impossible to conveniently create a collection with both dense and sparse vectors in `bfb`. Sparse vectors reused the dense `--dim` parameter (or `--sparse-dim`) together with `--sparse-vectors <SPARSITY>`, which is awkward since sparse vectors usually have a much larger index range than dense dimensionality.

Sparse vectors are now configured independently from dense vectors:

- `--sparse-vectors`: boolean flag to enable sparse vectors (can be combined with dense vectors)
- `--sparse-vocab-size <N>`: vocabulary size, i.e. the range of possible indices (default `100000`)
- `--sparse-avg-dim <N>`: average number of non-zero values per sparse vector (default `32`)

`--sparse-vocab-size` / `--sparse-avg-dim` imply `--sparse-vectors`.

Removed

- `--sparse-dim` (replaced by `--sparse-vocab-size`)
- `--sparse-vectors <SPARSITY>` as a sparsity factor (replaced by `--sparse-avg-dim`)

Implementation notes

- `random_sparse_vector` now takes `vocab_size` and `avg_dim`. It samples a random number of distinct indices around the average (uniform in `[1, 2*avg_dim]`, so the expected length is approximately `avg_dim`) from the range `1..=vocab_size`.
- Updated `collection.rs`, `upsert.rs`, `search.rs` to use the new `Args::use_sparse_vectors()`, `sparse_vocab_size()`, `sparse_avg_dim()` helpers.

Test plan

- `cargo build`
- `cargo test` (23 passed); added tests for distinct indices and average dim
- `bfb --help` output

Example usage with both dense and sparse vectors:
Made with Cursor
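For reviewers, the sampling scheme described under Implementation notes can be sketched roughly as follows. This is not the code from the PR: the `XorShift64` generator, the return type, the value range, and the cap at `vocab_size` are assumptions chosen so the sketch runs with the standard library only. Only the function name, its two parameters, the `[1, 2*avg_dim]` length range, and the `1..=vocab_size` index range come from the description.

```rust
use std::collections::BTreeSet;

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// Illustrative only; the seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..=hi`; modulo bias is ignored for this sketch.
    fn range_inclusive(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// Returns `(indices, values)` with distinct, sorted indices in `1..=vocab_size`.
fn random_sparse_vector(rng: &mut XorShift64, vocab_size: u32, avg_dim: u32) -> (Vec<u32>, Vec<f32>) {
    assert!(vocab_size >= 1 && avg_dim >= 1);
    // Length is uniform in [1, 2*avg_dim], capped so enough distinct indices exist.
    let len = rng
        .range_inclusive(1, 2 * u64::from(avg_dim))
        .min(u64::from(vocab_size)) as usize;
    // Rejection sampling: keep drawing until `len` distinct indices are collected.
    let mut indices = BTreeSet::new();
    while indices.len() < len {
        indices.insert(rng.range_inclusive(1, u64::from(vocab_size)) as u32);
    }
    // One value in [0, 1) per index, built from the top 24 bits of each draw.
    let values = (0..len)
        .map(|_| (rng.next_u64() >> 40) as f32 / (1u64 << 24) as f32)
        .collect();
    (indices.into_iter().collect(), values)
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let (indices, values) = random_sparse_vector(&mut rng, 100_000, 32);
    println!("{} non-zero entries, first index {}", indices.len(), indices[0]);
    assert_eq!(indices.len(), values.len());
}
```

Rejection sampling into a set is cheap when `avg_dim` is much smaller than `vocab_size`, which matches the defaults of 32 and 100000; a shuffle-based approach would be a better fit if the two were close.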