base64 dense vector packing #1499
$doc = $dataset[($i - 1) % $len];
$params['body'][] = ['index' => ['_index' => $index]];
$params['body'][] = [
    'docid' => $doc['docid'],
I think we're supposed to leave out the docid in the benchmark.
I believe that refers to not using the docid as the document id. This puts it as a regular field, which I saw others do as well. I don't think it will make any significant difference with or without this field, anyway.
For small dataset sizes, it likely doesn't matter, but setting a doc _id can lead to reduced performance because for each insert, Elasticsearch needs to check if there's already a doc with that _id.
@pquentin this is a docid field added in the document, not the metadata _id.
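To make the distinction in this thread concrete, here is a minimal sketch of the two bulk-body shapes being discussed. The function names, the sample dataset shape, and the `text` field are hypothetical and not taken from the PR's benchmark; only the `index` action and the `_index` and `_id` metadata keys come from the Bulk API.

```php
<?php
// Hypothetical helper (not from the PR): docid is stored as a regular
// document field, so Elasticsearch auto-generates the metadata _id.
function bulkBodyWithDocidField(array $dataset, string $index, int $count): array
{
    $body = [];
    $len = count($dataset);
    for ($i = 1; $i <= $count; $i++) {
        // Cycle through the dataset, as in the diff excerpt above.
        $doc = $dataset[($i - 1) % $len];
        $body[] = ['index' => ['_index' => $index]];
        $body[] = ['docid' => $doc['docid'], 'text' => $doc['text']];
    }
    return $body;
}

// Hypothetical helper (not from the PR): docid is used as the metadata _id,
// which is the case the performance concern in this thread applies to.
// A repeated _id replaces the earlier document instead of adding a new one.
function bulkBodyWithExplicitId(array $dataset, string $index, int $count): array
{
    $body = [];
    $len = count($dataset);
    for ($i = 1; $i <= $count; $i++) {
        $doc = $dataset[($i - 1) % $len];
        $body[] = ['index' => ['_index' => $index, '_id' => $doc['docid']]];
        $body[] = ['text' => $doc['text']];
    }
    return $body;
}
```

In the first shape the action line carries no `_id`, which matches the code under review; the second shape is shown only for contrast.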
ezimuel
left a comment
LGTM, thanks for the PR!
The backport to the target branch failed. To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-9.3 9.3
# Navigate to the new working tree
cd .worktrees/backport-9.3
# Create a new branch
git switch --create backport-1499-to-9.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d06ae765a78d0d4a2a6e3ed6dac68e219cc614c1
# Push it to GitHub
git push --set-upstream origin backport-1499-to-9.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-9.3
Then, create a pull request where the
base64 dense vector packing (cherry picked from commit d06ae76)
Adds the Elastic\Elasticsearch\Helper\Vectors::packDenseVector() function, which packs dense vectors for efficient bulk uploading to Elasticsearch. Also included in this PR is a benchmark for this feature.
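As a rough illustration of what base64 packing of a dense vector can look like, the sketch below encodes a list of floats as big-endian 32-bit values and base64-encodes the bytes. This is not the implementation from this PR: the function names are hypothetical, and the byte layout (big-endian float32) is an assumption that should be checked against the actual helper and the Elasticsearch documentation.

```php
<?php
// Illustrative only: NOT the library's packDenseVector() implementation.
// Assumption: the packed form is big-endian 32-bit floats, base64-encoded.
function sketchPackDenseVector(array $vector): string
{
    // 'G*' packs every argument as a big-endian 32-bit float.
    return base64_encode(pack('G*', ...$vector));
}

// Inverse of the sketch above, useful for checking round trips.
function sketchUnpackDenseVector(string $packed): array
{
    // unpack() returns a 1-indexed array; array_values() re-indexes from 0.
    return array_values(unpack('G*', base64_decode($packed)));
}

// Values chosen to be exactly representable as 32-bit floats.
$packed = sketchPackDenseVector([1.0, -2.5, 0.5]);
$roundTrip = sketchUnpackDenseVector($packed);
```

Under this assumed layout each component costs 4 bytes before encoding, and base64 expands that by a factor of 4/3, so the packed string is often shorter than the same vector written as a JSON array of decimal numbers. Values that are not exactly representable in 32 bits lose precision in the round trip.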