
base64 dense vector packing #1499

Merged
miguelgrinberg merged 2 commits into main from base64-dense-vector-packing on Jan 14, 2026

Conversation

Contributor

@miguelgrinberg miguelgrinberg commented Dec 22, 2025

This PR adds the Elastic\Elasticsearch\Helper\Vectors::packDenseVector() function, which packs dense vectors for efficient bulk uploading to Elasticsearch. It also includes a benchmark for this feature.
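For readers unfamiliar with the idea, here is a minimal sketch of what base64 packing of a dense vector can look like in PHP. It is not the code from this PR: the function names `sketchPackDenseVector()` and `sketchUnpackDenseVector()` are hypothetical, and the choice of 32-bit big-endian floats is an assumption made for illustration. See the merged helper for the actual encoding.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only, not the PR's implementation.
 * Assumption: each element is encoded as a 32-bit big-endian float.
 *
 * @param float[] $vector
 */
function sketchPackDenseVector(array $vector): string
{
    if ($vector === []) {
        throw new InvalidArgumentException('Vector must not be empty');
    }
    // 'G' is pack()'s code for a 32-bit float in big-endian byte order;
    // '*' repeats it for every remaining argument.
    $binary = pack('G*', ...$vector);

    // Base64 makes the binary payload safe to embed in a JSON string.
    return base64_encode($binary);
}

/**
 * Inverse of the sketch above, useful for checking a round trip.
 *
 * @return float[]
 */
function sketchUnpackDenseVector(string $packed): array
{
    $binary = base64_decode($packed, true);
    if ($binary === false || $binary === '') {
        throw new InvalidArgumentException('Input is not valid base64');
    }
    // unpack() returns a 1-indexed array, so reindex from 0.
    return array_values(unpack('G*', $binary));
}

$packed = sketchPackDenseVector([1.0, 0.5, -2.0]);
echo $packed, PHP_EOL;
print_r(sketchUnpackDenseVector($packed));
```

In this sketch each element takes 4 bytes before base64 encoding, so the payload is typically shorter than the same vector written as a JSON array of decimal numbers. That size difference is the kind of effect a bulk-upload benchmark would measure.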

@miguelgrinberg miguelgrinberg force-pushed the base64-dense-vector-packing branch 3 times, most recently from be47e77 to aa4b389, on December 23, 2025 10:06
@miguelgrinberg miguelgrinberg force-pushed the base64-dense-vector-packing branch from aa4b389 to 34a8090 on December 23, 2025 10:35
@miguelgrinberg miguelgrinberg marked this pull request as ready for review December 23, 2025 10:56
$doc = $dataset[($i - 1) % $len];
$params['body'][] = ['index' => ['_index' => $index]];
$params['body'][] = [
'docid' => $doc['docid'],
Member

I think we're supposed to leave out the docid in the benchmark.

Contributor Author

I believe that refers to not using the docid as the document _id. Here it is stored as a regular field, which I saw others do as well. I don't think it will make any significant difference with or without this field, anyway.

Member

For small dataset sizes, it likely doesn't matter, but setting a doc _id can lead to reduced performance because for each insert, Elasticsearch needs to check if there's already a doc with that _id.

Contributor Author

@pquentin this is a docid field added in the document, not the metadata _id.
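The distinction discussed in this thread can be sketched as follows. This is an illustration, not code from the PR: `buildBulkBody()` is a hypothetical helper, and the `emb` field name and sample values are made up. With the flag off, `docid` is only a regular field in the document source and Elasticsearch generates the `_id`. With the flag on, `docid` is also sent as the `_id` in the action metadata, which is the case the reviewer's performance concern applies to.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper that builds the 'body' array of a bulk request.
 *
 * @param array<int, array<string, mixed>> $docs
 * @return array<int, array<string, mixed>>
 */
function buildBulkBody(array $docs, string $index, bool $useDocidAsId): array
{
    $body = [];
    foreach ($docs as $doc) {
        // Action metadata line: this is where an explicit _id would go.
        $action = ['_index' => $index];
        if ($useDocidAsId) {
            $action['_id'] = $doc['docid'];
        }
        $body[] = ['index' => $action];

        // Document source line: docid here is just another field.
        $body[] = $doc;
    }
    return $body;
}

// Made-up sample data; 'emb' is an assumed field name.
$docs = [
    ['docid' => 'doc-1', 'emb' => [0.1, 0.2]],
    ['docid' => 'doc-2', 'emb' => [0.3, 0.4]],
];

$asField = buildBulkBody($docs, 'benchmark', false);
$asId    = buildBulkBody($docs, 'benchmark', true);

var_dump(isset($asField[0]['index']['_id'])); // no explicit _id
var_dump(isset($asId[0]['index']['_id']));    // explicit _id set
```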

Contributor

@ezimuel ezimuel left a comment

LGTM, thanks for the PR!

@miguelgrinberg miguelgrinberg merged commit d06ae76 into main Jan 14, 2026
34 checks passed
@miguelgrinberg miguelgrinberg deleted the base64-dense-vector-packing branch January 14, 2026 17:11
@github-actions

The backport to 9.3 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-9.3 9.3
# Navigate to the new working tree
cd .worktrees/backport-9.3
# Create a new branch
git switch --create backport-1499-to-9.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d06ae765a78d0d4a2a6e3ed6dac68e219cc614c1
# Push it to GitHub
git push --set-upstream origin backport-1499-to-9.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-9.3

Then, create a pull request where the base branch is 9.3 and the compare/head branch is backport-1499-to-9.3.

miguelgrinberg added a commit that referenced this pull request Jan 14, 2026
base64 dense vector packing

(cherry picked from commit d06ae76)