Add SIMD-optimized windowed IP path for DAAT_MAXSCORE #1468

Open
lyang24 wants to merge 1 commit into zilliztech:main from lyang24:maxscore_ip_simd_opt

Conversation

lyang24 (Contributor) commented Feb 14, 2026

When the metric is IP and the filter rate is low (< 50%), use AVX512 batch accumulation instead of cursor-based scoring. Docs are processed in windows of 64K docs (256 KB) for cache efficiency. Measured speedup: 3.15x on SPLADE 100K, 1.38x on 8.8M.

  • AVX512 gather/FMA/scatter for posting list accumulation
  • AVX512 compress-store for candidate extraction
  • Window-based processing bounds memory to O(64K) per query
  • Falls back to cursor-based V1 when bitset filter rate >= 50%
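For reviewers who want the shape of the windowed path without reading intrinsics, here is a minimal scalar sketch of the idea, not the code in this PR. The `Posting`, `PostingList`, `kWindowSize`, and `windowed_ip_scores` names are hypothetical, the 4-byte float score per doc is an assumption inferred from 64K docs = 256 KB, and the MaxScore term partitioning and bitset filtering are left out. The inner accumulation loop is where the AVX512 gather/FMA/scatter would go, and the extraction loop is what compress-store would replace.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the PR's actual data structures.
struct Posting {
    uint32_t doc_id;
    float value;
};
using PostingList = std::vector<Posting>;  // assumed sorted by doc_id

// Assumption: one 4-byte float score per doc, so 64K docs = 256 KB.
constexpr uint32_t kWindowSize = 1u << 16;

// Scalar reference for window-at-a-time IP accumulation.
// query: (posting list, query weight) pairs, one per query term.
// Returns (doc_id, score) for every doc whose score exceeds `threshold`.
inline std::vector<std::pair<uint32_t, float>> windowed_ip_scores(
    const std::vector<std::pair<const PostingList*, float>>& query,
    uint32_t n_docs, float threshold) {
    std::vector<float> window(kWindowSize, 0.0f);  // reused for every window
    std::vector<size_t> pos(query.size(), 0);      // per-term read position
    std::vector<std::pair<uint32_t, float>> out;

    for (uint64_t start = 0; start < n_docs; start += kWindowSize) {
        const uint64_t end = std::min<uint64_t>(start + kWindowSize, n_docs);

        // Accumulation: a SIMD version would gather window slots by doc
        // offset, apply an FMA with the query weight, and scatter back.
        for (size_t t = 0; t < query.size(); ++t) {
            const PostingList& pl = *query[t].first;
            const float w = query[t].second;
            size_t& p = pos[t];
            while (p < pl.size() && pl[p].doc_id < end) {
                assert(pl[p].doc_id >= start);  // holds for sorted lists
                window[pl[p].doc_id - start] += w * pl[p].value;
                ++p;
            }
        }

        // Extraction: a SIMD version would compare against the threshold
        // and compress-store the surviving lanes. Reset slots as we go.
        for (uint64_t i = 0; i < end - start; ++i) {
            if (window[i] > threshold) {
                out.emplace_back(static_cast<uint32_t>(start + i), window[i]);
            }
            window[i] = 0.0f;
        }
    }
    return out;
}
```

Because the buffer is reused across windows, extra memory stays at `kWindowSize` floats regardless of collection size, which is the bound the last-but-one bullet describes.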

Signed-off-by: lyang24 <lanqingy93@gmail.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sre-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lyang24
To complete the pull request process, please assign cqy123456 after the PR has been reviewed.
You can assign the PR to them by writing /assign @cqy123456 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

search_daat_maxscore(q_vec, heap, bitset, computer, approx_params.dim_max_score_ratio);
// Use SIMD batch path for IP when filter rate is low; high filter rates
// waste SIMD work on docs that get discarded.
if (metric_type_ == SparseMetricType::METRIC_IP && bitset.filter_ratio() < 0.5f) {
Contributor Author

Should I make the filter ratio threshold a configurable setting with a default of 0.5?
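One possible shape for that, as a rough sketch only: pull the comparison into a small helper that takes the threshold as a parameter defaulting to 0.5. The `use_simd_ip_path` name and its `threshold` parameter are hypothetical, and the enum below is a local stand-in so the snippet compiles on its own; it is not the project's real definition.

```cpp
#include <cassert>

// Local stand-in for the project's enum, for illustration only.
enum class SparseMetricType { METRIC_IP, METRIC_OTHER };

// Hypothetical helper: decides whether the SIMD windowed path is used.
// `filter_ratio` is the fraction of docs filtered out by the bitset;
// `threshold` would come from a search setting, defaulting to 0.5.
inline bool use_simd_ip_path(SparseMetricType metric, float filter_ratio,
                             float threshold = 0.5f) {
    assert(threshold >= 0.0f && threshold <= 1.0f);
    return metric == SparseMetricType::METRIC_IP && filter_ratio < threshold;
}
```

With the default, behavior matches the hard-coded check in the diff; a caller could pass a different value to tune where the fallback to the cursor-based path kicks in.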

mergify bot commented Feb 14, 2026

@lyang24 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!
