Add SIMD-optimized windowed IP path for DAAT_MAXSCORE #1468
lyang24 wants to merge 1 commit into zilliztech:main
When metric is IP and filter rate is low, use AVX512 batch accumulation instead of cursor-based scoring. Processes docs in 64K windows (256KB) for cache efficiency. 3.15x speedup on SPLADE 100K, 1.38x on 8.8M.

- AVX512 gather/FMA/scatter for posting list accumulation
- AVX512 compress-store for candidate extraction
- Window-based processing bounds memory to O(64K) per query
- Falls back to cursor-based V1 when bitset filter rate >= 50%

Signed-off-by: lyang24 <lanqingy93@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
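For readers unfamiliar with the windowed approach, here is a minimal scalar sketch of the accumulate-then-extract loop described above. It is not the PR's code: `PostingList`, `windowed_ip`, and the `threshold` parameter are hypothetical names chosen for illustration, and the plain loops stand in for the AVX512 gather/FMA/scatter and compress-store steps.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical posting list for one query term: doc ids sorted ascending,
// paired with that term's stored weight in each doc.
struct PostingList {
    std::vector<uint32_t> ids;
    std::vector<float> vals;
};

// Scores docs in [0, n_docs) one window at a time and returns the
// (doc id, score) pairs whose inner product exceeds `threshold`.
// The default window of 64K docs gives a 256 KB float accumulator.
std::vector<std::pair<uint32_t, float>>
windowed_ip(const std::vector<PostingList>& lists,
            const std::vector<float>& q_weights,
            uint32_t n_docs, float threshold, uint32_t window = 1u << 16) {
    std::vector<std::pair<uint32_t, float>> out;
    std::vector<float> acc(window);
    std::vector<size_t> pos(lists.size(), 0);  // read position per list

    for (uint64_t base = 0; base < n_docs; base += window) {
        const uint64_t end = std::min<uint64_t>(base + window, n_docs);
        std::fill(acc.begin(), acc.end(), 0.0f);

        // Accumulate: acc[id - base] += q * val for every posting that
        // falls in this window (the step the PR vectorizes).
        for (size_t t = 0; t < lists.size(); ++t) {
            const PostingList& pl = lists[t];
            size_t& p = pos[t];
            while (p < pl.ids.size() && pl.ids[p] < end) {
                acc[pl.ids[p] - base] += q_weights[t] * pl.vals[p];
                ++p;
            }
        }

        // Extract: keep only slots above the threshold (the scalar
        // counterpart of a compress-store).
        for (uint64_t i = 0; i < end - base; ++i) {
            if (acc[i] > threshold) {
                out.emplace_back(static_cast<uint32_t>(base + i), acc[i]);
            }
        }
    }
    return out;
}
```

Because each list's read position only moves forward, every posting is touched once, and the accumulator size depends on the window rather than on the collection size.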
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: lyang24
search_daat_maxscore(q_vec, heap, bitset, computer, approx_params.dim_max_score_ratio);
// Use SIMD batch path for IP when filter rate is low; high filter rates
// waste SIMD work on docs that get discarded.
if (metric_type_ == SparseMetricType::METRIC_IP && bitset.filter_ratio() < 0.5f) {
Should I make the filter-ratio threshold a setting with a default of 0.5?
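One possible shape for such a setting, as a hedged sketch only: `Metric`, `SimdPathParams`, `max_filter_ratio`, and `use_simd_batch_path` are hypothetical names, not part of the existing codebase. The threshold becomes a field that defaults to 0.5, and the dispatch condition reads it instead of a literal.

```cpp
// Hypothetical metric enum standing in for the index's real metric type.
enum class Metric { IP, BM25 };

// Hypothetical tuning knob: the SIMD batch path is used only while the
// bitset's filter ratio stays strictly below this value.
struct SimdPathParams {
    float max_filter_ratio = 0.5f;
};

// Returns true when the SIMD batch path should be taken; otherwise the
// caller falls back to cursor-based scoring.
bool use_simd_batch_path(Metric metric, float filter_ratio,
                         const SimdPathParams& params = SimdPathParams{}) {
    return metric == Metric::IP && filter_ratio < params.max_filter_ratio;
}
```

With the default, a query whose filter ratio is exactly 0.5 still takes the cursor-based path, matching the `< 0.5f` comparison in the diff.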
@lyang24 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!