
Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter#16268

Draft
slow-J wants to merge 3 commits into apache:main from slow-J:lucene-16249-skipper-range-facets


slow-J commented Jun 17, 2026

Contributor

Resolves #16249

Implementation heavily inspired by HistogramCollector.java.

Range faceting in the sandbox module (LongRangeFacetCutter) currently reads the doc value of every matching document and binary-searches it into an elementary interval. When the faceted field is single-valued, we can use a doc-values skip index instead: for a dense skip block (every document in the block has a value) whose min and max values fall into the same elementary interval, every document in that block maps to that interval, so we can skip the per-doc value lookup and the binary search.
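A minimal sketch of the dense-block shortcut, assuming elementary intervals are modeled as strictly increasing, inclusive upper bounds whose last entry is Long.MAX_VALUE (the PR's actual data structures may differ; DenseBlockShortcut, intervalOf and blockInterval are hypothetical names). The key point is that intervals are contiguous, so if a block's min and max land in the same interval, every value in between does too:

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the dense-block shortcut (not the PR's actual code).
 * Interval 0 covers (-inf, upperBounds[0]], interval i covers (upperBounds[i-1], upperBounds[i]].
 * Assumes the last bound is Long.MAX_VALUE so every value lands in some interval.
 */
final class DenseBlockShortcut {

  /** Per-doc path: binary-search one value into its elementary interval. */
  static int intervalOf(long[] upperBounds, long value) {
    int idx = Arrays.binarySearch(upperBounds, value);
    // On a miss, binarySearch returns -(insertionPoint) - 1; the insertion point is the interval.
    return idx >= 0 ? idx : -idx - 1;
  }

  /**
   * Block path: if the block is dense (every doc has a value) and its min and max values
   * fall into the same interval, every doc in the block maps to that interval.
   * Returns that interval, or -1 to fall back to per-doc lookups.
   */
  static int blockInterval(long[] upperBounds, long blockMin, long blockMax, boolean dense) {
    if (!dense) {
      return -1;
    }
    int minInterval = intervalOf(upperBounds, blockMin);
    int maxInterval = intervalOf(upperBounds, blockMax);
    return minInterval == maxInterval ? minInterval : -1;
  }
}
```

For example, with upper bounds {9, 19, Long.MAX_VALUE}, a dense block with min 10 and max 19 resolves to interval 1 with two binary searches in total, rather than one value read and one binary search per document.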

Limitation: this applies to single-valued long fields only.

Benchmark (luceneutil)

I used my luceneutil branch https://github.com/slow-J/luceneutil/tree/github-16249-range-facet-bench, which cherry-picks two of @epotyom's commits (mainly mikemccand/luceneutil#582, which adds range-facet support).

Setup:
localrun.py, wikimediumall (33.3M docs), index-sorted by lastMod_skipper with addDVSkippers=true. Baseline = main, candidate = this change, both using DURING_COLLECTION, so the only difference is this PR's changes. 30 JVM iterations.

Command: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run-timing7.txt"

Edit: benchmarks re-run after the changes for Egor's first two comments.

QPS

Caveat 1: the ID tasks have no skip index, so the skip-index optimization does not apply to them and they would be expected to show only noise (see the benchmark's skipper setup in LineFileDocs.java#L420; I did not fork anything there).

| Task | QPS baseline | StdDev (baseline) | QPS modified | StdDev (modified) | Pct diff | p-value |
|---|---|---|---|---|---|---|
| BrowseLastModOvlpRangeFacets | 1.28 | 6.6% | 2.91 | 13.8% | 128.0% (101% to 158%) | 0.000 |
| BrowseLastModRangeFacets | 2.26 | 8.6% | 3.43 | 10.6% | 51.3% (29% to 77%) | 0.000 |
| MedTermLastModOvlpRangeFacets | 4.15 | 11.5% | 6.75 | 19.4% | 62.8% (28% to 105%) | 0.000 |
| MedTermLastModRangeFacets | 4.13 | 6.5% | 5.82 | 22.0% | 40.9% (11% to 74%) | 0.000 |
| BrowseIDOvlpRangeFacets | 1.25 | 7.3% | 1.34 | 9.9% | 7.0% (-9% to 26%) | 0.002 |
| BrowseIDRangeFacets | 2.29 | 7.7% | 3.71 | 11.7% | 61.9% (39% to 88%) | 0.000 |
| MedTermIDOvlpRangeFacets | 3.60 | 13.0% | 5.54 | 21.0% | 53.8% (17% to 100%) | 0.000 |
| MedTermIDRangeFacets | 4.39 | 11.8% | 6.20 | 18.0% | 41.2% (10% to 80%) | 0.000 |

Latency (ms), aggregated across all iterations (B = baseline, C = candidate)

| Task | P50 B | P50 C | Diff | P90 B | P90 C | Diff | P99 B | P99 C | Diff | P999 B | P999 C | Diff | P100 B | P100 C | Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BrowseLastModOvlpRangeFacets | 826.266 | 359.540 | -56.5% | 1394.283 | 726.813 | -47.9% | 10106.095 | 2827.303 | -72.0% | 11526.605 | 6384.518 | -44.6% | 11683.219 | 6865.241 | -41.2% |
| BrowseLastModRangeFacets | 472.107 | 306.359 | -35.1% | 915.928 | 634.823 | -30.7% | 6956.648 | 1029.674 | -85.2% | 8229.676 | 4099.495 | -50.2% | 8960.695 | 4180.517 | -53.3% |
| MedTermLastModOvlpRangeFacets | 264.338 | 159.346 | -39.7% | 575.428 | 462.476 | -19.6% | 1116.816 | 770.457 | -31.0% | 1582.591 | 1101.965 | -30.4% | 1627.100 | 1158.578 | -28.8% |
| MedTermLastModRangeFacets | 252.083 | 189.934 | -24.7% | 531.570 | 492.751 | -7.3% | 1799.707 | 1332.096 | -26.0% | 2276.124 | 3012.209 | +32.3% | 2535.658 | 3039.909 | +19.9% |
| BrowseIDOvlpRangeFacets | 852.023 | 806.610 | -5.3% | 1363.792 | 1198.554 | -12.1% | 10153.711 | 3935.983 | -61.2% | 11913.243 | 6765.965 | -43.2% | 11955.512 | 6888.202 | -42.4% |
| BrowseIDRangeFacets | 458.901 | 284.595 | -38.0% | 865.188 | 618.945 | -28.5% | 5781.044 | 920.304 | -84.1% | 8220.914 | 3384.291 | -58.8% | 9158.615 | 3877.436 | -57.7% |
| MedTermIDOvlpRangeFacets | 299.803 | 195.602 | -34.8% | 723.199 | 566.921 | -21.6% | 1655.219 | 957.722 | -42.1% | 2046.759 | 1372.295 | -33.0% | 2176.394 | 1550.519 | -28.8% |
| MedTermIDRangeFacets | 252.566 | 175.513 | -30.5% | 672.943 | 561.913 | -16.5% | 1789.686 | 955.330 | -46.6% | 2255.963 | 1721.450 | -23.7% | 2784.861 | 1772.866 | -36.3% |
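The Diff columns read as the relative change from baseline to candidate, (C - B) / B; the cells in the table are consistent with that reading (for example 826.266 ms to 359.540 ms gives -56.5%). A tiny check of one cell; LatencyDiff is a hypothetical helper, not part of luceneutil:

```java
/** Relative latency change of candidate (C) vs baseline (B), in percent. */
final class LatencyDiff {
  static double pctDiff(double baselineMs, double candidateMs) {
    return (candidateMs - baselineMs) / baselineMs * 100.0;
  }

  public static void main(String[] args) {
    // BrowseLastModOvlpRangeFacets, P50 B and P50 C from the latency table
    System.out.printf("%.1f%%%n", pctDiff(826.266, 359.540)); // prints -56.5%
  }
}
```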

Note: the ID tasks speed up despite having no skip index. This is due to a routing change that is part of this PR: single-valued segments now use the single-valued cutter instead of always falling back to the multi-valued one.

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 03d7d2a to 066c419, June 17, 2026 16:03
github-actions bot added this to the 10.5.0 milestone Jun 17, 2026
slow-J commented Jun 18, 2026

Contributor Author

I reran benchmarks, this time correctly using localrun, and updated the results in #16268 (comment)

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 2e7144b to 0c72d5f, June 19, 2026 14:45

epotyom left a comment

Contributor

Nice change! One suggestion below

for (int level = 0; level < skipper.numLevels(); ++level) {
  int totalDocsAtLevel = skipper.maxDocID(level) - skipper.minDocID(level) + 1;
  if (skipper.docCount(level) != totalDocsAtLevel) {
    // Some docs at this level have no value, so we can't resolve the whole block at once.
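The level loop quoted above can be sketched end to end as follows, presumably in the spirit of HistogramCollector, which the PR cites as inspiration: walk the skip levels from level 0 (the finest) upward, and keep widening the resolved doc range while each level is dense and its min and max values share one elementary interval. SkipLevels, ArraySkipLevels, Resolved and LevelWalk are hypothetical names; SkipLevels only mirrors the DocValuesSkipper accessors the snippet uses (plus minValue(level) and maxValue(level)), and the interval model (inclusive upper bounds ending in Long.MAX_VALUE) is an assumption.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the DocValuesSkipper per-level accessors used here. */
interface SkipLevels {
  int numLevels();
  int minDocID(int level);
  int maxDocID(int level);
  int docCount(int level);
  long minValue(int level);
  long maxValue(int level);
}

/** Array-backed SkipLevels for illustration; array index = level, level 0 is the finest. */
record ArraySkipLevels(int[] minDocs, int[] maxDocs, int[] docCounts, long[] minValues, long[] maxValues)
    implements SkipLevels {
  public int numLevels() { return minDocs.length; }
  public int minDocID(int level) { return minDocs[level]; }
  public int maxDocID(int level) { return maxDocs[level]; }
  public int docCount(int level) { return docCounts[level]; }
  public long minValue(int level) { return minValues[level]; }
  public long maxValue(int level) { return maxValues[level]; }
}

/** interval == -1 means nothing could be resolved at the block level. */
record Resolved(int interval, int upToDoc) {}

final class LevelWalk {
  static int intervalOf(long[] upperBounds, long value) {
    int idx = Arrays.binarySearch(upperBounds, value);
    return idx >= 0 ? idx : -idx - 1;
  }

  static Resolved resolve(SkipLevels skipper, long[] upperBounds) {
    int interval = -1;
    int upToDoc = -1;
    for (int level = 0; level < skipper.numLevels(); ++level) {
      int totalDocsAtLevel = skipper.maxDocID(level) - skipper.minDocID(level) + 1;
      if (skipper.docCount(level) != totalDocsAtLevel) {
        break; // Some docs at this level have no value: stop widening.
      }
      int minInterval = intervalOf(upperBounds, skipper.minValue(level));
      int maxInterval = intervalOf(upperBounds, skipper.maxValue(level));
      if (minInterval != maxInterval) {
        break; // Values at this level span more than one interval.
      }
      interval = minInterval;
      upToDoc = skipper.maxDocID(level); // docs from the current position through here map to `interval`
    }
    return new Resolved(interval, upToDoc);
  }
}
```

Once a range is resolved, the documents through upToDoc can be counted against that single interval without reading their values.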

epotyom Jun 19, 2026

Contributor

I think the skipper can still improve performance in this case, since we can still cache the ordinal; we just always have to call longValues.advanceExact(doc). If it returns true, we return the cached ordinal (avoiding both reading the long value and binary-searching the elementary interval); otherwise we return false.
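A rough sketch of that suggestion, assuming a block whose min and max fall into one interval but which is not dense: the interval is cached, advanceExact(doc) is still called for every document to learn whether it has a value, and only the longValue() read and the binary search are skipped. SingleValues, ArrayValues and CachedIntervalCutter are hypothetical; SingleValues only mirrors the advanceExact/longValue shape of Lucene's NumericDocValues (without its IOException), and intervals are assumed to be inclusive upper bounds ending in Long.MAX_VALUE.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a single-valued doc-values iterator. */
interface SingleValues {
  boolean advanceExact(int doc);
  long longValue();
}

/** Array-backed SingleValues that counts value reads, for illustration only. */
final class ArrayValues implements SingleValues {
  private final long[] values;
  private final boolean[] hasValue;
  private int doc = -1;
  int longValueCalls = 0;

  ArrayValues(long[] values, boolean[] hasValue) {
    this.values = values;
    this.hasValue = hasValue;
  }

  @Override
  public boolean advanceExact(int target) {
    doc = target;
    return hasValue[target];
  }

  @Override
  public long longValue() {
    longValueCalls++;
    return values[doc];
  }
}

final class CachedIntervalCutter {
  private final SingleValues values;
  private final long[] upperBounds;
  private int cachedInterval = -1; // interval shared by every valued doc up to cachedUpTo
  private int cachedUpTo = -1; // last doc ID covered by the cached interval
  private int currentInterval = -1;

  CachedIntervalCutter(SingleValues values, long[] upperBounds) {
    this.values = values;
    this.upperBounds = upperBounds;
  }

  /** Record that every doc with a value, up to upToDoc, falls into `interval`. */
  void cacheBlock(int interval, int upToDoc) {
    cachedInterval = interval;
    cachedUpTo = upToDoc;
  }

  /** Returns false if the doc has no value; otherwise sets the doc's interval. */
  boolean advanceExact(int doc) {
    if (!values.advanceExact(doc)) {
      return false; // still needed per doc, because the block is not dense
    }
    if (doc <= cachedUpTo) {
      currentInterval = cachedInterval; // skip longValue() and the binary search
    } else {
      int idx = Arrays.binarySearch(upperBounds, values.longValue());
      currentInterval = idx >= 0 ? idx : -idx - 1;
    }
    return true;
  }

  int interval() {
    return currentInterval;
  }
}
```

Per valued document in the cached range, this saves one value decode and one binary search; the advanceExact call remains.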

Contributor Author

Thanks for the review, Egor! Good point.
I'll try to address these two comments and run new benchmarks.

slow-J Jun 23, 2026

Contributor Author

It took me some time to publish the changes because I initially introduced a regression in range facets without a skipper; I did a small refactor while fixing that.

I am getting much better performance after implementing your suggestions. I will update the benchmark results in the top-level comment.
Edit: updated benchmark results.

Contributor Author

Hmm, looking at the new benchmark results, there is an improvement in the ID tasks, which do not have a doc-values skip index. This is due to a change in the latest commit.

id is a single-valued field, but on main, fromLongField never unwraps to a single-valued source, so it always picks the multi-valued leaf cutter even when the field is single-valued.

We now route single-valued segments to the single-valued cutter instead. The new create(String field, …) keeps the field name, which lets createLeafCutter inspect each segment during search and pick the right cutter.
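A simplified sketch of that per-segment routing, under stated assumptions: the factory keeps only the field name and picks a cutter for each segment. In Lucene itself the single-valued check would presumably go through DocValues.unwrapSingleton, which returns a non-null NumericDocValues when a segment's SortedNumericDocValues is single-valued; here a hypothetical SegmentField record with a storedSingleValued flag stands in for that. CutterFactory, LeafCutter, SingleValuedCutter and MultiValuedCutter are illustrative names, not the PR's classes, and intervals are assumed to be inclusive upper bounds ending in Long.MAX_VALUE.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical per-segment field data: sorted values per doc, plus how it was stored. */
record SegmentField(long[][] valuesPerDoc, boolean storedSingleValued) {}

interface LeafCutter {
  /** Distinct elementary intervals hit by the doc's values (empty if it has none). */
  int[] intervals(int doc);
}

final class CutterFactory {
  private final String field;
  private final long[] upperBounds;

  CutterFactory(String field, long[] upperBounds) {
    this.field = field;
    this.upperBounds = upperBounds;
  }

  static int intervalOf(long[] upperBounds, long value) {
    int idx = Arrays.binarySearch(upperBounds, value);
    return idx >= 0 ? idx : -idx - 1;
  }

  /** Keeping the field name lets us inspect each segment and pick the cheaper cutter. */
  LeafCutter createLeafCutter(Map<String, SegmentField> segment) {
    SegmentField data = segment.get(field);
    if (data == null) {
      return doc -> new int[0]; // field absent in this segment
    }
    return data.storedSingleValued()
        ? new SingleValuedCutter(data, upperBounds)
        : new MultiValuedCutter(data, upperBounds);
  }
}

/** At most one value per doc: one binary search, no de-duplication. */
record SingleValuedCutter(SegmentField data, long[] upperBounds) implements LeafCutter {
  public int[] intervals(int doc) {
    long[] vals = data.valuesPerDoc()[doc];
    return vals.length == 0 ? new int[0] : new int[] {CutterFactory.intervalOf(upperBounds, vals[0])};
  }
}

/** Any number of values per doc: map each value, then drop repeated intervals. */
record MultiValuedCutter(SegmentField data, long[] upperBounds) implements LeafCutter {
  public int[] intervals(int doc) {
    long[] vals = data.valuesPerDoc()[doc];
    int[] out = new int[vals.length];
    int n = 0;
    for (long v : vals) {
      int interval = CutterFactory.intervalOf(upperBounds, v);
      // Values are sorted, so intervals are non-decreasing and duplicates are adjacent.
      if (n == 0 || out[n - 1] != interval) {
        out[n++] = interval;
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```

Both cutters give the same answer on single-valued data; the routing only avoids the multi-valued loop and de-duplication work when a segment is known to hold at most one value per document.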

I have kept this in this PR, but it slightly increases the scope.

@epotyom let me know what you think about this.

Contributor

Interesting, nice catch!

> This is due to a change in the latest commit.

Does the latest commit also include the interval-tracking rewind change? If so, could you please run benchmarks for the unwrapping change only?

Unwrapping adds a little bit of complexity, but if it improves performance, I think we should keep it.

slow-J Jun 26, 2026

Contributor Author

> Does the latest commit also include the interval-tracking rewind change?

Yes, it has all three changes: non-dense fast path, rewind reuse, and unwrapping.

> Unwrapping adds a little bit of complexity, but if it improves performance, I think we should keep it.

I'll set up and run a benchmark now to isolate the performance difference due to the unwrapping.

Contributor Author

Ran: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run9-onlyunwrapping-timing.txt"

Here are the results.

Latency (ms), aggregated across all iterations (B = baseline, C = candidate)

| Task | P50 B | P50 C | Diff | P90 B | P90 C | Diff | P99 B | P99 C | Diff | P999 B | P999 C | Diff | P100 B | P100 C | Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BrowseIDOvlpRangeFacets | 946.928 | 820.170 | -13.4% | 1279.722 | 1178.839 | -7.9% | 1509.599 | 3489.425 | +131.1% | 1579.256 | 10588.392 | +570.5% | 1647.232 | 10662.196 | +547.3% |
| BrowseIDRangeFacets | 410.635 | 296.512 | -27.8% | 630.174 | 670.131 | +6.3% | 869.641 | 866.369 | -0.4% | 931.130 | 8681.360 | +832.3% | 1008.106 | 9240.540 | +816.6% |
| BrowseLastModOvlpRangeFacets | 381.116 | 365.262 | -4.2% | 577.681 | 639.843 | +10.8% | 866.191 | 2046.696 | +136.3% | 1028.218 | 10214.331 | +893.4% | 1029.600 | 10234.017 | +894.0% |
| BrowseLastModRangeFacets | 324.172 | 317.631 | -2.0% | 526.130 | 634.818 | +20.7% | 808.836 | 884.852 | +9.4% | 851.734 | 9413.995 | +1005.3% | 887.375 | 9912.493 | +1017.1% |
| MedTermIDOvlpRangeFacets | 213.936 | 181.524 | -15.1% | 433.905 | 437.942 | +0.9% | 603.100 | 702.917 | +16.6% | 679.574 | 838.447 | +23.4% | 761.983 | 848.642 | +11.4% |
| MedTermIDRangeFacets | 211.334 | 171.214 | -19.0% | 519.706 | 555.351 | +6.9% | 735.139 | 713.797 | -2.9% | 834.724 | 836.369 | +0.2% | 840.486 | 843.231 | +0.3% |
| MedTermLastModOvlpRangeFacets | 176.353 | 173.086 | -1.9% | 486.851 | 537.847 | +10.5% | 712.630 | 719.706 | +1.0% | 830.068 | 832.742 | +0.3% | 842.260 | 840.165 | -0.2% |
| MedTermLastModRangeFacets | 188.926 | 182.942 | -3.2% | 487.783 | 553.020 | +13.4% | 731.373 | 718.764 | -1.7% | 830.363 | 832.185 | +0.2% | 841.503 | 834.076 | -0.9% |

QPS

| Task | QPS baseline | StdDev (baseline) | QPS modified | StdDev (modified) | Pct diff | p-value |
|---|---|---|---|---|---|---|
| BrowseLastModRangeFacets | 3.26 | 5.8% | 3.34 | 8.1% | 2.6% (-10% to 17%) | 0.155 |
| BrowseLastModOvlpRangeFacets | 2.75 | 5.1% | 2.85 | 6.2% | 3.6% (-7% to 15%) | 0.015 |
| MedTermLastModRangeFacets | 5.58 | 7.9% | 5.80 | 8.6% | 3.8% (-11% to 22%) | 0.072 |
| MedTermLastModOvlpRangeFacets | 5.89 | 6.9% | 6.13 | 8.2% | 4.1% (-10% to 20%) | 0.035 |
| MedTermIDOvlpRangeFacets | 4.94 | 6.3% | 5.68 | 6.6% | 15.0% (1% to 29%) | 0.000 |
| MedTermIDRangeFacets | 5.19 | 9.9% | 6.17 | 7.1% | 18.7% (1% to 39%) | 0.000 |
| BrowseIDOvlpRangeFacets | 1.12 | 7.4% | 1.33 | 11.9% | 19.1% (0% to 41%) | 0.000 |
| BrowseIDRangeFacets | 2.52 | 4.5% | 3.60 | 12.0% | 42.7% (25% to 61%) | 0.000 |

It mainly impacts the ID tasks, which do not have a skipper.
However, it does seem to cause a worrying regression at high-percentile latency (p90 and above).

Hmm, @epotyom, what do you think? Since it is not related to the skipper change, I am leaning towards removing the unwrapping and retesting performance.

slow-J force-pushed the lucene-16249-skipper-range-facets branch from 1065433 to 7db2833, June 23, 2026 10:39
github-actions bot modified the milestones: 10.5.0, 10.6.0 Jun 23, 2026
slow-J requested a review from epotyom June 23, 2026 11:29


Successfully merging this pull request may close these issues:

Can we use DocValuesSkipper for range facets?

2 participants