Introduce TermGroupFacetCollectorManager for concurrent grouped faceting#16292

Open
javanna wants to merge 3 commits into apache:main from javanna:enhancement/term_group_facet_collector_manager
javanna (Contributor) commented Jun 24, 2026

The old TermGroupFacetCollector could not be adapted to CollectorManager:
its cross-segment deduplication was fundamentally sequential. At
doSetNextReader(), all previously seen (group, facet) pairs were
re-looked up in each new segment's ordinal dictionary to rebuild the
per-segment SentinelIntSet. That O(pairs × segments) walk cannot be
split across concurrent slices, and every unique hit allocated a
BytesRef copy in the collection hot path.

The replacement design tracks (groupOrd, facetOrd) pairs as packed
longs in a per-slice LongHashSet during collection, so the hot path
makes no BytesRef allocations. Ordinals are translated to terms once per
segment in LeafCollector.finish(), and reduce() performs global
deduplication across all slices via a HashSet<GroupFacetPair>,
correctly counting groups whose documents span multiple search slices.
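
As a rough illustration of the idea above (not the code in this PR), the sketch below packs two int ordinals into one long, resolves them to terms once per segment, and deduplicates across slices at reduce time. It uses only the JDK: `java.util.HashSet<Long>` stands in for Lucene's LongHashSet, plain `String[]` arrays stand in for the per-segment term dictionaries, and the class name, method names, and the fields of `GroupFacetPair` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PackedPairSketch {

  /** Hypothetical stand-in for the PR's GroupFacetPair; the fields are assumed. */
  record GroupFacetPair(String group, String facet) {}

  /** Packs two segment-local ordinals into one long: group in the high 32 bits, facet in the low 32. */
  static long pack(int groupOrd, int facetOrd) {
    return ((long) groupOrd << 32) | (facetOrd & 0xFFFFFFFFL);
  }

  static int groupOrd(long packed) {
    return (int) (packed >>> 32);
  }

  static int facetOrd(long packed) {
    return (int) packed;
  }

  /** Translates one segment's packed ordinals to terms, once, after the segment has been collected. */
  static List<GroupFacetPair> resolve(Set<Long> packedPairs, String[] groupTerms, String[] facetTerms) {
    List<GroupFacetPair> resolved = new ArrayList<>();
    for (long packed : packedPairs) {
      resolved.add(new GroupFacetPair(groupTerms[groupOrd(packed)], facetTerms[facetOrd(packed)]));
    }
    return resolved;
  }

  /** Merges slices: each distinct (group, facet) pair counts once toward its facet, however many slices saw it. */
  static Map<String, Integer> reduce(List<List<GroupFacetPair>> slices) {
    Set<GroupFacetPair> seen = new HashSet<>();
    Map<String, Integer> counts = new TreeMap<>();
    for (List<GroupFacetPair> slice : slices) {
      for (GroupFacetPair pair : slice) {
        if (seen.add(pair)) {
          counts.merge(pair.facet(), 1, Integer::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Slice 1 saw (g1, red) and (g2, blue); slice 2 saw (g1, red) again under different ordinals.
    List<GroupFacetPair> slice1 =
        resolve(Set.of(pack(0, 1), pack(1, 0)), new String[] {"g1", "g2"}, new String[] {"blue", "red"});
    List<GroupFacetPair> slice2 =
        resolve(Set.of(pack(0, 0)), new String[] {"g1"}, new String[] {"red"});
    System.out.println(reduce(List.of(slice1, slice2))); // prints {blue=1, red=1}
  }
}
```

Ordinals are only meaningful within one segment, which is why the sketch resolves them to terms before any cross-slice comparison. Without the `seen` set, group g1 would be counted twice for facet red.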

All 8 deprecated IndexSearcher#search(Query, Collector) call sites in
TestGroupFacetCollector are removed as a result.

TermGroupFacetCollector is now deprecated in favour of TermGroupFacetCollectorManager.
The base class GroupFacetCollector is also now deprecated.

Relates to #12892.

javanna added 2 commits June 24, 2026 09:21
javanna added this to the 10.6.0 milestone Jun 24, 2026
javanna changed the title from "Enhancement/term group facet collector manager" to "Introduce TermGroupFacetCollectorManager for concurrent grouped faceting" Jun 24, 2026
*
* @lucene.experimental
*/
public class GroupedFacetResult {

This is temporarily duplicated: the same code also lives in the now-deprecated GroupFacetCollector. The idea is to backport this change cleanly to 10.x, then remove the deprecated classes from main.