Cardinality aggregation dynamic pruning changes #74
Draft
rishabhmaurya wants to merge 2 commits into main from
rishabhmaurya commented Feb 15, 2024
```java
        new SortedSetDocValuesField(fieldName, new BytesRef("5"))
    ));
}, card -> {
    assertEquals(3.0, card.getValue(), 0);
```
We should probably add an assertion on how many times `collector.collect` gets called, which should be 2 when dynamic pruning is applied vs. 5 when it is not applied?
I have added test utilities (CountingAggregator) in opensearch-project#11643 with which we can assert the count of collect() calls once the changes are in. I'm asserting similar things in my PR.
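A minimal sketch of the counting idea, assuming a hypothetical `DocCollector` interface rather than Lucene's `LeafCollector` or the `CountingAggregator` utility from opensearch-project#11643 (whose API is not reproduced here): a decorator increments a counter on every `collect()` call before delegating, so a test can compare the count with and without pruning.

```java
import java.util.ArrayList;
import java.util.List;

public class CountingCollectorSketch {

    /** Hypothetical minimal per-segment collector interface (illustrative only). */
    interface DocCollector {
        void collect(int doc);
    }

    /** Decorator that forwards each collect() to a delegate and counts the calls. */
    static final class CountingDocCollector implements DocCollector {
        private final DocCollector delegate;
        private int count = 0;

        CountingDocCollector(DocCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            count++;
            delegate.collect(doc);
        }

        int getCount() {
            return count;
        }
    }

    public static void main(String[] args) {
        List<Integer> collected = new ArrayList<>();
        CountingDocCollector counting = new CountingDocCollector(doc -> collected.add(doc));
        // Feed the docs a pruned run might collect; a test would then assert on the count.
        for (int doc : new int[] {0, 2}) {
            counting.collect(doc);
        }
        System.out.println(counting.getCount()); // 2
    }
}
```

The decorator keeps the counting concern out of the aggregator under test, so the same wrapper works for both the pruned and unpruned paths.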
This was referenced Feb 15, 2024
Changes to experiment with dynamic pruning for cardinality aggregation, as described in opensearch-project#11959.
Here is a breakdown of the algorithm:
1. Once preconditions are met, while collectors are created and picked for a given segment, create a `DynamicPruningCollectorWrapper` to wrap the collector with the optimization.
2. `DynamicPruningCollectorWrapper` enumerates all the terms for the given field and creates a `DisjunctionWithDynamicPruningScorer`, similar to `DisjunctionScorer` in Lucene, in conjunction with the parent query.
3. The `DisjunctionWithDynamicPruningScorer` scorer should have the following capabilities in addition to what `DisjunctionScorer` has:
   1. `#removeAllDISIsOnCurrentDoc()` removes all the sub-scorer DISIs pointing to the current doc. This is helpful for dynamic pruning in cardinality aggregation: once a term is found, it becomes irrelevant for the rest of the search space, so that term's sub-scorer DISI can be safely removed from the list of sub-scorers to process.
   2. `#removeAllDISIsOnCurrentDoc()` breaks the invariant of the conjunction DISI, i.e. that the docIDs of all sub-scorers are less than or equal to the docID the iterator is currently pointing to. Removing elements from the priority queue triggers a heapify, which modifies the top of the priority queue, and the top represents the current docID for the sub-scorers. To address this, the iterator is wrapped with `SlowDocIdPropagatorDISI`, which keeps the iterator pointing to the last docID seen before `#removeAllDISIsOnCurrentDoc()` was called and updates this docID only when `next()` or `advance()` is called.
4. When document collection starts and `DynamicPruningCollectorWrapper` is used, it collects all the documents at once by iterating over all the documents from the query created in step 3.
5. Dynamic pruning step when collecting a document: when a match is found, all the terms for that document are enumerated and collected for the cardinality computation. Once done, the sub-scorer DISI corresponding to each of these terms can be safely removed from the `DisjunctionWithDynamicPruningScorer` by calling `removeAllDISIsOnCurrentDoc()`.
6. Once all docs are collected, we can immediately throw `CollectionTerminatedException` for early termination of the query.

Note: to be used only for prototype and reference purposes, not intended to be merged to main. It may contain a lot of bugs and definitely doesn't cover all preconditions.
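The algorithm described above can be modeled in memory to see why it reduces `collect()` calls. This is a toy Java sketch under stated assumptions, not the PR's code: `TermIterator`, `Result`, `collect` and `collectWithoutPruning` are hypothetical names, posting lists are plain sorted `int[]` arrays, the parent query is a set of matching doc IDs, and a `java.util.PriorityQueue` stands in for Lucene's DISI priority queue. Caching `currentDoc` before popping plays the role described for `SlowDocIdPropagatorDISI`, not re-adding popped iterators plays the role of `removeAllDISIsOnCurrentDoc()`, and an empty heap marks where the real implementation would throw `CollectionTerminatedException`.

```java
import java.util.*;

public class DynamicPruningSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One term's sorted posting list; a stand-in for a sub-scorer DISI (hypothetical). */
    static final class TermIterator {
        final String term;
        final int[] docs;
        int pos = 0;
        TermIterator(String term, int[] docs) { this.term = term; this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        void next() { pos++; }
    }

    /** Distinct terms found plus the number of collect() calls made. */
    static final class Result {
        final Set<String> terms;
        final int collectCalls;
        Result(Set<String> terms, int collectCalls) { this.terms = terms; this.collectCalls = collectCalls; }
    }

    /** Disjunction over all terms, filtered by the parent query, with dynamic pruning. */
    static Result collect(Map<String, int[]> postings, Set<Integer> parentMatches) {
        PriorityQueue<TermIterator> heap =
            new PriorityQueue<>(Comparator.comparingInt(TermIterator::docID));
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            TermIterator it = new TermIterator(e.getKey(), e.getValue());
            if (it.docID() != NO_MORE_DOCS) heap.add(it);
        }
        Set<String> seen = new TreeSet<>();
        int collectCalls = 0;
        while (!heap.isEmpty()) {
            // Remember the current doc before the heap is mutated (SlowDocIdPropagatorDISI's role).
            int currentDoc = heap.peek().docID();
            List<TermIterator> onDoc = new ArrayList<>();
            while (!heap.isEmpty() && heap.peek().docID() == currentDoc) {
                onDoc.add(heap.poll());
            }
            if (parentMatches.contains(currentDoc)) {
                // Collect this doc: its remaining terms are now seen, so their iterators
                // are dropped for good (the removeAllDISIsOnCurrentDoc() analogue).
                collectCalls++;
                for (TermIterator it : onDoc) seen.add(it.term);
            } else {
                // Parent query rejects this doc: advance the iterators and keep them.
                for (TermIterator it : onDoc) {
                    it.next();
                    if (it.docID() != NO_MORE_DOCS) heap.add(it);
                }
            }
        }
        // Heap empty: every term is seen or exhausted; real code would terminate collection here.
        return new Result(seen, collectCalls);
    }

    /** Baseline without pruning: collect every doc that matches the parent query. */
    static Result collectWithoutPruning(Map<String, int[]> postings, Set<Integer> parentMatches) {
        Map<Integer, List<String>> termsByDoc = new TreeMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                termsByDoc.computeIfAbsent(doc, d -> new ArrayList<>()).add(e.getKey());
            }
        }
        Set<String> seen = new TreeSet<>();
        int collectCalls = 0;
        for (Map.Entry<Integer, List<String>> e : termsByDoc.entrySet()) {
            if (!parentMatches.contains(e.getKey())) continue;
            collectCalls++;
            seen.addAll(e.getValue());
        }
        return new Result(seen, collectCalls);
    }

    public static void main(String[] args) {
        // Made-up data: doc0={1,2}, doc1={1}, doc2={2,3}, doc3={3}, doc4={1,3}
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("1", new int[] {0, 1, 4});
        postings.put("2", new int[] {0, 2});
        postings.put("3", new int[] {2, 3, 4});
        Set<Integer> allDocs = Set.of(0, 1, 2, 3, 4);
        Result pruned = collect(postings, allDocs);
        Result full = collectWithoutPruning(postings, allDocs);
        System.out.println(pruned.terms + " collect calls: " + pruned.collectCalls); // [1, 2, 3] collect calls: 2
        System.out.println(full.terms + " collect calls: " + full.collectCalls);     // [1, 2, 3] collect calls: 5
    }
}
```

Both paths report the same distinct terms; the pruned path stops as soon as every term has been seen, which is where the savings in `collect()` calls come from. The dataset is invented for illustration and is not the data used in the PR's test.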
Related Issues
Resolves opensearch-project#11959
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.