Introduce CachingCollectorManager to parallelize search when using CachingCollector#16247
Introduce CachingCollectorManager to parallelize search when using CachingCollector#16247gaobinlong wants to merge 10 commits into
Conversation
…chingCollector Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
javanna
left a comment
There was a problem hiding this comment.
Great to see movement in this area, thanks for working on this @gaobinlong ! I left some comments
| private final Integer maxDocsToCache; | ||
|
|
||
| // One CachingCollector per slice, thread-safe for concurrent newCollector() calls. | ||
| private final List<CachingCollector> cachingCollectors = new CopyOnWriteArrayList<>(); |
There was a problem hiding this comment.
looks like this list is needed for the replay functionality. Note that like we discussed in other PRs, newCollector is never called concurrently. It is called sequentially by the coordinating thread which will also call reduce at the end.
What I do worry about when it comes to concurrency though is the isCached mutable flag, that gets modified by the worker threads and accessed by the main thread at the end. There isn't a concurrent access problem with it, but there may be visibility issues. The list does not need to handle concurrency though.
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
|
@javanna all comments are addressed yet, please help to review again, thanks! |
javanna
left a comment
There was a problem hiding this comment.
a couple more comments, this is close! Thanks again!
| () -> | ||
| new CachingCollectorManager<>( | ||
| new TopScoreDocCollectorManager(10, Integer.MAX_VALUE), false, null, null)); | ||
| } |
There was a problem hiding this comment.
could we add also a test for the happy path ?
There was a problem hiding this comment.
Sorry ,where? I mean, these tests never verify normal functioning of the collector manager, calling search against it without exceptions. Or am I not looking in the right place?
There was a problem hiding this comment.
Oh, sorry for the misunderstanding, added a new test method testBasic() to test the happy path, thanks!
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
javanna
left a comment
There was a problem hiding this comment.
Thanks for all the work, I left a new batch of comments, I 'd expect these to be the last ones.
| () -> | ||
| new CachingCollectorManager<>( | ||
| new TopScoreDocCollectorManager(10, Integer.MAX_VALUE), false, null, null)); | ||
| } |
There was a problem hiding this comment.
Sorry ,where? I mean, these tests never verify normal functioning of the collector manager, calling search against it without exceptions. Or am I not looking in the right place?
| addGroupField(doc, groupField, "author3", canUseIDV); | ||
| doc.add(new TextField("content", "random", Field.Store.YES)); | ||
| doc.add(new Field("id", "6", customType)); | ||
| doc.add(new Field("id", "5", customType)); |
There was a problem hiding this comment.
sorry for being pedantic, but can we revert these changes and leave things as they are? Or are these changes necessary? Can they otherwise be made as a followup perhaps?
|
|
||
| * GITHUB#15660: Introduce LargeNumHitsTopDocsCollectorManager to parallelize search when using LargeNumHitsTopDocsCollector. (Binlong Gao) | ||
|
|
||
| * GITHUB#16247: Introduce CachingCollectorManager to parallelize search when using CachingCollector. (Binlong Gao) |
There was a problem hiding this comment.
Can you move this to the 10.6 section please, given 10.5 shipped?
| Integer maxDocsToCache) { | ||
| if (maxRAMMB == null && maxDocsToCache == null) { | ||
| throw new IllegalArgumentException("Either maxRAMMB or maxDocsToCache must be set"); | ||
| } |
There was a problem hiding this comment.
should we also throw if both are non null given only one will be used and the other silently ignored?
There was a problem hiding this comment.
Changed that, throw exception if both are non-null.
| * @param groupSelector a {@link GroupSelector} that defines groups for this GroupingSearch | ||
| * @param grouperFactory a factory that creates fresh {@link GroupSelector} instances | ||
| */ | ||
| public GroupingSearch(GroupSelector<?> groupSelector) { |
There was a problem hiding this comment.
Removing this constructor is a good call, like we previously discussed. Could you perhaps mention this explicitly in the changes entry and the PR description?
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Description
This PR introduces CachingCollectorManager, switches GroupingSearch to use search concurrency and move away from the deprecated search(Query, Collector) method.
In addition, the useless constructor GroupingSearch(GroupSelector<?> groupSelector) is removed.
Relates to #12892.