[NEMO-472] Implement Intermediate Combine#318
Conversation
implemented compile time pass
common/src/main/java/org/apache/nemo/common/ir/IRDAGChecker.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/apache/nemo/common/ir/IRDAGChecker.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
| * @param <OutputT> output type | ||
| */ | ||
| public final class GBKTransform<K, InputT, OutputT> | ||
| public final class CombineTransform<K, InputT, OutputT> |
There was a problem hiding this comment.
Could you explain why the class name is changed?
There was a problem hiding this comment.
It's because GroupByKey Transform in Beam semantics can be represented as Combine PerKey Transform, and I thought that this class represents Combine PerKey rather than GroupByKey.
There was a problem hiding this comment.
As far as I know, GroupByKey Transform is not always be represented as Combine PerKey Transform, so changing the name is confusing to me. For instance, CoGroupByKey is not combining, but it is also represented as GroupByKey as far as I know.
There was a problem hiding this comment.
We actually have a separate GroupByKeyTransform. We have renamed the GBKTransform since it actually works as a CombineTransform. It was misnamed in the first place.
|
Kudos, SonarCloud Quality Gate passed!
|
|
@taegeonum Thanks for the review! I've addressed your comments. |
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
...on/src/main/java/org/apache/nemo/common/ir/vertex/utility/IntermediateAccumulatorVertex.java
Outdated
Show resolved
Hide resolved
| import java.util.HashSet; | ||
|
|
||
| /** | ||
| * List of set of node names to limit the scheduling of the tasks of the vertex to while shuffling. |
There was a problem hiding this comment.
This explanation is not clear to me. Does this property set the destination executor for the output of intermediate vertex?
There was a problem hiding this comment.
It limits the sources of the data that each task reads from, depending on where the task is located at. I'll add the explanation.
| * @param <OutputT> output type | ||
| */ | ||
| public final class GBKTransform<K, InputT, OutputT> | ||
| public final class CombineTransform<K, InputT, OutputT> |
There was a problem hiding this comment.
As far as I know, GroupByKey Transform is not always be represented as Combine PerKey Transform, so changing the name is confusing to me. For instance, CoGroupByKey is not combining, but it is also represented as GroupByKey as far as I know.
...nemo/compiler/optimizer/pass/compiletime/reshaping/IntermediateAccumulatorInsertionPass.java
Outdated
Show resolved
Hide resolved
...nemo/compiler/optimizer/pass/compiletime/reshaping/IntermediateAccumulatorInsertionPass.java
Outdated
Show resolved
Hide resolved
|
@Kangji Any update? |
not yet... :( It has been delayed due to the fall semester, even though i'm trying to do asap. I'll let you know. |
|
Kudos, SonarCloud Quality Gate passed!
|
|
@taegeonum Can you take a final look? |








JIRA: NEMO-472: Fix and Implement Hierarchical Aggregation
Major changes:
[NEMO-472: Implement Hierarchical Aggregation] aims to add additional intermediate accumulation operator in front of final combine operator that accumulates data among physically nearby containers prior to shuffling across WAN, when needed. It is expected that data aggregation among nearby containers will reduce the data size that must be transferred across WAN. To achieve it,
Minor changes to note:
Tests for the changes:
Other comments: