feat(PartitionedOutput): Add outputChannels support #1972
a884aa0 to 960998f
e1eb062 to e335c23
});
}

RowVectorPtr OptimizedPartitionedOutput::prepareOutput(
This prepares the input, not the output. Rename to prepareInput.
Renamed to prepareSerializerInput
return input;
}

std::vector<VectorPtr> outputColumns;
outputColumns -> reorderedInputColumns
Renamed to serializerInputColumns, since it is passed to the serializer's append() and contains only the unique columns from the output.
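For illustration, a minimal standalone sketch of how the unique columns could be derived from an output layout, assuming the layout is given as a list of input channel indices. `SerializerLayout` and `buildSerializerLayout` are hypothetical names and are not taken from this PR, and plain `std::vector` stands in for the Velox types.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the layout computed once from the output channels.
struct SerializerLayout {
  // Distinct input channels, in order of first appearance in the output.
  std::vector<uint32_t> serializerInputChannels;
  // For each output column, its index into serializerInputChannels.
  std::vector<uint32_t> outputToInputChannels;
};

// Deduplicates the output channels while remembering, for every output
// column, which distinct column it refers to.
SerializerLayout buildSerializerLayout(
    const std::vector<uint32_t>& outputChannels) {
  SerializerLayout layout;
  std::unordered_map<uint32_t, uint32_t> firstSeen;
  layout.outputToInputChannels.reserve(outputChannels.size());
  for (uint32_t channel : outputChannels) {
    // The candidate index is computed before any push_back below.
    auto [it, inserted] = firstSeen.try_emplace(
        channel,
        static_cast<uint32_t>(layout.serializerInputChannels.size()));
    if (inserted) {
      layout.serializerInputChannels.push_back(channel);
    }
    layout.outputToInputChannels.push_back(it->second);
  }
  return layout;
}
```

With output channels `{2, 0, 2, 1}`, only three distinct columns would be handed to the serializer, and the per-output mapping records that the third output column reuses the first one.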
PartitionBuildContext& ctx) {
auto* rowVector = vector_->as<RowVector>();
partitionedChildren_.reserve(rowVector->childrenSize());
std::unordered_map<const BaseVector*, PartitionedVectorPtr>
Actually, I think the input-to-output mapping should be done in PrestoIterativePartitioningSerializer::flushRowChildren(), not at the PartitionedVector level. PartitionedVector is NOT supposed to handle or know about the remapping, which should happen at the upper levels. Also, the change made here is hard to understand.
});
}

RowVectorPtr OptimizedPartitionedOutput::prepareOutput(
Actually, let's not do the mapping at AddInput time, but at flush time. The place it should happen is PrestoIterativePartitioningSerializer::flushRowChildren().
e335c23 to f66413c
@yingsu00 I've moved the mapping to flush time. The serializer now has a member outputToInputChannels_ for this mapping, and the input passed to append() is prepared in OptimizedPartitionedOutput to include only the unique columns from the output.
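A rough standalone sketch of what duplicating columns at flush time could look like, under the assumption that each distinct column has already been serialized into its own buffer. `expandColumnsAtFlush` is a hypothetical helper, `std::string` stands in for a serialized column buffer, and treating an empty mapping as "no remapping" is an assumption; none of this is the serializer's actual code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: emits one buffer per output column by looking up the
// distinct serialized column it maps to. An empty mapping is treated here as
// "no remapping needed" (an assumption of this sketch).
std::vector<std::string> expandColumnsAtFlush(
    const std::vector<std::string>& serializedInputColumns,
    const std::vector<uint32_t>& outputToInputChannels) {
  if (outputToInputChannels.empty()) {
    return serializedInputColumns;
  }
  std::vector<std::string> outputColumns;
  outputColumns.reserve(outputToInputChannels.size());
  for (uint32_t inputChannel : outputToInputChannels) {
    // at() throws std::out_of_range on a bad mapping instead of reading
    // past the end of the vector.
    outputColumns.push_back(serializedInputColumns.at(inputChannel));
  }
  return outputColumns;
}
```

The point of deferring the duplication is that each distinct column is partitioned and serialized once, and only the already-serialized bytes are repeated for duplicated output columns.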
input->size(),
partitions.size(),
"partitions.size() must equal input->size()");
It's safer to additionally check the mapped types in append() when outputToInputChannels_ is non-empty. We can validate each mapped channel against input->childrenSize() and validate input->childAt(mapped)->type() against type_->childAt(outputColumn).
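A minimal standalone sketch of the two checks suggested above, assuming column types can be compared for equality. Type names are modeled as strings and `mappedChannelsAreValid` is a hypothetical function; the actual change would presumably raise an error inside append() rather than return a bool.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical validation: every mapped channel must be in range of the
// input's children, and the mapped input type must match the output type.
bool mappedChannelsAreValid(
    const std::vector<std::string>& inputChildTypes,
    const std::vector<std::string>& outputChildTypes,
    const std::vector<uint32_t>& outputToInputChannels) {
  if (outputToInputChannels.empty()) {
    return true; // No remapping configured, so nothing to validate here.
  }
  if (outputToInputChannels.size() != outputChildTypes.size()) {
    return false; // The mapping must cover every output column.
  }
  for (std::size_t outputColumn = 0;
       outputColumn < outputToInputChannels.size();
       ++outputColumn) {
    const uint32_t mapped = outputToInputChannels[outputColumn];
    if (mapped >= inputChildTypes.size()) {
      return false; // Mapped channel is out of range.
    }
    if (inputChildTypes[mapped] != outputChildTypes[outputColumn]) {
      return false; // Mapped input type differs from the output type.
    }
  }
  return true;
}
```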
/// Builds the RowVector consumed by the serializer. When the output layout
/// has duplicated columns, this projects only the distinct columns and
/// leaves duplication to flush time.
RowVectorPtr prepareSerializerInput(const RowVectorPtr& input) const;
Move this before flush(). For functions with the same access level, order them in the order they are called.
/// Row type passed to serializer_->append(). It only includes distinct
/// columns from the output layout.
RowTypePtr serializerInputType_;
Move the three new members to after the serializer_ member.
namespace facebook::velox::exec {

void OptimizedPartitionedOutput::initializeSerializerLayout() {
Order the function definitions in the same order as in the .h file.
Add outputChannels support in OptimizedPartitionedOutput.