Skip to content

feat(PartitionedOutput): Add nested RowVector support in the serializer#2045

Open
yingsu00 wants to merge 1 commit into
IBM:optimized_partitionedoutputfrom
yingsu00:NestedRowVec
Open

feat(PartitionedOutput): Add nested RowVector support in the serializer#2045
yingsu00 wants to merge 1 commit into
IBM:optimized_partitionedoutputfrom
yingsu00:NestedRowVec

Conversation

@yingsu00
Copy link
Copy Markdown
Collaborator

No description provided.

@yingsu00 yingsu00 requested a review from xin-zhang2 May 19, 2026 05:29
@yingsu00 yingsu00 self-assigned this May 19, 2026
@libianoss
Copy link
Copy Markdown
Collaborator

@xin-zhang2 xin have you reviewed the PR?

auto n = std::min<vector_size_t>(numRowsPerChunk, numRows);
outputStreams[p]->write(chunkBytes, n * sizeof(T));
numRows -= n;
if (!parentNulls) {
Copy link
Copy Markdown
Member

@xin-zhang2 xin-zhang2 May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The surviving rows are already reflected in numRows, so there is no need to check parentNulls. We can simply use

        while (numRows > 0) {
          auto n = std::min<vector_size_t>(numRowsPerChunk, numRows);
          outputStreams[p]->write(chunkBytes, n * sizeof(T));
          numRows -= n;
        }

regareless of whether parentNulls is null or not.
Then parentNulls can be removed from the function parameters.

childContext.parentNulls[vectorIndex] = nulls;

vector_size_t begin = 0;
for (uint32_t p = 0; p < numPartitions_; ++p) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (uint32_t p : nonEmptyPartitions)

vector_size_t begin = 0;
for (uint32_t p = 0; p < numPartitions_; ++p) {
const vector_size_t end = partitionOffsets[p];
if (outputStreams[p] != nullptr && end > begin) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outputStreams[p] != nullptr is unnecessary here.

std::vector<vector_size_t> rowCounts;
std::vector<BufferPtr> parentNulls;
std::vector<std::vector<vector_size_t>> parentNullCounts;
bool hasParentNulls{false};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks hasParentNulls is redudant.
parentNulls is always resized in the code, so parentNulls[vectorIndex] == nullptr is enough to check whether the parent has nulls.

int32_t end,
uint64_t* target,
uint64_t targetBitOffset) {
uint64_t outBit = targetBitOffset;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a few fast paths here. The following change shows a slight improvement in the benchmark for with-null cases.

const uint64_t numBits = end - begin;
  if (mask == nullptr) {
    if (source == nullptr) {
      bits::fillBits(target, targetBitOffset, targetBitOffset + numBits, bits::kNotNull);
    } else {
      bits::copyBits(source, begin, target, targetBitOffset, numBits);
    }
    return static_cast<int32_t>(numBits);
  }

  if (source == nullptr) {
    const int32_t count = bits::countBits(mask, begin, end);
    bits::fillBits(target, targetBitOffset, targetBitOffset + count, bits::kNotNull);
    return count;
  }

  uint64_t outBit = targetBitOffset;
  bits::forEachWord(begin, end, [&](int32_t index, uint64_t wordMask) {
    const uint64_t selected = mask[index] & wordMask;
    if (selected == 0) {
      return;
    }
    const uint64_t packed =
        bits::extractBits<uint64_t>(source ? source[index] : ~0ULL, selected);
    const uint32_t count = __builtin_popcountll(selected);
    appendLowBits(target, outBit, packed, count);
    outBit += count;
  });
  return static_cast<int32_t>(outBit - targetBitOffset);

childVectors.reserve(numVectors);
for (const auto& pv : partitionedVectors) {
childVectors.push_back(
std::dynamic_pointer_cast<PartitionedRowVector>(pv)->childAt(col));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to consider the encoding before casting. A ROW type vector can be encoded as CONSTANT, in which case pv would be a PartitionedConstantVector instead of PartitionedRowVector.
The same issue also applies at line 210.

// parent-live rows, which the parent recorded in context.rowCounts. Only
// partitions that have nulls at this level need a compacted bitmap; the
// rest use sequential offsets and no null section.
std::vector<BufferPtr> nullsByPartition(numPartitions_);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it make sense to split the footer writing into separate functions? Since different encodings may share the same wire-format pieces, such as offsets and null bitmaps, this logic might be reusable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants