Skip to content

TOOLS: matrix generator shuffle columns#1251

Merged
janjust merged 1 commit intoopenucx:masterfrom
yaeliyac:shuffle_matrix_new
Apr 9, 2026
Merged

TOOLS: matrix generator shuffle columns#1251
janjust merged 1 commit intoopenucx:masterfrom
yaeliyac:shuffle_matrix_new

Conversation

@yaeliyac
Copy link
Copy Markdown
Contributor

@yaeliyac yaeliyac commented Jan 5, 2026

What

Add config option to shuffle matrix columns each iteration in matrix generator

@yaeliyac yaeliyac force-pushed the shuffle_matrix_new branch 3 times, most recently from f3382cd to f7b3da2 Compare January 5, 2026 18:23
@yaeliyac yaeliyac changed the title TOOLS: Shuffle matrix cols TOOLS: matrix generator shuffle columns Jan 5, 2026
@yaeliyac yaeliyac marked this pull request as ready for review January 5, 2026 18:23
@yaeliyac yaeliyac force-pushed the shuffle_matrix_new branch from f7b3da2 to b212aa0 Compare January 5, 2026 18:27
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jan 7, 2026

Greptile Summary

This PR adds a shuffle config option to the traffic matrix generator that swaps a random pair of columns on each iteration, producing varied traffic patterns. The config parsing, validation, and buffer pre-allocation logic are all correct.

  • First iteration always unshuffled (P1): reset() sets current_counts = base_counts and calls setup_counts_displs() directly without applying a shuffle. Because the benchmark loop is for (reset(); has_next(); next()), the body runs once with base counts before next() is ever called. With the default nrep=1, a user enabling shuffle=1 will observe zero shuffling.

Confidence Score: 4/5

The shuffle feature works correctly for nrep > 1, but with the default nrep=1 the first (and only) iteration always uses unshuffled base counts, making the feature a no-op in the common case.

One P1 finding: reset() never applies a shuffle, so the first iteration after reset always uses the base matrix. With nrep=1 (the default), shuffle=1 has no observable effect. All other findings are P2. Score is 4 rather than 5 because the P1 directly defeats the feature's primary purpose for most users.

tools/perf/generator/ucc_pt_generator_traffic_matrix.cc — specifically the reset() and next() interaction

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Core shuffle logic: adds column-swap per iteration in next(), updates get_dst_count_max() for shuffle case, and wires current_counts throughout — mostly correct, but note first-iteration-after-reset always uses unshuffled base counts regardless of shuffle flag
tools/perf/generator/ucc_pt_generator.h Adds shuffle field and base_counts/current_counts members, and updates constructor signature — changes are correct and well-scoped
tools/perf/ucc_pt_config.cc Adds shuffle parameter parsing with 0/1 validation and default of 0; updates help text — clean and correct
tools/perf/ucc_pt_benchmark.cc Passes shuffle config field to generator constructor — trivial, correct
tools/perf/ucc_pt_config.h Adds shuffle int field to ucc_pt_gen_traffic_matrix_config struct — correct

Reviews (11): Last reviewed commit: "TOOLS: matrix generator shuffle columns" | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@yaeliyac yaeliyac marked this pull request as draft January 7, 2026 14:29
@yaeliyac yaeliyac marked this pull request as ready for review January 11, 2026 08:14
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/ucc_pt_config.cc
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jan 11, 2026

Additional Comments (1)

tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
The get_count_max() function uses pattern_counts[current_pattern] but should use current_counts when shuffle is enabled.

When shuffle == 1, the actual matrix being used is stored in current_counts (the shuffled version), not in pattern_counts[current_pattern]. This function is called during benchmark execution (line 150 in ucc_pt_benchmark.cc) to determine whether to use small or large iteration counts, so using the wrong matrix will give incorrect results.

size_t ucc_pt_generator_traffic_matrix::get_count_max()
{
    const auto &matrix    = current_counts;
    size_t      max_count = 0;
    size_t      cur_row_col;

    for (int i = 0; i < comm_size; i++) {
        cur_row_col = 0;
        for (int j = 0; j < comm_size; j++) {
            cur_row_col += matrix[i * comm_size + j];
        }
        if (cur_row_col > max_count) {
            max_count = cur_row_col;
        }
    }

    for (int i = 0; i < comm_size; i++) {
        cur_row_col = 0;
        for (int j = 0; j < comm_size; j++) {
            cur_row_col += matrix[j * comm_size + i];
        }
        if (cur_row_col > max_count) {
            max_count = cur_row_col;
        }
    }
    return max_count;
}

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment thread ucc.pc Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jan 11, 2026

Additional Comments (2)

tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
This function uses pattern_counts[current_pattern] but should use current_counts when shuffle is enabled, causing critical data corruption.

When shuffle == 1, the next() function modifies current_counts to contain the shuffled matrix (lines 342-361). However, setup_counts_displs() ignores this and reads from pattern_counts[current_pattern] instead, which contains the un-shuffled base matrix.

This causes src_counts and dst_counts to be set up with the wrong values - they'll reflect the base matrix instead of the shuffled matrix, breaking the entire shuffle feature and causing incorrect ALLTOALLV communication patterns.

void ucc_pt_generator_traffic_matrix::setup_counts_displs()
{
    const auto &counts = current_counts;

    if (counts.size() < comm_size * comm_size) {
        throw std::runtime_error(
            "Pattern size (" + std::to_string(counts.size()) +
            ") is less than comm_size*comm_size (" +
            std::to_string(comm_size * comm_size) + ")");
    }

    for (int i = 0; i < comm_size; i++) {
        src_counts[i] = counts[rank_id * comm_size + i];
    }

    size_t displ = 0;
    for (int i = 0; i < comm_size; i++) {
        src_displs[i] = displ;
        displ += src_counts[i];
    }

    for (int i = 0; i < comm_size; i++) {
        dst_counts[i] = counts[i * comm_size + rank_id];
    }

    displ = 0;
    for (int i = 0; i < comm_size; i++) {
        dst_displs[i] = displ;
        displ += dst_counts[i];
    }
}

tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
This function uses pattern_counts[current_pattern] but should use current_counts when dealing with the actual matrix being benchmarked.

The function get_count_max() is called during benchmark execution (line 150 in ucc_pt_benchmark.cc) to determine whether to use small or large iteration counts. When shuffle == 1, the actual matrix being used is stored in current_counts (the shuffled version), not in pattern_counts[current_pattern].

Using the wrong matrix will cause the benchmark to make incorrect decisions about iteration counts, potentially using the wrong warmup/iteration settings for the actual data being transferred.

size_t ucc_pt_generator_traffic_matrix::get_count_max()
{
    const auto &matrix    = current_counts;
    size_t      max_count = 0;
    size_t      cur_row_col;

    for (int i = 0; i < comm_size; i++) {
        cur_row_col = 0;
        for (int j = 0; j < comm_size; j++) {
            cur_row_col += matrix[i * comm_size + j];
        }
        if (cur_row_col > max_count) {
            max_count = cur_row_col;
        }
    }

    for (int i = 0; i < comm_size; i++) {
        cur_row_col = 0;
        for (int j = 0; j < comm_size; j++) {
            cur_row_col += matrix[j * comm_size + i];
        }
        if (cur_row_col > max_count) {
            max_count = cur_row_col;
        }
    }
    return max_count;
}

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment thread tools/perf/ucc_pt_config.cc
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jan 11, 2026

Additional Comments (2)

tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
This function uses pattern_counts[current_pattern] but must use current_counts when shuffle is enabled, causing critical data corruption.

When shuffle == 1, the next() function modifies current_counts to contain the shuffled matrix (lines 342-361). However, setup_counts_displs() ignores this and reads from pattern_counts[current_pattern] instead, which contains the un-shuffled base matrix.

This causes src_counts and dst_counts to be set up with the wrong values - they'll reflect the base matrix instead of the shuffled matrix, completely breaking the shuffle feature and causing incorrect ALLTOALLV communication patterns.

    const auto &counts = (shuffle == 1) ? current_counts : pattern_counts[current_pattern];

tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
This function uses pattern_counts[current_pattern] but must use current_counts when shuffle is enabled.

When shuffle == 1, the actual matrix being used is stored in current_counts (the shuffled version), not in pattern_counts[current_pattern]. This function is called during benchmark execution (line 150 in ucc_pt_benchmark.cc) to determine whether to use small or large iteration counts, so using the wrong matrix will give incorrect results and affect performance measurement accuracy.

    const auto &matrix    = (shuffle == 1) ? current_counts : pattern_counts[current_pattern];

@yaeliyac yaeliyac marked this pull request as draft January 11, 2026 12:47
@yaeliyac yaeliyac force-pushed the shuffle_matrix_new branch 2 times, most recently from e5cab7c to 48e54af Compare January 11, 2026 15:46
@yaeliyac yaeliyac marked this pull request as ready for review January 11, 2026 15:51
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/ucc_pt_config.cc Outdated
@yaeliyac yaeliyac requested a review from janjust January 14, 2026 13:21
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc Outdated
Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. tools/perf/generator/ucc_pt_generator_traffic_matrix.cc, line 441-456 (link)

    logic: get_src_count_max() doesn't account for shuffle mode like get_dst_count_max() does. When shuffle is enabled, this function should iterate over all rows (not just pattern_counts) to find the maximum source count across all possible column shuffles.

    Currently this returns the max from pattern_counts which only has one pattern, potentially underestimating buffer size when shuffle=1.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@yaeliyac
Copy link
Copy Markdown
Contributor Author

  • get_src_count_max() doesn't account for shuffle mode, potentially underestimating buffer requirements

shuffling columns doesn't change row sums

@yaeliyac yaeliyac requested a review from ikryukov January 18, 2026 08:40
@janjust
Copy link
Copy Markdown
Collaborator

janjust commented Feb 19, 2026

/build

@janjust janjust enabled auto-merge (squash) February 19, 2026 16:23
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@janjust
Copy link
Copy Markdown
Collaborator

janjust commented Feb 26, 2026

/build

Comment thread tools/perf/generator/ucc_pt_generator_traffic_matrix.cc
@janjust
Copy link
Copy Markdown
Collaborator

janjust commented Feb 27, 2026

/build

@janjust janjust force-pushed the shuffle_matrix_new branch from 0537158 to 10c9128 Compare April 9, 2026 02:00
@janjust
Copy link
Copy Markdown
Collaborator

janjust commented Apr 9, 2026

/build

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 9, 2026

Tip:

Greploop — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.

Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

@janjust janjust merged commit 081bb98 into openucx:master Apr 9, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants