
5k Live LT#3779

Open
sm745052 wants to merge 31 commits into main from 5kliveit

Conversation

@sm745052
Contributor

No description provided.

@codecov

codecov Bot commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.17%. Comparing base (d9ed584) to head (c0bf31b).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3779      +/-   ##
============================================
+ Coverage     53.52%   58.17%   +4.65%     
+ Complexity     6637     2768    -3869     
============================================
  Files          1082      524     -558     
  Lines         65868    31592   -34276     
  Branches       7332     3437    -3895     
============================================
- Hits          35255    18379   -16876     
+ Misses        28257    12173   -16084     
+ Partials       2356     1040    -1316     
Components Coverage Δ
spanner-templates 74.82% <ø> (+1.97%) ⬆️
spanner-import-export ∅ <ø> (∅)
spanner-live-forward-migration 80.84% <ø> (-0.10%) ⬇️
spanner-live-reverse-replication 77.01% <ø> (-0.09%) ⬇️
spanner-bulk-migration 91.05% <ø> (-0.06%) ⬇️
gcs-spanner-dv 86.67% <ø> (+0.90%) ⬆️
see 578 files with indirect coverage changes

@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 10, 2026
@sm745052 sm745052 force-pushed the 5kliveit branch 5 times, most recently from 3b41e8b to 588119b Compare May 10, 2026 19:12
@sm745052 sm745052 force-pushed the 5kliveit branch 5 times, most recently from d0215c8 to 8e751ce Compare May 11, 2026 05:59
@sm745052 sm745052 marked this pull request as ready for review May 13, 2026 07:44
@sm745052 sm745052 requested a review from a team as a code owner May 13, 2026 07:44
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant expansion to the load testing capabilities for the DataStream to Spanner template, specifically targeting high-scale scenarios with 5000 tables. It includes infrastructure improvements to the base test class to handle parallel row count checks and private connectivity configurations, alongside minor refactoring of the main processing logic to improve resource handling during execution.

Highlights

  • Load Testing Infrastructure: Introduced a new load test for the DataStream to Spanner template supporting 5000 tables and updated the base test class to support parallel row count validation.
  • Code Refactoring: Optimized the ProcessInformationSchemaFn class by moving Ddl objects to transient fields and reordering the teardown and process logic for better resource management.
  • Enhanced Test Flexibility: Added support for private connectivity in Datastream resource management and allowed for optional DDL resource files in the base test class.
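The transient-field refactor mentioned above follows a common pattern for serializable pipeline functions: a heavyweight object is marked transient so it is dropped during serialization and rebuilt lazily on first use. A minimal sketch of that pattern, using a hypothetical DdlHolder class in place of the actual ProcessInformationSchemaFn (the field type and DDL string are placeholders, not the template's real code):

```java
import java.io.*;

// Hypothetical sketch of the transient-field pattern: the cached object is
// not serialized with the function and is rebuilt on first use after
// deserialization.
public class DdlHolder implements Serializable {
  private static final long serialVersionUID = 1L;

  // Marked transient: dropped during serialization, lazily re-created.
  private transient String ddl;

  private String buildDdl() {
    return "CREATE TABLE table_1 (id INT64 NOT NULL) PRIMARY KEY(id)";
  }

  public String getDdl() {
    if (ddl == null) { // null after deserialization or before first use
      ddl = buildDdl();
    }
    return ddl;
  }

  public static void main(String[] args) throws Exception {
    DdlHolder holder = new DdlHolder();
    holder.getDdl(); // populate the transient field

    // Round-trip through Java serialization, as Beam does when shipping a DoFn.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(holder);
    oos.flush();
    DdlHolder copy = (DdlHolder) new ObjectInputStream(
        new ByteArrayInputStream(bos.toByteArray())).readObject();

    // The transient field did not travel, but the lazy getter restores it.
    System.out.println(copy.getDdl());
  }
}
```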
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command              Description
Code Review           /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary      Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@sm745052 sm745052 changed the title 5kliveit 5k Live LT May 13, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new load test for the DataStream to Spanner template designed to handle 5,000 tables and refactors the ProcessInformationSchema class to manage DDL objects as transient fields. The load test base class was also updated to support private connectivity and includes a new parallelized row count verification mechanism. Review feedback highlights several opportunities for improvement: batching DDL and DML statements in both Spanner and MySQL is recommended to avoid timeouts when dealing with 5,000 tables, and the row count verification logic should be refactored to use a parallelStream instead of manually creating an ExecutorService on every invocation to prevent resource exhaustion.

      spannerDdls.add(
          String.format("CREATE TABLE table_%d (id INT64 NOT NULL) PRIMARY KEY(id)", i));
    }
    spannerResourceManager.executeDdlStatements(spannerDdls);


high

Executing 5,000 DDL statements in a single call to executeDdlStatements is likely to exceed Cloud Spanner's limits for a single request or cause timeouts. It is recommended to batch DDL statements into smaller groups (e.g., 100 statements per call).

Suggested change
-    spannerResourceManager.executeDdlStatements(spannerDdls);
+    for (int i = 0; i < spannerDdls.size(); i += 100) {
+      spannerResourceManager.executeDdlStatements(
+          spannerDdls.subList(i, Math.min(i + 100, spannerDdls.size())));
+    }
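The batching approach in the suggestion generalizes to any list of statements: walk the list in fixed-size windows with subList and issue one call per window. A self-contained sketch of just the chunking step (the 100-statement batch size comes from the suggestion; the DdlBatcher class and the printouts are illustrative stand-ins for the Spanner call):

```java
import java.util.ArrayList;
import java.util.List;

public class DdlBatcher {
  // Split a statement list into fixed-size windows; the caller would issue
  // one executeDdlStatements call per window instead of one giant request.
  static List<List<String>> chunk(List<String> statements, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < statements.size(); i += batchSize) {
      batches.add(statements.subList(i, Math.min(i + batchSize, statements.size())));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> ddls = new ArrayList<>();
    for (int i = 1; i <= 5000; i++) {
      ddls.add(String.format("CREATE TABLE table_%d (id INT64 NOT NULL) PRIMARY KEY(id)", i));
    }
    List<List<String>> batches = chunk(ddls, 100);
    System.out.println(batches.size());         // number of batches
    System.out.println(batches.get(49).size()); // size of the last batch
  }
}
```

Note that subList returns views backed by the original list, so no statement strings are copied while chunking.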

Comment on lines +86 to +91
      for (int j = 2; j <= NUM_TABLES; j++) {
        String mySqlDdl = String.format("CREATE TABLE table_%d LIKE table_1", j);
        stmt.addBatch(mySqlDdl);
      }
      stmt.executeBatch();
    }


medium

Executing a batch of 5,000 CREATE TABLE statements in MySQL might exceed the max_allowed_packet size or lead to long-running transactions that could time out. It is recommended to partition the batch into smaller chunks (e.g., 500 statements per batch).

      for (int j = 2; j <= NUM_TABLES; j++) {
        String mySqlDdl = String.format("CREATE TABLE table_%d LIKE table_1", j);
        stmt.addBatch(mySqlDdl);
        if (j % 500 == 0) {
          stmt.executeBatch();
        }
      }
      stmt.executeBatch();

Comment on lines +134 to +138
            for (int i = 1; i <= NUM_TABLES; i++) {
              String sql = String.format("INSERT INTO table_%d (id) VALUES (%d)", i, i);
              stmt.addBatch(sql);
            }
            stmt.executeBatch();


medium

Similar to the table creation batch, inserting 5,000 rows in a single batch might encounter limits. It is safer to execute the batch in smaller increments.

            for (int i = 1; i <= NUM_TABLES; i++) {
              String sql = String.format("INSERT INTO table_%d (id) VALUES (%d)", i, i);
              stmt.addBatch(sql);
              if (i % 500 == 0) {
                stmt.executeBatch();
              }
            }
            stmt.executeBatch();

Comment on lines +258 to +289
  private boolean checkAllTablesRowCounts(HashMap<String, RowRange> tables) {
    ExecutorService executor = Executors.newFixedThreadPool(20);
    try {
      List<Callable<Boolean>> tasks = new ArrayList<>();
      for (Map.Entry<String, RowRange> entry : tables.entrySet()) {
        tasks.add(
            () -> {
              try {
                long rowCount = spannerResourceManager.getRowCount(entry.getKey());
                RowRange range = entry.getValue();
                return rowCount >= range.min && rowCount <= range.max;
              } catch (Exception e) {
                return false;
              }
            });
      }

      List<Future<Boolean>> futures = executor.invokeAll(tasks);
      boolean allPassed = true;
      for (Future<Boolean> future : futures) {
        if (!future.get()) {
          allPassed = false;
        }
      }
      return allPassed;
    } catch (Exception e) {
      LOG.warn("Error checking row count in Spanner", e);
    } finally {
      executor.shutdown();
    }
    return false;
  }


medium

The current implementation of checkAllTablesRowCounts creates a new ExecutorService with 20 threads on every invocation. Since this method is called periodically by waitForCondition, this is inefficient and can lead to resource exhaustion. Additionally, the broad catch block masks potential persistent errors. Using a parallelStream is a more idiomatic and efficient way to perform these checks in parallel using the common pool.

  private boolean checkAllTablesRowCounts(HashMap<String, RowRange> tables) {
    return tables.entrySet().parallelStream().allMatch(entry -> {
      try {
        long rowCount = spannerResourceManager.getRowCount(entry.getKey());
        RowRange range = entry.getValue();
        return rowCount >= range.min && rowCount <= range.max;
      } catch (Exception e) {
        LOG.warn("Error checking row count for table {}: {}", entry.getKey(), e.getMessage());
        return false;
      }
    });
  }
References
  1. It is acceptable to use streams with lambdas that wrap checked exceptions (e.g., IOException) into a RuntimeException, instead of refactoring to a traditional for-loop for more direct exception handling.
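The parallelStream alternative relies on allMatch running the per-table checks on the common fork-join pool and short-circuiting on the first out-of-range table, so no executor lifecycle needs managing. A self-contained sketch of that behavior, with the RowCountCheck class, a stubbed COUNTS map, and an in-memory lookup standing in for spannerResourceManager.getRowCount:

```java
import java.util.HashMap;
import java.util.Map;

public class RowCountCheck {
  // Stand-in for the RowRange type used by the load test base class.
  static class RowRange {
    final long min, max;
    RowRange(long min, long max) { this.min = min; this.max = max; }
  }

  // Stubbed row counts in place of spannerResourceManager.getRowCount.
  static final Map<String, Long> COUNTS = new HashMap<>();

  static boolean checkAllTablesRowCounts(Map<String, RowRange> tables) {
    // allMatch on a parallel stream runs checks on the common fork-join pool
    // and short-circuits as soon as one table falls outside its range.
    return tables.entrySet().parallelStream().allMatch(entry -> {
      long rowCount = COUNTS.getOrDefault(entry.getKey(), 0L);
      RowRange range = entry.getValue();
      return rowCount >= range.min && rowCount <= range.max;
    });
  }

  public static void main(String[] args) {
    Map<String, RowRange> tables = new HashMap<>();
    tables.put("table_1", new RowRange(1, 10));
    tables.put("table_2", new RowRange(1, 10));
    COUNTS.put("table_1", 5L);
    COUNTS.put("table_2", 7L);
    System.out.println(checkAllTablesRowCounts(tables)); // both within range

    COUNTS.put("table_2", 0L); // drop one table below its minimum
    System.out.println(checkAllTablesRowCounts(tables));
  }
}
```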
