Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

    @@             Coverage Diff              @@
    ##               main    #3779      +/-   ##
    ============================================
    + Coverage     53.52%   58.17%     +4.65%
    + Complexity     6637     2768      -3869
    ============================================
      Files          1082      524       -558
      Lines         65868    31592     -34276
      Branches       7332     3437      -3895
    ============================================
    - Hits          35255    18379     -16876
    + Misses        28257    12173     -16084
    + Partials       2356     1040      -1316
Force-pushed from 3b41e8b to 588119b
Force-pushed from d0215c8 to 8e751ce
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the load testing capabilities of the DataStream to Spanner template, specifically targeting high-scale scenarios with 5,000 tables. It includes infrastructure improvements to the base test class to handle parallel row count checks and private connectivity configurations, alongside minor refactoring of the main processing logic to improve resource handling during execution.
Code Review
This pull request introduces a new load test for the DataStream to Spanner template designed to handle 5,000 tables and refactors the ProcessInformationSchema class to manage DDL objects as transient fields. The load test base class was also updated to support private connectivity and includes a new parallelized row count verification mechanism. Review feedback highlights several opportunities for improvement: batching DDL and DML statements in both Spanner and MySQL is recommended to avoid timeouts when dealing with 5,000 tables, and the row count verification logic should be refactored to use a parallelStream instead of manually creating an ExecutorService on every invocation to prevent resource exhaustion.
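On the transient-field refactor mentioned above: in Beam, a DoFn is serialized along with the pipeline graph, so heavyweight or non-serializable members such as a parsed DDL object are typically marked transient and rebuilt in @Setup on each worker. The sketch below is a minimal illustration of that general pattern only; ScanInformationSchemaFn and InfoSchemaDdl are hypothetical names, not the PR's actual code.

    import org.apache.beam.sdk.transforms.DoFn;

    // Hypothetical stand-in for the parsed information-schema DDL; the real
    // class in the template is not shown in this review.
    class InfoSchemaDdl {
      int tableCount() {
        return 0;
      }
    }

    class ScanInformationSchemaFn extends DoFn<String, String> {
      // transient: excluded when Beam serializes the DoFn, so a heavy or
      // non-serializable DDL object never travels with the pipeline graph.
      private transient InfoSchemaDdl ddl;

      @Setup
      public void setup() {
        // Rebuilt once per DoFn instance on the worker, after deserialization.
        ddl = new InfoSchemaDdl();
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        c.output(c.element() + " (" + ddl.tableCount() + " tables)");
      }
    }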
      spannerDdls.add(
          String.format("CREATE TABLE table_%d (id INT64 NOT NULL) PRIMARY KEY(id)", i));
    }
    spannerResourceManager.executeDdlStatements(spannerDdls);
Executing 5,000 DDL statements in a single call to executeDdlStatements is likely to exceed Cloud Spanner's limits for a single request or cause timeouts. It is recommended to batch DDL statements into smaller groups (e.g., 100 statements per call).
Suggested change:

    for (int i = 0; i < spannerDdls.size(); i += 100) {
      spannerResourceManager.executeDdlStatements(
          spannerDdls.subList(i, Math.min(i + 100, spannerDdls.size())));
    }
    for (int j = 2; j <= NUM_TABLES; j++) {
      String mySqlDdl = String.format("CREATE TABLE table_%d LIKE table_1", j);
      stmt.addBatch(mySqlDdl);
    }
    stmt.executeBatch();
Executing a batch of 5,000 CREATE TABLE statements in MySQL might exceed the max_allowed_packet size or lead to long-running transactions that could time out. It is recommended to partition the batch into smaller chunks (e.g., 500 statements per batch).
Suggested change:

    for (int j = 2; j <= NUM_TABLES; j++) {
      String mySqlDdl = String.format("CREATE TABLE table_%d LIKE table_1", j);
      stmt.addBatch(mySqlDdl);
      if (j % 500 == 0) {
        stmt.executeBatch();
      }
    }
    stmt.executeBatch();

    for (int i = 1; i <= NUM_TABLES; i++) {
      String sql = String.format("INSERT INTO table_%d (id) VALUES (%d)", i, i);
      stmt.addBatch(sql);
    }
    stmt.executeBatch();
Similar to the table creation batch, inserting 5,000 rows in a single batch might encounter limits. It is safer to execute the batch in smaller increments.
Suggested change:

    for (int i = 1; i <= NUM_TABLES; i++) {
      String sql = String.format("INSERT INTO table_%d (id) VALUES (%d)", i, i);
      stmt.addBatch(sql);
      if (i % 500 == 0) {
        stmt.executeBatch();
      }
    }
    stmt.executeBatch();
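Both batching suggestions repeat the same chunk-and-flush idiom. As a hedged aside (not code from the PR), the pattern could be factored into a small helper; JdbcBatching and addBatchChunked are hypothetical names:

    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.List;

    final class JdbcBatching {
      // Flushes the JDBC batch every chunkSize statements so no single round
      // trip exceeds server limits such as MySQL's max_allowed_packet.
      static void addBatchChunked(Statement stmt, List<String> sqls, int chunkSize)
          throws SQLException {
        int pending = 0;
        for (String sql : sqls) {
          stmt.addBatch(sql);
          if (++pending == chunkSize) {
            stmt.executeBatch();
            pending = 0;
          }
        }
        if (pending > 0) {
          stmt.executeBatch(); // flush any remainder
        }
      }
    }

The trailing executeBatch() flushes the remainder when the statement count is not a multiple of chunkSize, which is also why both suggestions above end with a final executeBatch().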
    private boolean checkAllTablesRowCounts(HashMap<String, RowRange> tables) {
      ExecutorService executor = Executors.newFixedThreadPool(20);
      try {
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (Map.Entry<String, RowRange> entry : tables.entrySet()) {
          tasks.add(
              () -> {
                try {
                  long rowCount = spannerResourceManager.getRowCount(entry.getKey());
                  RowRange range = entry.getValue();
                  return rowCount >= range.min && rowCount <= range.max;
                } catch (Exception e) {
                  return false;
                }
              });
        }

        List<Future<Boolean>> futures = executor.invokeAll(tasks);
        boolean allPassed = true;
        for (Future<Boolean> future : futures) {
          if (!future.get()) {
            allPassed = false;
          }
        }
        return allPassed;
      } catch (Exception e) {
        LOG.warn("Error checking row count in Spanner", e);
      } finally {
        executor.shutdown();
      }
      return false;
    }
The current implementation of checkAllTablesRowCounts creates a new ExecutorService with 20 threads on every invocation. Since this method is called periodically by waitForCondition, this is inefficient and can lead to resource exhaustion. Additionally, the broad catch block masks potential persistent errors. Using a parallelStream is a more idiomatic and efficient way to perform these checks in parallel using the common pool.
Suggested change:

    private boolean checkAllTablesRowCounts(HashMap<String, RowRange> tables) {
      return tables.entrySet().parallelStream()
          .allMatch(
              entry -> {
                try {
                  long rowCount = spannerResourceManager.getRowCount(entry.getKey());
                  RowRange range = entry.getValue();
                  return rowCount >= range.min && rowCount <= range.max;
                } catch (Exception e) {
                  LOG.warn(
                      "Error checking row count for table {}: {}",
                      entry.getKey(),
                      e.getMessage());
                  return false;
                }
              });
    }

References
- It is acceptable to use streams with lambdas that wrap checked exceptions (e.g., IOException) into a RuntimeException, instead of refactoring to a traditional for-loop for more direct exception handling.
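As a minimal sketch of the referenced style point (assuming nothing beyond the JDK), a checked IOException thrown inside a stream lambda can be rethrown as UncheckedIOException rather than rewriting the stream as a for-loop:

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    final class StreamExceptionWrapping {
      // Sums the sizes of the given files. Files.size throws a checked
      // IOException, which a lambda cannot propagate directly, so it is
      // rethrown as UncheckedIOException inside the stream pipeline.
      static long totalSize(List<Path> files) {
        return files.stream()
            .mapToLong(
                p -> {
                  try {
                    return Files.size(p);
                  } catch (IOException e) {
                    throw new UncheckedIOException(e);
                  }
                })
            .sum();
      }
    }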