
feat: add Bigtable Change Streams to Bigtable template and support Bigtable output in Streaming Data Generator#3774

Open
stankiewicz wants to merge 4 commits into GoogleCloudPlatform:main from stankiewicz:bigtable_to_bigtalbe


@stankiewicz
Contributor

Changes:

1. Streaming Data Generator: Native BigtableIO Sink

  • Introduced a new sink transform, StreamingDataGeneratorWriteToBigtable, that maps incoming JSON messages to a PCollection<KV<ByteString, Iterable<Mutation>>> and writes it with BigtableIO.write().
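
The mapping described above can be pictured with a small, dependency-free sketch. It is not the code from this PR: the `SetCell` class is a hypothetical stand-in for a Bigtable mutation, the message is assumed to be already parsed into a flat map, and the choice of a `keyField` for the row key and a single column `family` are assumptions made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Dependency-free model of turning one generated message into a keyed list of cell writes. */
final class JsonToRowMutationsSketch {

  /** Hypothetical stand-in for a Bigtable SetCell mutation. */
  static final class SetCell {
    final String family;
    final String qualifier;
    final byte[] value;

    SetCell(String family, String qualifier, byte[] value) {
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  /**
   * Maps a flat message to (row key, cell writes), mirroring the shape
   * KV<ByteString, Iterable<Mutation>>. keyField and family are assumed parameters.
   */
  static Map.Entry<String, List<SetCell>> toRowMutations(
      Map<String, String> message, String keyField, String family) {
    String rowKey = message.get(keyField);
    if (rowKey == null || rowKey.isEmpty()) {
      throw new IllegalArgumentException("Message has no usable row key field: " + keyField);
    }
    List<SetCell> cells = new ArrayList<>();
    for (Map.Entry<String, String> field : message.entrySet()) {
      // The key field becomes the row key, not a cell; null values are skipped.
      if (field.getKey().equals(keyField) || field.getValue() == null) {
        continue;
      }
      byte[] value = field.getValue().getBytes(StandardCharsets.UTF_8);
      cells.add(new SetCell(family, field.getKey(), value));
    }
    return new SimpleImmutableEntry<>(rowKey, cells);
  }

  public static void main(String[] args) {
    Map<String, String> message = new LinkedHashMap<>();
    message.put("id", "user#42");
    message.put("name", "Ada");
    message.put("score", "17");

    Map.Entry<String, List<SetCell>> row = toRowMutations(message, "id", "cf");
    System.out.println(row.getKey() + " -> " + row.getValue().size() + " cells");
  }
}
```

In the real transform each entry would be a `KV` whose value holds Bigtable `Mutation` protos; how the row key and column family are actually chosen is defined by the PR, not by this sketch.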

2. New Template: Bigtable Change Streams to Bigtable Replicator

  • Implemented a streaming pipeline, BigtableChangeStreamsToBigtable.java, under v2/googlecloud-to-googlecloud that ingests CDC mutations and replicates them to another Cloud Bigtable instance. The approach is similar to BigTableToHBase.
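
As a rough illustration of what replicating CDC mutations means, the sketch below replays the entries of one row change onto an in-memory destination. It is a simplified model, not the template's code: the `Entry` class and the map-based table are hypothetical stand-ins, and cell timestamps and versions are ignored. The three entry kinds follow the set-cell, delete-cells, and delete-family entries that Bigtable change streams report.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Dependency-free model of replaying change stream entries onto a destination table. */
final class ChangeReplaySketch {

  enum Kind { SET_CELL, DELETE_CELLS, DELETE_FAMILY }

  /** Hypothetical stand-in for one entry of a change stream mutation. */
  static final class Entry {
    final Kind kind;
    final String family;
    final String qualifier; // unused for DELETE_FAMILY
    final String value;     // used only for SET_CELL

    Entry(Kind kind, String family, String qualifier, String value) {
      this.kind = kind;
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  /** Applies the entries of one source-row change to the destination, in order. */
  static void apply(
      Map<String, Map<String, String>> destination, String rowKey, Iterable<Entry> entries) {
    Map<String, String> row = destination.computeIfAbsent(rowKey, k -> new TreeMap<>());
    for (Entry e : entries) {
      switch (e.kind) {
        case SET_CELL:
          row.put(e.family + ":" + e.qualifier, e.value);
          break;
        case DELETE_CELLS:
          row.remove(e.family + ":" + e.qualifier);
          break;
        case DELETE_FAMILY:
          row.keySet().removeIf(column -> column.startsWith(e.family + ":"));
          break;
      }
    }
    if (row.isEmpty()) {
      destination.remove(rowKey); // a row with no cells does not exist
    }
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> destination = new TreeMap<>();
    apply(destination, "user#42", Arrays.asList(
        new Entry(Kind.SET_CELL, "cf", "name", "Ada"),
        new Entry(Kind.SET_CELL, "cf", "score", "17"),
        new Entry(Kind.DELETE_CELLS, "cf", "score", null)));
    System.out.println(destination);
  }
}
```

Order matters here: entries within one change are applied in sequence, so a set followed by a delete of the same cell leaves the cell absent.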

3. Documentation & Infrastructure Automation

  • Auto-Generated Markdown READMEs: README_Bigtable_Change_Streams_to_Bigtable.md, README_Streaming_Data_Generator.md
  • Terraform Provisioning: Generated Terraform

Verification Results:

  • Ran spotless, checkstyle, and unit tests.
  • Ran integration tests (IT) in a personal project.

@stankiewicz stankiewicz added the addition New feature or request label May 8, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new streaming pipeline template for replicating Bigtable change stream mutations between instances, alongside a new native Bigtable sink for the Streaming Data Generator. These additions expand the capabilities of the Dataflow templates library by providing robust CDC replication and improved data generation testing support for Bigtable.

Highlights

  • Bigtable Change Streams to Bigtable Template: Added a new streaming pipeline template to replicate Bigtable change stream mutations to another Bigtable instance, supporting features like bidirectional replication and mutation filtering.
  • Streaming Data Generator Sink: Introduced a native Bigtable sink to the Streaming Data Generator, enabling direct writing of generated JSON messages to Bigtable tables.
  • Documentation and Infrastructure: Included auto-generated README files and Terraform provisioning scripts for the new template, and updated existing documentation for the Streaming Data Generator.
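
The summary does not say how the template keeps bidirectional replication from looping or how mutations are filtered. One common technique, assumed here purely for illustration, is for each replicator to tag what it writes with a marker qualifier and to skip incoming changes that carry the peer's marker, while also dropping changes produced by garbage collection. The class, method, and marker names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Dependency-free model of the filtering a bidirectional replicator needs. */
final class ReplicationFilterSketch {

  /** Who produced a change: a client write or the table's own garbage collection. */
  enum Origin { USER, GARBAGE_COLLECTION }

  /** Hypothetical stand-in for one change record: its origin and the qualifiers it touches. */
  static final class Change {
    final Origin origin;
    final List<String> qualifiers;

    Change(Origin origin, List<String> qualifiers) {
      this.origin = origin;
      this.qualifiers = qualifiers;
    }
  }

  /** Returns true when the change should be forwarded to the destination table. */
  static boolean shouldReplicate(Change change, boolean replicateGc, String peerMarker) {
    if (change.origin == Origin.GARBAGE_COLLECTION && !replicateGc) {
      return false; // assumption: the destination applies its own GC policy
    }
    // A change carrying the peer's marker was written by the peer replicator;
    // forwarding it back would create a loop.
    return !change.qualifiers.contains(peerMarker);
  }

  /** Tags an outgoing change so the replicator on the other side can recognise and skip it. */
  static Change tagOutgoing(Change change, String ownMarker) {
    List<String> tagged = new ArrayList<>(change.qualifiers);
    tagged.add(ownMarker);
    return new Change(change.origin, tagged);
  }

  public static void main(String[] args) {
    Change userWrite = new Change(Origin.USER, Arrays.asList("name", "score"));
    Change gc = new Change(Origin.GARBAGE_COLLECTION, Arrays.asList("name"));
    Change echoed = tagOutgoing(userWrite, "from-a");

    System.out.println(shouldReplicate(userWrite, false, "from-a")); // true
    System.out.println(shouldReplicate(gc, false, "from-a"));        // false
    System.out.println(shouldReplicate(echoed, false, "from-a"));    // false
  }
}
```

With two replicators, A to B and B to A, each side passes the other side's marker as `peerMarker`, so a write is forwarded once and then ignored when it reappears in the destination's change stream.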


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new Dataflow template, "Bigtable Change Streams to Bigtable Replicator," which facilitates the replication of Bigtable change stream mutations between instances, including support for bidirectional replication and garbage collection filtering. It also adds a Bigtable sink to the "Streaming Data Generator" template. Review feedback suggests enhancing the Bigtable write configuration to expose performance-tuning parameters, optimizing JSON parsing by reading directly from byte arrays to avoid unnecessary string allocations, and using more idiomatic Jackson API methods for iterating over JSON fields.

@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 51.82482% with 132 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.51%. Comparing base (00cd31f) to head (632801e).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...amstobigtable/BigtableChangeStreamsToBigtable.java | 47.05% | 84 Missing and 15 partials ⚠️ |
| ...nsforms/StreamingDataGeneratorWriteToBigtable.java | 57.35% | 25 Missing and 4 partials ⚠️ |
| .../teleport/v2/templates/StreamingDataGenerator.java | 78.94% | 0 Missing and 4 partials ⚠️ |

❌ Your patch check has failed because the patch coverage (51.82%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
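
The patch percentages can be reproduced with simple arithmetic. The sketch below assumes coverage is covered lines divided by changed lines, with partially covered lines counted as not covered. The per-file line totals (187, 68, 19) are not stated in the report; they are inferred from the reported percentages and missing counts, so treat them as an assumption.

```java
/** Recomputes the reported patch coverage from inferred line totals. */
final class PatchCoverageSketch {

  /** Coverage in percent; partially covered lines count as not covered. */
  static double coveragePercent(int totalLines, int missing, int partials) {
    int covered = totalLines - missing - partials;
    return 100.0 * covered / totalLines;
  }

  public static void main(String[] args) {
    // Line totals are inferred from the report, not stated in it.
    System.out.printf("BigtableChangeStreamsToBigtable: %.4f%%%n", coveragePercent(187, 84, 15));
    System.out.printf("StreamingDataGeneratorWriteToBigtable: %.4f%%%n", coveragePercent(68, 25, 4));
    System.out.printf("StreamingDataGenerator: %.4f%%%n", coveragePercent(19, 0, 4));
    // 84 + 25 + 0 = 109 missing, 15 + 4 + 4 = 23 partials, 132 uncovered lines in total.
    System.out.printf("Whole patch: %.5f%%%n", coveragePercent(274, 109, 23));
  }
}
```

Under these assumptions 142 of 274 changed lines are covered, which matches the reported 51.82482%. Reaching the 80% target would need at least 220 covered lines out of the same 274.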

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3774      +/-   ##
============================================
+ Coverage     53.29%   53.51%   +0.21%     
+ Complexity     6575     6231     -344     
============================================
  Files          1079     1084       +5     
  Lines         65576    66141     +565     
  Branches       7301     7382      +81     
============================================
+ Hits          34948    35393     +445     
- Misses        28277    28366      +89     
- Partials       2351     2382      +31     
| Components | Coverage Δ |
| --- | --- |
| spanner-templates | 72.83% <ø> (+0.04%) ⬆️ |
| spanner-import-export | 68.65% <ø> (+0.12%) ⬆️ |
| spanner-live-forward-migration | 80.93% <ø> (-0.02%) ⬇️ |
| spanner-live-reverse-replication | 77.08% <ø> (+0.01%) ⬆️ |
| spanner-bulk-migration | 91.10% <ø> (-0.01%) ⬇️ |
| gcs-spanner-dv | 85.74% <ø> (-0.02%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| .../teleport/v2/templates/StreamingDataGenerator.java | 39.86% <78.94%> (+5.24%) ⬆️ |
| ...nsforms/StreamingDataGeneratorWriteToBigtable.java | 57.35% <57.35%> (ø) |
| ...amstobigtable/BigtableChangeStreamsToBigtable.java | 47.05% <47.05%> (ø) |

... and 24 files with indirect coverage changes


@stankiewicz stankiewicz requested a review from a team May 11, 2026 14:51

Labels

addition New feature or request size/XXL
