feat: add Bigtable Change Streams to Bigtable template and support Bigtable output in Streaming Data Generator #3774
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new streaming pipeline template for replicating Bigtable change stream mutations between instances, alongside a new native Bigtable sink for the Streaming Data Generator. These additions expand the capabilities of the Dataflow templates library by providing robust CDC replication and improved data generation testing support for Bigtable.
Code Review
This pull request introduces a new Dataflow template, "Bigtable Change Streams to Bigtable Replicator," which facilitates the replication of Bigtable change stream mutations between instances, including support for bidirectional replication and garbage collection filtering. It also adds a Bigtable sink to the "Streaming Data Generator" template. Review feedback suggests enhancing the Bigtable write configuration to expose performance-tuning parameters, optimizing JSON parsing by reading directly from byte arrays to avoid unnecessary string allocations, and using more idiomatic Jackson API methods for iterating over JSON fields.
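The two JSON-related suggestions can be illustrated with a minimal sketch. The PR's actual code is not shown in this conversation, so the class name `JsonFieldsSketch`, the method `topLevelFields`, and the sample payload are hypothetical. The sketch assumes Jackson 2.x (`jackson-databind`) is on the classpath, and uses only `ObjectMapper.readTree(byte[])` and `JsonNode.fields()`.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not taken from the PR.
public class JsonFieldsSketch {
  // ObjectMapper is thread-safe once configured, so one shared instance is enough.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Parses a JSON object from raw bytes and returns its top-level fields as text. */
  static Map<String, String> topLevelFields(byte[] payload) throws IOException {
    // readTree(byte[]) parses the bytes directly, so no intermediate
    // new String(payload, UTF_8) is allocated per message.
    JsonNode root = MAPPER.readTree(payload);
    Map<String, String> result = new LinkedHashMap<>();
    // fields() yields each name together with its value, which avoids
    // iterating fieldNames() and calling get(name) for every entry.
    Iterator<Map.Entry<String, JsonNode>> it = root.fields();
    while (it.hasNext()) {
      Map.Entry<String, JsonNode> entry = it.next();
      result.put(entry.getKey(), entry.getValue().asText());
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    byte[] payload =
        "{\"rowKey\":\"user#1\",\"count\":3}".getBytes(StandardCharsets.UTF_8);
    System.out.println(topLevelFields(payload)); // prints {rowKey=user#1, count=3}
  }
}
```

Whether the saved allocation matters depends on message size and throughput; in a streaming pipeline that parses every element, it is usually worth taking.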
Codecov Report
❌ Your patch check has failed because the patch coverage (51.82%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #3774 +/- ##
============================================
+ Coverage 53.29% 53.51% +0.21%
+ Complexity 6575 6231 -344
============================================
Files 1079 1084 +5
Lines 65576 66141 +565
Branches 7301 7382 +81
============================================
+ Hits 34948 35393 +445
- Misses 28277 28366 +89
- Partials 2351 2382 +31
Changes:
1. Streaming Data Generator: Native BigTableIO Sink
PCollection<KV<ByteString, Iterable<Mutation>>> into BigtableIO.write().
2. New Template: Bigtable Change Streams to Bigtable Replicator
v2/googlecloud-to-googlecloud that ingests CDC mutations and replicates them to another Cloud Bigtable instance. Kind of similar to BigTableToHBase.
3. Documentation & Infrastructure Automation
Verification Results: