fix(datastream-to-bigquery): Added support to read SQLServer Primary keys from replication_index to support Merge operation (Issue #3805) #3806
…Keys from replication-index to resolve merge operation in datastream-to-bigquery v2 dataflow template
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a fallback mechanism for identifying primary keys in Datastream records. By checking the 'replication_index' metadata field when 'primary_keys' is unavailable, the system now correctly supports SQL Server CDC sources, enabling proper Merge operations in BigQuery.
Code Review
This pull request introduces a fallback mechanism for retrieving primary keys from Datastream metadata, specifically adding support for 'replication_index' when 'primary_keys' is absent. The changes update both 'FormatDatastreamJsonToJson' and 'FormatDatastreamRecordToJson' classes and include comprehensive unit tests to verify the new logic. The review identified a potential NullPointerException risk and performance concerns in 'FormatDatastreamRecordToJson', as well as a need to handle explicitly null JSON nodes in 'FormatDatastreamJsonToJson' to ensure the fallback logic functions correctly.
… in the pull request for adding support to read SQLServer Primary Keys from replication-index
Codecov Report
❌ Your patch check has failed because the patch coverage (66.66%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #3806 +/- ##
============================================
+ Coverage 53.62% 53.65% +0.02%
- Complexity 6323 6736 +413
============================================
Files 1087 1087
Lines 66762 66772 +10
Branches 7476 7481 +5
============================================
+ Hits 35801 35826 +25
+ Misses 28534 28518 -16
- Partials 2427 2428 +1
This change reads SQL Server primary keys from the "replication_index" metadata field when they are not found in the default "primary_keys" metadata field.
It addresses the bug reported in #3805.
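For readers who want to see the shape of the fallback described above, here is a minimal sketch in plain Java. It is not the code from this pull request: the class `PrimaryKeyFallback`, the method `resolvePrimaryKeys`, and the use of a `Map<String, Object>` to stand in for the record metadata are all hypothetical. It also assumes both fields hold a list of column names, and that a missing, explicitly null, or empty "primary_keys" value should all trigger the fallback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the template's actual implementation. */
final class PrimaryKeyFallback {

  // Field names taken from the PR description; lookup order encodes the fallback.
  private static final List<String> KEY_FIELDS =
      Arrays.asList("primary_keys", "replication_index");

  /**
   * Returns the key columns from "primary_keys" when that field holds a
   * non-empty list, otherwise from "replication_index", otherwise an empty list.
   * A missing field and an explicitly null field are treated the same way
   * (assumption: both mean "not available").
   */
  public static List<String> resolvePrimaryKeys(Map<String, Object> metadata) {
    if (metadata == null) {
      return Collections.emptyList();
    }
    for (String field : KEY_FIELDS) {
      Object value = metadata.get(field); // null if absent or explicitly null
      if (value instanceof List && !((List<?>) value).isEmpty()) {
        List<String> keys = new ArrayList<>();
        for (Object column : (List<?>) value) {
          keys.add(String.valueOf(column));
        }
        return keys;
      }
    }
    return Collections.emptyList();
  }
}
```

Checking each field once and returning early keeps the null handling in a single place, which is the kind of guard the review comments ask for.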