feat(spanner-to-sourcedb): add Spanner-to-Spanner reverse replication support #3773

Open

srozsnyai wants to merge 6 commits into GoogleCloudPlatform:main from srozsnyai:spanner-to-spanner

Conversation

@srozsnyai

Extends the spanner-to-sourcedb reverse replication template to support Cloud Spanner as a target database.

Change stream events from a source Spanner instance are converted to mutations and written to a target Spanner database, coordinated through the existing shadow-table mechanism that prevents duplicate and out-of-order writes.

Because calling any Spanner write API inside an active readWriteTransaction causes a nested-transaction error, the target write is deferred: the mutation is generated inside the shadow-table transaction (where ordering and filtering decisions are made), then committed via writeAtLeastOnce() after the shadow-table transaction completes.
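The deferred-write flow described above can be sketched with a plain-Java model. Everything here is a stand-in invented for illustration (the map for the shadow table, the method for the transaction body, the string for the mutation); only the control flow mirrors the PR's approach of building the write inside the shadow-table transaction and committing it afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Plain-Java model of the deferred-write flow: the ordering decision and
// shadow-table update happen "inside" the transaction, while the target
// write is only captured in a holder and committed after the transaction
// returns, so no Spanner write API runs inside the read-write transaction.
public class DeferredWriteSketch {
  // key -> last processed commit timestamp (stand-in for the shadow table)
  static final Map<String, Long> shadowTable = new HashMap<>();

  // Stand-in for the body passed to the read-write transaction runner.
  static AtomicReference<String> runShadowTxn(String key, long ts, String mutation) {
    AtomicReference<String> pending = new AtomicReference<>();
    Long seen = shadowTable.get(key);
    if (seen == null || seen < ts) { // ordering / duplicate-filtering decision
      shadowTable.put(key, ts);      // shadow-table update, committed with the txn
      pending.set(mutation);         // defer the target write; no write call here
    }
    return pending;                  // the real transaction commits at this point
  }

  public static void main(String[] args) {
    AtomicReference<String> pending = runShadowTxn("row1", 10L, "UPDATE row1");
    if (pending.get() != null) {
      // In the real template this step is writeAtLeastOnce(mutation),
      // executed only after the shadow-table transaction has completed.
      System.out.println("commit: " + pending.get());
    }
    // A stale event (older timestamp) is filtered and nothing is deferred.
    System.out.println(runShadowTxn("row1", 5L, "stale").get()); // prints null
  }
}
```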

@srozsnyai srozsnyai requested a review from a team as a code owner May 8, 2026 11:57
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enables reverse replication from Spanner to Spanner by extending the existing template infrastructure. It introduces a new set of DAO and DML generation classes specifically for Spanner, ensuring that mutation data is correctly formatted and committed. A key aspect of this change is the modification of the write flow to defer target database operations until after the shadow-table transaction has committed, preventing illegal nested transaction errors.

Highlights

  • Feature Addition: Added support for Cloud Spanner as a target database in the spanner-to-sourcedb reverse replication template.
  • Transaction Handling: Implemented a deferred write mechanism in SourceWriterFn to avoid nested transaction errors when writing to Spanner.
  • Infrastructure: Introduced Spanner-specific components including SpannerDMLGenerator, SpannerTargetDao, and SpannerConnectionHelper.


@gemini-code-assist (Bot) left a comment


Code Review

This pull request implements support for Cloud Spanner as a target database in the Spanner-to-SourceDb template. It introduces new components for connection management, DML generation, and data access specifically for Spanner targets. Feedback identifies critical issues including a potential data loss scenario in SourceWriterFn due to non-atomic writes, type mismatch errors in SpannerDMLGenerator when handling custom transformations or null array values, and an initialization bug in SpannerConnectionHelper that affects multi-project deployments.

if (request.getCustomTransformationResponse() != null
    && request.getCustomTransformationResponse().containsKey(targetColName)) {
  Object customVal = request.getCustomTransformationResponse().get(targetColName);
  builder.set(targetColName).to(customVal == null ? (String) null : customVal.toString());

high

This line assumes that all custom transformation results should be treated as strings. If the target Spanner column is of a different type (e.g., INT64, BOOL, TIMESTAMP), calling .to(String) on the Mutation.WriteBuilder will cause a runtime IllegalArgumentException or a type mismatch error during the commit. The value should be set using a type-aware method similar to the logic in setColumnValue, based on the target column's type.

Author


Fixed in 7a5274c. Custom-transformation values are now routed through setCustomColumnValue / appendCustomKeyComponent, which dispatch on Column.type().getCode() and call the matching typed .to(...) / Key.Builder.append(...) overload. null custom values emit a properly typed NULL via setNullValue. Added tests.
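The dispatch described in the reply can be modeled in plain Java for illustration. The enum below mirrors only a subset of Spanner's Type.Code, and the returned Object stands in for the typed .to(...) overload on the mutation builder; none of these names are taken from the PR itself.

```java
// Illustrative model of dispatching a custom-transformation value on the
// target column's type code instead of always stringifying it.
public class CustomValueDispatch {
  enum Code { INT64, FLOAT64, BOOL, STRING }

  // A null input models a typed NULL (the real fix emits a typed NULL Value).
  static Object toTyped(Code code, Object customVal) {
    if (customVal == null) return null;
    String s = customVal.toString();
    switch (code) {
      case INT64:   return Long.valueOf(s);   // would throw on non-numeric input
      case FLOAT64: return Double.valueOf(s);
      case BOOL:    return Boolean.valueOf(s);
      default:      return s;                 // STRING and anything unmodeled
    }
  }

  public static void main(String[] args) {
    System.out.println(toTyped(Code.INT64, "42"));  // a Long, not the string "42"
    System.out.println(toTyped(Code.BOOL, "true")); // a Boolean
  }
}
```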

Comment on lines +359 to +362
if (pendingSpannerWrite.get() != null) {
  IDao sourceDao = sourceProcessor.getSourceDao(shardId);
  sourceDao.write(pendingSpannerWrite.get(), null);
}

high

Deferring the Spanner target write until after the shadow-table transaction commits introduces a critical consistency risk. If the shadow-table transaction succeeds but the subsequent sourceDao.write call fails (e.g., due to a transient network issue or target database constraint), the record will be marked as processed in the shadow table. Upon retry from the Dead Letter Queue (DLQ), the isSourceAhead check (line 270) will see that the record's timestamp matches the shadow table and skip the write, resulting in permanent data loss on the target. To ensure atomicity, the target write should be performed within the same transaction as the shadow table update. If the target is the same database, use shadowTransaction.buffer(mutation). If it is a different database, calling writeAtLeastOnce inside the readWriteTransaction block will ensure that a failure in the target write rolls back the shadow table update, allowing for a safe retry.

Author


shadowTransaction.buffer(mutation) only works when the mutation targets the same database as the transaction. The shadow table currently lives in a separate metadata database, so buffer() cannot span them.

Calling writeAtLeastOnce inside a readWriteTransaction throws "INTERNAL: Nested transactions are not supported" for every record, regardless of which DatabaseClient instance is used. The same error reproduces with the Spanner JDBC driver, so JDBC is not an escape hatch. MySQL/Postgres can use this pattern only because their JDBC writes target a non-Spanner engine.

The residual risk is narrow in practice: the client retries ABORTED / UNAVAILABLE / DEADLINE_EXCEEDED internally, errors that escape are typically permanent and operator-visible, and Spanner mutations are idempotent.

IMHO the only real fix is colocating the reverse-replication metadata in the target Spanner database so tx.buffer() becomes available. That is an architectural change and will be addressed as a follow-up once there is team consensus, not as a side effect of this PR.
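The loss scenario the two sides are debating can be made concrete with a toy model. Plain Java maps stand in for the shadow table and the target database, and all names are invented for illustration; the boolean flag simulates a target-write failure after the shadow-table commit.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the failure mode: once the shadow table records a
// timestamp, a retry with the same timestamp is skipped as "already
// processed", so a target write that failed after the shadow commit is
// never reattempted.
public class RetrySkipModel {
  static final Map<String, Long> shadow = new HashMap<>();   // key -> commit ts
  static final Map<String, String> target = new HashMap<>(); // target database

  // Returns true if the record was applied to the target.
  static boolean process(String key, long ts, String value, boolean targetWriteFails) {
    Long seen = shadow.get(key);
    if (seen != null && seen >= ts) return false; // retry path: skipped as processed
    shadow.put(key, ts);                          // shadow-table txn commits first
    if (targetWriteFails) return false;           // deferred target write fails after
    target.put(key, value);
    return true;
  }

  public static void main(String[] args) {
    // First attempt: shadow commits, then the target write fails.
    process("row1", 10L, "v1", true);
    // The DLQ retry carries the same timestamp and is skipped, so the
    // target never receives the row.
    boolean applied = process("row1", 10L, "v1", false);
    System.out.println(applied + " " + target.containsKey("row1")); // false false
  }
}
```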

Comment on lines +40 to +41
private static Map<String, DatabaseClient> clientMap = new ConcurrentHashMap<>();
private static Spanner spannerService;

medium

The use of static variables for clientMap and spannerService combined with the early return in init (lines 45-48) can lead to issues if shards belong to different Google Cloud projects. If init is called first for a shard in Project A, the spannerService is initialized for Project A. A subsequent call for a shard in Project B will return early because clientMap is not empty, and getConnection will later attempt to use the Project A service to access Project B, which may fail due to permission or configuration mismatches. Consider mapping spannerService by project ID or removing the static singleton pattern in favor of a more robust initialization that ensures all requested shards are processed.
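One shape the suggested per-project keying could take is sketched below. This is the reviewer's proposal, not the PR's code; SpannerService is a placeholder for the real Spanner handle, and the names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache one service per project ID so a shard in Project B never
// reuses a service initialized for Project A.
public class PerProjectClients {
  // Placeholder for the real Spanner client handle.
  static final class SpannerService {
    final String projectId;
    SpannerService(String projectId) { this.projectId = projectId; }
  }

  private static final Map<String, SpannerService> services = new ConcurrentHashMap<>();

  static SpannerService serviceFor(String projectId) {
    // computeIfAbsent is atomic per key: concurrent init calls for the same
    // project share one service, while distinct projects each get their own.
    return services.computeIfAbsent(projectId, SpannerService::new);
  }
}
```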

Author


Same idempotent-init pattern as JdbcConnectionHelper and CassandraConnectionHelper — all shards are processed in a single init() call, so the early return doesn't skip later shards.

  builder.set(targetColName).to(Value.json(null));
  break;
case ARRAY:
  builder.set(targetColName).to(Value.stringArray(null));

medium

When setting a null value for an array column, Value.stringArray(null) is always used. This will cause a type mismatch error if the target column is an array of a different type (e.g., ARRAY<INT64>). The null value should be created using the appropriate element type, such as Value.int64Array((Iterable<Long>) null).

Author


Fixed in a00dd66b5. setNullValue now dispatches on type.getArrayElementType().getCode() and emits the matching Value.*Array((Iterable) null) so NULL ARRAY values carry the correct element type. Added tests.
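For illustration, that dispatch can be sketched as follows. The enum models the array's element-type code, and each branch merely names the typed-null Value factory the real code would call; this is not the PR's actual setNullValue.

```java
// Model of choosing a typed NULL array from the element type instead of
// unconditionally using Value.stringArray(null). Each branch names the
// com.google.cloud.spanner.Value factory the fix would invoke.
public class NullArrayDispatch {
  enum ElementCode { INT64, FLOAT64, BOOL, STRING }

  static String nullArrayFactory(ElementCode elementCode) {
    switch (elementCode) {
      case INT64:   return "Value.int64Array((Iterable<Long>) null)";
      case FLOAT64: return "Value.float64Array((Iterable<Double>) null)";
      case BOOL:    return "Value.boolArray((Iterable<Boolean>) null)";
      default:      return "Value.stringArray((Iterable<String>) null)";
    }
  }
}
```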

@codecov

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 46.65493% with 303 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.41%. Comparing base (f8472fe) to head (b793209).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
.../v2/templates/dbutils/dml/SpannerDMLGenerator.java 51.46% 120 Missing and 29 partials ⚠️
...ner/sourceddl/SpannerInformationSchemaScanner.java 0.00% 72 Missing ⚠️
...anner/migrations/utils/SpannerShardFileReader.java 0.00% 27 Missing ⚠️
...es/dbutils/connection/SpannerConnectionHelper.java 37.14% 22 Missing ⚠️
...leport/v2/templates/transforms/SourceWriterFn.java 34.61% 15 Missing and 2 partials ⚠️
...cloud/teleport/v2/templates/SpannerToSourceDb.java 0.00% 11 Missing ⚠️
...plates/dbutils/processor/InputRecordProcessor.java 92.68% 1 Missing and 2 partials ⚠️
...templates/dbutils/dao/source/SpannerTargetDao.java 88.23% 1 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (46.65%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3773      +/-   ##
============================================
- Coverage     53.41%   53.41%   -0.01%     
+ Complexity     6629     6329     -300     
============================================
  Files          1082     1091       +9     
  Lines         65795    66964    +1169     
  Branches       7328     7483     +155     
============================================
+ Hits          35147    35767     +620     
- Misses        28288    28772     +484     
- Partials       2360     2425      +65     
Components Coverage Δ
spanner-templates 72.25% <46.65%> (-0.56%) ⬇️
spanner-import-export 68.64% <ø> (+0.02%) ⬆️
spanner-live-forward-migration 79.74% <17.50%> (-1.19%) ⬇️
spanner-live-reverse-replication 74.87% <46.65%> (-2.18%) ⬇️
spanner-bulk-migration 90.30% <17.50%> (-0.80%) ⬇️
gcs-spanner-dv 84.25% <17.50%> (-1.50%) ⬇️
Files with missing lines Coverage Δ
...ort/v2/spanner/migrations/constants/Constants.java 0.00% <ø> (ø)
...port/v2/spanner/migrations/shard/SpannerShard.java 100.00% <100.00%> (ø)
...eport/v2/spanner/sourceddl/SourceDatabaseType.java 100.00% <100.00%> (ø)
...ates/dbutils/processor/SourceProcessorFactory.java 84.14% <100.00%> (+2.45%) ⬆️
...templates/dbutils/dao/source/SpannerTargetDao.java 88.23% <88.23%> (ø)
...plates/dbutils/processor/InputRecordProcessor.java 86.53% <92.68%> (+1.43%) ⬆️
...cloud/teleport/v2/templates/SpannerToSourceDb.java 0.00% <0.00%> (ø)
...leport/v2/templates/transforms/SourceWriterFn.java 71.42% <34.61%> (-5.35%) ⬇️
...es/dbutils/connection/SpannerConnectionHelper.java 37.14% <37.14%> (ø)
...anner/migrations/utils/SpannerShardFileReader.java 0.00% <0.00%> (ø)
... and 2 more

... and 13 files with indirect coverage changes

