Bulk partition UUID #3798

Draft
aasthabharill wants to merge 6 commits into main from bulk-partition-uuid

aasthabharill (Member) commented May 12, 2026

This Pull Request modifies the uniform partitioning (uniformization) logic in the sourcedb-to-spanner template to support tables partitioned on PostgreSQL UUID primary keys.

Changes Made & Rationale

1. Map UUID Columns to a Virtual "UUID" Collation

  • Change: In PostgreSQLDialectAdapter.discoverTableIndexes, if the typeName of a column is "uuid", we assign "UUID" as its collation reference.
  • Why: CollationMapper.fromDB expects a virtual "UUID" collation tag to trigger the static hexadecimal base-16 mapper (buildStaticUuidMapper). By assigning "UUID" during discovery, the splitter bypasses executing a database query to fetch collation rankings (which would fail or be extremely slow for a native UUID type that has no physical collation).
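
As a rough illustration of what a static base-16 mapping can look like, the sketch below converts a canonical UUID string to a non-negative 128-bit `BigInteger` and back without touching the database. The class and method names (`UuidHexMapping`, `toBigInteger`, `toUuidString`) are hypothetical and are not taken from `CollationMapper`; the real `buildStaticUuidMapper` may be structured differently.

```java
import java.math.BigInteger;

/** Hypothetical sketch; not the CollationMapper implementation from this PR. */
public final class UuidHexMapping {
  private UuidHexMapping() {}

  /** Strips hyphens and reads the remaining 32 hex digits as a base-16 integer. */
  public static BigInteger toBigInteger(String uuid) {
    String hex = uuid.replace("-", "");
    if (hex.length() != 32) {
      throw new IllegalArgumentException("Expected 32 hex digits, got " + hex.length());
    }
    // Throws NumberFormatException if a non-hex character is present.
    return new BigInteger(hex, 16);
  }

  /** Formats the value as 32 zero-padded hex digits and re-inserts hyphens (8-4-4-4-12). */
  public static String toUuidString(BigInteger value) {
    if (value.signum() < 0 || value.bitLength() > 128) {
      throw new IllegalArgumentException("Value does not fit in 128 unsigned bits");
    }
    String hex = String.format("%032x", value);
    return hex.substring(0, 8) + "-" + hex.substring(8, 12) + "-" + hex.substring(12, 16)
        + "-" + hex.substring(16, 20) + "-" + hex.substring(20);
  }
}
```

Because the mapping is pure arithmetic on hex digits, a splitter could compute a midpoint between two boundary UUIDs by averaging the two integers and unmapping the result, with no collation query involved.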

2. Configure Virtual Type Length to 32 for UUID Columns

  • Change: In PostgreSQLDialectAdapter.discoverTableIndexes, if typeLength is null and typeName is "uuid", we set typeLength = 32.
  • Why: While a standard canonical UUID is 36 characters long (including hyphens), CollationMapper strips the hyphens out during mapping, leaving exactly 32 hexadecimal characters. Overriding the discovered length to 32 ensures that no additional padding (virtual zero-rank characters) is appended during range partitioning calculations, which gives a clean 1-to-1 mapping and unmapping.
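
The discovery-time overrides from items 1 and 2 can be sketched roughly as follows. This is a simplified illustration, not the code in `PostgreSQLDialectAdapter.discoverTableIndexes`: the helper class `UuidColumnOverrides` and its method names are hypothetical, and the exact-match check on `"uuid"` is an assumption based on the description above.

```java
/** Hypothetical sketch of the UUID overrides applied during index discovery. */
public final class UuidColumnOverrides {
  public static final String UUID_TYPE = "uuid";
  public static final String UUID_COLLATION = "UUID";
  // 36 canonical characters minus 4 hyphens.
  public static final int UUID_HEX_LENGTH = 32;

  private UuidColumnOverrides() {}

  /** Returns the virtual "UUID" collation for uuid columns, else the discovered collation. */
  public static String effectiveCollation(String typeName, String discoveredCollation) {
    return UUID_TYPE.equals(typeName) ? UUID_COLLATION : discoveredCollation;
  }

  /** Returns 32 for uuid columns with no discovered length, else the discovered length. */
  public static Integer effectiveLength(String typeName, Integer discoveredLength) {
    if (discoveredLength == null && UUID_TYPE.equals(typeName)) {
      return UUID_HEX_LENGTH;
    }
    return discoveredLength;
  }
}
```

Keeping both overrides keyed on the same type check means a column is either treated entirely as a virtual-collation UUID or left exactly as discovered.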

3. Register State-Based Query and Parameter Cast Wrappers for UUID

  • Change: In PostgreSQLDialectAdapter.discoverTableIndexes, if typeName is "uuid", we register explicit SQL cast statements in columnCastWrappers and columnParameterCastWrappers maps.
  • Why:
    • columnCastWrappers (CAST(%s AS TEXT)): Used in getBoundaryQuery to query MIN(CAST(col AS TEXT)) and MAX(CAST(col AS TEXT)). This is necessary to retrieve the UUID boundaries safely as standard text strings compatible with JDBC. The uuid type doesn't have MIN or MAX aggregates.
    • columnParameterCastWrappers (CAST(? AS uuid)): Used in getReadQuery and getCountQuery to bind parameter boundary placeholders as col >= CAST(? AS uuid). This is necessary because PostgreSQL does not support implicit comparison of standard JDBC string parameter bindings against native uuid column types.
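
To make the two wrappers concrete, here is a minimal sketch of how the format strings might be applied when building SQL. The query shapes, the class `UuidCastWrappers`, and its method names are assumptions for illustration only; the actual `getBoundaryQuery`, `getReadQuery`, and `getCountQuery` in the adapter may produce different statements, and identifier quoting is omitted here.

```java
/** Hypothetical sketch of applying the UUID cast wrappers; not the adapter's real queries. */
public final class UuidCastWrappers {
  // Wraps the column so boundaries come back as text.
  public static final String COLUMN_CAST = "CAST(%s AS TEXT)";
  // Wraps the bind placeholder so the string parameter is compared as uuid.
  public static final String PARAM_CAST = "CAST(? AS uuid)";

  private UuidCastWrappers() {}

  /** Builds a MIN/MAX query over the text-cast column. */
  public static String boundaryQuery(String table, String column) {
    String wrapped = String.format(COLUMN_CAST, column);
    return String.format("SELECT MIN(%s), MAX(%s) FROM %s", wrapped, wrapped, table);
  }

  /** Builds a range read whose two bind parameters are cast to uuid. */
  public static String readQuery(String table, String column) {
    return String.format(
        "SELECT * FROM %s WHERE %s >= %s AND %s < %s",
        table, column, PARAM_CAST, column, PARAM_CAST);
  }
}
```

With these helpers, `boundaryQuery("t", "id")` yields `SELECT MIN(CAST(id AS TEXT)), MAX(CAST(id AS TEXT)) FROM t`, and the read query can then be bound with plain `setString` calls for both boundaries.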

4. Verify Changes with Unit & Integration Tests

  • Collation Mapper Test: Added testUuidCollationMapper in CollationMapperTest.java to verify that canonical UUID strings are mapped to 128-bit BigIntegers and unmapped back with correct formatting and hyphen insertion.
  • Dialect Adapter Test: Added testDiscoverTableIndexesWithUuid in PostgreSQLDialectAdapterTest.java verifying index discovery mappings, boundary query wrapping, and read/count query parameter bindings.
  • Integration Test expected data logic: Updated getExpectedData in PostgreSQLWithUniformizationIT.java to support assertions for tables with non-integer primary keys (uuid_pk).

@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 12, 2026
@aasthabharill aasthabharill added improvement Making existing code better bug-fix labels May 12, 2026
codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 85.33333% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.50%. Comparing base (f8472fe) to head (4fd97fa).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ctadapter/postgresql/PostgreSQLDialectAdapter.java | 76.59% | 5 Missing and 6 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3798      +/-   ##
============================================
+ Coverage     53.41%   59.50%   +6.08%     
+ Complexity     6629     2184    -4445     
============================================
  Files          1082      506     -576     
  Lines         65795    29521   -36274     
  Branches       7328     3240    -4088     
============================================
- Hits          35147    17565   -17582     
+ Misses        28288    10970   -17318     
+ Partials       2360      986    -1374     

| Components | Coverage Δ |
|---|---|
| spanner-templates | 74.87% <85.33%> (+2.05%) ⬆️ |
| spanner-import-export | ∅ <ø> (∅) |
| spanner-live-forward-migration | 80.86% <ø> (-0.07%) ⬇️ |
| spanner-live-reverse-replication | 77.02% <ø> (-0.03%) ⬇️ |
| spanner-bulk-migration | 91.02% <85.33%> (-0.08%) ⬇️ |
| gcs-spanner-dv | 86.69% <ø> (+0.94%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| .../uniformsplitter/stringmapper/CollationMapper.java | 97.82% <100.00%> (+0.55%) ⬆️ |
| ...ctadapter/postgresql/PostgreSQLDialectAdapter.java | 93.36% <76.59%> (-3.94%) ⬇️ |

... and 603 files with indirect coverage changes

@aasthabharill aasthabharill force-pushed the bulk-partition-uuid branch 4 times, most recently from 052d28a to e6bd9dc Compare May 12, 2026 17:08
@aasthabharill aasthabharill force-pushed the bulk-partition-uuid branch from e6bd9dc to 4fd97fa Compare May 12, 2026 17:51
VardhanThigle (Contributor) left a comment

Please consider moving to a binary index (similar to what we use for MySQL varbinary) instead of a string for UUID. PG uses binary collation to compare UUIDs, and that's more natural.

Do we need to handle strict UUID versions, etc.?

For PG, mostly not - https://www.db-fiddle.com/f/pVFVr6krWjQ2wHqstc44Hm/0 (please add this fiddle as a comment somewhere in your implementation)
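
For reference, a binary ordering of the kind suggested here could be sketched in Java as below: each UUID becomes its 16 big-endian bytes, and the arrays are compared as unsigned bytes. The class `UuidBinaryOrder` and its methods are hypothetical and not part of this PR; that this ordering matches PostgreSQL's uuid comparison is taken from the review comment rather than verified here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical sketch of byte-wise UUID ordering; not part of this PR. */
public final class UuidBinaryOrder {
  private UuidBinaryOrder() {}

  /** Serializes a UUID to 16 bytes, most significant byte first. */
  public static byte[] toBytes(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  /** Compares two UUIDs byte by byte, treating each byte as unsigned (Java 9+). */
  public static int compareBinary(UUID a, UUID b) {
    return Arrays.compareUnsigned(toBytes(a), toBytes(b));
  }
}
```

For UUIDs rendered in lowercase canonical form, this byte-wise order agrees with a plain character-by-character comparison of the hex strings, since the hyphens sit at fixed positions. `UUID.fromString` also parses values regardless of the version digit, so this sketch does not enforce a strict UUID version.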
