feat: Support uuid as split column in postgres by rajivharlalka · Pull Request #180 · datazip-inc/olake

rajivharlalka · 2025-03-26T05:28:17Z

Description

Fixes #144

Added checks that modify the SQL query to typecast to the TEXT datatype when the split column is stored as a string datatype in Olake.
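To illustrate the idea, here is a minimal Go sketch. It assumes a formatter of the shape `formatter(column, op, value)`, as seen in the later buildChunkCondition diff. The names `Formatter`, `numericFormatter`, `textFormatter`, `formatterFor`, and the `isString` flag are hypothetical and are not OLake's actual code. The sketch escapes single quotes by doubling them. Commit 76054d3 in this PR switched to `$$` dollar-quoting instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Formatter renders one side of a chunk predicate, e.g. `id >= 100`.
// Hypothetical signature modeled on the formatter(filterColumn, op, value)
// calls in buildChunkCondition; not OLake's real type.
type Formatter func(column, op string, value any) string

// numericFormatter emits the value unquoted, for integer split columns.
func numericFormatter(column, op string, value any) string {
	return fmt.Sprintf("%s %s %v", column, op, value)
}

// textFormatter casts both the column and the literal to TEXT, so a uuid
// column is compared with a string boundary as text on both sides.
// Embedded single quotes are escaped by doubling them.
func textFormatter(column, op string, value any) string {
	literal := strings.ReplaceAll(fmt.Sprint(value), "'", "''")
	return fmt.Sprintf("%s::text %s '%s'::text", column, op, literal)
}

// formatterFor chooses a formatter; isString is a hypothetical flag standing
// in for however OLake reports that the split column is stored as a string.
func formatterFor(isString bool) Formatter {
	if isString {
		return textFormatter
	}
	return numericFormatter
}

func main() {
	f := formatterFor(true)
	fmt.Println(f("id", ">=", "3f2504e0-4f89-11d3-9a0c-0305e82c3301"))
	// id::text >= '3f2504e0-4f89-11d3-9a0c-0305e82c3301'::text
}
```

Keeping the type decision in a single formatter lets the chunk-condition builder stay generic. Only the formatter varies by column type.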

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Scenario A
Added a table with columns id uuid (primary key) and name text, and checked backfill with id as the split key.
Scenario B
Added a table with columns id uuid (primary key), name text, and roll int, and checked backfill with id as the split key.

Screenshots or Recordings

Related PR's (If Any):

rajivharlalka · 2025-03-26T14:16:28Z

@hash-data any reviews please?

hash-data · 2025-04-01T06:36:26Z

@rajivharlalka reviewing today

rajivharlalka · 2025-04-02T17:41:53Z

@hash-data any updates?

hash-data · 2025-04-02T17:44:55Z

@rajivharlalka can you also mention the cases in which you have tested it.
Thanks

CLAassistant · 2025-04-03T18:02:27Z

All committers have signed the CLA.

rajivharlalka · 2025-04-07T05:15:57Z

@hash-data any further updates here?

hash-data

Some Comments

pkg/jdbc/jdbc.go


rajivharlalka · 2025-04-10T11:54:40Z

@hash-data @vikash390 Any updates here? Could you test it out, or would it help if I shared a test PostgreSQL dump?

hash-data · 2025-04-10T12:45:32Z

@rajivharlalka, we were busy with some ad hoc tasks and will try to finish testing today
Thanks

pkg/jdbc/jdbc.go

zriyanshdz · 2025-04-27T08:40:44Z

Hey @rajivharlalka, as we are approaching the final merge, could you resolve the comments so we can get this merged soon?


rajivharlalka · 2025-05-04T03:35:24Z

@hash-data do let me know if any more changes are needed.

ImDoubD-datazip · 2025-05-26T09:15:34Z

pkg/jdbc/jdbc.go

+	return fmt.Sprintf("%s AND %s",
+		formatter(filterColumn, ">=", chunk.Min),
+		formatter(filterColumn, "<=", chunk.Max))
 }


The buildChunkCondition function has a problem. As of now, it includes both the chunk min and the chunk max boundary in a single chunk, so there is data duplication: since the max of the previous chunk is the min of the current chunk, the min of every chunk after the first is read twice.

Expected behaviour => When the first chunk is formed, all the data between the chunk min and chunk max, inclusive of both, must be in that chunk. From the next chunk onwards, the chunk min (which was the chunk max of the previous chunk) should not be included, so that each boundary value is included only once, in the earlier chunk.
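The expected behaviour described above can be sketched in Go as follows. This is an illustration under assumptions, not the fix that was merged to staging. `Chunk`, the `isFirst` parameter, and `plain` are hypothetical names. The first chunk uses `>=` on its lower bound, and every later chunk uses `>`.

```go
package main

import "fmt"

// Chunk is a hypothetical stand-in for OLake's chunk type; only the Min and
// Max fields used in the buildChunkCondition diff are assumed.
type Chunk struct {
	Min, Max any
}

// buildChunkCondition keeps both bounds for the first chunk and makes the
// lower bound exclusive for every later chunk, so a boundary value shared by
// two adjacent chunks is read only once, in the earlier chunk.
func buildChunkCondition(column string, chunk Chunk, isFirst bool,
	formatter func(column, op string, value any) string) string {
	lowerOp := ">" // later chunks: the previous chunk's Max is already covered
	if isFirst {
		lowerOp = ">="
	}
	return fmt.Sprintf("%s AND %s",
		formatter(column, lowerOp, chunk.Min),
		formatter(column, "<=", chunk.Max))
}

// plain renders `column op value` without any casting.
func plain(column, op string, value any) string {
	return fmt.Sprintf("%s %s %v", column, op, value)
}

func main() {
	chunks := []Chunk{{0, 100}, {100, 200}}
	for i, c := range chunks {
		fmt.Println(buildChunkCondition("id", c, i == 0, plain))
	}
	// id >= 0 AND id <= 100
	// id > 100 AND id <= 200
}
```

Because the lower bound is made exclusive rather than shifted by one, the same approach would also work for string split columns such as UUIDs.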

@rajivharlalka

rajivharlalka · 2025-05-28

olake/drivers/postgres/internal/backfill.go

Lines 32 to 34 in 5687fd0

// check for data distribution
// TODO: remove chunk intersections where chunks can be {0, 100} {100, 200}. Need to {0, 99} {100, 200}
splitChunks, err = p.splitTableIntoChunks(stream)

I feel this is an already known problem, and its solution isn't in the buildChunk function. I read the TODO and therefore left the known problem as-is, on the assumption that it would be fixed later.

ImDoubD-datazip · 2025-05-28

This is already solved, and we have merged the fix into staging. Please pull the latest staging and modify the code logic to support uuid as a split column.
@rajivharlalka

ImDoubD-datazip · 2025-07-21T06:21:18Z

Hi @rajivharlalka, I have created a PR for UUID split-column support in Postgres. If you can pull the current staging and make the changes, I will be able to merge your PR and your contribution will be reflected; otherwise, the PR I have created will be merged.
My PR: #405

rajivharlalka changed the base branch from master to staging March 26, 2025 05:28

hash-data self-requested a review April 2, 2025 17:43

hash-data changed the title ~~Enable Backfill in PostgreSQL with UUID splitColumn~~ feat: Support uuid as split column in postgres Apr 3, 2025

hash-data reviewed Apr 7, 2025

pkg/jdbc/jdbc.go (outdated, resolved)

pkg/jdbc/jdbc.go (resolved)

pkg/jdbc/jdbc.go (resolved)

rajivharlalka added 3 commits April 7, 2025 13:23

feat: allow backfill in PostgreSQL with uuid splitColumns

7abeed3

Add checks: if the splitColumn is a UUID (string in Olake) datatype, modify the SQL query to typecast the value and column to TEXT for comparisons between UUIDs.

empty commit to trigger CLA

92958bd

Signed-off-by: Rajiv Harlalka <rajivharlalka009@gmail.com>

fix(jdbc): replace quotes with $$ escapes

76054d3

Signed-off-by: Rajiv Harlalka <rajivharlalka009@gmail.com>

rajivharlalka force-pushed the rajivharlalka/backfill-uuid branch from 8f51fec to 76054d3 Compare April 7, 2025 11:03

Merge branch 'staging' into rajivharlalka/backfill-uuid

5ef0ad8

hash-data requested a review from vikaxsh April 9, 2025 09:09

Merge branch 'staging' into rajivharlalka/backfill-uuid

b83fb66

hash-data reviewed Apr 21, 2025

pkg/jdbc/jdbc.go (outdated, resolved)

fix(jdbc): make buildChunkCondition generic with a formatter

5687fd0

Signed-off-by: Rajiv Harlalka <rajivharlalka009@gmail.com>

zriyanshdz requested a review from hash-data May 20, 2025 12:14

ImDoubD-datazip reviewed May 26, 2025


nayanj98 assigned rajivharlalka Nov 14, 2025
