[spark] Add union read for lake-enabled log tables by fresh-borzoni · Pull Request #2956 · apache/fluss

fresh-borzoni · 2026-03-29T23:58:26Z

Summary

Adds batch reads for lake-enabled log tables. When a table has datalake enabled, a read combines data in lake storage (Paimon/Iceberg) with the Fluss log tail. Lake and log splits are planned as separate Spark partitions: lake tasks read from lake storage without opening Fluss connections, and log-tail tasks reuse the existing reader. If no lake snapshot exists, the read falls back to a pure log read. Union read is only enabled in FULL startup mode.
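To make the planning described above concrete, here is a minimal plain-Java sketch. It is not the code in this PR. Every name in it (`UnionReadPlanSketch`, `InputPart`, `LakeSnapshot`, `tieredOffsets`, `EARLIEST_OFFSET`) is hypothetical, and it makes two assumptions: the lake snapshot records, per bucket, the first log offset that has not yet been tiered, and only FULL startup mode is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnionReadPlanSketch {

    // Hypothetical sentinel meaning "start from the earliest available log offset".
    public static final long EARLIEST_OFFSET = -2L;

    /** Hypothetical stand-in for one Spark input partition (one task). */
    public static final class InputPart {
        public final String kind;       // "LAKE" or "LOG"
        public final String lakeSplit;  // lake split name, null for log partitions
        public final int bucket;        // log bucket id, -1 for lake partitions
        public final long startOffset;  // first log offset to read, -1 for lake partitions

        private InputPart(String kind, String lakeSplit, int bucket, long startOffset) {
            this.kind = kind;
            this.lakeSplit = lakeSplit;
            this.bucket = bucket;
            this.startOffset = startOffset;
        }

        static InputPart lake(String split) {
            return new InputPart("LAKE", split, -1, -1L);
        }

        static InputPart log(int bucket, long startOffset) {
            return new InputPart("LOG", null, bucket, startOffset);
        }

        @Override
        public String toString() {
            return "LAKE".equals(kind)
                    ? "LAKE(" + lakeSplit + ")"
                    : "LOG(bucket=" + bucket + ", start=" + startOffset + ")";
        }
    }

    /** Hypothetical view of a lake snapshot: its splits and, per bucket, the first untiered log offset. */
    public static final class LakeSnapshot {
        public final List<String> splits;
        public final Map<Integer, Long> tieredOffsets;

        public LakeSnapshot(List<String> splits, Map<Integer, Long> tieredOffsets) {
            this.splits = splits;
            this.tieredOffsets = tieredOffsets;
        }
    }

    /**
     * Plans a FULL-mode batch read. Lake splits and log buckets become separate
     * partitions; a null snapshot yields log-only partitions from the earliest offset.
     */
    public static List<InputPart> plan(LakeSnapshot snapshot, int numBuckets) {
        List<InputPart> parts = new ArrayList<>();
        if (snapshot != null) {
            // Lake partitions carry only a split, so their tasks need no Fluss connection.
            for (String split : snapshot.splits) {
                parts.add(InputPart.lake(split));
            }
        }
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            long start = EARLIEST_OFFSET;
            if (snapshot != null && snapshot.tieredOffsets.containsKey(bucket)) {
                // Log tail: resume where the lake snapshot stops for this bucket.
                start = snapshot.tieredOffsets.get(bucket);
            }
            parts.add(InputPart.log(bucket, start));
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<Integer, Long> offsets = new TreeMap<>();
        offsets.put(0, 100L);
        offsets.put(1, 250L);
        LakeSnapshot snapshot = new LakeSnapshot(Arrays.asList("split-a", "split-b"), offsets);
        System.out.println(plan(snapshot, 3)); // union read: lake + log tail
        System.out.println(plan(null, 3));     // fallback: pure log read
    }
}
```

Keeping lake and log work in distinct partitions means each task needs only one kind of reader. In this sketch, a bucket that is missing from the snapshot is read from the earliest offset; whether the actual implementation treats such buckets the same way is not stated in this description.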

Tests cover both Paimon and Iceberg.

Follow-up PRs

PK table lake reads (sort-merge with lake snapshot)
Streaming with lake bootstrap
Filter/partition/limit push-down to lake source
DV support for Paimon

fresh-borzoni added 2 commits March 30, 2026 00:33

[spark] Add union read for lake-enabled log tables

7c5b0c1

remove iceberg test for now, shading problem

e39b4b1

fresh-borzoni wants to merge 2 commits into apache:main from fresh-borzoni:spark-union-read

1 participant
