
[spark] Record the write operation type in snapshot properties #8236

Open

Zouxxyy wants to merge 1 commit into apache:master from Zouxxyy:xinyu/paimon-operation

Zouxxyy commented Jun 14, 2026

Purpose

A Paimon snapshot only records the physical CommitKind (APPEND/COMPACT/OVERWRITE/...), not the logical operation that produced it — so an APPEND from INSERT INTO cannot be told apart from one produced by MERGE INTO.

This PR records the logical operation type in the snapshot properties map under the key "operation". No format change is needed: Snapshot already has a properties: Map<String, String> field.
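To make the distinction concrete, here is a toy Python model. The `Snapshot` class and the `logical_operation` helper are hypothetical stand-ins, not Paimon's Java API or pypaimon. Two snapshots share the physical commit kind APPEND, and only the "operation" property tells them apart. Returning `None` for a snapshot without the key is an assumption about snapshots committed before this change.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

OPERATION_KEY = "operation"  # property key used by this PR


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a Paimon snapshot (hypothetical, not the real class)."""
    snapshot_id: int
    commit_kind: str  # physical kind, e.g. "APPEND", "COMPACT", "OVERWRITE"
    properties: Dict[str, str] = field(default_factory=dict)


def logical_operation(snapshot: Snapshot) -> Optional[str]:
    """Return the recorded logical operation, or None if the key is absent
    (assumed for snapshots written before the property existed)."""
    return snapshot.properties.get(OPERATION_KEY)


from_insert = Snapshot(1, "APPEND", {OPERATION_KEY: "WRITE"})
from_merge = Snapshot(2, "APPEND", {OPERATION_KEY: "MERGE"})
legacy = Snapshot(3, "APPEND")

# Same physical kind, different logical operation.
print(logical_operation(from_insert), logical_operation(from_merge), logical_operation(legacy))
# → WRITE MERGE None
```

Reading the property rather than adding a new CommitKind value keeps the physical kind untouched for existing readers.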

Core: add InnerTableCommit#withCommitProperties(...), applied in TableCommitImpl so the properties land on every snapshot the commit generates (both the append and overwrite paths, since FileStoreCommitImpl sources snapshot properties from committable.properties()).
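The propagation path can be sketched as a toy Python model. `ToyTableCommit`, `ToyStoreCommit` and `Committable` are hypothetical stand-ins for the roles the PR gives TableCommitImpl, FileStoreCommitImpl and the committable. They are not the real classes, and the real commit path does considerably more work. What the sketch keeps is the data flow: commit-level properties are attached to the committable, and the store commit copies them onto the snapshot on both the append and the overwrite path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Committable:
    """Toy committable (hypothetical): carries the snapshot properties."""
    kind: str  # "APPEND" or "OVERWRITE" in this model
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Snapshot:
    snapshot_id: int
    commit_kind: str
    properties: Dict[str, str]


class ToyStoreCommit:
    """Plays the role described for FileStoreCommitImpl: every snapshot
    takes its properties from committable.properties."""

    def __init__(self) -> None:
        self.snapshots: List[Snapshot] = []

    def commit(self, committable: Committable) -> Snapshot:
        # APPEND and OVERWRITE share this code, so both read the same properties.
        snapshot = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            commit_kind=committable.kind,
            properties=dict(committable.properties),  # copy, never alias
        )
        self.snapshots.append(snapshot)
        return snapshot


class ToyTableCommit:
    """Plays the role of TableCommitImpl plus a withCommitProperties-style setter."""

    def __init__(self, store: ToyStoreCommit) -> None:
        self._store = store
        self._commit_properties: Dict[str, str] = {}

    def with_commit_properties(self, props: Dict[str, str]) -> "ToyTableCommit":
        self._commit_properties = dict(props)
        return self  # builder style, so calls can be chained

    def commit(self, kind: str) -> Snapshot:
        # Attach the commit-level properties to the committable before the
        # store commit builds the snapshot.
        committable = Committable(kind, dict(self._commit_properties))
        return self._store.commit(committable)


store = ToyStoreCommit()
appended = ToyTableCommit(store).with_commit_properties({"operation": "WRITE"}).commit("APPEND")
overwritten = ToyTableCommit(store).with_commit_properties({"operation": "OVERWRITE"}).commit("OVERWRITE")
print(appended.snapshot_id, appended.properties)
print(overwritten.snapshot_id, overwritten.properties)
# → 1 {'operation': 'WRITE'}
# → 2 {'operation': 'OVERWRITE'}
```

The model copies the map at each hop, so a later change to the caller's map cannot leak into an already committed snapshot.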

Spark (both v1 and v2 write paths):

| SQL | operation |
| --- | --- |
| INSERT INTO | WRITE |
| INSERT OVERWRITE | OVERWRITE |
| DELETE | DELETE |
| UPDATE | UPDATE |
| MERGE INTO | MERGE |
| CREATE TABLE AS SELECT | CREATE TABLE AS SELECT |
| (CREATE OR) REPLACE TABLE AS SELECT | REPLACE TABLE AS SELECT / CREATE OR REPLACE TABLE AS SELECT |
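The mapping can be read as a lookup table. Below is a hypothetical Python transcription of it; in the PR the value is set inside the Spark v1 and v2 write paths, not through a dictionary like this one. Two details are worth noticing. INSERT INTO records WRITE rather than INSERT. The CREATE OR REPLACE variant is recorded separately from plain REPLACE.

```python
from typing import Dict

# Recorded "operation" value per SQL command, transcribed from the table.
OPERATION_BY_COMMAND: Dict[str, str] = {
    "INSERT INTO": "WRITE",
    "INSERT OVERWRITE": "OVERWRITE",
    "DELETE": "DELETE",
    "UPDATE": "UPDATE",
    "MERGE INTO": "MERGE",
    "CREATE TABLE AS SELECT": "CREATE TABLE AS SELECT",
    "REPLACE TABLE AS SELECT": "REPLACE TABLE AS SELECT",
    "CREATE OR REPLACE TABLE AS SELECT": "CREATE OR REPLACE TABLE AS SELECT",
}


def operation_for(command: str) -> str:
    """Hypothetical helper: normalize case and whitespace, then look up."""
    key = " ".join(command.upper().split())
    try:
        return OPERATION_BY_COMMAND[key]
    except KeyError:
        raise ValueError(f"no operation recorded for {command!r}") from None


print(operation_for("insert  into"))
# → WRITE
```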

Tests

Added SnapshotOperationTest (paimon-spark-ut) asserting the recorded operation for INSERT/OVERWRITE/UPDATE/DELETE/MERGE under both spark.paimon.write.use-v2-write=true and false, plus CTAS/RTAS.
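The DML part of the test matrix can be enumerated as in the sketch below. It is a hypothetical Python outline, not the actual SnapshotOperationTest, which runs real SQL in Spark. Only the config key and the expected values come from this PR. The SQL statements, the execution step and the snapshot lookup are omitted because the PR text does not show them. CTAS/RTAS are listed separately in the PR and are left out here.

```python
from itertools import product
from typing import Dict, Iterator, Tuple

V2_WRITE_KEY = "spark.paimon.write.use-v2-write"

# Expected values for the DML statements, taken from the mapping table.
DML_EXPECTED: Dict[str, str] = {
    "INSERT INTO": "WRITE",
    "INSERT OVERWRITE": "OVERWRITE",
    "UPDATE": "UPDATE",
    "DELETE": "DELETE",
    "MERGE INTO": "MERGE",
}


def dml_cases() -> Iterator[Tuple[Dict[str, str], str, str]]:
    """Yield (session conf, SQL command, expected operation) for every DML
    command under both the v2 ("true") and v1 ("false") write paths."""
    for use_v2, (command, expected) in product(("true", "false"), DML_EXPECTED.items()):
        yield {V2_WRITE_KEY: use_v2}, command, expected


for conf, command, expected in dml_cases():
    print(conf[V2_WRITE_KEY], command, "->", expected)
```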
