feat(transaction): Add SnapshotValidator by CTTY · Pull Request #2590 · apache/iceberg-rust

CTTY · 2026-06-05T00:30:09Z

Which issue does this PR close?

Closes Implement SnapshotValidator #2243

Here is a diagram to picture what I have in mind about the class heirachy for the snapshot producing.

flowchart TB
    subgraph entry["Transaction entry point"]
        TA["«trait» TransactionAction"]
        FAA["FastAppendAction"]
        RFA["RewriteFilesAction<br/>(planned)"]
    end

    subgraph behavior["Behavior traits — passed into SnapshotProducer::commit"]
        SPO["«trait» SnapshotProduceOperation<br/>what to include / tombstone"]
        SV["«trait» SnapshotValidator<br/>conflict checks (default no-op)"]
        MP["«trait» ManifestProcess<br/>how to assemble manifests"]
    end

    subgraph impls["Concrete behavior"]
        FAO["FastAppendOperation"]
        RFO["RewriteFilesOperation<br/>(planned)"]
        DMP["DefaultManifestProcess"]
    end

    SP["SnapshotProducer<br/>(engine)"]

    %% who implements the entry trait
    FAA -- impl --> TA
    RFA -. impl planned .-> TA

    %% supertrait
    SPO -- "requires (supertrait)" --> SV

    %% concrete impls of behavior traits
    FAO -- impl --> SPO
    RFO -. impl planned .-> SPO
    DMP -- impl --> MP

    %% actions drive the producer with an operation + manifest process
    FAA -- "commit() builds" --> SP
    SP -- "uses OP" --> SPO
    SP -- "uses MP" --> MP
    SP -- "calls validate() via supertrait" --> SV

    FAA -. "passes" .-> FAO
    FAA -. "passes" .-> DMP
    RFA -. "passes (planned)" .-> RFO

What changes are included in this PR?

Added a new trait SnapshotValidator to help validate the snapshot to avoid conflicts

Are these changes tested?

Not yet, this should be tested when concrete implementations that validate history or no new delete files are added

CTTY · 2026-06-05T00:33:20Z

+        let metadata = base.metadata_ref();
+        let snapshots = ancestors_between(&metadata, to_snapshot_id, from_snapshot_id);
+
+        for current_snapshot in snapshots {


This and other for-loops have opportunity to be parallelized using runtime, we can revisit this once we are aligned on the general design or as a follow-up

CTTY · 2026-06-05T00:33:53Z

+    /// # Errors
+    ///
+    /// Returns an error if the history between the snapshots cannot be determined.
+    #[allow(dead_code)]


marking them as dead_code for now and they will be used by delete or rewrite actions later on

Should we add some unit tests?

huan233usc · 2026-06-05T05:12:23Z

+use crate::{Error, ErrorKind};
+
+/// Operations whose snapshots may add delete files.
+static VALIDATE_ADDED_DELETE_FILES_OPERATIONS: Lazy<HashSet<Operation>> =


nit: The price for pay for defining hash set is the changes in public-api.txt etc. No strong opinion but it seems that we could avoid it by defining

/// Operations whose snapshots may add delete files. fn adds_delete_files(op: &Operation) -> bool { matches!(op, Operation::Overwrite | Operation::Delete) }

+1, I think having a reference to that function improves readability anyway.

Maybe not worth it since I don't expect a new operation type anytime soon, but I was thinking along the lines of this.

/// Operations whose snapshots may add delete files. fn adds_delete_files(op: &Operation) -> bool { match op { Operation::Append | Operation::Replace => false, Operation::Overwrite | Operation::Delete => true, } }

I know this mirrors the Java implementation, however I find it confusing as a replace operation will add delete files, when it rewrites the delete files to retain deletes it could not apply.

What we're really checking for here is that no new deletes apply to the data files we've rewritten.

Is adds_deletes(op: &Operation) better here?

I'm not sure I fully understand, if we add multiple APIs to Operation like this, how do we tell which API to call in SnapshotValidator?

Right now we only need one implementation of validation_history across multiple table actions, and I think this is convenient. But would be happy to switch to the cleaner design if it doesn't really complicate the implementation

huan233usc · 2026-06-05T05:12:51Z

+    /// # Errors
+    ///
+    /// Returns an error if the history between the snapshots cannot be determined.
+    #[allow(dead_code)]


Should we add some unit tests?

blackmwk · 2026-06-05T08:52:50Z

I'm still quite confused about how the overall things works. I think the key point is the design of MergingSnapshotProduder, which contains a lot of indices to speed up the confliction check.

dannycjones

thanks Shawn!

dannycjones · 2026-06-05T20:39:10Z

+    /// Validates a snapshot against a table.
+    ///
+    /// # Arguments
+    ///
+    /// * `base` - The base table to validate against.
+    /// * `parent_snapshot_id` - The ID of the parent snapshot, if any. This is usually
+    ///   the latest snapshot of the base table, unless it's a non-main branch
+    ///   (note: writing to branches is not currently supported).
+    ///
+    /// # Returns
+    ///
+    /// A `Result` indicating success or an error if validation fails.
+    async fn validate(&self, _base: &Table, _parent_snapshot_id: Option<i64>) -> Result<()> {
+        Ok(())
+    }


Why not leave this abstract, such that append is documented as requiring no validation in its implementation?

Yes, that would also work. I don't really have a preference here

dannycjones · 2026-06-05T20:39:18Z

+use crate::{Error, ErrorKind};
+
+/// Operations whose snapshots may add delete files.
+static VALIDATE_ADDED_DELETE_FILES_OPERATIONS: Lazy<HashSet<Operation>> =


+1, I think having a reference to that function improves readability anyway.

Maybe not worth it since I don't expect a new operation type anytime soon, but I was thinking along the lines of this.

/// Operations whose snapshots may add delete files. fn adds_delete_files(op: &Operation) -> bool { match op { Operation::Append | Operation::Replace => false, Operation::Overwrite | Operation::Delete => true, } }

dannycjones · 2026-06-05T20:39:37Z

+use crate::{Error, ErrorKind};
+
+/// Operations whose snapshots may add delete files.
+static VALIDATE_ADDED_DELETE_FILES_OPERATIONS: Lazy<HashSet<Operation>> =


I know this mirrors the Java implementation, however I find it confusing as a replace operation will add delete files, when it rewrites the delete files to retain deletes it could not apply.

What we're really checking for here is that no new deletes apply to the data files we've rewritten.

Is adds_deletes(op: &Operation) better here?

dannycjones · 2026-06-05T21:31:07Z

+            match base.metadata().snapshot_by_id(from_snapshot_id) {
+                Some(snapshot) => snapshot.sequence_number(),
+                None => INITIAL_SEQUENCE_NUMBER,
+            }


Why is it not an error if we cannot find the snapshot associated with from_snapshot_id? Is it safe to default to INITIAL_SEQUENCE_NUMBER here?

It shouldn't hurt if we validate against the entire snapshot history. the conflict rate would be higher if we were trying to pass an invalid starting snapshot id

dannycjones · 2026-06-05T21:40:22Z

+                    return Err(Error::new(
+                        ErrorKind::DataInvalid,
+                        format!(
+                            "Cannot commit, found new positional delete for added data file: {}",
+                            data_file.file_path
+                        ),
+                    ));


The data files are not 'added', right?

I was reviewing the Java implementation, the wording is: "Cannot commit, found new position delete for replaced data file: ". Shall we align with that?

https://github.com/apache/iceberg/blob/bead8dfab3249cbc6ba7713e862ff2698fd2122c/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L538-L549

Yes, I think "replaced data file" is more precise

CTTY · 2026-06-08T22:53:35Z

I think the key point is the design of MergingSnapshotProduder, which contains a lot of indices to speed up the confliction check.

I think you meant DeleteFileIndex in java implementation and the conflictDetectionFilter. In rust's DeleteFileIndex, we don't allow filter to be pushed down at this point, so I didn't include that change.

In the future, we could store the filter in snapshot operations like RowDeltaAction and pass the filter to SnapshotValidator::validate_no_new_deletes easily.

The current implementation won't block that change, we will only need to change the API in SnapshotValidator after adding conflict_detecting_filter support to DeleteFileIndex

blackmwk · 2026-06-09T11:58:41Z

I think the key point is the design of MergingSnapshotProduder, which contains a lot of indices to speed up the confliction check.

I think you meant DeleteFileIndex in java implementation and the conflictDetectionFilter. In rust's DeleteFileIndex, we don't allow filter to be pushed down at this point, so I didn't include that change.

In the future, we could store the filter in snapshot operations like RowDeltaAction and pass the filter to SnapshotValidator::validate_no_new_deletes easily.

The current implementation won't block that change, we will only need to change the API in SnapshotValidator after adding conflict_detecting_filter support to DeleteFileIndex

No, I mean ManifestMergeManager, ManifestFilterManager, which are critial data structures enable efficient concurrency. Also I don't understand what a crate private SnapshotValidator is used for?
If you look at java api, each snapshot tx action interface contains interface to allow standalone check. For example, RowDelta has methods like validateDeletedFiles, validateNoConflictingDataFiles, validateNoConflictingDeleteFiles, but they are just marks about what to check during committing. This design is quite flexible since they allow different checks in differnt isolation levels.

CTTY · 2026-06-09T23:01:46Z

ManifestMergeManager: This should be handled by the existing ManifestProcess, we can implement a MergingManifestProcess
ManifestFilterManager: It should live in each transaction operation imo. We can add another trait ManifestFilterManagement and some accessors like fn filter_manager() and fn delete_filter_manager to standardize it. But I think this is more of an implementation details. Basically the filter manager can be built in each operation similar to what we have in Java

struct RowDelta {
  fn removeRows(file: DataFile) {
   ...
   self.delete_data_file(file)
}
}

impl ManifestFilterManagement for RowDelta {
  fn delete_data_file(file: DataFile) {
   self.filter_manager().delete_data_file(file);
  }
}

If we are adding RowDelta in the future, it can implements its own validation logic by implementing the SnapshotValidator trait. These validation flags can also be set in the action like the following. It will still be additive

row_delta.validate_deleted_files() // set the flag to be true in operation

...
impl SnapshotValidator for RowDelta {
 fn validate() {
    if self.validate_deleted_files {...}
 }
}

Add SnapshotValidator

6daa8e6

CTTY commented Jun 5, 2026

View reviewed changes

update public-api

4e118a6

huan233usc reviewed Jun 5, 2026

View reviewed changes

dannycjones reviewed Jun 5, 2026

View reviewed changes

Conversation

CTTY commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

What changes are included in this PR?

Are these changes tested?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blackmwk commented Jun 5, 2026

Uh oh!

dannycjones left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CTTY commented Jun 8, 2026

Uh oh!

blackmwk commented Jun 9, 2026

Uh oh!

CTTY commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

CTTY commented Jun 5, 2026 •

edited

Loading