feat(transaction): add ExpireSnapshotsAction#2591
Conversation
fce97ed to
9b45819
Compare
Add a metadata-only transaction action that removes snapshots, emitting a single RemoveSnapshots update. Physical file cleanup is left to a follow-up maintenance operation built on top of this action. Ref apache#1454
9b45819 to
0bdd123
Compare
laskoviymishka
left a comment
There was a problem hiding this comment.
A few things I'd fix before merge:
- Branch ancestors aren't protected. Age expiry only protects ref heads, so a shared ancestor of two branches can be expired while still reachable via parent_snapshot_id, leaving a dangling pointer. Same root cause makes retain_last global instead of per-branch (Java walks each branch's ancestry). I think this needs a per-branch ancestor walk.
- Commit requirement is too weak. We only emit UuidMatch, which leaves a race where a concurrent writer's new snapshot gets orphaned. SnapshotProducer guards this with a RefSnapshotIdMatch on main — I'd mirror that.
- expire_snapshot_ids replaces instead of accumulating, unlike add_data_files and friends.
- gc.enabled isn't checked — Java refuses to expire when it's false.
- The combined explicit + age test the description mentions doesn't actually exist.
Happy to take another pass once these are addressed.
|
|
||
| let mut snapshot_ids: Vec<i64> = to_expire.into_iter().collect(); | ||
| snapshot_ids.sort_unstable(); | ||
| Ok(snapshot_ids) |
There was a problem hiding this comment.
I think there's a subtle correctness issue here that's the deepest thing in the PR.
We protect only ref heads in protected_snapshot_ids, but age expiry walks a global pool. For two branches sharing a common base, the shared ancestor isn't a head, so it gets expired even though it's still reachable via parent_snapshot_id — afterwards that pointer dangles and tools that walk full ancestry (Flink, Trino CDC) see truncated history.
The same root cause bites retain_last: Java's computeBranchSnapshotsToRetain applies it per-branch along each ancestry chain, but here it's the N globally-most-recent across all branches. With branch A [A1,A2,A3] and branch B [B1,B2,B3], retain_last=2 keeps [B2,B3] globally and expires A2, which Java would retain.
I'd build the protected set by walking each branch's ancestor chain via parent_snapshot_id, stopping when count >= retain_last and timestamp < cutoff, then union that with the head set before age expiry. That's a bigger change, but I don't think we can ship age-based expiry without it. wdyt?
| use crate::transaction::tests::make_v2_table; | ||
|
|
||
| // `make_v2_table` carries an older snapshot (ts 1515100955770) and a current | ||
| // snapshot (ts 1555100955770). |
There was a problem hiding this comment.
UuidMatch alone leaves a race window. If another writer commits a new snapshot between snapshot_ids_to_expire and update_table, that snapshot is now live but its parent may be in our to-expire set — removing the parent orphans the new branch history, and the UUID check won't catch it because the table UUID is unchanged.
SnapshotProducer guards against exactly this with a RefSnapshotIdMatch on main (see snapshot.rs ~517). I'd add the same here:
TableRequirement::RefSnapshotIdMatch {
r#ref: "main".to_string(),
snapshot_id: table.metadata().current_snapshot_id(),
},Thoughts?
| pub fn retain_last(mut self, retain_last: usize) -> Self { | ||
| self.retain_last = Some(retain_last); | ||
| self | ||
| } |
There was a problem hiding this comment.
This replaces rather than accumulates, so expire_snapshot_ids(vec![A]).expire_snapshot_ids(vec![B]) silently drops A. Every other builder (add_data_files etc.) uses .extend(), so the inconsistency is surprising, and since this is public surface, switching to accumulate later would be a breaking change.
I'd accumulate and take an iterator to match add_data_files:
pub fn expire_snapshot_ids(mut self, snapshot_ids: impl IntoIterator<Item = i64>) -> Self {
self.snapshot_ids.extend(snapshot_ids);
self
}If replace is intentional, I'd rename to set_snapshot_ids and document it — but accumulate feels right here.
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use std::sync::Arc; |
There was a problem hiding this comment.
We never check gc.enabled before expiring. Java's RemoveSnapshots validates it in the constructor and refuses to run when it's false, since expiring metadata defeats a user's explicit decision to disable GC.
I'd read it at the top of commit (or snapshot_ids_to_expire) and bail:
if table.metadata().properties().get("gc.enabled").map(|v| v == "false").unwrap_or(false) {
return Err(Error::new(ErrorKind::DataInvalid,
"Cannot expire snapshots: gc.enabled is false"));
}|
|
||
| #[tokio::test] | ||
| async fn test_cannot_expire_current_snapshot() { | ||
| let table = make_v2_table(); |
There was a problem hiding this comment.
The PR description says the combined explicit + age path is tested, but I don't see a test that sets both — and it's the primary interaction between the two code paths, so it's the most likely regression site.
I'd add one with non-overlapping targets: expire_snapshot_ids(vec![X]).expire_older_than_ms(cutoff) where an age-qualifying snapshot Y is distinct from X, asserting both land in the result.
Which issue does this PR close?
Revives the effort to close #2145 (originally filed as #1454), which has been attempted before but never landed. This is the first of a small series of PRs and covers the metadata-only piece; physical file cleanup will build on top of it in follow-ups.
What changes are included in this PR?
Adds an
ExpireSnapshotsActionto the transaction API, reachable astx.expire_snapshots()and following the same convention as the other actions. It selects the snapshots to expire and emits a singleRemoveSnapshotsupdate. It deliberately does not touch any files; that is left to a follow-up so this change stays small and easy to review.Selection mirrors Java's
RemoveSnapshots. Explicit ids (expire_snapshot_ids) and age-based expiry (expire_older_than_ms) are combined rather than mutually exclusive: a snapshot is expired if it is named explicitly or is older than the cutoff.retain_lastkeeps the most recent snapshots even when older than the cutoff and only bounds the age path. Any snapshot referenced by a branch or tag (including the current snapshot) is protected from expiry, and naming such an id explicitly is an error, sinceTableMetadataBuilder::remove_snapshotswould otherwise drop the ref silently. Per-ref retention windows (max-snapshot-age, max-ref-age, per-branch min-snapshots-to-keep) are left for a later PR.Are these changes tested?
Yes. Unit tests cover explicit-id expiry, age filtering with
retain_last, unknown ids being ignored, and that the action registers on a transaction, plus the ref-safety behavior: the current snapshot and a tagged snapshot cannot be expired explicitly, and age-based expiry skips a tagged snapshot.