fix: self-heal manifest-unreferenced branch forks (stop wedged branches) #231
base: main
Changes from all commits
6d42fc7
3edbe2e
6e3204d
37f5af5
fedda78
d2bccd9
bbfedf3
509a9c2
3300109
c2f492d
d0fb12b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -599,27 +599,37 @@ pub struct BranchReconcileStats { | |
| pub failures: Vec<(String, String)>, | ||
| } | ||
|
|
||
| /// Drop every per-table and commit-graph Lance branch that the manifest no | ||
| /// longer references. | ||
| /// Drop every per-table and commit-graph Lance branch fork the manifest does | ||
| /// not reference. | ||
| /// | ||
| /// Orphaned forks arise when a `branch_delete` flips the manifest authority | ||
| /// (atomic) but a downstream best-effort reclaim does not complete. They are | ||
| /// unreachable through any snapshot — no manifest entry can name them — yet | ||
| /// they pin their `tree/{branch}/` storage and can block reusing the branch | ||
| /// name. This is the guaranteed convergence backstop: it is idempotent and | ||
| /// derived purely from the manifest authority, so it no-ops once everything is | ||
| /// reconciled, and it would harmlessly find nothing if a future Lance atomic | ||
| /// multi-dataset branch op prevented orphans from forming. | ||
| /// Two origins produce a manifest-unreferenced fork: | ||
| /// 1. A `branch_delete` flips the manifest authority (atomic) but a | ||
| /// downstream best-effort reclaim does not complete — the whole branch is | ||
| /// gone from the manifest, but a `tree/{branch}/` ref lingers. | ||
| /// 2. A first-write fork (or a merge fork) creates the branch ref before the | ||
| /// manifest publish, then the writer dies / is cancelled — the branch is | ||
| /// still a live manifest branch, but the manifest's snapshot of it does | ||
| /// not place *this table* on the branch. | ||
| /// | ||
| /// The keep-set is the full (unfiltered) manifest branch list, so system | ||
| /// branches' forks are never reclaimed; `main`/default is not a named Lance | ||
| /// branch and so is never a candidate. Referencing children are dropped before | ||
| /// parents (Lance refuses to delete a referenced parent) by ordering longest | ||
| /// branch names first. | ||
| /// The write path self-heals (2) on the next write to the table | ||
| /// (`reclaim_orphaned_fork_and_refork`); this is the guaranteed-convergence | ||
| /// backstop that also covers (1) and any table the write path never revisits. | ||
| /// | ||
| /// The orphan test is therefore **per-table**, not per-branch-name: a Lance | ||
| /// branch `B` on table `T` is an orphan iff `B` is not a live manifest branch | ||
| /// at all (origin 1) OR the manifest's branch-`B` snapshot does not place `T` | ||
| /// on `B` (origin 2). A legitimately-forked table (`table_branch == Some(B)`) | ||
| /// is kept. `main` and internal/system branches are never candidates. Lance | ||
| /// refuses to force-delete a branch with referencing descendants, so children | ||
| /// are dropped before parents (longest name first). Idempotent and authority- | ||
| /// derived: no-ops once reconciled, and degrades to finding nothing if a future | ||
| /// Lance atomic multi-dataset branch op prevents orphans from forming. | ||
| pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconcileStats> { | ||
| use std::collections::HashSet; | ||
| use std::collections::{HashMap, HashSet}; | ||
|
|
||
| let keep: HashSet<String> = db | ||
| // Live manifest branches: the set whose per-table placements are | ||
| // authoritative. A branch absent here is a whole-branch (origin-1) orphan. | ||
| let live_branches: HashSet<String> = db | ||
| .coordinator | ||
| .read() | ||
| .await | ||
|
|
@@ -640,6 +650,9 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci | |
| .collect(); | ||
|
|
||
| let mut stats = BranchReconcileStats::default(); | ||
| // Per-branch snapshots are resolved once and cached across tables (few | ||
| // branches in practice); origin-2 detection consults the branch's own view. | ||
| let mut branch_snapshots: HashMap<String, crate::db::Snapshot> = HashMap::new(); | ||
|
|
||
| // Per-table fault isolation: one table's transient failure is recorded and | ||
| // logged, never aborting the rest of the sweep. | ||
|
|
@@ -658,7 +671,103 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci | |
| continue; | ||
| } | ||
| }; | ||
| for branch in orphan_branches(listed, &keep) { | ||
|
|
||
| // Decide per (table, branch) whether the fork is an orphan. | ||
| let mut orphans: Vec<String> = Vec::new(); | ||
| for branch in listed { | ||
| // `main` is not a named Lance branch; system/internal branches | ||
| // (e.g. the schema-apply lock) own legitimate forks — never touch. | ||
| if branch == "main" || crate::db::is_internal_system_branch(&branch) { | ||
| continue; | ||
| } | ||
| let is_orphan = if !live_branches.contains(&branch) { | ||
| true // origin 1: whole branch gone from the manifest | ||
| } else { | ||
| // origin 2: live branch, but does the manifest place THIS | ||
| // table on it? Resolve (and cache) the branch's snapshot. | ||
| if !branch_snapshots.contains_key(&branch) { | ||
| match db.snapshot_for_branch(Some(&branch)).await { | ||
| Ok(snap) => { | ||
| branch_snapshots.insert(branch.clone(), snap); | ||
| } | ||
| Err(err) => { | ||
| tracing::warn!( | ||
| target: "omnigraph::cleanup", | ||
| table = %table_key, | ||
| branch = %branch, | ||
| error = %err, | ||
| "resolving branch snapshot failed during reconcile; skipping", | ||
| ); | ||
| stats.failures.push((table_key.clone(), err.to_string())); | ||
| continue; | ||
| } | ||
| } | ||
| } | ||
| branch_snapshots[&branch] | ||
| .entry(&table_key) | ||
| .map(|e| e.table_branch.as_deref() != Some(branch.as_str())) | ||
| .unwrap_or(true) | ||
| }; | ||
| if is_orphan { | ||
| orphans.push(branch); | ||
| } | ||
| } | ||
| // Children before parents (longest name first) so Lance's referenced- | ||
| // parent RefConflict cannot block reclamation. | ||
| orphans.sort_by(|a, b| b.len().cmp(&a.len()).then_with(|| a.cmp(b))); | ||
|
|
||
| for branch in orphans { | ||
|
|
||
| // Serialize against in-process live writers before destroying a ref. | ||
| // A first-write fork holds the per-(table, branch) write queue from | ||
| // before the fork through the manifest publish; on a LIVE branch its | ||
| // in-flight fork looks exactly like an origin-2 orphan (manifest not | ||
| // yet advanced). Acquire the same queue so cleanup waits for any such | ||
| // writer, then RE-VALIDATE under the queue with a fresh read: if the | ||
| // writer published in the meantime (table now placed on the branch), | ||
| // it is no longer an orphan — skip it. (Cross-process writers remain | ||
| // the documented one-winner-CAS gap.) One key held at a time → no | ||
| // lock-order inversion against multi-table `acquire_many` writers. | ||
| let _guard = db | ||
| .write_queue() | ||
| .acquire(&(table_key.clone(), Some(branch.clone()))) | ||
| .await; | ||
| let still_orphan = if !live_branches.contains(&branch) { | ||
| // Origin 1: the branch is absent from the manifest authority | ||
| // entirely — a confirmed orphan. No live writer can hold this | ||
| // branch's queue (you cannot first-write to a branch the | ||
| // manifest does not have), so no fresh re-check is needed. | ||
| true | ||
|
Comment on lines +734 to +739
The origin-2 path correctly defends against this by calling |
||
| } else { | ||
| // Origin 2: a LIVE branch whose manifest snapshot does not (yet) | ||
| // place this table on it. A fresh read tells us whether an | ||
| // in-process writer published a legitimate fork while we waited | ||
| // on the queue. On a TRANSIENT read failure we must NOT destroy | ||
| // the ref — skip and let a later cleanup converge, matching the | ||
| // write-path reclaim (which aborts on the same error). Treating | ||
| // a read error as "still orphan" here would let a transient | ||
| // manifest hiccup delete a fork the manifest considers live. | ||
| match db.fresh_snapshot_for_branch(Some(&branch)).await { | ||
| Ok(snap) => snap | ||
| .entry(&table_key) | ||
| .map(|e| e.table_branch.as_deref() != Some(branch.as_str())) | ||
| .unwrap_or(true), | ||
| Err(err) => { | ||
| tracing::warn!( | ||
| target: "omnigraph::cleanup", | ||
| table = %table_key, | ||
| branch = %branch, | ||
| error = %err, | ||
| "fresh re-check failed during reconcile; skipping to avoid \ | ||
| destroying a possibly-live fork (will retry next cleanup)", | ||
| ); | ||
| stats.failures.push((table_key.clone(), err.to_string())); | ||
| false | ||
| } | ||
| } | ||
| }; | ||
|
|
||
| if !still_orphan { | ||
| continue; | ||
| } | ||
| let outcome = match crate::failpoints::maybe_fail("cleanup.reconcile_fork") { | ||
| Ok(()) => storage.force_delete_branch(&full_path, &branch).await, | ||
| Err(injected) => Err(injected), | ||
|
|
@@ -679,9 +788,9 @@ pub async fn reconcile_orphaned_branches(db: &Omnigraph) -> Result<BranchReconci | |
| } | ||
| } | ||
|
|
||
| // Commit-graph orphans (best-effort: the dataset may not exist on a graph | ||
| // that has never committed; any failure is isolated and retried next time). | ||
| if let Err(err) = reconcile_commit_graph_orphans(db, &keep, &mut stats).await { | ||
| // Commit-graph orphans are whole-branch (not per-table), so the simple | ||
| // "branch name not in the live set" test still applies there. | ||
| if let Err(err) = reconcile_commit_graph_orphans(db, &live_branches, &mut stats).await { | ||
| tracing::warn!( | ||
| target: "omnigraph::cleanup", | ||
| error = %err, | ||
|
|
||
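The per-table orphan test and the children-first ordering described in the doc comment can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `BranchView`, `is_orphan`, and `order_children_first` are hypothetical names, and the map stands in for whatever the real branch snapshot exposes through `entry(..).table_branch`. It also assumes the branch's view has already been resolved; the real code records a failure and skips when that read fails.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one branch's manifest snapshot:
/// table key -> the branch the manifest places that table on
/// (`None` means the table was never forked onto a named branch).
type BranchView = HashMap<String, Option<String>>;

/// Returns true when the Lance branch `branch` on `table` is not
/// referenced by the manifest. `view` is the manifest snapshot of
/// `branch`; it is only consulted when the branch is live.
fn is_orphan(
    table: &str,
    branch: &str,
    live_branches: &HashSet<String>,
    view: &BranchView,
) -> bool {
    if !live_branches.contains(branch) {
        // Origin 1: the whole branch is gone from the manifest.
        return true;
    }
    // Origin 2: the branch is live, but the manifest may not place
    // this particular table on it. A missing entry counts as orphaned.
    match view.get(table) {
        Some(placed) => placed.as_deref() != Some(branch),
        None => true,
    }
}

/// Orders orphans longest name first (ties broken lexicographically),
/// so nested child branches are dropped before their parents.
fn order_children_first(orphans: &mut Vec<String>) {
    orphans.sort_by(|a, b| b.len().cmp(&a.len()).then_with(|| a.cmp(b)));
}
```

Length-based ordering works here only under the assumption that a child branch name is always longer than its parent's (for example `a/b` under `a`); the sketch inherits that assumption from the diff.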
This reclaim trusts the caller's proof that the manifest does not place the table on `active_branch`, but the first-write path obtains that proof with `snapshot_for_branch`, which can return the coordinator's cached snapshot when the handle is currently bound to that branch. During the branch-merge target-branch swap (or any stale branch-bound handle), another writer may already have published a legitimate fork; `create_branch` then reports `RefAlreadyExists` and this path force-deletes/re-forks that valid table branch before `commit_all` notices the manifest/head mismatch, leaving the manifest pointing at a deleted fork. Re-open a fresh branch snapshot immediately before reclaiming, or treat `RefAlreadyExists` as retryable unless the fresh manifest proves the ref is unreferenced.
Fixed in fedda78.
`reclaim_orphaned_fork_and_refork` now re-derives its precondition from a FRESH manifest read (`fresh_snapshot_for_branch`, which bypasses the coordinator cache) immediately before force-deleting, and refuses with a retryable conflict if the table is legitimately on the branch. Correct regardless of caller snapshot staleness (including the branch-merge target swap). Good catch.
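The decision this fix describes, re-checking a fresh manifest read right before a destructive reclaim, can be sketched as a pure function. This is an illustrative assumption, not the code from fedda78: `ReclaimDecision` and `decide_reclaim` are hypothetical names, and `fresh_placement` stands in for the result of looking up the table's `table_branch` in a fresh branch snapshot. The read-failure arm follows the cleanup sweep's rule of never destroying a ref on a transient read error.

```rust
/// Hypothetical outcome of the pre-delete re-validation.
#[derive(Debug, PartialEq)]
enum ReclaimDecision {
    /// The fresh manifest does not place the table on the branch:
    /// the existing ref is unreferenced and may be force-deleted.
    Reclaim,
    /// The fresh manifest places the table on the branch: the ref is a
    /// legitimate fork, so the caller should get a retryable conflict.
    Conflict,
    /// The fresh read failed: do not destroy anything; retry later.
    Skip,
}

/// Decides what to do with an already-existing branch ref, given the
/// table's placement as read from a fresh (uncached) manifest snapshot.
fn decide_reclaim<E>(
    branch: &str,
    fresh_placement: Result<Option<String>, E>,
) -> ReclaimDecision {
    match fresh_placement {
        Ok(placed) if placed.as_deref() == Some(branch) => ReclaimDecision::Conflict,
        Ok(_) => ReclaimDecision::Reclaim,
        Err(_) => ReclaimDecision::Skip,
    }
}
```

Keeping the decision separate from the delete call makes the three outcomes easy to unit-test without a storage backend.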