Cannot delete branches referenced by other branches #7185

@majin1102

While working on branch cleanup support, I found a format limitation that prevents deleting a branch if it is still referenced by other branches. IMO, this is a limitation in the current format, and it is worth opening an issue for further discussion. @jackye1995 @brendanclement

Problem

The current branch format binds the branch name to its physical layout path.

For example, if we have:

main -> featureA -> experimentA

then:

  • featureA is stored at tree/featureA
  • experimentA is stored at tree/featureA/experimentA

Now suppose we want to delete featureA but keep experimentA.

In this case, we cannot remove the featureA directory, because the root version of experimentA lives inside it. Nor can we remove all of featureA's data: versions and data still referenced by downstream branches must be retained, while anything no longer referenced can be released.

This retention requirement is reasonable by itself.

The issue is that if tree/featureA cannot be removed physically, then the branch name is effectively still occupied, and we cannot create a new branch with the same name later.

Because of this, the current implementation does not allow deleting a branch that is still referenced by descendant branches. This restriction comes from the current format spec.
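To make the restriction concrete, here is a minimal Python sketch that takes the two paths from the example above and checks which branches live physically inside a branch's directory. The `LAYOUT` table and the `branches_inside` helper are hypothetical names used only for illustration; they are not part of the actual implementation.

```python
from pathlib import PurePosixPath

# Paths copied from the example above. Under the current format the path
# is bound to the branch name rather than stored separately.
LAYOUT = {
    "featureA": PurePosixPath("tree/featureA"),
    "experimentA": PurePosixPath("tree/featureA/experimentA"),
}


def branches_inside(branch, layout=LAYOUT):
    """Return the branches whose directory is nested under `branch`'s directory."""
    target = layout[branch]
    return sorted(name for name, path in layout.items() if target in path.parents)


# featureA cannot be removed physically while this list is non-empty.
print(branches_inside("featureA"))
print(branches_inside("experimentA"))
```

As long as `branches_inside("featureA")` is non-empty, `tree/featureA` has to stay on disk, which is exactly why the name remains occupied.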

Current state

We have already introduced branch_identifier, and persistent branch lineage is now available in metadata. That means we no longer need to use the branch name itself to represent lineage.

Proposal

Use UUID-based physical branch directories instead of branch-name-based paths.

For example:

  • tree/UUID1 -> featureA
  • tree/UUID2 -> experimentA

The mapping from branch_name -> uuid is already stored in branch metadata.

When loading a branch, we can resolve its physical path from branch metadata, instead of deriving it from the branch name.
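A rough sketch of what metadata-based resolution could look like, in Python. Everything here is an assumption for illustration: `BranchMeta`, `BranchCatalog`, and their methods are hypothetical, an in-memory dict stands in for the persisted branch metadata, and `branch_id` stands in for branch_identifier.

```python
import uuid
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class BranchMeta:
    name: str
    branch_id: str            # stands in for branch_identifier
    parent_id: Optional[str]  # lineage is kept in metadata, not in the path


class BranchCatalog:
    """Hypothetical in-memory stand-in for persisted branch metadata."""

    def __init__(self):
        self._by_name = {}

    def create(self, name, parent=None):
        if name in self._by_name:
            raise ValueError(f"branch {name!r} already exists")
        parent_id = self._by_name[parent].branch_id if parent is not None else None
        meta = BranchMeta(name, uuid.uuid4().hex, parent_id)
        self._by_name[name] = meta
        return meta

    def resolve_path(self, name):
        # The path comes from metadata, never from the branch name itself.
        return PurePosixPath("tree") / self._by_name[name].branch_id

    def delete_metadata(self, name):
        # Frees the name; the physical directory can outlive this call.
        del self._by_name[name]


catalog = BranchCatalog()
catalog.create("featureA")
catalog.create("experimentA", parent="featureA")
catalog.delete_metadata("featureA")
catalog.create("featureA")  # the name is free again and gets a fresh directory
```

Because sibling directories never nest, removing the metadata for featureA does not touch experimentA's path, and the name can be reused immediately.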

Deletion semantics

With this model, deleting featureA should mean:

  • release unreferenced versions and data in this branch
  • retain versions and data still referenced by descendant branches
  • delete the branch metadata for featureA

This also means branch deletion should examine not only what can be removed from the target branch itself, but also whether versions and data in upstream branches can now be released.
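The semantics above could be sketched roughly as follows. This is a simplified model under stated assumptions, not the real algorithm: `Branch`, `owned`, `refs`, and `delete_branch` are hypothetical names, versions are plain strings, `refs` is assumed to list every version in other branches that a branch still depends on, and only branches whose metadata has been deleted are candidates for release.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    owned: set                               # versions stored in this branch's directory
    refs: set = field(default_factory=set)   # versions in other branches it still depends on
    deleted: bool = False                    # metadata removed, directory kept for descendants


def delete_branch(branches, target):
    """Mark `target` deleted and return the versions that can be released, per branch."""
    branches[target].deleted = True

    # Everything a live branch still depends on must be retained.
    live_refs = set()
    for branch in branches.values():
        if not branch.deleted:
            live_refs |= branch.refs

    released = {}
    for branch_id, branch in branches.items():
        if not branch.deleted:
            continue
        # Covers the target itself and upstream branches deleted earlier.
        free = branch.owned - live_refs
        if free:
            released[branch_id] = free
            branch.owned -= free
    return released
```

For example, if an earlier-deleted upstream branch was retaining a version only for featureA, deleting featureA releases that version as well as featureA's own unreferenced versions, while anything experimentA depends on stays in place.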

Compatibility

We should also consider compatibility carefully:

  • it should still be possible to load a branch dataset from dataset/tree/branch_name
  • we need to consider how to remain compatible with existing branch URLs
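One possible way to keep existing URLs working is to resolve through metadata first and fall back to the legacy name-based path. The sketch below assumes a plain `name_to_uuid` mapping loaded from branch metadata; `resolve_branch_dir` and `resolve_dataset_path` are hypothetical helpers, and the fallback rule is only one option for discussion.

```python
from pathlib import PurePosixPath


def resolve_branch_dir(name, name_to_uuid):
    """Prefer the UUID directory from metadata; fall back to the legacy name-based path."""
    branch_id = name_to_uuid.get(name)
    if branch_id is not None:
        return PurePosixPath("tree") / branch_id
    # Legacy branch created before UUID directories existed.
    return PurePosixPath("tree") / name


def resolve_dataset_path(path, name_to_uuid):
    """Map a user-facing path such as dataset/tree/branch_name to its physical path."""
    root, sep, name = path.partition("/tree/")
    if not sep:
        return path  # not a branch path, leave it untouched
    return f"{root}/{resolve_branch_dir(name, name_to_uuid)}"


mapping = {"featureA": "UUID1"}  # placeholder id, as in the proposal above
print(resolve_dataset_path("dataset/tree/featureA", mapping))
print(resolve_dataset_path("dataset/tree/featureA/experimentA", mapping))
```

With this approach users keep addressing branches by name, and only the resolver knows whether the directory is UUID-based or legacy.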

Cleanup

Dangling directories left by branch creation should also be considered and removed by cleanup.
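As a starting point for discussion, detecting dangling directories could be as simple as listing the directories under tree/ that no metadata refers to. `find_dangling_dirs` is a hypothetical helper, and `referenced_ids` is assumed to include the ids of deleted branches whose data is still retained for descendants.

```python
from pathlib import Path


def find_dangling_dirs(tree_root, referenced_ids):
    """Return directories under `tree_root` whose name is not a referenced branch id."""
    return sorted(
        entry
        for entry in Path(tree_root).iterdir()
        if entry.is_dir() and entry.name not in referenced_ids
    )
```

A real cleanup would also need to guard against racing with an in-flight branch creation, for example by only removing directories older than some threshold; that policy is left open here.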
