8 changes: 5 additions & 3 deletions docs/src/format/table/branch_tag.md
Expand Up @@ -17,7 +17,9 @@ Branch names must follow these validation rules:
4. Cannot contain `..` or `\`
5. Segments must contain only alphanumeric characters, `.`, `-`, `_`
6. Cannot end with `.lock`
7. Cannot be named `main` (reserved for main branch)
7. Cannot be named `main` (reserved for the default branch)

Branch names are case-sensitive, matching Git/GitHub ref semantics. The exact name `main` is a virtual name for the default branch. It may appear in API reference contexts as an alias for the default branch, but no branch metadata file named `main.json` is created.
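
As a rough illustration of rules 4 to 7 and the case-sensitivity note above, the checks could be sketched in Python as follows. `check_branch_name` is a hypothetical helper, not part of the Lance API. It assumes segments are separated by `/` and that "alphanumeric" means ASCII letters and digits, and it does not cover the rules listed before rule 4.

```python
import re

# Hypothetical helper for illustration only; not part of the Lance API.
# Assumption: segments are separated by "/" and "alphanumeric" means ASCII.
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def check_branch_name(name: str) -> None:
    """Raise ValueError if `name` violates rules 4-7 listed above."""
    # Rule 4: cannot contain ".." or a backslash.
    if ".." in name or "\\" in name:
        raise ValueError(f"branch name {name!r} cannot contain '..' or '\\'")
    # Rule 5: each segment may only use alphanumerics, '.', '-', '_'.
    for segment in name.split("/"):
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid segment {segment!r} in branch name {name!r}")
    # Rule 6: cannot end with ".lock".
    if name.endswith(".lock"):
        raise ValueError(f"branch name {name!r} cannot end with '.lock'")
    # Rule 7: the exact, case-sensitive name "main" is reserved.
    if name == "main":
        raise ValueError('"main" is reserved for the default branch')


check_branch_name("feature/exp-1")  # accepted
check_branch_name("Main")           # accepted: the comparison is case-sensitive
```

Because only the exact string `main` is reserved in this sketch, a name such as `Main` passes rule 7, which is the behavior the case-sensitivity note describes.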

### Branch Metadata Path

Expand All @@ -38,7 +40,7 @@ Each branch metadata file is a JSON file with the following fields:

| JSON Key | Type | Optional | Description |
|------------------|--------|----------|--------------------------------------------------------------------------------|
| `parentBranch` | string | Yes | Name of the branch this was created from. `null` indicates branched from main. |
| `parentBranch` | string | Yes | Name of the branch this was created from. `null` indicates branched from the default branch. |
| `parentVersion` | number | | Version number of the parent branch at the time this branch was created. |
| `createAt` | number | | Unix timestamp (seconds since epoch) when the branch was created. |
| `manifestSize` | number | | Size of the initial manifest file in bytes. |
Expand Down Expand Up @@ -117,7 +119,7 @@ Each tag file is a JSON file with the following fields:

| JSON Key | Type | Optional | Description |
|-----------------|--------|----------|--------------------------------------------------------------------------|
| `branch` | string | Yes | Branch name being tagged. `null` or absent indicates main branch. |
| `branch` | string | Yes | Branch name being tagged. `null` or absent indicates the default branch. |
| `version` | number | | Version number being tagged within that branch. |
| `createdAt` | string | Yes | RFC 3339 timestamp for when the tag was first created. |
| `updatedAt` | string | Yes | RFC 3339 timestamp for the latest tag reference update. |
10 changes: 7 additions & 3 deletions docs/src/guide/tags_and_branches.md
Expand Up @@ -15,11 +15,13 @@ The `reference` parameter (used in `create`, `update`, and `checkout_version`) a
- An **integer**: version number in the **current branch** (e.g., `1`)
- A **string**: tag name (e.g., `"stable"`)
- A **tuple** `(branch_name, version)`: a specific version in a named branch
- `(None, 2)` means version 2 on the main branch
- `("main", 2)` means version 2 on the main branch (explicit)
- `(None, 2)` means version 2 on the default branch
- `("main", 2)` means version 2 on the default branch (explicit)
- `("experiment", 3)` means version 3 on the experiment branch
- `("branch-name", None)` means the latest version on that branch

In reference contexts, `"main"` is an alias for the default branch and is equivalent to `None`.
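
The accepted forms and the `"main"` alias can be summarized with a small sketch. `describe_reference` and its return shape are hypothetical and exist only for illustration; they are not how Lance resolves references internally.

```python
from typing import Optional, Tuple, Union

# Hypothetical types and helper for illustration only; not part of the Lance API.
Reference = Union[int, str, Tuple[Optional[str], Optional[int]]]


def describe_reference(reference: Reference, current_branch: Optional[str] = None):
    """Classify a reference as ("tag", name) or ("version", branch, version).

    A branch of None stands for the default branch, and a version of None
    stands for the latest version on that branch.
    """
    if isinstance(reference, int):
        # An integer is a version number in the current branch.
        return ("version", current_branch, reference)
    if isinstance(reference, str):
        # A bare string is a tag name.
        return ("tag", reference)
    branch, version = reference
    # In reference contexts, "main" is an alias for the default branch (None).
    if branch == "main":
        branch = None
    return ("version", branch, version)


# Both spellings of the default branch classify identically.
assert describe_reference(("main", 2)) == describe_reference((None, 2))
```

Treating `"main"` and `None` as the same value up front means later comparisons only have to handle one spelling of the default branch.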

!!! note

Creating or deleting tags does not generate new dataset versions.
Expand Down Expand Up @@ -77,7 +79,7 @@ The `reference` parameter works the same as for Tags (see above).

Each branch maintains its own linear version history, so version numbers may overlap across branches. Use `(branch_name, version_number)` tuples as global identifiers for operations like `checkout_version` and `tags.create`.

"main" is a reserved branch name. Lance uses "main" to identify the default branch.
`"main"` is reserved for the default branch. Use `"main"` or `None` when referring to the default branch in reference tuples or checkout APIs, but choose a different name when creating, deleting, or updating branches.

### Create and checkout branches
```python
Expand All @@ -99,6 +101,8 @@ ds.tags.create("experiment-rc", ("experiment", None))
experiment_rc = ds.checkout_version("experiment-rc")
# Checkout the latest version of the experimental branch by tuple
experiment_latest = ds.checkout_version(("experiment", None))
# Checkout the latest version of the default branch explicitly
main_latest = ds.checkout_version(("main", None))

# Create a new branch from a tag
new_experiment = ds.create_branch("new-experiment", "experiment-rc")
2 changes: 2 additions & 0 deletions docs/src/quickstart/versioning.md
Expand Up @@ -95,6 +95,8 @@ For advanced tag operations (e.g., tagging versions on specific branches), see [

Branches manage parallel lines of dataset evolution. You can create branches from existing versions or tags, read and write to them independently, and checkout different branches.

`main` refers to the default branch in checkout and reference APIs, but it is reserved and cannot be used as the name of a new branch.

```python
# Create branch from current latest version
experiment_branch = ds.create_branch("experiment")
36 changes: 23 additions & 13 deletions java/src/main/java/org/lance/Dataset.java
Expand Up @@ -1690,10 +1690,12 @@ public Branches branches() {

/**
* Create a branch at a specified version. The returned Dataset points to the created branch's
* initial version.
* initial version. The branch name {@code "main"} is reserved for the default branch and cannot
* be used as a new branch name.
*
* @param branch the branch name to create
* @param ref the reference to create branch from
* @param ref the reference to create branch from. In reference contexts, {@code "main"} is an
* alias for the default branch.
* @return a new Dataset of the branch
*/
public Dataset createBranch(String branch, Ref ref) {
Expand All @@ -1703,10 +1705,12 @@ public Dataset createBranch(String branch, Ref ref) {

/**
* Create a branch at a specified version. The returned Dataset points to the created branch's
* initial version.
* initial version. The branch name {@code "main"} is reserved for the default branch and cannot
* be used as a new branch name.
*
* @param branch the branch name to create
* @param ref the reference to create branch from
* @param ref the reference to create branch from. In reference contexts, {@code "main"} is an
* alias for the default branch.
* @param storageOptions the storage options to create branch with
* @return a new Dataset of the branch
*/
Expand All @@ -1727,8 +1731,9 @@ private Dataset innerCreateBranch(
}

/**
* Checkout using a unified {@link Ref} which can be a tag, the latest version on main/branch or a
* specified (branch_name, version_number).
* Checkout using a unified {@link Ref} which can be a tag, the latest version on the default
* branch or a named branch, or a specified (branch_name, version_number). In reference contexts,
* {@code "main"} is an alias for the default branch.
*
* @param ref the checkout reference
* @return a new Dataset instance checked out to the specified reference
Expand Down Expand Up @@ -1765,7 +1770,7 @@ public Map<String, String> getTableMetadata() {
public class Tags {

/**
* Create a new tag on main branch. This is left for compatibility. We should use {@link
* Create a new tag on the default branch. This is left for compatibility. We should use {@link
* #create(String, Ref)} instead.
*
* @param tag the tag name
Expand All @@ -1780,7 +1785,8 @@ public void create(String tag, long versionNumber) {
* Create a new tag on a specified branch.
*
* @param tag the tag name
* @param ref the referenced version to tag
* @param ref the referenced version to tag. In reference contexts, {@code "main"} is an alias
* for the default branch.
*/
public void create(String tag, Ref ref) {
Preconditions.checkArgument(tag != null, "Tag name cannot be null");
Expand All @@ -1797,6 +1803,8 @@ public void create(String tag, Ref ref) {
*
* @param tag the name of the tag to create
* @param versionNumber the version number (or commit reference) to associate with the tag
* @param targetBranch the branch to tag. In reference contexts, {@code "main"} is an alias for
* the default branch.
*/
@Deprecated
public void create(String tag, long versionNumber, String targetBranch) {
Expand All @@ -1816,11 +1824,11 @@ public void delete(String tag) {
}

/**
* Update a tag to a new version_number on main. This is left for compatibility. We should use
* {@link #update(String, Ref)} instead.
* Update a tag to a new version_number on the default branch. This is left for compatibility.
* We should use {@link #update(String, Ref)} instead.
*
* @param tag the tag name
* @param versionNumber the versionNumber on main.
* @param versionNumber the versionNumber on the default branch.
*/
public void update(String tag, long versionNumber) {
Preconditions.checkArgument(versionNumber > 0, "version_number must be greater than 0");
Expand All @@ -1831,7 +1839,8 @@ public void update(String tag, long versionNumber) {
* Update a tag to a new reference.
*
* @param tag the tag name
* @param ref the referenced version to tag
* @param ref the referenced version to tag. In reference contexts, {@code "main"} is an alias
* for the default branch.
*/
public void update(String tag, Ref ref) {
Preconditions.checkArgument(tag != null, "tag cannot be null");
Expand Down Expand Up @@ -1883,7 +1892,8 @@ public class Branches {
/**
* Delete a branch and its metadata.
*
* @param branchName the branch to delete
* @param branchName the branch to delete. {@code "main"} is reserved for the default branch and
* cannot be deleted as a named branch.
*/
public void delete(String branchName) {
try (LockManager.WriteLock writeLock = lockManager.acquireWriteLock()) {
12 changes: 12 additions & 0 deletions java/src/main/java/org/lance/Ref.java
Expand Up @@ -41,21 +41,33 @@ public Optional<String> getTagName() {
return tagName;
}

/** Creates a reference to a specific version on the default branch. */
public static Ref ofMain(long versionNumber) {
Preconditions.checkArgument(versionNumber > 0, "versionNumber must be greater than 0");
return new Ref(Optional.of(versionNumber), Optional.empty(), Optional.empty());
}

/** Creates a reference to the latest version on the default branch. */
public static Ref ofMain() {
return new Ref(Optional.empty(), Optional.empty(), Optional.empty());
}

/**
* Creates a reference to the latest version on a branch.
*
* <p>In reference contexts, {@code "main"} is an alias for the default branch.
*/
public static Ref ofBranch(String branchName) {
Preconditions.checkArgument(
branchName != null && !branchName.isEmpty(), "branchName must not be empty");
return new Ref(Optional.empty(), Optional.of(branchName), Optional.empty());
}

/**
* Creates a reference to a specific version on a branch.
*
* <p>In reference contexts, {@code "main"} is an alias for the default branch.
*/
public static Ref ofBranch(String branchName, long versionNumber) {
Preconditions.checkArgument(
branchName != null && !branchName.isEmpty(), "branchName must not be empty");
18 changes: 12 additions & 6 deletions python/python/lance/dataset.py
Expand Up @@ -926,12 +926,14 @@ def create_branch(
Parameters
----------
branch: str
Name of the branch to create.
Name of the branch to create. ``"main"`` is reserved for the
default branch and cannot be used as a new branch name.
reference: Optional[int | str | Tuple[Optional[str], Optional[int]]
An integer specifies a version number in the current branch; a string
specifies a tag name; a Tuple[Optional[str], Optional[int]] specifies
a version number in a specified branch. (None, None) means the latest
version_number on the main branch.
version_number on the default branch. ``("main", version)`` is an
explicit alias for the default branch in this reference context.
storage_options: Optional[Dict[str, str]]
Storage options for the underlying object store. If not provided,
the storage options from the current dataset will be used.
Expand Down Expand Up @@ -2863,7 +2865,8 @@ def checkout_version(
An integer specifies a version number in the current branch; a string
specifies a tag name; a Tuple[Optional[str], Optional[int]] specifies
a version number in a specified branch. (None, None) means the latest
version_number on the main branch.
version_number on the default branch. ``("main", version)`` is an
explicit alias for the default branch in this reference context.

Returns
-------
Expand Down Expand Up @@ -4616,7 +4619,8 @@ def shallow_clone(
An integer specifies a version number in the current branch; a string
specifies a tag name; a Tuple[Optional[str], Optional[int]] specifies
a version number in a specified branch. (None, None) means the latest
version_number on the main branch.
version_number on the default branch. ``("main", version)`` is an
explicit alias for the default branch in this reference context.
storage_options : dict, optional
Object store configuration for the new dataset (e.g., credentials,
endpoints). If not specified, the storage options of the source dataset
Expand Down Expand Up @@ -6930,7 +6934,8 @@ def create(
An integer specifies a version number in the current branch; a string
specifies a tag name; a Tuple[Optional[str], Optional[int]] specifies
a version number in a specified branch. (None, None) means the latest
version_number on the main branch.
version_number on the default branch. ``("main", version)`` is an
explicit alias for the default branch in this reference context.
"""
self._ds.create_tag(tag, reference)

Expand Down Expand Up @@ -6962,7 +6967,8 @@ def update(
An integer specifies a version number in the current branch; a string
specifies a tag name; a Tuple[Optional[str], Optional[int]] specifies
a version number in a specified branch. (None, None) means the latest
version_number on the main branch.
version_number on the default branch. ``("main", version)`` is an
explicit alias for the default branch in this reference context.
"""
self._ds.update_tag(tag, reference)

27 changes: 19 additions & 8 deletions rust/lance/src/dataset.rs
Expand Up @@ -430,7 +430,10 @@ impl Dataset {
DatasetBuilder::from_uri(uri).load().await
}

/// Check out a dataset version with a ref
/// Check out a dataset version with a ref.
///
/// In reference contexts, `"main"` is an alias for the default branch and
/// is equivalent to `None`.
pub async fn checkout_version(&self, version: impl Into<refs::Ref>) -> Result<Self> {
let reference: refs::Ref = version.into();
match reference {
Expand Down Expand Up @@ -473,7 +476,9 @@ impl Dataset {
Ok(())
}

/// Check out the latest version of the branch
/// Check out the latest version of the branch.
///
/// Use `"main"` to check out the latest version of the default branch.
pub async fn checkout_branch(&self, branch: &str) -> Result<Self> {
self.checkout_by_ref(None, Some(branch)).await
}
Expand All @@ -493,12 +498,16 @@ impl Dataset {
/// which can be cleaned up later. Such a zombie dataset may cause a branch creation
/// failure if we use the same name to `create_branch`. In that case, you need to call
/// `force_delete_branch` to interactively clean up the zombie dataset.
///
/// `"main"` is reserved for the default branch and cannot be used as a new branch name.
pub async fn create_branch(
&mut self,
branch: &str,
version: impl Into<refs::Ref>,
store_params: Option<ObjectStoreParams>,
) -> Result<Self> {
refs::check_valid_branch(branch)?;

let (source_branch, version_number) = self.resolve_reference(version.into()).await?;
let branch_location = self.branch_location().find_branch(Some(branch))?;
let source_location = self
Expand Down Expand Up @@ -560,16 +569,19 @@ impl Dataset {
version_number: Option<u64>,
branch: Option<&str>,
) -> Result<Self> {
let standardized_branch = branch.and_then(refs::standardize_branch);
// Reject malformed names at the boundary (mirroring the branch CRUD
// paths) so they fail as InvalidRef instead of tripping the wrong-chain
// check below
if let Some(branch_name) = branch
&& !Branches::is_main_branch(branch)
if let Some(branch_name) = standardized_branch.as_deref()
&& !Branches::is_main_branch(Some(branch_name))
{
refs::check_valid_branch(branch_name)?;
}

let new_location = self.branch_location().find_branch(branch)?;
let new_location = self
.branch_location()
.find_branch(standardized_branch.as_deref())?;

let manifest_location = if let Some(version_number) = version_number {
self.commit_handler
Expand All @@ -585,7 +597,7 @@ impl Dataset {
.await?
};

if self.already_checked_out(&manifest_location, branch) {
if self.already_checked_out(&manifest_location, standardized_branch.as_deref()) {
return Ok(self.clone());
}

Expand All @@ -601,8 +613,7 @@ impl Dataset {
// means the commit handler resolved against a different chain (for
// example an external manifest store that ignores branch-qualified
// paths); error loudly rather than hand back another branch's data.
let requested_branch = branch.and_then(refs::standardize_branch);
if manifest.branch.as_deref() != requested_branch.as_deref() {
if manifest.branch.as_deref() != standardized_branch.as_deref() {
return Err(Error::internal(format!(
"checkout of branch '{}' at version {} resolved a manifest belonging to branch '{}'",
refs::normalize_branch(branch),
3 changes: 2 additions & 1 deletion rust/lance/src/dataset/refs.rs
Expand Up @@ -1020,7 +1020,8 @@ pub fn check_valid_branch(branch_name: &str) -> Result<()> {

if branch_name.eq("main") {
return Err(Error::InvalidRef {
message: "Branch name cannot be 'main'".to_string(),
message: "\"main\" is reserved for the default branch; use a different branch name"
.to_string(),
});
}
Ok(())