3 changes: 3 additions & 0 deletions .github/requirements-earliest.txt
@@ -1 +1,4 @@
mercurial==5.2

# Required for git_lfs_importer plugin
pathspec==0.11.2
2 changes: 2 additions & 0 deletions .github/requirements-latest.txt
@@ -1,2 +1,4 @@
mercurial

# Required for git_lfs_importer plugin
pathspec==0.12.1
74 changes: 62 additions & 12 deletions README.md
@@ -141,12 +141,48 @@ if [ "$3" == "1" ]; then cat; else dos2unix -q; fi
Mercurial Largefiles Extension
------------------------------

Mercurial largefiles are exported as ordinary files into git, i.e. not
as git lfs files. In order to make the export work, make sure that
you have all largefiles of all mercurial commits available locally.
This can be ensured by either cloning the mercurial repository with
the option --all-largefiles or by executing the command
'hg lfpull --rev "all()"' inside the mercurial repository.
### Handling Mercurial Largefiles during Migration

When migrating from Mercurial to Git, largefiles are exported as ordinary
files by default. To ensure a successful migration and manage repository
size, follow the requirements below.

#### 1. Pre-Export: Ensure File Availability

Before starting the export, you must have all largefiles from all
Mercurial commits available locally. Use one of these methods:

* **For a new clone:** `hg clone --all-largefiles <repo-url>`
* **For an existing repo:** `hg lfpull --rev "all()"`

#### 2. Choosing Your LFS Strategy

If you want your files to be versioned in Git LFS rather than as standard
Git blobs, you have two primary paths:

* **[git_lfs_importer plugin](./plugins/git_lfs_importer/README.md)
(During Conversion)**
Recommended for large repos. This performs Just-In-Time (JIT) conversion
by identifying large files during the export and writing LFS pointers
immediately, skipping the need for a second pass. This also supports
**incremental conversion**, making it much more efficient for ongoing
migrations.
* **[git lfs migrate import](https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-migrate.adoc)
(After Conversion)**
A standard two-step process: first, export the full history from Mercurial
to Git, then run a separate full history rewrite to move files into LFS.

### Why use the git_lfs_importer plugin?

For "monorepos" or very large repositories (100GiB+), the traditional
two-step process can take days. By integrating the LFS conversion
directly into the history export, the plugin eliminates the massive
time overhead of a secondary history rewrite and allows for incremental
progress.

For detailed setup, see the
[git_lfs_importer](./plugins/git_lfs_importer/README.md)
plugin documentation.

Plugins
-----------------
@@ -177,9 +213,18 @@ defined filter methods in the [dos2unix](./plugins/dos2unix) and
[branch_name_in_commit](./plugins/branch_name_in_commit) plugins.

```
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': 'committer', 'extra': extra}

def commit_message_filter(self,commit_data):
commit_data = {
    'author': author,
    'branch': branch,
    'committer': 'committer',
    'desc': desc,
    'extra': extra,
    'hg_hash': hg_hash,
    'parents': parents,
    'revision': revision,
}

def commit_message_filter(self, commit_data):
```
The `commit_message_filter` method is called for each commit, after parsing
from hg, but before outputting to git. The dictionary `commit_data` contains the
@@ -188,9 +233,14 @@ values in the dictionary after filters have been run are used to create the git
commit.

```
file_data = {'filename':filename,'file_ctx':file_ctx,'data':file_contents, 'is_largefile':largefile_status}

def file_data_filter(self,file_data):
file_data = {
    'data': file_contents,
    'file_ctx': file_ctx,
    'filename': filename,
    'is_largefile': largefile_status,
}

def file_data_filter(self, file_data):
```
The `file_data_filter` method is called for each file within each commit.
The dictionary `file_data` contains the above attributes about the file, and
13 changes: 10 additions & 3 deletions hg-fast-export.py
@@ -284,7 +284,7 @@ def strip_leading_slash(filename):

def export_commit(ui,repo,revision,old_marks,max,count,authors,
branchesmap,sob,brmap,hgtags,encoding='',fn_encoding='',
plugins={}):
first_commit_hash="",plugins={}):
def get_branchname(name):
if name in brmap:
return brmap[name]
@@ -332,6 +332,9 @@ def get_branchname(name):

  if not parents:
    type='full'
    if revision == 0 and first_commit_hash:
      wr(b'from %s' % first_commit_hash.encode())
      type='simple delta'
  else:
    wr(b'from %s' % revnum_to_revref(parents[0], old_marks))
    if len(parents) == 1:
@@ -526,7 +529,8 @@ def verify_heads(ui,repo,cache,force,ignore_unnamed_heads,branchesmap):

def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
authors={},branchesmap={},tagsmap={},
sob=False,force=False,ignore_unnamed_heads=False,hgtags=False,notes=False,encoding='',fn_encoding='',
sob=False,force=False,ignore_unnamed_heads=False,hgtags=False,
notes=False,encoding='',fn_encoding='',first_commit_hash='',
plugins={}):
def check_cache(filename, contents):
if len(contents) == 0:
@@ -582,7 +586,7 @@ def check_cache(filename, contents):
brmap={}
for rev in range(min,max):
c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
sob,brmap,hgtags,encoding,fn_encoding,
sob,brmap,hgtags,encoding,fn_encoding,first_commit_hash,
plugins)
if notes:
for rev in range(min,max):
@@ -656,6 +660,8 @@ def bail(parser,opt):
help="Add a plugin with the given init string <name=init>")
parser.add_option("--subrepo-map", type="string", dest="subrepo_map",
help="Provide a mapping file between the subrepository name and the submodule name")
parser.add_option("--first-commit-hash", type="string", dest="first_commit_hash",
help="Allow importing into an existing git repository by specifying the hash of the first commit")

(options,args)=parser.parse_args()

@@ -735,4 +741,5 @@ def bail(parser,opt):
ignore_unnamed_heads=options.ignore_unnamed_heads,
hgtags=options.hgtags,
notes=options.notes,encoding=encoding,fn_encoding=fn_encoding,
first_commit_hash=options.first_commit_hash,
plugins=plugins_dict))
2 changes: 2 additions & 0 deletions hg-fast-export.sh
@@ -87,6 +87,8 @@ Options:
with <file-path> <hg-hash> <is-binary> as arguments
--plugin <plugin=init> Add a plugin with the given init string (repeatable)
--plugin-path <plugin-path> Add an additional plugin lookup path
--first-commit-hash <git-commit-hash> Use the given git commit hash as the
first commit's parent (for grafting)
"
case "$1" in
-h|--help)
218 changes: 218 additions & 0 deletions plugins/git_lfs_importer/README.md
@@ -0,0 +1,218 @@
# git_lfs_importer Plugin

This plugin automatically converts matching files to use Git LFS
(Large File Storage) during the Mercurial to Git conversion process.

## Overview

The git_lfs_importer plugin intercepts file data during the hg-fast-export
process and converts files matching specified patterns into Git LFS pointers.
This allows you to seamlessly migrate a Mercurial repository to Git while
simultaneously adopting LFS for large files.

### Why use git_lfs_importer?

For large repositories, traditional migration requires two sequential,
long-running steps:

1. Full history conversion from Mercurial to Git.
2. Full history rewrite using `git lfs migrate import`.

This two-step process can take hours or even days for massive
monorepos (e.g., 100GiB+).

This plugin eliminates the second, time-consuming history rewrite. It performs
the LFS conversion incrementally (Just-In-Time). During the initial export, the
plugin identifies large files and immediately writes LFS pointers into the Git
history. This results in significantly faster conversions and allows for
efficient incremental imports of new changesets.

## Prerequisites

### Dependencies

This plugin requires the `pathspec` package:

```bash
pip install pathspec
```

### Git Repository Setup

The destination Git repository must be pre-initialized with:

1. A `.gitattributes` file configured for LFS tracking
2. Git LFS properly installed and initialized

Example `.gitattributes`:
```
*.bin filter=lfs diff=lfs merge=lfs -text
*.iso filter=lfs diff=lfs merge=lfs -text
large_files/** filter=lfs diff=lfs merge=lfs -text
```

## Usage

### Step 1: Create the Destination Git Repository

```bash
# Create a new git repository
git init my-repo
cd my-repo

# Initialize Git LFS
git lfs install

# Create and commit a .gitattributes file
cat > .gitattributes << EOF
*.bin filter=lfs diff=lfs merge=lfs -text
*.iso filter=lfs diff=lfs merge=lfs -text
EOF
git add .gitattributes
git commit -m "Initialize Git LFS configuration"

# Get the commit hash (needed for --first-commit-hash)
git rev-parse HEAD
```

### Step 2: Create an LFS Specification File

Create a file (e.g., `lfs-spec.txt`) listing the patterns of files to convert
to LFS. This uses gitignore-style glob patterns:

```
*.bin
*.iso
*.tar.gz
large_files/**
*.mp4
```

### Step 3: Run hg-fast-export with the Plugin

```bash
hg-fast-export.sh \
    -r <mercurial-repo-path> \
    --plugin git_lfs_importer=lfs-spec.txt \
    --first-commit-hash <git-commit-hash> \
    --force
```

Replace `<git-commit-hash>` with the hash obtained from Step 1.

## How It Works

1. **Pattern Matching**: Files are matched against patterns in the
   LFS specification file using gitignore-style matching
2. **File Processing**: For each matching file:
   - Calculates SHA256 hash of the file content
   - Stores the actual file content in `.git/lfs/objects/<hash-prefix>/<hash>`
   - Replaces the file data with an LFS pointer containing:
     - LFS version specification
     - SHA256 hash of the original content
     - Original file size
3. **Git Fast-Import**: The LFS pointer is committed instead of the actual
   file content
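
The file-processing step above can be sketched in a few lines of Python. This
is an illustrative sketch, not the plugin's actual code: the function names are
hypothetical, and the object path assumes the standard Git LFS store layout
(two prefix directories derived from the hash), which may differ from how the
plugin writes its files.

```python
import hashlib
import os
from typing import Tuple

# Version line used by Git LFS pointer files.
LFS_SPEC_VERSION = b"https://git-lfs.github.com/spec/v1"


def build_lfs_pointer(content: bytes) -> Tuple[bytes, str]:
    """Return (pointer_text, oid) for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    pointer = b"version %s\noid sha256:%s\nsize %d\n" % (
        LFS_SPEC_VERSION,
        oid.encode("ascii"),
        len(content),
    )
    return pointer, oid


def store_lfs_object(git_dir: str, content: bytes, oid: str) -> str:
    """Write content into the local LFS object store and return its path.

    Assumption: objects live at lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>.
    """
    obj_dir = os.path.join(git_dir, "lfs", "objects", oid[:2], oid[2:4])
    os.makedirs(obj_dir, exist_ok=True)
    obj_path = os.path.join(obj_dir, oid)
    if not os.path.exists(obj_path):  # identical content is stored only once
        with open(obj_path, "wb") as fh:
            fh.write(content)
    return obj_path
```

Because the object name is the content hash, a file that appears unchanged in
many commits is written to the store once, while every commit carries only the
small pointer text.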

## Important Notes

### First Commit Hash Requirement

The `--first-commit-hash` option must be provided with the Git commit hash that
contains your `.gitattributes` file. This allows the plugin to chain from the
existing Git history rather than creating a completely new history.
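
As a rough illustration of that chaining, the sketch below mirrors the
condition this change adds to `export_commit`: only Mercurial revision 0, which
has no parents, receives a `from` line pointing at the supplied Git commit. The
helper name `first_parent_line` is hypothetical and exists only for this
example.

```python
def first_parent_line(revision, parents, first_commit_hash=""):
    """Return the fast-import `from` line for the root commit, or None.

    Hypothetical helper: a commit with parents is handled by the exporter's
    regular parent logic, so it is ignored here.
    """
    if parents:
        return None
    if revision == 0 and first_commit_hash:
        # Graft the first Mercurial commit onto the existing Git commit.
        return b"from %s" % first_commit_hash.encode()
    return None
```

Every later commit reaches the `.gitattributes` commit through its normal
parent chain, so the hash only needs to be injected once.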

### Deletions

File deletions reach the plugin with `data` set to `None`. The plugin skips
them and does not process them.
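
A minimal sketch of that guard, written as a hypothetical dry-run plugin that
only records which files would be candidates for LFS. It assumes the
`build_filter` entry point and `Filter` class shape used by the bundled
plugins, that `filename` arrives as bytes, and it uses plain suffix matching as
a stand-in for the gitignore-style matching the real plugin performs.

```python
def build_filter(args):
    # Assumed plugin entry point; `args` is the init string from --plugin.
    return Filter(args)


class Filter:
    def __init__(self, args):
        # Hypothetical init string: comma-separated suffixes, e.g. ".bin,.iso".
        self.suffixes = tuple(s.encode() for s in args.split(",") if s)
        self.candidates = {}  # filename -> size in bytes

    def file_data_filter(self, file_data):
        data = file_data["data"]
        if data is None:
            # Deletion: there is no content to hash or convert.
            return
        filename = file_data["filename"]
        if filename.endswith(self.suffixes):
            self.candidates[filename] = len(data)
```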

### Large Files and Largefiles

If the Mercurial repository uses the largefiles extension, those files have
already been resolved to their original content by the time they reach this
plugin, so the plugin can convert them to LFS when they match the patterns.

## Example Workflow

```bash
# Configuration variables
HG_REPO=/path/to/mercurial/repo
GIT_DIR_NAME=my-project-git
LFS_PATTERN_FILE=../lfs-patterns.txt

# 1. Prepare destination git repo
mkdir "$GIT_DIR_NAME"
cd "$GIT_DIR_NAME"
git init
git lfs install

# Create .gitattributes
cat > .gitattributes << EOF
*.bin filter=lfs diff=lfs merge=lfs -text
*.iso filter=lfs diff=lfs merge=lfs -text
EOF

git add .gitattributes
git commit -m "Add LFS configuration"
FIRST_HASH=$(git rev-parse HEAD)

# 2. Create LFS patterns file
cat > "$LFS_PATTERN_FILE" << EOF
*.bin
*.iso
build/artifacts/**
EOF

# 3. Run conversion
/path/to/hg-fast-export.sh \
    -r "$HG_REPO" \
    --plugin "git_lfs_importer=$LFS_PATTERN_FILE" \
    --first-commit-hash "$FIRST_HASH" \
    --force

# 4. Verify
git log --oneline
git lfs ls-files
```

## Troubleshooting

### LFS Files Not Tracked
Verify that:
- The `.gitattributes` file exists in the destination repository
- Patterns in `.gitattributes` match the files being converted
- `git lfs install` was run in the repository
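
One way to check the second point is to compare the two files directly. The
sketch below is a simplification with hypothetical function names: it compares
patterns literally, so it will not notice that `*.bin` is already covered by a
broader pattern, and it does not handle quoted patterns containing spaces.

```python
def lfs_tracked_patterns(gitattributes_text):
    """Return the patterns whose attributes include filter=lfs."""
    tracked = set()
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            tracked.add(pattern)
    return tracked


def untracked_spec_patterns(gitattributes_text, spec_text):
    """List spec-file patterns that have no filter=lfs line in .gitattributes."""
    tracked = lfs_tracked_patterns(gitattributes_text)
    spec = [
        line.strip()
        for line in spec_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return [pattern for pattern in spec if pattern not in tracked]
```

A pattern that shows up in the result is converted to pointers by the plugin
but is not declared with `filter=lfs`, so Git would check out the pointer text
instead of the real file.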

### "pathspec" Module Not Found
Install the required dependency:
```bash
pip install pathspec
```

### Conversion Fails at Import
Ensure the `--first-commit-hash` value is:
- A valid commit hash in the destination repository
- From a commit that exists before the conversion starts
- The hash of the commit containing `.gitattributes`
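
The first of these checks can be scripted before starting a long conversion.
This is a small sketch using `git rev-parse --verify`; the helper name
`commit_exists` is hypothetical, and it only confirms that the hash names a
commit object, not that the commit contains `.gitattributes`.

```python
import subprocess


def commit_exists(commit_hash, repo_dir="."):
    """Return True if commit_hash resolves to a commit in repo_dir."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--verify", "--quiet", commit_hash + "^{commit}"],
            cwd=repo_dir,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # git is not installed, or repo_dir does not exist.
        return False
    return result.returncode == 0
```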


### Force Requirement

You only need to pass the `--force` option when converting the *first*
Mercurial commit into a non-empty Git repository. By default, `hg-fast-export`
prevents importing Mercurial commits onto a non-empty Git repo to avoid
creating conflicting histories. Passing `--force` overrides that safety check
and allows the exporter to write the LFS pointer objects and integrate the
converted data with the existing Git history.

If you are doing an incremental conversion (i.e., running the script a second
time to import new changesets into an already converted repository),
the `--force` flag is not required.

Omitting `--force` when attempting to import the first Mercurial commit into a
non-empty repository will cause the importer to refuse the operation.

## See Also

- [Git LFS Documentation](https://git-lfs.github.com/)
- [gitignore Pattern Format](https://git-scm.com/docs/gitignore)
- [hg-fast-export Documentation](../README.md)